Research on Recommendation Algorithms in Short Video Platforms: From Traditional Methods to Deep Learning and Multimodal Fusion
DOI: https://doi.org/10.71222/j5082f38
Keywords: short video platform, recommendation algorithm, deep learning, multimodal fusion, sequential behavior modeling, data sparsity
Abstract
With the rapid advancement of mobile internet technologies, short video platforms have risen to prominence and now attract hundreds of millions of users worldwide. At the heart of these platforms, recommendation algorithms are essential for mitigating information overload and improving user engagement and satisfaction. This paper systematically reviews the development of recommendation algorithms for short video platforms, tracing their evolution from traditional recommendation methods to deep learning techniques and, more recently, to multimodal fusion strategies, and analyzes the strengths, limitations, and application scenarios of each approach. First, the fundamental concepts and evaluation metrics of short video recommendation are introduced. Next, the applications and shortcomings of traditional recommendation algorithms in this domain are examined. Deep learning–based methods, including deep neural networks and sequence modeling, are then explored, followed by recent advances in multimodal fusion for short video recommendation. Finally, the paper discusses current challenges and outlines potential future research directions. Through these discussions and comparisons, this work aims to give researchers and practitioners a broad view of short video recommendation algorithms and to support further academic progress and practical innovation in this rapidly evolving field.
License
Copyright (c) 2025 Yeqiang Zheng (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.