A Comparative Study of LSTM, GRU, and Transformer Models for AI Music Generation
DOI: https://doi.org/10.71222/f8mxj033

Keywords: AI music generation, LSTM models, GRU models, Transformer models, generative models, deep learning in music, comparative study in AI music

Abstract
This study compares the performance of three deep learning models (LSTM, GRU, and Transformer) on single-voice and multi-voice melodies across different musical styles. The LSTM model generates simple melodies with strong temporal continuity. For smaller datasets, the GRU model is particularly effective: it matches LSTM performance while training faster, owing to its simpler gating structure, and therefore incurs lower computational cost. By incorporating the self-attention mechanism, the Transformer model can handle much longer sequences, enabling the generation of complex rhythms that can be rendered and performed by synthesized instruments. BLEU scores of the generated pieces provide quantitative insight into how longer compositions compare with shorter ones. While longer pieces can offer richness and depth, their contribution to musical quality warrants careful evaluation, as they may become overly repetitive or serve merely as a demonstration of the model's capacity. This study clarifies the impact of model architecture on music generation and emphasizes the importance of aligning model choice with dataset characteristics, guiding future work in AI-driven music generation toward more effective and contextually aware approaches.
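The abstract evaluates generated melodies with BLEU, which measures n-gram overlap between a generated token sequence and a reference. As a minimal sketch of how such a score could be computed over melodies encoded as MIDI pitch tokens (the encoding, the smoothing scheme, and the function names here are illustrative assumptions, not the paper's actual evaluation pipeline):

```python
import math
from collections import Counter

def ngrams(seq, n):
    """All contiguous n-grams of a token sequence."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights, add-one smoothing,
    and the standard brevity penalty. Tokens are e.g. MIDI pitches."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        overlap = sum((cand_counts & ref_counts).values())  # clipped matches
        total = max(sum(cand_counts.values()), 1)
        precisions.append((overlap + 1) / (total + 1))      # smoothed precision
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty discourages trivially short candidates.
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_avg)

# Hypothetical melodies as MIDI pitch sequences (C major scale vs. a variant).
reference = [60, 62, 64, 65, 67, 69, 71, 72]
generated = [60, 62, 64, 65, 67, 69, 71, 74]
print(f"BLEU = {bleu(generated, reference):.3f}")
```

Because BLEU rewards overlap with the reference, a long piece that loops the same motif can score well while being musically repetitive, which is why the study pairs the metric with careful qualitative evaluation.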
License
Copyright (c) 2025 Ava Zihan Gao, Cynthia Bailey (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.