Design and Implementation of Code Completion System Based on LLM and CodeBERT Hybrid Subsystem

Authors

  • Bingbing Zhang, Xiamen Institute of Technology, Xiamen, China
  • Ziyu Lin, Google LLC, Seattle, Washington, USA
  • Yingxin Su, University of California, Davis, California, USA

DOI:

https://doi.org/10.71222/ww7w6c36

Keywords:

code completion, CodeBERT, GPT-3.5, code generation, deep learning

Abstract

In the rapidly evolving software development industry, coding efficiency and accuracy play significant roles in delivering high-quality software. Various code suggestion and completion tools, such as CodeBERT from Microsoft and GPT-3.5 from OpenAI, have been developed using deep learning techniques and integrated into IDEs to assist software engineers during development. Research has shown that CodeBERT performs strongly at code summarization and capturing code semantics, while GPT-3.5 demonstrates adept code generation capabilities. This study implements a hybrid model that integrates CodeBERT and GPT-3.5 to accomplish code suggestion and autocompletion tasks, leveraging the context-aware effectiveness of CodeBERT and the advanced code generation abilities of GPT-3.5. Evaluated on three main metrics (accuracy, quality of generated code, and performance efficiency across various software and hardware configurations), the hybrid model outperforms the benchmarks, demonstrating its feasibility and effectiveness. Robustness testing further confirms the reliability and stability of the hybrid model. This study not only underscores the importance of deep learning in the software development industry, but also reveals the potential of combining complementary deep learning models to fully exploit the strengths of each.
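The abstract describes the division of labor (GPT-3.5 for generation, CodeBERT for context-aware semantics) but this page does not reproduce the architecture. Below is a minimal, hypothetical sketch of one way such a hybrid could be wired: GPT-3.5 proposes several candidate completions and CodeBERT re-ranks them by semantic similarity to the surrounding context. It assumes the public microsoft/codebert-base checkpoint and the gpt-3.5-turbo chat API; the helper names embed and complete are illustrative, not the authors' code.

```python
# Hypothetical hybrid completion pipeline: generate with GPT-3.5,
# re-rank candidates with CodeBERT embeddings. Illustrative only;
# not the paper's published implementation.

import torch
from openai import OpenAI                           # pip install openai
from transformers import AutoModel, AutoTokenizer   # pip install transformers

# CodeBERT encoder (public checkpoint released by Microsoft).
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
encoder = AutoModel.from_pretrained("microsoft/codebert-base")
client = OpenAI()  # reads OPENAI_API_KEY from the environment


def embed(code: str) -> torch.Tensor:
    """Mean-pooled CodeBERT embedding of a code snippet."""
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)               # (768,)


def complete(context: str, n_candidates: int = 5) -> str:
    """Ask GPT-3.5 for several completions; keep the one CodeBERT ranks highest."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        n=n_candidates,
        temperature=0.8,
        messages=[
            {"role": "system",
             "content": "Complete the following code. Reply with code only."},
            {"role": "user", "content": context},
        ],
    )
    candidates = [choice.message.content for choice in response.choices]
    # Re-rank: the candidate whose continuation stays semantically closest
    # to the original context wins.
    ctx_vec = embed(context)
    scores = [
        torch.cosine_similarity(ctx_vec, embed(context + c), dim=0).item()
        for c in candidates
    ]
    return candidates[scores.index(max(scores))]


if __name__ == "__main__":
    print(complete("def fibonacci(n):\n    "))
```

In this sketch GPT-3.5 supplies generative fluency while CodeBERT supplies semantic grounding, mirroring the complementary-strengths argument made in the abstract; the actual integration strategy in the paper may differ.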

Published

30 October 2025

Issue

Volume 2, Issue 6 (2025)

Section

Article

How to Cite

Zhang, B., Lin, Z., & Su, Y. (2025). Design and Implementation of Code Completion System Based on LLM and CodeBERT Hybrid Subsystem. Journal of Computer, Signal, and System Research, 2(6), 49-56. https://doi.org/10.71222/ww7w6c36