ACE-Sync: An Adaptive Cloud-Edge Synchronization Framework for Communication-Efficient Large-Scale Distributed Model Training

Authors

  • Yi Yang, Sichuan Agricultural University, Chengdu, Sichuan, China
  • Ziyu Lin, Google LLC, Seattle, Washington, USA
  • Liesheng Wei, College of Information Technology, Shanghai Ocean University, Shanghai, China

DOI:

https://doi.org/10.71222/s729gb16

Keywords:

distributed training, cloud-edge computing, communication-efficient learning, parameter synchronization, gradient compression, large-scale deep learning

Abstract

Large-scale deep learning models impose substantial communication overhead in distributed training, particularly in bandwidth-constrained or heterogeneous cloud-edge environments. Conventional synchronous or fixed-compression techniques often struggle to balance communication cost, convergence stability, and model accuracy. To address these challenges, we propose ACE-Sync, an Adaptive Cloud-Edge Synchronization Framework that integrates (1) an attention-based gradient importance predictor, (2) a differentiated parameter compression strategy, and (3) a hierarchical cloud-edge coordination mechanism. ACE-Sync dynamically selects which parameter groups to synchronize and determines appropriate compression levels under per-device bandwidth budgets. A knapsack-based optimization strategy maximizes the preservation of important gradients while reducing redundant communication. Residual-based error compensation and device clustering further ensure long-term convergence and cross-device personalization. Experiments show that ACE-Sync substantially reduces communication overhead while maintaining competitive accuracy: compared with FullSync, it lowers communication cost from 112.5 GB to 44.7 GB (a 60% reduction) and shortens convergence from 41 to 39 epochs. Despite this aggressive communication reduction, ACE-Sync preserves model quality, achieving 82.1% Top-1 accuracy, only 0.3% below the full-synchronization baseline. These results indicate that ACE-Sync offers a scalable, communication-efficient, and accuracy-preserving solution for large-scale cloud-edge distributed model training.
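The two mechanisms highlighted in the abstract, importance-weighted selection of parameter groups under a per-device bandwidth budget and residual-based error compensation, can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy knapsack heuristic, the group sizes, the importance scores, and all function names below are assumptions introduced only to make the idea concrete.

```python
import numpy as np

def select_groups(importance, sizes, budget_bytes):
    """Greedy 0/1 knapsack heuristic (illustrative, not the paper's exact solver):
    take parameter groups in order of importance per byte until the
    per-device bandwidth budget is exhausted."""
    order = np.argsort(-importance / sizes)          # highest importance density first
    chosen, used = [], 0
    for g in order:
        if used + sizes[g] <= budget_bytes:
            chosen.append(g)
            used += sizes[g]
    return chosen

def sync_step(grads, residuals, importance, sizes, budget_bytes):
    """One synchronization round with residual error compensation:
    gradient mass that is not transmitted is kept locally and added
    back before the next round's selection."""
    compensated = [g + r for g, r in zip(grads, residuals)]
    chosen = select_groups(importance, sizes, budget_bytes)
    payload = {}
    for i, g in enumerate(compensated):
        if i in chosen:
            payload[i] = g                   # transmit this group
            residuals[i] = np.zeros_like(g)  # residual cleared after sending
        else:
            residuals[i] = g                 # keep the full gradient as residual
    return payload, residuals

# Toy usage: 4 parameter groups of 100 float32 values each; the budget
# allows roughly half of the traffic, so only the densest groups are sent.
rng = np.random.default_rng(0)
grads = [rng.standard_normal(100) for _ in range(4)]
residuals = [np.zeros(100) for _ in range(4)]
importance = np.array([0.9, 0.1, 0.5, 0.3])   # e.g. from an importance predictor
sizes = np.array([100, 100, 100, 100]) * 4    # bytes per group (float32)
payload, residuals = sync_step(grads, residuals, importance, sizes, budget_bytes=900)
print("groups sent:", sorted(payload))        # -> groups sent: [0, 2]
```

In this toy run, only the two groups with the highest importance per byte fit within the 900-byte budget; the unsent groups are retained as residuals so their contribution is not lost in later rounds.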


Published

02 February 2026

Issue

Vol. 3 No. 1 (2026)

Section

Article

How to Cite

Yang, Y., Lin, Z., & Wei, L. (2026). ACE-Sync: An Adaptive Cloud-Edge Synchronization Framework for Communication-Efficient Large-Scale Distributed Model Training. Journal of Computer, Signal, and System Research, 3(1), 84-92. https://doi.org/10.71222/s729gb16