Profit-Oriented Production and Pricing Optimization for Manufacturing Enterprises Using Proximal Policy Optimization

Authors

  • Pingmei Fan, Guangxi Vocational Normal University, Nanning, Guangxi, China
  • Hanwu Li, Amazon.com Services LLC, Bellevue, WA, 98004, USA
  • Mengdie Hu, Systems Engineering, University of Pennsylvania, Philadelphia, PA, 19104, USA

DOI:

https://doi.org/10.71222/zsm3tb33

Keywords:

reinforcement learning, Proximal Policy Optimization, intelligent manufacturing, dynamic pricing, profit optimization, economic decision-making, industrial automation

Abstract

In modern intelligent manufacturing, enterprises face increasingly dynamic market environments in which production costs, consumer demand, and pricing strategies interact in complex, nonlinear ways. Traditional static or rule-based decision models fail to capture these interdependencies, often leading to suboptimal profit margins and excessive inventory accumulation. To address this challenge, this study proposes a profit-oriented production and pricing optimization system for manufacturing enterprises based on Proximal Policy Optimization (PPO), a reinforcement learning algorithm well suited to continuous control and dynamic environments. The proposed system autonomously learns production quantities and pricing strategies through interaction with a simulated economic environment characterized by stochastic demand, fluctuating raw material costs, and inventory constraints. The problem is modeled as a Markov Decision Process, and the PPO agent optimizes a reward function that balances short-term profitability against long-term inventory stability. Experimental results on a simulated manufacturing dataset show that the proposed PPO-based system achieves a 12.8% improvement in cumulative profit, a 16.4% reduction in inventory risk, and a 50.9% decrease in final loss compared with the Deep Q-Network (DQN) baseline. Moreover, the framework, referred to as PPO-P³OS, converges stably and adapts well to dynamic market fluctuations, indicating its effectiveness for real-time production and pricing decisions and its potential to improve decision-making efficiency. This research contributes to the integration of reinforcement learning and business analytics, offering a scalable, data-driven framework for real-time profit optimization in intelligent manufacturing systems.
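The Markov Decision Process described in the abstract can be illustrated with a toy simulator. The sketch below is not the authors' environment: the class name ToyFactoryEnv, the linear demand curve, the random-walk material cost, and every numeric parameter are assumptions chosen only to show how a state (inventory, unit cost), an action (production quantity, price), and a reward that trades profit against leftover inventory might fit together.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyFactoryEnv:
    """Hypothetical single-product environment (all parameters are assumptions)."""
    capacity: int = 200             # maximum inventory (inventory constraint)
    base_demand: float = 120.0      # expected demand at price zero
    price_sensitivity: float = 2.0  # demand lost per unit of price
    holding_cost: float = 0.5       # reward penalty per unit left in stock
    seed: int = 0

    def __post_init__(self):
        self.rng = random.Random(self.seed)
        self.reset()

    def reset(self):
        """Start with empty stock and an assumed initial material cost."""
        self.inventory = 0
        self.unit_cost = 10.0
        return (self.inventory, self.unit_cost)

    def step(self, production, price):
        """Apply one (production, price) action and return (state, reward, info)."""
        # Production is clipped so that inventory never exceeds capacity.
        production = max(0, min(int(production), self.capacity - self.inventory))
        available = self.inventory + production

        # Stochastic demand: linear in price plus Gaussian noise (assumed form).
        noise = self.rng.gauss(0.0, 10.0)
        demand = max(0.0, self.base_demand - self.price_sensitivity * price + noise)
        sales = min(available, int(demand))
        self.inventory = available - sales

        # Reward = short-term profit minus a penalty on leftover inventory.
        profit = price * sales - self.unit_cost * production
        reward = profit - self.holding_cost * self.inventory

        # Raw material cost follows a random walk, floored at 1.0 (assumed).
        self.unit_cost = max(1.0, self.unit_cost + self.rng.gauss(0.0, 0.5))

        info = {"profit": profit, "sales": sales, "demand": demand}
        return (self.inventory, self.unit_cost), reward, info


# Usage: roll out a fixed (non-learned) policy for a few periods.
env = ToyFactoryEnv(seed=42)
total_reward = 0.0
for _ in range(12):
    _, reward, _ = env.step(production=80, price=25.0)
    total_reward += reward
print(f"cumulative reward under the fixed policy: {total_reward:.1f}")
```

In a setup like the one the abstract describes, a PPO agent would replace the fixed policy in the loop and choose production and price from the observed state; the holding_cost weight is one possible way to express the balance between profit and inventory stability.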

Published

18 March 2026

Section

Article

How to Cite

Fan, P., Li, H., & Hu, M. (2026). Profit-Oriented Production and Pricing Optimization for Manufacturing Enterprises Using Proximal Policy Optimization. Economics and Management Innovation, 3(2), 8-17. https://doi.org/10.71222/zsm3tb33