RAGN-IIoT: A Retrieval-Augmented NL2SQL Framework with Dynamic Sensor-Selection Guardrails for Industrial IoT Time-Series Data Warehouses

Authors

  • Haoyan Wei, School of Electronic Information, Sichuan University, Chengdu, China
  • Yuqing Wu, Uber Technologies, Inc., Seattle, USA
  • Mengtong Li, School of Engineering and Applied Science, Columbia University in the City of New York, New York, NY, USA

DOI:

https://doi.org/10.71222/esyn7b58

Keywords:

industrial IoT, time-series data warehouse, NL2SQL, retrieval-augmented generation, semantic guardrails, large language models, automated sensor selection

Abstract

Industrial Internet of Things (IIoT) systems generate massive time-series data streams characterized by high dimensionality, heterogeneous sensor modalities, and complex domain-specific schemas. Because access to these data still depends on writing SQL, operational engineers face a barrier to obtaining fine-grained, real-time insights in production environments. To address this challenge, we propose RAGN-IIoT, a Retrieval-Augmented Generation (RAG)-guided Natural Language to SQL (NL2SQL) framework tailored to large-scale IIoT time-series data warehouses. RAGN-IIoT integrates (1) schema-aware context retrieval, (2) a domain-adaptive large language model for SQL synthesis, and (3) a novel Sensor-Selection Guardrail Module that curbs hallucinations by validating the sensors, KPIs, and device identifiers referenced in generated queries against a semantic index. Experiments on two real-world industrial IIoT datasets and one synthetic benchmark show that RAGN-IIoT improves NL2SQL performance on all evaluation metrics. Compared with the strongest baseline, RAG-SQL, RAGN-IIoT achieves absolute gains of 11.7 percentage points in exact-match accuracy and 13.4 percentage points in execution accuracy, raises schema precision to 96.1 percent, and reduces the guardrail violation rate to 1.8 percent. Ablation studies further confirm that each module (sensor-selection guardrails, retrieval augmentation, and temporal templates) contributes to overall model stability and correctness. These results highlight RAGN-IIoT's robustness and its practical suitability for natural-language analytics over industrial time-series data warehouses.
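The abstract describes the Sensor-Selection Guardrail Module only at a high level. As a rough illustration, the Python sketch below shows one way a generated query could be screened before execution. It is not the authors' implementation: it assumes sensors are selected through literal filters on a column named `sensor_id`, that the valid identifiers are available as a plain set, and it substitutes `difflib` string similarity for the paper's semantic index. The names `referenced_ids`, `sensor_guardrail`, and `GuardrailResult` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from difflib import get_close_matches


@dataclass
class GuardrailResult:
    """Outcome of checking one generated SQL query (hypothetical structure)."""
    ok: bool  # True if every referenced sensor ID is in the registry
    unknown: list[str] = field(default_factory=list)
    suggestions: dict[str, list[str]] = field(default_factory=dict)


def referenced_ids(sql: str, column: str = "sensor_id") -> list[str]:
    """Collect string literals compared against `column` via = or IN (...).

    Assumption: sensors are selected by literal filters on one ID column.
    Other literals (timestamps, units) are deliberately ignored.
    """
    col = re.escape(column)
    ids = re.findall(rf"\b{col}\s*=\s*'([^']*)'", sql, flags=re.IGNORECASE)
    for group in re.findall(rf"\b{col}\s+IN\s*\(([^)]*)\)", sql, flags=re.IGNORECASE):
        ids.extend(re.findall(r"'([^']*)'", group))
    # Deduplicate while preserving first-seen order.
    return list(dict.fromkeys(ids))


def sensor_guardrail(sql: str, known: set[str], column: str = "sensor_id",
                     cutoff: float = 0.6) -> GuardrailResult:
    """Flag queries that reference sensor IDs absent from the registry.

    difflib string similarity stands in for a semantic index here: it
    proposes near-miss identifiers that a repair step could substitute.
    """
    unknown = [s for s in referenced_ids(sql, column) if s not in known]
    candidates = sorted(known)
    suggestions = {s: get_close_matches(s, candidates, n=3, cutoff=cutoff) for s in unknown}
    return GuardrailResult(ok=not unknown, unknown=unknown, suggestions=suggestions)


if __name__ == "__main__":
    # Hypothetical registry and an LLM-generated query with one mistyped sensor.
    known = {"PUMP_07_VIB", "PUMP_07_TEMP", "LINE_2_PRESSURE"}
    sql = ("SELECT ts, value FROM readings "
           "WHERE sensor_id IN ('PUMP_07_VIB', 'PUMP_7_TEMP') AND ts >= '2025-01-01'")
    result = sensor_guardrail(sql, known)
    print(result.ok)                             # False
    print(result.unknown)                        # ['PUMP_7_TEMP']
    print(result.suggestions["PUMP_7_TEMP"][0])  # PUMP_07_TEMP
```

Matching only literal `=` and `IN` filters keeps this check conservative and easy to follow; a more robust guardrail would likely walk a parsed SQL syntax tree and would also validate the KPI and device identifiers that the abstract lists alongside sensors.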

Published

27 December 2025

Issue

Vol. 2, No. 7 (2025)

Section

Article

How to Cite

Wei, H., Wu, Y., & Li, M. (2025). RAGN-IIoT: A Retrieval-Augmented NL2SQL Framework with Dynamic Sensor-Selection Guardrails for Industrial IoT Time-Series Data Warehouses. Journal of Computer, Signal, and System Research, 2(7), 78-88. https://doi.org/10.71222/esyn7b58