A Multi-Stage Ensemble-Based Intelligent Healthcare Conversational System for Symptom-Driven Disease Prediction Using Integrated Multi-Dataset Learning

Authors:
S. V. Jagadesh Kumar, T. A. Sanjay, Mohammed Salman, R. Regin, K. Senthamilselvan, Farrukh Arslan

Addresses:
Department of Computer Science and Engineering in AIML, SRM Institute of Science and Technology, Ramapuram, Chennai, Tamil Nadu, India. Department of Electronics and Communication Engineering, Dhaanish Ahmed College of Engineering, Chennai, Tamil Nadu, India. Department of Computer Science, Purdue University, West Lafayette, Indiana, United States of America.

Abstract:

The development of AI technology has had a substantial effect on contemporary healthcare services. Some of the most notable advances have come in clinical decision support, telemedicine, and automated patient triage. Intelligent healthcare conversational systems have become valuable resources that can provide patients with preliminary medical consultations by analyzing patient-reported symptoms and predicting likely diseases. Nevertheless, many existing systems rely on rule-based logic or simplistic binary encoding to represent symptoms, which restricts their ability to capture symptom context and to work with diverse medical datasets. In addition, medical datasets often exhibit an imbalanced class distribution, in which commonly diagnosed diseases far outnumber rare conditions. This study proposes a multi-stage approach to building intelligent, symptom-based conversational healthcare models for disease prediction. Multiple medical datasets are integrated through symptom normalization, duplicate elimination, and schema alignment, which helps enhance generalization across various disease patterns. To enhance symptom representation, we apply TF-IDF vectorization, which maps symptom descriptions into numeric vectors. The dimensionality of the feature space is then reduced using Chi-Square statistical feature selection. To correct the class imbalance, we generate synthetic samples with SMOTE. Deep Neural Networks and gradient-boosting classifiers are trained on the optimized feature space, and a soft-voting ensemble combines their predictions to improve prediction accuracy and stability. Our experiments demonstrate the strong predictive performance and efficiency of the framework.
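As an illustration of two stages named in the abstract, the following is a minimal standard-library sketch of TF-IDF vectorization of symptom text and soft voting over class probabilities. It is not the authors' implementation: the function names `tfidf` and `soft_vote`, the whitespace tokenizer, the smoothed IDF formula, the equal default weights, and the toy symptom strings and probabilities are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Map symptom descriptions to TF-IDF vectors (hypothetical helper).

    Assumes whitespace tokenization and a smoothed IDF,
    idf(t) = ln((1 + n) / (1 + df(t))) + 1; the abstract does not say
    which TF-IDF variant was used.
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of descriptions containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term frequency per description
        vectors.append([counts[t] * idf[t] for t in vocab])
    return vocab, vectors


def soft_vote(prob_sets, weights=None):
    """Combine per-class probabilities from several models (hypothetical helper).

    Each element of prob_sets is a dict {class_label: probability} from one
    model. The weighted mean is taken per class and the class with the
    highest mean is returned together with the averaged distribution.
    """
    if weights is None:
        weights = [1.0] * len(prob_sets)  # assumption: equal model weights
    total = sum(weights)
    averaged = {
        label: sum(w * p[label] for w, p in zip(weights, prob_sets)) / total
        for label in prob_sets[0]
    }
    return max(averaged, key=averaged.get), averaged


# Toy inputs, invented for illustration only.
vocab, vectors = tfidf(["fever cough", "fever headache"])
dnn_probs = {"flu": 0.6, "cold": 0.3, "allergy": 0.1}
gb_probs = {"flu": 0.2, "cold": 0.7, "allergy": 0.1}
label, averaged = soft_vote([dnn_probs, gb_probs])
print(vocab, label)
```

Because soft voting averages probabilities rather than counting hard votes, a confident prediction from one model can outweigh a weak, differing prediction from the other. The Chi-Square selection and SMOTE stages are omitted here to keep the sketch short.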

Keywords: Artificial Intelligence; Healthcare Conversational Systems; TF-IDF Feature Engineering; Class Imbalance Handling; Ensemble Machine Learning; Deep Neural Networks; Medical Data Integration.

Received on: 12/05/2025, Revised on: 17/07/2025, Accepted on: 04/09/2025, Published on: 03/03/2026

DOI: 10.69888/FTSNL.2026.000643

FMDB Transactions on Sustainable Neuroscience Letters, 2026 Vol. 1 No. 1, Pages: 32–47
