Open Access
https://doi.org/10.37547/ijmsphr/Volume07Issue02-02
Predicting Infectious Disease Outbreaks Using Machine Learning and Real-Time Epidemiological Data: Leverage Social Media, Environmental, and Public Health Data to Forecast Outbreaks Like Influenza, COVID-19, or RSV
Abstract
Accurate and timely prediction of infectious disease outbreaks is critical for effective public health response. In this study, we developed a machine learning framework that integrates real-time epidemiological data, social media signals, environmental variables, and policy interventions to forecast influenza and COVID‑19 outbreaks. We evaluated multiple models, including logistic regression, random forest, XGBoost, and LSTM neural networks, across classification and regression tasks. XGBoost achieved the highest accuracy for influenza outbreak detection, while LSTM networks outperformed other models in forecasting COVID‑19 case counts, particularly for longer-term predictions. Feature analysis revealed that social media indicators, environmental conditions, and policy measures significantly enhanced predictive performance. The results demonstrate that multimodal machine learning models can provide early warnings, inform resource allocation, and support data-driven decision-making in the US public healthcare system. Our findings highlight the potential of integrating diverse real-time data streams with advanced machine learning techniques to strengthen epidemic preparedness and response.
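The abstract does not say how the social media, environmental, and surveillance streams were combined into model inputs. One common approach, shown here only as a minimal sketch, is to align every stream by week, stack a short window of lagged values from each stream into one feature row, and attach two targets: the case count a fixed number of weeks ahead (for regression) and a binary outbreak flag (for classification). The helper name make_supervised, the weekly alignment, the lag and horizon values, and the fixed outbreak threshold are all illustrative assumptions, not the authors' pipeline.

```python
from typing import Dict, List, Sequence, Tuple


def make_supervised(
    streams: Dict[str, Sequence[float]],
    target: str,
    lags: int,
    horizon: int,
    outbreak_threshold: float,
) -> Tuple[List[List[float]], List[float], List[int]]:
    """Build lagged feature rows plus regression and classification targets.

    Hypothetical helper. Assumes every stream is aligned to the same weeks.
    The row for week t holds each stream's values for weeks t-lags+1..t and
    is paired with the target stream's value at week t+horizon.
    """
    if lags < 1 or horizon < 1:
        raise ValueError("lags and horizon must be at least 1")
    if target not in streams:
        raise ValueError("target must be one of the streams")
    lengths = {len(series) for series in streams.values()}
    if len(lengths) != 1:
        raise ValueError("all streams must cover the same weeks")
    n_weeks = lengths.pop()
    names = sorted(streams)  # fixed column order across all rows

    features: List[List[float]] = []
    counts: List[float] = []
    outbreaks: List[int] = []
    for t in range(lags - 1, n_weeks - horizon):
        row: List[float] = []
        for name in names:
            # Only past and current values enter the row (no look-ahead).
            row.extend(streams[name][t - lags + 1 : t + 1])
        future = streams[target][t + horizon]
        features.append(row)
        counts.append(future)  # regression target: count `horizon` weeks ahead
        # Assumed outbreak rule: future count at or above a fixed threshold.
        outbreaks.append(int(future >= outbreak_threshold))
    return features, counts, outbreaks


if __name__ == "__main__":
    # Toy weekly data, invented for illustration only.
    X, y_count, y_outbreak = make_supervised(
        {"cases": [10, 12, 15, 30, 45, 40, 20], "tweets": [1, 2, 4, 9, 12, 8, 3]},
        target="cases",
        lags=2,
        horizon=1,
        outbreak_threshold=30,
    )
    print(X[0])        # [10, 12, 1, 2]
    print(y_outbreak)  # [0, 1, 1, 1, 0]
```

Because each row uses only values up to week t, splitting the rows chronologically (earlier weeks for training, later weeks for testing) rather than shuffling them keeps future information out of training. Raising horizon corresponds to the longer-term forecasting setting the abstract mentions; the same rows could then be passed to any of the listed model families.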
Keywords
Infectious disease prediction, machine learning, real-time epidemiological data, social media analytics, influenza, COVID-19, public health forecasting
Copyright License
Copyright (c) 2026 Md. Emran Hossen, Aleya Akhter, Sonya Ghosh

This work is licensed under a Creative Commons Attribution 4.0 International License.