Articles | Open Access | https://doi.org/10.37547/ijmsphr/Volume07Issue02-02

Predicting Infectious Disease Outbreaks Using Machine Learning and Real-Time Epidemiological Data: Leverage Social Media, Environmental, And Public Health Data to Forecast Outbreaks Like Influenza, COVID-19, Or RSV

Md. Emran Hossen, Department of Science in Biomedical Engineering, Gannon University, USA
Aleya Akhter, Master of Public Health, Northern University Bangladesh, Dhaka, Bangladesh
Sonya Ghosh, Department of Public Health, Monroe University, USA
Musomi Khandaker, Department of Public Health, King Graduate School, Monroe University
Md Noman Azam, Department of Health Sciences and Leadership, St. Francis College
Hosne Ara Malek, MBBS (USTC), DMU (DU), CCD (BIRDEM), University of Greifswald, Germany
Kamrun Naher, MBBS (USTC), DMU, RDMS, USA
Md Mahabubur Rahman Bhuiyan, Department of Healthcare Informatics, University of Potomac, Washington, DC, USA

Abstract

Accurate and timely prediction of infectious disease outbreaks is critical for effective public health response. In this study, we developed a machine learning framework that integrates real-time epidemiological data, social media signals, environmental variables, and policy interventions to forecast influenza and COVID‑19 outbreaks. We evaluated multiple models, including logistic regression, random forest, XGBoost, and LSTM neural networks, across classification and regression tasks. XGBoost achieved the highest accuracy for influenza outbreak detection, while LSTM networks outperformed other models in forecasting COVID‑19 case counts, particularly for longer-term predictions. Feature analysis revealed that social media indicators, environmental conditions, and policy measures significantly enhanced predictive performance. The results demonstrate that multimodal machine learning models can provide early warnings, inform resource allocation, and support data-driven decision-making in the US public healthcare system. Our findings highlight the potential of integrating diverse real-time data streams with advanced machine learning techniques to strengthen epidemic preparedness and response.
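As a rough illustration of the multimodal idea summarized above, the following minimal sketch trains a plain logistic regression (one of the baseline model families named in the abstract) on three combined signal types. It is not the authors' pipeline: the data are synthetic, and the feature names (`ili`, `posts`, `humidity`), the assumed relationship between features and outbreaks, and all function names are hypothetical choices made only for this example.

```python
import math
import random


def make_synthetic_weeks(n_weeks, seed=0):
    """Build hypothetical weekly records as (features, outbreak_label) pairs."""
    rng = random.Random(seed)
    weeks = []
    for _ in range(n_weeks):
        # Each feature is an anomaly in [-1, 1] relative to a seasonal baseline.
        ili = rng.uniform(-1.0, 1.0)       # epidemiological: clinic-visit rate
        posts = rng.uniform(-1.0, 1.0)     # social media: symptom mentions
        humidity = rng.uniform(-1.0, 1.0)  # environmental: absolute humidity
        # ASSUMPTION for this toy data only: outbreaks are likelier when
        # visits and posts are high and humidity is low, plus some noise.
        signal = 2.0 * ili + 1.5 * posts - 1.5 * humidity + rng.gauss(0.0, 0.3)
        weeks.append(([ili, posts, humidity], 1 if signal > 0 else 0))
    return weeks


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, epochs=300, lr=0.5):
    """Fit weights and bias by full-batch gradient descent on log loss."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in rows:
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def accuracy(rows, weights, bias):
    """Share of weeks whose predicted label (threshold 0.5) matches the truth."""
    hits = 0
    for x, y in rows:
        p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
        hits += int((p >= 0.5) == (y == 1))
    return hits / len(rows)


weeks = make_synthetic_weeks(400)
# Train on the earlier weeks and evaluate on the later ones.
train, test = weeks[:300], weeks[300:]
weights, bias = train_logistic(train)
test_accuracy = accuracy(test, weights, bias)
print(f"held-out accuracy: {test_accuracy:.2f}")
print("weights (ili, posts, humidity):", [round(w, 2) for w in weights])
```

The learned weights give a crude view of which data stream drives the prediction, loosely analogous to the feature analysis the abstract mentions. Tree ensembles such as XGBoost or sequence models such as LSTMs would replace `train_logistic` in a fuller comparison.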

Keywords

Infectious disease prediction, machine learning, real-time epidemiological data, social media analytics, influenza, COVID-19, public health forecasting


How to Cite

Hossen, M. E., Akhter, A., Ghosh, S., Khandaker, M., Azam, M. N., Malek, H. A., Naher, K., & Bhuiyan, M. M. R. (2026). Predicting Infectious Disease Outbreaks Using Machine Learning and Real-Time Epidemiological Data: Leverage Social Media, Environmental, And Public Health Data to Forecast Outbreaks Like Influenza, COVID-19, Or RSV. International Journal of Medical Science and Public Health Research, 7(02). https://doi.org/10.37547/ijmsphr/Volume07Issue02-02