Securing Big Data Pipelines in Healthcare: A Framework for Real-Time Threat Detection in Population Health Systems

Authors

  • Adaeze Ojinika Ezeogu. Affiliation: University of West Georgia, USA. Department: MSc. Cybersecurity & Information Management
  • Asafa Emmanuel. Affiliation: Joseph Sarwan Tarka University, Makurdi, Benue State, Nigeria. Department: Biochemistry and Biostatistics

Keywords:

Healthcare cybersecurity, Real-time threat detection, Big data security, Stream processing, Population health, Apache Kafka, Splunk, SIEM, Anomaly detection

Abstract

Purpose: The paper aims to develop a comprehensive security framework for big data pipelines in healthcare, focusing on real-time threat detection and mitigation. It addresses the increasing security and privacy risks associated with the rapid growth of health data streams from IoT devices, electronic health records, and wearable technologies.

Methodology: The study extends previous work on real-time survival risk prediction by designing a layered security architecture that integrates Apache Kafka for stream processing and Splunk SIEM for event monitoring. A machine learning-based anomaly detection algorithm is implemented to identify potential security breaches within 500 milliseconds, achieving 97.3% detection accuracy and a false positive rate of 0.02%.

The framework is evaluated in a simulated population health system processing 2.5 million health events per second and is specifically designed to tackle five key security challenges: unauthorized data access, data injection attacks, privacy breaches, insider threats, and compliance violations.
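The abstract does not specify which anomaly detection model the framework uses. As a rough illustration of the general idea of scoring events as they arrive on a stream, the sketch below uses a rolling z-score as a stand-in technique. It is not the authors' algorithm; the class name, window size, warm-up length, and threshold are all hypothetical choices made for this example, and no Kafka or Splunk client is involved.

```python
from collections import deque
from statistics import mean, pstdev


class RollingZScoreDetector:
    """Hypothetical stand-in detector: flags values far from a rolling baseline."""

    def __init__(self, window=50, threshold=3.0, warmup=10):
        self.window = deque(maxlen=window)  # recent values treated as normal
        self.threshold = threshold          # z-score above which a value is flagged
        self.warmup = warmup                # minimum history before scoring starts

    def observe(self, value):
        """Return True if value looks anomalous relative to the recent window."""
        is_anomaly = False
        if len(self.window) >= self.warmup:
            mu = mean(self.window)
            sigma = pstdev(self.window)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        # Only values judged normal update the baseline, so a burst of
        # anomalous events does not shift what counts as normal.
        if not is_anomaly:
            self.window.append(value)
        return is_anomaly


# Example: a per-second request count for one account (made-up numbers).
detector = RollingZScoreDetector(window=20, threshold=3.0)
stream = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 500, 10]
flags = [detector.observe(v) for v in stream]
```

In a deployment along the lines the abstract describes, a consumer reading from a stream-processing topic would call something like `observe` per event and forward flagged events to the SIEM; how the authors actually wire these components together is not stated here.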

Findings: The results show that the proposed framework:

  • Achieves near-instantaneous threat detection (within 500 ms)
  • Delivers 97.3% detection accuracy with a 0.02% false positive rate
  • Reduces mean time to threat detection (MTTD) by 84% compared to batch-processing systems
  • Maintains HIPAA compliance throughout the data pipeline
  • Detects multi-stage and sophisticated attack patterns by correlating threats across multiple data streams
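The abstract does not state how the reported accuracy, false positive rate, and MTTD reduction were computed. Assuming the standard definitions, the sketch below shows how such figures are typically derived from confusion-matrix counts and detection times. The function names and the input numbers are illustrative assumptions, not values from the study.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (accuracy, false_positive_rate) from confusion-matrix counts.

    Assumes the standard definitions:
      accuracy            = (TP + TN) / (TP + FP + TN + FN)
      false positive rate = FP / (FP + TN)
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    false_positive_rate = fp / (fp + tn)
    return accuracy, false_positive_rate


def mttd_reduction(baseline_seconds, new_seconds):
    """Return the fractional reduction in mean time to threat detection."""
    return (baseline_seconds - new_seconds) / baseline_seconds


# Illustrative counts only (not taken from the paper).
accuracy, fpr = detection_metrics(tp=90, fp=10, tn=890, fn=10)

# Illustrative times only: a drop from 100 s to 16 s is an 84% reduction.
reduction = mttd_reduction(baseline_seconds=100.0, new_seconds=16.0)
```

Note that accuracy and false positive rate use different denominators: accuracy is taken over all events, while the false positive rate is taken over benign events only, which is why both figures are usually reported together.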

Contribution: This research provides a practical and scalable solution for securing healthcare big data infrastructures while enabling advanced population health analytics. The combination of Apache Kafka, Splunk SIEM, and ML-based anomaly detection offers significant improvements in detection speed, accuracy, and compliance. The work contributes to the field by presenting a real-time, multi-layered security framework that can be adopted by healthcare organizations to enhance data security, privacy, and operational resilience.

Published

2025-05-25