Implementation of the Kappa Architecture with Kafka and Spark for Processing Hypertension Data on Social Media Platform X

Authors

  • Legawan Perkasa, Universitas 'Aisyiyah Yogyakarta
  • Tikardiha Hardiani, Universitas 'Aisyiyah Yogyakarta

DOI:

https://doi.org/10.32493/jtsi.v9i1.57819

Keywords:

Kappa Architecture, Apache Kafka, Apache Spark, Hypertension, X

Abstract

Hypertension is a major public health problem with steadily increasing prevalence, and it is widely discussed on the social media platform X. Because social media data arrives as a continuous, fast-changing stream, processing it requires a Big Data approach that operates in real time and scales. This study implements a streaming-based Big Data architecture (the Kappa Architecture) using Apache Kafka and Apache Spark to process and analyze conversations about hypertension on X in real time. The proposed system integrates the X API as the data source, Apache Kafka as the immutable event log and streaming backbone, Apache Spark Structured Streaming as the real-time processing engine, and MongoDB as the serving layer. The methodology comprises a literature review, system design, streaming-based data collection, real-time text cleaning and feature extraction, and performance evaluation in terms of throughput, latency, and success rate. A total of 10,000 tweets were collected over a two-month period and processed through a single unified streaming pipeline. The results show that the system established a consistent end-to-end processing workflow, ingesting and processing data in real time without separate batch and speed layers. The system achieved an average throughput of 19.23 tweets per second, a latency of approximately 520 seconds, and a success rate of 100%. The study concludes that the Kappa Architecture is effective, stable, and scalable for real-time processing and analysis of social media data when monitoring public health issues such as hypertension.
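
The abstract does not define how its performance figures were computed. As a minimal sketch, the Python below assumes that throughput is the number of processed tweets divided by total elapsed time and that success rate is the share of received tweets that were processed. The names `RunStats`, `throughput`, and `success_rate` are hypothetical, and the 520-second figure is treated as total elapsed time purely for illustration, since 10,000 / 520 ≈ 19.23.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Counts and wall-clock time for one pipeline run (hypothetical structure)."""
    received: int           # records that entered the pipeline
    processed: int          # records that reached the serving layer
    elapsed_seconds: float  # total wall-clock time (assumed meaning)


def throughput(stats: RunStats) -> float:
    """Records processed per second over the whole run."""
    if stats.elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return stats.processed / stats.elapsed_seconds


def success_rate(stats: RunStats) -> float:
    """Percentage of received records that were processed successfully."""
    if stats.received == 0:
        raise ValueError("received must be non-zero")
    return 100.0 * stats.processed / stats.received


# Figures taken from the abstract: 10,000 tweets, roughly 520 seconds,
# and every tweet processed. Treating 520 s as elapsed time is an assumption.
run = RunStats(received=10_000, processed=10_000, elapsed_seconds=520.0)
print(f"throughput: {throughput(run):.2f} tweets/s")
print(f"success rate: {success_rate(run):.0f}%")
```

Under these assumed definitions the sketch reproduces the reported 19.23 tweets per second and 100% success rate; a per-record latency measurement would instead need an ingestion timestamp and a write timestamp for each tweet.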

Published

2026-01-30

How to Cite

Perkasa, L., & Hardiani, T. (2026). Penerapan Arsitektur Kappa dengan Kafka dan Spark untuk Pemrosesan Data Hipertensi di Media Sosial X. Jurnal Teknologi Sistem Informasi Dan Aplikasi, 9(1), 13–22. https://doi.org/10.32493/jtsi.v9i1.57819