Classification of Daily Rainfall Using XGBoost on Imbalanced Data in the Special Region of Yogyakarta
DOI: https://doi.org/10.21063/jtif.2026.V14.1.59-67
Keywords: Rainfall, XGBoost, SMOTE, Classification
Abstract
Rainfall is a meteorological parameter that affects sectors such as agriculture, water resource management, and disaster mitigation. Classifying it remains difficult, largely because the rainfall categories are imbalanced. This study evaluates the performance of the XGBoost algorithm in classifying daily rainfall in the Special Region of Yogyakarta using NASA POWER data from 2000 to 2025, with air temperature, relative humidity, wind speed, and surface pressure as input variables. Performance was assessed with accuracy, precision, recall, and F1-score to give a fuller picture of the model's behavior than accuracy alone. The model achieved an accuracy of 0.82. It identified light rain well and began to identify moderate rain, though not yet optimally, while its performance on higher-intensity rain classes remained limited. These results suggest that imbalanced class distribution remains a primary factor affecting model performance, which makes data quality and balance critical considerations when developing rainfall classification models.
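The gap between overall accuracy and per-class performance described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and uses no data from the study: the class names (`light`, `moderate`, `heavy`), the label counts, and the helper `per_class_report` are all hypothetical, chosen only to show how a high accuracy can coexist with poor recall on rare classes.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return overall accuracy and per-class precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    report = {}
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, report


# Synthetic, illustrative labels only (NOT the study's data):
# 90 light days, 8 moderate days, 2 heavy days.
y_true = ["light"] * 90 + ["moderate"] * 8 + ["heavy"] * 2
y_pred = (
    ["light"] * 88 + ["moderate"] * 2    # light: 88 correct, 2 missed
    + ["light"] * 6 + ["moderate"] * 2   # moderate: 2 correct, 6 missed
    + ["moderate"] * 2                   # heavy: never predicted
)

accuracy, report = per_class_report(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}")
for name, scores in report.items():
    print(name, {k: round(v, 2) for k, v in scores.items()})
```

In this synthetic case the accuracy is 0.90, yet recall is 0.25 for the moderate class and 0.0 for the heavy class, which is why per-class precision, recall, and F1 are reported alongside accuracy when classes are imbalanced.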
License
Copyright (c) 2026 Ayyesa Azzahra Mulia Ramadani, Danur Wijayanto

This article is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
This journal is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).
Authors retain copyright and grant the journal the right of first publication.
The work may be shared and adapted, even for commercial purposes, provided that appropriate credit is given and any new creations are licensed under the same terms.
