IMPLEMENTATION OF GEMINI PRE-PROCESSING ON 2024 SIREKAP REVIEWS USING THE RANDOM FOREST ALGORITHM
Amru Omar
Naufal Azmi Verdikha
Muhamad Ridwan
This study classifies user reviews of the SIREKAP 2024 application using three components: pre-processing with Gemini, a Large Language Model (LLM); Term Frequency–Inverse Document Frequency (TF-IDF) feature extraction; and the Random Forest algorithm as the classifier. The data consist of user reviews collected from the Google Play Store and grouped into five rating classes. Model performance was evaluated with 10-fold cross-validation using the Macro F1-Score metric. Across the ten folds, the lowest Macro F1-Score was 31.87% and the highest was 37.28%, for an average of 34.62%. The narrow spread between folds suggests that Random Forest, an ensemble method that combines the predictions of multiple decision trees, produces relatively stable classification performance. However, its performance remains limited by the imbalanced distribution of reviews across the classes. Random Forest therefore helps keep predictions stable and reduce overfitting, but further development is needed to improve classification performance on imbalanced review data.
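The evaluation pipeline described in the abstract (TF-IDF features, a Random Forest classifier, 10-fold cross-validation scored by Macro F1) can be sketched with scikit-learn as below. This is an illustrative sketch, not the authors' code. Several details are assumptions not stated in the abstract: the folds are stratified and shuffled, the hyperparameters (`n_estimators=100`, `random_state=42`) are placeholders, and the Gemini pre-processing step is omitted because the abstract gives no prompt or API configuration. The helper `make_toy_reviews` is hypothetical and produces synthetic text, not SIREKAP data, so its scores say nothing about the reported 34.62%.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def make_toy_reviews(per_class=12):
    """Build HYPOTHETICAL placeholder reviews; these are not SIREKAP data.

    In the study, texts would be Google Play reviews already cleaned by Gemini,
    and labels would be the five rating classes.
    """
    texts, ratings = [], []
    for rating in range(1, 6):
        for i in range(per_class):
            # One class-specific token plus shared tokens, purely for illustration.
            texts.append(f"aplikasi sirekap ulasan{rating} kata{i % 3}")
            ratings.append(rating)
    return texts, ratings


def build_pipeline(n_estimators=100, random_state=42):
    """TF-IDF features feeding a Random Forest classifier.

    n_estimators and random_state are assumed values, not the paper's settings.
    """
    return make_pipeline(
        TfidfVectorizer(),
        RandomForestClassifier(n_estimators=n_estimators, random_state=random_state),
    )


def evaluate_macro_f1(texts, ratings, n_splits=10, random_state=42):
    """Return one Macro F1-Score per fold of (assumed stratified) k-fold CV.

    Because the vectorizer sits inside the pipeline, it is refit on each
    training fold, so no vocabulary from the test fold leaks into training.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return cross_val_score(
        build_pipeline(random_state=random_state),
        texts,
        ratings,
        cv=cv,
        scoring="f1_macro",
    )


if __name__ == "__main__":
    texts, ratings = make_toy_reviews()
    scores = evaluate_macro_f1(texts, ratings)
    # Same summary the abstract reports: lowest, highest, and mean fold score.
    print(f"lowest={scores.min():.4f} highest={scores.max():.4f} mean={scores.mean():.4f}")
```

Macro F1 is the unweighted mean of the per-class F1 scores, so each of the five rating classes counts equally regardless of how many reviews it contains. That is why it exposes the class-imbalance effect the abstract mentions, whereas accuracy would be dominated by the majority classes.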