ejecvnlp Open Access Journal

European Journals of Emerging Computer Vision and Natural Language Processing

eISSN: Applied
Publication Frequency: 2 issues per year

  • Peer Reviewed & International Journal

Open Access

ARTICLE

Enhancing Indonesian Scientific Article Management through Machine Learning and NLP

1 Department of Communication, University of Nevada, Las Vegas (UNLV), Las Vegas, NV, USA
2 Department of Political Science, University of Arkansas, Fayetteville, AR, USA


Abstract

The rapid growth of scientific literature in Indonesia calls for efficient automated systems for organizing and retrieving scholarly articles and for assessing their originality. This paper applies machine learning classification and text similarity measures to the management of Indonesian scientific journal articles. We evaluate Naive Bayes and Support Vector Machine (SVM) algorithms for thematic categorization and use Cosine Similarity to measure how close the content of two articles is. The proposed framework comprises data preprocessing, feature extraction with TF-IDF, and evaluation of the resulting models. On a balanced dataset, the Naive Bayes method achieved an F1-score of 98%, and the classification process took less than 60 minutes. For similarity detection, each article's title and abstract were concatenated into a single text, and the Cosine Similarity scores accurately reflected the degree of similarity between these texts. The findings indicate that these approaches can improve the accessibility, discoverability, and integrity of the growing volume of Indonesian academic publications, and they offer a framework for enhancing classification and search in national aggregator services such as Garba Rujukan Digital (GARUDA).
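As an illustration of the similarity step described in the abstract, the following minimal sketch computes TF-IDF weights and Cosine Similarity over concatenated titles and abstracts. It is not the authors' implementation: the tokenizer, the smoothed IDF formula, the function names, and the three sample articles are all assumptions made for the example, and it uses only the Python standard library.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters.

    Assumption: a stand-in for the paper's preprocessing, which is not
    specified in the abstract.
    """
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    Assumption: relative term frequency times a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, chosen here for illustration.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical (title, abstract) pairs, invented for this example.
articles = [
    ("Klasifikasi artikel ilmiah dengan Naive Bayes",
     "Penelitian ini mengklasifikasikan artikel jurnal menggunakan Naive Bayes dan TF-IDF."),
    ("Klasifikasi artikel jurnal menggunakan SVM",
     "Artikel jurnal diklasifikasikan dengan Support Vector Machine dan TF-IDF."),
    ("Budidaya padi di lahan kering",
     "Studi ini membahas teknik irigasi untuk tanaman padi."),
]

# Concatenate title and abstract, as the abstract describes.
texts = [f"{title} {abstract}" for title, abstract in articles]
vectors = tfidf_vectors(texts)

sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(f"related pair:   {sim_related:.3f}")
print(f"unrelated pair: {sim_unrelated:.3f}")
```

In this sketch the two classification-themed articles share many terms and therefore score higher than the pair on unrelated topics. A production system would likely add Indonesian stop-word removal and stemming before weighting, which this example omits.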


Keywords

Similarity Detection, Indonesian Scientific Journals, Natural Language Processing, Support Vector Machine

How to Cite

Enhancing Indonesian Scientific Article Management through Machine Learning and NLP. (2025). European Journals of Emerging Computer Vision and Natural Language Processing, 2(02), 29-44. https://www.parthenonfrontiers.com/index.php/ejecvnlp/article/view/451
