Journal of Computing and Communication
Salah, A., Mahdi, M., & Badawy, A. (2025). Exploring Thematic Structures in Surah Al-Baqarah using TF-IDF, Dimensionality Reduction, and K-means Clustering. Journal of Computing and Communication, 4(2), 1-12. doi: 10.21608/jocc.2025.446634

Exploring Thematic Structures in Surah Al-Baqarah using TF-IDF, Dimensionality Reduction, and K-means Clustering

Article 1, Volume 4, Issue 2, July 2025, Pages 1-12
Document Type: Original Article
DOI: 10.21608/jocc.2025.446634
Authors
Ahmad Salah (1); Mahmoud Mahdi (1); Amro Ali Badawy (2)
(1) Faculty of Computers and Informatics, Zagazig University
(2) Computer Science, Faculty of Computers and Informatics, Zagazig University
Abstract
This work presents a computational investigation of the thematic structure of all verses of Surah Al-Baqarah using unsupervised semantic clustering. Given the unique character of Quranic Arabic, the text was pre-processed in a manner suited to it: diacritics were removed, and both a standard Arabic stopword list and a customized Quranic stopword list were applied. The verses were then vectorized with Term Frequency-Inverse Document Frequency (TF-IDF) to represent their semantic content. Dimensionality reduction followed, using Truncated Singular Value Decomposition (SVD) and then Uniform Manifold Approximation and Projection (UMAP), both for visualization and as input to clustering. K-means clustering was applied to the resulting UMAP embeddings to group semantically similar verses. Clustering solutions from k=2 to k=10 were assessed with three standard metrics: Silhouette Score, Calinski-Harabasz Index, and Davies-Bouldin Index. A weighted average of the three metrics, each weighted one third, identified the 10-cluster solution as optimal for this data. The UMAP visualizations project the verses into a 2D space and show how the 3-, 5-, 7-, and 10-cluster configurations separate the thematic groups identified by K-means. The 10-cluster solution was interpreted by examining the top TF-IDF terms and example verses in each cluster; the themes assigned include 'Faith and Belief', 'Divine Law and Guidance', 'Family Relations', and 'Moral Conduct'.
This study illustrates how a TF-IDF, SVD, UMAP, and K-means pipeline can be used to quantitatively examine and visualize the thematic organization of the verses of Surah Al-Baqarah.
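Two steps named in the abstract can be made concrete with a short sketch: removing diacritics before vectorization, and combining the three clustering metrics with equal weights of one third. This is a minimal illustration under stated assumptions, not the authors' code. The function names, the choice of Unicode ranges treated as diacritics, and the min-max normalization are all assumptions; the abstract does not say how the metrics were brought onto a common scale. The numbers in `toy_metrics` are placeholders for demonstration only, not results from the paper.

```python
import re

# Assumed set of Arabic diacritics: U+064B..U+0652 (tashkeel)
# plus U+0670 (superscript alef). The paper's exact set is not given.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text):
    """Return text with the assumed Arabic diacritic marks removed."""
    return _DIACRITICS.sub("", text)


def min_max(values):
    """Rescale values to [0, 1]; constant input maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_scores(metrics):
    """Combine three metrics per k with equal weights of one third.

    metrics maps k -> (silhouette, calinski_harabasz, davies_bouldin).
    Silhouette and Calinski-Harabasz are higher-is-better;
    Davies-Bouldin is lower-is-better, so it is inverted after scaling.
    """
    ks = sorted(metrics)
    sil = min_max([metrics[k][0] for k in ks])
    ch = min_max([metrics[k][1] for k in ks])
    db = [1.0 - v for v in min_max([metrics[k][2] for k in ks])]
    return {k: (s + c + d) / 3.0 for k, s, c, d in zip(ks, sil, ch, db)}


def best_k(metrics):
    """Return the k with the highest combined score."""
    scores = combined_scores(metrics)
    return max(scores, key=scores.get)


# Placeholder values for illustration only (NOT results from the paper).
toy_metrics = {
    2: (0.30, 100.0, 1.20),
    3: (0.40, 150.0, 0.90),
    4: (0.35, 120.0, 1.00),
}

print(strip_diacritics("\u0628\u0650\u0633\u0652\u0645\u0650"))
print(best_k(toy_metrics))
```

In a full pipeline the metric tuples would come from evaluating each K-means solution on the UMAP embeddings. Min-max scaling is only one way to make the three indices comparable, and a different normalization could select a different k.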
Keywords
Quranic studies; topic modeling; clustering; digital humanities; natural language processing; sentiment analysis