SpamML: An Efficient Framework for Detecting Spam Emails Using Machine Learning

AbdElminaam, Diaa S; Farouk, Maged; Shaker, Nashwa; Elrashidy, Omnia; Elazab, Reda

doi:10.21608/jocc.2025.411113

Article 4, Volume 4, Issue 1, February 2025, Pages 43-54
Document Type: Original Article
Authors
Diaa S. AbdElminaam¹; Maged Farouk²; Nashwa Shaker²; Omnia Elrashidy²; Reda Elazab²
¹Department of Data Science, Faculty of Computer Science, Misr International University, Cairo, Egypt
²Department of Business Information Systems, Faculty of Business, Alamein International University, Alamein, Egypt
Abstract
Spam detection (anti-spam) techniques identify and filter out unwanted, unsolicited, or malicious emails, commonly known as spam. They aim to strengthen email security, reduce the risk of phishing attacks, and improve the overall user experience. Predicting spam belongs to the broader task of email filtering, or classification, within machine learning and data mining, where emails are automatically assigned to classes such as "spam" or "non-spam" (ham). The process applies various algorithms and features to an email's content, structure, and metadata to decide whether it is likely to be spam or a legitimate message. Our objective is to use machine learning to predict, in a simple way, whether an email is spam or not. We identified two datasets suitable for this task and tested six algorithms on them: k-Nearest Neighbor, Gradient Boosting, Random Forest, Naïve Bayes, Decision Tree, and Logistic Regression. After rigorous testing, Gradient Boosting was the dominant algorithm on one dataset, leading in most of the tests with an accuracy of 98.5%; on the other dataset, Gradient Boosting again performed best, achieving the highest accuracy in every test at 98.6%. As this paper shows, machine learning models, whether supervised or unsupervised, are trained on datasets containing examples of both spam and legitimate emails and then use the learned patterns to classify incoming emails. Such models can adapt to new spam patterns and handle complex relationships in the data effectively.
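To make the train-then-classify workflow described in the abstract concrete, here is a minimal sketch of one of the six algorithms the study tested: a multinomial Naïve Bayes classifier over bag-of-words counts, written in plain Python so it needs no third-party libraries. It is not the authors' implementation. The tokenizer, the smoothing constant `alpha`, the names `NaiveBayesSpamFilter` and `accuracy`, and the six training emails are all hypothetical stand-ins, not the paper's datasets or features, and the sketch does not reproduce Gradient Boosting, the study's best performer.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase an email and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over bag-of-words counts (hypothetical helper)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, emails: list[str], labels: list[int]) -> "NaiveBayesSpamFilter":
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(emails, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary = every word seen in any training email.
        self.vocab = set().union(*self.word_counts.values())
        # log P(class): share of training emails carrying each label.
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        # Total token count per class, the denominator of P(word | class).
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word: str, c: int) -> float:
        # Smoothed log P(word | class): words absent from a class still get nonzero probability.
        numerator = self.word_counts[c][word] + self.alpha
        denominator = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, text: str) -> int:
        # Words never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)  # class with the highest log posterior


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of emails whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data, not taken from the paper's datasets (1 = spam, 0 = ham).
train_emails = [
    "win a free prize now",
    "claim your free money today",
    "cheap meds free offer",
    "meeting agenda for monday",
    "lunch with the team tomorrow",
    "project report attached for review",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayesSpamFilter().fit(train_emails, train_labels)
preds = [model.predict(e) for e in ["free prize money", "team meeting on monday"]]
print(preds, accuracy([1, 0], preds))  # → [1, 0] 1.0
```

Scoring in log space avoids numeric underflow on long emails, and Laplace smoothing keeps a word seen in only one class from zeroing out the other class's score. The `accuracy` function uses the usual definition (fraction of correctly classified emails), the metric the abstract reports; the 98.5% and 98.6% figures come from the authors' datasets and models, not from this toy example.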
Keywords
Spam Email Prediction; Machine Learning; Classification; Naïve Bayes; Gradient Boosting; Logistic Regression; K-Nearest Neighbor

