Chandra Sekhar Sanaboina

N-gram-based machine learning approach for bot or human detection from text messages

  • Authors : Chandra Sekhar Sanaboina

Journal title : 2022 6th International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence

Publisher : ACM


Social bots are computer programs that automate ordinary human activities such as generating messages. The rise of bots on social network platforms has led to malicious activities such as content pollution by spammers, malware dissemination, and the spread of misinformation. Most researchers have focused on detecting bot accounts on social media platforms to limit the damage done to users' opinions. In this work, an n-gram-based approach is proposed for bot or human detection. Content-based features, namely character n-grams and word n-grams, are used; both have been shown to improve accuracy in various authorship analysis tasks. A very large number of n-grams is identified after applying different pre-processing techniques. The high dimensionality of the feature set is reduced with a feature selection technique, the Relative Discrimination Criterion. Each text is then represented as a vector over the reduced feature set. Different term weight measures are used in the experiments to compute the weights of the n-gram features in the document vector representation. Two classification algorithms, Support Vector Machine and Random Forest, are trained on the document vectors. The proposed approach was applied to the dataset provided for the PAN 2019 competition bot detection task. The Random Forest classifier obtained the best accuracy, 0.9456, for bot/human detection.
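The feature pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the n-gram sizes, and the toy messages are hypothetical; selection by document frequency is only a stand-in for the Relative Discrimination Criterion; and tf * log(N / df) is one assumed choice among the "different term weight measures" the abstract mentions. The classification step (Support Vector Machine, Random Forest) is left out.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return every overlapping character n-gram of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return every overlapping word n-gram, splitting on whitespace."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, char_n=3, word_n=2):
    """Count character and word n-grams of a lowercased message.

    The "c:" and "w:" prefixes keep the two feature families separate.
    """
    text = text.lower()
    counts = Counter("c:" + g for g in char_ngrams(text, char_n))
    counts.update("w:" + g for g in word_ngrams(text, word_n))
    return counts


def select_features(doc_counts, k):
    """Stand-in selection: keep the k n-grams with the highest document frequency.

    Assumption: this is NOT the Relative Discrimination Criterion used in the
    paper; it only marks where a feature selection step would sit.
    """
    df = Counter()
    for counts in doc_counts:
        df.update(counts.keys())  # each n-gram counted once per document
    ranked = sorted(df, key=lambda g: (-df[g], g))
    return ranked[:k], df


def tfidf_vectors(doc_counts, vocab, df):
    """Weight each selected n-gram by tf * log(N / df), one vector per document."""
    n_docs = len(doc_counts)
    return [
        [counts[g] * math.log(n_docs / df[g]) for g in vocab]
        for counts in doc_counts
    ]


messages = ["ab ab", "ab cd"]  # toy data, not taken from the PAN 2019 dataset
doc_counts = [extract_features(m, char_n=2, word_n=1) for m in messages]
vocab, df = select_features(doc_counts, k=10)
vectors = tfidf_vectors(doc_counts, vocab, df)
```

Each row of `vectors` is a document vector over the reduced feature set and could be passed to any classifier. N-grams that occur in every message receive weight zero under this weighting, which is why a discriminative selection criterion matters before training.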

Article DOI & Crossmark Data

DOI : https://doi.org/10.1145/3533050.3533063




