Dr. Nkechi Ifeanyi-reuben

N-gram and K-Nearest Neighbour based Igbo text classification model

  • Authors Details :  
  • Ifeanyi-reuben Nkechi J.,  
  • Odikwa Ndubuisi,  
  • Ugwu Chidiebere

Original Article

Advances in information technology have gone a long way toward helping Igbo, one of the major Nigerian languages, to evolve. Some online service providers now report news, publish articles and support search in the language. This advancement will likely result in the generation of large volumes of textual data in the language, which need to be organized, managed and classified efficiently for easy information access, extraction and retrieval by end users. This work presents an enhanced model for Igbo text classification based on N-gram and K-Nearest Neighbour techniques. Considering the peculiarities of the Igbo language, the N-gram model was adopted for text representation: the text was represented with unigram, bigram and trigram techniques. The represented text was then classified using the K-Nearest Neighbour technique. The model is implemented in the Python programming language together with tools from the Natural Language Toolkit (NLTK). The performance of the Igbo text classification system was evaluated by calculating recall, precision and F1-measure on the N-gram represented text. The results show that classification on bigram-represented Igbo text has the highest degree of exactness (precision), trigram has the lowest precision, and all three N-gram techniques achieve the same level of completeness (recall). The bigram text representation technique is therefore strongly recommended for any text-based system in Igbo. This model can be adopted in text analysis, text mining, information retrieval, natural language processing and any intelligent text-based system in the language.
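The pipeline described in the abstract (N-gram representation, K-Nearest Neighbour classification, then precision, recall and F1) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state the tokenizer, the similarity measure or the value of k, so whitespace tokenization, cosine similarity over n-gram counts and k = 3 are assumptions made here. The placeholder tokens (`t1`, `t2`, ...) and the labels `A` and `B` are hypothetical stand-ins for real Igbo documents and categories. The helper `ngrams` is written with the standard library only; NLTK's `nltk.util.ngrams` could plausibly be used in its place.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return the word n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vectorize(text, n):
    """Represent a text as a bag of word n-grams (n=1 unigram, 2 bigram, 3 trigram).

    Assumption: simple lowercasing and whitespace tokenization.
    """
    return Counter(ngrams(text.lower().split(), n))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (assumed measure)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, training, n=2, k=3):
    """Label a text by majority vote among its k most similar training texts."""
    query = vectorize(text, n)
    scored = sorted(
        ((cosine(query, vectorize(doc, n)), label) for doc, label in training),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def precision_recall_f1(gold, predicted, target):
    """Precision, recall and F1 for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy corpus: placeholder tokens, not real Igbo text.
TRAIN = [
    ("t1 t2 t3 t4", "A"),
    ("t1 t2 t3 t5", "A"),
    ("t6 t7 t8 t9", "B"),
    ("t6 t7 t8 t4", "B"),
]

label = knn_classify("t1 t2 t3", TRAIN, n=2, k=3)
```

Changing `n` to 1, 2 or 3 switches between the unigram, bigram and trigram representations, so the same classifier can be evaluated on each with `precision_recall_f1` to compare them in the way the abstract describes.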
