MACHINE LEARNING ALGORITHMS FOR MULTILINGUAL TEXT CLASSIFICATION OF NIGERIAN LOCAL LANGUAGES


Multilingual text classification models have driven significant advances across many Natural Language Processing tasks in a range of languages. As internet access grows among non-English speakers, multilingual support becomes increasingly important. However, developing a multilingual text classification model is difficult, particularly when choosing which machine learning algorithm to use.
In this work, multilingual text classification models were created. Three models were trained, one each for English, Yoruba, and Hausa, on datasets of 200,853, 1,967, and 2,917 article headlines respectively, using Python with TensorFlow. For the English model, the highest accuracy was 89%, achieved with the BERT Transformer; the Yoruba model performed best with Long Short-Term Memory (LSTM) at 74%; and the Hausa model scored 88%.
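As a rough illustration of the LSTM setup, a minimal headline classifier in TensorFlow/Keras might look as follows. This is a sketch only: the vocabulary size, sequence length, class count, and layer widths below are hypothetical, since the paper does not report its exact hyperparameters.

import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical hyperparameters; the paper does not state its settings.
VOCAB_SIZE = 20000   # size of the tokenizer vocabulary
MAX_LEN = 32         # headlines are short sequences
NUM_CLASSES = 4      # number of headline categories

model = tf.keras.Sequential([
    layers.Input(shape=(MAX_LEN,), dtype="int32"),
    layers.Embedding(VOCAB_SIZE, 128),               # learn token embeddings
    layers.LSTM(64),                                 # encode each headline
    layers.Dense(NUM_CLASSES, activation="softmax"), # predict a category
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

The same architecture can be reused across languages by retraining the embedding and LSTM layers on each language's headlines.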
The BERT Transformer outperformed LSTM and Naïve Bayes (NB). Although the Yoruba BERT model overfitted, the results show that the BERT Transformer achieves the best accuracy for multilingual text classification compared to LSTM and NB. Surprisingly, the results also demonstrate that LSTM and NB can be viable options when selecting a text classification algorithm.
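For comparison, the following sketch fine-tunes a BERT classifier with TensorFlow. It assumes the Hugging Face transformers library and the bert-base-multilingual-cased checkpoint, neither of which the paper names; the headlines, labels, and category count are toy placeholders.

import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # assumed checkpoint
NUM_CLASSES = 4                              # hypothetical category count

tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = TFBertForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_CLASSES)

# Tokenize a toy batch of headlines into fixed-length input IDs.
headlines = ["example headline one", "example headline two"]
inputs = tokenizer(headlines, padding=True, truncation=True,
                   max_length=32, return_tensors="tf")
labels = tf.constant([0, 1])

# A small learning rate and few epochs are common remedies for the
# kind of overfitting seen on the small Yoruba dataset.
model.compile(
    optimizer=tf.keras.optimizers.Adam(2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])
model.fit(dict(inputs), labels, epochs=1, batch_size=2)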