Tokenizers for African Languages

TL;DR

Most African languages are low-resource languages, and NLP performance on them lags behind that on high-resource languages, in part because of tokenization. Tokenizers are therefore crucial for improving NLP tasks in these languages.

Goodwill Erasmo Ndomba; Medard Edmund Mswahili; Young-Seob Jeong
https://doi.org/10.1109/ACCESS.2024.3522285
Volume 13

Despite remarkable progress in natural language processing (NLP), a large performance gap remains between high-resource languages (HRLs) and low-resource languages (LRLs) across NLP tasks. Most African languages are LRLs, and one of the major contributing factors to the gap is tokenization, which plays a crucial role in NLP performance in general. M...
