2026
Troy E. Spier
- Assistant Professor
- Florida A&M University
Abstract
There are approximately seven thousand languages spoken on this planet, two thousand of which are found on the African continent. Of the world’s languages, however, only about two hundred are considered sufficiently documented (i.e., with a descriptive grammar, a dictionary, and a collection of texts). This project focuses on CiBemba, which is spoken in Zambia and the D.R. Congo. Two weeks of research at the Herskovits Library of African Studies resulted in the manual scanning of nearly three thousand pages of literature in CiBemba, none of which is easily accessible outside Zambia. The project therefore centers on preprocessing these scanned texts and rendering them as machine-readable data for the development of a searchable, part-of-speech-tagged corpus of CiBemba, which will support readability, linguistic, and computational studies of the language.