2022, 2024
William Mattingly
- Postdoctoral Fellow
- Smithsonian Institution
Abstract
Project Narrative - This project uses machine learning (ML) models to extract data from an archive of anti-apartheid solidarity letters predominantly written by Black South African women. It will use newly developed optical character recognition (OCR) and handwritten text recognition (HTR) methods to render images of handwritten letters into machine-readable text. Once the letters are processed, we will train custom ML models to produce triplets, meaning two or more nouns related via a verb that indicate a qualitative relationship between two categories of data. A knowledge base derived from these entity triplets will permit us to better understand the lives, struggles, and contributions of Black women in South Africa by collecting data on relations embedded in their own words.
Abstract
The Personal Writes the Political (PWP) is a digital humanities project that applies advanced machine learning (ML) models to anti-apartheid solidarity letters predominantly authored by Black South African women. We created a software pipeline called Careful Recall (Caracal), which automates the transcription of handwritten materials into machine-readable text and conducts named entity recognition to distinguish private identifying information from relevant research data in highly sensitive handwritten archives. We will use Caracal to extract modest datasets from selections of thematically united letters we call focal clusters. In addition, we will conduct skills transfer to this collection's home archive, the Mayibuye Centre Archive.