Tools and Results in Collection-Driven Documentary Linguistics


ACLS Digital Innovation Fellowships


English Language and Literature


This project creates open source tools designed to facilitate collection-driven documentary linguistics. These include a tool for the transcription, translation, annotation, time-coding, and metadata management of a large-scale audiovisual collection, and a cross-document multimedia concordancer that enables students, researchers, and community members to construct composite multimedia presentations from large-scale document collections. Although designed for general use, these tools are being tested on the Tibetan and Himalayan Digital Library's collection of Tibetan videos and associated transcripts. Together with a parsing tool capable of segmenting Tibetan words to a high degree of accuracy, this software suite is driving the creation of an innovative multimedia Tibetan dictionary and grammar illustrating the radically transformative effect of the shift from document-driven to collection-driven documentary linguistics.