Understanding Genre in a Collection of a Million Volumes


Information about genre makes large digital collections much more useful, but it is largely missing from existing metadata. “Understanding Genre” will make it possible to recognize genre algorithmically, and will show that a digital approach has several important advantages: it allows classification schemes to be modified at will, and it lets membership in a genre be a matter of degree rather than a hard boundary. This project will develop software that can classify HathiTrust collections by genre, drawing on existing machine learning research while adapting it to this specific domain (e.g., recognizing that genre classifications need to change continuously across the timeline). The resulting software, which will be available to other scholars, will also support the development of a book on nineteenth-century literary history.