Machine Learning Applications in Digital Humanities
Designing a Semi-automated Subject Indexing System for a Low-resource Domain
Abstract
This study explores how machine learning tools and techniques can be used to organize knowledge objects pertaining to various aspects of the gender spectrum (LGBTQIA+), in order to address the low-resource nature of the LGBTQIA+ knowledge domain in Indian libraries. It aims to develop a semi-automated subject indexing system that uses an open-source machine learning framework (Annif) together with the Homosaurus, a domain-specific vocabulary. A comprehensive training dataset is built programmatically from open-access bibliographic data sources with the help of data carpentry tools and NLP services from OpenAI. The study also measures the efficiency of the automated indexing framework and investigates the potential for wider adoption of a REST API call-based approach to rapidly index a large number of records related to the LGBTQIA+ domain.
Except where otherwise noted, the Articles on this site are licensed under Creative Commons License: CC Attribution-Noncommercial-No Derivative Works 2.5 India