Machine Learning Applications in Digital Humanities

Designing a Semi-automated Subject Indexing System for a Low-resource Domain

Keywords: Annif, Homosaurus, Inclusive librarianship, Large language model (OpenAI), Machine learning, Retrieval metrics, LGBTQIA


This study explores how machine learning tools and techniques can organize knowledge objects relating to various aspects of the gender spectrum (LGBTQIA+), in order to address the low-resource nature of the LGBTQIA+ knowledge domain in Indian libraries. It aims to develop a semi-automated subject indexing system using Annif, an open-source machine learning framework, together with the Homosaurus, a domain-specific vocabulary. The study programmatically builds a comprehensive training dataset from open-access bibliographic data sources with the help of data carpentry tools and NLP services from OpenAI. It also measures the efficiency of the automated indexing framework and investigates whether a REST API call-based approach could be widely adopted for rapidly indexing large numbers of records in the LGBTQIA+ domain.
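As an illustration of the REST-based indexing and evaluation loop described above, the following is a minimal sketch and not the authors' implementation. It assumes an Annif server reachable at http://localhost:5000 and a project with the hypothetical ID `homosaurus-en` configured with the Homosaurus vocabulary. It also assumes Annif's `/v1/projects/{project_id}/suggest` endpoint, which takes form fields `text`, `limit`, and `threshold` and returns a JSON `results` list. Simple set-based precision, recall, and F1 stand in for whatever retrieval metrics the study actually reports.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: a locally running Annif server and a hypothetical project ID
# configured with the Homosaurus vocabulary.
ANNIF_URL = "http://localhost:5000"
PROJECT_ID = "homosaurus-en"  # hypothetical project identifier


def build_suggest_request(text, limit=10, threshold=0.0,
                          base_url=ANNIF_URL, project_id=PROJECT_ID):
    """Build a form-encoded POST request for the Annif suggest endpoint."""
    url = f"{base_url}/v1/projects/{urllib.parse.quote(project_id)}/suggest"
    body = urllib.parse.urlencode(
        {"text": text, "limit": limit, "threshold": threshold}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_suggestions(payload):
    """Extract (uri, label, score) tuples from a suggest response body."""
    data = json.loads(payload)
    return [(r["uri"], r["label"], r["score"]) for r in data.get("results", [])]


def suggest(text, **kwargs):
    """Send one record's text to the server and return ranked suggestions."""
    with urllib.request.urlopen(build_suggest_request(text, **kwargs)) as resp:
        return parse_suggestions(resp.read())


def precision_recall_f1(suggested, gold):
    """Compare suggested subject URIs with a record's gold-standard URIs."""
    suggested, gold = set(suggested), set(gold)
    hits = len(suggested & gold)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Against a running server, a batch of records could be indexed by calling `suggest(record_text, limit=5)` for each record and passing the returned URIs, together with that record's gold subjects, to `precision_recall_f1`. Annif also provides an `annif eval` command that reports standard metrics over a whole test corpus, so the hand-rolled scoring here serves only to make the idea concrete.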

How to Cite
Mitra, R., & Mukhopadhyay, P. (2023). Machine Learning Applications in Digital Humanities. DESIDOC Journal of Library & Information Technology, 43(04), 219-225.