Machine Learning Applications in Digital Humanities: Designing a Semi-automated Subject Indexing System for a Low-resource Domain

Roshni Mitra; Parthasarathi Mukhopadhyay

doi:10.14429/djlit.43.04.19227

Authors

Roshni Mitra, Department of Library and Information Science, University of Kalyani, Kalyani, Nadia, West Bengal - 741 235, India. https://orcid.org/0000-0002-4182-5733
Parthasarathi Mukhopadhyay, Department of Library and Information Science, University of Kalyani, Kalyani, Nadia, West Bengal - 741 235, India. https://orcid.org/0000-0003-0717-9413

DOI:

https://doi.org/10.14429/djlit.43.04.19227

Keywords:

Annif, Homosaurus, Inclusive librarianship, Large language model (OpenAI), Machine learning, Retrieval metrics, LGBTQIA

Abstract

This study explores how machine learning tools and techniques can organize knowledge objects covering the gender spectrum (LGBTQIA+), with the goal of addressing the low-resource character of the LGBTQIA+ knowledge domain in Indian libraries. It aims to develop a semi-automated subject indexing system using Annif, an open-source machine learning framework, together with the Homosaurus, a domain-specific vocabulary. A comprehensive training dataset is built programmatically from open-access bibliographic data sources with the help of data carpentry tools and NLP services from OpenAI. The study also measures the efficiency of the automated indexing framework and investigates whether a REST API call-based approach could be widely adopted for rapidly indexing a substantial number of records in the LGBTQIA+ domain.

Published

2023-07-11

How to Cite

Mitra, R., & Mukhopadhyay, P. (2023). Machine Learning Applications in Digital Humanities: Designing a Semi-automated Subject Indexing System for a Low-resource Domain. DESIDOC Journal of Library & Information Technology, 43(04), 219–225. https://doi.org/10.14429/djlit.43.04.19227

Issue

Vol. 43 No. 04 (2023): Digital Humanities and Librarianship

Section

Research Paper

License

Except where otherwise noted, the Articles on this site are licensed under Creative Commons License: CC Attribution-Noncommercial-No Derivative Works 2.5 India