A Comparative Study of Machine Learning, Natural Language Processing, and Hybrid Models for Academic Paper Acceptance Prediction: From Reviews to Decisions

Chandra Shekhar Pandey; Shriram Pandey; Tejash Pandey; Shweta Pandey; Harish Pandey; Patanjali Mishra

doi:10.14429/djlit.21138

Authors

Chandra Shekhar Pandey Department of Education, Mahatma Gandhi Antarrashtriya Hindi Vishwavidyalaya, Wardha - 442 001, India https://orcid.org/0000-0002-3660-1858
Shriram Pandey Department of Library and Information Science, Central University of Haryana, Mahendragarh - 123 031, India https://orcid.org/0000-0002-1690-6603
Tejash Pandey Guru Gobind Singh Indraprastha University, Delhi - 110 078, India https://orcid.org/0009-0003-0307-1779
Shweta Pandey GSV Central Library, Chhatrapati Shahu Ji Maharaj University, Kanpur - 208 024, India https://orcid.org/0000-0001-7705-3563
Harish Pandey Department of Education, Mahatma Gandhi Antarrashtriya Hindi Vishwavidyalaya, Wardha - 442 001, India https://orcid.org/0000-0002-8396-6446
Patanjali Mishra Department of Education, University of Allahabad, Allahabad - 211 002, India https://orcid.org/0000-0001-8163-3018

DOI:

https://doi.org/10.14429/djlit.21138

Keywords:

Peer review automation, Academic paper acceptance prediction, Machine learning models, Transformer based models, Hybrid models

Abstract

The exponential increase in submissions to top-tier conferences and journals has placed unprecedented strain on editorial systems. To address this challenge, the present study explores the potential of computational modelling for predicting paper acceptance decisions, using peer review content as textual input and the reviewer confidence score and recommendation score as numerical inputs. We used the PeerConf dataset by Hasan et al., which contains 3,242 reviews across 1,236 papers. We design and evaluate three modelling approaches: traditional ML models, transformer-based and sentiment-integrated NLP models (BERT, DistilBERT), and a novel hybrid model that incorporates structured features, textual inputs, and sentiment within ML pipelines. Accuracy and F1 scores are used to measure and compare the predictive effectiveness of the models. All models were trained in a GPU-enabled Python 3.10 environment using PyTorch and scikit-learn; scikit-learn was used for the machine learning models and Hugging Face Transformers v4.x for the transformer-based models. The study contributes to the understanding of how hybrid models compare with ML and NLP-based models and provides a viable solution for predicting paper acceptance decisions. It also suggests the viability of different approaches for designing editorial support systems. We found that hybrid models outperformed ML and sentiment-integrated NLP models, with 83.51% accuracy and an F1 score of 72.91%.
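
To make the hybrid idea concrete, the following is a minimal sketch of how review text, numerical scores, and a sentiment feature could be combined in a single scikit-learn pipeline and scored with accuracy and F1. It is not the authors' implementation. The toy data, the column names (`review_text`, `confidence`, `recommendation`, `accepted`), the word lists, the `toy_sentiment` helper, and the choice of TF-IDF with logistic regression are all assumptions made for illustration; the study itself uses the PeerConf dataset and its own feature and model choices.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical word lists; a real system would use a proper sentiment model.
POSITIVE = {"novel", "strong", "clear", "convincing"}
NEGATIVE = {"weak", "unclear", "limited", "flawed"}


def toy_sentiment(text: str) -> int:
    """Count positive words minus negative words (illustrative placeholder)."""
    words = [w.strip(".,;:") for w in text.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Toy stand-in for review-level data; all values are invented for the sketch.
reviews = pd.DataFrame({
    "review_text": [
        "Novel idea with strong and convincing experiments.",
        "The method is unclear and the evaluation is weak.",
        "Clear writing, strong baselines, convincing results.",
        "Limited novelty and a flawed experimental setup.",
        "A novel and clear contribution.",
        "Weak motivation, unclear claims, limited scope.",
    ],
    "confidence": [4, 3, 5, 4, 3, 2],
    "recommendation": [8, 3, 9, 2, 7, 3],
    "accepted": [1, 0, 1, 0, 1, 0],
})
reviews["sentiment"] = reviews["review_text"].apply(toy_sentiment)

# Text goes through TF-IDF; numerical and sentiment features are standardised.
features = ColumnTransformer([
    ("text", TfidfVectorizer(), "review_text"),
    ("scores", StandardScaler(), ["confidence", "recommendation", "sentiment"]),
])
model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

X = reviews.drop(columns="accepted")
y = reviews["accepted"]
model.fit(X, y)

# Scored on the training rows only to keep the sketch short;
# a real evaluation needs a held-out test split.
pred = model.predict(X)
accuracy = accuracy_score(y, pred)
f1 = f1_score(y, pred)
print(f"accuracy={accuracy:.2f}, f1={f1:.2f}")
```

In this sketch the `ColumnTransformer` is what makes the model hybrid: each input type gets its own preprocessing, and the classifier sees one combined feature matrix. Swapping the TF-IDF step for transformer embeddings would be one way to move closer to the NLP-based variants named in the abstract.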
