Automated Multi Label Classification on Fertilizer Themed Patent Documents in Indonesia

Keywords: Topic modeling, Multi-label classification, Patent document, LDA, ML-KNN, CC-KNN



Patent literature has high scientific value for the industrial, commercial, legal, and policymaking communities, which makes patent analysis crucial. Patent topic classification is an important step in patent topic modeling analysis. However, classification is time-consuming and expensive because it is usually carried out manually by an expert. Moreover, a patent document may belong to more than one category or label, which further complicates the task. As the number of submitted patent documents increases, an automated patent classification system that yields accurate results becomes increasingly critical. In this paper, we therefore analyse the performance of two algorithms for multi-label classification of patent documents: multi-label k-nearest neighbor (ML-KNN) and classifier chain k-nearest neighbor (CC-KNN), each combined with latent Dirichlet allocation (LDA). These two methods have a considerable advantage in handling continuously updated datasets; they also exhibit superior performance compared to other multi-label learning algorithms. This study also compares these two algorithms when used with the term frequency (TF) weighting measure. Performance is assessed using the following evaluation measures: micro F1, accuracy, Hamming loss, and one-error. The results show that ML-KNN performs better than CC-KNN and that multi-label classification based on topics (patent LDA) performs better than the TF-weighting technique.
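The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy label matrix, the scores, and the 0.5 decision threshold are hypothetical, and "accuracy" is assumed here to mean the example-based (Jaccard-style) multi-label accuracy; the paper may use a different definition.

```python
from typing import List

Matrix = List[List[float]]


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of (document, label) slots that are predicted wrongly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        1
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
        if t != p
    )
    return wrong / total


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """F1 computed from counts pooled over all documents and labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def example_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Mean over documents of |true AND pred| / |true OR pred| (assumed definition)."""
    per_doc = []
    for t_row, p_row in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(t_row, p_row) if t == 1 and p == 1)
        union = sum(1 for t, p in zip(t_row, p_row) if t == 1 or p == 1)
        per_doc.append(inter / union if union else 1.0)
    return sum(per_doc) / len(per_doc)


def one_error(y_true: Matrix, scores: Matrix) -> float:
    """Fraction of documents whose top-ranked label is not a true label."""
    misses = 0
    for t_row, s_row in zip(y_true, scores):
        top = max(range(len(s_row)), key=lambda j: s_row[j])
        if t_row[top] != 1:
            misses += 1
    return misses / len(y_true)


# Hypothetical toy data: 3 patent documents, 3 candidate labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
scores = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.1], [0.8, 0.3, 0.2]]
# Assumed decision rule: assign a label when its score is at least 0.5.
y_pred = [[1 if s >= 0.5 else 0 for s in row] for row in scores]

print("Hamming loss:", hamming_loss(y_true, y_pred))
print("Micro F1:", micro_f1(y_true, y_pred))
print("Accuracy (example-based):", example_accuracy(y_true, y_pred))
print("One-error:", one_error(y_true, scores))
```

Lower values are better for Hamming loss and one-error, while higher values are better for micro F1 and accuracy. One-error is computed from the ranking scores rather than the thresholded predictions, so it needs a classifier that outputs per-label confidences.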

How to Cite
Yaman, A., Sartono, B., Indrawati, A., Kartika, Y., & Soleh, A. M. (2022). Automated Multi Label Classification on Fertilizer Themed Patent Documents in Indonesia. DESIDOC Journal of Library & Information Technology, 42(4), 218-226.
Research Paper