Automated Multi Label Classification on Fertilizer Themed Patent Documents in Indonesia

Aris Yaman; Bagus Sartono; Ariani Indrawati; Yulia Aris Kartika; Agus M. Soleh

doi:10.14429/djlit.42.4.17733

Authors

Aris Yaman, National Research and Innovation Agency (BRIN), Indonesia, https://orcid.org/0000-0002-0305-9054
Bagus Sartono, Department of Statistics and Data Science, IPB University, Indonesia, https://orcid.org/0000-0003-1115-4737
Ariani Indrawati, National Research and Innovation Agency (BRIN), Indonesia, https://orcid.org/0000-0002-1387-9419
Yulia Aris Kartika, National Research and Innovation Agency (BRIN), Indonesia, https://orcid.org/0000-0003-2883-7585
Agus M. Soleh, Department of Statistics and Data Science, IPB University, Indonesia, https://orcid.org/0000-0002-2732-1985

DOI:

https://doi.org/10.14429/djlit.42.4.17733

Keywords:

Topic modeling, Multi-label classification, Patent document, LDA, ML-KNN, CC-KNN

Abstract

Patent literature research has high scientific value for the industrial, commercial, legal, and policymaking communities, which makes patent analysis crucial. Topic classification is an important step in patent topic modeling analysis. However, the classification process is time-consuming and expensive because it is usually carried out manually by an expert. Moreover, a patent document may be assigned to more than one category or label, which further complicates the task. As the number of submitted patent documents grows, an automated patent classification system that yields accurate results becomes increasingly critical. In this paper, we therefore analyse the performance of two multi-label classification algorithms on patent documents: multi-label k-nearest neighbor (ML-KNN) and classifier chain k-nearest neighbor (CC-KNN), each combined with latent Dirichlet allocation (LDA). These two methods have a considerable advantage in handling continuously updated datasets; they also exhibit superior performance compared to other multi-label learning algorithms. This study also evaluates the two algorithms with term frequency (TF) weighting for comparison. Performance is assessed with the following evaluation measures: micro F1, accuracy, Hamming loss, and one-error. The results show that ML-KNN outperforms CC-KNN and that multi-label classification based on topics (patent LDA) outperforms the TF-weighting technique.
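To make the evaluation measures named in the abstract concrete, the following is a minimal sketch of three of them (Hamming loss, micro F1, and one-error) in plain Python. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions chosen for illustration. Accuracy is left out because the abstract does not say which multi-label definition of accuracy is used.

```python
from typing import List

Labels = List[List[int]]    # one row per document, one 0/1 entry per label
Scores = List[List[float]]  # one row per document, one score per label


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of (document, label) slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for t_row, p_row in zip(y_true, y_pred)
                for t, p in zip(t_row, p_row))
    return wrong / total


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """F1 computed from counts pooled over all documents and labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def one_error(y_true: Labels, scores: Scores) -> float:
    """Fraction of documents whose top-ranked label is not a true label."""
    misses = 0
    for t_row, s_row in zip(y_true, scores):
        top = max(range(len(s_row)), key=s_row.__getitem__)
        if t_row[top] == 0:
            misses += 1
    return misses / len(y_true)


# Hypothetical toy data: 3 documents, 3 candidate labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
scores = [[0.9, 0.2, 0.4], [0.6, 0.5, 0.1], [0.7, 0.8, 0.3]]

# Assumed decision rule: a label is predicted when its score is at least 0.5.
y_pred = [[1 if s >= 0.5 else 0 for s in row] for row in scores]

print("Hamming loss:", hamming_loss(y_true, y_pred))
print("Micro F1:", micro_f1(y_true, y_pred))
print("One-error:", one_error(y_true, scores))
```

Lower is better for Hamming loss and one-error, while higher is better for micro F1, so a method comparison such as ML-KNN versus CC-KNN has to read the measures in opposite directions.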
