Multi Label Topic Classification for Hadith Bukhari in Indonesian Translation using Random Forest

  • Adhitia Wiraguna, Telkom Indonesia
  • said al faraby, Telkom University
  • Adiwijaya Adiwijaya

Abstract

The hadith is mandatory for Muslims to study and practice, and studying it yields many kinds of teachings.
To help Muslims study the hadith, a multi-label classification system is needed to categorize the Sahih
Bukhari hadith in Indonesian translation into three topics: prohibition, advice, and information. Many
classification methods can be used to build a text classification system; this study uses Random Forest
(RF). The simplicity of the RF algorithm and its ability to handle high-dimensional data make RF well
suited to text classification. However, the capability of RF for multi-label classification is not widely
known. This study uses two Problem Transformation methods, Binary Relevance (BR) and Label Powerset (LP),
to adapt RF to multi-label classification. The results show that the best Hamming loss, 0.0663, was
obtained by the system that used BR without stemming. These results indicate that BR is better than LP
for adapting the RF algorithm to multi-label classification of hadith data. This is because BR builds one
classification model per label in the hadith data, whereas the data transformation performed by LP makes
the data imbalanced.
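
The abstract contrasts two problem transformations. The Python sketch below shows what each one does to a multi-label target and how Hamming loss is computed; it is a minimal illustration under stated assumptions, not the authors' implementation. The toy label vectors are hypothetical, and the Random Forest step itself is omitted: in practice each BR target (or the single LP target) would be passed to a classifier such as scikit-learn's `RandomForestClassifier`.

```python
from collections import Counter

# Topic names from the abstract; the label vectors below are hypothetical toy data.
LABELS = ("prohibition", "advice", "information")

# Each row is one (hypothetical) hadith: 1 if the topic applies, else 0.
Y = [
    (1, 1, 0),
    (0, 0, 1),
    (0, 1, 1),
    (0, 0, 1),
    (1, 0, 1),
    (0, 0, 1),
]


def binary_relevance(y):
    """BR: split the problem into one binary target per label (one classifier each)."""
    return {name: [row[j] for row in y] for j, name in enumerate(LABELS)}


def label_powerset(y):
    """LP: map every distinct label combination to one class of a single multi-class target."""
    classes = sorted(set(y))
    index = {combo: k for k, combo in enumerate(classes)}
    return [index[row] for row in y], classes


def hamming_loss(y_true, y_pred):
    """Fraction of individual (example, label) assignments predicted incorrectly."""
    n_labels = len(y_true[0])
    wrong = sum(
        t != p
        for row_true, row_pred in zip(y_true, y_pred)
        for t, p in zip(row_true, row_pred)
    )
    return wrong / (len(y_true) * n_labels)


if __name__ == "__main__":
    # BR: three binary targets, one per topic.
    for name, target in binary_relevance(Y).items():
        print(name, target)
    # LP: one multi-class target whose classes are label combinations.
    y_lp, classes = label_powerset(Y)
    print("LP classes:", classes)
    print("LP class counts:", Counter(y_lp))
```

On this toy data BR yields three binary targets, while LP yields four classes, one of which (information only) covers half of the examples. This illustrates how LP can turn a balanced-looking multi-label set into an imbalanced multi-class problem. Hamming loss counts errors per label rather than per example, so lower values are better.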

Published
2021-10-23
How to Cite
Wiraguna, A., faraby, said, & Adiwijaya, A. (2021, October 23). Multi Label Topic Classification for Hadith Bukhari in Indonesian Translation using Random Forest. Journal of Data Science and Its Applications, 4(1), 43-47. https://doi.org/10.34818/jdsa.2021.4.70
