Measuring the Novelty of Natural Language Text using the Conjunctive Clauses of a Tsetlin Machine Text Classifier

Cited by: 10
Authors
Bhattarai, Bimal [1]
Granmo, Ole-Christoffer [1]
Jiao, Lei [1]
Affiliations
[1] Univ Agder, Dept Informat & Commun Technol, Grimstad, Norway
Keywords
Novelty Detection; Deep Learning; Interpretable; Tsetlin Machine
DOI
10.5220/0010382204100417
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most supervised text classification approaches assume a closed world, counting on all classes being present in the data at training time. This assumption can lead to unpredictable behaviour during operation, whenever novel, previously unseen, classes appear. Although deep learning-based methods have recently been used for novelty detection, they are challenging to interpret due to their black-box nature. This paper addresses interpretable open-world text classification, where the trained classifier must deal with novel classes during operation. To this end, we extend the recently introduced Tsetlin machine (TM) with a novelty scoring mechanism. The mechanism uses the conjunctive clauses of the TM to measure to what degree a text matches the classes covered by the training data. We demonstrate that the clauses provide a succinct interpretable description of known topics, and that our scoring mechanism makes it possible to discern novel topics from the known ones. Empirically, our TM-based approach outperforms seven other novelty detection schemes on three out of five datasets, and performs second and third best on the remaining, with the added benefit of an interpretable propositional logic-based representation.
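The clause-based scoring idea in the abstract can be illustrated with a small sketch. This is a simplified toy and not the authors' implementation: the `Clause` class, the hand-written clauses in `KNOWN`, the whitespace tokenizer, and the fixed `threshold` are all assumptions made for illustration. A real Tsetlin machine learns its clauses from data. Each clause is a conjunction of word-presence and word-absence literals, a class score is the sum of votes from matching clauses, and a text is flagged as novel when no known class scores high enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical conjunctive clause over bag-of-words features."""
    required: frozenset   # words that must be present (positive literals)
    forbidden: frozenset  # words that must be absent (negated literals)
    polarity: int         # +1 votes for the class, -1 votes against it

    def matches(self, words: frozenset) -> bool:
        # The conjunction holds only if every literal is satisfied.
        return self.required <= words and not (self.forbidden & words)


def class_score(clauses, words: frozenset) -> int:
    """Sum the votes of all clauses that match the text."""
    return sum(c.polarity for c in clauses if c.matches(words))


def is_novel(clauses_by_class, text: str, threshold: int = 1) -> bool:
    """Flag the text as novel if no known class reaches the threshold."""
    words = frozenset(text.lower().split())
    best = max(class_score(cs, words) for cs in clauses_by_class.values())
    return best < threshold


# Hand-written clauses standing in for what training would produce.
KNOWN = {
    "sports": [
        Clause(frozenset({"match", "team"}), frozenset(), +1),
        Clause(frozenset({"goal"}), frozenset({"stock"}), +1),
    ],
    "finance": [
        Clause(frozenset({"stock", "market"}), frozenset(), +1),
        Clause(frozenset({"bank"}), frozenset({"river"}), +1),
    ],
}

print(is_novel(KNOWN, "the team won the match"))        # matches a sports clause
print(is_novel(KNOWN, "telescope images of a galaxy"))  # matches no clause
```

Because each clause is a readable conjunction of words, the clauses that fire (or fail to fire) double as an explanation of why a text was treated as known or novel.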
Pages: 410-417
Page count: 8
Related papers
50 in total
  • [31] Natural Language Processing System for Text Classification Corpus Based on Machine Learning
    Su, Yawen
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (08)
  • [32] MACHINE INDEXING AND THESAURUS CONSTRUCTION FROM THE ANALYSIS OF NATURAL-LANGUAGE TEXT
    GENUARDI, MT
    PROCEEDINGS OF THE ASIS ANNUAL MEETING, 1991, 28 : 335 - 335
  • [33] MACHINE RECOGNITION OF PRINTED ARABIC TEXT UTILIZING NATURAL-LANGUAGE MORPHOLOGY
    AMIN, A
    ALFEDAGHI, S
    INTERNATIONAL JOURNAL OF MAN-MACHINE STUDIES, 1991, 35 (06) : 769 - 788
  • [34] "What Does My Classifier Learn?" A Visual Approach to Understanding Natural Language Text Classifiers
    Winkler, Jonas Paul
    Vogelsang, Andreas
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, NLDB 2017, 2017, 10260 : 468 - 479
  • [35] MACHINE SEARCHING OF NATURAL LANGUAGE TEXT VS MANUAL SEARCHING OF PRINTED INDEXES
    KALIKOW, AK
    SLOANE, EM
    RICHMAN, S
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 1972, 164 (AUG-S) : 9 - &
  • [36] Measuring Short Text Reuse for the Urdu Language
    Sameen, Sara
    Sharjeel, Muhammad
    Nawab, Rao Muhammad Adeel
    Rayson, Paul
    Muneer, Iqra
    IEEE ACCESS, 2018, 6 : 7412 - 7421
  • [37] A programming system for text-editing task using Japanese natural language text and direct manipulation
    Kaneko, Nozomu
    Onisawa, Takehisa
    2006 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-6, PROCEEDINGS, 2006, : 1915 - +
  • [38] Indian Sign Language to Forecast Text using Leap Motion Sensor and RF Classifier
    Chavan, Poonam
    Ghorpade, Tushar
    Padiya, Puja
    2016 SYMPOSIUM ON COLOSSAL DATA ANALYSIS AND NETWORKING (CDAN), 2016,
  • [39] Classifying Arabic Text Using KNN Classifier
    Al-Badarenah, Amer
    Al-Shawakfa, Emad
    Al-Rababah, Khaleel
    Shatnawi, Safwan
    Bani-Ismail, Basel
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (06) : 259 - 268
  • [40] Machine learning for Asian language text classification
    Peng, Fuchun
    Huang, Xiangji
    JOURNAL OF DOCUMENTATION, 2007, 63 (03) : 378 - 397