A data-driven text similarity measure based on classification algorithms

Cited by: 0
Authors
[1] Cho, Su Gon
[2] Kim, Seoung Bum
Source
Kim, Seoung Bum (sbkim1@korea.ac.kr) | Year: 1600 / Volume: University of Cincinnati / Issue: 24
Funding
National Research Foundation, Singapore;
Keywords
Application problems - Classification accuracy - Classification algorithm - Comparative experiments - Machine learning repository - Similarity measure - Text similarity - University of California;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Measuring text similarity is fundamental to many text mining applications. This paper proposes a new method, based on classification algorithms, for measuring the similarity between two texts. Specifically, a sentence-term matrix that records the frequency of each term across a collection of sentences is built and used to measure the classification accuracy between two texts. The idea rests on the observation that similar texts are difficult to distinguish from each other, which should lead to low classification accuracy between them. In comparative experiments against several widely used text similarity measures, results on real data from the Machine Learning Repository at the University of California, Irvine show that the proposed method outperformed the existing similarity measures across the entire range of term selection filters. © International Journal of Industrial Engineering.
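The idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifier, the validation scheme, or the tokenization, so the choices below (a multinomial naive Bayes classifier with add-one smoothing, leave-one-out validation, lowercase alphabetic tokens) are assumptions made only for illustration, and the function names are hypothetical. Each text is given as a list of sentences; a sentence-term matrix is built over both texts, and the accuracy of telling the two texts apart is returned. Lower accuracy would suggest more similar texts.

```python
import math
import re
from collections import Counter


def sentence_term_matrix(sentences):
    """Return (vocabulary, rows); rows[i][j] counts term j in sentence i."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        rows.append(row)
    return vocab, rows


def nb_predict(train_rows, train_labels, row):
    """Multinomial naive Bayes with add-one smoothing (assumed classifier)."""
    n_terms = len(row)
    best_label, best_score = None, -math.inf
    for label in sorted(set(train_labels)):
        members = [r for r, y in zip(train_rows, train_labels) if y == label]
        totals = [sum(col) for col in zip(*members)]  # term counts per class
        denom = sum(totals) + n_terms
        score = math.log(len(members) / len(train_rows))  # class prior
        score += sum(c * math.log((totals[j] + 1) / denom)
                     for j, c in enumerate(row) if c)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def classification_accuracy(text_a, text_b):
    """Leave-one-out accuracy of separating sentences of text_a from text_b."""
    if len(text_a) < 2 or len(text_b) < 2:
        raise ValueError("each text needs at least two sentences")
    labels = [0] * len(text_a) + [1] * len(text_b)
    _, rows = sentence_term_matrix(text_a + text_b)
    hits = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        hits += nb_predict(train_rows, train_labels, rows[i]) == labels[i]
    return hits / len(rows)


# Toy data, invented for this example only.
cats = ["the cat sat on the mat", "a cat chased the mouse", "the cat likes fish"]
stocks = ["stocks fell on the market", "the market rallied as stocks rose",
          "investors sold stocks"]

acc_different = classification_accuracy(cats, stocks)
acc_identical = classification_accuracy(cats, cats)
print(acc_different, acc_identical)
```

How accuracy is turned into a similarity score is not stated in the abstract; one possible convention would be to report `1 - accuracy`, but that mapping is an assumption here. Note also that leave-one-out on two identical texts can push accuracy below chance, because the held-out sentence still has an exact copy in the other class.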