Arabic text data mining: A root-based hierarchical indexing model

Cited by: 0
Authors
Eldos, T.M. [1]
Affiliations
[1] Department of Computer Engineering, Fac. of Comp./Information Technology, Jordan Univ. of Sci. and Technology, Irbid 22110-3030, Jordan
Source
Keywords
Digital libraries - Indexing (of information) - Information retrieval - Linguistics
DOI
10.1080/02286203.2003.11442267
CLC number
Subject classification code
Abstract
The world has recently witnessed tremendous growth in the volume of text documents available on the Internet, in digital libraries, news sources, and company-wide intranets. Text data mining is a multidisciplinary field involving information retrieval, text analysis, information extraction, clustering, categorization, linguistics, database technology, machine learning, and data mining, and it is becoming more significant. Efforts have intensified in areas such as information retrieval, whose practical applications are increasingly necessary to end users and to the scientific community itself in order to fetch the growing body of available information efficiently. In the past few years, not only have new documents been produced directly in digital form, making them suitable for automatic indexing, but many older documents have also been ported from their physical medium to a digital one. The meaning of a document is represented by a vector of features, which are weighted according to a measure that best estimates relevance. Text categorization presents unique challenges due to the large number of attributes in the data set, the large number of training samples, and attribute dependencies. This article focuses on speeding up the information retrieval process in an Arabic document base by using a root-based hierarchical indexing model. Simulation results demonstrated that a speed gain in the range of 50-100 can be achieved for typical queries.
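The abstract does not spell out the index structure, so the following is only a minimal sketch of what a root-keyed, two-level index could look like, not the article's actual model. It assumes that each word maps to a single root and that the index is organized as root, then surface word form, then document ids. The `ROOT_OF` table, `build_index`, and `search` are hypothetical names introduced for illustration; a real system would derive roots with a morphological analyzer rather than a hand-written table.

```python
from collections import defaultdict

# Hypothetical word-to-root table (an assumption for illustration only).
ROOT_OF = {
    "كتاب": "كتب",   # book
    "كاتب": "كتب",   # writer
    "مكتبة": "كتب",  # library
    "مدرسة": "درس",  # school
    "دارس": "درس",   # learner
}


def build_index(docs):
    """Build a two-level index: root -> word form -> set of document ids."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for word in text.split():
            root = ROOT_OF.get(word)
            if root is not None:  # words with no known root are skipped
                index[root][word].add(doc_id)
    return index


def search(index, word):
    """Return ids of documents containing any word that shares the query's root."""
    root = ROOT_OF.get(word)
    if root is None or root not in index:
        return set()
    result = set()
    for doc_ids in index[root].values():
        result |= doc_ids
    return result


# Toy document base: each document is a whitespace-separated string.
docs = {
    1: "كتاب مدرسة",
    2: "كاتب",
    3: "دارس مكتبة",
}
index = build_index(docs)
```

In this sketch a query only visits the bucket for its own root instead of scanning every indexed word form, which is one plausible reason a root-based hierarchy could speed up retrieval; the speed gain reported in the abstract refers to the article's own simulations, not to this toy example.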
Pages: 158-166