Effects on Time and Quality of Short Text Clustering during Real-Time Presentations

Cited by: 9
Authors
Fuentealba, Diego [1]
Lopez, Mario [2]
Ponce, Hector [3]
Affiliations
[1] Univ Santiago Chile, VirtuaLab, Dept Ind Engn, Santiago, Chile
[2] Univ Santiago Chile, Dept Ind Engn, Santiago, Chile
[3] Univ Santiago Chile, Dept Accounting & Auditing, Fac Adm & Econ, Santiago, Chile
Keywords
Silicon compounds; Blogs; Real-time systems; Clustering algorithms; Social networking (online); IEEE transactions; Visualization; Text Mining; TF-IDF; K-Means; Short Phrases; Short Text; Sentences; Clustering; Interactivity; OPTIMIZATION APPROACH; MODEL;
DOI
10.1109/TLA.2021.9475870
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Technologies for live presentations should consider users' capabilities to manage large amounts of data in real time, particularly exchanges of short texts (e.g., phrases). This study examines the effects on time and quality of text clustering algorithms applied to short, medium, and long texts, and examines whether short text clustering performs reasonably for live presentations. We ran several simulations in which we varied the number of phrases (from 5 to 200) contained in each text type (long, medium, and short) and the number of generated clusters (from 2 to 10). The pipeline used a Snowball stemmer, TF-IDF weighting, and K-means clustering; the text types were Reuters, 20 Newsgroups, and an experimental data set, for the long, medium, and short texts, respectively. The first result showed that text size had a large effect on the algorithms' execution time, with the shortest average time for the short texts and the longest average time for the long texts. The second result showed that the number of phrases in each text type significantly predicts execution time, whereas the number of clusters generated by K-means does not. Inertia and purity measures were used to test the quality of the generated clusters. Text size, number of phrases, and number of clusters all predict inertia, with the short texts showing the lowest inertia. Purity measures were similar to previously reported results for all text types. Thus, clustering algorithms for short texts can confidently be used in real-time presentations.
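The pipeline and measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, omits the stemming step (the study used a Snowball stemmer), and runs on a handful of made-up phrases with made-up class labels. All function names (`tfidf`, `kmeans`, `purity`) are hypothetical. It times TF-IDF weighting plus K-means and then reports inertia (sum of squared distances from each phrase to its cluster centroid) and purity (share of phrases that fall in the majority class of their cluster).

```python
import math
import random
import time
from collections import Counter

def tfidf(docs):
    """Return L2-normalised TF-IDF vectors for a list of tokenised documents."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    vectors = []
    for d in docs:
        vec = [0.0] * len(vocab)
        for t, c in Counter(d).items():
            # Smoothed IDF, so terms present in every document keep a non-zero weight.
            vec[index[t]] = c * (math.log((1 + n) / (1 + df[t])) + 1.0)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm with random initial centroids; returns (labels, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sqdist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sqdist(p, centroids[l]) for p, l in zip(points, labels))
    return labels, inertia

def purity(labels, truth):
    """Fraction of items that belong to the majority true class of their cluster."""
    total = 0
    for j in set(labels):
        classes = Counter(t for l, t in zip(labels, truth) if l == j)
        total += classes.most_common(1)[0][1]
    return total / len(labels)

# Made-up short phrases and class labels, for illustration only.
phrases = [
    "students like group work",
    "group work helps students",
    "students enjoy group projects",
    "exams cause stress",
    "exams and grades cause stress",
    "stress before exams",
]
truth = [0, 0, 0, 1, 1, 1]

start = time.perf_counter()
vectors = tfidf([p.lower().split() for p in phrases])
labels, inertia = kmeans(vectors, k=2)
elapsed = time.perf_counter() - start
print(f"time={elapsed:.4f}s inertia={inertia:.3f} "
      f"purity={purity(labels, truth):.2f}")
```

Repeating the timed section while varying the number of phrases and the value of `k` would mimic, on a toy scale, the kind of simulation the abstract describes. With a single random initialisation the clustering can land in a poor local optimum, so results on real data would normally be averaged over several seeds.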
Pages: 1391-1399
Number of pages: 9