Effects on Time and Quality of Short Text Clustering during Real-Time Presentations

Cited by: 9
Authors
Fuentealba, Diego [1]
Lopez, Mario [2]
Ponce, Hector [3]
Affiliations
[1] Univ Santiago Chile, VirtuaLab, Dept Ind Engn, Santiago, Chile
[2] Univ Santiago Chile, Dept Ind Engn, Santiago, Chile
[3] Univ Santiago Chile, Dept Accounting & Auditing, Fac Adm & Econ, Santiago, Chile
Keywords
Silicon compounds; Blogs; Real-time systems; Clustering algorithms; Social networking (online); IEEE transactions; Visualization; Text Mining; TF-IDF; K-Means; Short Phrases; Short Text; Sentences; Clustering; Interactivity; Optimization approach; Model
DOI
10.1109/TLA.2021.9475870
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Technologies for live presentations should consider users' capabilities to manage large amounts of data in real time, particularly exchanges of short texts (e.g., phrases). This study examines the effects on time and quality of text clustering algorithms applied to short, medium, and long texts, and examines whether short text clustering performs reasonably well for live presentations. We ran several simulations in which we varied the number of phrases (from 5 to 200) contained in each text type (long, medium, and short) and the number of generated clusters (from 2 to 10). The pipeline used the Snowball stemmer, TF-IDF, and K-means for clustering; the text types were Reuters, 20 Newsgroups, and an experimental data set, for the long, medium, and short texts, respectively. The first result showed that text size had a large effect on the algorithms' execution time, with the shortest average time for the short texts and the longest average time for the long texts. The second result showed that the number of phrases in each text type significantly predicted execution time, whereas the number of clusters generated by K-means did not. Inertia and purity measures were used to test the quality of the generated clusters. Text size, number of phrases, and number of clusters predicted inertia, with the lowest inertia for the short texts. Purity measures were similar to previously reported results for all text types. Thus, clustering algorithms for short texts can confidently be used in real-time presentations.
Pages: 1391-1399
Number of pages: 9
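
The pipeline named in the abstract (TF-IDF weighting, K-means clustering, then inertia and purity as quality measures, with execution time recorded) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the sample phrases, topic labels, function names, the smoothed IDF formula, and the random-sample initialization are all assumptions made for illustration. Stemming is omitted here; in practice a Snowball stemmer (for example NLTK's `SnowballStemmer`) would be applied to each token before weighting.

```python
import math
import random
import time
from collections import Counter


def tfidf_vectors(docs):
    """Return L2-normalized sparse TF-IDF vectors (dict: term -> weight)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption; the paper does not state its variant).
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys())


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for k, w in v.items():
            total[k] += w
    return {k: w / len(vectors) for k, w in total.items()}


def kmeans(vectors, k, n_iter=50, seed=0):
    """Plain Lloyd-style K-means; returns (labels, inertia)."""
    rng = random.Random(seed)
    centroids = [dict(v) for v in rng.sample(vectors, k)]
    labels = [-1] * len(vectors)
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_vector(members)
    # Inertia: sum of squared distances from each point to its centroid.
    inertia = sum(sq_dist(v, centroids[lab]) for v, lab in zip(vectors, labels))
    return labels, inertia


def purity(labels, truth):
    """Fraction of items that belong to the majority class of their cluster."""
    by_cluster = {}
    for lab, t in zip(labels, truth):
        by_cluster.setdefault(lab, Counter())[t] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)


# Hypothetical short phrases and topic labels, for illustration only.
phrases = [
    "the budget needs more funding",
    "funding the budget is hard",
    "budget cuts reduce funding",
    "the exam schedule is too tight",
    "move the exam to next week",
    "exam schedule conflicts with class",
]
topics = ["budget", "budget", "budget", "exam", "exam", "exam"]

start = time.perf_counter()
labels, inertia = kmeans(tfidf_vectors(phrases), k=2)
elapsed = time.perf_counter() - start
print(f"time={elapsed:.6f}s inertia={inertia:.3f} purity={purity(labels, topics):.2f}")
```

Repeating the timed call while varying the number of phrases and the value of `k` mirrors, in miniature, the kind of simulation the abstract describes; lower inertia indicates tighter clusters, and purity closer to 1 indicates clusters that align with the known topics.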