Effects on Time and Quality of Short Text Clustering during Real-Time Presentations

Cited by: 9
Authors
Fuentealba, Diego [1]
Lopez, Mario [2]
Ponce, Hector [3]
Affiliations
[1] Univ Santiago Chile, VirtuaLab, Dept Ind Engn, Santiago, Chile
[2] Univ Santiago Chile, Dept Ind Engn, Santiago, Chile
[3] Univ Santiago Chile, Dept Accounting & Auditing, Fac Adm & Econ, Santiago, Chile
Keywords
Silicon compounds; Blogs; Real-time systems; Clustering algorithms; Social networking (online); IEEE transactions; Visualization; Text Mining; TF-IDF; K-Means; Short Phrases; Short Text; Sentences; Clustering; Interactivity; OPTIMIZATION APPROACH; MODEL
DOI
10.1109/TLA.2021.9475870
CLC number (Chinese Library Classification)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Technologies for live presentations should account for users' capacity to manage large amounts of data in real time, particularly exchanges of short texts (e.g., phrases). This study examines how text size affects the execution time and quality of text clustering algorithms applied to short, medium, and long texts, and whether short-text clustering performs well enough for live presentations. We ran several simulations in which we varied the number of phrases (from 5 to 200) contained in each text type (long, medium, and short) and the number of generated clusters (from 2 to 10). The pipeline used a Snowball stemmer, TF-IDF, and K-means for clustering; the data sets were Reuters, 20 Newsgroups, and an experimental data set for the long, medium, and short texts, respectively. The first result showed that text size had a large effect on the algorithms' execution time, with the shortest average time for the short texts and the longest average time for the long texts. The second result showed that the number of phrases in each text type significantly predicts execution time, whereas the number of clusters generated by K-means does not. Inertia and purity measures were used to test the quality of the generated clusters. Text size, number of phrases, and number of clusters predict inertia, with the lowest inertia for the short texts. Purity measures were similar to previously reported results for all text types. Thus, clustering algorithms for short texts can confidently be used in real-time presentations.
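The measurements named in the abstract (execution time, inertia, and purity of a TF-IDF plus K-means clustering) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the stemming step is omitted, the smoothed IDF weighting and the evenly spaced centroid seeding are assumptions chosen for simplicity and determinism, and the four phrases and their topic labels are invented for illustration.

```python
import math
import time
from collections import Counter


def tfidf(docs):
    """Return one L2-normalised TF-IDF vector (a dict) per document."""
    tokenised = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for toks in tokenised for term in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenised:
        # Smoothed IDF: an assumption, the study's exact weighting is not stated.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def mean_vector(members):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for vec in members:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(members) for t, w in total.items()}


def kmeans(vectors, k, max_iter=50):
    """Lloyd's algorithm with evenly spaced seeds (assumed, for determinism)."""
    centroids = [dict(vectors[i * len(vectors) // k]) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_vector(members)
    return labels, centroids


def inertia(vectors, labels, centroids):
    """Sum of squared distances from each vector to its assigned centroid."""
    return sum(sq_dist(v, centroids[lab]) for v, lab in zip(vectors, labels))


def purity(labels, truth):
    """Fraction of items that belong to the majority class of their cluster."""
    majority = 0
    for c in set(labels):
        classes = Counter(t for lab, t in zip(labels, truth) if lab == c)
        majority += max(classes.values())
    return majority / len(labels)


# Hypothetical short phrases and topic labels, invented for this sketch.
phrases = ["cats purr softly", "cats chase mice",
           "stocks fell sharply", "stocks rose today"]
topics = ["pets", "pets", "finance", "finance"]

start = time.perf_counter()
vectors = tfidf(phrases)
labels, centroids = kmeans(vectors, k=2)
elapsed = time.perf_counter() - start

print(f"time={elapsed:.6f}s inertia={inertia(vectors, labels, centroids):.4f} "
      f"purity={purity(labels, topics):.2f}")
```

Lower inertia indicates tighter clusters, and purity approaches 1 as each cluster becomes dominated by a single known topic. Repeating the timed section while varying the number of phrases and the value of `k` would mirror the kind of simulation the abstract describes.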
Pages: 1391-1399
Page count: 9