A study of unsupervised clustering techniques for language modeling

Authors
Hahn, Sangyun [1]
Sethy, Abhinav [2]
Kuo, Hong-Kwang J. [2]
Ramabhadran, Bhuvana [2]
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 | 2008
Keywords
Clustering; Language Model Adaptation; Entropy
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
There has been recent interest in clustering text data to build topic-specific language models for large vocabulary speech recognition. In this paper, we studied various unsupervised clustering algorithms on several corpora. First we compared the clustering methods with quality metrics such as entropy and purity. Of the techniques studied, two-phase bisecting K-means achieved good performance with relatively fast speed. Then we performed speech recognition experiments on English and Arabic systems using the automatically derived topic-based language models. We obtained modest word error rate improvements, comparable to previously published studies. A careful analysis of the correlation between word error rate and the distribution of misrecognized words, including an information-gain metric, is presented.
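As an illustration of the cluster-quality metrics named in the abstract, the following is a minimal sketch of purity and size-weighted entropy computed against reference topic labels. It uses the common textbook definitions; the record does not give the paper's exact formulation, so the definitions, the function name `purity_and_entropy`, and its arguments are assumptions made for illustration only.

```python
import math
from collections import Counter


def purity_and_entropy(cluster_ids, topic_labels):
    """Return (purity, entropy) of a clustering against reference labels.

    Assumed textbook definitions, not taken from the paper:
      purity  = sum over clusters of (largest label count) / N
      entropy = sum over clusters of (cluster size / N) * H(labels in cluster)
    where H is the Shannon entropy in bits.
    """
    # Count reference labels inside each cluster.
    clusters = {}
    for cluster, label in zip(cluster_ids, topic_labels):
        clusters.setdefault(cluster, Counter())[label] += 1

    n = len(cluster_ids)
    purity = 0.0
    entropy = 0.0
    for counts in clusters.values():
        size = sum(counts.values())
        # The majority label's share contributes to purity.
        purity += max(counts.values()) / n
        # Shannon entropy of the label distribution within this cluster.
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        entropy += (size / n) * h
    return purity, entropy
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference topics.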
Pages: 1598 / +
Page count: 2