Subword Lexical Chaining for Automatic Story Segmentation in Chinese Broadcast News

Cited by: 0
Authors
Xie, Lei [1]
Yang, Yulian [1]
Zeng, Jia [2]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP, Xian 710072, Peoples R China
[2] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Story segmentation; topic segmentation; spoken document retrieval; multimedia; Chinese
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We present a subword lexical chaining approach to automatic story segmentation of Chinese broadcast news (BN). Conventional lexical chaining links words related by cohesion (e.g., word repetition), and points where many chains start and end are indicative of story boundaries. However, unavoidable speech recognition errors in BN transcripts can destroy the cohesiveness of words, resulting in word match failures. We show that Chinese subwords (characters and syllables) are robust for lexical matching in errorful ASR transcripts. This motivates us to detect story boundaries on chains formed by character and syllable n-gram units. Experimental results on the TDT2 Mandarin corpus show that chaining by character unigrams gives the best story segmentation performance, with a relative F-measure improvement of 6.06% over conventional word chaining. Integrating multiple scales (words and subwords) yields further improvement. For example, fusion by voting across different scales achieves an F-measure gain of 9.04% over words.
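The chaining idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `max_gap` linking rule, and the scoring (chains ending before a gap plus chains starting after it) are assumptions chosen for illustration, and the toy sentences are invented. Only repetition cohesion on character n-grams is modeled; syllable units and multi-scale fusion are left out.

```python
from collections import defaultdict


def char_ngrams(text, n=1):
    """Return the set of character n-grams in a sentence (whitespace ignored)."""
    chars = [c for c in text if not c.isspace()]
    return {"".join(chars[i:i + n]) for i in range(len(chars) - n + 1)}


def build_chains(sentences, n=1, max_gap=2):
    """Link repetitions of the same n-gram into chains.

    A chain is a (start, end) pair of sentence indices. Two occurrences join
    the same chain when they are at most `max_gap` sentences apart
    (`max_gap` is a hypothetical parameter). Single occurrences form no chain.
    """
    positions = defaultdict(list)
    for idx, sent in enumerate(sentences):
        for gram in char_ngrams(sent, n):
            positions[gram].append(idx)

    chains = []
    for occ in positions.values():
        start = prev = occ[0]
        for idx in occ[1:]:
            if idx - prev > max_gap:
                if prev > start:
                    chains.append((start, prev))
                start = idx
            prev = idx
        if prev > start:
            chains.append((start, prev))
    return chains


def boundary_strength(chains, num_sentences):
    """Score each gap i (between sentence i and i + 1).

    The score counts chains that end at sentence i plus chains that start at
    sentence i + 1, so high values mark candidate story boundaries.
    """
    scores = [0] * (num_sentences - 1)
    for start, end in chains:
        if end < num_sentences - 1:
            scores[end] += 1
        if start > 0:
            scores[start - 1] += 1
    return scores


# Toy transcript: two sentences about the stock market, two about a typhoon.
sentences = ["股市 上涨", "股市 收盘", "台风 登陆", "台风 减弱"]
chains = build_chains(sentences, n=1)
print(boundary_strength(chains, len(sentences)))  # [0, 4, 0]
```

In this toy input the only high score falls on the gap between the second and third sentences, where the chains for 股 and 市 end and the chains for 台 and 风 begin. Matching on characters rather than words means a misrecognized word can still contribute to a chain through the characters it shares with the correct one, which is the robustness argument the abstract makes.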
Pages: 248 / +
Number of pages: 3