Subword Lexical Chaining for Automatic Story Segmentation in Chinese Broadcast News

Cited by: 0
Authors
Xie, Lei [1]
Yang, Yulian [1]
Zeng, Jia [2]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP, Xian 710072, Peoples R China
[2] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
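Funding
National Natural Science Foundation of China;
Keywords
Story segmentation; topic segmentation; spoken document retrieval; multimedia; Chinese;
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract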
We present a subword lexical chaining approach to automatic story segmentation of Chinese broadcast news (BN). Conventional lexical chains link related words through cohesion (e.g., word repetition), and points where many chains start or end are indicative of story boundaries. However, unavoidable speech recognition errors in BN transcripts may destroy the cohesiveness of words, resulting in word match failures. We show that Chinese subwords (characters and syllables) are robust for lexical matching in errorful automatic speech recognition (ASR) transcripts. This motivates us to discover story boundaries on chains formed by character and syllable n-gram units. Experimental results on the TDT2 Mandarin corpus show that chaining by character unigrams gives the best story segmentation performance, with a relative F-measure improvement of 6.06% over conventional word chaining. Integrating multiple scales (words and subwords) yields further improvement. For example, fusion by voting across different scales achieves an F-measure gain of 9.04% over words.
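The chaining idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers character n-grams only (not syllables), and the function names, the `max_gap` chain-breaking rule, the decision to drop single-occurrence chains, the score threshold, and the `min_votes` voting rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def char_ngrams(sentence, n=1):
    """Return the character n-grams of a sentence, ignoring whitespace."""
    chars = [c for c in sentence if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def build_chains(sentences, n=1, max_gap=2):
    """Link repetitions of the same n-gram into (start, end) sentence-index chains.

    Assumption: a repetition extends the current chain only if it occurs at most
    `max_gap` sentences after the previous occurrence; otherwise a new chain starts.
    Chains with a single occurrence are dropped (also an assumption).
    """
    occurrences = defaultdict(list)
    for idx, sentence in enumerate(sentences):
        for unit in set(char_ngrams(sentence, n)):
            occurrences[unit].append(idx)

    chains = []
    for positions in occurrences.values():
        start = prev = positions[0]
        for pos in positions[1:]:
            if pos - prev > max_gap:
                chains.append((start, prev))
                start = pos
            prev = pos
        chains.append((start, prev))
    return [(s, e) for s, e in chains if e > s]


def boundary_scores(chains, num_sentences):
    """Score the gap before sentence i: chains ending at i - 1 plus chains starting at i."""
    scores = [0] * num_sentences
    for start, end in chains:
        if start > 0:
            scores[start] += 1
        if end + 1 < num_sentences:
            scores[end + 1] += 1
    return scores


def pick_boundaries(scores, threshold):
    """Hypothesize a boundary wherever the score reaches the (assumed) threshold."""
    return {i for i, s in enumerate(scores) if s >= threshold}


def vote(boundary_sets, min_votes=2):
    """Keep boundaries proposed by at least `min_votes` of the scales."""
    counts = Counter(b for boundaries in boundary_sets for b in boundaries)
    return sorted(b for b, c in counts.items() if c >= min_votes)


# Toy transcript: three sentences about the stock market, then three about a typhoon.
sentences = ["股市上涨", "股市收盘", "上涨股市", "台风登陆", "台风风力", "登陆台风"]
unigram_scores = boundary_scores(build_chains(sentences, n=1), len(sentences))
bigram_scores = boundary_scores(build_chains(sentences, n=2), len(sentences))
fused = vote([pick_boundaries(unigram_scores, 3), pick_boundaries(bigram_scores, 3)])
```

In this toy example both scales concentrate their chain starts and ends at the gap before the fourth sentence, so the vote proposes a single boundary at index 3. A real system would tune the gap, threshold, and voting rule on held-out data.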
Pages: 248 / +
Number of pages: 3
Related papers
50 in total
  • [31] Automatic Segmentation of Broadcast News Audio using Self Similarity Matrix
    Soni, Sapna
    Ahmed, Imran
    Kopparapu, Sunil Kumar
    2014 INTERNATIONAL CONFERENCE FOR CONVERGENCE OF TECHNOLOGY (I2CT), 2014,
  • [32] Initial experiments on automatic story segmentation in Chinese spoken documents using lexical cohesion of extracted named entities
    Li, Devon
    Lo, Wai-Kit
    Meng, Helen
    CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274 : 693 - +
  • [33] MULTI-MODAL INFORMATION FUSION FOR NEWS STORY SEGMENTATION IN BROADCAST VIDEO
    Feng, Bailan
    Ding, Peng
    Chen, Jiansong
    Bai, Jinfeng
    Xu, Su
    Xu, Bo
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 1417 - 1420
  • [34] Broadcast News Story Segmentation Using Conditional Random Fields and Multimodal Features
    Wang, Xiaoxuan
    Xie, Lei
    Lu, Mimi
    Ma, Bin
    Chng, Eng Siong
    Li, Haizhou
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2012, E95D (05) : 1206 - 1215
  • [35] Automatic transcription of Broadcast News
    Chen, SS
    Eide, E
    Gales, MJF
    Gopinath, RA
    Kanevsky, D
    Olsen, P
    SPEECH COMMUNICATION, 2002, 37 (1-2) : 69 - 87
  • [36] AUTOMATIC COMPOSITION OF BROADCAST NEWS SUMMARIES USING RANK CLASSIFIERS TRAINED WITH ACOUSTIC AND LEXICAL FEATURES
    Hasan, Taufiq
    Abdelwahab, Mohammed
    Parthasarathy, Srinivas
    Busso, Carlos
    Liu, Yang
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 6080 - 6084
  • [37] Varying Input Segmentation for Story Boundary Detection in English, Arabic and Mandarin Broadcast News
    Rosenberg, Andrew
    Sharifi, Mehrbod
    Hirschberg, Julia
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1745 - 1748
  • [38] Automatic Story Segmentation for TV News Video Using Multiple Modalities
    Dumont, Emilie
    Quenot, Georges
    INTERNATIONAL JOURNAL OF DIGITAL MULTIMEDIA BROADCASTING, 2012, 2012
  • [39] Automatic Identification of Broadcast News Story Boundaries Using the Unification Method for Popular Nouns
    Khalaf, Zainab Ali
    Ping, Tan Tien
    2013 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (FEDCSIS), 2013, : 577 - 584
  • [40] Improving broadcast news segmentation processing
    Boykin, S
    Merlino, A
    IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS, PROCEEDINGS VOL 1, 1999, : 744 - 749