Visual features extracting & selecting for lipreading

Cited: 0
Authors
Yao, HX [1]
Gao, W
Shan, W
Xu, MH
Affiliations
[1] Harbin Inst Technol, Dept Comp Sci & Engn, Harbin 150001, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Beijing 100080, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper presents an effective method for selecting and extracting visual features for lipreading. The features are drawn from both the low level and the high level and complement each other; 41-dimensional feature vectors are used for recognition. Tested on AVCC, a bimodal database of sentences covering all Chinese pronunciations, lipreading assistance raises automatic speech recognition accuracy from 84.1% to 87.8%. Under noisy conditions, it improves accuracy by 19.5 percentage points (from 31.7% to 51.2%) in the speaker-dependent case and by 27.7 percentage points (from 27.6% to 55.3%) in the speaker-independent case. The paper also shows that visual speech information can effectively compensate for the loss of acoustic information, improving the recognition rate of our system by 10% to 30% depending on the amount of noise in the speech signal; this improvement is larger than that of IBM's ASR system, and the system performs better in noisy environments.
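The abstract describes combining complementary low-level (appearance) and high-level (geometric) visual features into a 41-dimensional vector per frame that supplements acoustic features for recognition. The sketch below illustrates that general idea only; the block-averaged appearance features, the five geometric lip measures, the 36 + 5 dimension split, and the simple concatenation fusion are assumptions made for illustration, not the paper's actual pipeline.

import numpy as np

# Illustrative sketch, not the paper's method: build a 41-dimensional visual
# feature per frame from complementary low-level and high-level cues, then
# fuse it with acoustic features. The 36 + 5 split below is an assumption.

def low_level_features(mouth_roi, grid=6):
    # Appearance features: mean intensity over a grid x grid block partition
    # of the grayscale mouth region (grid=6 gives 36 values).
    h, w = mouth_roi.shape
    h, w = h - h % grid, w - w % grid
    blocks = mouth_roi[:h, :w].reshape(grid, h // grid, grid, w // grid)
    return blocks.mean(axis=(1, 3)).ravel()

def high_level_features(lip_contour):
    # Geometric features from an (N, 2) lip contour: width, height,
    # aspect ratio, enclosed area (shoelace formula), and perimeter.
    x, y = lip_contour[:, 0], lip_contour[:, 1]
    width, height = np.ptp(x), np.ptp(y)
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    closed = np.vstack([lip_contour, lip_contour[:1]])
    perimeter = np.linalg.norm(np.diff(closed, axis=0), axis=1).sum()
    return np.array([width, height, width / (height + 1e-6), area, perimeter])

def visual_feature_vector(mouth_roi, lip_contour):
    # 36 appearance values + 5 geometric values = 41-dimensional visual feature.
    return np.concatenate([low_level_features(mouth_roi),
                           high_level_features(lip_contour)])

def bimodal_feature(acoustic_feats, visual_feats):
    # Feature-level fusion: concatenate the acoustic and visual streams
    # before passing them to a recognizer.
    return np.concatenate([acoustic_feats, visual_feats])

# Example with synthetic data: a 60x90 grayscale mouth ROI and a 20-point contour.
roi = np.random.rand(60, 90)
contour = np.random.rand(20, 2) * np.array([90.0, 60.0])
v = visual_feature_vector(roi, contour)
assert v.shape == (41,)
fused = bimodal_feature(np.random.rand(39), v)  # 39 acoustic dims assumed here

In a real system the fused vectors would feed an HMM- or network-based recognizer; the point carried over from the abstract is that the two visual feature levels are complementary rather than redundant.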
Pages: 251 - 259
Number of pages: 9
Related papers
50 records in total
  • [31] Improved ROI and within frame discriminant features for lipreading
    IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, United States
    IEEE Int Conf Image Process, 2001: 250 - 253
  • [32] Improved ROI and within frame discriminant features for lipreading
    Potamianos, G
    Neti, C
    2001 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL III, PROCEEDINGS, 2001, : 250 - 253
  • [33] Extracting Data Records from Query Result Pages Based on Visual Features
    Weng, Daiyue
    Hong, Jun
    Bell, David A.
    ADVANCES IN DATABASES, 2011, 7051 : 140 - 153
  • [34] VISUAL PRESENTATION OF VOICING AND OTHER CUES AN AID TO LIPREADING
    AINSWORTH, WA
    PROCEEDINGS : INSTITUTE OF ACOUSTICS, VOL 8, PART 7: SPEECH & HEARING, 1986, 8 : 233 - 240
  • [35] Extracting Features of Interest from Small Deep Networks for Efficient Visual Tracking
    Luo, Zhao
    Ge, Shiming
    Hua, Yingying
    Liu, Haolin
    Jin, Xin
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT I, 2018, 11164 : 414 - 425
  • [36] Extracting and selecting discriminative features from high density NIRS-based BCI for numerical cognition
    Ang, Kai Keng
    Yu, Juanhong
    Guan, Cuntai
    2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2012,
  • [37] SENTENCE FAMILIARITY AS A FACTOR IN VISUAL SPEECH RECEPTION (LIPREADING)
    LLOYD, LL
    JOURNAL OF SPEECH AND HEARING DISORDERS, 1964, 29 (04): : 409 - 413
  • [38] EXTRACTING AUDIO-VISUAL FEATURES FOR EMOTION RECOGNITION THROUGH ACTIVE FEATURE SELECTION
    Haider, Fasih
    Pollak, Senja
    Albert, Pierre
    Luz, Saturnino
    2019 7TH IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (IEEE GLOBALSIP), 2019,
  • [39] Extracting semantic information from basketball video based on audio-visual features
    Kim, K
    Choi, J
    Kim, N
    Kim, P
    IMAGE AND VIDEO RETRIEVAL, 2002, 2383 : 278 - 288
  • [40] Selecting hyperspectral bands and extracting features with a custom shallow convolutional neural network to classify citrus peel defects
    Frederick, Quentin
    Burks, Thomas
    Watson, Adam
    Yadav, Pappu Kumar
    Qin, Jianwei
    Kim, Moon
    Ritenour, Mark A.
    SMART AGRICULTURAL TECHNOLOGY, 2023, 6