Improving multimedia retrieval with a video OCR

Cited by: 0
Authors
Das, Dipanjan [1]
Chen, Datong [2]
Hauptmann, Alexander G. [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Comp Sci Dept, Pittsburgh, PA 15213 USA
Keywords
video OCR; OCR; multimedia retrieval; video retrieval; optical character recognition
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a set of experiments with a video OCR system (VOCR) tailored for video information retrieval and establish its importance in multimedia search in general and for some specific queries in particular. The system, inspired by existing work on text detection and recognition in images, has been developed using techniques that analyze video frames in detail to produce candidate text regions. The text regions are then binarized and sent to a commercial OCR engine, resulting in ASCII text that is finally used to create search indexes. The system is evaluated using the TRECVID data. We compare the system's performance from an information retrieval perspective with another VOCR system developed using multi-frame integration, and empirically demonstrate that deep analysis of individual video frames results in better video retrieval. We also evaluate the effect of various textual sources on multimedia retrieval by combining the VOCR outputs with automatic speech recognition (ASR) transcripts. For general search queries, the VOCR system coupled with ASR sources outperforms the other system by a wide margin. For search queries that involve named entities, especially people's names, the VOCR system even outperforms speech transcripts, demonstrating that source selection for particular query types is essential.
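The final indexing step and the idea of per-query source selection can be illustrated with a minimal sketch. This is not the authors' system: the function names, the `vocr`/`asr` source labels, the shot data, and the simple term-count scoring are all assumptions made for illustration. It builds an inverted index over text from each source and lets a query be restricted to one source or to both.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(shots):
    """Build term -> source -> set of shot ids.

    `shots` maps a shot id to a dict of {source name: text}.
    The source names ("vocr", "asr") are illustrative assumptions.
    """
    index = defaultdict(lambda: defaultdict(set))
    for shot_id, sources in shots.items():
        for source, text in sources.items():
            for term in tokenize(text):
                index[term][source].add(shot_id)
    return index


def search(index, query, sources=("vocr", "asr")):
    """Rank shots by how many (query term, source) pairs match them."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for source in sources:
            for shot_id in index.get(term, {}).get(source, ()):
                scores[shot_id] += 1
    # Highest score first; ties broken by shot id for a stable order.
    return sorted(scores, key=lambda s: (-scores[s], s))


# Hypothetical data: an on-screen caption in one shot, a spoken mention in another.
shots = {
    "shot1": {"vocr": "JOHN SMITH Reporter", "asr": "good evening here is the news"},
    "shot2": {"vocr": "", "asr": "john smith reports from pittsburgh"},
}
index = build_index(shots)
combined = search(index, "John Smith")
vocr_only = search(index, "John Smith", sources=("vocr",))
asr_only = search(index, "John Smith", sources=("asr",))
```

Restricting `sources` to `("vocr",)` for a person-name query mirrors, in a very simplified way, the source selection the abstract argues for; a real retrieval system would use a proper ranking function rather than raw match counts.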
Pages: 12