Large-Scale Multimodal Movie Dialogue Corpus

Cited by: 3
Authors
Yasuhara, Ryu [1]
Inoue, Masashi [1]
Suga, Ikuya [1]
Kosaka, Tetsuo [1]
Affiliations
[1] Yamagata Univ, 3-16,4 Jyonan, Yonezawa, Yamagata, Japan
Keywords
Dialogue; Multimodal; Corpus; Movie; Film; VAD; DNN
DOI
10.1145/2993148.2998523
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present an outline of our newly created multimodal dialogue corpus, which is constructed from public domain movies. Dialogues in movies are useful sources for analyzing human communication patterns. In addition, they can be used to train machine-learning-based dialogue processing systems. However, movie files are processing-intensive and contain large portions of non-dialogue segments. Therefore, we created a corpus that contains only the dialogue segments from movies. The corpus contains 165,368 dialogue segments taken from 1,722 movies. These dialogues are segmented automatically using deep neural network-based voice activity detection with filtering rules. Our corpus can reduce the human workload and machine-processing effort required to analyze human dialogue behavior in movies.
Pages: 414-415
Number of pages: 2