Large-Scale Multimodal Movie Dialogue Corpus

Cited by: 3

Authors:

Yasuhara, Ryu ^{[1]}

Inoue, Masashi ^{[1]}

Suga, Ikuya ^{[1]}

Kosaka, Tetsuo ^{[1]}

Affiliations:

[1] Yamagata Univ, 3-16,4 Jyonan, Yonezawa, Yamagata, Japan

Source:

ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION | 2016

Keywords:

Dialogue; Multimodal; Corpus; Movie; Film; VAD; DNN;

DOI:

10.1145/2993148.2998523

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present an outline of our newly created multimodal dialogue corpus that is constructed from public domain movies. Dialogues in movies are useful sources for analyzing human communication patterns. In addition, they can be used to train machine-learning-based dialogue processing systems. However, the movie files are processing intensive and they contain large portions of non-dialogue segments. Therefore, we created a corpus that contains only dialogue segments from movies. The corpus contains 165,368 dialogue segments taken from 1,722 movies. These dialogues are automatically segmented by using deep neural network-based voice activity detection with filtering rules. Our corpus can reduce the human workload and machine-processing effort required to analyze human dialogue behavior by using movies.

Pages: 414-415

Number of pages: 2

50 items in total

[21] Mining Preconditions of APIs in Large-Scale Code Corpus
Hoan Anh Nguyen
Dyer, Robert
Nguyen, Tien N.
Rajan, Hridesh
22ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (FSE 2014), 2014, : 166 - 177
[22] A large-scale corpus system for identifying thesaural relations
Collier, A
Pacey, M
CORPUS-BASED STUDIES IN ENGLISH, 1997, (20): 87 - 100
[23] Development of a Large-Scale Mandarin Radio Speech Corpus
Chang, Yung-hsiang Shawn
Liao, Yuan-fu
Wang, Sheng-ming
Wang, Jenq-haur
Wang, Sing-yue
Chen, Jhih-wei
Chen, You-dian
2017 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN (ICCE-TW), 2017,
[24] Captioning Videos Using Large-Scale Image Corpus
Xiao-Yu Du
Yang Yang
Liu Yang
Fu-Min Shen
Zhi-Guang Qin
Jin-Hui Tang
Journal of Computer Science and Technology, 2017, 32 : 480 - 493
[25] Captioning Videos Using Large-Scale Image Corpus
Du, Xiao-Yu
Yang, Yang
Yang, Liu
Shen, Fu-Min
Qin, Zhi-Guang
Tang, Jin-Hui
JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2017, 32 (03) : 480 - 493
[26] New word detection based on large-scale corpus
Digital Technology Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
Unknown
Unknown
Jisuanji Yanjiu yu Fazhan, 2006, (5): 927 - 932
[27] LANS: Large-scale Arabic News Summarization Corpus
Alhamadani, Abdulaziz
Zhang, Xuchao
He, Jianfeng
Khatri, Aadyant
Lu, Chang-Tien
ArabicNLP 2023 - 1st Arabic Natural Language Processing Conference, Proceedings, 2023, : 89 - 100
[28] Problems on large-scale speech corpus and the applications in TTS
Zhang S.
Liu L.
Diao L.-H.
Jisuanji Xuebao/Chinese Journal of Computers, 2010, 33 (04): 687 - 696
[29] Quantitative Study of Preposition based on Large-scale Corpus
Wang, Zhimin
He, Wei
Lacasella, Pierangelo
2015 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT), VOL 3, 2015, : 177 - 180
[30] Itihasa: A large-scale corpus for Sanskrit to English translation
Aralikatte, Rahul
de Lhoneux, Miryam
Kunchukuttan, Anoop
Sogaard, Anders
WAT 2021: THE 8TH WORKSHOP ON ASIAN TRANSLATION, 2021, : 191 - 197
