The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service

Cited by: 0
Authors
Chen, Meng [1]
Liu, Ruixue [1]
Shen, Lei [1]
Yuan, Shaozu [1]
Zhou, Jingyan [1]
Wu, Youzheng [1]
He, Xiaodong [1]
Zhou, Bowen [1]
Affiliations
[1] JD AI, Beijing, People's Republic of China
Keywords
large-scale dataset; multi-turn dialogues; real E-commerce scenario
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Human conversations are complicated, and building a human-like dialogue agent is an extremely challenging task. With the rapid development of deep learning techniques, data-driven models, which need a huge amount of real conversation data, have become increasingly prevalent. In this paper, we construct JDDC, a large-scale Chinese E-commerce conversation corpus drawn from a real scenario, with more than 1 million multi-turn dialogues, 20 million utterances, and 150 million words. The dataset reflects several characteristics of human-human conversations, e.g., goal-driven interaction and long-term dependency among the context. It also covers various dialogue types, including task-oriented, chitchat, and question-answering. Extra intent information and three well-annotated challenge sets are also provided. We then evaluate several retrieval-based and generative models to provide basic benchmark performance on the JDDC corpus. We hope JDDC can serve as an effective testbed and benefit the development of fundamental research in dialogue tasks.
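As a rough illustration of what the headline counts in the abstract imply, the sketch below derives per-dialogue and per-utterance averages. It assumes the rounded figures (1 million dialogues, 20 million utterances, 150 million words) as inputs; the abstract gives these as "more than" values, so the results are approximate, and the function and constant names are hypothetical, not part of the corpus or its tooling.

```python
# Rounded figures quoted in the abstract ("more than ..."); treat as approximate.
DIALOGUES = 1_000_000
UTTERANCES = 20_000_000
WORDS = 150_000_000


def corpus_ratios(dialogues: int, utterances: int, words: int) -> dict[str, float]:
    """Return rough averages implied by the corpus-level counts."""
    return {
        "utterances_per_dialogue": utterances / dialogues,
        "words_per_utterance": words / utterances,
        "words_per_dialogue": words / dialogues,
    }


ratios = corpus_ratios(DIALOGUES, UTTERANCES, WORDS)
for name, value in ratios.items():
    print(f"{name}: {value:.1f}")
```

On these rounded inputs a dialogue averages about 20 utterances and an utterance about 7.5 words, which is consistent with the abstract's description of the dialogues as multi-turn.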
Pages: 459-466
Number of pages: 8