Cross-modal Representation Learning for Understanding Manufacturing Procedure

Times cited: 0
Authors
Hashimoto, Atsushi [1]
Nishimura, Taichi [2]
Ushiku, Yoshitaka [1]
Kameko, Hirotaka [2]
Mori, Shinsuke [2]
Affiliations
[1] OMRON SINIC X Corp, Tokyo, Japan
[2] Kyoto Univ, Kyoto, Japan
Keywords
Procedural text generation; Image captioning; Video captioning; Understanding manufacturing activity
DOI
10.1007/978-3-031-06047-2_4
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Assembly, biochemical experiments, and cooking are representative tasks that create new value from multiple materials through multiple processes. If a machine can computationally understand such manufacturing tasks, we will have various options for human-machine collaboration on them, ranging from video scene retrieval to robots that act on behalf of humans. As one form of such understanding, this paper introduces a series of our studies that aim to associate visual observations of these processes with the procedural texts that instruct them. In those studies, captioning is the key task: the input is an image sequence or a set of video clips, and our methods remain state of the art. Through the explanation of these techniques, we give an overview of machine learning technologies that deal with the contextual information of manufacturing tasks.
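The abstract describes associating visual observations of a process with the procedural text that instructs it. As a rough illustration only (this is not the authors' method, which the abstract frames as captioning), the sketch below assumes that images and text steps have already been embedded into a shared vector space by some encoder, and aligns an ordered image sequence to ordered procedural steps under the further assumption that the process never returns to an earlier step. The function names `cosine` and `align_images_to_steps` and the toy feature vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def align_images_to_steps(images: List[Vector], steps: List[Vector]) -> List[int]:
    """Hypothetical helper: assign each image (in temporal order) to a step index.

    Assumption: assignments must be non-decreasing (no going back to an earlier
    step). Among such assignments, the total cosine similarity is maximized by
    dynamic programming in O(len(images) * len(steps)) time.
    """
    n, m = len(images), len(steps)
    if n == 0 or m == 0:
        return []
    sim = [[cosine(x, s) for s in steps] for x in images]
    # score[i][j]: best total similarity for images[0..i] with image i on step j
    score = [[0.0] * m for _ in range(n)]
    # back[i][j]: step assigned to image i-1 on that best path
    back = [[0] * m for _ in range(n)]
    score[0] = sim[0][:]
    for i in range(1, n):
        best_k = 0
        for j in range(m):
            # Running argmax over score[i-1][0..j] keeps the step order monotone.
            if score[i - 1][j] > score[i - 1][best_k]:
                best_k = j
            score[i][j] = sim[i][j] + score[i - 1][best_k]
            back[i][j] = best_k
    # Pick the best final step, then trace the path backwards.
    j = max(range(m), key=lambda k: score[n - 1][k])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]


if __name__ == "__main__":
    # Toy 2-D "embeddings": three frames, two instruction steps.
    images = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    steps = [[1.0, 0.0], [0.0, 1.0]]
    print(align_images_to_steps(images, steps))  # [0, 0, 1]
```

The monotonicity constraint reflects the intuition that a procedure advances step by step; real systems would learn the shared embedding space rather than assume it, and may need to relax this constraint when steps are repeated or skipped.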
Pages: 44-57
Number of pages: 14