Cross-modal Representation Learning for Understanding Manufacturing Procedure

Times cited: 0

Authors:

Hashimoto, Atsushi [1]

Nishimura, Taichi [2]

Ushiku, Yoshitaka [1]

Kameko, Hirotaka [2]

Mori, Shinsuke [2]

Affiliations:

[1] OMRON SINIC X Corp, Tokyo, Japan

[2] Kyoto Univ, Kyoto, Japan

Source:

CROSS-CULTURAL DESIGN-APPLICATIONS IN LEARNING, ARTS, CULTURAL HERITAGE, CREATIVE INDUSTRIES, AND VIRTUAL REALITY, CCD 2022, PT II | 2022 / Vol. 13312

Keywords:

Procedural text generation; Image captioning; Video captioning; Understanding manufacturing activity

DOI:

10.1007/978-3-031-06047-2_4

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Assembly, biochemical experiments, and cooking are representative tasks that create new value from multiple materials through multiple processes. If a machine can computationally understand such manufacturing tasks, we will have various options for human-machine collaboration on those tasks, ranging from video scene retrieval to robots that act on behalf of humans. As one form of such understanding, this paper introduces a series of our studies that aim to associate visual observations of the processes with the procedural texts that instruct them. In those studies, captioning is the key task, where the input is an image sequence or video clips, and our methods remain state of the art. Through the explanation of these techniques, we give an overview of machine learning technologies that deal with the contextual information of manufacturing tasks.


Pages: 44-57

Number of pages: 14

Related articles (50 in total):

[1] Cross-Modal Discrete Representation Learning
Liu, Alexander H.
Jin, SouYoung
Lai, Cheng-I Jeff
Rouditchenko, Andrew
Oliva, Aude
Glass, James
PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 3013 - 3035
[2] Quaternion Representation Learning for cross-modal matching
Wang, Zheng
Xu, Xing
Wei, Jiwei
Xie, Ning
Shao, Jie
Yang, Yang
KNOWLEDGE-BASED SYSTEMS, 2023, 270
[3] Hybrid representation learning for cross-modal retrieval
Cao, Wenming
Lin, Qiubin
He, Zhihai
He, Zhiquan
NEUROCOMPUTING, 2019, 345 : 45 - 57
[4] Disentangled Representation Learning for Cross-Modal Biometric Matching
Ning, Hailong
Zheng, Xiangtao
Lu, Xiaoqiang
Yuan, Yuan
IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1763 - 1774
[5] Cross-modal Representation Learning with Nonlinear Dimensionality Reduction
Kaya, Semih
Vural, Elif
2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019
[6] Learning Cross-Modal Aligned Representation With Graph Embedding
Zhang, Youcai
Cao, Jiayan
Gu, Xiaodong
IEEE ACCESS, 2018, 6 : 77321 - 77333
[7] Enhanced Multimodal Representation Learning with Cross-modal KD
Chen, Mengxi
Xing, Linyu
Wang, Yu
Zhang, Ya
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11766 - 11775
[8] Towards Cross-Modal Causal Structure and Representation Learning
Mao, Haiyi
Liu, Hongfu
Dou, Jason Xiaotian
Benos, Panayiotis V.
MACHINE LEARNING FOR HEALTH, VOL 193, 2022, 193 : 120 - 140
[9] Variational Deep Representation Learning for Cross-Modal Retrieval
Yang, Chen
Deng, Zongyong
Li, Tianyu
Liu, Hao
Liu, Libo
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2021, PT II, 2021, 13020 : 498 - 510
[10] Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval
Chen, Jing-Jing
Ngo, Chong-Wah
Feng, Fu-Li
Chua, Tat-Seng
PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 1020 - 1028
