Cross-modal Representation Learning for Understanding Manufacturing Procedure

Cited by: 0
Authors:
Hashimoto, Atsushi [1 ]
Nishimura, Taichi [2 ]
Ushiku, Yoshitaka [1 ]
Kameko, Hirotaka [2 ]
Mori, Shinsuke [2 ]
Affiliations:
[1] OMRON SINIC X Corp, Tokyo, Japan
[2] Kyoto Univ, Kyoto, Japan
Keywords:
Procedural text generation; Image captioning; Video captioning; Understanding manufacturing activity;
DOI:
10.1007/978-3-031-06047-2_4
Chinese Library Classification: TP [Automation technology, computer technology]
Discipline code: 0812
Abstract:
Assembly, biochemical experiments, and cooking are representative tasks that create new value from multiple materials through multiple processes. If a machine can computationally understand such manufacturing tasks, we will have various options for human-machine collaboration on them, from video scene retrieval to robots that act on behalf of humans. As one form of such understanding, this paper introduces a series of our studies that aim to associate visual observations of the processes with the procedural texts that instruct those processes. In these studies, captioning is the key task, where the input is an image sequence or video clips, and our methods remain state of the art. Through the explanation of these techniques, we give an overview of machine learning technologies that deal with the contextual information of manufacturing tasks.
Pages: 44-57
Page count: 14