A Survey of Cross-Modal Visual Content Generation

Cited by: 3

Authors:

Nazarieh, Fatemeh [1,2]

Feng, Zhenhua [1,2]

Awais, Muhammad [3]

Wang, Wenwu [3]

Kittler, Josef [3]

Affiliations:

[1] Univ Surrey, Sch Comp Sci & Elect Engn, Guildford GU2 7XH, England

[2] Univ Surrey, Nat Inspired Comp & Engn NICE Res Grp, Guildford GU2 7XH, England

[3] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, England

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024, Vol. 34, No. 8

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK

Keywords:

Visualization; Surveys; Data models; Task analysis; Measurement; Training; Generative adversarial networks; Generative models; Cross-modal; Visual content generation

DOI:

10.1109/TCSVT.2024.3351601

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Cross-modal content generation has become very popular in recent years. To generate high-quality and realistic content, a variety of methods have been proposed. Among these approaches, visual content generation has attracted significant attention from academia and industry due to its vast potential in various applications. This survey provides an overview of recent advances in visual content generation conditioned on other modalities, such as text, audio, speech, and music, with a focus on their key contributions to the community. In addition, we summarize the existing publicly available datasets that can be used for training and benchmarking cross-modal visual content generation models. We provide an in-depth exploration of the datasets used for audio-to-visual content generation, filling a gap in the existing literature. Various evaluation metrics are also introduced along with the datasets. Furthermore, we discuss the challenges and limitations encountered in the area, such as modality alignment and semantic coherence. Last, we outline possible future directions for synthesizing visual content from other modalities including the exploration of new modalities, and the development of multi-task multi-modal networks. This survey serves as a resource for researchers interested in quickly gaining insights into this burgeoning field.

Pages: 6814-6832

Page count: 19
