A Survey of Cross-Modal Visual Content Generation

Cited by: 3
Authors
Nazarieh, Fatemeh [1, 2]
Feng, Zhenhua [1, 2]
Awais, Muhammad [3]
Wang, Wenwu [3]
Kittler, Josef [3]
Affiliations
[1] University of Surrey, School of Computer Science and Electronic Engineering, Guildford GU2 7XH, England
[2] University of Surrey, Nature Inspired Computing and Engineering (NICE) Research Group, Guildford GU2 7XH, England
[3] University of Surrey, Centre for Vision, Speech and Signal Processing, Guildford GU2 7XH, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Visualization; Surveys; Data models; Task analysis; Measurement; Training; Generative adversarial networks; Generative models; cross-modal; visual content generation;
DOI
10.1109/TCSVT.2024.3351601
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Classification Code
0808; 0809
Abstract
Cross-modal content generation has become very popular in recent years. To generate high-quality and realistic content, a variety of methods have been proposed. Among these approaches, visual content generation has attracted significant attention from academia and industry due to its vast potential in various applications. This survey provides an overview of recent advances in visual content generation conditioned on other modalities, such as text, audio, speech, and music, with a focus on their key contributions to the community. In addition, we summarize the existing publicly available datasets that can be used for training and benchmarking cross-modal visual content generation models. We provide an in-depth exploration of the datasets used for audio-to-visual content generation, filling a gap in the existing literature. Various evaluation metrics are also introduced along with the datasets. Furthermore, we discuss the challenges and limitations encountered in the area, such as modality alignment and semantic coherence. Finally, we outline possible future directions for synthesizing visual content from other modalities, including the exploration of new modalities and the development of multi-task multi-modal networks. This survey serves as a resource for researchers interested in quickly gaining insights into this burgeoning field.
Pages: 6814-6832
Number of pages: 19