Image and audio caps: automated captioning of background sounds and images using deep learning

Cited by: 32
Authors
Poongodi, M. [1]
Hamdi, Mounir [1]
Wang, Huihui [2]
Affiliations
[1] Hamad Bin Khalifa Univ, Coll Sci & Engn, Dept Informat & Comp Technol, Doha, Qatar
[2] St Bonaventure Univ, CyberSecur Program, St Bonaventure, NY 14778 USA
Keywords
Computer vision; Image to caption; Scene recognition; Image analysis; Social networks; IEEE-802.11; WLANS; ALGORITHM; CONTENTION; NETWORKING; SYSTEM;
DOI
10.1007/s00530-022-00902-0
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Computer-based image recognition is a problem that researchers have worked on for many years. It is one of the most difficult tasks in computer science, and improvements are still being made. In this paper, we propose a method that automatically generates an appropriate caption for an image and attaches a matching sound to it. Two models were trained extensively and combined to achieve this. Sounds are recommended based on the image scene, and captions are generated using a combination of natural language processing and state-of-the-art computer vision models. The system achieves a Top-5 accuracy of 67% and a Top-1 accuracy of 53%. According to the authors, this is also the first model of its kind to make this type of prediction.
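Top-1 and Top-5 accuracy, as quoted in the abstract, are conventionally defined as the fraction of samples whose true label appears among the 1 or 5 highest-scoring predicted classes. The abstract does not state which output these figures refer to or how the authors computed them, so the following is only a minimal sketch of the general metric in Python. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores[i][c] is the (hypothetical) model score for class c on sample i;
    labels[i] is the index of the true class for sample i.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one sample is required")
    if k < 1:
        raise ValueError("k must be at least 1")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data (illustrative only): 3 samples, 3 classes.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

Because the top-k set grows with k, Top-5 accuracy can never be lower than Top-1 accuracy on the same predictions, which is consistent with the 67% and 53% figures reported.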
Pages: 2951 - 2959
Number of pages: 9
Related papers
50 in total
  • [41] Advanced Generative Deep Learning Techniques for Accurate Captioning of Images
    Chandar, J. Navin
    Kavitha, G.
    WIRELESS PERSONAL COMMUNICATIONS, 2024,
  • [42] Hiding Audio in Images: A Deep Learning Approach
    Gandikota, Rohit
    Mishra, Deepak
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PREMI 2019, PT II, 2019, 11942 : 389 - 399
  • [43] Automated Anomaly Detection in Histology Images using Deep Learning
    Shelton, Lillie
    Soans, Rajath
    Shah, Tosha
    Forest, Thomas
    Janardhan, Kyathanahalli
    Napolitano, Michael
    Gonzalez, Raymond
    Carlson, Grady
    Shah, Jyoti K.
    Chen, Antong
    DIGITAL AND COMPUTATIONAL PATHOLOGY, MEDICAL IMAGING 2024, 2024, 12933
  • [44] Automated Labeling of Electron Microscopy Images Using Deep Learning
    Weber, Gunther H.
    Ophus, Colin
    Ramakrishnan, Lavanya
    PROCEEDINGS OF 2018 IEEE/ACM MACHINE LEARNING IN HPC ENVIRONMENTS (MLHPC 2018), 2018, : 26 - 36
  • [45] RETRACTED: Medical Image Captioning Using Optimized Deep Learning Model (Retracted Article)
    Singh, Arjun
    Raguru, Jaya Krishna
    Prasad, Gaurav
    Chauhan, Surbhi
    Tiwari, Pradeep Kumar
    Zaguia, Atef
    Ullah, Mohammad Aman
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [46] Application of human computing in image captioning under deep learning
    Zhihong Zeng
    Xiaowen Li
    Microsystem Technologies, 2021, 27 : 1687 - 1692
  • [47] Deep Learning Image Captioning in Construction Management: A Feasibility Study
    Xiao, Bo
    Wang, Yiheng
    Kang, Shih-Chung
    JOURNAL OF CONSTRUCTION ENGINEERING AND MANAGEMENT, 2022, 148 (07)
  • [48] A Deep Learning Approach for Nepali Image Captioning and Speech Generation
    Sharma, Sagar
    Chapagain, Samikshya
    Acharya, Sachin
    Panday, Sanjeeb Prasad
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2025, 16 (02) : 1258 - 1264
  • [49] AraCap: A hybrid deep learning architecture for Arabic Image Captioning
    Afyouni, Imad
    Azhar, Imtinan
    Elnagar, Ashraf
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 382 - 389
  • [50] Application of human computing in image captioning under deep learning
    Zeng, Zhihong
    Li, Xiaowen
    MICROSYSTEM TECHNOLOGIES-MICRO-AND NANOSYSTEMS-INFORMATION STORAGE AND PROCESSING SYSTEMS, 2021, 27 (04) : 1687 - 1692