Enhancing video temporal grounding with large language model-based data augmentation

Times cited: 0
Authors
Tian, Yun [1]
Guo, Xiaobo [1]
Wang, Jinsong [1]
Li, Bin [2]
Affiliations
[1] Changchun Univ Sci & Technol, Sch Optoelect Engn, Changchun 130022, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2025 / Vol. 81 / Issue 05
Keywords
Video temporal grounding; Large language model; Data augmentation; Video description; Semantic enrichment; ANNOTATION; QUALITY
DOI
10.1007/s11227-025-07159-0
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Given an untrimmed video and a natural language query, video temporal grounding (VTG) aims to precisely identify the temporal segment of the video that semantically matches the query. Existing VTG datasets have manually annotated natural language queries that are often overly simplistic. These queries lack the semantic richness needed to fully capture the video's content. This limitation hinders the model's ability to comprehend complex semantic scenarios and degrades its overall performance. To address this, we introduce a novel, low-cost data augmentation method based on large language models. It enriches the original samples and expands the dataset without requiring external data. We propose a fine-grained image captioning module with a noise filter to extract previously unexploited information from the videos. Additionally, we design a hierarchical semantic prompting framework that guides GPT-3.5 to produce semantically rich and contextually coherent natural language queries. When combined with 2D-TAN and VSLNet, our method outperforms the state-of-the-art (SOTA) method MRTNet on three public VTG datasets. It performs especially well on queries with complex semantics and on localizing long-duration segments.
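The abstract describes the augmentation pipeline only at a high level. What follows is a minimal, hypothetical sketch of the general idea. It assumes each training sample is a (video, query, start, end) tuple and that frame captions for the annotated segment are already available. The captions pass through a simple noise filter and are combined with the original query into a layered prompt. Each generated query then becomes a new sample that reuses the original timestamps. Every name here (`Sample`, `filter_captions`, `build_prompt`, `augment`) is hypothetical. The filter rules and prompt wording are assumptions, not the authors' design, and the GPT-3.5 call is replaced by a caller-supplied `generate` function.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Sample:
    """One VTG training sample: a query grounded to [start, end] seconds of a video."""
    video_id: str
    query: str
    start: float
    end: float


def filter_captions(captions: list[str], min_words: int = 3) -> list[str]:
    """Hypothetical noise filter: drop very short captions and consecutive duplicates."""
    kept: list[str] = []
    for cap in captions:
        cap = cap.strip()
        if len(cap.split()) < min_words:
            continue  # assumed rule: too short to add useful detail
        if kept and cap.lower() == kept[-1].lower():
            continue  # assumed rule: repeats the previous frame's caption
        kept.append(cap)
    return kept


def build_prompt(sample: Sample, captions: list[str]) -> str:
    """Hypothetical layered prompt: task instruction, segment captions, original query."""
    caption_lines = "\n".join(f"- {c}" for c in captions)
    return (
        "Rewrite the query so it describes the same video moment in richer detail.\n"
        f"Frame captions ({sample.start:.1f}s-{sample.end:.1f}s):\n{caption_lines}\n"
        f"Original query: {sample.query}\n"
        "Rewritten query:"
    )


def augment(
    samples: list[Sample],
    captions_for: Callable[[Sample], list[str]],
    generate: Callable[[str, int], str],
    n_variants: int = 2,
) -> list[Sample]:
    """Keep every original sample and add up to n_variants rewritten copies of each."""
    out: list[Sample] = []
    for s in samples:
        out.append(s)
        prompt = build_prompt(s, filter_captions(captions_for(s)))
        for i in range(n_variants):
            new_query = generate(prompt, i).strip()
            if new_query and new_query != s.query:
                # The new query reuses the original timestamps, so no new labels are needed.
                out.append(Sample(s.video_id, new_query, s.start, s.end))
    return out


if __name__ == "__main__":
    def demo_captions(_s: Sample) -> list[str]:
        # Stand-in for a captioning model's per-frame output (hypothetical data).
        return ["a man walks", "A man walks", "door", "a man opens a wooden door"]

    def fake_llm(prompt: str, i: int) -> str:
        # Stand-in for an LLM call such as GPT-3.5 (hypothetical).
        return f"variant {i}: a man walks up and opens a wooden door"

    data = [Sample("v1", "person opens door", 2.0, 7.5)]
    for sample in augment(data, demo_captions, fake_llm):
        print(sample)
```

In this sketch the generated queries keep the original start and end times, so the dataset grows without new videos or timestamp labels. In a real setup, `generate` would wrap a call to an LLM such as GPT-3.5.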
Pages: 31
Related papers (50 in total)
  • [21] Model-Based Video Compression For Real World Data
    Feller, Christian
    Wuenschmann, Juergen
    Wagner, Raimar
    Rothermel, Albrecht
    2013 IEEE THIRD INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - BERLIN (ICCE-BERLIN), 2013,
  • [22] Data Augmentation and Large Language Model for Legal Case Retrieval and Entailment
    Minh-Quan Bui
    Dinh-Truong Do
    Nguyen-Khang Le
    Dieu-Hien Nguyen
    Khac-Vu-Hiep Nguyen
    Trang Pham Ngoc Anh
    Minh Le Nguyen
    The Review of Socionetwork Strategies, 2024, 18 : 49 - 74
  • [23] Diffusion Model-Based Data Augmentation for Lung Ultrasound Classification with Limited Data
    Zhang, Xiaohui
    Gangopadhyay, Ahana
    Chang, Hsi-Ming
    Soni, Ravi
    MACHINE LEARNING FOR HEALTH, ML4H, VOL 225, 2023, 225 : 664 - 676
  • [24] Deimos: A Model-Based NoSQL Data Generation Language
    Hernandez Chillon, Alberto
    Sevilla Ruiz, Diego
    Garcia Molina, Jesus
    ADVANCES IN CONCEPTUAL MODELING, ER 2020, 2020, 12584 : 151 - 161
  • [25] Data augmentation based on large language models for radiological report classification
    Collado-Montanez, Jaime
    Martin-Valdivia, Maria-Teresa
    Martinez-Camara, Eugenio
    KNOWLEDGE-BASED SYSTEMS, 2025, 308
  • [26] Video Temporal Grounding with Multi-Model Collaborative Learning
    Tian, Yun
    Guo, Xiaobo
    Wang, Jinsong
    Li, Bin
    Zhou, Shoujun
    APPLIED SCIENCES-BASEL, 2025, 15 (06):
  • [27] Model Optimization for Model-Based Compression of Real World Video Data
    Feller, Christian
    Wuenschmann, Juergen
    Wagner, Raimar
    Rothermel, Albrecht
    2014 IEEE FOURTH INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS BERLIN (ICCE-BERLIN), 2014,
  • [28] Large Language Model-Based Critical Care Big Data Deployment and Extraction: Descriptive Analysis
    Yang, Zhongbao
    Xu, Shan-Shan
    Liu, Xiaozhu
    Xu, Ningyuan
    Chen, Yuqing
    Wang, Shuya
    Miao, Ming-Yue
    Hou, Mengxue
    Liu, Shuai
    Zhou, Yi-Min
    Zhou, Jian-Xin
    Zhang, Linlin
    JMIR MEDICAL INFORMATICS, 2025, 13
  • [29] Language Model Data Augmentation Based on Text Domain Transfer
    Ogawa, Atsunori
    Tawara, Naohiro
    Delcroix, Marc
    INTERSPEECH 2020, 2020, : 4926 - 4930
  • [30] PiTe: Pixel-Temporal Alignment for Large Video-Language Model
    Liu, Yang
    Ding, Pengxiang
    Huang, Siteng
    Zhang, Min
    Zhao, Han
    Wang, Donglin
    COMPUTER VISION - ECCV 2024, PT V, 2025, 15063 : 160 - 176