Enhancing video temporal grounding with large language model-based data augmentation

Cited by: 0
Authors
Tian, Yun [1]
Guo, Xiaobo [1]
Wang, Jinsong [1]
Li, Bin [2]
Affiliations
[1] Changchun Univ Sci & Technol, Sch Optoelect Engn, Changchun 130022, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2025 / Vol. 81 / Issue 05
Keywords
Video temporal grounding; Large language model; Data augmentation; Video description; Semantic enrichment; ANNOTATION; QUALITY;
DOI
10.1007/s11227-025-07159-0
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Given an untrimmed video and a natural language query, video temporal grounding (VTG) aims to precisely identify the temporal segment of the video that semantically matches the query. Existing datasets for this task often provide manually annotated natural language queries that are overly simplistic and lack the semantic richness needed to fully capture the video's content. This limitation hinders a model's ability to comprehend complex semantic scenarios and degrades its overall performance. To address these challenges, we introduce a novel, low-cost, large language model-based data augmentation method that enriches the original samples and expands the dataset without requiring external data. We propose a fine-grained image captioning module with a noise filter to extract unexploited information from videos. Additionally, we design a hierarchical semantic prompting framework to guide GPT-3.5 in producing semantically rich and contextually coherent natural language queries. When combined with 2D-TAN and VSLNet, our method outperforms the state-of-the-art (SOTA) method MRTNet across three public VTG datasets, particularly excelling in complex semantics and long-duration segment localization.
Pages: 31