共 50 条
[1]
Akula Arjun, 2020, PROCEED INGS 58 AN, P6555, DOI 10.18653/v1/2020.acl-main.586
[2]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:6077-6086
[3]
[Anonymous], 2015, PROC ADVNEURAL INF P
[4]
VQA: Visual Question Answering
[J].
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV),
2015,
:2425-2433
[5]
Brown TB, 2020, ADV NEUR IN, V33
[6]
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
[J].
COMPUTER VISION - ECCV 2020, PT VI,
2020, 12351
:565-580
[7]
Chen X, 2015, Microsoft coco captions: Data collection and evaluation server," in, V1504, P325
[8]
UNITER: UNiversal Image-TExt Representation Learning
[J].
COMPUTER VISION - ECCV 2020, PT XXX,
2020, 12375
:104-120
[9]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[10]
Duygulu P, 2002, LECT NOTES COMPUT SC, V2353, P97

