共 40 条
[1]
Alberti C, 2019, 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019), P2131
[2]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:6077-6086
[3]
VQA: Visual Question Answering
[J].
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV),
2015,
:2425-2433
[4]
Bouraoui Z, 2020, AAAI CONF ARTIF INTE, V34, P7456
[5]
Chen YC, 2020, Arxiv, DOI [arXiv:1909.11740, DOI 10.48550/ARXIV.1909.11740]
[6]
Clark K, 2019, Arxiv, DOI [arXiv:1906.04341, DOI 10.48550/ARXIV.1906.04341]
[7]
Conneau A, 2018, Arxiv, DOI arXiv:1803.05449
[8]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[9]
Gan Z, 2020, Arxiv, DOI arXiv:2006.06195

