Learning Multi-context Aware Location Representations from Large-scale Geotagged Images

Cited by: 4

Authors:

Yin, Yifang [1]

Zhang, Ying [2]

Liu, Zhenguang [3]

Liang, Yuxuan [1]

Wang, Sheng [1,4]

Shah, Rajiv Ratn [5]

Zimmermann, Roger [1]

Affiliations:

[1] Natl Univ Singapore, Singapore, Singapore

[2] Northwestern Polytech Univ, Xian, Peoples R China

[3] Zhejiang Gongshang Univ, Hangzhou, Peoples R China

[4] Alibaba Grp, Singapore, Singapore

[5] IIIT Delhi, Delhi, India

Source:

PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021

Keywords:

Location representations; pre-trained neural networks; attention-based fusion; geo-aware applications; FEATURES;

DOI:

10.1145/3474085.3475268

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the ubiquity of sensor-equipped smartphones, it is common to have multimedia documents uploaded to the Internet that have GPS coordinates associated with them. Utilizing such geotags as an additional feature is intuitively appealing for improving the performance of location-aware applications. However, raw GPS coordinates are fine-grained location indicators without any semantic information. Existing methods on geotag semantic encoding mostly extract hand-crafted, application-specific location representations that heavily depend on large-scale supplementary data and thus cannot perform efficiently on mobile devices. In this paper, we present a machine learning based approach, termed GPS2Vec+, which learns rich location representations by capitalizing on the world-wide geotagged images. Once trained, the model has no dependence on the auxiliary data anymore so it encodes geotags highly efficiently by inference. We extract visual and semantic knowledge from image content and user-generated tags, and transfer the information into locations by using geotagged images as a bridge. To adapt to different application domains, we further present an attention-based fusion framework that estimates the importance of the learnt location representations under different contexts for effective feature fusion. Our location representations yield significant performance improvements over the state-of-the-art geotag encoding methods on image classification and venue annotation.

Pages: 899-907

Page count: 9
