GEO-SEQ2SEQ: Twitter User Geolocation on Noisy Data through Sequence to Sequence Learning

Authors
Zhang, Jingyu [1]
DeLucia, Alexandra [1]
Zhang, Chenyu [2]
Dredze, Mark [1]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Source
Findings of the Association for Computational Linguistics, ACL 2023 | 2023
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Location information can support social media analyses by providing geographic context. Some of the most accurate and popular Twitter geolocation systems rely on rule-based methods that examine the user-provided profile location, which fail to handle informal or noisy location names. We propose GEO-SEQ2SEQ, a sequence-to-sequence (seq2seq) model for Twitter user geolocation that rewrites noisy, multilingual user-provided location strings into structured English location names. We train our system on tens of millions of multilingual location string and geotagged-tweet pairs. Compared to leading methods, our model vastly increases coverage (i.e., the number of users we can geolocate) while achieving comparable or superior accuracy. Our error analysis reveals that constrained decoding helps the model produce valid locations according to a location database. Finally, we measure biases across language, country of origin, and time to evaluate fairness, and find that while our model can generalize well to unseen temporal data, performance does vary by language and country.
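The constrained decoding mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a tiny hypothetical location database, whole-word tokens, and a toy scoring function (`make_toy_scorer`) standing in for a seq2seq model's next-token scores. All names here are illustrative. The idea shown is that, at each step, the decoder may only choose tokens that keep the output a prefix of some database entry, so the final output is always a valid location.

```python
from typing import Callable, Dict, List, Sequence

END = "<eos>"  # hypothetical end-of-sequence marker

def build_trie(names: Sequence[Sequence[str]]) -> Dict:
    """Build a nested-dict prefix trie over tokenized location names."""
    root: Dict = {}
    for tokens in names:
        node = root
        for tok in list(tokens) + [END]:
            node = node.setdefault(tok, {})
    return root

def constrained_greedy_decode(
    score: Callable[[List[str], str], float],
    trie: Dict,
    max_len: int = 16,
) -> List[str]:
    """Greedy decoding restricted to token sequences present in the trie."""
    node = trie
    output: List[str] = []
    for _ in range(max_len):
        allowed = list(node.keys())  # only continuations found in the database
        if not allowed:
            break
        best = max(allowed, key=lambda tok: score(output, tok))
        if best == END:
            break
        output.append(best)
        node = node[best]
    return output

def make_toy_scorer(noisy: str) -> Callable[[List[str], str], float]:
    """Toy stand-in for a model: favors tokens that occur in the noisy input."""
    text = noisy.lower()
    def score(prefix: List[str], token: str) -> float:
        if token == END:
            return 0.5
        return 1.0 if token.lower() in text else 0.0
    return score

# Hypothetical structured location database (city, state, country).
DATABASE = [
    ("Baltimore", "Maryland", "United States"),
    ("Boston", "Massachusetts", "United States"),
    ("Stanford", "California", "United States"),
]

trie = build_trie(DATABASE)
decoded = constrained_greedy_decode(make_toy_scorer("baltimore md!!"), trie)
print(decoded)
```

Because every step is limited to the trie's children, even an input the scorer cannot interpret still decodes to some database entry rather than to an invalid string; whether that entry is correct depends entirely on the quality of the scores.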
Pages: 4778-4794
Page count: 17