GEO-SEQ2SEQ: Twitter User Geolocation on Noisy Data through Sequence to Sequence Learning

Times cited: 0
Authors
Zhang, Jingyu [1 ]
DeLucia, Alexandra [1 ]
Zhang, Chenyu [2 ]
Dredze, Mark [1 ]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023 | 2023
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Location information can support social media analyses by providing geographic context. Some of the most accurate and popular Twitter geolocation systems rely on rule-based methods that examine the user-provided profile location; these methods fail to handle informal or noisy location names. We propose GEO-SEQ2SEQ, a sequence-to-sequence (seq2seq) model for Twitter user geolocation that rewrites noisy, multilingual user-provided location strings into structured English location names. We train our system on tens of millions of pairs of multilingual location strings and geotagged tweets. Compared to leading methods, our model vastly increases coverage (i.e., the number of users we can geolocate) while achieving comparable or superior accuracy. Our error analysis reveals that constrained decoding helps the model produce valid locations according to a location database. Finally, we measure biases across language, country of origin, and time to evaluate fairness, and find that while our model can generalize well to unseen temporal data, performance does vary by language and country.
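The abstract credits constrained decoding with keeping outputs valid against a location database, but this record does not say how GEO-SEQ2SEQ implements it. The following is a minimal sketch of one common approach, assuming the database is tokenized into word sequences and stored in a prefix trie, so that at each step the decoder may only emit tokens that extend some valid location name. The names `build_trie`, `constrained_greedy_decode`, `LOCATION_DB`, `TOY_SCORES`, and `toy_score` are hypothetical. The toy scores stand in for a seq2seq model's next-token log-probabilities, and greedy search is a simplification; a beam-search variant would apply the same mask to every beam.

```python
from typing import Callable, Dict, List, Optional

END = "</s>"  # hypothetical end-of-sequence token marking a complete location


class TrieNode:
    """One node of a prefix trie over tokenized location names."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}


def build_trie(locations: List[List[str]]) -> TrieNode:
    """Insert every tokenized location (followed by END) into a prefix trie."""
    root = TrieNode()
    for tokens in locations:
        node = root
        for tok in tokens + [END]:
            node = node.children.setdefault(tok, TrieNode())
    return root


def constrained_greedy_decode(
    score: Callable[[List[str], str], float],
    trie: TrieNode,
    max_len: int = 20,
) -> Optional[List[str]]:
    """Greedy decoding restricted to token sequences stored in the trie.

    Returns a complete location from the trie, or None if no location
    finishes within max_len steps (or the trie is empty).
    """
    output: List[str] = []
    node = trie
    for _ in range(max_len):
        # Only tokens that extend some valid location are candidates.
        allowed = list(node.children)
        if not allowed:
            return None
        best = max(allowed, key=lambda tok: score(output, tok))
        if best == END:
            return output
        output.append(best)
        node = node.children[best]
    return None


# Hypothetical location database, tokenized by word for readability.
LOCATION_DB = [
    ["New", "York", "City", ",", "United", "States"],
    ["New", "Delhi", ",", "India"],
    ["York", ",", "United", "Kingdom"],
]

# Hypothetical per-token scores standing in for model log-probabilities.
# "Gotham" scores highest but is not a valid location in LOCATION_DB.
TOY_SCORES = {"Gotham": 5.0, "New": 2.0, "York": 1.5, "City": 1.0,
              "United": 0.8, "States": 0.7, ",": 0.5, "Delhi": 0.1}


def toy_score(prefix: List[str], token: str) -> float:
    # Ignores the prefix; a real seq2seq model would condition on it
    # and on the noisy input string.
    return TOY_SCORES.get(token, -10.0)


trie = build_trie(LOCATION_DB)
decoded = constrained_greedy_decode(toy_score, trie)
print(decoded)  # ['New', 'York', 'City', ',', 'United', 'States']
```

Although the toy scorer rates the invalid token `Gotham` highest, the trie mask never offers it, so the result is always a complete database entry, or `None` if no entry fits within `max_len`. A subword-tokenized model would need the trie built over its subword IDs rather than whole words.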
Pages: 4778-4794
Number of pages: 17