Understanding political polarization using language models: A dataset and method

Times cited: 3
Authors
Gode, Samiran [1]
Bare, Supreeth [1]
Raj, Bhiksha [1, 2]
Yoo, Hyungon [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Mohamed bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
Keywords
Background information - Health care education - Information information - Informed decision - Language model - Model-based method - Political systems - Political views - Social issues - Wikipedia;
DOI
10.1002/aaai.12104
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Our paper aims to analyze political polarization in the US political system using language models, and thereby help voters make informed decisions. The availability of this information will help voters understand their candidates' views on the economy, healthcare, education, and other social issues. Our main contributions are a dataset extracted from Wikipedia that spans the past 120 years and a language-model-based method that helps analyze how polarized a candidate is. Our data are divided into two parts, background information and political information about each candidate, because our hypothesis is that a candidate's political views should be based on reason and be independent of factors such as birthplace and alma mater. We further split the data chronologically into four phases to examine whether and how polarization among candidates changes over time. The data have been cleaned to remove biases. To study polarization, we begin by showing results from the classical language models Word2Vec and Doc2Vec. We then use more powerful techniques, such as the Longformer, a transformer-based encoder, to capture more information and find the nearest neighbors of each candidate based on their political views and their background. The code and data for the project will be available here: ""
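The abstract does not state how nearest neighbors are computed or how the political and background views are compared. The following is a minimal sketch under stated assumptions: each candidate already has one embedding per view (for example from Doc2Vec or a Longformer encoder), similarity is cosine similarity, and the Jaccard overlap between a candidate's political and background neighbor sets is a hypothetical way to probe the independence hypothesis, not the authors' metric. The candidate names and 3-dimensional vectors are toy placeholders.

```python
import math
from typing import Dict, List, Sequence

Embeddings = Dict[str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(embeddings: Embeddings, name: str, k: int) -> List[str]:
    """Return the k other candidates whose embeddings are most similar to `name`'s."""
    query = embeddings[name]
    scored = [(cosine(query, vec), other)
              for other, vec in embeddings.items() if other != name]
    # Highest similarity first; ties broken alphabetically so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def neighbor_overlap(political: Embeddings, background: Embeddings,
                     name: str, k: int) -> float:
    """Hypothetical score: Jaccard overlap of a candidate's k-NN sets in the two views.

    Under the hypothesis that political views are independent of background,
    this overlap would be expected to be low.
    """
    a = set(nearest_neighbors(political, name, k))
    b = set(nearest_neighbors(background, name, k))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy, hypothetical embeddings for four placeholder candidates.
political = {"A": [1.0, 0.0, 0.0], "B": [0.9, 0.1, 0.0],
             "C": [0.0, 1.0, 0.0], "D": [0.1, 0.9, 0.1]}
background = {"A": [0.0, 0.0, 1.0], "B": [1.0, 0.0, 0.0],
              "C": [0.0, 0.1, 1.0], "D": [0.9, 0.1, 0.0]}

if __name__ == "__main__":
    print(nearest_neighbors(political, "A", 1))   # ['B']
    print(nearest_neighbors(background, "A", 1))  # ['C']
    print(neighbor_overlap(political, background, "A", 2))
```

In this toy data, candidate A's closest political neighbor (B) differs from its closest background neighbor (C), which is the kind of divergence the independence hypothesis would predict. Cosine similarity is chosen here because it ignores vector magnitude, which is a common convention for comparing document embeddings; the paper may use a different measure.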
Pages: 248-254
Number of pages: 7