Understanding political polarization using language models: A dataset and method

Cited by: 3
Authors
Gode, Samiran [1 ]
Bare, Supreeth [1 ]
Raj, Bhiksha [1 ,2 ]
Yoo, Hyungon [1 ]
Institutions
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Mohamed bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
Keywords
Background information - Health care education - Information information - Informed decision - Language model - Model-based method - Political systems - Political views - Social issues - Wikipedia;
DOI
10.1002/aaai.12104
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Our paper aims to analyze political polarization in the US political system using language models, and thereby help voters make informed decisions. The availability of this information will help voters understand their candidates' views on the economy, healthcare, education, and other social issues. Our main contributions are a dataset extracted from Wikipedia that spans the past 120 years and a language-model-based method for analyzing how polarized a candidate is. Our data are divided into two parts, background information and political information about a candidate, since our hypothesis is that a candidate's political views should be based on reason and be independent of factors such as birthplace, alma mater, and so forth. We further split the data into four chronological phases to help understand whether and how polarization among candidates changes. The data have been cleaned to remove biases. To understand polarization, we begin by showing results from classical language models, Word2Vec and Doc2Vec, and then use a more powerful technique, the Longformer, a transformer-based encoder, to assimilate more information and find the nearest neighbors of each candidate based on their political views and their background. The code and data for the project will be available here: ""
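The core retrieval step the abstract describes, finding each candidate's nearest neighbors in an embedding space, can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the toy vectors and candidate names are invented, and in the paper the embeddings would instead come from Word2Vec, Doc2Vec, or the Longformer applied to each candidate's Wikipedia text.

```python
import numpy as np

def nearest_neighbors(embeddings, names, query_idx, k=2):
    """Return the k candidates whose embeddings are closest
    (by cosine similarity) to the candidate at query_idx."""
    # L2-normalize so the dot product equals cosine similarity.
    vecs = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = vecs @ vecs[query_idx]
    sims[query_idx] = -np.inf  # exclude the query candidate itself
    order = np.argsort(-sims)[:k]
    return [names[i] for i in order]

# Toy 3-d "political view" embeddings for four hypothetical candidates.
emb = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.9, 0.4],
    [0.1, 0.8, 0.5],
])
names = ["A", "B", "C", "D"]
print(nearest_neighbors(emb, names, query_idx=0, k=1))  # → ['B']
```

Running the same query against embeddings built from only the background half of the data versus the political half would then show whether similar backgrounds imply similar politics, which is the comparison the paper's two-part data split enables.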
Pages: 248-254
Page count: 7
Related papers
50 records total
  • [41] WARPED LANGUAGE MODELS FOR NOISE ROBUST LANGUAGE UNDERSTANDING
    Namazifar, Mahdi
    Tur, Gokhan
    Hakkani-Tur, Dilek
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 981 - 988
  • [42] Understanding Telecom Language Through Large Language Models
    Bariah, Lina
    Zou, Hang
    Zhao, Qiyang
    Mouhouche, Belkacem
    Bader, Faouzi
    Debbah, Merouane
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 6542 - 6547
  • [43] Large language models and political science
    Linegar, Mitchell
    Kocielnik, Rafal
    Alvarez, R. Michael
    FRONTIERS IN POLITICAL SCIENCE, 2023, 5
  • [44] On Political Theory and Large Language Models
    Rodman, Emma
    POLITICAL THEORY, 2024, 52 (04) : 548 - 580
  • [45] MODELS OF NATURAL-LANGUAGE UNDERSTANDING
    BATES, M
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1995, 92 (22) : 9977 - 9982
  • [46] Meaning and understanding in large language models
    Havlik, Vladimir
    SYNTHESE, 2024, 205 (01)
  • [47] Discriminative Models for Spoken Language Understanding
    Wang, Ye-Yi
    Acero, Alex
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2426 - 2429
  • [48] Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models
    Ko, Hyung-Kwon
    Jeon, Hyeon
    Park, Gwanmo
    Kim, Dae Hyun
    Kim, Nam Wook
    Kim, Juho
    Seo, Jinwook
    PROCEEDINGS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI 2024), 2024,
  • [49] Understanding factual belief polarization: the role of trust, political sophistication, and affective polarization
    Rekker, Roderik
    Harteveld, Eelco
    ACTA POLITICA, 2024, 59 (03) : 643 - 670
  • [50] Creation of a structured solar cell material dataset and performance prediction using large language models
    Xie, Tong
    Wan, Yuwei
    Zhou, Yufei
    Huang, Wei
    Liu, Yixuan
    Linghu, Qingyuan
    Wang, Shaozhou
    Kit, Chunyu
    Grazian, Clara
    Zhang, Wenjie
    Hoex, Bram
    PATTERNS, 2024, 5 (05):