Online Unstructured Data Analysis Models with KoBERT and Word2vec: A Study on Sentiment Analysis of Public Opinion in Korean

Cited by: 2
Authors
Baek, Changwon [1]
Kang, Jiho [2]
Choi, Sangsoo [1]
Affiliations
[1] Korea Inst Sci & Technol KIST, Technol Convergence Ctr, Seoul, South Korea
[2] Korea Univ, Inst Engn Res, Seoul, South Korea
Keywords
KoBERT; Word2vec; Public opinion analysis; Sentiment classification; Internet
DOI
10.5391/IJFIS.2023.23.3.244
Chinese Library Classification
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Online news articles and comments play a vital role in shaping public opinion, and numerous studies have used them as raw data for online opinion analysis. Sentiment analysis of public opinion based on Bidirectional Encoder Representations from Transformers (BERT) has recently attracted significant attention. However, applying BERT to Korean is challenging owing to its limited linguistic versatility and its low accuracy in domains with insufficient training data. In addition, conventional public opinion analysis focuses on term frequency, so low-frequency words are likely to be excluded because their importance is underestimated. This study aimed to address these issues and facilitate the analysis of public opinion in Korean news articles and comments. We propose a method for analyzing public opinion that uses word2vec to overcome the limits of word-frequency-centered analysis, in conjunction with KoBERT, a BERT variant optimized for the Korean language. Naver news articles and comments were analyzed using a sentiment classification model built on the KoBERT framework. In the experiment, the model achieved a sentiment classification accuracy of over 90%. The proposed approach thus yields faster and more precise results than conventional methods. In addition, word2vec makes it possible to identify words that occur infrequently but are highly relevant.
Pages: 244-258
Number of pages: 15