Corpus Creation and Analysis for Named Entity Recognition in Telugu-English Code-Mixed Social Media Data

Authors
Srirangam, Vamshi Krishna [1]
Reddy, Appidi Abhinav [1]
Singh, Vinay [1]
Shrivastava, Manish [1]
Affiliation
[1] Int Inst Informat Technol, KCIS, LTRC, Hyderabad, India
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Named Entity Recognition (NER) is one of the important tasks in Natural Language Processing (NLP) and is also a subtask of Information Extraction. In this paper we present our work on NER in Telugu-English code-mixed social media data. Code-mixing, a progeny of multilingualism, is a way in which multilingual people express themselves on social media by using linguistic units from different languages within a sentence or speech context. Entity extraction from social media data such as tweets (Twitter) is in general difficult due to its informal nature; code-mixed data further complicates the problem because it is informal, unstructured and incomplete. We present a Telugu-English code-mixed corpus with the corresponding named entity tags. The named entities used to tag the data are Person ('Per'), Organization ('Org') and Location ('Loc'). We experimented with the machine learning models Conditional Random Fields (CRFs), Decision Trees and Bidirectional LSTMs on our corpus, which resulted in F1-scores of 0.96, 0.94 and 0.95 respectively.
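As a rough illustration of the tagging scheme and the F1 metric named in the abstract, the following Python sketch scores one tagged sentence at the token level. It is not the authors' evaluation code. The example sentence, the 'Other' label for non-entity tokens, the predicted tags, and the function name token_f1 are all assumptions made for illustration; the abstract does not say whether scoring was per token or per entity, or how averaging was done.

```python
# Tag names 'Per', 'Org', 'Loc' come from the abstract.
# 'Other' is an assumed label for non-entity tokens.
ENTITY_TAGS = {"Per", "Org", "Loc"}


def token_f1(gold, pred):
    """Micro-averaged token-level precision, recall and F1 over entity tags."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # A true positive is an entity token whose predicted tag matches the gold tag.
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g in ENTITY_TAGS)
    pred_pos = sum(1 for p in pred if p in ENTITY_TAGS)
    gold_pos = sum(1 for g in gold if g in ENTITY_TAGS)
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented romanized Telugu-English sentence, not taken from the corpus.
tokens = ["Sachin", "Hyderabad", "lo", "match", "adutunnadu"]
gold = ["Per", "Loc", "Other", "Other", "Other"]
# Hypothetical model output: misses the location, wrongly tags "match".
pred = ["Per", "Other", "Other", "Org", "Other"]

precision, recall, f1 = token_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here one of the two predicted entity tokens is correct and one of the two gold entity tokens is found, so precision, recall and F1 are all 0.5.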
Pages: 183-189
Number of pages: 7