Corpus Creation and Analysis for Named Entity Recognition in Telugu-English Code-Mixed Social Media Data

Authors
Srirangam, Vamshi Krishna [1]
Reddy, Appidi Abhinav [1]
Singh, Vinay [1]
Shrivastava, Manish [1]
Affiliations
[1] Int Inst Informat Technol, KCIS, LTRC, Hyderabad, India
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Named Entity Recognition (NER) is one of the important tasks in Natural Language Processing (NLP) and is also a subtask of Information Extraction. In this paper we present our work on NER in Telugu-English code-mixed social media data. Code-mixing, a progeny of multilingualism, is a way in which multilingual people express themselves on social media by using linguistic units from different languages within a sentence or speech context. Entity extraction from social media data such as tweets (Twitter) is difficult in general because of its informal nature; code-mixed data further complicates the problem because its information is informal, unstructured and incomplete. We present a Telugu-English code-mixed corpus with the corresponding named entity tags. The named entities used to tag the data are Person ('Per'), Organization ('Org') and Location ('Loc'). We experimented with the machine learning models Conditional Random Fields (CRFs), Decision Trees and Bidirectional LSTMs on our corpus, which resulted in F1-scores of 0.96, 0.94 and 0.95 respectively.
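To make the reported metric concrete, the following is a minimal sketch of a per-tag F1 computation over the three entity tags named in the abstract. The abstract does not say how F1 was computed, so token-level scoring is an assumption here. The helper name `token_f1`, the non-entity tag `O`, and the example sentence with its labels are all invented for illustration and are not taken from the corpus.

```python
from typing import Sequence

# Tag names as given in the abstract; "O" (non-entity) is an assumption.
ENTITY_TAGS = ("Per", "Org", "Loc")


def token_f1(gold: Sequence[str], pred: Sequence[str], tag: str) -> float:
    """Token-level F1 for one tag; returns 0.0 when there are no true positives."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == tag and p == tag)
    fp = sum(1 for g, p in pairs if g != tag and p == tag)
    fn = sum(1 for g, p in pairs if g == tag and p != tag)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical romanized Telugu-English sentence with invented labels.
tokens = ["Sachin", "Hyderabad", "lo", "Infosys", "office", "ki", "velladu"]
gold = ["Per", "Loc", "O", "Org", "O", "O", "O"]
pred = ["Per", "Loc", "O", "O", "O", "O", "O"]  # the tagger misses "Infosys"

scores = {tag: token_f1(gold, pred, tag) for tag in ENTITY_TAGS}
print(scores)
```

In this toy case the missed organization token lowers only the 'Org' score. A score aggregated across tags, such as the single figure reported per model, would depend on the averaging scheme, which the abstract does not specify.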
Pages: 183-189
Page count: 7