Characteristics of Malay translated hadith corpus

Authors
Sazali, Siti Syakirah [1]
Rahman, Nurazzah Abdul [1]
Abu Bakar, Zainab [2]
Affiliations
[1] Univ Teknol MARA, Fac Comp & Math Sci, Shah Alam, Selangor, Malaysia
[2] Al Madinah Int Univ, Fac Comp & Informat Technol, Kuala Lumpur, Malaysia
Keywords
Malay language; Linguistic analysis; Malay translated hadith corpus; Natural language processing; Corpus linguistics; Named entity recognition; Text
DOI
10.1016/j.jksuci.2020.07.011
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
An annotated corpus can greatly assist natural language processing. For example, annotation lets computers capture more of a document's context, and indexing and clustering in information retrieval can be done more precisely, with little or no word ambiguity. However, only a few annotated corpora exist for the Malay language, and these are not publicly shared. In this paper, we analyse and annotate Malay translated hadith documents in terms of tagging and entities. The work has three phases: manual filtering and cleaning, analysing the corpus, and creating the benchmark. As a result, an analysis and a benchmark of the Malay translated hadith corpus were produced in terms of part-of-speech and named-entity tags that follow the Zipf's law distribution. (C) 2020 The Authors. Published by Elsevier B.V. on behalf of King Saud University.
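The abstract's Zipf's law claim can be illustrated with a small sketch. Zipf's law says that the frequency of an item is roughly proportional to 1/rank, so a log-log plot of frequency against rank has a slope near -1. The Python below is not the authors' method: the tag labels and counts are hypothetical toy data, and the function names `rank_frequency` and `zipf_slope` are introduced here only for illustration.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent item first."""
    counts = Counter(tokens)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank).

    A slope close to -1 is consistent with Zipf's law.
    Needs at least two distinct ranks.
    """
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy tag sequence (NOT taken from the hadith corpus):
# counts are chosen as 60 / rank so the data is exactly Zipfian.
toy_tags = ["NN"] * 60 + ["VB"] * 30 + ["JJ"] * 20 + ["RB"] * 15 + ["IN"] * 12

pairs = rank_frequency(toy_tags)
slope = zipf_slope(pairs)
print(pairs)
print(round(slope, 2))
```

On a real corpus the input would be the sequence of part-of-speech or named-entity tags, and the fitted slope would only approximate -1.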
Pages: 2151-2160
Number of pages: 10