Development Of A Standard Text And Speech Corpus For The Punjabi Language

Authors
Dhanjal, Surinder [1]
Bhatia, Satvinder Singh [2]
Affiliations
[1] Thompson Rivers Univ, Dept Comp Sci, Kamloops, BC, Canada
[2] Thapar Univ, Sch Math & Comp Applicat, Patiala, Punjab, India
Keywords
Text corpus; Speech corpus; Corpora development; Punjabi language; Malwa; Malwai dialect; Gurmukhi script; Speech processing; IPA
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a new text and speech corpus for the Punjabi language, a modern Indo-Aryan language that ranks among the most widely spoken languages of the world; over the years, its ranking has varied between 10 and 18. Research on the Punjabi language therefore has international significance. Punjabi is the native language of the Punjab region in two countries: East Punjab in India and West Punjab in Pakistan. The language has many dialects and is written in two different scripts across the two countries. Designing a new text or speech corpus that completely describes all dialects in both scripts would be an enormous task. This work therefore concentrates on a single dialect of the Punjabi language: the Malwai dialect. The paper describes at least 20 unique features of the newly designed corpus.
Pages: 6