Development Of A Standard Text And Speech Corpus For The Punjabi Language

Cited by: 0

Authors:

Dhanjal, Surinder [1]

Bhatia, Satvinder Singh [2]

Affiliations:

[1] Thompson Rivers Univ, Dept Comp Sci, Kamloops, BC, Canada

[2] Thapar Univ, Sch Math & Comp Applicat, Patiala, Punjab, India

Source:

2013 INTERNATIONAL CONFERENCE ORIENTAL COCOSDA HELD JOINTLY WITH 2013 CONFERENCE ON ASIAN SPOKEN LANGUAGE RESEARCH AND EVALUATION (O-COCOSDA/CASLRE) | 2013

Keywords:

Text corpus; Speech corpus; Corpora development; Punjabi language; Malwa; Malwai Dialect; Gurmukhi Script; Speech processing; IPA

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a new text and speech corpus for the Punjabi language. Punjabi is a modern Indo-Aryan language that has been ranked among the most widely spoken languages of the world; over the years, this ranking has varied between 10 and 18. Any research work on the Punjabi language therefore assumes international significance. Punjabi is the native language of the Punjab region in two countries: East Punjab in India and West Punjab in Pakistan. The language has many dialects and is written in two different scripts across the two countries. Designing a new text or speech corpus that completely describes all dialects in both scripts would be an enormous task. This work therefore concentrates on only one dialect of the Punjabi language: the Malwai dialect. The paper describes at least 20 unique features of the newly designed corpus.


Pages: 6
