Massively Multilingual Pronunciation Mining with WikiPron

Cited by: 0

Authors:

Lee, Jackson L.

Ashby, Lucas F. E. [1]

Garza, M. Elizabeth [1]

Lee-Sikka, Yeonju [1]

Miller, Sean [1]

Wong, Alan [1]

McCarthy, Arya D. [2]

Gorman, Kyle [1]

Affiliations:

[1] CUNY, Grad Ctr, New York, NY 10021 USA

[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

Source:

PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020) | 2020

Keywords:

speech; pronunciation; grapheme-to-phoneme; g2p; MODELS;

DOI:

Not available

Chinese Library Classification:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

We introduce WikiPron, an open-source command-line tool for extracting pronunciation data from Wiktionary, a collaborative multilingual online dictionary. We first describe the design and use of WikiPron. We then discuss the challenges faced in scaling this tool to create an automatically generated database of 1.7 million pronunciations from 165 languages. Finally, we validate the pronunciation database by using it to train and evaluate a collection of generic grapheme-to-phoneme models. The software, pronunciation data, and models are all made available under permissive open-source licenses.
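The validation step mentioned in the abstract, scoring grapheme-to-phoneme predictions against a pronunciation lexicon, can be illustrated with a minimal sketch. This record does not specify the data format or the evaluation metric, so everything below is an assumption: a two-column tab-separated layout (spelling, then space-separated segments), the sample entries, the hypothetical model predictions, and the use of word error rate as the score.

```python
from collections import defaultdict

# Hypothetical sample in a two-column, tab-separated layout (an assumption):
# spelling, then a pronunciation given as space-separated segments.
SAMPLE_TSV = "cat\tk æ t\ndog\td ɒ ɡ\nread\tɹ iː d\nread\tɹ ɛ d\n"


def load_lexicon(text):
    """Map each spelling to the list of its pronunciations (tuples of segments)."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, pron = line.split("\t", 1)
        lexicon[word].append(tuple(pron.split()))
    return dict(lexicon)


def word_error_rate(lexicon, predictions):
    """Fraction of words whose predicted pronunciation matches no listed one."""
    errors = sum(
        1
        for word, prons in lexicon.items()
        if tuple(predictions.get(word, ())) not in prons
    )
    return errors / len(lexicon)


lexicon = load_lexicon(SAMPLE_TSV)
# Hypothetical model output: one predicted segment sequence per word.
predictions = {
    "cat": ["k", "æ", "t"],
    "dog": ["d", "ɔ", "ɡ"],   # wrong vowel, counts as an error
    "read": ["ɹ", "ɛ", "d"],  # matches one of two listed pronunciations
}
wer = word_error_rate(lexicon, predictions)
print(f"{len(lexicon)} words, WER = {wer:.3f}")
```

Storing a list of pronunciations per spelling lets a prediction count as correct when it matches any listed variant, which is one reasonable way to treat words with more than one pronunciation.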


Pages: 4223 - 4228

Page count: 6
