KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model

Citations: 0

Authors:

Geng, Lei [1]

Yan, Xu [1]

Cao, Ziqiang [1]

Li, Juntao [1]

Li, Wenjie [3]

Li, Sujian [2]

Zhou, Xinjie [4]

Yang, Yang [4]

Zhang, Jun [5]

Affiliations:

[1] Soochow Univ, Inst Artificial Intelligence, Suzhou, Peoples R China

[2] Peking Univ, Beijing, Peoples R China

[3] Hong Kong Polytech Univ, Hong Kong, Peoples R China

[4] Pharmcube, Beijing, Peoples R China

[5] Changping Lab, Beijing, Peoples R China

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023) | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

CORPUS;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Most biomedical pretrained language models are monolingual and cannot handle the growing cross-lingual requirements. The scarcity of non-English domain corpora, not to mention parallel data, poses a significant hurdle in training multilingual biomedical models. Since knowledge forms the core of domain-specific corpora and can be translated into various languages accurately, we propose a model called KBioXLM, which adapts the multilingual pretrained model XLM-R to the biomedical domain using a knowledge-anchored approach. We construct a biomedical multilingual corpus by incorporating knowledge alignments at three granularities (entity, fact, and passage levels) into monolingual corpora. We then design three corresponding training tasks (entity masking, relation masking, and passage relation prediction) and continue training on top of the XLM-R model to enhance its cross-lingual ability in the domain. To validate the effectiveness of our model, we translate the English benchmarks of multiple tasks into Chinese. Experimental results demonstrate that our model significantly outperforms monolingual and multilingual pretrained models in cross-lingual zero-shot and few-shot scenarios, achieving improvements of up to 10+ points. Our code is publicly available at https://github.com/ngwlh-gl/KBioXLM.


Pages: 11239-11250

Page count: 12
