MaterialBERT for natural language processing of materials science texts

Times cited: 15
Authors
Yoshitake, Michiko [1 ]
Sato, Fumitaka [1 ,2 ]
Kawano, Hiroyuki [1 ,2 ]
Teraoka, Hiroshi [1 ,2 ]
Affiliations
[1] Natl Inst Mat Sci, MaDIS, 1-1 Namiki, Tsukuba, Ibaraki 3050044, Japan
[2] Ridgelinez, Business Sci Unit, Tokyo, Japan
Source
SCIENCE AND TECHNOLOGY OF ADVANCED MATERIALS-METHODS, 2022, Vol. 2, No. 1
Keywords
Word embedding; pre-training; BERT; literal information
DOI
10.1080/27660400.2022.2124831
Chinese Library Classification
T [Industrial Technology]
Discipline code
08
Abstract
A BERT (Bidirectional Encoder Representations from Transformers) model, which we named "MaterialBERT", was generated using scientific papers covering a wide range of materials science as its corpus. A new vocabulary list for the tokenizer was built from this materials science corpus. Two BERT models with different tokenizer vocabularies were generated: one using the original vocabulary released by Google, and the other using the vocabulary newly built by the authors. Word vectors embedded during pre-training with the two MaterialBERT models reasonably reflect the meanings of material names, both in material-class clustering and in the relationships between base materials and their compounds or derivatives, not only for inorganic materials but also for organic materials and organometallic compounds. Fine-tuning on CoLA (the Corpus of Linguistic Acceptability) with the pre-trained MaterialBERT achieved a higher score than the original BERT. The two MaterialBERT models could also serve as a starting point for transfer learning of narrower domain-specific BERT models.
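The material-class clustering described in the abstract rests on comparing embedding vectors of material names. The toy sketch below illustrates the idea with made-up three-dimensional vectors and cosine similarity; it is not the authors' code, and real vectors would come from the hidden states of the pre-trained MaterialBERT encoder (the material names and all vector values here are invented for illustration).

```python
import numpy as np

# Made-up embedding vectors for illustration only; real vectors would be
# taken from the last hidden layer of a pre-trained BERT-style encoder.
embeddings = {
    "TiO2":         np.array([0.9, 0.1, 0.0]),  # oxide
    "SiO2":         np.array([0.8, 0.2, 0.1]),  # oxide
    "polyethylene": np.array([0.1, 0.9, 0.2]),  # polymer
}

def cosine(u, v):
    # Cosine similarity: dot product of the vectors divided by the
    # product of their Euclidean norms.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_oxides = cosine(embeddings["TiO2"], embeddings["SiO2"])
sim_cross = cosine(embeddings["TiO2"], embeddings["polyethylene"])

# A same-class pair (two oxides) should score higher than a cross-class
# pair (oxide vs. polymer) if the embeddings reflect material classes.
print(f"same-class: {sim_oxides:.3f}, cross-class: {sim_cross:.3f}")
```

With well-trained embeddings, clustering algorithms (e.g. k-means or hierarchical clustering on these similarities) would group material names by class, which is the behaviour the paper reports for both MaterialBERT variants.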
Pages: 372-380
Page count: 9
Related papers
(50 records in total)
  • [31] The language of classifying in introductory science texts
    Darian, S
    JOURNAL OF PRAGMATICS, 1997, 27 (06) : 815 - 839
  • [32] An architecture for language processing for scientific texts
    Copestake, Ann
    Corbett, Peter
    Murray-Rust, Peter
    Rupp, C. J.
    Siddharthan, Advaith
    Teufel, Simone
    Waldron, Ben
    PROCEEDINGS OF THE UK E-SCIENCE ALL HANDS MEETING 2006, 2006, : 614 - 621
  • [33] Extracting Intrauterine Device Usage from Clinical Texts using Natural Language Processing
    Shi, Jianlin
    Mowery, Danielle
    Chapman, Wendy
    Zhang, Mingyuan
    Sanders, Jessica
    Gawron, Lori
    2017 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2017, : 568 - 571
  • [34] Towards a Methodology for Comparing Legal Texts Based on Semantic, Storytelling and Natural Language Processing
    Graziano, Mariangela
    Di Martino, Beniamino
    Cante, Luigi Colucci
    Esposito, Antonio
    Lupi, Pietro
    COMPLEX, INTELLIGENT AND SOFTWARE INTENSIVE SYSTEMS, CISIS-2024, 2024, 87 : 343 - 352
  • [36] Introduction for artificial intelligence and law: special issue "natural language processing for legal texts"
    Robaldo, Livio
    Villata, Serena
    Wyner, Adam
    Grabmair, Matthias
    ARTIFICIAL INTELLIGENCE AND LAW, 2019, 27 (02) : 113 - 115
  • [37] Determining Of Semantically Close Texts Of Stock Market News Using Natural Language Processing
    Bosacheva, Tatiana
    Magomedov, Shamil
    Lebedev, Artem
    INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND ENERGY TECHNOLOGIES (ICECET 2021), 2021, : 871 - 875
  • [38] Natural language processing in mental health applications using non-clinical texts
    Calvo, Rafael A.
    Milne, David N.
    Hussain, M. Sazzad
    Christensen, Helen
    NATURAL LANGUAGE ENGINEERING, 2017, 23 (05) : 649 - 685
  • [39] Automatically Extracting Procedural Knowledge from Instructional Texts using Natural Language Processing
    Zhang, Ziqi
    Webster, Philip
    Uren, Victoria
    Varga, Andrea
    Ciravegna, Fabio
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 520 - 527
  • [40] Polysemy in Controlled Natural Language Texts
    Gruzitis, Normunds
    Barzdins, Guntis
    CONTROLLED NATURAL LANGUAGE, 2010, 5972 : 102 - 120