Cross-Lingual Information Retrieval from Multilingual Construction Documents Using Pretrained Language Models

Cited by: 2
Authors
Kim, Jungyeon [1]
Chung, Sehwan [1]
Chi, Seokho [1,2]
Affiliations
[1] Seoul Natl Univ, Dept Civil & Environm Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Inst Construct & Environm Engn, Seoul 08826, South Korea
Funding
National Research Foundation of Singapore;
Keywords
DOI
10.1061/JCEMD4.COENG-14273
Chinese Library Classification number
TU [Building Science];
Discipline classification code
0813;
Abstract
The growth of the global construction market has attracted international companies to participate in overseas projects. Overseas projects are extremely dynamic with numerous uncertainties, raising the need to collect information about construction in host countries. Due to the vast amounts of text data in the construction industry, an automated method, specifically information retrieval, is required to find the necessary information. Previous studies have suggested automated methods to review various construction documents. However, these studies required substantial manual effort and mainly focused on only one language, resulting in loss of vital information because it is buried in documents written in the host country's language. To address these limitations, this study proposes a cross-lingual information retrieval (CLIR) framework using pretrained Bidirectional Encoder Representations from Transformers (BERT) models to retrieve information from multilingual construction documents. The proposed framework employs language models (i.e., monolingual, multilingual, and cross-lingual) and trains these models on a construction data set to enhance their ability in construction-specific text. The framework achieved reliable performance of retrieval, even with minimal additional training using domain-specific data. The results indicate that training on the domain data set raises the level of retrieval, increasing the mean reciprocal rank of a specific task by up to 0.2128. With the employment of a monolingual model with machine translation, CLIR in a specific domain could be performed effectively without the need for a labeled data set. The suggested CLIR framework offers a practical alternative for dealing with construction documents in overseas projects, reducing time and cost while improving risk identification and mitigation.
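The abstract reports retrieval quality as mean reciprocal rank (MRR). As a minimal sketch of how that metric is commonly computed, and not the authors' evaluation code, the snippet below averages the reciprocal of the rank at which the first relevant document appears for each query. The function names, document IDs, and toy rankings are all hypothetical and chosen only for illustration.

```python
from typing import Hashable, Sequence, Set, Tuple


def reciprocal_rank(ranked_ids: Sequence[Hashable], relevant_ids: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant document, or 0.0 if none is retrieved."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Tuple[Sequence[Hashable], Set[Hashable]]]) -> float:
    """Average the reciprocal rank over all (ranking, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Hypothetical toy data: three queries with made-up document IDs.
toy_runs = [
    (["d3", "d1", "d2"], {"d1"}),  # first relevant hit at rank 2 -> 0.5
    (["d5", "d4"], {"d5"}),        # first relevant hit at rank 1 -> 1.0
    (["d7", "d8"], {"d9"}),        # no relevant document retrieved -> 0.0
]

mrr = mean_reciprocal_rank(toy_runs)
print(f"MRR = {mrr:.4f}")
```

Under this definition, an MRR gain of 0.2128 means the first relevant document tends to appear noticeably closer to the top of the ranking after domain training; the exact evaluation protocol used in the paper is not stated in this record.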
Pages: 15
Related papers
50 items in total
  • [21] Cross-lingual Language Model Pretraining for Retrieval
    Yu, Puxuan
    Fei, Hongliang
    Li, Ping
    PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021), 2021, : 1029 - 1039
  • [22] Can Monolingual Pretrained Models Help Cross-Lingual Classification?
    Chi, Zewen
    Dong, Li
    Wei, Furu
    Mao, Xian-Ling
    Huang, Heyan
    1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), 2020, : 12 - 17
  • [23] Supporting Arabic Cross-Lingual Retrieval Using Contextual Information
    Ahmed, Farag
    Nuernberger, Andreas
    Nitsche, Marcus
    MULTIDISCIPLINARY INFORMATION RETRIEVAL, 2011, 6653 : 30 - 45
  • [24] Cross-lingual information retrieval by feature vectors
    Lilleng, Jeanine
    Tomassen, Stein L.
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS, 2007, 4592 : 229 - +
  • [25] Correction: Enhancing Cross-lingual Biomedical Concept Normalization Using Deep Neural Network Pretrained Language Models
    Lin, Ying-Chi
    Hoffmann, Phillip
    Rahm, Erhard
    SN Computer Science, 3 (6)
  • [26] Dictionary methods for cross-lingual information retrieval
    Ballesteros, L
    Croft, B
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, 1996, 1134 : 791 - 801
  • [27] A system for supporting cross-lingual information retrieval
    Capstick, J
    Diagne, AK
    Erbach, G
    Uszkoreit, H
    Leisenberg, A
    Leisenberg, M
    INFORMATION PROCESSING & MANAGEMENT, 2000, 36 (02) : 275 - 289
  • [28] Translating Justice: A Cross-Lingual Information Retrieval System for Maltese Case Law Documents
    Azzopardi, Joel
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT V, 2024, 14612 : 236 - 240
  • [29] Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment
    Chi, Zewen
    Dong, Li
    Zheng, Bo
    Huang, Shaohan
    Mao, Xian-Ling
    Huang, Heyan
    Wei, Furu
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 3418 - 3430
  • [30] A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval
    Ghanbari, Elham
    Shakery, Azadeh
    APPLIED INTELLIGENCE, 2022, 52 (03) : 3156 - 3174