Unsupervised incremental acquisition of a thematic corpus from the Web

Cited by: 0
Authors
Duclaye, F [1 ]
Yvon, F [1 ]
Collin, O [1 ]
Affiliation
[1] France Telecom, R&D, F-22307 Lannion, France
Keywords
paraphrases; synonyms; machine learning; Web; automatic classification; EM;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a nearly unsupervised learning methodology for automatically acquiring a thematic corpus from the Web. Relying on a bootstrapping mechanism, our system starts with a single linguistic expression of a given target semantic relationship. It then samples the Web to progressively accumulate a corpus of potential examples of the same relationship. Sampling steps alternate with filtering steps, which keeps the corpus thematically focused. The corpus is finally analysed to search for potential paraphrases of the initial expression of the semantic relationship. These paraphrases will eventually be used to improve our question-answering system. This paper focuses on the learning aspect of the system and reports experimental results regarding the effectiveness of our filtering strategy.
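The alternation of sampling and filtering steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the in-memory `TOY_WEB` list stands in for the Web, the function names are hypothetical, and the lexical-overlap test in `is_on_topic` is a crude stand-in for the paper's filtering strategy, which the abstract does not specify.

```python
# Toy sketch of a bootstrapping loop that alternates sampling and filtering.
# ASSUMPTIONS: TOY_WEB, the function names, and the word-overlap filter are
# all hypothetical; they are not taken from the paper.

TOY_WEB = [
    "les miserables was written by hugo",
    "hugo authored les miserables",
    "emma was written by austen",
    "hugo was born in besancon",
    "austen lived in bath",
]


def tokens(text):
    """Lowercased bag of words, as a set."""
    return set(text.lower().split())


def sample(web, vocabulary):
    """Sampling step: fetch every sentence sharing a word with the vocabulary."""
    return {doc for doc in web if tokens(doc) & vocabulary}


def is_on_topic(doc, vocabulary, threshold):
    """Filtering step (stand-in): fraction of the sentence's words already known."""
    words = tokens(doc)
    return len(words & vocabulary) / len(words) >= threshold


def acquire_corpus(seed, web, rounds=3, threshold=0.5):
    """Grow a corpus from one seed expression, filtering after each sample."""
    corpus = set()
    vocabulary = tokens(seed)
    for _ in range(rounds):
        candidates = sample(web, vocabulary) - corpus
        accepted = {d for d in candidates if is_on_topic(d, vocabulary, threshold)}
        if not accepted:
            break  # nothing new passed the filter, so stop early
        corpus |= accepted
        for doc in accepted:
            vocabulary |= tokens(doc)  # accepted text widens the next sample
    return corpus


corpus = acquire_corpus("hugo wrote les miserables", TOY_WEB)
```

In this toy run the three authorship sentences are accepted over two rounds, while the sentences about a birthplace and a residence are sampled but rejected by the filter. Without the filtering step, every sampled sentence would enter the vocabulary and the corpus would drift away from the seed relationship.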
Pages: 752-757
Page count: 6