Sentence alignment for monolingual comparable corpora

Citations: 0
Authors
Barzilay, R [1]
Elhadad, N [1]
Affiliation
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
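Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract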
We address the problem of sentence alignment for monolingual corpora, a phenomenon distinct from alignment in parallel corpora. Aligning large comparable corpora automatically would provide a valuable resource for learning of text-to-text rewriting rules. We incorporate context into the search for an optimal alignment in two complementary ways: learning rules for matching paragraphs using topic structure and further refining the matching through local alignment to find good sentence pairs. Evaluation shows that our alignment method outperforms state-of-the-art systems developed for the same task.
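The abstract mentions refining paragraph matches through local alignment to find good sentence pairs. As a rough illustration of that general idea only, and not the authors' actual method, the sketch below runs a Smith-Waterman-style dynamic program over two sentence lists. The bag-of-words cosine similarity, the function names, and the `threshold` and `gap` values are all assumptions chosen for the example.

```python
from collections import Counter
from math import sqrt


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (an assumed similarity measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def local_align(src, tgt, threshold=0.5, gap=0.1):
    """Return (i, j) sentence index pairs on the best-scoring local path.

    threshold and gap are illustrative values, not taken from the paper.
    """
    n, m = len(src), len(tgt)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0.0, None
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Similarity above the threshold rewards a match; below it penalizes.
            match = cosine(src[i - 1], tgt[j - 1]) - threshold
            options = [
                (0.0, None),                                   # restart the path here
                (score[i - 1][j - 1] + match, (i - 1, j - 1)),  # pair the two sentences
                (score[i - 1][j] - gap, (i - 1, j)),            # skip a source sentence
                (score[i][j - 1] - gap, (i, j - 1)),            # skip a target sentence
            ]
            score[i][j], back[i][j] = max(options, key=lambda o: o[0])
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)
    # Trace back from the best cell, keeping only diagonal (pairing) moves.
    pairs, cell = [], best_cell
    while cell is not None and back[cell[0]][cell[1]] is not None:
        i, j = cell
        prev = back[i][j]
        if prev == (i - 1, j - 1):
            pairs.append((i - 1, j - 1))
        cell = prev
    return pairs[::-1]
```

Calling `local_align(src_sentences, tgt_sentences)` on two already-matched paragraphs yields zero-based index pairs. Because the path is local, unrelated sentences at the start or end of either paragraph are simply left unaligned.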
Pages: 25-32
Number of pages: 8