HTML text segmentation for Web page summarization by a key sentence extraction method

Cited by: 0
Authors
Sunayama, Wataru [1, 3]
Iyama, Akihiro [2, 4]
Yachida, Masahiko [2, 5, 6, 7]
Affiliations
[1] Faculty of Information Sciences, Hiroshima City University, Hiroshima, 731-3194, Japan
[2] Graduate School of Engineering Science, Osaka University, Toyonaka, 560-8531, Japan
[3] Department of Information Sciences, Hiroshima City University
[4] TIS, Inc.
[5] Graduate School of Engineering Science
[6] IPSJ
[7] RSJ
Source
Systems and Computers in Japan | 2006 / Vol. 37 / Issue 07
Abstract
The information displayed as the search result by search engines is important for quickly finding the desired information. In particular, the summary of each Web page in the search results is important for determining the Web page content, as well as for determining how the input search term is used in each Web page, namely, the relation between the search term and the Web page. However, the summaries of the search results in conventional search engines have problems such as extracting only the opening text and not containing the search term, or containing the search term but having the sentence truncated in the middle so that the context of the term or the content of the Web page cannot be determined. Therefore, a summary in sentence units is desirable, but since HTML text includes many nonsentence items that do not contain punctuation, if they are unprocessed, it is difficult for a key sentence extraction system that treats sentences as units to provide a summary. Thus, in this paper, we propose an HTML text segmentation system that divides the source text of each Web page into meaningfully connected groups of text corresponding to sentences. We also verify experimentally that the text generated by this system can be used effectively in a Web page summarization. © 2006 Wiley Periodicals, Inc.
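The idea of dividing HTML source into sentence-like units can be illustrated with a minimal sketch. This is not the authors' system: the record does not describe their segmentation rules, so the tag sets `BLOCK_TAGS` and `SKIP_TAGS`, the punctuation rule, and the function names `segment_html` and `segments_with_term` are all assumptions made for illustration. The sketch treats block-level tags as boundaries, so that unpunctuated items such as menu entries become their own segments, and splits punctuated text at sentence-ending marks.

```python
import re
from html.parser import HTMLParser

# Assumption: these tags end a unit of text, much like a sentence boundary.
BLOCK_TAGS = {"p", "div", "br", "li", "td", "th", "tr", "title",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Assumption: text inside these tags is never shown to the reader.
SKIP_TAGS = {"script", "style"}


class SegmentParser(HTMLParser):
    """Collects visible text and cuts it into sentence-like segments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.segments = []
        self._buf = []
        self._skip = 0

    def _flush(self):
        # Join buffered text, normalize whitespace, then split after . ! ?
        text = " ".join("".join(self._buf).split())
        self._buf = []
        if text:
            parts = re.split(r"(?<=[.!?])\s+", text)
            self.segments.extend(p for p in parts if p)

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any text left after the last tag


def segment_html(html):
    """Return the sentence-like segments of an HTML string."""
    parser = SegmentParser()
    parser.feed(html)
    parser.close()
    return parser.segments


def segments_with_term(segments, term):
    """Keep only segments containing the search term (case-insensitive)."""
    return [s for s in segments if term.lower() in s.lower()]
```

For example, `segments_with_term(segment_html(page), "tea")` returns whole segments that mention the term instead of text cut off mid-sentence. The filter is a plain substring match, standing in for a real key sentence extraction step.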
DOI
Not available
Document type
Journal article (JA)
Pages: 26-36
Related papers
  • [1] A Hybrid Text Summarization Method With Sentence-extraction
    Zhao, Xiaojuan
    PROCEEDINGS OF 2010 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND INDUSTRIAL ENGINEERING, VOLS I AND II, 2010, : 729 - 733
  • [2] Automatic Summarization and Keyword Extraction from Web Page or Text File
    You, Xiangdong
    2019 IEEE 2ND INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION ENGINEERING TECHNOLOGY (CCET), 2019, : 154 - 158
  • [3] Konkani Text Summarization By Sentence Extraction
    Rodrigues, Sheryl
    Fernandes, Sonia
    Pai, Anusha
    2019 10TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2019,
  • [4] A novel web page text information extraction method
    Wang, Chongjun
    Wei, Peng
    PROCEEDINGS OF 2019 IEEE 3RD INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2019), 2019, : 2213 - 2218
  • [5] Development of Browser Extension for HTML Web Page Content Extraction
    Karabulut, Murat
    Mayda, Islam
    2ND INTERNATIONAL CONGRESS ON HUMAN-COMPUTER INTERACTION, OPTIMIZATION AND ROBOTIC APPLICATIONS (HORA 2020), 2020, : 17 - 22
  • [6] Text Summarization by Sentence Extraction Using Unsupervised Learning
    Garcia-Hernandez, Rene Arnulfo
    Montiel, Romyna
    Ledeneva, Yulia
    Rendon, Erendira
    Gelbukh, Alexander
    Cruz, Rafael
    MICAI 2008: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2008, 5317 : 133 - +
  • [7] Key sentence based text summarization using Keywords and WordNet
    Dang, Chenghua
    Luo, Xinjun
    WSEAS Transactions on Computers, 2007, 6 (05): : 829 - 834
  • [8] Information-content based sentence extraction for text summarization
    Mallett, D
    Elding, J
    Nascimento, MA
    ITCC 2004: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, VOL 2, PROCEEDINGS, 2004, : 214 - 218
  • [9] Multi-document Text Summarization Using Sentence Extraction
    Ahuja, Ravinder
    Anand, Willson
    ARTIFICIAL INTELLIGENCE AND EVOLUTIONARY COMPUTATIONS IN ENGINEERING SYSTEMS, ICAIECES 2016, 2017, 517 : 235 - 242
  • [10] A Web Information Extraction method Based on HTML Parser
    Zhang, Zhiming
    Huang, Shuaishuai
    Li, Ping
    ADVANCED TECHNOLOGIES IN MANUFACTURING, ENGINEERING AND MATERIALS, PTS 1-3, 2013, 774-776 : 1802 - 1806