HTML text segmentation for Web page summarization by a key sentence extraction method

Cited by: 0
Authors
Sunayama, Wataru [1 ,3 ]
Iyama, Akihiro [2 ,4 ]
Yachida, Masahiko [2 ,5 ,6 ,7 ]
Affiliations
[1] Faculty of Information Sciences, Hiroshima City University, Hiroshima, 731-3194, Japan
[2] Graduate School of Engineering Science, Osaka University, Toyonaka, 560-8531, Japan
[3] Department of Information Sciences, Hiroshima City University
[4] TIS, Inc.
[5] Graduate School of Engineering Science
[6] IPSJ
[7] RSJ
Source
Systems and Computers in Japan | 2006 / Vol. 37 / No. 7
Abstract
The information displayed as the search result by search engines is important for quickly finding the desired information. In particular, the summary of each Web page in the search results is important for determining the Web page content, as well as for determining how the input search term is used in each Web page, namely, the relation between the search term and the Web page. However, the summaries of the search results in conventional search engines have problems such as extracting only the opening text and not containing the search term, or containing the search term but having the sentence truncated in the middle so that the context of the term or the content of the Web page cannot be determined. Therefore, a summary in sentence units is desirable, but since HTML text includes many nonsentence items that do not contain punctuation, if they are unprocessed, it is difficult for a key sentence extraction system that treats sentences as units to provide a summary. Thus, in this paper, we propose an HTML text segmentation system that divides the source text of each Web page into meaningfully connected groups of text corresponding to sentences. We also verify experimentally that the text generated by this system can be used effectively in Web page summarization. © 2006 Wiley Periodicals, Inc.
DOI
Not available
Chinese Library Classification (CLC) number
Subject classification code
Document type
Journal article (JA)
Pages: 26 - 36
Related papers
50 in total
  • [21] A Novel Method for the Web page Segmentation And Identification
    Wang, Jing
    Liu, Zhijing
    2009 INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND TECHNOLOGY, VOL I, PROCEEDINGS, 2009, : 229 - 231
  • [22] A Method of Readability Assessment for Web Documents Using Text Features and HTML Structures
    Yamasaki, Takahiro
    Tokiwa, Kin-Ichiroh
    ELECTRONICS AND COMMUNICATIONS IN JAPAN, 2014, 97 (10) : 1 - 10
  • [23] A Web Page Segmentation Method based on Page Layouts and Title Blocks
    Sano, Hiroyuki
    Shiramatsu, Shun
    Ozono, Tadachika
    Shintani, Toramatsu
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2011, 11 (10) : 84 - 90
  • [24] A Visual Based Page Segmentation for Deep Web Data Extraction
    Palekar, Vikas R.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SOFT COMPUTING FOR PROBLEM SOLVING (SOCPROS 2011), VOL 2, 2012, 131 : 791 - 804
  • [25] A sentence scoring method for extractive text summarization based on natural language queries
    I.T Department, G.V.P College of Engineering, Visakhapatnam, Andhra Pradesh 530048, India
    Unknown
    Int. J. Comput. Sci. Issues, 3 (259-262):
  • [26] Feature Priority Based Sentence Filtering Method for Extractive Automatic Text Summarization
    Meena, Yogesh Kumar
    Gopalani, Dinesh
    INTERNATIONAL CONFERENCE ON COMPUTER, COMMUNICATION AND CONVERGENCE (ICCC 2015), 2015, 48 : 728 - 734
  • [27] Indonesian Automatic Text Summarization Based on A New Clustering Method in Sentence Level
    Cai, Zefeng
    Lin, Nankai
    Ma, Chuyu
    Jiang, Shengyi
    BDE 2019: 2019 INTERNATIONAL CONFERENCE ON BIG DATA ENGINEERING, 2019, : 24 - 29
  • [28] A Web Content Extraction Method Base on Punctuation Distribution and HTML Tag Similarity
    Gong, Nan
    Fan, Chunxiao
    Wu, Yuexin
    Ming, Yue
    LISS 2013, 2015, : 803 - 810
  • [29] A Novel Chinese Text Summarization Approach Using Sentence Extraction Based on Kernel Words Recognition
    Yang, Weijie
    Dai, Ruwei
    Cui, Xia
    FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 4, PROCEEDINGS, 2008, : 134 - 139
  • [30] An effective sentence-extraction technique using contextual information and statistical approaches for text summarization
    Ko, Youngjoong
    Seo, Jungyun
    PATTERN RECOGNITION LETTERS, 2008, 29 (09) : 1366 - 1371