HTML text segmentation for Web page summarization by a key sentence extraction method

Cited by: 0
Authors
Sunayama, Wataru [1, 3]
Iyama, Akihiro [2, 4]
Yachida, Masahiko [2, 5, 6, 7]
Affiliations
[1] Faculty of Information Sciences, Hiroshima City University, Hiroshima, 731-3194, Japan
[2] Graduate School of Engineering Science, Osaka University, Toyonaka, 560-8531, Japan
[3] Department of Information Sciences, Hiroshima City University
[4] TIS, Inc.
[5] Graduate School of Engineering Science
[6] IPSJ
[7] RSJ
Source
Systems and Computers in Japan | 2006 / Vol. 37 / No. 07
Keywords
The information displayed as the search result by search engines is important for quickly finding the desired information. In particular, the summary of each Web page in the search results is important for determining the Web page content, as well as for determining how the input search term is used in each Web page, namely, the relation between the search term and the Web page. However, the summaries of the search results in conventional search engines have problems such as extracting only the opening text and not containing the search term, or containing the search term but having the sentence truncated in the middle so that the context of the term or the content of the Web page cannot be determined. Therefore, a summary in sentence units is desirable, but since HTML text includes many nonsentence items that do not contain punctuation, if they are unprocessed, it is difficult for a key sentence extraction system that treats sentences as units to provide a summary. Thus, in this paper, we propose an HTML text segmentation system that divides the source text of each Web page into meaningfully connected groups of text corresponding to sentences. We also verify experimentally that the text generated by this system can be used effectively in Web page summarization. © 2006 Wiley Periodicals, Inc.
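The record does not describe the paper's actual segmentation algorithm, so the following is only a minimal illustrative sketch of the general idea: splitting HTML source into sentence-like units so that unpunctuated items such as menu entries become their own segments. The tag sets `BLOCK_TAGS` and `SKIP_TAGS`, the class `SegmentingParser`, and the function `segment_html` are all hypothetical names and choices made for this example, not taken from the paper.

```python
from html.parser import HTMLParser
import re

# ASSUMPTION: tags treated as segment boundaries. This set is a hypothetical
# choice for illustration; the paper's own rules are not given in this record.
BLOCK_TAGS = {"p", "div", "br", "hr", "li", "ul", "ol", "table", "tr", "td",
              "th", "title", "h1", "h2", "h3", "h4", "h5", "h6"}
# Content of these tags is never shown as text, so it is skipped.
SKIP_TAGS = {"script", "style"}


class SegmentingParser(HTMLParser):
    """Collects text between block-level tags as sentence-like segments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.segments = []
        self._buffer = []
        self._skip_depth = 0

    def _flush(self):
        # Join buffered text, normalize whitespace, then split on
        # sentence-ending punctuation followed by whitespace.
        text = " ".join("".join(self._buffer).split())
        self._buffer = []
        if text:
            for piece in re.split(r"(?<=[.!?])\s+", text):
                if piece:
                    self.segments.append(piece)

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any text left after the last tag


def segment_html(source):
    """Return a list of sentence-like text segments from HTML source."""
    parser = SegmentingParser()
    parser.feed(source)
    parser.close()
    return parser.segments


if __name__ == "__main__":
    page = ("<html><head><title>Shop</title><style>p{}</style></head><body>"
            "<ul><li>Home</li><li>About us</li></ul>"
            "<p>We sell <b>fresh</b> tea. Orders ship daily.</p></body></html>")
    for segment in segment_html(page):
        print(segment)
```

Inline tags such as `<b>` do not break a segment here, while list items and headings do, which is one simple way to keep unpunctuated items from being merged into neighboring sentences. A key sentence extractor could then score each returned segment as a unit.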
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Journal article (JA)
Pages: 26 - 36
Related papers
50 in total
  • [31] Network video summarization based on key frame extraction via superpixel segmentation
    Jin, Haiyan
    Yu, Yang
    Li, Yumeng
    Xiao, Zhaolin
    TRANSACTIONS ON EMERGING TELECOMMUNICATIONS TECHNOLOGIES, 2022, 33 (06):
  • [32] Feature-based Unsupervised Method for Salient Sentence Ranking in Text Summarization Task
    Nguyen Minh Phuong
    Le The Anh
    PROCEEDINGS OF THE 2024 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION TECHNOLOGY, ICIIT 2024, 2024, : 346 - 351
  • [33] Page segmentation and text extraction from gray scale image in microfilm format
    Yuan, Q
    Tan, CL
    DOCUMENT RECOGNITION AND RETRIEVAL VIII, 2001, 4307 : 323 - 332
  • [34] Web Page Classification Based on an Accurate Technique for Key Data Extraction
    Lassri, Safae
    Benlahmar, El Habib
    Tragha, Abderrahim
    ADVANCED INTELLIGENT SYSTEMS FOR SUSTAINABLE DEVELOPMENT (AI2SD'2020), VOL 2, 2022, 1418 : 1124 - 1131
  • [35] A segmentation method for web page analysis using shrinking and dividing
    Cao, Jiuxin
    Mao, Bo
    Luo, Junzhou
    INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2010, 25 (02) : 93 - 104
  • [36] INFORMATION EXTRACTION VERSUS TEXT SEGMENTATION FOR WEB CONTENT MINING
    Fragkou, Pavlina
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2013, 23 (08) : 1109 - 1137
  • [37] A comparative study on key phrase extraction methods in automatic Web Site Summarization
    Zhang, Yongzheng
    Milios, Evangelos
    Zincir-Heywood, Nur
    2007, Digital Information Research Foundation (05):
  • [38] Efficient Web Page Main Text Extraction towards Online News Analysis
    Zhou, Baoyao
    Xiong, Yuhong
    Liu, Wei
    ICEBE 2009: IEEE INTERNATIONAL CONFERENCE ON E-BUSINESS ENGINEERING, PROCEEDINGS, 2009, : 37 - 41
  • [39] EXTRACTIVE TEXT SUMMARIZATION BY FEATURE-BASED SENTENCE EXTRACTION USING RULE-BASED CONCEPT
    Naik, Siya Sadashiv
    Gaonkar, Manisha Naik
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ELECTRONICS, INFORMATION & COMMUNICATION TECHNOLOGY (RTEICT), 2017, : 1364 - 1368
  • [40] A method of readability assessment for web documents using text features and HTML structures
    Yamasaki, Takahiro
    Tokiwa, Kin-Ichiroh
    IEEJ Transactions on Electronics, Information and Systems, 2012, 132 (09) : 1524 - 1532