Using clustering for web information extraction

Citations: 0
Authors
Phong, Le [1]
Vuong, Bao [1]
Gao, Xiaoying [1]
Affiliation
[1] Victoria Univ Wellington, Sch Math Stat & Comp Sci, POB 600, Wellington, New Zealand
Keywords
information extraction; clustering; Smith-Waterman algorithm
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces an approach that automates data extraction from semi-structured Web pages by clustering. Similarity comparison considers both HTML tags and the textual features of text tokens. The first clustering process groups similar text tokens into text clusters, and the second groups similar data tuples into tuple clusters. A tuple cluster is a strong candidate for a repetitive data region.
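The abstract does not say how similarity is computed; the Smith-Waterman algorithm appears only among the keywords. As a rough illustration of similarity-based grouping, the sketch below scores two HTML tag sequences with Smith-Waterman local alignment and groups sequences with a simple greedy pass. The function names, the scoring values (match 2, mismatch -1, gap -1), the normalization, the threshold, and the greedy strategy are all assumptions made for this example, not the authors' method.

```python
from typing import List, Sequence


def smith_waterman_score(a: Sequence[str], b: Sequence[str],
                         match: int = 2, mismatch: int = -1,
                         gap: int = -1) -> int:
    """Best local alignment score between two token sequences."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(a: Sequence[str], b: Sequence[str], match: int = 2) -> float:
    """Score normalized to [0, 1] by the best score the shorter sequence allows.

    This normalization is an assumption made for the example.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return smith_waterman_score(a, b, match=match) / (match * shorter)


def greedy_cluster(items: List[Sequence[str]],
                   threshold: float = 0.8) -> List[List[Sequence[str]]]:
    """Put each item in the first cluster whose first member is similar enough."""
    clusters: List[List[Sequence[str]]] = []
    for item in items:
        for cluster in clusters:
            if similarity(item, cluster[0]) >= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


# Hypothetical tag sequences standing in for candidate data tuples.
rows = [
    ["tr", "td", "b", "td"],
    ["tr", "td", "b", "td"],
    ["tr", "td", "i", "td"],
    ["div", "p"],
]
clusters = greedy_cluster(rows)
```

Lowering `threshold` merges near-identical rows, such as the one that differs only in its `i` tag, into the same cluster. Raising it keeps them apart.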
Pages: 415 / +
Page count: 2
Related papers
  • [21] Web Document Information Extraction using Class Attribute Approach
    Srivastava, Shobhit
    Haroon, Mohd
    Bajaj, Abhishek
    2013 4TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER & COMMUNICATION TECHNOLOGY (ICCCT), 2013, : 17 - 22
  • [22] Web information extraction using generalized hidden Markov model
    Zhong, Ping
    Chen, Jinlin
    Cook, Terry
    2006 1ST IEEE WORKSHOP ON HOT TOPICS IN WEB SYSTEMS AND TECHNOLOGIES, 2006, : 142 - +
  • [23] Web Page Information Extraction System by Using Deep Learning
    Pakyurek, Muhammet
    Sezgin, Mehmet Selman
    Kulac, Selman
    2019 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2019, : 361 - 365
  • [24] Automatic Web Information Extraction and Alignment using CTVS Technique
    Pandarge, Sangmesh S.
    Chakkarwar, V. A.
    2017 INTERNATIONAL CONFERENCE OF ELECTRONICS, COMMUNICATION AND AEROSPACE TECHNOLOGY (ICECA), VOL 2, 2017, : 94 - 99
  • [25] Information extraction for deep web using repetitive subject pattern
    Wachirawut Thamviset
    Sartra Wongthanavasu
    World Wide Web, 2014, 17 : 1109 - 1139
  • [26] A method for web information extraction
    Lam, Man I.
    Gong, Zhiguo
    Muyeba, Maybin
    PROGRESS IN WWW RESEARCH AND DEVELOPMENT, PROCEEDINGS, 2008, 4976 : 383 - +
  • [27] Joint Information Extraction from the Web Using Linked Data
    Augenstein, Isabelle
    SEMANTIC WEB - ISWC 2014, PT II, 2014, 8797 : 505 - 512
  • [28] A Declarative Approach to Information Extraction Using Web Service API
    Samuel, John
    Rey, Christophe
    WEB ENGINEERING (ICWE 2016), 2016, 9671 : 613 - 615
  • [29] Information extraction for deep web using repetitive subject pattern
    Thamviset, Wachirawut
    Wongthanavasu, Sartra
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2014, 17 (05): : 1109 - 1139
  • [30] Information extraction for the semantic web
    Baumgartner, R
    Eiter, T
    Gottlob, G
    Herzog, M
    Koch, C
    REASONING WEB, 2005, 3564 : 275 - 289