Extracting Records from the Web Using a Signal Processing Approach

Cited by: 4
Authors
Velloso, Roberto Panerai [1]
Dorneles, Carina F. [1]
Affiliations
[1] Univ Fed Santa Catarina, Florianopolis, SC, Brazil
Keywords
web mining; record extraction; structure detection; information retrieval; record alignment; ALGORITHM;
DOI
10.1145/3132847.3132875
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Extracting records from web pages enables a number of important applications and has immense value because of the amount and diversity of information that can be extracted. Although widely studied, the problem remains open because it is not trivial. Given the scale of the data, a feasible approach must be automatic, efficient, and effective. We present a novel approach, fully automatic and computationally efficient, that uses signal processing techniques to detect regularities and patterns in the structure of web pages. Our approach segments the web page, detects the data regions within it, identifies the record boundaries, and aligns the records. Results show a high F-score and linearithmic time complexity.
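
The abstract does not say how the page is turned into a signal or which signal processing techniques are applied, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the page is encoded as the sequence of its opening tags (each distinct tag gets an integer code) and uses a simple symbol-matching autocorrelation to find a repeating period. The names `TagSequence`, `tag_signal`, and `dominant_period` are hypothetical.

```python
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collect the names of opening tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_signal(html):
    """Encode a page as a 1-D signal: one integer code per opening tag.

    Assumption: this encoding is chosen for illustration only.
    """
    parser = TagSequence()
    parser.feed(html)
    codes = {}
    return [codes.setdefault(tag, len(codes)) for tag in parser.tags]


def dominant_period(signal, max_lag=None):
    """Return (lag, score) for the shift at which the signal best matches itself.

    The score is the fraction of positions where signal[i] == signal[i + lag].
    A high score at lag k suggests a structure that repeats every k tags.
    """
    n = len(signal)
    max_lag = min(max_lag or n // 2, n - 1)
    best_lag, best_score = 0, -1.0
    for lag in range(1, max_lag + 1):
        matches = sum(1 for i in range(n - lag) if signal[i] == signal[i + lag])
        score = matches / (n - lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Five list items, each made of the same three opening tags: li, a, span.
page = "<ul>" + "<li><a></a><span></span></li>" * 5 + "</ul>"
period, score = dominant_period(tag_signal(page))
print(period, round(score, 3))
```

On this toy page the detected period equals the number of opening tags per list item, which is the kind of regularity a record extractor can use to locate record boundaries. This direct loop takes quadratic time in the worst case; reaching the linearithmic behaviour the abstract reports would require something like an FFT-based autocorrelation instead.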
Pages: 197-206
Page count: 10