Big Scholarly Data in CiteSeerX: Information Extraction from the Web

Cited by: 5
Authors
Ororbia, Alexander G., II [1 ]
Wu, Jian [1 ]
Khabsa, Madian [1 ]
Williams, Kyle [1 ]
Giles, C. Lee [1 ]
Affiliations
[1] Penn State Univ, University Pk, PA 16802 USA
Keywords
scholarly big data; citeseerx; information acquisition and extraction; digital library search engine; intelligent systems; METADATA EXTRACTION; TABLE
DOI
10.1145/2740908.2741736
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
We examine CiteSeerX, an intelligent system designed with the goal of automatically acquiring and organizing large-scale collections of scholarly documents from the World Wide Web. From the perspective of automatic information extraction and modes of alternative search, we examine various functional aspects of this complex system in order to investigate and explore ongoing and future research developments.
Pages: 597-602
Page count: 6
Related papers
50 in total
  • [41] Web robot detection in the scholarly information environment
    Huntington, Paul
    Nicholas, David
    Jamali, Hamid R.
    JOURNAL OF INFORMATION SCIENCE, 2008, 34 (05) : 726 - 741
  • [42] Information extraction from the Web: System and techniques
    Xiao, L
    Wissmann, D
    Brown, M
    Jablonski, S
    APPLIED INTELLIGENCE, 2004, 21 (02) : 195 - 224
  • [43] Information Extraction from the Web: System and Techniques
    Luo Xiao
    Dieter Wissmann
    Michael Brown
    Stephan Jablonski
    Applied Intelligence, 2004, 21 : 195 - 224
  • [44] Visual extraction of information from web pages
    Della Penna, Giuseppe
    Magazzeni, Daniele
    Orefice, Sergio
    JOURNAL OF VISUAL LANGUAGES AND COMPUTING, 2010, 21 (01) : 23 - 32
  • [45] MORTY: Structured Summarization for Targeted Information Extraction from Scholarly Articles
    Jaradeh, Mohamad Yaser
    Stocker, Markus
    Auer, Soeren
    FROM BORN-PHYSICAL TO BORN-VIRTUAL: AUGMENTING INTELLIGENCE IN DIGITAL LIBRARIES, ICADL 2022, 2022, 13636 : 290 - 300
  • [46] Information extraction from Web pages using semi-structured data alignment
    Kuboyama, Tetsuji
    Miyahara, Tetsuhiro
    Hirokawa, Sachio
    Itou, Eisuke
    WMSCI 2005: 9th World Multi-Conference on Systemics, Cybernetics and Informatics, Vol 1, 2005 : 42 - 47
  • [47] Unsupervised information extraction from unstructured, ungrammatical data sources on the World Wide Web
    Matthew Michelson
    Craig A. Knoblock
    International Journal of Document Analysis and Recognition (IJDAR), 2007, 10 : 211 - 226
  • [48] Unsupervised information extraction from unstructured, ungrammatical data sources on the World Wide Web
    Michelson, Matthew
    Knoblock, Craig A.
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2007, 10 (3-4) : 211 - 226
  • [49] Preprocessing framework for scholarly big data management
    Khan, Samiya
    Alam, Mansaf
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (25) : 39719 - 39743
  • [50] Preprocessing framework for scholarly big data management
    Samiya Khan
    Mansaf Alam
    Multimedia Tools and Applications, 2023, 82 : 39719 - 39743