Extracting Threshold Conceptual Structures from Web Documents

Cited by: 3
Authors
Ciobanu, Gabriel [1]
Horne, Ross [1]
Vaideanu, Cristian [2]
Affiliations
[1] Romanian Acad, Inst Comp Sci, Iasi, Romania
[2] AI Cuza Univ Ia, Fac Math, Iasi, Romania
DOI
10.1007/978-3-319-08389-6_12
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we describe an iterative approach based on formal concept analysis to refine the information retrieval process. Based on weights for ranking documents we define a weighted formal context. We use a Galois connection to introduce a new type of formal concept that allows us to work with specific thresholds for searching words in Web documents. By increasing the threshold, we obtain smaller lattices with more relevant concepts, thus improving the retrieval of more specific items. We use techniques for processing large data sets in parallel, to generate sequences of Galois lattices, overcoming the time complexity of building a lattice for an entire large context.
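The thresholding idea in the abstract can be illustrated with a small sketch. It is not the authors' construction: the abstract does not give the exact definition of the weighted context or of the new concept type, so the sketch assumes the simplest reading, in which a document is related to a word when its weight is at least the threshold `t`, and concepts are then computed with the usual pair of derivation operators. The weight table, the names `extent`, `intent` and `concepts`, and the brute-force enumeration are all hypothetical and suitable only for tiny examples.

```python
from itertools import combinations

# Hypothetical weighted context: weights[doc][word] is a ranking weight in [0, 1].
weights = {
    "d1": {"lattice": 0.9, "galois": 0.7, "web": 0.2},
    "d2": {"lattice": 0.6, "galois": 0.1, "web": 0.8},
    "d3": {"lattice": 0.3, "galois": 0.8, "web": 0.5},
}
WORDS = sorted({w for row in weights.values() for w in row})


def extent(words, t):
    """Documents whose weight is at least t for every word in `words`."""
    return frozenset(
        d for d, row in weights.items()
        if all(row.get(w, 0.0) >= t for w in words)
    )


def intent(docs, t):
    """Words whose weight is at least t in every document in `docs`."""
    return frozenset(
        w for w in WORDS
        if all(weights[d].get(w, 0.0) >= t for d in docs)
    )


def concepts(t):
    """Enumerate all (extent, intent) pairs at threshold t by brute force.

    Closing every subset of words yields every formal concept of the
    thresholded context; this is exponential in the number of words.
    """
    found = set()
    for k in range(len(WORDS) + 1):
        for combo in combinations(WORDS, k):
            ext = extent(combo, t)
            found.add((ext, intent(ext, t)))
    return found


if __name__ == "__main__":
    for t in (0.5, 0.8):
        print(f"threshold {t}: {len(concepts(t))} concepts")
```

On this toy table the lattice at threshold 0.8 has fewer concepts than the one at 0.5, which matches the behaviour the abstract describes; the parallel generation of lattice sequences mentioned there is not modelled here.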
Pages: 130-144
Page count: 15