Facilitating Document Annotation Using Content and Querying Value

Cited by: 6
Authors
Ruiz, Eduardo J. [1]
Hristidis, Vagelis [1]
Ipeirotis, Panagiotis G. [2]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[2] NYU, Informat Syst Grp, IOMS Dept, Leonard N Stern Sch Business, New York, NY 10012 USA
Funding
National Science Foundation (USA);
Keywords
Document annotation; adaptive forms; collaborative platforms; DATABASE;
DOI
10.1109/TKDE.2012.224
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A large number of organizations today generate and share textual descriptions of their products, services, and actions. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. While information extraction algorithms facilitate the extraction of structured relations, they are often expensive and inaccurate, especially when operating on text that does not contain any instances of the targeted structured information. We present a novel alternative approach that facilitates the generation of structured metadata by identifying documents that are likely to contain information of interest, where that information will subsequently be useful for querying the database. Our approach relies on the idea that humans are more likely to add the necessary metadata at creation time if prompted by the interface, or that it is much easier for humans (and/or algorithms) to identify the metadata when such information actually exists in the document, instead of naively prompting users to fill in forms with information that is not available in the document. As a major contribution of this paper, we present algorithms that identify structured attributes that are likely to appear within the document by jointly utilizing the content of the text and the query workload. Our experimental evaluation shows that our approach generates superior results compared to approaches that rely only on the textual content or only on the query workload to identify attributes of interest.
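The idea of scoring attributes by both document content and query workload can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the keyword-overlap content score, the frequency-based query score, and the choice to multiply the two are all assumptions made only for illustration.

```python
from collections import Counter

def content_value(document, attribute_terms):
    """Hypothetical content score: the fraction of an attribute's indicator
    terms that occur in the document (each term list is assumed non-empty)."""
    words = set(document.lower().split())
    return {
        attr: sum(term in words for term in terms) / len(terms)
        for attr, terms in attribute_terms.items()
    }

def querying_value(query_workload):
    """Hypothetical query score: the relative frequency with which each
    attribute is used across past queries."""
    counts = Counter(attr for query in query_workload for attr in query)
    total = sum(counts.values())
    return {attr: n / total for attr, n in counts.items()}

def rank_attributes(document, attribute_terms, query_workload, k=2):
    """Rank attributes by the product of the two scores (an assumed way of
    combining them) and return the top k to suggest in an annotation form."""
    cv = content_value(document, attribute_terms)
    qv = querying_value(query_workload)
    scores = {attr: cv[attr] * qv.get(attr, 0.0) for attr in attribute_terms}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Illustrative data only; none of it comes from the paper.
attribute_terms = {
    "location": ["city", "street"],
    "price": ["price", "cost"],
    "color": ["red", "blue"],
}
document = "The store on Main street lists a price for each item"
query_workload = [["price"], ["price", "location"], ["color"]]

suggested = rank_attributes(document, attribute_terms, query_workload)
print(suggested)
```

In this toy setup, "color" is queried but never supported by the text, so it scores zero, while "price" ranks above "location" because it is queried more often. An attribute suggested to the annotator therefore needs support from both signals.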
Pages: 336-349
Page count: 14