Facilitating Document Annotation Using Content and Querying Value

Cited by: 6
Authors
Ruiz, Eduardo J. [1 ]
Hristidis, Vagelis [1 ]
Ipeirotis, Panagiotis G. [2 ]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[2] NYU, Informat Syst Grp, IOMS Dept, Leonard N Stern Sch Business, New York, NY 10012 USA
Funding
National Science Foundation (USA);
Keywords
Document annotation; adaptive forms; collaborative platforms; DATABASE;
DOI
10.1109/TKDE.2012.224
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A large number of organizations today generate and share textual descriptions of their products, services, and actions. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. While information extraction algorithms facilitate the extraction of structured relations, they are often expensive and inaccurate, especially when operating on top of text that does not contain any instances of the targeted structured information. We present a novel alternative approach that facilitates the generation of structured metadata by identifying documents that are likely to contain information of interest, where that information will subsequently be useful for querying the database. Our approach relies on the idea that humans are more likely to add the necessary metadata at creation time if prompted by the interface, or that it is much easier for humans (and/or algorithms) to identify the metadata when such information actually exists in the document, instead of naively prompting users to fill in forms with information that is not available in the document. As a major contribution of this paper, we present algorithms that identify structured attributes that are likely to appear within the document, by jointly utilizing the content of the text and the query workload. Our experimental evaluation shows that our approach generates superior results compared to approaches that rely only on the textual content or only on the query workload to identify attributes of interest.
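The idea of jointly using document content and the query workload to suggest attributes can be illustrated with a small sketch. This is not the paper's algorithm: the scoring functions, the choice to multiply the two scores, and all names (`content_value`, `querying_value`, `rank_attributes`, the indicator-term dictionary) are assumptions made purely for illustration.

```python
# Illustrative sketch only; the scoring scheme is an assumption,
# not the method described in the paper.

def content_value(attribute_terms, document_tokens):
    """Fraction of an attribute's (hypothetical) indicator terms found in the document."""
    if not attribute_terms:
        return 0.0
    tokens = set(document_tokens)
    return sum(1 for t in attribute_terms if t in tokens) / len(attribute_terms)

def querying_value(attribute, query_workload):
    """Fraction of past queries that reference the attribute."""
    if not query_workload:
        return 0.0
    return sum(1 for q in query_workload if attribute in q) / len(query_workload)

def rank_attributes(indicator_terms, document_text, query_workload, top_k=3):
    """Rank attributes by content value times querying value (assumed combination)."""
    tokens = document_text.lower().split()
    scores = {
        attr: content_value(terms, tokens) * querying_value(attr, query_workload)
        for attr, terms in indicator_terms.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_k]

# Hypothetical example data.
indicator_terms = {
    "price": ["price", "usd", "cost"],
    "location": ["located", "city", "street"],
    "warranty": ["warranty", "guarantee"],
}
document_text = "The shop is located on Main street and the cost is 20 usd"
query_workload = [{"price"}, {"price", "location"}, {"warranty"}, {"price"}]

suggested = rank_attributes(indicator_terms, document_text, query_workload, top_k=2)
```

In this toy setup, an attribute is suggested only when it is both supported by the text and frequently queried, which mirrors the abstract's contrast with approaches that use only one of the two signals.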
Pages: 336-349
Page count: 14