Improving Access to Large-scale Digital Libraries Through Semantic-enhanced Search and Disambiguation

Cited by: 9
Authors
Hinze, Annika [1]
Taube-Schock, Craig [1]
Bainbridge, David [1]
Matamua, Rangi [2]
Downie, J. Stephen [3]
Affiliations
[1] Univ Waikato, Comp Sci, Hamilton, New Zealand
[2] Univ Waikato, Maori & Pacific Dev, Hamilton, New Zealand
[3] Univ Illinois, Lib & Informat Sci, Chicago, IL 60680 USA
Keywords
QUERY EXPANSION; SYSTEM
DOI
10.1145/2756406.2756920
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With 13,000,000 volumes comprising 4.5 billion pages of text, it is currently very difficult for scholars to locate relevant sets of documents that are useful in their research from the HathiTrust Digital Library (HTDL) using traditional lexically-based retrieval techniques. Existing document search tools and document clustering approaches use purely lexical analysis, which cannot address the inherent ambiguity of natural language. A semantic search approach offers the potential to overcome the shortcoming of lexical search, but, even if an appropriate network of ontologies could be decided upon, it would require a full semantic markup of each document. In this paper, we present a conceptual design and report on the initial implementation of a new framework that affords the benefits of semantic search while minimizing the problems associated with applying existing semantic analysis at scale. Our approach avoids the need for complete semantic document markup using pre-existing ontologies by developing an automatically generated Concept-in-Context (CiC) network seeded by a priori analysis of Wikipedia texts and identification of semantic metadata. Our Capisco system analyzes documents by the semantics and context of their content. The disambiguation of search queries is done interactively, to fully utilize the domain knowledge of the scholar. Our method achieves a form of semantic-enhanced search that simultaneously exploits the proven scale benefits provided by lexical indexing.
Pages: 147-156
Number of pages: 10