Improving Access to Large-scale Digital Libraries Through Semantic-enhanced Search and Disambiguation

Times cited: 9
Authors
Hinze, Annika [1]
Taube-Schock, Craig [1]
Bainbridge, David [1]
Matamua, Rangi [2]
Downie, J. Stephen [3]
Affiliations
[1] Univ Waikato, Comp Sci, Hamilton, New Zealand
[2] Univ Waikato, Maori & Pacific Dev, Hamilton, New Zealand
[3] Univ Illinois, Lib & Informat Sci, Chicago, IL 60680 USA
Keywords
QUERY EXPANSION; SYSTEM;
DOI
10.1145/2756406.2756920
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
The HathiTrust Digital Library (HTDL) holds 13,000,000 volumes comprising 4.5 billion pages of text, and with traditional lexically based retrieval techniques it is currently very difficult for scholars to locate sets of documents relevant to their research. Existing document search tools and document clustering approaches use purely lexical analysis, which cannot address the inherent ambiguity of natural language. A semantic search approach could overcome the shortcomings of lexical search, but even if an appropriate network of ontologies could be decided upon, it would require full semantic markup of every document. In this paper, we present a conceptual design and report on the initial implementation of a new framework that affords the benefits of semantic search while minimizing the problems of applying existing semantic analysis at scale. Our approach avoids the need for complete semantic markup of documents with pre-existing ontologies by developing an automatically generated Concept-in-Context (CiC) network, seeded by a priori analysis of Wikipedia texts and identification of semantic metadata. Our Capisco system analyzes documents according to the semantics and context of their content. Search queries are disambiguated interactively, so as to make full use of the scholar's domain knowledge. Our method thus achieves a form of semantic-enhanced search that also exploits the proven scalability of lexical indexing.
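The abstract combines two ideas: documents are tagged with concepts ahead of time, and at query time the scholar picks which sense of an ambiguous term they mean, after which ordinary lexical lookup is narrowed to documents carrying that concept. The toy Python sketch below illustrates that general pattern under stated assumptions; it is not Capisco's implementation. The sense inventory (`SENSES`), the concept identifiers, the word-overlap tagging rule, and the function names `annotate`, `build_index` and `search` are all hypothetical stand-ins for the Wikipedia-seeded CiC network and its analysis.

```python
from collections import defaultdict

# HYPOTHETICAL sense inventory: surface term -> {concept id: context words}.
# A toy stand-in for the Wikipedia-seeded Concept-in-Context (CiC) network.
SENSES = {
    "jaguar": {
        "jaguar_(animal)": {"cat", "rainforest", "predator"},
        "jaguar_(car)": {"engine", "sedan", "coventry"},
    },
}


def annotate(text):
    """Return (words, concepts); an ambiguous term is tagged with the sense
    whose context words overlap most with the document, if any overlap."""
    words = set(text.lower().split())
    concepts = set()
    for term, senses in SENSES.items():
        if term in words:
            best = max(senses, key=lambda s: len(senses[s] & words))
            if senses[best] & words:  # skip tagging when there is no evidence
                concepts.add(best)
    return words, concepts


def build_index(docs):
    """Build a lexical index (word -> docs) and a concept index (concept -> docs)."""
    lexical, semantic = defaultdict(set), defaultdict(set)
    for doc_id, text in docs.items():
        words, concepts = annotate(text)
        for w in words:
            lexical[w].add(doc_id)
        for c in concepts:
            semantic[c].add(doc_id)
    return lexical, semantic


def search(term, lexical, semantic, choose):
    """Lexical lookup for `term`; if the term is ambiguous, `choose` (standing in
    for the interactive step) picks a sense and results are narrowed to it."""
    hits = lexical.get(term, set())
    candidates = sorted(SENSES.get(term, {}))
    if not candidates:
        return set(hits)  # unambiguous term: plain lexical result
    sense = choose(candidates)
    return hits & semantic.get(sense, set())


docs = {
    "d1": "the jaguar is a predator of the rainforest",
    "d2": "the new jaguar sedan has a quiet engine",
}
lexical, semantic = build_index(docs)
print(search("jaguar", lexical, semantic, choose=lambda c: "jaguar_(car)"))  # -> {'d2'}
```

The design point the sketch tries to mirror is that the expensive step (concept tagging) happens once at indexing time, while query-time work stays a cheap set intersection over ordinary inverted indexes; how Capisco actually stores and combines concept annotations with its lexical index is not described in this record.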
Pages: 147-156
Page count: 10
Related papers
50 in total
  • [31] Real-Time Semantic Search Using Approximate Methodology for Large-Scale Storage Systems
    Hua, Yu
    Jiang, Hong
    Feng, Dan
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2016, 27 (04) : 1212 - 1225
  • [32] ORM Ontologies with Executable Derivation Rules to Support Semantic Search in Large-Scale Data Applications
    Bur, Marton
    Stirewalt, Kurt
    ACM/IEEE 25TH INTERNATIONAL CONFERENCE ON MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS, MODELS 2022 COMPANION, 2022, : 81 - 82
  • [33] Reclaiming Accountability: Improving Writing Programs through Accreditation and Large-Scale Assessments
    Velazquez, Ashley
    ASSESSING WRITING, 2017, 34 : 100 - 102
  • [34] Improving Large-Scale Image Retrieval Through Robust Aggregation of Local Descriptors
    Husain, Syed Sameed
    Bober, Miroslaw
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (09) : 1783 - 1796
  • [35] EGM: Enhanced Graph-based Model for Large-scale Video Advertisement Search
    Yu, Tan
    Liu, Jie
    Yang, Yi
    Li, Yi
    Fei, Hongliang
    Li, Ping
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 4443 - 4451
  • [36] Search Tracker: Human-Derived Object Tracking in the Wild Through Large-Scale Search and Retrieval
    Bency, Archith John
    Karthikeyan, S.
    De Leo, Carter
    Sunderrajan, Santhoshkumar
    Manjunath, B. S.
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2017, 27 (08) : 1803 - 1814
  • [37] Improving Performance Insensitivity of Large-Scale Multiobjective Optimization via Monte Carlo Tree Search
    Hong, Haokai
    Jiang, Min
    Yen, Gary G.
    IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (03) : 1816 - 1827
  • [38] Improving broad-coverage medical entity linking with semantic type prediction and large-scale datasets
    Vashishth, S.
    Newman-Griffis, D.
    Joshi, R.
    Dutt, R.
    Rosé, C. P.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2021, 121
  • [39] Accelerating Large-Scale Molecular Similarity Search through Exploiting High Performance Computing
    Zhu, Chun Jiang
    Zhu, Tan
    Li, Haining
    Bi, Jinbo
    Song, Minghu
    2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019, : 330 - 333
  • [40] A Quantitative Evaluation of Trademark Search Engines' Performances through Large-Scale Statistical Analysis
    Vandamme, Thomas
    Cabay, Julien
    Debeir, Olivier
    PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND LAW, ICAIL 2023, 2023, : 343 - 350