Mathematical information retrieval (MathIR) applications such as semantic formula search and question answering systems rely on knowledge bases that link mathematical expressions to their natural language names. To populate these databases, mathematical formulae need to be annotated and linked to semantic concepts, which is very time-consuming. In this paper, we present our approach to structuring and speeding up this process using an application-driven strategy and an AI-aided system. We evaluate the quality and time savings of AI-generated formula and identifier annotation recommendations on a test selection of Wikipedia articles from the physics domain. Moreover, we evaluate the community acceptance of Wikipedia formula entity links and of Wikidata item creation and population to ground the formula semantics. Our evaluation shows that the AI guidance significantly sped up the annotation process, by a factor of 1.4 for formulae and 2.4 for identifiers. Our contributions were accepted in 88% of the edited Wikipedia articles and 67% of the Wikidata items. The "AnnoMathTeX" annotation recommender system is hosted by Wikimedia at annomathtex.wmflabs.org. In the future, our data refinement pipeline will be integrated seamlessly into the Wikimedia user interfaces.
Affiliation:
Hokkaido Univ, Grad Sch Informat Sci & Technol, Kita Ku, N14 W9, Sapporo, Hokkaido 0600814, Japan
Yoshioka, Masaharu
KNOWLEDGE GRAPHS AND LANGUAGE TECHNOLOGY, 2017, 10579: 119-136