Extracting Structured Information from the Textual Description of Geometry Word Problems

Cited by: 0
Authors
Boob, Archana [1]
Bodakhe, Prajakta [2]
Radke, Mansi A. [1]
Deshpande, Umesh A. [1]
Affiliations
[1] Visvesvaraya Natl Inst Technol, Nagpur, Maharashtra, India
[2] CUEMATH, Nagpur, Maharashtra, India
Keywords
Natural Language Processing; Artificial Intelligence; Mathematics (Geometry) word problems; Named Entity Recognition; Predicate Generation;
DOI
10.1145/3639233.3639255
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Artificial Intelligence (AI) is increasingly applied across fields as automation advances. Automatically solving geometry math word problems (MWPs) can support smart tutoring and has applications in many other fields. Solving geometry MWPs requires multi-modal methods that parse the question text using Natural Language Processing (NLP) and the diagram using Image Processing (IP) techniques, and then combine the information from both. The information must then be structured, and appropriate axioms and theorems applied, to transform the question into intermediate equations that equation solvers can solve. Text parsing of the question content is a crucial component of this pipeline and is the focus of this paper. Techniques for solving geometry MWPs have been proposed in the literature; however, they rely on input predicates and do not generate them automatically. Where predicates are generated, regular expressions are used, which lack generalisation and scalability. This paper models text parsing as a Relation Extraction and Natural Language Generation problem in which structured information is generated from the question text in the form of predicates. We generate appropriate geometric tags using a Named Entity Recognition (NER) annotator and then use these tags to generate predicates that represent the question in a machine-readable formal language. To test the proposed approach, we create a custom dataset of about 500 questions from standard elementary school-level Indian textbooks. Experiments on the custom dataset demonstrate that the proposed method is capable of extracting geometric relations. To evaluate the generated predicates, skilled domain experts create ground-truth predicates for each question. We also design a unique method to evaluate the generated predicates with the METEOR (Metric for Evaluation of Translation with Explicit ORdering) metric. The experimental findings on this dataset indicate a precision (P) of 0.70, recall (R) of 0.60, harmonic mean (F-mean) of 0.60, average penalty (p) of 0.12, and final METEOR score (M) of 0.54. Furthermore, we evaluate the proposed technique on another dataset: on the Geometry3K dataset, it obtains a precision of 0.67, recall of 0.64, harmonic mean (F-mean) of 0.64, penalty (p) of 0.27, and final METEOR score (M) of 0.42.
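As a reading aid for the reported metrics, the following minimal sketch shows how P, R, the penalty, and the final score relate in the original METEOR formulation (recall-weighted harmonic mean, fragmentation penalty). It is an assumption that the paper's adapted evaluation follows this standard form; the function names are illustrative and not taken from the paper.

```python
def meteor_fmean(precision: float, recall: float) -> float:
    """Recall-weighted harmonic mean used by the original METEOR metric."""
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return 10.0 * precision * recall / (recall + 9.0 * precision)


def meteor_penalty(chunks: int, matched_unigrams: int) -> float:
    """Fragmentation penalty: fewer, longer matched chunks give a lower penalty."""
    if matched_unigrams == 0:
        return 0.0
    return 0.5 * (chunks / matched_unigrams) ** 3


def meteor_score(precision: float, recall: float, penalty: float) -> float:
    """Final score: F-mean discounted by the penalty."""
    return meteor_fmean(precision, recall) * (1.0 - penalty)


# Averages reported in the abstract for the custom dataset (P, R, p).
custom_score = meteor_score(0.70, 0.60, 0.12)
print(round(custom_score, 2))
```

With the custom-dataset averages this evaluates to roughly 0.54, in line with the reported M. The reported figures are presumably averaged over questions, so plugging dataset-level averages into the formula need not reproduce every reported score exactly.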
Pages: 31-37
Page count: 7