Extracting Structured Information from the Textual Description of Geometry Word Problems

Citations: 0

Authors:

Boob, Archana [1]

Bodakhe, Prajakta [2]

Radke, Mansi A. [1]

Deshpande, Umesh A. [1]

Affiliations:

[1] Visvesvaraya Natl Inst Technol, Nagpur, Maharashtra, India

[2] CUEMATH, Nagpur, Maharashtra, India

Source:

PROCEEDINGS OF 2023 7TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2023 | 2023

Keywords:

Natural Language Processing; Artificial Intelligence; Mathematics (Geometry) word problems; Named Entity Recognition; Predicate Generation;

DOI:

10.1145/3639233.3639255

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Artificial Intelligence (AI) now plays a role in nearly every field as automation becomes the way forward. Automatically solving geometry math word problems (MWPs) can support smart tutoring and has applications in many other fields. Solving a geometry MWP requires multi-modal methods that parse the question text with Natural Language Processing (NLP) and the diagram with Image Processing (IP) techniques, then combine the information from both. The information must then be structured, and appropriate axioms and theorems applied, to transform the question into intermediate equations that an equation solver can solve. Text parsing of the question content is a crucial component of this pipeline and is the focus of this paper. Techniques for solving geometry MWPs have been proposed in the literature; however, they rely on input predicates and do not generate them automatically. Where predicates are generated, regular expressions are used, which lack generalisation and scalability. This paper models text parsing as a Relation Extraction and Natural Language Generation problem in which structured information, in the form of predicates, is generated from the question text. We generate geometric tags with a Named Entity Recognition (NER) annotator and then use these tags to generate predicates that represent the question in a machine-readable formal language. To test the proposed approach, we create a custom dataset of about 500 questions from standard elementary school-level Indian textbooks. Experiments on the custom dataset demonstrate that the proposed method is capable of extracting geometric relations. To evaluate the generated predicates, skilled domain experts create ground-truth predicates for each question. We also design a unique method to evaluate the generated predicates with the METEOR (Metric for Evaluation of Translation with Explicit ORdering) metric. On this dataset, the experiments yield a precision (P) of 0.70, recall (R) of 0.60, harmonic mean (F-mean) of 0.60, average penalty (p) of 0.12, and final METEOR score (M) of 0.54. We also evaluate the proposed technique on the Geometry3K dataset, obtaining a precision of 0.67, recall of 0.64, harmonic mean (F-mean) of 0.64, penalty (p) of 0.27, and final METEOR score (M) of 0.42.
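
The METEOR quantities named in the abstract (P, R, F-mean, penalty p, and final score M) can be illustrated with a minimal Python sketch. It assumes the original METEOR formulation (recall-weighted F-mean, fragmentation penalty of 0.5 times the cube of chunks over matches), restricted to exact unigram matching with a simple greedy alignment. The paper's own adapted evaluation method is not described in this abstract and may differ. The function name `meteor_exact` and the example predicate tokens are hypothetical, chosen only for illustration.

```python
def meteor_exact(candidate, reference):
    """Return (P, R, F-mean, penalty, score) for two token lists.

    Simplified sketch: exact unigram matches only, greedy left-to-right
    alignment (full METEOR also uses stemming/synonyms and picks the
    alignment with the fewest crossings).
    """
    used = [False] * len(reference)
    alignment = []  # (candidate index, reference index) pairs
    for i, token in enumerate(candidate):
        for j, ref_token in enumerate(reference):
            if not used[j] and token == ref_token:
                used[j] = True
                alignment.append((i, j))
                break

    matches = len(alignment)
    if matches == 0:
        return 0.0, 0.0, 0.0, 0.0, 0.0

    precision = matches / len(candidate)
    recall = matches / len(reference)
    # Recall-weighted harmonic mean used by the original METEOR.
    f_mean = 10 * precision * recall / (recall + 9 * precision)

    # A chunk is a run of matches that is contiguous in both sequences.
    chunks = 1
    for (i0, j0), (i1, j1) in zip(alignment, alignment[1:]):
        if i1 != i0 + 1 or j1 != j0 + 1:
            chunks += 1

    penalty = 0.5 * (chunks / matches) ** 3
    return precision, recall, f_mean, penalty, f_mean * (1 - penalty)


# Hypothetical predicate tokens; not taken from the paper's dataset.
generated = ["Triangle", "ABC", "Equals", "LengthOf", "AB", "5"]
ground_truth = ["Triangle", "ABC", "Equals", "LengthOf", "BC", "5"]
p, r, f, pen, m = meteor_exact(generated, ground_truth)
print(f"P={p:.2f} R={r:.2f} F-mean={f:.2f} penalty={pen:.2f} M={m:.2f}")
```

Because the penalty grows with the number of chunks, a generated predicate whose tokens all appear in the ground truth but in a different order scores lower than one that preserves the order.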

Pages: 31-37

Number of pages: 7
