Predicting First-Language and Second-Language Proficiency Using Eye Fixation Data and Demographic Information: Assumptions, Data Representations, and Methods

Cited by: 0

Authors:

Shalileh, Soroosh [1,2]

Kairov, Matvey [2]

Baminiwatte, Ranga [3]

Parshina, Olga [4]

Dragoy, Olga [1,5]

Affiliations:

[1] HSE Univ, Ctr Language & Brain, Moscow 101000, Russia

[2] HSE Univ, Lab Artificial Intelligence Cognit Sci, Moscow 101000, Russia

[3] Clemson Univ, Sch Comp, Clemson, SC 29634 USA

[4] Middlebury Coll, Psychol Dept, Middlebury, VT 05753 USA

[5] Russian Acad Sci, Inst Linguist, Moscow 125009, Russia

Source:

IEEE ACCESS | 2024, Vol. 12

Keywords:

Data models; Accuracy; Convolutional neural networks; Solid modeling; Predictive models; Prediction algorithms; Linguistics; Gaze tracking; Artificial intelligence; Natural language processing; Multilingual; First-language; second-language proficiency; eye-tracking; applied artificial intelligence; CONVOLUTIONAL NEURAL-NETWORKS; LEARNERS; LEVEL

DOI:

10.1109/ACCESS.2024.3468460

Chinese Library Classification:

TP [Automation and Computer Technology]

Discipline classification code:

0812

Abstract:

Studying first-language (L1) acquisition, second-language (L2) acquisition, and bilingualism using eye movement data has become a popular topic in the psycholinguistic and educational research communities. The current research uses eye fixation data, along with demographic information, to investigate the following five research questions (RQs). (RQ1) Is it possible to predict L1 from eye fixation data using artificial intelligence (AI) methods? (RQ2) Is it possible to predict second-language proficiency (L2P) from eye fixation data using AI methods? (RQ3) Which of the six L2P assessment batteries under consideration is most effective in predicting L2P? (RQ4) How informative is eye fixation data, alone or combined with demographic information, in predicting L1 and L2P? (RQ5) How can eye fixation data be represented for training AI models to predict L1 and L2P? We used the MECO L2 data set and scrutinized the performance of three families of AI methods. With respect to each RQ, the results showed that 1) using only eye fixation data, it is possible to predict L1 with a ROC-AUC of 0.755; 2) using only eye fixation data, it is not possible to predict L2P accurately (an R² score of only 0.216 was obtained); 3) L2 Lexical Skills is the most effective L2P assessment battery; 4) combining the eye fixation data with demographic features significantly improved model performance, yielding a ROC-AUC of 0.997 in predicting L1 and an R² score of 0.899 in predicting L2P, while simultaneously reducing the impact of the eye fixation parameters; 5) 2D scatter plot images can be considered an appropriate representation for training AI models using only eye fixation data, at least for predicting L1.

Pages: 145832-145844

Page count: 13
