Linear approximation of the quantile-quantile plot for semantic labelling of numeric columns in tabular data

Cited by: 0
Authors
Alobaid, Ahmad [2 ]
Corcho, Oscar [1 ]
Affiliations
[1] Univ Politecn Madrid, Madrid 28660, Spain
[2] Runzbuzz, Kuwait, Kuwait
Keywords
Semantic labelling; Semantic annotation; Knowledge graph; Quantile-quantile plot; Cumulative distribution function
DOI
10.1016/j.eswa.2023.122152
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Tabular data are often published on the Web without metadata or a clear description of their content, which makes such data difficult to utilise. Semantic labelling, the assignment of meanings (labels or annotations) to tabular data, has been proposed in the literature to address this problem and facilitate the utilisation of tabular data (e.g., for data integration). Several semantic labelling approaches have been developed, and they often predict the labels in a (semi-)automatic fashion. However, semantic labelling approaches that assign labels to textual columns are not suitable for predicting the labels of numeric columns due to missing values, rounding errors, or data expiry. To overcome this, we proposed a modified quantile-quantile plot technique with linear approximation. We experimented with the T2Dv2 semantic labelling benchmark and analysed the results. We showed that the performance of our approach increases as the size of the table grows. We also discussed how the effects of different parameters diminish as the size of the table increases, with the accuracy under the different parameters converging as the table size approaches 200 rows. Finally, we showed that our approach using the quantile-quantile plot outperforms the Kolmogorov-Smirnov test.
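The abstract does not spell out the modified quantile-quantile technique, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a numeric column is compared with reference samples for each candidate label by pairing their quantiles and measuring how far the resulting Q-Q points lie from the line y = x; a two-sample Kolmogorov-Smirnov statistic is included for comparison. All function names, the distance definition, and the sample data are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def qq_points(sample, reference, n_quantiles=20):
    """Pair matching quantiles of two samples (the points of a Q-Q plot)."""
    q_sample = quantiles(sample, n=n_quantiles, method="inclusive")
    q_reference = quantiles(reference, n=n_quantiles, method="inclusive")
    return list(zip(q_reference, q_sample))


def qq_distance(sample, reference, n_quantiles=20):
    """Mean absolute deviation of the Q-Q points from the line y = x.

    Assumption: the deviation is scaled by the reference range so that
    candidates measured on different scales are comparable.
    """
    points = qq_points(sample, reference, n_quantiles)
    spread = (max(reference) - min(reference)) or 1.0
    return sum(abs(y - x) for x, y in points) / (len(points) * spread)


def ks_statistic(sample, reference):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample), sorted(reference)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in set(a) | set(b)
    )


def label_column(column, candidates, n_quantiles=20):
    """Return the candidate label whose reference sample is closest."""
    return min(
        candidates,
        key=lambda name: qq_distance(column, candidates[name], n_quantiles),
    )


# Hypothetical reference samples, e.g. values of a property in a knowledge graph.
candidates = {
    "height_cm": list(range(150, 201, 2)),
    "age_years": list(range(0, 91, 3)),
}
column = [160, 172, 181, 168, 175, 190, 165, 178]
predicted = label_column(column, candidates)
```

In this toy setup the column is assigned to the candidate with the smallest Q-Q distance. The number of quantiles is one of the parameters whose effect, according to the abstract, diminishes as tables grow towards 200 rows.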
Pages: 12