Linear approximation of the quantile-quantile plot for semantic labelling of numeric columns in tabular data

Authors
Alobaid, Ahmad [2]
Corcho, Oscar [1]
Affiliations
[1] Univ Politecn Madrid, Madrid 28660, Spain
[2] Runzbuzz, Kuwait, Kuwait
Keywords
Semantic labelling; Semantic annotation; Knowledge graph; Quantile-quantile plot; Cumulative distribution function
DOI
10.1016/j.eswa.2023.122152
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Tabular data are often published on the Web without metadata or a clear description of their content, which makes such data difficult to utilise. Semantic labelling, the assignment of meanings (labels or annotations) to tabular data, has been proposed in the literature to address this problem and facilitate the utilisation of tabular data (e.g., for data integration). Several semantic labelling approaches have been developed, and they often predict the labels in a (semi-)automatic fashion. However, semantic labelling approaches that assign labels to textual columns are not suitable for predicting the labels of numeric columns due to missing values, rounding errors, or data expiry. To overcome this, we proposed a modified quantile-quantile plot technique with linear approximation. We experimented with the T2Dv2 semantic labelling benchmark and analysed the results. We showed an increase in the performance of our approach as the size of the table grew. We also discussed how the effects of different parameters diminish as the size of the table increases, and how the accuracy obtained with the different parameters converges as the table size gets closer to 200 rows. Finally, we showed that our approach using the quantile-quantile plot outperforms the Kolmogorov-Smirnov test.
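The abstract does not give the details of the modified quantile-quantile technique, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that "linear approximation" can be illustrated by linear interpolation between order statistics, and that a column is labelled with the candidate whose reference values lie closest to the line y = x on the Q-Q plot. The function names (`quantile`, `qq_error`, `ks_statistic`, `best_label`), the `n_points` parameter, and the sample data are all hypothetical. A hand-written two-sample Kolmogorov-Smirnov statistic is included only to show what the baseline measures.

```python
from bisect import bisect_right


def quantile(sorted_values, p):
    """Return the p-quantile (0 <= p <= 1) by linear interpolation
    between neighbouring order statistics (an assumed reading of
    'linear approximation', not taken from the paper)."""
    if not sorted_values:
        raise ValueError("empty sample")
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def qq_error(column, reference, n_points=20):
    """Mean absolute deviation of the Q-Q points from the line y = x,
    scaled by the range of the reference sample. Identical samples give 0."""
    col, ref = sorted(column), sorted(reference)
    spread = (ref[-1] - ref[0]) or 1.0  # avoid dividing by zero
    probs = [(i + 0.5) / n_points for i in range(n_points)]
    total = sum(abs(quantile(col, p) - quantile(ref, p)) for p in probs)
    return total / (n_points * spread)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical cumulative distribution functions."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def best_label(column, candidates, n_points=20):
    """Return the candidate label with the lowest Q-Q error."""
    return min(
        candidates,
        key=lambda label: qq_error(column, candidates[label], n_points),
    )


if __name__ == "__main__":
    # Illustrative, made-up data: an unlabelled numeric column and
    # reference values for two hypothetical candidate properties.
    column = [160, 172, 181, 168, 175]
    candidates = {
        "height_cm": list(range(150, 200, 2)),
        "year": list(range(1950, 2020)),
    }
    print(best_label(column, candidates))
    for label, values in candidates.items():
        print(label, qq_error(column, values), ks_statistic(column, values))
```

Comparing quantiles at a fixed number of probabilities keeps the cost independent of the reference sample size once both samples are sorted; whether this matches the trade-offs studied in the paper is not stated in the abstract.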
Pages: 12