Tabular data are often published on the Web without metadata or a clear description of their content, which makes such data difficult to use. Semantic labelling, the assignment of meanings (labels or annotations) to tabular data, has been proposed in the literature to address this problem and to facilitate the use of tabular data (e.g., for data integration). Several semantic labelling approaches have been developed, and they often predict labels in a (semi-)automatic fashion. However, approaches that assign labels to textual columns are not suitable for predicting the labels of numeric columns, whose values are affected by missing values, rounding errors, or data expiry. To overcome this, we propose a modified quantile-quantile (Q-Q) plot technique with linear approximation. We evaluate it on the T2Dv2 semantic labelling benchmark and analyse the results. The performance of our approach increases as table size grows. We also show that the effects of the different parameters diminish as table size increases, with the accuracy obtained under different parameter settings converging as tables approach 200 rows. Finally, we show that our Q-Q plot approach outperforms the Kolmogorov-Smirnov test.
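The abstract does not specify how the modified Q-Q plot scores a column, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes each candidate label is represented by a sample of reference values (the `candidates` dictionary and its contents are hypothetical), reads "linear approximation" as linear interpolation between order statistics so that samples of different sizes can be compared at the same probability levels, and scores a column by the mean distance of its Q-Q points from the line y = x, scaled by the reference range. The function names `qq_points`, `qq_score`, `ks_statistic` and `predict_label` are illustrative. The two-sample Kolmogorov-Smirnov statistic is included for contrast, since the abstract compares the two.

```python
import numpy as np


def _clean(values):
    """Convert to a float array and drop missing values (NaN)."""
    arr = np.asarray(values, dtype=float)
    return arr[~np.isnan(arr)]


def qq_points(column, reference, n_quantiles=101):
    """Pair quantiles of an observed column with those of a reference sample.

    np.quantile's default 'linear' method interpolates between order
    statistics (our assumed reading of "linear approximation").
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    return np.quantile(_clean(column), probs), np.quantile(_clean(reference), probs)


def qq_score(column, reference, n_quantiles=101):
    """Mean distance of the Q-Q points from y = x, scaled by the reference range.

    Lower means the column looks more like the reference distribution.
    This scoring rule is an assumption made for illustration.
    """
    col_q, ref_q = qq_points(column, reference, n_quantiles)
    spread = ref_q[-1] - ref_q[0]
    if spread == 0:
        spread = 1.0  # avoid division by zero for a constant reference sample
    return float(np.mean(np.abs(col_q - ref_q)) / spread)


def ks_statistic(column, reference):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a = np.sort(_clean(column))
    b = np.sort(_clean(reference))
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def predict_label(column, candidates, score=qq_score):
    """Return the candidate label whose reference values best match the column."""
    return min(candidates, key=lambda label: score(column, candidates[label]))


# Hypothetical reference samples for two candidate labels.
candidates = {
    "year": np.arange(1900, 2021),
    "population": np.linspace(1_000, 1_000_000, 500),
}
column = [1950, 1987, 2003, 1999, 1920, 2010, 1965, np.nan]  # one missing value

print(predict_label(column, candidates))                      # year
print(predict_label(column, candidates, score=ks_statistic))  # year
```

Both scores pick the same label on this toy column; the sketch only shows the mechanics of comparing a column against per-label reference distributions and does not reproduce the paper's parameters or its reported advantage over the Kolmogorov-Smirnov test.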
Miao, Rui
Affiliation: Chinese Acad Sci, Acad Math & Syst Sci, Inst Appl Math, Beijing 100190, Peoples R China
Sun, Liuquan
Affiliation: Chinese Acad Sci, Acad Math & Syst Sci, Inst Appl Math, Beijing 100190, Peoples R China
Tian, Guo-Liang
Affiliation: Univ Hong Kong, Dept Stat & Actuarial Sci, Pokfulam Rd, Hong Kong, Hong Kong, Peoples R China