Knowledge Transfer with Low-Quality Data: A Feature Extraction Issue

Cited by: 39
Authors
Quanz, Brian [1]
Huan, Jun [1]
Mishra, Meenakshi [1]
Affiliations
[1] Univ Kansas, Informat & Telecommun Technol Ctr, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA
Funding
US National Science Foundation
Keywords
Knowledge transfer; transfer learning; feature extraction; sparse coding; low-quality data; ADAPTATION;
DOI
10.1109/TKDE.2012.75
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Effectively utilizing readily available auxiliary data to improve predictive performance on new modeling tasks is a key problem in data mining. In this research, the goal is to transfer knowledge between sources of data, particularly when ground-truth information for the new modeling task is scarce or expensive to collect, so that leveraging auxiliary sources of data becomes a necessity. Toward seamless knowledge transfer among tasks, effective representation of the data is a critical yet not fully explored research area for the data engineer and data miner. Here, we present a technique based on the idea of sparse coding, which essentially attempts to find an embedding for the data by assigning feature values based on subspace cluster membership. We modify the idea of sparse coding by focusing on the identification of clusters shared between data sets when source and target data may have different distributions. In our paper, we point out cases where a direct application of sparse coding will lead to a failure of knowledge transfer. We then present the details of our extension to sparse coding, which incorporates distribution distance estimates for the embedded data, and show that the proposed algorithm can overcome the shortcomings of the sparse coding algorithm on synthetic data and achieve improved predictive performance on a real-world chemical toxicity transfer learning task.
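The abstract describes the method only at a high level, so the following is a minimal illustrative sketch and not the authors' algorithm. It assumes a fixed random dictionary `D`, computes sparse codes with ISTA (iterative soft-thresholding), and uses a linear-kernel squared mean discrepancy between the embedded source and target data as one simple example of a "distribution distance estimate". The function names `sparse_codes` and `linear_mmd2`, the penalty weight `lam`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||x||_1 (element-wise shrinkage toward zero).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_codes(X, D, lam=0.1, n_iter=200):
    # ISTA for: min_A 0.5 * ||X - A D||_F^2 + lam * ||A||_1
    # X: (n, d) data, D: (k, d) fixed dictionary, returns A: (n, k) codes.
    L = np.linalg.norm(D @ D.T, 2)  # Lipschitz constant of the smooth term's gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T
        A = soft_threshold(A - grad / L, lam / L)
    return A

def linear_mmd2(A_src, A_tgt):
    # Squared distance between mean embeddings (MMD with a linear kernel).
    # Assumption: a stand-in for whatever distance estimate the paper uses.
    diff = A_src.mean(axis=0) - A_tgt.mean(axis=0)
    return float(diff @ diff)

# Hypothetical synthetic data: the target is a shifted copy of the source distribution.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(50, 5))
X_tgt = rng.normal(size=(50, 5)) + 2.0
D = rng.normal(size=(8, 5))
D /= np.linalg.norm(D, axis=1, keepdims=True)  # unit-norm dictionary atoms

A_src = sparse_codes(X_src, D)
A_tgt = sparse_codes(X_tgt, D)
print("distance between embedded source and target:", linear_mmd2(A_src, A_tgt))
```

A large value of the printed distance would indicate that the codes of the two data sets occupy different regions of the embedding. An extension of the kind the abstract describes would penalize such a discrepancy while learning the representation, rather than only measuring it afterwards as done here.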
Pages: 1789-1802
Page count: 14
Related papers
50 in total
  • [21] The use of contextual spatial knowledge for low-quality image segmentation
    Kallel, Imene Khanfir
    Almouahed, Shaban
    Alsahwa, Bassem
    Solaiman, Basel
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (08) : 9645 - 9665
  • [22] Investigation of Multiple Imputation in Low-Quality Questionnaire Data
    Van Ginkel, Joost R.
    MULTIVARIATE BEHAVIORAL RESEARCH, 2010, 45 (03) : 574 - 598
  • [23] Harnessing the information contained in low-quality data sources
    Couso, Ines
    Sanchez, Luciano
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2014, 55 (07) : 1485 - 1486
  • [24] Low-quality females prefer low-quality males when choosing a mate
    Holveck, Marie-Jeanne
    Riebel, Katharina
    PROCEEDINGS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 2010, 277 (1678) : 153 - 160
  • [25] Transferring deep knowledge for object recognition in Low-quality underwater videos
    Sun, Xin
    Shi, Junyu
    Liu, Lipeng
    Dong, Junyu
    Plant, Claudia
    Wang, Xinhua
    Zhou, Huiyu
    NEUROCOMPUTING, 2018, 275 : 897 - 908
  • [26] A Low-quality Data User Identification Method Based on Blockchain
    Wei, Jiayong
    Zhang, Hua
    Chen, Yuebu
    Xu, Yanxin
    2023 INTERNATIONAL CONFERENCE ON DATA SECURITY AND PRIVACY PROTECTION, DSPP, 2023, : 136 - 142
  • [27] Mining fuzzy association rules from low-quality data
    Palacios, A. M.
    Gacto, M. J.
    Alcala-Fdez, J.
    SOFT COMPUTING, 2012, 16 (05) : 883 - 901
  • [28] Mining fuzzy association rules from low-quality data
    A. M. Palacios
    M. J. Gacto
    J. Alcalá-Fdez
    Soft Computing, 2012, 16 : 883 - 901
  • [29] Sifting Truths from Multiple Low-Quality Data Sources
    Xie, Zizhe
    Liu, Qizhi
    Bao, Zhifeng
    WEB AND BIG DATA, APWEB-WAIM 2017, PT I, 2017, 10366 : 74 - 81
  • [30] A novel method for clinical risk prediction with low-quality data
    Wang, Zeyuan
    Poon, Josiah
    Wang, Shuze
    Sun, Shiding
    Poon, Simon
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2021, 114