Knowledge Transfer with Low-Quality Data: A Feature Extraction Issue

Cited by: 39
Authors
Quanz, Brian [1]
Huan, Jun [1]
Mishra, Meenakshi [1]
Affiliations
[1] Univ Kansas, Informat & Telecommun Technol Ctr, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA
Funding
National Science Foundation (United States)
Keywords
Knowledge transfer; transfer learning; feature extraction; sparse coding; low-quality data; adaptation
DOI
10.1109/TKDE.2012.75
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Effectively utilizing readily available auxiliary data to improve predictive performance on new modeling tasks is a key problem in data mining. In this research, the goal is to transfer knowledge between sources of data, particularly when ground-truth information for the new modeling task is scarce or expensive to collect, so that leveraging auxiliary sources of data becomes a necessity. Toward seamless knowledge transfer among tasks, effective representation of the data is a critical but not yet fully explored research area for the data engineer and data miner. Here, we present a technique based on the idea of sparse coding, which essentially attempts to find an embedding for the data by assigning feature values based on subspace cluster membership. We modify sparse coding to focus on identifying clusters shared between source and target data, which may have different distributions. We point out cases where a direct application of sparse coding leads to a failure of knowledge transfer. We then present the details of our extension to sparse coding, which incorporates distribution distance estimates for the embedded data, and show that the proposed algorithm overcomes the shortcomings of the sparse coding algorithm on synthetic data and achieves improved predictive performance on a real-world chemical toxicity transfer learning task.
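The two ingredients named in the abstract, sparse codes and a distribution distance measured on the embedded data, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the objective, so the sketch assumes a fixed dictionary, uses plain ISTA for the coding step, and uses the squared distance between mean codes (a linear-kernel mean-discrepancy estimate) as a stand-in for the distribution distance. The names `sparse_codes`, `mean_embedding_distance`, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def sparse_codes(X, D, lam=0.1, n_iter=200):
    """ISTA for min_A 0.5 * ||X - A D||_F^2 + lam * ||A||_1 with D held fixed.

    X: (n, d) data, D: (k, d) dictionary (assumed given), returns A: (n, k).
    """
    # Lipschitz constant of the gradient: largest eigenvalue of D D^T.
    L = np.linalg.norm(D @ D.T, 2)
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T          # gradient of the squared-error term
        A = soft_threshold(A - grad / L, lam / L)
    return A


def mean_embedding_distance(A_src, A_tgt):
    """Squared distance between mean codes of source and target.

    Assumption: a simple linear-kernel discrepancy estimate, used here only
    to illustrate measuring distribution distance in the embedded space.
    """
    diff = A_src.mean(axis=0) - A_tgt.mean(axis=0)
    return float(diff @ diff)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.normal(size=(8, 5))
    D /= np.linalg.norm(D, axis=1, keepdims=True)   # unit-norm atoms
    X_src = rng.normal(size=(40, 5))
    X_tgt = rng.normal(size=(30, 5)) + 1.0           # shifted target data
    A_src = sparse_codes(X_src, D)
    A_tgt = sparse_codes(X_tgt, D)
    print("fraction of zero codes:", np.mean(A_src == 0.0))
    print("embedded distance:", mean_embedding_distance(A_src, A_tgt))
```

In a transfer setting one would expect a penalty of this kind to be added to the coding objective so that the learned embedding keeps source and target codes close; here the distance is only computed after the fact, which keeps the sketch short.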
Pages: 1789-1802
Page count: 14