Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision

Cited by: 29
Authors
Kaushal, Vishal [1]
Iyer, Rishabh [2]
Kothawade, Suraj [1]
Mahadev, Rohan [3]
Doctor, Khoshrav [4]
Ramakrishnan, Ganesh [1]
Affiliations
[1] Indian Inst Technol, Mumbai, Maharashtra, India
[2] Microsoft, Redmond, WA USA
[3] AITOE Labs, Mumbai, Maharashtra, India
[4] Univ Massachusetts, Amherst, MA 01003 USA
DOI
10.1109/WACV.2019.00142
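Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification code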
0808 ; 0809 ;
Abstract
State-of-the-art computer vision techniques based on supervised machine learning are, in general, data hungry. Curating their data poses the challenges of expensive human labeling, inadequate computing resources and longer experiment turnaround times. Training-data subset selection and active learning techniques have been proposed as possible solutions to these challenges. A special class of subset selection functions naturally models notions of diversity, coverage and representation and can be used to eliminate redundancy, which makes these functions well suited to training-data subset selection. They can also improve the efficiency of active learning, further reducing human labeling effort, by selecting a subset of the examples obtained with conventional uncertainty-sampling techniques. In this work, we empirically demonstrate the effectiveness of two diversity models, namely the Facility-Location and Dispersion models, for training-data subset selection and for reducing labeling effort. We demonstrate this for a variety of computer vision tasks, including Gender Recognition, Face Recognition, Scene Recognition, Object Detection and Object Recognition. Our results show that diversity-based subset selection, done in the right way, can increase accuracy by up to 5-10% over existing baselines, particularly in settings in which less training data is available. This allows complex machine learning models such as Convolutional Neural Networks to be trained with much less training data and lower labeling costs while incurring minimal performance loss.
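The two diversity models named in the abstract can be illustrated with a minimal greedy sketch. This is not the authors' implementation: the function names (`facility_location_greedy`, `dispersion_greedy`), the RBF similarity, the Euclidean distance and the fixed starting point are all assumptions chosen for illustration. Facility-Location is taken here as maximizing the sum, over all points, of the similarity to the closest selected point; Dispersion is taken as maximizing the minimum pairwise distance within the selected set, approximated by farthest-point selection.

```python
import numpy as np


def pairwise_dists(X):
    """Euclidean distance matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding


def facility_location_greedy(sim, k):
    """Greedily pick k indices maximizing sum_i max_{j in S} sim[i, j].

    Assumes sim is a non-negative (n, n) similarity matrix.
    """
    n = sim.shape[0]
    best = np.zeros(n)                # best similarity to the set so far
    chosen = np.zeros(n, dtype=bool)
    selected = []
    for _ in range(k):
        # Marginal gain of adding each candidate column j.
        gains = np.maximum(sim, best[:, None]).sum(axis=0) - best.sum()
        gains[chosen] = -np.inf       # never re-pick a selected point
        j = int(np.argmax(gains))
        selected.append(j)
        chosen[j] = True
        best = np.maximum(best, sim[:, j])
    return selected


def dispersion_greedy(dist, k, start=0):
    """Farthest-point heuristic for the max-min dispersion objective.

    The starting index is an arbitrary assumption of this sketch.
    """
    selected = [start]
    min_dist = dist[:, start].copy()  # distance to nearest selected point
    min_dist[start] = -np.inf
    for _ in range(k - 1):
        j = int(np.argmax(min_dist))
        selected.append(j)
        min_dist = np.minimum(min_dist, dist[:, j])
        min_dist[j] = -np.inf
    return selected


# Toy usage: three tight, well-separated groups of 2-D points.
X = np.array([[0, 0], [0.1, 0], [0, 0.1],
              [10, 0], [10.1, 0], [10, 0.1],
              [0, 10], [0.1, 10], [0, 10.1]], dtype=float)
D = pairwise_dists(X)
S = np.exp(-D ** 2)                   # RBF similarity (an assumption)
print(facility_location_greedy(S, 3))
print(dispersion_greedy(D, 3))
```

On the toy data, both selectors spread their three picks across the three groups rather than drawing redundant points from a single group, which is the redundancy-elimination behavior the abstract describes. In practice the rows of `X` would be feature vectors extracted from images, and the similarity kernel would need to be chosen for the task.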
Pages: 1289-1299
Number of pages: 11