Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision

Cited by: 29
Authors
Kaushal, Vishal [1]
Iyer, Rishabh [2]
Kothawade, Suraj [1]
Mahadev, Rohan [3]
Doctor, Khoshrav [4]
Ramakrishnan, Ganesh [1]
Affiliations
[1] Indian Inst Technol, Mumbai, Maharashtra, India
[2] Microsoft, Redmond, WA USA
[3] AITOE Labs, Mumbai, Maharashtra, India
[4] Univ Massachusetts, Amherst, MA 01003 USA
DOI
10.1109/WACV.2019.00142
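Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract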
State-of-the-art computer vision techniques based on supervised machine learning are in general data hungry. Curating their data poses the challenges of expensive human labeling, inadequate computing resources and longer experiment turnaround times. Training data subset selection and active learning techniques have been proposed as possible solutions to these challenges. A special class of subset selection functions naturally models notions of diversity, coverage and representation and can be used to eliminate redundancy, which makes these functions well suited to training data subset selection. They can also help improve the efficiency of active learning, further reducing human labeling effort, by selecting a subset of the examples obtained using conventional uncertainty-sampling-based techniques. In this work, we empirically demonstrate the effectiveness of two diversity models, namely the Facility-Location and Dispersion models, for training data subset selection and for reducing labeling effort. We demonstrate this across a variety of computer vision tasks including Gender Recognition, Face Recognition, Scene Recognition, Object Detection and Object Recognition. Our results show that diversity-based subset selection done in the right way can increase accuracy by up to 5-10% over existing baselines, particularly in settings in which less training data is available. This allows the training of complex machine learning models like Convolutional Neural Networks with much less training data and lower labeling costs while incurring minimal performance loss.
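The abstract names two diversity models but does not give their formulas. As a rough illustration only, the sketch below uses commonly cited textbook forms: facility location as f(S) = sum over i of max over j in S of sim[i, j], and dispersion as the minimum pairwise distance within S. The function names, the greedy heuristics, the similarity transform and the toy data are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def facility_location_greedy(sim, k):
    """Greedily pick k indices to maximize sum_i max_{j in S} sim[i, j].

    Assumes sim is an (n, n) array of non-negative similarities.
    """
    n = sim.shape[0]
    selected = []
    chosen = np.zeros(n, dtype=bool)
    best = np.zeros(n)  # best[i] = max similarity of point i to the selected set
    for _ in range(k):
        # Marginal gain of adding each candidate column j to the selected set.
        gains = np.maximum(sim, best[:, None]).sum(axis=0) - best.sum()
        gains[chosen] = -np.inf  # never pick the same point twice
        j = int(np.argmax(gains))
        selected.append(j)
        chosen[j] = True
        best = np.maximum(best, sim[:, j])
    return selected


def dispersion_greedy(dist, k, start=0):
    """Farthest-point heuristic: repeatedly add the point whose minimum
    distance to the already selected points is largest."""
    selected = [start]
    min_dist = dist[:, start].astype(float)
    min_dist[start] = -np.inf
    for _ in range(k - 1):
        j = int(np.argmax(min_dist))
        selected.append(j)
        min_dist = np.minimum(min_dist, dist[:, j])
        min_dist[j] = -np.inf
    return selected


# Toy data (hypothetical): two tight clusters of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
sim = dist.max() - dist  # one simple way to get non-negative similarities

fl = facility_location_greedy(sim, k=2)
dp = dispersion_greedy(dist, k=2)
print("facility location picks:", fl)
print("dispersion picks:", dp)
```

On this toy input both heuristics select one point from each cluster, which is the redundancy-eliminating behavior the abstract attributes to diversity models. A real pipeline would presumably compute similarities on learned image features rather than raw coordinates.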
Pages: 1289-1299
Page count: 11