Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision

Cited by: 29
Authors
Kaushal, Vishal [1]
Iyer, Rishabh [2]
Kothawade, Suraj [1]
Mahadev, Rohan [3]
Doctor, Khoshrav [4]
Ramakrishnan, Ganesh [1]
Affiliations
[1] Indian Inst Technol, Mumbai, Maharashtra, India
[2] Microsoft, Redmond, WA USA
[3] AITOE Labs, Mumbai, Maharashtra, India
[4] Univ Massachusetts, Amherst, MA 01003 USA
DOI
10.1109/WACV.2019.00142
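Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject classification codes
0808; 0809
Abstract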
State-of-the-art computer vision techniques based on supervised machine learning are, in general, data hungry. Curating their data poses the challenges of expensive human labeling, inadequate computing resources and longer experiment turnaround times. Training data subset selection and active learning techniques have been proposed as possible solutions to these challenges. A special class of subset selection functions naturally models notions of diversity, coverage and representation and can be used to eliminate redundancy, which makes these functions well suited to training data subset selection. They can also improve the efficiency of active learning, further reducing human labeling effort, by selecting a subset of the examples obtained using conventional uncertainty-sampling-based techniques. In this work, we empirically demonstrate the effectiveness of two diversity models, namely the Facility-Location and Dispersion models, for training data subset selection and for reducing labeling effort. We demonstrate this for a variety of computer vision tasks including Gender Recognition, Face Recognition, Scene Recognition, Object Detection and Object Recognition. Our results show that diversity-based subset selection done in the right way can increase accuracy by up to 5-10% over existing baselines, particularly in settings in which less training data is available. This allows the training of complex machine learning models like Convolutional Neural Networks with much less training data and lower labeling costs while incurring minimal performance loss.
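The two diversity models named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used textbook forms of these functions, which may differ from the exact formulation in the paper: Facility-Location scores a subset S by the sum over all points of each point's maximum similarity to S, and Dispersion scores S by the minimum pairwise distance within S. The similarity 1 / (1 + Euclidean distance), the greedy selection loops, and the function names are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def facility_location_greedy(points, k):
    """Greedily pick k indices to maximize sum_i max_{j in S} sim(i, j)."""
    n = len(points)
    # Assumed similarity: 1 / (1 + distance); any non-negative similarity works.
    sim = [[1.0 / (1.0 + euclidean(p, q)) for q in points] for p in points]
    best = [0.0] * n  # best[i] = max similarity of point i to the selected set
    selected = []
    for _ in range(min(k, n)):
        # Marginal gain of adding candidate j to the current selection.
        gains = {
            j: sum(max(sim[i][j] - best[i], 0.0) for i in range(n))
            for j in range(n)
            if j not in selected
        }
        j_star = max(gains, key=gains.get)
        selected.append(j_star)
        best = [max(best[i], sim[i][j_star]) for i in range(n)]
    return selected


def dispersion_greedy(points, k, start=0):
    """Farthest-first heuristic: add the point farthest from the selected set."""
    n = len(points)
    selected = [start]
    # min_dist[i] = distance from point i to its nearest selected point
    min_dist = [euclidean(p, points[start]) for p in points]
    while len(selected) < min(k, n):
        candidates = [j for j in range(n) if j not in selected]
        j_star = max(candidates, key=lambda j: min_dist[j])
        selected.append(j_star)
        min_dist = [
            min(min_dist[i], euclidean(points[i], points[j_star]))
            for i in range(n)
        ]
    return selected


if __name__ == "__main__":
    # Toy features: two tight clusters of three points each.
    features = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
    print(facility_location_greedy(features, 2))
    print(dispersion_greedy(features, 2))
```

On the toy data, both selectors pick one point from each cluster rather than two near-duplicates, which is the redundancy-elimination behavior the abstract describes. In an active learning round, the same functions could be applied to the feature vectors of the candidates returned by uncertainty sampling rather than to the full pool.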
Pages: 1289-1299
Page count: 11
Related papers
50 in total
  • [11] Unified Fairness from Data to Learning Algorithm
    Zhang, Yanfu
    Luo, Lei
    Huang, Heng
    2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), 2021, : 1499 - 1504
  • [12] Active Learning With Optimal Instance Subset Selection
    Fu, Yifan
    Zhu, Xingquan
    Elmagarmid, Ahmed K.
    IEEE TRANSACTIONS ON CYBERNETICS, 2013, 43 (02) : 464 - 475
  • [13] Data Efficient Lithography Modeling With Transfer Learning and Active Data Selection
    Li, Yibo
    Li, Meng
    Watanabe, Yuki
    Kimura, Taiki
    Matsunawa, Tetsuaki
    Nojima, Shigeki
    Pan, David Z.
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2019, 38 (10) : 1900 - 1913
  • [14] GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning
    Killamsetty, Krishnateja
    Sivasubramanian, Durga
    Ramakrishnan, Ganesh
    Iyer, Rishabh
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8110 - 8118
  • [15] Unified Programming Model and Software Framework for Big Data Machine Learning and Data Analytics
    Gu, Rong
    Tang, Yun
    Dong, Qianhao
    Wang, Zhaokang
    Liu, Zhiqiang
    Wang, Shuai
    Yuan, Chunfeng
    Huang, Yihua
    IEEE 39TH ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE WORKSHOPS (COMPSAC 2015), VOL 3, 2015, : 562 - 567
  • [16] Sample Selection based Active Learning for Imbalanced Data
    Chairi, Ikram
    Alaoui, Souad
    Lyhyaoui, Abdelouahid
    10TH INTERNATIONAL CONFERENCE ON SIGNAL-IMAGE TECHNOLOGY AND INTERNET-BASED SYSTEMS SITIS 2014, 2014, : 645 - 651
  • [17] Dual Active Learning for Both Model and Data Selection
    Tang, Ying-Peng
    Huang, Sheng-Jun
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 3052 - 3058
  • [18] Practical Active Learning with Model Selection for Small Data
    Pardakhti, Maryam
    Mandal, Nila
    Ma, Anson W. K.
    Yang, Qian
    20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 1647 - 1653
  • [19] Unlabeled data selection for active learning in image classification
    Xiongquan Li
    Xukang Wang
    Xuhesheng Chen
    Yao Lu
    Hongpeng Fu
    Ying Cheng Wu
    Scientific Reports, 14
  • [20] Unlabeled data selection for active learning in image classification
    Li, Xiongquan
    Wang, Xukang
    Chen, Xuhesheng
    Lu, Yao
    Fu, Hongpeng
    Wu, Ying Cheng
    SCIENTIFIC REPORTS, 2024, 14 (01)