Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision

Cited by: 29
Authors
Kaushal, Vishal [1]
Iyer, Rishabh [2]
Kothawade, Suraj [1]
Mahadev, Rohan [3]
Doctor, Khoshrav [4]
Ramakrishnan, Ganesh [1]
Affiliations
[1] Indian Institute of Technology, Mumbai, Maharashtra, India
[2] Microsoft, Redmond, WA, USA
[3] AITOE Labs, Mumbai, Maharashtra, India
[4] University of Massachusetts, Amherst, MA 01003, USA
DOI
10.1109/WACV.2019.00142
Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
State-of-the-art computer vision techniques based on supervised machine learning are, in general, data hungry. Curating their training data poses several challenges: expensive human labeling, inadequate computing resources and long experiment turnaround times. Training data subset selection and active learning have been proposed as possible solutions to these challenges. A special class of subset selection functions naturally models notions of diversity, coverage and representation, and can be used to eliminate redundancy, which makes these functions well suited to training data subset selection. They can also make active learning more efficient, further reducing human labeling effort, by selecting a subset of the examples obtained with conventional uncertainty sampling techniques. In this work, we empirically demonstrate the effectiveness of two diversity models, namely the Facility-Location and Dispersion models, for training data subset selection and for reducing labeling effort. We demonstrate this on a variety of computer vision tasks, including Gender Recognition, Face Recognition, Scene Recognition, Object Detection and Object Recognition. Our results show that diversity-based subset selection, done in the right way, can increase accuracy by up to 5-10% over existing baselines, particularly in settings in which less training data is available. This allows complex machine learning models such as Convolutional Neural Networks to be trained with much less training data and lower labeling costs, while incurring minimal performance loss.
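The abstract names the Facility-Location and Dispersion models and describes applying a diversity step on top of uncertainty sampling, but gives no formulas. The following is a minimal sketch assuming the common textbook forms: Facility-Location as f(S) = sum over all points i of max over selected j of sim(i, j), maximized greedily, and Dispersion as the max-min pairwise distance objective, approximated with a farthest-first greedy. The Gaussian similarity and its bandwidth `sigma`, the entropy uncertainty score, the seed point for dispersion, and all function names are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rbf_similarity(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """Gaussian similarity in (0, 1]; sigma is an assumed bandwidth."""
    return math.exp(-euclidean(a, b) ** 2 / (2 * sigma ** 2))


def facility_location_greedy(features: List[Vector], budget: int,
                             sigma: float = 1.0) -> List[int]:
    """Greedily maximize f(S) = sum_i max_{j in S} sim(i, j)."""
    n = len(features)
    sim = [[rbf_similarity(features[i], features[j], sigma) for j in range(n)]
           for i in range(n)]
    cover = [0.0] * n  # best similarity of each point to the selected set so far
    selected: List[int] = []
    for _ in range(min(budget, n)):
        best_gain, best_j = -1.0, -1
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much adding j improves each point's coverage.
            gain = sum(max(sim[i][j] - cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        cover = [max(cover[i], sim[i][best_j]) for i in range(n)]
    return selected


def dispersion_greedy(features: List[Vector], budget: int) -> List[int]:
    """Farthest-first greedy for max-min dispersion: each new point maximizes
    its minimum distance to the points already chosen."""
    n = len(features)
    if budget <= 0 or n == 0:
        return []
    selected = [0]  # assumption: seed with the first point
    min_dist = [euclidean(features[i], features[0]) for i in range(n)]
    while len(selected) < min(budget, n):
        j = max((i for i in range(n) if i not in selected),
                key=lambda i: min_dist[i])
        selected.append(j)
        min_dist = [min(min_dist[i], euclidean(features[i], features[j]))
                    for i in range(n)]
    return selected


def entropy(probs: Sequence[float]) -> float:
    """Predictive entropy, used here as the uncertainty score."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty_then_diversity(probs: List[Sequence[float]],
                               features: List[Vector], k: int, budget: int,
                               sigma: float = 1.0) -> List[int]:
    """Take the k most uncertain unlabeled points, then keep a diverse
    subset of size `budget` from them via facility location."""
    ranked = sorted(range(len(probs)), key=lambda i: entropy(probs[i]),
                    reverse=True)
    candidates = ranked[:k]
    picked = facility_location_greedy([features[i] for i in candidates],
                                      budget, sigma)
    return [candidates[p] for p in picked]
```

On a toy set with two tight pairs and one isolated point, for example `[[0, 0], [0.1, 0], [5, 5], [5.1, 5], [10, 0]]`, both selectors with a budget of 3 avoid picking two near-duplicates. The design difference is that facility location rewards representing every point with some similar selected point, while dispersion only spreads the chosen points apart, so it tends to reach outliers such as `[10, 0]` early.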
Pages: 1289-1299
Page count: 11
Related papers (50 in total)
  • [31] Query-learning-based iterative feature-subset selection for learning from high-dimensional data sets
    Mamitsuka, H
    KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 9(1): 91-108
  • [32] A general framework for learning rules from data
    Apolloni, B
    Esposito, A
    Malchiodi, D
    Orovas, C
    Palmas, G
    Taylor, JG
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2004, 15(6): 1333-1349
  • [33] Towards a Framework for Learning from Networked Data
    Ramon, Jan
    GRAPH-BASED REPRESENTATION AND REASONING, 2014, 8577: 25-30
  • [34] A Theoretical Framework for Learning from Quantum Data
    Heidari, Mohsen
    Padakandla, Arun
    Szpankowski, Wojciech
    2021 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2021: 1469-1474
  • [35] Optimal Training Data Selection in Active Learning for Discrimination and Classification
    Xu, Xiaojian
    Shay, Charlie
    2020 5TH INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION SYSTEMS (ICCCS 2020), 2020: 15-19
  • [36] Synthetic Data for Deep Learning in Computer Vision & Medical Imaging: A Means to Reduce Data Bias
    Paproki, Anthony
    Salvado, Olivier
    Fookes, Clinton
    ACM COMPUTING SURVEYS, 2024, 56(11)
  • [37] Online Active Learning Ensemble Framework for Drifted Data Streams
    Shan, Jicheng
    Zhang, Hang
    Liu, Weike
    Liu, Qingbao
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30(2): 486-498
  • [38] Lill-DATA - A Framework for Traceable Active Learning Projects
    Stieler, Fabian
    Elia, Miriam
    Weigell, Benjamin
    Bauer, Bernhard
    Kienle, Peter
    Roth, Anton
    Muellegger, Gregor
    Nann, Marius
    Dopfer, Sarah
    2023 IEEE 31ST INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE WORKSHOPS, REW, 2023: 465-474
  • [39] An optimization framework for classifier learning from image data for computer-assisted diagnosis
    Mennicke, J.
    Muenzenmayer, C.
    Wittenberg, T.
    Schmid, U.
    4TH EUROPEAN CONFERENCE OF THE INTERNATIONAL FEDERATION FOR MEDICAL AND BIOLOGICAL ENGINEERING, 2009, 22(1-3): 629-632
  • [40] A Unified Batch Selection Policy for Active Metric Learning
    Priyadarshini, K.
    Chaudhuri, Siddhartha
    Borkar, Vivek
    Chaudhuri, Subhasis
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT II, 2021, 12976: 599-616