Large-scale structural learning and predicting via hashing approximation

Cited by: 0
Authors
Dandan Chen
Yingjie Tian
Institutions
[1] University of Chinese Academy of Sciences, School of Mathematical Sciences
[2] Chinese Academy of Sciences, Research Center on Fictitious Economy and Data Science, Key Laboratory of Big Data Mining and Knowledge Management
Keywords
Nonparallel support vector machine; Structural information; Locality-sensitive hashing; Minwise hashing
DOI: Not available
Abstract
By combining structural information with the nonparallel support vector machine, the structural nonparallel support vector machine (SNPSVM) can fully exploit prior knowledge to directly improve the algorithm's generalization capacity. However, the scalability issue of how to train SNPSVM efficiently on very high-dimensional data has not been studied. In this paper, we integrate linear SNPSVM with the b-bit minwise hashing scheme to speed up the training phase for large-scale, high-dimensional statistical learning, and we then address the problem of speeding up its prediction phase via locality-sensitive hashing. For one-against-one multi-class classification problems, a two-stage strategy is put forward: in the first stage, a series of hash-based classifiers are built to approximate the exact results and filter the hypothesis space; in the second stage, the classification is refined by solving a multi-class SNPSVM on the remaining classes. The proposed method can deal with large-scale classification problems involving a huge number of features. Experimental results on two large-scale datasets (news20 and webspam) demonstrate the efficiency of structural learning via b-bit minwise hashing. Experimental results on the ImageNet-BOF dataset and several large-scale UCI datasets show that the proposed hash-based prediction can be more than two orders of magnitude faster than the exact classifier, with only minor losses in quality.
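As a rough illustration of the training-side compression described in the abstract, the Python sketch below (a minimal sketch, not the authors' code; the function name bbit_minwise_features, the parameter values k = 200 and b = 8, and the use of random linear hash functions in place of true random permutations are all assumptions) maps one binary bag-of-words sample, given as a set of nonzero feature indices, to the k * 2**b-dimensional expanded binary vector on which a linear classifier such as linear SNPSVM could then be trained.

import numpy as np

def bbit_minwise_features(feature_indices, k=200, b=8, seed=0):
    """Expand a set of nonzero feature indices into a k * 2**b binary vector.

    feature_indices : iterable of int, nonzero columns of one binary sample
                      (assumed to fit in 32 bits so int64 products do not overflow).
    k               : number of independent minwise hash functions (assumed value).
    b               : number of lowest bits kept from each minimum (assumed value).
    """
    prime = 2147483647  # Mersenne prime 2**31 - 1 for the random linear hashes
    rng = np.random.default_rng(seed)
    a = rng.integers(1, prime, size=k, dtype=np.int64)
    c = rng.integers(0, prime, size=k, dtype=np.int64)

    idx = np.asarray(sorted(feature_indices), dtype=np.int64)
    # Minimum hash value under each of the k hash functions h_j(x) = (a_j*x + c_j) mod prime.
    mins = ((a[:, None] * idx[None, :] + c[:, None]) % prime).min(axis=1)
    # Keep only the lowest b bits of each minimum.
    low_bits = mins & ((1 << b) - 1)

    # One-hot encode each b-bit value into its own block of length 2**b and concatenate,
    # giving a sparse binary vector usable by any linear classifier.
    expanded = np.zeros(k * (1 << b), dtype=np.float32)
    expanded[np.arange(k) * (1 << b) + low_bits] = 1.0
    return expanded

# Usage: two samples that share most nonzero features produce many matching hash blocks,
# so their resemblance is approximately preserved in the compressed space.
doc_a = {3, 17, 42, 10000, 250000}
doc_b = {3, 17, 42, 10000, 999999}
xa, xb = bbit_minwise_features(doc_a), bbit_minwise_features(doc_b)
print(int(xa @ xb), "of 200 hash blocks agree")

Under these assumptions, a linear model trained on the expanded vectors operates in a k * 2**b-dimensional space rather than in the original feature space with millions of dimensions, which is where the training-phase speedup comes from.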
Pages: 2889 - 2903
Number of pages: 14
Related papers
50 records in total
  • [31] Scalable Supervised Discrete Hashing for Large-Scale Search
    Luo, Xin
    Wu, Ye
    Xu, Xin-Shun
    WEB CONFERENCE 2018: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW2018), 2018, : 1603 - 1612
  • [32] Neighborhood Discriminant Hashing for Large-Scale Image Retrieval
    Tang, Jinhui
    Li, Zechao
    Wang, Meng
    Zhao, Ruizhen
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (09) : 2827 - 2840
  • [33] Supervised Distributed Hashing for Large-Scale Multimedia Retrieval
    Zhai, Deming
    Liu, Xianming
    Ji, Xiangyang
    Zhao, Debin
    Satoh, Shin'ichi
    Gao, Wen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (03) : 675 - 686
  • [34] Semi-Supervised Hashing for Large-Scale Search
    Wang, Jun
    Kumar, Sanjiv
    Chang, Shih-Fu
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (12) : 2393 - 2406
  • [35] Unsupervised Multiview Distributed Hashing for Large-Scale Retrieval
    Shen, Xiaobo
    Tang, Yunpeng
    Zheng, Yuhui
    Yuan, Yun-Hao
    Sun, Quan-Sen
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (12) : 8837 - 8848
  • [36] Cascaded Deep Hashing for Large-Scale Image Retrieval
    Lu, Jun
    Zhang, Li
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT VI, 2018, 11306 : 419 - 429
  • [37] Distance variety preserving hashing for large-scale retrieval
    Zhai, Sheping
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, 31 (12):
  • [38] Large-scale eigenvector approximation via Hilbert Space Embedding Nystrom
    Lin, Ming
    Wang, Fei
    Zhang, Changshui
    PATTERN RECOGNITION, 2015, 48 (05) : 1904 - 1912
  • [39] LARGE-SCALE STRUCTURE AND THE ADHESION APPROXIMATION
    WEINBERG, DH
    GUNN, JE
    MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 1990, 247 (02) : 260 - 286
  • [40] Scheduling Large-scale Distributed Training via Reinforcement Learning
    Peng, Zhanglin
    Ren, Jiamin
    Zhang, Ruimao
    Wu, Lingyun
    Wang, Xinjiang
    Luo, Ping
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 1797 - 1806