In this paper we advocate that market mechanisms inspired by economics, in conjunction with intelligent data selection, are the key to fulfilling learning tasks in the presence of big data subject to users' privacy concerns. We design a market for private data that are gathered to build a classifier. Each data owner has a private cost that quantifies his discomfort in providing data to the learner. In addition, at each stage of the learning process, each data owner is characterized by a utility score, which expresses the utility of his data to the learner. The learner initiates a call for buying data, and interested owners respond by declaring their costs. The learner computes the utility score of each data owner. It then selects one owner to buy data from and determines the associated payment. For the goals of minimizing the expected privacy cost of users and of minimizing the expected payment by the learner, we propose a variation of a Vickrey-Clarke-Groves (VCG) auction and an optimal auction, respectively. For the case where the data arrive in a streaming fashion, we formulate the multi-round sequential decision version of the problem of learning the classifier. At each round the learner decides whether to stop the learning process given the current classifier accuracy or to perform one more auction. The problem amounts to weighing the current cost of classifier inaccuracy against the expected (privacy or reimbursement) costs incurred through the auction, plus the expected cost associated with the new classifier's accuracy; we cast this as an optimal stopping problem. The complexity of our framework scales only linearly with the size of the data set, and the framework applies to a broad range of private-data scenarios and data attributes.
Affiliation:
Univ Calif Los Angeles, Anderson Sch Management, Los Angeles, CA 90095 USA
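
The abstract does not spell out the auction rules, so the following is only a minimal sketch of one round under stated assumptions, not the paper's mechanism. It assumes a generic VCG-style scoring rule: the learner picks the owner with the largest surplus (utility score minus declared cost) and pays the largest cost that owner could have declared while still winning. The names `Bid`, `run_auction`, `reserve_surplus`, and `should_stop` are hypothetical, and `should_stop` is a myopic one-step comparison of the quantities named in the abstract rather than a full optimal stopping solution.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bid:
    owner: str      # identifier of the data owner
    cost: float     # privacy cost declared by the owner
    utility: float  # utility score computed by the learner


def run_auction(bids: List[Bid],
                reserve_surplus: float = 0.0) -> Optional[Tuple[str, float]]:
    """Assumed VCG-style rule: the highest surplus wins and is paid its critical cost.

    Returns (winner, payment), or None if no bid reaches the reserve surplus.
    """
    if not bids:
        return None
    # Rank owners by surplus = utility - declared cost (a single linear-time
    # pass would also work; sorting keeps the sketch short).
    ranked = sorted(bids, key=lambda b: b.utility - b.cost, reverse=True)
    winner = ranked[0]
    if winner.utility - winner.cost < reserve_surplus:
        return None
    # The winner must beat the runner-up's surplus, or the reserve if it is alone.
    threshold = reserve_surplus
    if len(ranked) > 1:
        threshold = max(threshold, ranked[1].utility - ranked[1].cost)
    # Critical cost: the largest declared cost at which the winner still wins.
    payment = winner.utility - threshold
    return winner.owner, payment


def should_stop(current_inaccuracy_cost: float,
                expected_auction_cost: float,
                expected_next_inaccuracy_cost: float) -> bool:
    """Myopic rule (assumption): stop when one more round is not expected to pay off."""
    return current_inaccuracy_cost <= (expected_auction_cost
                                       + expected_next_inaccuracy_cost)


bids = [Bid("A", cost=2.0, utility=10.0),
        Bid("B", cost=1.0, utility=6.0),
        Bid("C", cost=4.0, utility=7.0)]
result = run_auction(bids)
```

In this example owner A has the largest surplus (8 against 5 and 3) and is paid 5, which is at least its declared cost of 2. Because the payment does not depend on A's own declaration, misreporting cannot raise it, which is the property that motivates VCG-style rules.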