The most widely adopted approach for extracting knowledge from raw data generated at the edge of the Internet (e.g., by IoT or personal mobile devices) is to collect the data on global cloud platforms, where it is then analysed. However, as the number of devices spread in the physical environment grows, this approach raises several concerns. The data gravity concept, one of the foundations of Fog and Mobile Edge Computing, points towards decentralising the computation for data analysis, i.e., performing it closer to where data is generated, for both scalability and privacy reasons. Hence, data produced by devices can be processed in one of the following ways: (i) directly on the devices that collected it; (ii) in the cloud; or (iii) through fog/mobile edge computing techniques, i.e., at intermediate nodes in the network that run distributed analytics after collecting subsets of the data. Clearly, (i) and (ii) are the two extreme cases of (iii). It is worth noting that the same analytics task executed at different collection points in the network comes at different costs in terms of the traffic generated over the network. Precisely, these costs comprise the traffic generated to move data towards the selected collection point (e.g., the Edge or the Cloud) and the traffic induced by the distributed analytics process itself. Deciding whether to use intermediate collection points, and which ones, so as to both achieve a target accuracy and minimise network traffic, remains an open question. In this paper, we propose an analytical framework to address this problem. Precisely, we consider learning tasks and define a model linking the accuracy of a learning task performed with a certain set of collection points to the corresponding network traffic. Given the specification of the learning problem (e.g., binary classification, regression, etc.) and its target accuracy, the model identifies the optimal level at which to collect data in order to minimise the total network cost. We validate the model through simulations, showing that collecting data at the intermediate level indicated by the model leads to the minimum cost for the target accuracy.
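The trade-off described above can be sketched as a toy selection problem: among the candidate collection levels, pick the cheapest one whose expected accuracy still meets the target. Note that this is only an illustrative sketch, not the paper's analytical model; the traffic and accuracy figures below are made-up assumptions.

```python
# Toy sketch of the collection-level decision (illustrative only).
# Each level has an assumed total network traffic (arbitrary units)
# and an assumed achievable accuracy for the learning task.

def optimal_collection_level(levels, target_accuracy):
    """Return the level with minimum traffic among those meeting the target."""
    feasible = [lv for lv in levels if lv["accuracy"] >= target_accuracy]
    if not feasible:
        return None  # no collection level reaches the target accuracy
    return min(feasible, key=lambda lv: lv["traffic"])

# Hypothetical profiles for the three approaches:
# (i) on-device, (iii) intermediate edge nodes, (ii) cloud.
levels = [
    {"name": "device", "traffic": 1.0, "accuracy": 0.78},  # no data movement
    {"name": "edge",   "traffic": 3.5, "accuracy": 0.90},  # partial aggregation
    {"name": "cloud",  "traffic": 9.0, "accuracy": 0.93},  # all raw data moved
]

best = optimal_collection_level(levels, target_accuracy=0.88)
print(best["name"])  # -> edge: meets the target at lower traffic than cloud
```

In this made-up instance, the intermediate (edge) level dominates: it satisfies the accuracy target while generating far less traffic than shipping all raw data to the cloud, which is precisely the kind of outcome the framework is meant to predict analytically.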