With the emergence of edge computing, the problem of offloading jobs between an Edge Device (ED) and an Edge Server (ES) has received significant attention in the past. Motivated by the fact that an increasing number of applications use Machine Learning (ML) inference on data samples collected at the EDs, we study the problem of offloading inference jobs by considering the following novel aspects: 1) in contrast to a typical computational job, an inference job has an accuracy measure; 2) both the inference accuracy and the processing time of an inference job increase with the size of the ML model; and 3) recently proposed Deep Neural Networks (DNNs) for resource-constrained EDs provide the choice of scaling down the model size by trading off inference accuracy. We therefore consider a novel system with multiple small-size ML models at the ED and a powerful large-size ML model at the ES, and study the problem of offloading inference jobs with the objective of maximizing the total inference accuracy at the ED subject to a time constraint T on the makespan. Noting that the problem is NP-hard, we propose an approximation algorithm, Accuracy Maximization using LP-Relaxation and Rounding (AMR2), and prove that it results in a makespan of at most 2T and achieves a total accuracy that is lower than the optimal total accuracy by at most a small constant (less than 1). As proof of concept, we implemented AMR2 on a Raspberry Pi, equipped with MobileNets, connected via LAN to a server equipped with ResNet, and studied the total accuracy and makespan performance of AMR2 for image classification.
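
To make the approach concrete, the following is a minimal Python sketch of an LP-relaxation-and-rounding offloading heuristic in the spirit of AMR2. The formulation here is a simplifying assumption (per-job option accuracies and times, separate ED and ES time budgets of T, and a largest-fraction rounding rule), not the paper's exact linear program or rounding step; the names `amr2_sketch`, `acc`, `t`, and `T` are illustrative.

```python
# Illustrative sketch of LP-relaxation-and-rounding offloading (assumed
# simplified formulation, not the paper's exact program): each job j is
# assigned to one option k, where k = 0..m-1 are small ED models and
# k = m is offloading to the ES, subject to ED and ES time budgets of T.
import numpy as np
from scipy.optimize import linprog

def amr2_sketch(acc, t, T):
    """acc[j, k], t[j, k]: accuracy and processing time of job j under
    option k. Solves the LP relaxation of the assignment problem, then
    rounds each job to its largest fractional assignment."""
    n, K = acc.shape
    # Variables x[j, k] flattened row-major; maximize sum(acc * x),
    # i.e. minimize -acc . x.
    c = -acc.ravel()
    # Each job is assigned exactly once: sum_k x[j, k] = 1.
    A_eq = np.zeros((n, n * K))
    for j in range(n):
        A_eq[j, j * K:(j + 1) * K] = 1.0
    b_eq = np.ones(n)
    # Time budgets: total ED time <= T, total ES time <= T.
    A_ub = np.zeros((2, n * K))
    for j in range(n):
        A_ub[0, j * K:j * K + K - 1] = t[j, :-1]   # ED model options
        A_ub[1, j * K + K - 1] = t[j, -1]          # ES (offloading) option
    b_ub = np.array([T, T])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * (n * K))
    assert res.status == 0, "LP relaxation infeasible under budget T"
    x = res.x.reshape(n, K)
    # Rounding: pick the option with the largest fractional mass per job.
    # Intuition for the 2T bound: a basic LP solution has few fractional
    # jobs, so rounding them up exceeds each budget by at most one job's
    # processing time, which is at most T for any feasible single job.
    return x.argmax(axis=1)

# Toy usage: 4 jobs, 2 ED models plus offloading, budget T = 1.0.
rng = np.random.default_rng(0)
acc = rng.uniform(0.6, 0.99, size=(4, 3))
acc[:, -1] = 0.99                       # the ES model is the most accurate
t = rng.uniform(0.1, 0.5, size=(4, 3))  # hypothetical per-option times
print(amr2_sketch(acc, t, T=1.0))       # one chosen option index per job
```

The LP relaxation replaces the binary assignment variables with values in [0, 1], which is what makes the problem solvable in polynomial time; the rounding step is where the bounded makespan violation (at most 2T) and the small additive accuracy loss arise.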