RPL performs path selection in Low-power and Lossy Networks (LLNs), including the IoT. A routing policy in RPL is governed by its objective function, which reflects the requirements of the IoT application, e.g., energy efficiency and reliability in terms of Packet Delivery Ratio (PDR). In many applications, it is not possible to connect nodes to a power outlet; moreover, since nodes may be geographically inaccessible, replacing depleted batteries is infeasible. Hence, energy harvesters are an attractive alternative to traditional batteries, preventing the energy hole problem and consequently enhancing the lifetime and reliability of IoT networks. Nevertheless, the unstable level of energy absorption in harvesters necessitates a routing policy that accounts for harvesting behavior. Furthermore, since the rates of energy absorption and consumption vary widely across different parts of the network, learning-based techniques can be employed in the routing process to provide energy efficiency. Accordingly, this paper introduces LANTERN, a learning-based routing policy for improving PDR in energy-harvesting IoT networks. In addition to the rates of energy absorption and consumption, LANTERN utilizes the remaining energy of nodes in its routing policy. To this end, LANTERN introduces a novel routing metric called the Energy Exponential Moving Average (EEMA) to perform path selection. Diversified simulations conducted in Cooja show that LANTERN prolongs the network lifetime by 5.7x, mitigates the probability of the energy hole problem, and improves the PDR by up to 97% compared to the state-of-the-art, while reducing the energy consumed per successfully delivered packet by 76%.
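The abstract names EEMA but does not define it; as a point of reference, the sketch below shows a standard exponential moving average applied to per-interval net energy (harvested minus consumed). The smoothing factor `alpha`, the function names, and the way absorption and consumption are combined are illustrative assumptions, not LANTERN's actual definition.

```python
# Hypothetical sketch of an EMA-style energy metric in the spirit of EEMA.
# alpha and the net-energy formulation are assumptions for illustration only.

def ema(prev, sample, alpha=0.3):
    """Standard exponential moving average update."""
    return alpha * sample + (1 - alpha) * prev

def eema_update(eema_prev, harvested_mj, consumed_mj, alpha=0.3):
    """Smooth the node's net energy change over the last interval."""
    net = harvested_mj - consumed_mj  # positive: surplus, negative: drain
    return ema(eema_prev, net, alpha)

# Example: a node tracking (harvested, consumed) energy per interval.
eema = 0.0
for harvested, consumed in [(3.0, 1.0), (1.0, 2.0), (4.0, 1.0), (2.0, 1.5)]:
    eema = eema_update(eema, harvested, consumed)
```

A parent with a higher smoothed value would, under this sketch, be a more sustainable next hop, since the EMA dampens short-term fluctuations in harvesting while still tracking the trend.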