Reinforcement learning (RL) has been shown to alleviate the inconsistency between training and evaluation metrics and the exposure bias in neural machine translation (NMT). However, its sample efficiency is limited by the sampling method, whether Temporal-Difference (TD) or Monte-Carlo (MC), and it still cannot compensate for the inefficient non-zero rewards caused by insufficient data. In addition, RL rewards are effective only once the model parameters are largely determined. We therefore propose an episodic control reinforcement learning method. It obtains a model with largely determined parameters through knowledge transfer and records historical action trajectories in a semi-tabular differentiable neural dictionary (DND), so that the model can quickly approximate the true state value from sampled rewards when updating the policy. We evaluate the method on the CCMT2019 Mongolian-Chinese (Mo-Zh), Tibetan-Chinese (Ti-Zh), and Uyghur-Chinese (Ug-Zh) tasks. The results show a significant improvement in translation quality, demonstrating the effectiveness of the method.
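
To make the role of the DND concrete, the following is a minimal sketch of a semi-tabular memory that stores (state embedding, value) pairs and reads out a state-value estimate as a kernel-weighted average of the nearest stored entries. It is an illustration under stated assumptions, not the implementation evaluated here: the class name `DND`, the inverse-distance kernel, the learning rate `alpha`, the smoothing constant `delta`, and the neighbour count `k` are all hypothetical choices borrowed from the general episodic control literature. In an automatic-differentiation framework the weighted read would be the differentiable part; this pure-Python version only shows the arithmetic.

```python
class DND:
    """Toy DND-style episodic memory (all names and defaults are assumptions)."""

    def __init__(self, alpha=0.5, delta=1e-3, k=2):
        self.keys = []      # stored state embeddings
        self.values = []    # value estimate attached to each key
        self.alpha = alpha  # step size for updating an existing entry
        self.delta = delta  # keeps the kernel finite at zero distance
        self.k = k          # number of nearest neighbours used in a read

    @staticmethod
    def _sq_dist(a, b):
        # Squared Euclidean distance between two embeddings.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def write(self, key, target):
        # Tabular part: an exact key match moves the stored value toward
        # the new return estimate; otherwise a new entry is appended.
        for i, stored in enumerate(self.keys):
            if self._sq_dist(stored, key) == 0.0:
                self.values[i] += self.alpha * (target - self.values[i])
                return
        self.keys.append(list(key))
        self.values.append(float(target))

    def lookup(self, query):
        # Read: inverse-distance-weighted average over the k nearest keys.
        if not self.keys:
            return 0.0
        nearest = sorted(
            ((self._sq_dist(query, key), value)
             for key, value in zip(self.keys, self.values)),
            key=lambda pair: pair[0],
        )[: self.k]
        weights = [1.0 / (dist + self.delta) for dist, _ in nearest]
        total = sum(weights)
        return sum(w * v for w, (_, v) in zip(weights, nearest)) / total


memory = DND()
memory.write([0.0, 0.0], 1.0)   # new entry
memory.write([1.0, 1.0], 0.0)   # new entry
memory.write([0.0, 0.0], 0.0)   # existing key: value moves from 1.0 toward 0.0
estimate = memory.lookup([0.1, 0.0])
```

Because a read only needs stored returns from nearby states, a single rewarded trajectory can influence the value estimate immediately, which is the property the method relies on when non-zero rewards are scarce.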