Different from traditional operational research optimization algorithms, Deep Learning can solve combinatorial optimization problems in real time and has been widely used. However, these models based on pointer network have difficulty in obtaining features on the graph, they are not conducive to solving problems that are modeled on the graph. Secondly, as the structure of deep learning models becomes more complex, the explanation and analysis of the models becomes more difficult. There is a lack of interpretable work on models, which seriously hinders the development of Deep Learning. In order to solve these problems, a policy network that can effectively encode features on the graph and is interpretable is proposed. Specifically, a model structure in the field of graph neural network is introduced to extract the features on the graph, and a policy network is built, the network is trained using Reinforcement Learning; an agent-based interpretability method is used to mine the features that be used as explanation in the initial feature, these mined features are used to explain the actions of policy network.The effectiveness of the above methods is verified by experiments for solving the Traveling Salesman Problem: Policy network can effectively encode the features on the graph and has good generalization ability; The interpretability experiment shows that the actions of the policy network can be explained, which proves the interpretability of the policy network.