We propose a new method for generating a program from a Reinforcement Learning (RL) policy. Unlike previous methods, it exploits more RL-specific elements, such as the critic value network. Improved actions derived from the critic steer a Genetic Programming process through its fitness function.
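
As a rough illustration of how critic-improved actions could enter a fitness function, the following minimal sketch uses toy stand-ins throughout. The names `critic`, `policy`, `improve_action`, and `fitness` are hypothetical, the quadratic critic is an assumption made so the example stays self-contained, and the finite-difference ascent and squared-error fitness are illustrative choices that are not necessarily the ones used in this work.

```python
# Hypothetical stand-in for a learned critic Q(s, a); it peaks at a = 2 * s.
def critic(state, action):
    return -(action - 2.0 * state) ** 2

# Hypothetical stand-in for the trained policy; deliberately suboptimal.
def policy(state):
    return 1.5 * state

def improve_action(state, action, step=0.1, iters=50, eps=1e-4):
    """Move the action uphill on the critic via a finite-difference gradient."""
    for _ in range(iters):
        grad = (critic(state, action + eps) - critic(state, action - eps)) / (2 * eps)
        action += step * grad
    return action

def fitness(program, states):
    """Negative mean squared error between the program's outputs and the
    critic-improved actions (higher is better)."""
    total = 0.0
    for s in states:
        target = improve_action(s, policy(s))
        total += (program(s) - target) ** 2
    return -total / len(states)

# Two candidate programs, as a Genetic Programming population might contain.
states = [0.5, 1.0, 1.5, 2.0]
candidates = {
    "copy_policy": lambda s: 1.5 * s,   # imitates the raw policy
    "match_critic": lambda s: 2.0 * s,  # agrees with the critic's optimum
}
ranked = sorted(candidates, key=lambda name: fitness(candidates[name], states), reverse=True)
print(ranked)
```

In this toy setting the candidate that agrees with the critic ranks above the one that merely copies the policy, which is the kind of selection pressure such a fitness function is meant to provide.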