I just implemented the PPO algorithm in TensorFlow, strictly following the algorithm given in the original PPO paper by Schulman et al., 2017.
Previously I did some experiments with the DDPG algorithm by Lillicrap et al., 2016, in which they employ a target Q-function in order to stabilize training. However, in the PPO paper they do not seem to use a target value function.
Why is no target value function needed in the PPO algorithm? And would there be any benefit to using a target value function with soft updates in PPO?
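To make it clear what I mean by "soft updates", here is a minimal sketch of how I would add a Polyak-averaged target value network on top of my PPO value function; the network architecture, observation size, and tau value are just placeholders for illustration:

```python
import tensorflow as tf

def make_value_net(obs_dim):
    # Simple MLP state-value network V(s).
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(obs_dim,)),
        tf.keras.layers.Dense(64, activation="tanh"),
        tf.keras.layers.Dense(64, activation="tanh"),
        tf.keras.layers.Dense(1),
    ])

obs_dim = 8  # placeholder observation dimension
value_net = make_value_net(obs_dim)
target_value_net = make_value_net(obs_dim)
target_value_net.set_weights(value_net.get_weights())  # start in sync

tau = 0.005  # soft-update coefficient, as in DDPG

def soft_update(online, target, tau):
    # Polyak averaging: target <- tau * online + (1 - tau) * target
    for w_online, w_target in zip(online.variables, target.variables):
        w_target.assign(tau * w_online + (1.0 - tau) * w_target)

# After each gradient step on value_net, call:
#   soft_update(value_net, target_value_net, tau)
# and use target_value_net when bootstrapping the value targets.
```

In DDPG this kind of slowly moving target stabilizes the bootstrapped Q-targets, so I am wondering whether the same trick would help (or is simply unnecessary) for the value function used in PPO's advantage estimation.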