Is it possible to apply policy gradient methods if the policy is not differentiable with respect to its parameters? If not, is there another algorithm for optimizing this type of policy?
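One example I'm thinking about is a hard decision boundary: if $W^T x > 0$ then take action $a_0$, and if $W^T x \leq 0$ then take action $a_1$. Here the parameter is the vector $W$, and the chosen action is piecewise constant in $W$: it is not differentiable at the boundary, and its gradient is zero everywhere else.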
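To make the issue concrete, here is a minimal Python sketch of the hard-boundary policy described above. The function names `hard_policy` and `finite_difference` and the sample values of `w` and `x` are made up for illustration, and the action indices 0 and 1 stand for $a_0$ and $a_1$. It only shows that a small perturbation of $W$ leaves the chosen action unchanged, so a gradient-based update would have nothing to follow.

```python
def hard_policy(w, x):
    """Hard boundary: action 0 (a_0) if w.x > 0, otherwise action 1 (a_1)."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 0 if score > 0 else 1


def finite_difference(w, x, eps=1e-4):
    """Central finite difference of the chosen action index w.r.t. each w_i."""
    grads = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        diff = hard_policy(w_plus, x) - hard_policy(w_minus, x)
        grads.append(diff / (2 * eps))
    return grads


# Illustrative values (assumptions, not taken from any real task).
w = [0.5, -0.2]
x = [1.0, 2.0]

print(hard_policy(w, x))        # action chosen at the current parameters
print(finite_difference(w, x))  # zero in every coordinate away from the boundary
```

With these values the score $W^T x$ is about 0.1, well away from the boundary, so nudging either coordinate of $W$ by `eps` does not flip the action.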
I believe this question is fairly general, since deterministic policies that choose among discrete actions are typically non-differentiable with respect to their parameters.