This problem is sometimes called score estimation, because $\nabla_x \log p(x)$ is the score of the model $p$ with respect to a hypothetical location parameter. I've also seen the quantity called the Hyvärinen score, after Aapo Hyvärinen's score matching technique for fitting (unnormalized) statistical models.
Some flavors of score estimation:
From a density estimate
Just do any flavor of density estimation, $\hat p(x)$, and differentiate its log, perhaps using automatic differentiation as in the other answers.
- Kernel density estimation might be reasonable in fairly low dimensions:
$$
\hat p(x) = \frac1n \sum_{i=1}^n k(X_i, x)
\qquad
\nabla_x \log \hat p(x)
= \frac{\frac1n \sum_{i=1}^n \nabla_x k(X_i, x)}{\frac1n \sum_{i=1}^n k(X_i, x)}
.$$
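With a Gaussian kernel, for instance, autodiff gives you the score estimate essentially for free; here's a minimal sketch (the bandwidth choice and so on are placeholders, not recommendations):

```python
import jax
import jax.numpy as jnp

def log_kde(x, data, bandwidth=0.5):
    """Log of a Gaussian KDE fit to `data` (shape (n, d)), evaluated at one point x (shape (d,))."""
    d = data.shape[1]
    sq_dists = jnp.sum((data - x) ** 2, axis=1)
    log_kernels = -sq_dists / (2 * bandwidth**2) - 0.5 * d * jnp.log(2 * jnp.pi * bandwidth**2)
    return jax.scipy.special.logsumexp(log_kernels) - jnp.log(data.shape[0])

# The score estimate is just the gradient of the log of the density estimate.
score_hat = jax.grad(log_kde)  # score_hat(x, data) ≈ ∇_x log p(x)
```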
Deep density estimation models are a nice fit for this setup since they already use autodiff. One good recent model is
Papamakarios, Pavlakou, and Murray. Masked Autoregressive Flow for Density Estimation. NIPS 2017.
This type of approach is more computationally expensive than KDE but can yield excellent results, particularly if you have a fair amount of data and a GPU. Code is available from the authors.
From an unnormalized log-density estimate
Unnormalized log-densities (or their negatives) are sometimes called the energy. Differentiating such an estimate gives the score, since the unknown normalizing constant doesn't depend on $x$ and so contributes nothing to the gradient.
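Concretely (a trivial sketch, with an arbitrary made-up $\log \tilde p$ standing in for whatever model you've actually fit):

```python
import jax
import jax.numpy as jnp

# Stand-in for a fitted unnormalized log-density: log p̃(x) = log p(x) + log Z.
def log_p_tilde(x):
    return -jnp.sum(x**4 - 3 * x**2)

# ∇_x log p(x) = ∇_x log p̃(x), since the unknown log Z is constant in x.
score = jax.grad(log_p_tilde)
```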
Kernel exponential families, which are fit via score matching, are a reasonably natural fit for score estimation. (They're hard to normalize, but luckily that doesn't matter for the score.) There's some evidence that these scale better than KDE in moderate dimensions, at least for certain kinds of densities. For a recent, computationally reasonable-ish approach, see
Sutherland (ahem), Strathmann, Arbel, and Gretton. Efficient and principled score estimation with Nyström kernel exponential families. AISTATS 2018.
These have some nice theoretical properties, but (in my opinion as one of the authors of that paper) practical application still requires a lot of manual work in choosing the kernel and its hyperparameters, something we're working on now. Code is available from the authors, but email me if you want to actually try it. :)
Here's a more recent approach that directly trains a network to output the energy, using a variant of score matching. I'm not familiar with the practicalities here, but it seems potentially promising:
Saremi, Mehrjou, Schölkopf, and Hyvärinen. Deep Energy Estimator Networks. arXiv, May 2018.
Direct score estimators
You can also train a model to just directly output the score.
One problem with this class of approaches is that the learned vector field might not actually be the gradient of any log-density (i.e., it might not be a valid gradient field). The first approach below certainly suffers from that problem; I'm not sure about the later ones.
In theory, if you train a denoising autoencoder $r(x + \sigma \varepsilon) \approx x$, where $\varepsilon \sim \mathcal N(0, I)$, then as $\sigma \to 0$ you have $\frac{r(x) - x}{\sigma^2} \to \nabla_x \log p(x)$, under some assumptions about the strength of the autoencoder architecture and your ability to optimize it.
The estimator as written in this paper doesn't seem to work especially well (see the experimental results of the kernel exp family and deep energy estimator network papers above), but it might work well with a more complex network structure and more careful parameter tuning / etc. I wouldn't particularly recommend this strategy, but it's interesting.
Alain and Bengio. What Regularized Auto-Encoders Learn from the Data Generating Distribution. JMLR 2014.
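For concreteness, here's a minimal sketch of that recipe (the tiny MLP autoencoder, squared-error denoising loss, and choice of $\sigma$ are all placeholders, not what the paper prescribes):

```python
import jax
import jax.numpy as jnp

sigma = 0.1  # noise level; the approximation only holds for small sigma

def init_params(key, d, hidden=64):
    k1, k2 = jax.random.split(key)
    return {"W1": jax.random.normal(k1, (d, hidden)) / jnp.sqrt(d), "b1": jnp.zeros(hidden),
            "W2": jax.random.normal(k2, (hidden, d)) / jnp.sqrt(hidden), "b2": jnp.zeros(d)}

def reconstruct(params, x):
    h = jnp.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def denoising_loss(params, key, data):
    noise = sigma * jax.random.normal(key, data.shape)
    return jnp.mean((reconstruct(params, data + noise) - data) ** 2)

# ... fit `params` by minimizing denoising_loss over your data, then:

def score_estimate(params, x):
    # (r(x) - x) / sigma^2 approximates ∇_x log p(x) for small sigma
    return (reconstruct(params, x) - x) / sigma**2
```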
The following paper (in Section 3.1) proposes an estimator specifically for $\nabla_x \log p(X_i)$, given only $X_i \sim p$, based on inverting Stein's identity (and also depending on a kernel choice). I'm not familiar with its practical performance, but I imagine it should be okay if you choose a reasonable kernel and can get away with only evaluating the score at the $X_i$.
Li and Turner. Gradient Estimators for Implicit Models. ICLR 2018.
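If I understand the construction correctly, the basic recipe looks something like the sketch below (RBF kernel; `bandwidth` and the ridge `eta` are hyperparameters you'd have to tune, and the names are mine, not the paper's):

```python
import jax.numpy as jnp

def stein_score_estimate(X, bandwidth=1.0, eta=1e-3):
    """Estimate the scores ∇_x log p(X_i) at the sample points X (shape (n, d)) only,
    by Monte-Carlo-inverting Stein's identity with an RBF kernel."""
    n, d = X.shape
    diffs = X[:, None, :] - X[None, :, :]                           # diffs[i, j] = x_i - x_j
    K = jnp.exp(-jnp.sum(diffs**2, axis=-1) / (2 * bandwidth**2))   # kernel matrix, (n, n)
    # grad_K[j] = sum_i ∇_{x_i} k(x_i, x_j), in closed form for the RBF kernel
    grad_K = jnp.einsum("ij,ijd->jd", K, -diffs) / bandwidth**2
    # Stein's identity gives K @ G ≈ -grad_K for the matrix G of scores; solve with a ridge term
    return -jnp.linalg.solve(K + eta * jnp.eye(n), grad_K)
```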
The following paper is a sequel of sorts to the previous one; it gives a different (Stein-based) estimator that can be evaluated at out-of-sample points, and it provides some theoretical guarantees. I haven't read it yet, but it's high on my list:
Shi, Sun, and Zhu. A Spectral Approach to Gradient Estimation for Implicit Distributions. arXiv, June 2018.