The most obvious and straightforward application of tensors (that I know of) in statistics is computing higher-order moments of a multivariate distribution. For example, consider a random vector $x\sim F$, where $F$ is some $p$-dimensional distribution. Given a data matrix $X \in \mathbb{R}^{n\times p}$, where $n$ is the number of observations, each drawn iid from $F$, the second moment $\mathbb{E}(xx^\top) = \mathbb{E}(x\otimes x)$ can be estimated from the sample $X$ as follows $$\hat{\mathbb{E}}(x\otimes x) = \frac1n \sum_{i=1}^n X_{i\cdot} \otimes X_{i\cdot} = \frac1n X^\top X \in \mathbb{R}^{p\times p}$$ where $X_{i\cdot}$ is the $i^{th}$ row of $X$ (treated as a column vector). This matrix is only a few operations away from the covariance matrix, since $\operatorname{Cov}(x) = \mathbb{E}(xx^\top) - \mathbb{E}(x)\,\mathbb{E}(x)^\top$. Continuing on to the third moment, which is related to "co-skewness," we see we are dealing with an order-3 tensor
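A minimal NumPy sketch of the estimator above, assuming simulated Gaussian rows stand in for the iid draws from $F$ (the data and the helper name `second_moment` are illustrative, not from the original). It checks that averaging the row outer products gives $\frac1n X^\top X$, and that subtracting the outer product of the sample mean gives the divide-by-$n$ covariance matrix.

```python
import numpy as np

def second_moment(X: np.ndarray) -> np.ndarray:
    """Sample second-moment matrix (1/n) * X^T X for an n-by-p data matrix."""
    n = X.shape[0]
    return X.T @ X / n

# Hypothetical data: n = 500 iid draws from a 3-dimensional distribution.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))

M2 = second_moment(X)

# Same estimate written as the average of outer products X_i ⊗ X_i over the rows.
M2_outer = sum(np.outer(row, row) for row in X) / X.shape[0]
assert np.allclose(M2, M2_outer)

# "A few operations away" from the covariance: subtract the outer product of the mean.
mu = X.mean(axis=0)
cov = M2 - np.outer(mu, mu)
# bias=True makes np.cov divide by n (not n - 1), matching the moment estimator.
assert np.allclose(cov, np.cov(X, rowvar=False, bias=True))
```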
$$\hat{\mathbb{E}}(x\otimes x \otimes x) = \frac1n \sum_{i=1}^n X_{i\cdot} \otimes X_{i\cdot}\otimes X_{i\cdot} \in \mathbb{R}^{p\times p\times p}$$ The "co-kurtosis" tensor is order 4, and so on for higher-order moments: the $k^{th}$ moment is an order-$k$ tensor with $p^k$ entries.
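The same average of outer products extends to any order with a single `np.einsum` call. This is a sketch on hypothetical simulated data; `moment_tensor` is an illustrative helper, not a library function. It builds the order-3 and order-4 raw moment tensors and checks one entry against the definition.

```python
import numpy as np

def moment_tensor(X: np.ndarray, order: int) -> np.ndarray:
    """Sample raw moment tensor (1/n) * sum_i X_i ⊗ ... ⊗ X_i with `order` factors."""
    n, _ = X.shape
    letters = "abcdefghij"[:order]  # one index letter per tensor mode (order <= 10)
    # e.g. order 3 -> "na,nb,nc->abc": per-row outer products, summed over rows n
    spec = ",".join("n" + c for c in letters) + "->" + letters
    return np.einsum(spec, *([X] * order)) / n

# Hypothetical data: 200 draws of a 4-dimensional vector.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))

M3 = moment_tensor(X, 3)  # shape (4, 4, 4)
M4 = moment_tensor(X, 4)  # shape (4, 4, 4, 4)

# Entry (0, 1, 2) is the sample mean of x_0 * x_1 * x_2.
assert np.isclose(M3[0, 1, 2], np.mean(X[:, 0] * X[:, 1] * X[:, 2]))
```

These are raw moments; centering the columns of `X` first gives central moments, which are what co-skewness and co-kurtosis are built from.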
Moment tensors like these were applied in financial portfolio optimization decades ago and are used in multivariate data standardization (standardizing by skew, not just by mean and variance). Tensors more generally are central to deep learning (e.g., TensorFlow), where the gradients of the loss function with respect to the model parameters are themselves tensors used in back-propagation. I believe there are additional applications in natural language processing, multivariate time series, and stochastic block models.
I agree with @whuber: when the indices are not pivotal to the work, tensor notation is an intuitive and flexible generalization that sheds light on the lower-dimensional cases. However, it tends to make things difficult for statisticians and engineers who have to keep track of three or more indices and of awkward generalizations of what seemed like ergonomic rules (e.g., what is the trace of a higher-order tensor? What does symmetry mean?). That's probably why many of the statisticians and applied mathematicians I know avoid tensors and simply stack/flatten the 2D cross-sections of each tensor into a tall matrix.
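To illustrate the stack/flatten workaround, here is a sketch on hypothetical simulated data that stacks the $p$ slices of the $p\times p\times p$ third-moment tensor into a tall $p^2\times p$ matrix. The same matrix can be computed without ever forming the tensor, as $\frac1n Z^\top X$ where row $i$ of $Z$ holds all pairwise products of row $i$ of $X$. The last loop checks what symmetry means for a moment tensor: invariance under every permutation of its indices.

```python
import itertools
import numpy as np

# Hypothetical data: 100 draws of a 3-dimensional vector.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
n, p = X.shape

# Third raw moment tensor, shape (p, p, p).
M3 = np.einsum("na,nb,nc->abc", X, X, X) / n

# Stack the p slices M3[:, :, k] (each p x p) vertically -> tall (p*p, p) matrix.
tall = np.vstack([M3[:, :, k] for k in range(p)])

# Same matrix from plain matrix algebra: Z[i, k*p + a] = X[i, k] * X[i, a].
Z = np.einsum("nk,na->nka", X, X).reshape(n, p * p)
assert np.allclose(tall, Z.T @ X / n)

# Symmetry of a moment tensor: unchanged by any permutation of its three indices.
for perm in itertools.permutations(range(3)):
    assert np.allclose(M3, M3.transpose(perm))
```

Working with `Z` keeps everything in ordinary matrix products, which is one reason the flattened form is convenient in practice.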