I have a population $\mathcal{X}$ of $N$ samples drawn from a multivariate Gaussian random variable $\mathbf{x} \in \mathbb{R}^d$. Let us define a transformation $f_{d\rightarrow r} (\mathbf{x}) = \mathbf{x'}$ that performs a dimensionality reduction to a space $\mathbb{R}^r$ with $r < d$, and a "reverse" transformation $g_{r \rightarrow d}(\mathbf{x'}) = \mathbf{\tilde x}$ (as pointed out in the comments, not an actual inverse) that projects the data back into the original space. I also define PCA$_{d\rightarrow r}(\mathbf{x})$ as the projection onto the span of the top $r$ eigenvectors of the (sample-estimated) covariance matrix, with corresponding reverse transformation PCA$^{-1}_{r \rightarrow d}(\mathbf{x'})$. Is PCA$_{d\rightarrow r}$ then optimal (in terms of average reconstruction MSE) within the space of (possibly nonlinear) dimensionality reduction functions? I.e.
$$ \operatorname*{arg\,min}_{f_{d\rightarrow r},\, g_{r\rightarrow d}} \frac{1}{N} \sum_{x_i \in \mathcal{X}} \left\| g_{r\rightarrow d}\left(f_{d\rightarrow r}(x_i)\right) - x_i \right\|^2 = \left(\text{PCA}_{d\rightarrow r}(\cdot),\ \text{PCA}^{-1}_{r \rightarrow d}(\cdot)\right) \text{ ?}$$
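To make the objective concrete, here is a minimal numerical sketch of the sample version of this criterion (sizes, seed, and covariance are arbitrary placeholders), with PCA$_{d\rightarrow r}$ implemented directly from the eigendecomposition of the sample covariance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic multivariate Gaussian sample (hypothetical sizes: N=5000, d=10, r=3).
N, d, r = 5000, 10, 3
A = rng.standard_normal((d, d))
X = rng.standard_normal((N, d)) @ A.T          # covariance A A^T

# PCA_{d->r}: project centered data onto the top-r eigenvectors
# of the sample covariance matrix.
mu = X.mean(axis=0)
Xc = X - mu
cov = Xc.T @ Xc / N                            # (biased) sample covariance
eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
W = eigvecs[:, -r:]                            # top-r eigenvectors, shape (d, r)

Z = Xc @ W                                     # f_{d->r}(x)
X_tilde = Z @ W.T + mu                         # g_{r->d}(x'), the "reverse" map

# Sample version of the objective above: average squared reconstruction error.
mse = np.mean(np.sum((X_tilde - X) ** 2, axis=1))

# Sanity check: for PCA this equals the sum of the d - r discarded eigenvalues.
print(mse, eigvals[:-r].sum())
```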
While I realize this is not true for a generic random variable, my intuition is that it could hold when the input data is normally distributed, since the first and second moments capture all the information needed to compress the data. I am also aware that PCA is optimal within the space of linear projections, but I am interested in the wider space of nonlinear transformations (e.g. autoencoders). However, I could not find an explicit answer to this question anywhere.
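For concreteness, the linear statement I am referring to is the classical one: restricting to linear maps on the centered data, $f_{d\rightarrow r}(\mathbf{x}) = B^\top \mathbf{x}$ and $g_{r\rightarrow d}(\mathbf{x'}) = A\mathbf{x'}$ with $A, B \in \mathbb{R}^{d \times r}$, the minimum of the criterion above is the sum of the discarded eigenvalues,

$$ \min_{A,\,B} \frac{1}{N} \sum_{x_i \in \mathcal{X}} \left\| A B^\top x_i - x_i \right\|^2 = \sum_{k=r+1}^{d} \lambda_k, $$

attained at $A = B = W_r$, where $W_r$ collects the top $r$ eigenvectors of the sample covariance and $\lambda_1 \geq \dots \geq \lambda_d$ are its eigenvalues. My question is whether this same value also lower-bounds the nonlinear case when the data is Gaussian.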
EDIT
I realized it will be easier to provide some context on the practical problem that motivated this question. I am working with autoencoders for dimensionality reduction, and I noticed that for some kinds of data the reconstruction error (even on the training set) never goes below the reconstruction error of PCA with the same embedding dimension, up to negligible fluctuations. This happens regardless of how I change the architecture or the hyperparameters. I began to suspect that it is due to the data having only linear correlations (as in a multivariate Gaussian variable), for which there could be some fundamental limit on compressibility. In other words, is the PCA reconstruction error an actual lower bound when reducing such data from $d$ to $r$ dimensions?
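For reference, the following is a minimal sketch of the kind of experiment in which I observe this, assuming PyTorch; the architecture and hyperparameters are arbitrary (I see the same behavior across many variants). The autoencoder loss plateaus at, but never below, the PCA value:

```python
import numpy as np
import torch
from torch import nn

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Multivariate Gaussian data (hypothetical sizes: N=2000, d=10, r=3).
N, d, r = 2000, 10, 3
A = rng.standard_normal((d, d))
X = torch.tensor(rng.standard_normal((N, d)) @ A.T, dtype=torch.float32)

# PCA baseline: reconstruction error = sum of the d - r discarded eigenvalues.
Xc = X - X.mean(dim=0)
cov = Xc.T @ Xc / N
eigvals = torch.linalg.eigvalsh(cov)           # ascending order
pca_err = eigvals[:-r].sum().item()

# A small nonlinear autoencoder with embedding dimension r.
model = nn.Sequential(
    nn.Linear(d, 32), nn.Tanh(),
    nn.Linear(32, r),                          # f_{d->r}
    nn.Linear(r, 32), nn.Tanh(),
    nn.Linear(32, d),                          # g_{r->d}
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Full-batch training, evaluated on the training set itself.
for step in range(5000):
    opt.zero_grad()
    loss = ((model(X) - X) ** 2).sum(dim=1).mean()
    loss.backward()
    opt.step()

print(f"autoencoder: {loss.item():.4f}   PCA: {pca_err:.4f}")
```

Any hint on this would be appreciated.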