I'm familiar with the maximum entropy (ME) principle in statistical mechanics, where, for example, the Boltzmann distribution $p(\epsilon_i|\beta)$ is identified as the ME distribution constrained by normalizability and a given average energy $\langle E \rangle$, where the inverse temperature $\beta$ is a Lagrange multiplier.
E. T. Jaynes called this "no-data inference", i.e., from $p(\epsilon_i|\beta)$ we make a host of other predictions (e.g. calculate the average pressure $\langle P \rangle$) based only on the information ingested by ME, not on directly observed data.
But now suppose I have a data set of measured values $x_{1:N} \in \mathbb{R}^N$ and I want to determine a pdf $p(x|\cdot)$ that describes my knowledge about the $x_{1:N}$ using ME.
After I have determined an invariant measure $m(x)$ to describe my initial state of ignorance, one way to proceed would be moment-based density estimation. In other words, I could calculate $I$ empirical moments of the $x_{1:N}$, $$ \langle x^i \rangle = \frac{1}{N} \sum_{n=1}^N x_n^i, \quad i = 1, \ldots, I, $$ use these to numerically determine the Lagrange multipliers $\lambda_{1:I}$, and finally end up with $$ p(x|\lambda_{1:I}) = Z(\lambda_{1:I})^{-1} \ m(x) \exp\Big(\sum_{i=1}^I \lambda_i x^i \Big). $$
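
To make the numerical step concrete, here is a minimal sketch of how I imagine determining the $\lambda_{1:I}$. It minimizes the convex dual $\log Z(\lambda) - \sum_i \lambda_i \langle x^i \rangle$, whose gradient is the mismatch between model and empirical moments. Several things here are my own assumptions rather than part of the problem: the function name `fit_maxent_moments`, a flat measure $m(x) = 1$ by default, a truncated grid with simple Riemann-sum quadrature, and synthetic Gaussian data standing in for the measured $x_{1:N}$.

```python
import numpy as np
from scipy.optimize import minimize


def fit_maxent_moments(x, n_moments=2, grid=None, measure=None):
    """Fit p(x) = m(x) exp(sum_i lam_i x^i) / Z to the empirical moments of x.

    Assumptions (illustrative only): the support is truncated to a finite
    grid, and m(x) = 1 unless a positive callable `measure` is supplied.
    """
    x = np.asarray(x, dtype=float)
    if grid is None:
        pad = 3.0 * x.std()
        grid = np.linspace(x.min() - pad, x.max() + pad, 2001)
    dx = grid[1] - grid[0]
    log_m = np.zeros_like(grid) if measure is None else np.log(measure(grid))

    powers = np.arange(1, n_moments + 1)
    feats = grid[:, None] ** powers           # x^i evaluated on the grid
    mu = (x[:, None] ** powers).mean(axis=0)  # empirical moments <x^i>

    def density(lam):
        logits = feats @ lam + log_m
        shift = logits.max()                  # stabilise the exponential
        w = np.exp(logits - shift)
        z = w.sum() * dx
        return w / z, shift + np.log(z)       # density on the grid, log Z

    def dual_and_grad(lam):
        p, log_z = density(lam)
        model_moments = (feats * p[:, None]).sum(axis=0) * dx
        # Dual objective and its gradient (model moments minus data moments).
        return log_z - lam @ mu, model_moments - mu

    res = minimize(dual_and_grad, x0=np.zeros(n_moments), jac=True, method="BFGS")
    p, _ = density(res.x)
    return res.x, grid, p


# Hypothetical data standing in for the measured x_{1:N}.
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=0.5, size=5000)
lam, grid, p = fit_maxent_moments(x, n_moments=2)
print("Lagrange multipliers:", lam)
```

With $I = 2$ and a flat measure this should reproduce a (truncated) Gaussian, which is only a sanity check; replacing the columns of `feats` by some other $f(x)$ would give a different family, which is exactly the freedom I am asking about below.
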
Here's what's troubling me:
- Since I actually have the $x_{1:N}$ at my disposal, I can calculate the empirical average of any function $f(x)$ over them. For example, $f(x) = x$ or $f(x) = \arctan(\log |x|)$.
- The form of the function $f(x)$ directly determines the form of $p(x|\cdot)$ and consequently the sufficient statistic through the constraint $\langle f(x) \rangle = \int f(x) \ p(x) \ dx$.
- But what determines the functional form of $f(x)$? Here's what I think: In physics, it is presumably the measuring apparatus or "natural" expressions occurring frequently in physical theory (e.g. we talk about energy $E$ and not $\sqrt{E}$, so we supply $\langle E \rangle$ and not $\langle \sqrt{E} \rangle$). But what should we do when we actually have data, and we could choose any $f(x)$, and hence $p(x|\cdot)$, we like?
- What does the invariant measure $m(x)$ actually accomplish in this regard? Suppose $x > 0$ and $y = x^2$. If I set up a ME pdf $p(x|\cdot)$ using $\langle x \rangle$ and $\langle x^2 \rangle$, is this equivalent to the ME pdf $p(y|\cdot)$ using $\langle \sqrt{y} \rangle$ and $\langle y \rangle$?
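
To make the last point concrete, here is the change of variables written out. This is only a sketch, and it rests on an assumption of mine that is not established above: that the measure transforms like a density, $m_y(y) = m_x(x) \, |dx/dy|$. Starting from $$ p_x(x|\lambda_{1:2}) = Z(\lambda_{1:2})^{-1} \ m_x(x) \exp\big(\lambda_1 x + \lambda_2 x^2\big) $$ and substituting $x = \sqrt{y}$, $dx/dy = 1/(2\sqrt{y})$, I get $$ p_y(y|\lambda_{1:2}) = p_x(\sqrt{y}|\lambda_{1:2}) \ \frac{1}{2\sqrt{y}} = Z(\lambda_{1:2})^{-1} \ \frac{m_x(\sqrt{y})}{2\sqrt{y}} \exp\big(\lambda_1 \sqrt{y} + \lambda_2 y\big). $$ If $m_y(y) = m_x(\sqrt{y})/(2\sqrt{y})$, this has exactly the ME form in $y$ with constraints on $\langle \sqrt{y} \rangle$ and $\langle y \rangle$, with the same multipliers and the same $Z$. If instead I chose $m_y$ independently of $m_x$ (say, flat in both variables), the two pdfs would generally differ, so the answer seems to hinge on how $m$ is supposed to transform.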