You would like to find the most uninformative distribution that still matches your data (or some aspects of your data). That way you are making the fewest assumptions about the nature of the random process generating the data; the most informative distribution consistent with the data, by contrast, is likely to overfit it.
Here is an example: Suppose I have iid data $(X_1,X_2,\dots,X_n) = (0,1,0,1,1,2,1,0,0)$ from a distribution on the 3-letter alphabet $\{0,1,2\}$. Let $p_i$ be the probability of observing letter $i$, for $i=0,1,2$, under this distribution. Suppose that I want to pick a distribution (equivalently, a vector $(p_0,p_1,p_2)$) whose mean matches the observed sample mean $\bar x = \frac1n \sum_{i=1}^n X_i$.
The maximum entropy principle solves the following problem:
$$
\max_{p} \; -\sum_{i=0}^{2} p_i \log p_i \quad \text{subject to}\quad 0\,p_0 + 1\,p_1 + 2\,p_2 = \bar x,
$$
together with the requirement that $p$ is a probability vector: $\sum_i p_i = 1$ and $p_i \ge 0$ for $i=0,1,2$. Entropy is a concave function, and maximizing it over a convex region (usually) leads to some $p$ in the interior of the region. Minimizing the entropy, a concave function, instead gives you one of the extreme points of that region, i.e., one of $(1-\bar x,\bar x,0)$ or $(1-\frac12 \bar x,0,\frac12\bar x)$, which are less random distributions, putting all the mass on two of the three letters. (I will leave the details to you; you can eliminate $p_0$ and plot the feasible region in the $p_1$-$p_2$ plane.) Which of these two extreme points minimizes the entropy is determined by which of $\bar x$ or $\bar x/2$ is farther from $1/2$, since the binary entropy decreases as its argument moves away from $1/2$. For example, if $\bar x = 0.6$, then $\bar x/2 = 0.3$ is farther from $1/2$ than $\bar x$ is, hence the minimizer is $(0.7,0,0.3)$: you put all the mass on $\{0,2\}$ and assign zero probability to the letter $1$, even though you have observed plenty of $1$s in your sample.
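To see this concretely, here is a minimal numerical sketch (assuming NumPy and SciPy are available; the helper `neg_entropy` and the variable names are mine, purely for illustration). It maximizes the entropy under the mean constraint with `scipy.optimize.minimize` and compares the entropies of the two extreme points for $\bar x = 0.6$:

```python
import numpy as np
from scipy.optimize import minimize

def neg_entropy(p, eps=1e-12):
    """Negative Shannon entropy (nats); eps avoids log(0)."""
    return float(np.sum(p * np.log(p + eps)))

xbar = 0.6  # the hypothetical sample mean used above

# Feasible set: p is a probability vector whose mean equals xbar.
constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},
    {"type": "eq", "fun": lambda p: p @ np.array([0.0, 1.0, 2.0]) - xbar},
]
bounds = [(0.0, 1.0)] * 3

# Maximizing entropy = minimizing negative entropy; SLSQP handles the
# equality constraints even from an infeasible starting point.
res = minimize(neg_entropy, x0=np.full(3, 1 / 3),
               bounds=bounds, constraints=constraints, method="SLSQP")
print("max-entropy p:", np.round(res.x, 4))   # roughly (0.554, 0.292, 0.154)

# Entropies at the two extreme points of the feasible segment:
A = np.array([1 - xbar, xbar, 0.0])           # mass on {0, 1}
B = np.array([1 - xbar / 2, 0.0, xbar / 2])   # mass on {0, 2}
print("H(A) =", -neg_entropy(A))              # ~0.673 nats
print("H(B) =", -neg_entropy(B))              # ~0.611 nats -> B is the minimizer
```

The maximizer lands strictly inside the feasible segment (all three probabilities positive), while the extreme point $B = (0.7, 0, 0.3)$ has the smaller entropy of the two.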
Without any data constraint, i.e., removing $0p_0 + 1p_1 + 2p_2 = \bar x$, maximizing the entropy gives you the uniform distribution $(1/3,1/3,1/3)$, while minimizing it gives you one of the deterministic distributions $(1,0,0)$, $(0,1,0)$ or $(0,0,1)$. A uniform prior before you see any data is a better default than assuming a priori, with certainty, that one particular letter always comes up.
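For completeness, the same kind of sketch with the mean constraint dropped (again just an illustrative SciPy check, not part of the original argument):

```python
import numpy as np
from scipy.optimize import minimize

H = lambda p: -float(np.sum(p * np.log(p + 1e-12)))  # Shannon entropy (nats)

# Maximizing entropy over the simplex alone returns the uniform distribution.
res = minimize(lambda p: -H(p), x0=np.array([0.2, 0.5, 0.3]),
               bounds=[(0.0, 1.0)] * 3,
               constraints=[{"type": "eq", "fun": lambda p: np.sum(p) - 1.0}],
               method="SLSQP")
print(np.round(res.x, 3))        # ~ [0.333 0.333 0.333]

# The minimum entropy (zero) is attained at the vertices of the simplex:
for v in np.eye(3):
    print(v, round(H(v), 6))     # each deterministic distribution has entropy ~ 0
```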