The hierarchical model
You do not actually need the marginal probability mass function $m()$; only the marginal moments of $Y$ are required.
In this tutorial, Casella (1992) is assuming the following hierarchical model for a response count $Y$:
$$Y|p\sim\mbox{bin}(n,p)$$
and
$$p \sim \mbox{Beta}(\lambda,\lambda)$$
with $n=50$.
Moments of the prior distribution
The Beta distribution usually has two parameters, $\alpha$ and $\beta$ say, and the mean is $\alpha/(\alpha+\beta)$.
The variance is slightly more complicated, see the Wikipedia article on the Beta distribution.
In this case, both parameters are the same, $\alpha=\beta=\lambda$, so the prior distribution for $p$ is symmetric with mean 1/2.
The hyper-parameter $\lambda$ affects the variance of the prior distribution around 1/2, with larger values of $\lambda$ corresponding to smaller variances.
In other words, $\lambda$ determines the precision (and hence the informativeness) of the prior.
Specifically, the mean and variance of the prior distribution are $E(p)=1/2$ and $\mbox{var}(p)=1/\{4(2\lambda+1)\}$.
It will also be convenient later to note that
$$E[p(1-p)]=E(p)-\mbox{var}(p)-E(p)^2=\frac12-\frac1{4(2\lambda+1)}-\frac14=\frac{\lambda}{2(2\lambda+1)}$$
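If you want to sanity-check these prior moments numerically, here is a minimal Python sketch (not part of Casella's tutorial; the value $\lambda=3$ is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 3.0                                        # illustrative hyper-parameter value
p = rng.beta(lam, lam, size=1_000_000)           # draws from the Beta(lambda, lambda) prior

# Compare Monte Carlo estimates with the closed-form moments
print(p.mean(), 0.5)                                     # E(p) = 1/2
print(p.var(), 1 / (4 * (2 * lam + 1)))                  # var(p) = 1/{4(2*lambda+1)}
print((p * (1 - p)).mean(), lam / (2 * (2 * lam + 1)))   # E[p(1-p)] = lambda/{2(2*lambda+1)}
```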
Marginal moments for $Y$
Now we can obtain the marginal moments of $Y$.
It is a characteristic of empirical Bayes that it uses the marginal distribution of the data, here through the marginal moments, to estimate the unknown parameters in the prior.
The marginal mean of $Y$ is obviously
$$E(Y)=E_p(np)=n/2$$
The marginal variance of $Y$ is most easily obtained by the law of total variance:
\begin{eqnarray}\mbox{var}(Y)&=&E_p \mbox{var}(Y|p) + \mbox{var}_p E(Y|p)\\
&=&E_p[ np(1-p)] + \mbox{var}_p[np]\\
&=&\frac{n\lambda}{2(2\lambda+1)}+\frac{n^2}{4(2\lambda+1)}\\
&=&\frac{n}{4}\frac{2\lambda+n}{2\lambda+1}
\end{eqnarray}
It might not be obvious, but var($Y$) is a decreasing function of $\lambda$.
It takes its maximum value of $n^2/4$ at $\lambda=0$ and decreases towards its minimum value of $n/4$ as $\lambda\to\infty$.
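Purely as an optional check, the following Python sketch simulates $Y$ from the hierarchy and compares the empirical moments with the formulas above, then tabulates var($Y$) for a few arbitrary values of $\lambda$ to illustrate the monotonicity:

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam = 50, 3.0                     # n from the model; lambda is illustrative

p = rng.beta(lam, lam, size=1_000_000)
y = rng.binomial(n, p)               # marginal draws of Y from the hierarchical model

print(y.mean(), n / 2)                                   # marginal mean
print(y.var(), n / 4 * (2 * lam + n) / (2 * lam + 1))    # marginal variance

# var(Y) decreases in lambda, from n^2/4 at lambda=0 towards n/4 as lambda -> infinity
for lam_k in (0.0, 1.0, 10.0, 1e6):
    print(lam_k, n / 4 * (2 * lam_k + n) / (2 * lam_k + 1))
```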
Hyper-parameter estimation
We can use the observed variance of $Y$ to estimate $\lambda$.
Suppose we observe y-values 35 and 27.
The sample variance of these two values is 32.
Equating
$$\mbox{var}(Y)=\frac{n}{4}\frac{2\lambda+n}{2\lambda+1}=32$$
and solving for $\lambda$ gives $\hat\lambda=15.205$.
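For completeness, here is one way to reproduce that calculation in Python. Solving the moment equation algebraically gives $\hat\lambda=(n^2-4s^2)/(8s^2-2n)$, where $s^2$ is the sample variance of the $y$-values:

```python
import statistics

n = 50
y = [35, 27]                   # the two observed counts
s2 = statistics.variance(y)    # sample variance = 32

# Solve (n/4)*(2*lam + n)/(2*lam + 1) = s2 for lam (method-of-moments estimate)
lam_hat = (n**2 - 4 * s2) / (8 * s2 - 2 * n)
print(s2, lam_hat)             # 32, approximately 15.205
```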
Posterior Inference
Having estimated the hyper-parameter as $\hat\lambda$, we can proceed with Bayesian posterior inference.
Given an observation $Y=y_i$, we have two possible estimators of the corresponding $p_i$.
The usual maximum likelihood estimator (MLE) is $\hat p_i=y_i/n$, but we also have the prior estimate $p_0=1/2$ predicted by the prior distribution.
How should we combine these two estimators?
The precision of the MLE is determined by $n$ and the precision of the prior is determined by $2\lambda$, so we weight the two estimators accordingly.
The posterior estimator for $p_i$ is the weighted average of the two estimators
$$E(p_i|y_i,\lambda)=w_0 p_0 + w_1 \hat{p}_i$$
with weights equal to the relative precisions
$$w_0=\frac{2\lambda}{2\lambda+n}$$
and
$$w_1=\frac{n}{2\lambda+n}$$
This gives $$E(p_i|y_i,\lambda)=\frac{y_i+\lambda}{n+2\lambda}$$
and we simply plug in $\lambda=\hat\lambda$.
Another way to interpret the prior estimator is as follows.
It is as if we had observed an extra $2\lambda$ cases with $\lambda$ successes (exactly half).
Combining this prior pseudo-sample with the observed sample gives $y_i+\lambda$ successes out of $n+2\lambda$ cases, and that proportion is the posterior mean.
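To make the shrinkage concrete, a short Python sketch (again only illustrative, using the $\hat\lambda$ obtained above) computes the MLE and the posterior mean for each of the two observed counts:

```python
n = 50
lam_hat = 15.205               # moment estimate of lambda from above (rounded)
y = [35, 27]

w0 = 2 * lam_hat / (2 * lam_hat + n)   # weight on the prior estimate 1/2
w1 = n / (2 * lam_hat + n)             # weight on the MLE y/n

for yi in y:
    mle = yi / n
    post = (yi + lam_hat) / (n + 2 * lam_hat)   # same as w0*0.5 + w1*mle
    print(yi, mle, post)
```

Both estimates are pulled towards the prior mean of 1/2: $y_i=35$ shrinks from 0.70 to about 0.62, and $y_i=27$ from 0.54 to about 0.52.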
Interpretation
Notice what is happening here.
If the prior distribution is diffuse, then the $p_i$ will vary from one observation to another, and the variance of the $y_i$ will be relatively large.
If the prior distribution is very concentrated, then the $p_i$ will be very consistent and the $y_i$ will be less variable.
So we can use the variance of the $y_i$ to guess what the prior precision $2\lambda$ might have been.
If the $y_i$ values are very tight, then we conclude that $\lambda$ is large and we give more weight to the prior distribution, moving all the $\hat p_i$ values towards 1/2.
If the $y_i$ values are very dispersed, then we conclude that $\lambda$ is small, and we give less weight to the prior, leaving the $\hat p_i$ values more as they are.
This is the essential idea of empirical Bayes, common to all empirical Bayes applications.