Solution
The matrix algebra can be dismaying and, if not carried out elegantly, can require an awful lot of (superfluous) algebraic manipulation. However, the situation is much simpler than it looks, because (creating the matrix $X$ by putting a column of ones in first, and then the column of independent values $(x_i)$ after it)
$$X^\prime X = \pmatrix{n & S_x \\ S_x & S_{xx}}$$
and
$$X^\prime y = \pmatrix{S_y \\ S_{xy}}.$$
(The $S_{*}$ are handy--and fairly common--abbreviations for sums of the variables and their products.) Thus, the normal equations for the estimates $\hat\beta = (\hat\beta_0, \hat\beta_1)$ are--when written out as simultaneous linear equations--merely
$$\matrix{n \hat\beta_0 + S_x\hat\beta_1 = S_y \\
S_x \hat\beta_0 + S_{xx}\hat\beta_1 = S_{xy},}$$
which are to be solved for $\hat\beta_0$ and $\hat\beta_1.$ Indeed, you don't really need to solve this ab initio: all you have to do at this point is check which formula for $\hat \beta_1$ actually works. That requires only elementary algebra. I won't show it because there's a better way that produces the same result in a much more illuminating and generalizable fashion.
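If you would like to check that arithmetic numerically, here is a minimal sketch in Python/NumPy with made-up data; the names `Sx`, `Sxx`, `Sy`, and `Sxy` simply mirror the $S_{*}$ abbreviations above and are not part of any library.

```python
import numpy as np

# Made-up data for a quick numerical check of the 2x2 normal equations.
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=20)

n = len(x)
Sx, Sy = x.sum(), y.sum()
Sxx, Sxy = (x * x).sum(), (x * y).sum()

# Normal equations:  [n   Sx ] [b0]   [Sy ]
#                    [Sx  Sxx] [b1] = [Sxy]
A = np.array([[n, Sx], [Sx, Sxx]])
b = np.array([Sy, Sxy])
b0, b1 = np.linalg.solve(A, b)

# Compare with a standard library fit (np.polyfit returns the slope first).
slope, intercept = np.polyfit(x, y, 1)
assert np.allclose([b0, b1], [intercept, slope])
```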
Motivation and Generalization
Recall that the normal equations are derived by considering the problem of minimizing the sum of squares of residuals,
$$\operatorname{SSR} = \sum_i \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2.$$
The appearance of $\beta_0$ corresponds to a column of ones in $X$ while the appearance of $\beta_1$ corresponds to a column $(x_i)$ in $X$. In general, those columns are not orthogonal. (Recall that we say two vectors are orthogonal when their dot product is zero. Geometrically, this means they are perpendicular. See the references for more about this.) We can make them orthogonal by subtracting some multiple of one of them from the other. The easiest choice is to subtract a constant from each $x_i$ to make the result orthogonal to the constant column; that is, we seek a number $c$ for which
$$0 = (1,1,\ldots, 1) \cdot (x_1-c, x_2-c, \ldots, x_n-c) = \sum_{i} 1\cdot(x_i-c) = S_x - nc.$$
The unique solution clearly is $c = S_x/n = \bar x,$ the mean of the $x_i.$ Accordingly, let's rewrite the model in terms of the "centered" variables $x_i-\bar x.$ It asks us to minimize
$$\operatorname{SSR} = \sum_i \left(y_i - (\beta_0 + \beta_1\bar x + \beta_1 (x_i-\bar x))\right)^2.$$
For simplicity, write the unknown constant term as
$$\alpha = \beta_0 + \beta_1 \bar x,$$
understanding that once solutions $\hat\alpha$ and $\hat\beta_1$ are obtained, we easily find the estimate
$$\hat\beta_0 = \hat\alpha - \hat\beta_1\bar x.$$
In terms of the unknowns $(\hat\alpha,\hat\beta_1)$ the Normal equations are now
$$\pmatrix{n & 0 \\ 0 & \sum_i(x_i-\bar x)^2}\pmatrix{\hat\alpha\\\hat\beta_1}=\pmatrix{S_y \\ \sum_i (x_i-\bar x)y_i}.$$
When written out as two simultaneous linear equations, each unknown is isolated in its own equation, which is simple to solve: this is what having orthogonal columns in $X$ achieves. In particular, the equation for $\hat\beta_1$ is
$$\sum_i(x_i-\bar x)^2\ \hat\beta_1 = \sum_i (x_i-\bar x)y_i.$$
It's a short and simple algebraic step from this to the desired result. (Use the fact that $\sum_i (x_i-\bar x)\bar y = 0.$)
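Here is a brief sketch of the centered computation (again Python/NumPy with made-up data; variable names such as `alpha_hat` are mine, chosen to match the notation above). It confirms that $\hat\alpha = \bar y,$ that $\hat\beta_1$ comes from the one-variable equation above, and that converting back via $\hat\beta_0 = \hat\alpha - \hat\beta_1\bar x$ reproduces the usual fit.

```python
import numpy as np

# Made-up data (same construction as before).
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=20)

xbar, ybar = x.mean(), y.mean()
xc = x - xbar                        # centered column, orthogonal to the column of ones

# With orthogonal columns, each unknown sits in its own equation.
alpha_hat = y.sum() / len(y)         # = ybar
beta1_hat = (xc * y).sum() / (xc * xc).sum()
beta0_hat = alpha_hat - beta1_hat * xbar   # convert back to the original parametrization

# Same answer as a standard fit.
slope, intercept = np.polyfit(x, y, 1)
assert np.allclose([beta0_hat, beta1_hat], [intercept, slope])
```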
The generalization to multiple variables proceeds in the same manner: at the first step, subtract suitable multiples of the first column of $X$ from each of the other columns so that all the resulting columns are orthogonal to the first column. (Recall this comes down to solving a linear equation for one unknown constant $c,$ which is easy.) Repeat by subtracting suitable multiples of the second column from the (new) third, fourth, ..., etc. columns to make them orthogonal to the first two columns simultaneously. Continue "sweeping out" the columns in this fashion until they are mutually orthogonal. The resulting normal equations will involve at most one variable at a time and therefore are simple to solve. Finally, the solutions have to be converted back to the original variables (just like you have to convert the estimates $\hat\alpha$ and $\hat\beta_1$ back into an estimate of $\hat\beta_0$ in the ordinary regression case). At each step of the way, all you are doing is creating new equations from old ones and solving for a single variable at a time.
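As an illustration of that sweeping-out procedure (a sketch only, not a production implementation; the function name `sweep_out_regression` is my own), the following records the multiple subtracted at each step in a unit upper-triangular matrix $R,$ solves the resulting one-variable equations, and then converts back to the original coefficients by solving $R\hat\beta = \hat\gamma.$

```python
import numpy as np

def sweep_out_regression(X, y):
    """Least squares by successively orthogonalizing the columns of X.

    Z holds the orthogonalized columns; R records the multiples that were
    subtracted, so that X = Z @ R with R unit upper triangular.
    """
    Z = X.astype(float)
    n, p = Z.shape
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            c = Z[:, j] @ Z[:, k] / (Z[:, j] @ Z[:, j])  # the constant "c" of the text
            Z[:, k] -= c * Z[:, j]                        # make column k orthogonal to column j
            R[j, k] = c
    # With mutually orthogonal columns, each coefficient has its own one-variable equation.
    gamma = np.array([Z[:, j] @ y / (Z[:, j] @ Z[:, j]) for j in range(p)])
    # Convert back to the original variables: solve R beta = gamma.
    return np.linalg.solve(R, gamma)

# Quick check against a standard solver on made-up data.
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)
assert np.allclose(sweep_out_regression(X, y), np.linalg.lstsq(X, y, rcond=None)[0])
```

The sweeping does not change the column span of $X,$ so the fitted values are unchanged; only the parametrization differs, which is why the final back-substitution is needed.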
References
For a more formal account of this approach to solving the normal equations, see Gram-Schmidt orthogonalization.
Its use in multiple regression is discussed by Lynn R. LaMotte in The Gram-Schmidt Construction as a Basis for Linear Models, The American Statistician 68(1), February 2014.
To see how to find just a single coefficient estimate without having to compute the others, see the analysis at https://stats.stackexchange.com/a/166718/919.
For a geometric interpretation, see my answers at https://stats.stackexchange.com/a/97881/919 and https://stats.stackexchange.com/a/113207/919.