Your model is $\log\mu=\beta_0+\log t$: an offset enters on the scale of the linear predictor (hence the log transform, since you are working with a log link), and its coefficient is constrained to be $1$. On the original scale (where we want to match moments), this means for the $i$-th observation
$$ \mu_i = t_i\cdot\exp\beta_0.$$
Since you want to match moments, you can estimate $\beta_0$ so that the average of the fitted means $\hat\mu_i$ matches the mean of the observations. Averaging both sides over $i$ gives
$$ \bar y = \bar t\cdot\exp\beta_0,$$
so you set
$$ \hat\beta_0 = \log\big(\frac{\bar y}{\bar t}\big)=\log\bar y-\log\bar t. $$
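As a minimal sketch of this first step, assuming NumPy is available: the function name `estimate_beta0` and the small data set are hypothetical, chosen only to show that the fitted means average to $\bar y$.

```python
import numpy as np

def estimate_beta0(y, t):
    """Moment estimate of the intercept: log(mean(y) / mean(t))."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    return np.log(y.mean() / t.mean())

# Hypothetical counts y and exposures t (not from the question).
y = np.array([2, 5, 1, 8, 4])
t = np.array([1.0, 2.0, 1.0, 3.0, 2.0])

beta0_hat = estimate_beta0(y, t)
mu_hat = t * np.exp(beta0_hat)  # fitted means on the original scale

# By construction, the fitted means average to the sample mean of y.
print(beta0_hat, mu_hat.mean(), y.mean())
```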
Now, for a negative binomial model, you have overdispersion, that is, the variance exceeds the mean:
$$ E(y_i-\mu_i)^2 =\mu_i+\frac{\mu_i^2}{\phi} $$
for some overdispersion parameter $\phi>0$ (the smaller $\phi$, the stronger the overdispersion), which is just a reformulation of your second formula. Solving for $\phi$ gives
$$ \phi = \frac{\mu_i^2}{E(y_i-\mu_i)^2-\mu_i}. $$
A possible moments estimator would then be
$$ \hat\phi = \frac{\sum_{i=1}^n\hat\mu_i^2}{\sum_{i=1}^n\big[(y_i-\hat\mu_i)^2-\hat\mu_i\big]}. $$
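The two moment estimators can be sketched in a few lines, assuming NumPy. The function name `moment_estimates`, the simulation settings, and the choice to return infinity when the denominator is not positive (no overdispersion detectable, the Poisson limit) are my own assumptions, not part of your setup.

```python
import numpy as np

def moment_estimates(y, t):
    """Return (beta0_hat, phi_hat) from the moment equations above."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    beta0_hat = np.log(y.mean() / t.mean())
    mu_hat = t * np.exp(beta0_hat)
    numerator = np.sum(mu_hat ** 2)
    denominator = np.sum((y - mu_hat) ** 2 - mu_hat)
    if denominator <= 0:
        # Assumption: treat "variance not above mean" as the Poisson limit.
        phi_hat = np.inf
    else:
        phi_hat = numerator / denominator
    return beta0_hat, phi_hat

# Simulated check with hypothetical true values.
rng = np.random.default_rng(0)
n_obs, beta0_true, phi_true = 200_000, 0.5, 2.0
t_sim = rng.uniform(1.0, 5.0, size=n_obs)
mu_sim = t_sim * np.exp(beta0_true)
# NumPy's (n, p) parameterization: n = phi, p = phi / (phi + mu)
# gives mean mu and variance mu + mu**2 / phi.
y_sim = rng.negative_binomial(phi_true, phi_true / (phi_true + mu_sim))

beta0_sim, phi_sim = moment_estimates(y_sim, t_sim)
print(beta0_sim, phi_sim)
```

With small samples the denominator can easily be close to zero or negative, so $\hat\phi$ can be very unstable.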
There is likely some bias involved here, so I would recommend you think about "real" maximum likelihood estimation. Hilbe's textbook Negative Binomial Regression is very helpful.