> I'm working with a large data set (confidential, so I can't share too much),

It might be possible to create a small data set that shares some of the general characteristics of the real data without either the variable names or any of the actual values.

> and came to the conclusion a negative binomial regression would be necessary. I've never done a GLM regression before, and I can't find any clear information about what the assumptions are. Are they the same as for MLR?
Clearly not! You already know you're assuming the response is conditionally negative binomial, not conditionally normal. (Some assumptions are shared; independence, for example.)
Let me talk about GLMs more generally first.
GLMs include multiple regression but generalize in several ways:
1) the conditional distribution of the response (dependent variable) is from the exponential family, which includes the Poisson, binomial, gamma, normal and numerous other distributions.
2) the mean response is related to the predictors (independent variables) through a link function. Each family of distributions has an associated canonical link function; for example, in the case of the Poisson, the canonical link is the log. The canonical links are almost always the default, but in most software you generally have several link choices within each choice of distribution. For the binomial the canonical link is the logit (the linear predictor models $\log(\frac{p}{1-p})$, the log-odds of a success, or a "1"), and for the gamma the canonical link is the inverse; but in both cases other link functions are often used.
So if your response was $Y$ and your predictors were $X_1$ and $X_2$, with a Poisson regression with the log link you might have the following description of how the mean of $Y$ is related to the $X$'s (there's a fitted sketch of this model just after this list):
$\text{E}(Y_i) = \mu_i$
$\log\mu_i= \eta_i$ ($\eta$ is called the 'linear predictor'; here the link function is $\log$, and the symbol $g$ is often used to represent the link function)
$\eta_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}$
3) the variance of the response is not constant, but operates through a variance function (a function of the mean, possibly times a scaling parameter). For example, the variance of a Poisson is equal to its mean, while for a gamma it's proportional to the square of the mean. (The quasi-distributions allow some degree of decoupling of the variance function from the assumed distribution.)
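For concreteness, here's a minimal sketch of the Poisson model above on simulated data. You don't say what software you're using, so I'm assuming Python with statsmodels purely for illustration; the same ideas carry over to any GLM routine.

```python
import numpy as np
import statsmodels.api as sm

# Simulate data from the model E(Y) = exp(beta0 + beta1*x1 + beta2*x2)
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 200))
eta = 0.5 + 0.8 * x1 - 0.4 * x2          # the linear predictor
y = rng.poisson(np.exp(eta))             # log link: mu = exp(eta)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()  # canonical log link by default
print(fit.params)                        # estimates of beta0, beta1, beta2

# The variance function travels with the family: Var(Y) = mu for the
# Poisson, and proportional to mu^2 for the gamma.
mu = np.array([1.0, 5.0, 10.0])
print(sm.families.Poisson().variance(mu))  # -> [  1.   5.  10.]
print(sm.families.Gamma().variance(mu))    # -> [  1.  25. 100.]
```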
--
So what assumptions are in common with what you remember from MLR?
* Independence is still assumed.

* Homoskedasticity is no longer assumed; the variance is explicitly a function of the mean, and so in general varies with the predictors (so while the model is generally heteroskedastic, the heteroskedasticity takes a specific form).

* Linearity: the model is still linear in the parameters (i.e. the linear predictor is $X\beta$), but the expected response is no longer a linear function of them (unless you use the identity link function!).

* The distribution of the response is substantially more general.
The interpretation of the output is in many ways quite similar; you can still look at estimated coefficients divided by their standard errors, for example, and interpret them similarly (they're asymptotically normal, i.e. a Wald z-test, but people often still call them t-ratios even though there's no theory that makes them $t$-distributed in general).
Comparisons between nested models (via 'anova-table'-like setups) are a bit different, but similar in spirit (they involve asymptotic chi-square tests). If you're comfortable with AIC and BIC, these can be calculated as well; there's a small sketch of this below.
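A short sketch of that output side, continuing with simulated Poisson data (again, statsmodels is just an assumption for illustration):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 200))
y = rng.poisson(np.exp(0.5 + 0.8 * x1 - 0.4 * x2))
X = sm.add_constant(np.column_stack([x1, x2]))

full = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(full.tvalues)   # coefficient / std. error: asymptotically normal (Wald z)
print(full.pvalues)   # two-sided p-values from the normal reference distribution

# Nested-model comparison: the drop in deviance is asymptotically
# chi-square on the difference in degrees of freedom.
reduced = sm.GLM(y, X[:, :2], family=sm.families.Poisson()).fit()  # drop x2
print(stats.chi2.sf(reduced.deviance - full.deviance, df=1))  # LR test of beta2 = 0

print(full.aic, reduced.aic)  # AIC (and BIC) are available too
```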
Similar kinds of diagnostic displays are generally used, but they can be harder to interpret.
Much of your multiple linear regression intuition will carry over if you keep the differences in mind.
Here's an example of something you can do with a GLM that you can't really do with linear regression (indeed, most people would use nonlinear regression for this, but a GLM is easier and nicer for it), even in the normal case: $Y$ is normal, modelled as a function of $x$:
$\text{E}(Y) = \exp(\eta) = \exp(X\beta) = \exp(\beta_0+\beta_1 x)$ (that is, a log-link)
$\text{Var}(Y) = \sigma^2$
That is, a least-squares fit of an exponential relationship between $Y$ and $x$.
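If it helps, here's what that looks like as code; a sketch on simulated data, again assuming statsmodels:

```python
import numpy as np
import statsmodels.api as sm

# Normal response with constant variance, but an exponential mean function
rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=150)
y = np.exp(1.0 + 0.7 * x) + rng.normal(scale=0.5, size=150)

# Gaussian family + log link = least-squares fit of an exponential curve
# (older statsmodels versions spell the link class links.log())
X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Gaussian(link=sm.families.links.Log())).fit()
print(fit.params)   # estimates of beta0 and beta1 inside the exponential
```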
> Can I transform the variables the same way (I've already discovered transforming the dependent variable is a bad call since it needs to be a natural number)?
You (usually) don't want to transform the response (DV). You sometimes may want to transform predictors (IVs) in order to achieve linearity of the linear predictor; a small sketch follows.
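For instance (an illustrative statsmodels sketch where the log-scale mean happens to be linear in $\log(x)$):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(1, 50, size=200)
y = rng.poisson(np.exp(0.2 + 0.6 * np.log(x)))   # count response, left untransformed

X = sm.add_constant(np.log(x))   # transform the predictor, never the counts
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)
```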
> I already determined that the negative binomial distribution would help with the over-dispersion in my data (variance is around 2000, the mean is 48).
Yes, it can deal with overdispersion. But take care not to confuse the conditional dispersion with the unconditional dispersion: your mean of 48 and variance of around 2000 are unconditional (marginal) quantities, while the model's dispersion assumption concerns the variance after conditioning on the predictors.
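To see the distinction, here's a small simulation: the response below is conditionally Poisson, so it's equidispersed given $x$, yet marginally the variance dwarfs the mean simply because the mean varies with $x$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
mu = np.exp(1.0 + 1.5 * x)     # conditional mean varies strongly with x
y = rng.poisson(mu)            # conditionally Poisson: Var(Y|x) = E(Y|x)

print(y.mean(), y.var())       # marginal variance is many times the marginal mean
```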
Another common approach - if a bit more kludgy and so somewhat less satisfying to my mind - is quasi-Poisson regression (overdispersed Poisson regression).
The negative binomial is in the exponential family if you fix a particular one of its parameters (at least in the way it's usually reparameterized for GLMs). Some packages will fit it if you specify that parameter; others will wrap ML estimation of the parameter (say, via profile likelihood) around a GLM routine, automating the process. Some will restrict you to a smaller set of distributions; you don't say what software you might use, so it's difficult to say much more there. Both the quasi-Poisson and the negative binomial options are sketched below.
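Here's a sketch of those options side by side, assuming statsmodels for concreteness (simulated negative binomial data; treat the exact argument names as package-specific details):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=500)
X = sm.add_constant(x)
mu = np.exp(1.0 + 0.5 * x)
y = rng.negative_binomial(n=2.0, p=2.0 / (2.0 + mu))  # NB with mean mu, shape 2

# Quasi-Poisson: a Poisson GLM whose standard errors are inflated by a
# dispersion estimate (Pearson chi-square / residual df).
qp = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")

# Negative binomial as a GLM: exponential family only once the extra
# parameter (alpha, here 1/shape) is fixed in advance.
nb_fixed = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=0.5)).fit()

# Negative binomial with alpha estimated by ML, wrapped around the fit.
nb_ml = sm.NegativeBinomial(y, X).fit()
print(qp.params, nb_fixed.params, nb_ml.params)  # nb_ml.params also includes alpha
```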
I think the log link tends to be used most often with negative binomial regression.
There are a number of introductory-level documents (readily found via Google) that walk through some basic Poisson GLM and then negative binomial GLM analyses of data, but you may prefer to look at a book on GLMs, and maybe do a little Poisson regression first just to get used to it.