Problem
I would like to measure the effects of different treatments on the distribution of subjects between two groups. (For more detail: the subjects are cells, and I want to measure the effect of different gene knockouts on cell differentiation.) I fit a generalized linear model of the following form:
\begin{align} Y &\sim {\rm Binom}(p|N) \\ {\rm logit}(p) &\sim N(\beta X^T, \Sigma) \end{align}
Here $Y$ is the observed number of subjects in each of the two groups for each treatment, and the design matrix $X$ contains the dummy-encoded categorical treatment variable.
Note that the treatment (dummy) columns of $X$ are mutually orthogonal, because each observed subject receives only one treatment.
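A minimal Python sketch of the dummy coding described here, for illustration only: the per-subject labels are hypothetical, and "control" stands in for the non-treated level, which is dropped as the reference. It checks that distinct treatment columns never overlap, and shows that a column of ones (as used for an intercept) is not orthogonal to them.

```python
# Hypothetical per-subject treatment labels; "control" marks non-treated cells.
labels = ["control", "A", "B", "A", "control", "C", "B"]

# Dummy (treatment) coding: one 0/1 column per level, dropping "control".
levels = sorted(set(labels) - {"control"})
X = [[1 if lab == lev else 0 for lev in levels] for lab in labels]

def column(M, j):
    """Extract column j of a list-of-rows matrix."""
    return [row[j] for row in M]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Each row has at most one 1, so distinct dummy columns are orthogonal.
for i in range(len(levels)):
    for j in range(i + 1, len(levels)):
        assert dot(column(X, i), column(X, j)) == 0

# An intercept column of ones is not orthogonal to a dummy column:
# the inner product counts the subjects that received that treatment.
ones = [1] * len(labels)
print(dot(ones, column(X, 0)))
```

Control subjects get an all-zero row, which is why the intercept alone describes them.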
When generating $X$, I drop the level that corresponds to non-treated subjects, so that the intercept represents the background distribution between the two groups in the absence of treatment. The coefficients $\beta$ therefore represent the effect of each treatment relative to the non-treated control.
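To make this interpretation concrete, a small Python sketch that maps the estimates from the summary.glm() output in this question back to the probability scale. It assumes $p$ is the probability of belonging to whichever of the two groups the model treats as the "success" outcome, and it uses only the intercept and the TreatmentA estimate.

```python
import math

def inv_logit(eta):
    """Map a linear predictor on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Estimates copied from the summary.glm() output in this question.
intercept = -3.2600440  # log-odds of group membership for non-treated cells
beta_A = 0.8582053      # change in log-odds for TreatmentA relative to control

p_control = inv_logit(intercept)       # background probability (control)
p_A = inv_logit(intercept + beta_A)    # probability under TreatmentA
odds_ratio_A = math.exp(beta_A)        # multiplicative change in the odds

print(f"control:    p = {p_control:.4f}")
print(f"TreatmentA: p = {p_A:.4f}")
print(f"odds ratio (A vs. control) = {odds_ratio_A:.3f}")
```

The odds ratio $e^{\beta}$ does not depend on the intercept, whereas the change in probability does.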
Question
When interpreting the output from summary.glm() (below), do I need to correct the p-values (i.e., Pr(>|z|)) for multiple testing if I want to select the treatments that have significant effects? Will the answer change if I observe the same population before and after treatment?
```
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.2600440  0.2522047 -12.926  < 2e-16 ***
TreatmentA   0.8582053  0.3762970   2.281  0.02257 *
TreatmentB   0.1369642  0.4346106   0.315  0.75265
TreatmentC  -0.0083802  0.4549547  -0.018  0.98530
TreatmentD  -0.4489033  0.4786030  -0.938  0.34827
TreatmentE  -0.0910970  0.4330876  -0.210  0.83340
...
```
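If one does decide to adjust, a minimal Python sketch of two common adjustments (Holm for the family-wise error rate, Benjamini-Hochberg for the false discovery rate) applied to the treatment p-values above. In R these correspond to p.adjust(p, method = "holm") and p.adjust(p, method = "BH"). Only the five visible treatment rows are used and the intercept is excluded; the rows elided by "..." would enlarge the family, so the numbers here are illustrative only.

```python
def holm(pvals):
    """Holm step-down adjusted p-values (controls family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply by the number of remaining hypotheses; enforce monotonicity.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order of p
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Treatment p-values from the visible rows of the output (intercept excluded).
pvals = {"A": 0.02257, "B": 0.75265, "C": 0.98530, "D": 0.34827, "E": 0.83340}
names = list(pvals)
raw = [pvals[n] for n in names]

for n, p, h, bh in zip(names, raw, holm(raw), benjamini_hochberg(raw)):
    print(f"Treatment{n}: raw={p:.5f}  holm={h:.5f}  BH={bh:.5f}")
```

With these five values, TreatmentA's adjusted p-value is $5 \times 0.02257 = 0.11285$ under both methods.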
I found a related question on Cross Validated (How to test the statistical significance for categorical variable in linear regression?), but it deals with the overall significance of the categorical variable rather than the significance of its individual levels.
EDIT:
The model is a logistic regression and is formulated as follows:
\begin{align} Y &\sim {\rm Binom}(p|N) \\ {\rm logit}(p) &= \beta X^T + \epsilon \\ \epsilon &\sim {\rm Logistic}(0, S) \end{align}