For the first time in a paper I am going to attempt to write out the equation for my model using mathematical notation, but I am unsure how best to do it for the particular model I am using.
In my experiment I have two groups (let's call them $A$ and $B$) of participants and I want to estimate the average level of the outcome variable $y$ in each group. My model is a simple linear regression using level-means coding, with a separate intercept term for each group (see here).
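To make the level-means coding concrete, here is a minimal sketch in Python. The scores and group labels are made up purely for illustration, and the variable names (`x_A`, `x_B`, `alpha_A`, `alpha_B`) are just chosen to mirror the notation used in this question; it is not the code from my actual analysis.

```python
import numpy as np

# Hypothetical scores and group labels, invented for illustration only
y = np.array([4.0, 5.0, 6.0, 10.0, 12.0, 14.0])
group = np.array(["A", "A", "A", "B", "B", "B"])

# Level-means coding: one binary indicator per group and no global intercept
x_A = (group == "A").astype(float)
x_B = (group == "B").astype(float)
X = np.column_stack([x_A, x_B])

# Ordinary least squares; each coefficient is that group's mean score
alpha, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_A, alpha_B = alpha

print(alpha_A, alpha_B)
print(y[group == "A"].mean(), y[group == "B"].mean())
```

With this coding the two fitted coefficients should coincide with the two sample group means, which is why I describe them as the average score for each group.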
In the draft of the paper I have written the model out like so:
$y_i = \alpha_Ax_{Ai} + \alpha_Bx_{Bi} + \varepsilon$
and have explained it with the following text "where $y_i$ is participant $i$'s expected score, $\alpha_A$ is the average score for group A, $x_{Ai}$ is a binary variable indicating whether participant $i$ belongs to group A, $\alpha_B$ is the average score for group B, $x_{Bi}$ is a binary variable indicating whether participant $i$ belongs to group B, and $\varepsilon$ is measurement error."
I have several questions about this equation. First, in my model I estimate separate variances for group $A$ and group $B$.
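As a rough picture of what I mean by separate variances, the sketch below computes a residual variance within each group. This is only an illustration under assumed, made-up data, and not necessarily how my actual model estimates them; the names `var_A` and `var_B` are hypothetical.

```python
import numpy as np

# Hypothetical scores and group labels, invented for illustration only
y = np.array([4.0, 5.0, 6.0, 10.0, 12.0, 14.0])
group = np.array(["A", "A", "A", "B", "B", "B"])

# Residuals are deviations of each score from its own group's mean
group_means = {g: y[group == g].mean() for g in ("A", "B")}
residuals = y - np.array([group_means[g] for g in group])

# One sample variance per group (ddof=1), rather than a single pooled variance
var_A = residuals[group == "A"].var(ddof=1)
var_B = residuals[group == "B"].var(ddof=1)

print(var_A, var_B)
```

In this toy data the spread in group B is larger than in group A, which is the kind of difference a single shared error variance would not capture.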
Question 1: Should the regression equation acknowledge that the variance is estimated separately for each group? For example, would this equation be more appropriate?
$y_i = \alpha_Ax_{Ai} + \alpha_Bx_{Bi} + \varepsilon_{ij}$
where $y_i$ is participant $i$'s expected score, $\alpha_A$ is the average score for group A, $x_{Ai}$ is a binary variable indicating whether participant $i$ belongs to group A, $\alpha_B$ is the average score for group B, $x_{Bi}$ is a binary variable indicating whether participant $i$ belongs to group B, and $\varepsilon_{ij}$ is the average amount participant $i$'s score deviates from the average in group $j$, the group they were allocated to.
Question 2: If this second version is the correct version:

- Is it correct to have the $i$ and $j$ subscripts after $\varepsilon$ to account for the fact that there are separate variances? I am concerned that in this version $j$ only appears after $\varepsilon$ and not after the $x$'s.
- Is it more correct to describe $\varepsilon$ as measurement error or as the amount participant $i$'s score deviates from their group mean?