Consider two datasets: a study dataset with $n$ points and a control dataset with $n_c$ points, where $n < n_c$. Each point in each dataset consists of measurements of four independent variables and one dependent variable: $X_1$, $X_2$, $X_3$, $X_4$, and $Y$, respectively. Note that the independent variables are correlated with one another.
I would like to test the hypothesis that the study dataset has a different $Y$ (in mean or in distribution) from the control dataset, after controlling for all independent variables $X_1$, $X_2$, $X_3$, $X_4$ simultaneously.
Following a previous discussion, I applied multiple linear regression to the two datasets. The regression coefficients differ, unsurprisingly. Since the control dataset is larger than the study one, I wanted to make sure the difference was not just an artifact of small(er)-number statistics. So I randomly selected a subset of $n$ of the $n_c$ control observations and repeated the regression, 10,000 times. For one of the coefficients, the one with the largest value, the difference is quite significant: 2.7$\sigma$, assuming a Gaussian distribution.
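For concreteness, the subsampling procedure can be sketched roughly as follows (a minimal sketch with simulated stand-in data and plain NumPy OLS; the variable names, sample sizes, and true coefficients are all placeholders, not my actual data):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_coefs(X, y):
    # OLS with an intercept column, via least squares.
    A = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coefs

# Simulated stand-ins for the study (n points) and control (n_c points) sets.
n, n_c = 50, 500
true_beta = np.array([1.0, 0.5, -0.3, 0.2])
X_study = rng.normal(size=(n, 4))
y_study = X_study @ true_beta + rng.normal(size=n)
X_ctrl = rng.normal(size=(n_c, 4))
y_ctrl = X_ctrl @ true_beta + rng.normal(size=n_c)

beta_study = fit_coefs(X_study, y_study)

# Repeatedly refit on size-n subsamples of the control set (without
# replacement) to see how much each coefficient fluctuates at the
# study sample size.
B = 10_000
boot = np.empty((B, 5))
for b in range(B):
    idx = rng.choice(n_c, size=n, replace=False)
    boot[b] = fit_coefs(X_ctrl[idx], y_ctrl[idx])

# z-score of each study coefficient against the subsampling distribution;
# the 2.7-sigma figure quoted above is the largest of these in my case.
z = (beta_study - boot.mean(axis=0)) / boot.std(axis=0)
```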
Is this test conclusive, in the sense that it proves the datasets differ with respect to $Y$? How would you suggest carrying out such a test? I played around with PCA but could not formulate the question concisely, and I am quite unhappy with the current dependence on the model assumption (linearity).