I'm not entirely sure that what I'm describing is Simpson's paradox, because combining the two data sets does not produce an opposite relationship, merely a different one. Still, I think it is some variation of the paradox.
Below are some data I am working with, with the variable names made generic as X and Y.
Here are the relationships in question, with the two data sets kept separate:
When the two data sets are combined, a strong negative relationship between X and Y appears. Clearly, when the sets are analyzed separately, they do not both show a strong negative relationship. For my study, it is important that the relationship in the combined data is not the same as the relationships within the separate sets.
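The pattern described above (a strong pooled relationship that is not present within each set) can be reproduced with simulated data. The following Python sketch is purely illustrative: the two groups, their means, and their spreads are hypothetical values, not estimates from the real data. X and Y are drawn independently within each group, so any pooled correlation can only come from the difference in group means.

```python
import numpy as np

# Fixed seed so the illustration is reproducible.
rng = np.random.default_rng(0)
n = 200  # points per hypothetical group

# Hypothetical group A: lower X, higher Y.
# X and Y are drawn independently, so the true within-group correlation is 0.
x_a = rng.normal(loc=3.4, scale=0.2, size=n)
y_a = rng.normal(loc=60.0, scale=10.0, size=n)

# Hypothetical group B: higher X, lower Y, again drawn independently.
x_b = rng.normal(loc=4.2, scale=0.2, size=n)
y_b = rng.normal(loc=35.0, scale=10.0, size=n)

# Pearson correlations within each group and for the pooled data.
r_a = np.corrcoef(x_a, y_a)[0, 1]
r_b = np.corrcoef(x_b, y_b)[0, 1]
r_pooled = np.corrcoef(np.concatenate([x_a, x_b]), np.concatenate([y_a, y_b]))[0, 1]

print(f"within A: {r_a:+.2f}  within B: {r_b:+.2f}  pooled: {r_pooled:+.2f}")
```

With these settings the within-group correlations stay near zero, while the pooled correlation comes out strongly negative (close to -0.7, because the between-group differences dominate the pooled variances of both X and Y).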
What I want to know is whether there is any kosher way, in a publishable paper, to argue (or to show statistically) that the strong negative relationship may appear only when these data sets are combined, for the following reason: the Y values in the green set are generally higher than those in the orange set, while the X values are generally lower for green than for orange.
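One standard way to formalize this argument, offered here as a suggestion rather than an established answer, is the law of total covariance. With groups g of size n_g out of n points, weights p_g = n_g/n, and all covariances computed with divisor n (not n - 1), the pooled covariance splits exactly into a within-group part and a between-group part; for two groups the between-group part reduces to a single product of mean differences:

```latex
\begin{aligned}
\operatorname{Cov}(X,Y)
  &= \underbrace{\sum_{g} p_g\,\operatorname{Cov}_g(X,Y)}_{\text{within groups}}
   + \underbrace{\sum_{g} p_g\,(\bar{x}_g-\bar{x})(\bar{y}_g-\bar{y})}_{\text{between groups}},
   \qquad p_g = \frac{n_g}{n}, \\
\sum_{g} p_g\,(\bar{x}_g-\bar{x})(\bar{y}_g-\bar{y})
  &= p_{\text{green}}\,p_{\text{orange}}\,
     (\bar{x}_{\text{green}}-\bar{x}_{\text{orange}})\,
     (\bar{y}_{\text{green}}-\bar{y}_{\text{orange}})
     \quad\text{(two groups)}.
\end{aligned}
```

The two-group term is negative whenever the group with the higher mean Y has the lower mean X, which is exactly the green-versus-orange pattern. If the between-group term accounts for most of a strongly negative pooled covariance while the within-group term is small, the pooled relationship is driven mainly by the difference between the sets.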
I have asked the statistician working with me on this; he does not know of any such method, but he advised me to look for one, if feasible.
Here are the data for these two sets, color coded:
Y X Color
29.2 3.822954823 orange
45.4 4.446472019 orange
37.8 4.364963504 orange
18.6 4.154740061 orange
36.2 3.449355433 orange
22.2 4.426129426 orange
49.8 3.765931373 orange
28.6 4.552311436 orange
54.4 4.270718232 orange
49.4 4.668501529 orange
18 4.32480195 orange
41.6 3.733698964 orange
59.6 3.371865443 green
52.3 3.404674047 green
76.8 3.20353443 green
41.8 3.198529412 green
64.2 3.352293578 green
34.4 3.559021407 green
69.3 4.107033639 green
62.4 3.363302752 green
45 3.109489051 green
59.2 3.8 green
38.1 3.178023327 green
15.8 4.550671551 green
31.4 3.823887873 green
43.1 3.5 green
72.1 3.613040829 green
61.3 3.386029412 green
59.9 3.549664839 green
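The following Python (NumPy) sketch applies these ideas to the data above. It computes the within-group and pooled Pearson correlations, checks the within/between covariance split, and compares the OLS slope of Y on X with and without a 0/1 indicator for the green set (a common-slope, ANCOVA-style model). The helper names are hypothetical, written only for this illustration, and the choice of analyses is a suggestion rather than an established procedure for this question.

```python
import numpy as np

# Data from the table above, split by the Color column.
orange_y = np.array([29.2, 45.4, 37.8, 18.6, 36.2, 22.2, 49.8, 28.6, 54.4, 49.4, 18.0, 41.6])
orange_x = np.array([3.822954823, 4.446472019, 4.364963504, 4.154740061, 3.449355433,
                     4.426129426, 3.765931373, 4.552311436, 4.270718232, 4.668501529,
                     4.32480195, 3.733698964])
green_y = np.array([59.6, 52.3, 76.8, 41.8, 64.2, 34.4, 69.3, 62.4, 45.0, 59.2,
                    38.1, 15.8, 31.4, 43.1, 72.1, 61.3, 59.9])
green_x = np.array([3.371865443, 3.404674047, 3.20353443, 3.198529412, 3.352293578,
                    3.559021407, 4.107033639, 3.363302752, 3.109489051, 3.8,
                    3.178023327, 4.550671551, 3.823887873, 3.5, 3.613040829,
                    3.386029412, 3.549664839])


def pearson_r(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]


def covariance_decomposition(groups):
    """Split the pooled covariance (divisor n) into within- and between-group parts.

    groups: list of (x, y) array pairs, one pair per group.
    Returns (pooled, within, between); pooled equals within + between.
    """
    x_all = np.concatenate([x for x, _ in groups])
    y_all = np.concatenate([y for _, y in groups])
    n = len(x_all)
    x_bar, y_bar = x_all.mean(), y_all.mean()
    pooled = np.mean((x_all - x_bar) * (y_all - y_bar))
    within = sum(len(x) / n * np.mean((x - x.mean()) * (y - y.mean())) for x, y in groups)
    between = sum(len(x) / n * (x.mean() - x_bar) * (y.mean() - y_bar) for x, y in groups)
    return pooled, within, between


def slope_of_x(x, y, group=None):
    """OLS slope on X, optionally adjusting for a 0/1 group indicator."""
    columns = [np.ones_like(x), x]
    if group is not None:
        columns.append(group)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # coefficient on X


x_all = np.concatenate([orange_x, green_x])
y_all = np.concatenate([orange_y, green_y])
is_green = np.concatenate([np.zeros(len(orange_x)), np.ones(len(green_x))])

print(f"r within orange: {pearson_r(orange_x, orange_y):+.3f}")
print(f"r within green:  {pearson_r(green_x, green_y):+.3f}")
print(f"r pooled:        {pearson_r(x_all, y_all):+.3f}")

pooled, within, between = covariance_decomposition([(orange_x, orange_y), (green_x, green_y)])
print(f"cov pooled {pooled:.3f} = within {within:.3f} + between {between:.3f}")

print(f"slope of X, pooled:            {slope_of_x(x_all, y_all):.2f}")
print(f"slope of X, adjusting for set: {slope_of_x(x_all, y_all, is_green):.2f}")
```

If the group-adjusted slope is much closer to zero than the pooled slope, and the between-group term accounts for most of the pooled covariance, that supports attributing the pooled negative relationship to the difference between the sets. With only 12 orange and 17 green points, a formal version would also report uncertainty, for example a confidence interval for the group-adjusted slope.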