
I'm running both correlational and regression analyses on a variable that is not normally distributed:

[Figure: plot of the variable's distribution; image not reproduced]

For correlations, I decided to use Spearman's rank correlation (which is non-parametric) due to the non-normal distribution. For regression (which involves several continuous variables predicting the presently displayed variable), I decided to use the simple, linear OLS method because the only assumption that I can tell is violated is that of normally-distributed errors, and from what I've read it's not essential on its own.

Question: is there an inherent problem with using non-parametric correlations and parametric regression on the same variable (as I am doing in this case)?

Stated differently, do I have to stick to using only parametric or only non-parametric methods for this variable (choosing one or the other)?

AlexR

1 Answer


There's no inherent problem with it, but the two may well tell you seemingly inconsistent things (such as the Spearman correlation being significant and suggesting a relationship in one direction while the fitted line - and the corresponding Pearson correlation - is significant in the other direction). That's not a problem if you don't treat it as one. (For example, if you're doing the two things for different purposes, it may not matter that they suggest different directions of relationship, since the two relationships are of different kinds.)

Here's a data set for which that's the case:

x:
 -0.154 -1.614  0.200 -1.099 -1.337 -0.668 -1.289 -2.257 -0.601  0.411
 -1.444  1.454 -0.443 -0.114 -0.122 -0.527 -0.305  0.122  0.199 -0.940
 -1.776  0.422  2.245  1.550 -0.557 -0.261  0.275 -0.310 -0.367  0.459
 30.000

y: 
 -0.947  1.367  0.703  0.409  0.474 -0.147  0.302  2.069 -0.210  0.535
 -0.124 -1.031 -0.192 -0.058  0.363 -0.218  1.079 -1.083 -1.676  1.250
  0.759 -1.058 -2.183 -0.741 -0.226 -0.912  0.401  0.997 -0.171 -1.901
 30.000
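To check the claim on the data above, here is a minimal sketch in plain Python. It assumes the final 30.000 in each listing is a 31st observation (a single extreme point at (30, 30)) rather than a count; that reading is mine, not stated in the post. The helper names `ranks`, `pearson` and `spearman` are made up for illustration; in practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` would do the same job and also give p-values.

```python
from statistics import mean

# Data copied from the listing above; the trailing 30.0 is treated as a 31st point.
x = [-0.154, -1.614, 0.200, -1.099, -1.337, -0.668, -1.289, -2.257, -0.601, 0.411,
     -1.444, 1.454, -0.443, -0.114, -0.122, -0.527, -0.305, 0.122, 0.199, -0.940,
     -1.776, 0.422, 2.245, 1.550, -0.557, -0.261, 0.275, -0.310, -0.367, 0.459,
     30.000]
y = [-0.947, 1.367, 0.703, 0.409, 0.474, -0.147, 0.302, 2.069, -0.210, 0.535,
     -0.124, -1.031, -0.192, -0.058, 0.363, -0.218, 1.079, -1.083, -1.676, 1.250,
     0.759, -1.058, -2.183, -0.741, -0.226, -0.912, 0.401, 0.997, -0.171, -1.901,
     30.000]


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def spearman(a, b):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


r = pearson(x, y)
rho = spearman(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Under that reading of the data, the Pearson coefficient comes out positive (driven by the single extreme point) while the Spearman coefficient comes out negative (driven by the other 30 points). The sketch reports only the coefficients, not their significance.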

Stated differently, do I have to stick to using only parametric or only non-parametric methods for this variable (choosing one or the other)?

It's possible to do things that are both parametric and nonparametric at the same time (in different ways).

For example, you can fit straight lines using nonparametric correlations. One way to do this is to choose the slope that makes the nonparametric correlation between residuals and the predictor (IV, x-variable) equal to 0.

This is parametric in the relationship between x and y (it fits a straight line, which has two parameters) but the distributional model for the errors is nonparametric.

[There are also regressions that are parametric in the distributional model but nonparametric in the relationship between x and y; kernel or spline regression models, for example.]

If you use the Kendall correlation for this, you essentially have a Theil-Sen regression line (about which there are many posts on this site).
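As a rough illustration of that idea, the sketch below fits a Theil-Sen line (slope = median of all pairwise slopes, intercept = median of y - slope*x, which is one common convention) alongside an ordinary least-squares line, using the example data set given earlier in this answer. It again assumes the trailing 30.000 is a 31st observation. The function names `theil_sen` and `ols` are made up for illustration; `scipy.stats.theilslopes` is a library alternative.

```python
from itertools import combinations
from statistics import mean, median

# Example data from earlier in the answer; the trailing 30.0 is treated as a 31st point.
x = [-0.154, -1.614, 0.200, -1.099, -1.337, -0.668, -1.289, -2.257, -0.601, 0.411,
     -1.444, 1.454, -0.443, -0.114, -0.122, -0.527, -0.305, 0.122, 0.199, -0.940,
     -1.776, 0.422, 2.245, 1.550, -0.557, -0.261, 0.275, -0.310, -0.367, 0.459,
     30.000]
y = [-0.947, 1.367, 0.703, 0.409, 0.474, -0.147, 0.302, 2.069, -0.210, 0.535,
     -0.124, -1.031, -0.192, -0.058, 0.363, -0.218, 1.079, -1.083, -1.676, 1.250,
     0.759, -1.058, -2.183, -0.741, -0.226, -0.912, 0.401, 0.997, -0.171, -1.901,
     30.000]


def theil_sen(xs, ys):
    """Theil-Sen line: median pairwise slope, then median residual as intercept."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i, j in combinations(range(len(xs)), 2)
        if xs[i] != xs[j]  # skip pairs with identical x (undefined slope)
    ]
    slope = median(slopes)
    intercept = median(v - slope * u for u, v in zip(xs, ys))
    return slope, intercept


def ols(xs, ys):
    """Ordinary least-squares line for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    sxx = sum((u - mx) ** 2 for u in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


ts_slope, ts_intercept = theil_sen(x, y)
ols_slope, ols_intercept = ols(x, y)
print(f"Theil-Sen: y = {ts_intercept:.3f} + {ts_slope:.3f} x")
print(f"OLS:       y = {ols_intercept:.3f} + {ols_slope:.3f} x")
```

On this data the two lines slope in opposite directions: the least-squares slope is positive because the single extreme point dominates, while the median of pairwise slopes is negative because most pairs of points are discordant.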

For a more detailed explanation of such an approach (including an example), see here:

If linear regression is related to Pearson's correlation, are there any regression techniques related to Kendall's and Spearman's correlations?

There are any number of robust or nonparametric or non-Gaussian-but-parametric ways to fit straight lines.

It should be possible to consider analyses that are more alike; if I don't have a good reason to assume normality in one instance, I don't have it in another*; and there's no reason you have to assume it in either.

It would be possible to consider a different parametric model and to use both linear (Pearson) correlation and linear regression, or perhaps not to assume a parametric model and still assume linearity. It should also be possible not to assume linearity but something more general (like monotonicity) for both.

* though take care; what is assumed to be normal in each instance is not the same thing.

Glen_b
  • Your informed opinion is much appreciated. I know I need to do more reading on this, but for practical purposes (i.e., peer review and publication), if I use Spearman's rho, do I have to justify the decision to use OLS regression? By that I mean, if I use non-parametric correlations, is there an expectation that I will use a non-parametric regression method (thus requiring an explanation if I don't)? I'm trying to take this out of the hypothetical 'is-this-a-problem' realm in order to make a decision. – AlexR Jul 14 '18 at 09:05
  • Sorry, I don't see how this would be answerable by a statistician per se -- you're publishing in a journal in some application area I don't work in (and which you haven't specified). Different application areas have different "traditions", conventions and expectations. I cannot guess how these unspecified editors or referees will think. You'd need to ask them what they expect, rather than asking us what makes sense from a statistical point of view. If they happen to be mistaken in their views, I won't be able to correct any misconceptions they carry without being able to talk to them. – Glen_b Jul 14 '18 at 10:03
  • It is psychology, where Pearson and OLS are the default methods used for correlations and regression, as I'm sure they are in many other fields. I'm not asking however which methods are conventional to use -- rather, I'm asking specifically, whether the use of one method creates the expectation that another type of method will be used, and I imagine that this should not be discipline-specific. – AlexR Jul 14 '18 at 16:58
  • On "creates the expectation" -- it depends on who is doing the expecting, and what justification is offered. As long as there's some reason to be looking at both kinds of thing I'd have no intrinsic objection (I'd look at the argument for doing it), but that doesn't mean other people will regard it as reasonable. I think there are differences across disciplines on this kind of thing. Some disciplines appear considerably more inclined to invoke rules (or sometimes "rules") about what's okay to do in an analysis. ... ctd – Glen_b Jul 14 '18 at 21:19
  • ctd... We don't know what the justifications are, nor the rules that might be argued to apply in one instance or another. Whichever way you do it, I would anticipate that you should expect to have to justify your choices very carefully, but I would be careful about concluding that non-normality rules out Pearson correlation (rather than simply changes the way you compute its significance) or that Spearman could not be used if you assume linearity. (Those issues are very much impacted by the kind of "rules" I am suggesting may vary across disciplines.) – Glen_b Jul 14 '18 at 21:31
  • One note I should add, since I frequently see psychology students do this wrongly -- marginal normality of each variable is not what is assumed in the usual Pearson correlation test (rather, the usual assumption is bivariate normality), but in fact it works perfectly well in regression, which for distributional purposes assumes merely conditional normality on one variable, and is fairly (level-)robust in a variety of situations beyond that. – Glen_b Dec 14 '20 at 09:48