How to measure the correlation between a non-normally distributed numeric variable and a nominal variable?

Question

I have two nominal variables and some numeric variables.

The first nominal variable is binary. I want to measure the correlation between this binary variable and each of the numeric variables.
The second nominal variable has 37 categories. Likewise, I need to measure the correlation between this nominal variable and each of the numeric variables.

Based on this, I am not allowed to use one-way ANOVA because my data are not normally distributed. According to the answer to this post, Eta is tied to one-way ANOVA, so because my data are non-normal I cannot use Eta either. Therefore, I decided to use the Kruskal-Wallis test for my second nominal variable with 37 categories, based on this post. Should I use the Mann–Whitney U test for my first (binary) nominal variable? Is that the right approach?

It should be noted that my data set includes 2200 observations. Also, I want to do this as an exploratory data analysis (EDA) step.
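A minimal sketch of the two rank-based tests proposed above, using SciPy's `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu` on simulated skewed data (the data, the seed, and the helper names are hypothetical). With about 2200 observations almost any difference becomes "significant", so each helper also returns an effect size: epsilon-squared H/(n − 1) for Kruskal-Wallis (one of several definitions in use) and the rank-biserial correlation for Mann–Whitney. The rank-biserial line assumes SciPy 1.7 or later, where the returned U refers to the first sample.

```python
import numpy as np
from scipy import stats


def kruskal_epsilon_squared(groups):
    """Kruskal-Wallis H test plus epsilon-squared effect size, H / (n - 1)."""
    h, p = stats.kruskal(*groups)
    n = sum(len(g) for g in groups)
    return h, p, h / (n - 1)


def mann_whitney_rank_biserial(x, y):
    """Two-sided Mann-Whitney U test plus rank-biserial correlation.

    Assumes SciPy >= 1.7, where U is computed for the first sample, so
    U / (n1 * n2) estimates P(X > Y) + 0.5 * P(X == Y); rescaling that
    probability to [-1, 1] gives the rank-biserial r.
    """
    u, p = stats.mannwhitneyu(x, y, alternative="two-sided")
    r = 2.0 * u / (len(x) * len(y)) - 1.0
    return u, p, r


# Hypothetical skewed data: 37 categories, 2200 observations in total.
rng = np.random.default_rng(0)
labels = rng.integers(0, 37, size=2200)
values = rng.lognormal(mean=labels / 37, sigma=1.0)
groups = [values[labels == k] for k in range(37)]
print(kruskal_epsilon_squared(groups))

# Hypothetical binary variable, independent of `values` by construction.
binary = rng.integers(0, 2, size=2200)
print(mann_whitney_rank_biserial(values[binary == 1], values[binary == 0]))
```

Both tests work on ranks, so they do not need normality; note that a significant Kruskal-Wallis result with 37 groups only says that at least one category tends to differ, not which one.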

You use the words "non-normal" but I think you mean "nominal". i.e., non-normal is a term usually reserved for variables that are numeric but that do not follow a normal distribution. — Jeromy Anglim, Aug 24 '18 at 07:25
@JeromyAnglim Thanks. My data is not normally distributed, so I used the term non-normal. Besides, I need to measure the association between nominal and continuous variables, which are non-normal. — ebrahimi, Aug 24 '18 at 07:29
@JeromyAnglim Thanks a lot. Sorry, I need to measure the association between a nominal and some continuous variables. I already used Cramer's V to measure the association between two nominal variables. — ebrahimi, Aug 24 '18 at 07:33
@JeromyAnglim As a matter of fact, I already used one-way ANOVA, but since I realized that my data are not normally distributed, I doubt it is right to use one-way ANOVA. However, I am not sure about using Kruskal-Wallis and the Mann-Whitney U test. — ebrahimi, Aug 24 '18 at 07:38
@JeromyAnglim Sorry, so Eta is not affected by my data not being normally distributed? According to the provided link: "The most classic "correlation" measure between a nominal and an interval ("numeric") variable is Eta, also called correlation ratio, and equal to the root R-square of the one-way ANOVA (with p-value = that of the ANOVA)." Therefore, if one-way ANOVA is not suitable for non-normal data, maybe the same is true of Eta. — ebrahimi, Aug 24 '18 at 07:46
Your title contradicts your two points. You are actually asking about binary–nominal and nominal–nominal associations. I don't see any "numeric"/continuous variable in the points asked. — ttnphns, Aug 24 '18 at 08:47
Eta does not require normally distributed groups to measure the strength of association; it does, however, need more or less symmetric distributions. It needs normality for the p-value, though. — ttnphns, Aug 24 '18 at 10:14
@ttnphns Thanks a lot. Sorry, I modified my post. Could you please let me know what test should be used if we want to rely on p-value? — ebrahimi, Aug 24 '18 at 10:23
I agree with @ttnphns, I think you still need to edit your post title to make it match your body. — Silverfish, Aug 31 '18 at 14:46
@Silverfish Thanks a lot. I revised my title again. I would greatly appreciate it if you could answer it. — ebrahimi, Aug 31 '18 at 15:29
I don't think I'd have anything to add to @JeromyAnglim's answer unfortunately — Silverfish, Aug 31 '18 at 16:26
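As ttnphns notes, eta itself is just a ratio of sums of squares (the R² of the one-way ANOVA), so computing it makes no normality assumption; normality enters through the F-test p-value. One assumption-light way to get a p-value, not something proposed in this thread, is a permutation test: shuffle the nominal labels many times and count how often η² is at least as large as the observed value. A minimal NumPy sketch; the helper names, `n_perm`, the seed, and the simulated data are all hypothetical.

```python
import numpy as np


def eta_squared(values, labels):
    """Correlation ratio squared: SS_between / SS_total (= R^2 of one-way ANOVA)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)
    ss_between = 0.0
    for g in np.unique(labels):
        group = values[labels == g]
        ss_between += len(group) * (group.mean() - grand_mean) ** 2
    return ss_between / ss_total


def eta_permutation_pvalue(values, labels, n_perm=999, seed=0):
    """Permutation p-value for eta^2: shuffling labels breaks any association."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = eta_squared(values, labels)
    count = 0
    for _ in range(n_perm):
        if eta_squared(values, rng.permutation(labels)) >= observed:
            count += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical right-skewed data: 37 categories, 2200 observations.
rng = np.random.default_rng(1)
labels = rng.integers(0, 37, size=2200)
values = rng.exponential(scale=1.0 + labels / 10.0)
print(eta_permutation_pvalue(values, labels))
```

The permutation p-value still assumes the observations are exchangeable under the null (e.g., independent), and because η² is built from group means, strong skew or outliers can still dominate the effect size itself, which is ttnphns's point about symmetry.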

score 4 · Answer 1 · edited Aug 24 '18 at 07:45


Nominal with nominal: There are a few measures of association designed for two nominal variables (i.e., 3 or more unordered categories for one variable and two or more unordered categories for the other).

Here are two that come to mind.

Goodman & Kruskal's lambda: https://en.wikipedia.org/wiki/Goodman_and_Kruskal%27s_lambda
Cramér's V: https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V
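Both measures can be computed directly from a contingency table. A minimal NumPy sketch, assuming Cramér's V based on the Pearson chi-square without continuity correction and the asymmetric lambda that predicts the column variable from the row variable; the function names and the example table are hypothetical.

```python
import numpy as np


def cramers_v(table):
    """Cramér's V from an r x c table (Pearson chi-square, no continuity correction)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = np.sum((table - expected) ** 2 / expected)
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k))


def goodman_kruskal_lambda(table):
    """Asymmetric lambda: proportional reduction in error when predicting the
    column category from the row category (modal-category prediction)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    errors_base = n - table.sum(axis=0).max()  # always guess the modal column
    errors_given_row = n - table.max(axis=1).sum()  # guess the modal cell per row
    return (errors_base - errors_given_row) / errors_base


# Hypothetical 2 x 3 table of counts.
table = [[30, 10, 5], [10, 25, 10]]
print(cramers_v(table), goodman_kruskal_lambda(table))
```

Cramér's V is symmetric and ranges from 0 to 1; lambda is asymmetric and can be 0 even when the variables are associated (whenever the modal column is the same in every row). Empty rows or columns make the expected counts zero and must be dropped first.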

Nominal with numeric: I agree that eta-squared and the ICC (intraclass correlation coefficient) are two common approaches to quantifying the association between a nominal and a numeric variable.
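The answer does not say which ICC is meant. A common choice for "how much of a numeric variable's variance lies between the categories of a nominal variable" is the one-way random-effects ICC(1), built from the same one-way ANOVA mean squares as eta-squared. A minimal sketch assuming that variant, with the usual adjusted group size n0 = (N − Σ n_j²/N)/(a − 1) for unbalanced groups; the function name and simulated data are hypothetical.

```python
import numpy as np


def icc1(values, labels):
    """One-way random-effects ICC(1) from ANOVA mean squares.

    For unbalanced groups, uses the adjusted group size
    n0 = (N - sum(n_j ** 2) / N) / (a - 1), which equals n for balanced data.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    groups = [values[labels == g] for g in np.unique(labels)]
    a = len(groups)
    n_total = len(values)
    sizes = np.array([len(g) for g in groups], dtype=float)
    grand_mean = values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (n_total - a)
    n0 = (n_total - np.sum(sizes ** 2) / n_total) / (a - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical balanced design: 37 categories x 60 observations, with
# between-category variance 1 and within-category variance 4.
rng = np.random.default_rng(2)
labels = np.repeat(np.arange(37), 60)
group_effects = rng.normal(0.0, 1.0, size=37)
values = group_effects[labels] + rng.normal(0.0, 2.0, size=labels.size)
print(icc1(values, labels))
```

Unlike eta-squared, ICC(1) treats the categories as a sample from a larger population of categories and can be negative when within-category variation exceeds what chance would produce.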

answered Aug 24 '18 at 07:30 by Jeromy Anglim; edited by Nick Cox


(+1) Aside on notation: As I recall, Cramér originally used the symbol $\nu$ (lower case Greek letter nu). The widespread but not universal present convention to use roman letters for sample statistics would lead to transliterating this as $n$, not a good idea. I suppose $V$ may be customary for the good reason of using a roman letter and/or as a misreading of $\nu$. There lies a very small historical question which may be of interest to some. – Nick Cox Aug 24 '18 at 07:52

Just an addition: Cramér's V is exactly related to the average of the squared canonical correlations (https://stats.stackexchange.com/a/140057/3277), which puts Cramér's V into the context of the linear model (performed on the dummy sets). – ttnphns Aug 24 '18 at 10:04
@JeromyAnglim Thanks a lot. I upvoted your answer, but I am still not sure about the p-value of Eta. – ebrahimi Aug 24 '18 at 15:16
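A small numerical check of ttnphns's remark. It assumes the standard correspondence-analysis result that the canonical correlations between the two dummy-coded sets are the nonzero singular values of the standardized residual matrix D_r^(−1/2)(P − r cᵀ)D_c^(−1/2). There are at most min(r, c) − 1 of them and their squares sum to χ²/n, so their mean square equals V². The table and the helper name are hypothetical.

```python
import numpy as np


def canonical_correlations(table):
    """Canonical correlations between the dummy-coded row and column variables,
    taken as the singular values of the standardized residual matrix."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    r = p.sum(axis=1)  # row masses
    c = p.sum(axis=0)  # column masses
    s = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    k = min(p.shape) - 1  # at most this many are nonzero
    return np.linalg.svd(s, compute_uv=False)[:k]


# Hypothetical 3 x 3 table of counts.
table = np.array([[30, 10, 5], [10, 25, 10], [5, 10, 20]])
rho = canonical_correlations(table)
v_from_canonical = np.sqrt(np.mean(rho ** 2))

# Direct Cramér's V via the Pearson chi-square statistic, for comparison.
n = table.sum()
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
chi2 = np.sum((table - expected) ** 2 / expected)
v_direct = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(v_from_canonical, v_direct)
```

The two printed values agree up to floating-point error, which is the sense in which V summarizes a linear (canonical correlation) analysis of the two dummy sets.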
