
I have two nominal variables and some numeric variables.

  1. The first nominal variable is a binary one. I want to measure the correlation between this binary variable and the other numeric variables.
  2. The second nominal variable has 37 categories. Again, I should measure the correlation between this nominal variable and the other numeric variables.

Based on this, I believe I cannot use one-way ANOVA because my data are not normally distributed. According to the answer to this post, Eta is associated with one-way ANOVA, so because of the non-normality of my data I assumed Eta cannot be used either. Therefore, I decided to use Kruskal-Wallis for my second nominal variable with 37 categories, based on this post. Should I use the Mann–Whitney U test for my first, binary nominal variable? Is that appropriate?

Note that my data set includes 2200 observations, and I want to do this as an exploratory data analysis step.

ebrahimi
  • You use the words "non-normal" but I think you mean "nominal". i.e., non-normal is a term usually reserved for variables that are numeric but that do not follow a normal distribution. – Jeromy Anglim Aug 24 '18 at 07:25
  • @JeromyAnglim Thanks. My data is not normally distributed, so I used the term non-normal. Besides, I need to measure the association between nominal and continuous variables, which are non-normal. – ebrahimi Aug 24 '18 at 07:29
  • @JeromyAnglim Thanks a lot. Sorry, I need to measure the association between a nominal and some continuous variables. I already used Cramer's V to measure the association between two nominal variables. – ebrahimi Aug 24 '18 at 07:33
  • @JeromyAnglim As a matter of fact, I already used `one-way Anova` but as I understood that my data is not normally distributed, I doubt it would be right to use `one-way Anova`. However, I am not sure about using `kruskal-Wallis` and `Mann-Whitney U-test`. – ebrahimi Aug 24 '18 at 07:38
  • @JeromyAnglim Sorry, Eta is not affected by the fact that my data is not normally distributed? According to the provided link: "The most classic "correlation" measure between a nominal and an interval ("numeric") variable is Eta, also called correlation ratio, and equal to the root R-square of the one-way ANOVA (with p-value = that of the ANOVA)." Therefore, if one-way Anova is not suitable for non-normal data, maybe this would be true about Eta. – ebrahimi Aug 24 '18 at 07:46
  • Your title contradicts your two points. You are actually asking about binary - nominal and nominal - nominal associations. I don't see any "numeric"/continuous variable in the points asked. – ttnphns Aug 24 '18 at 08:47
  • Eta does not require normally distributed groups to measure the strength of association; it does, however, need a more or less symmetric distribution. It needs normality for the p-value, though. – ttnphns Aug 24 '18 at 10:14
  • @ttnphns Thanks a lot. Sorry, I modified my post. Could you please let me know what test should be used if we want to rely on p-value? – ebrahimi Aug 24 '18 at 10:23
  • I agree with @ttnphns, I think you still need to edit your post title to make it match your body. – Silverfish Aug 31 '18 at 14:46
  • @Silverfish Thanks a lot. I revised my title again. I would greatly appreciate if you could answer it. – ebrahimi Aug 31 '18 at 15:29
  • I don't think I'd have anything to add to @JeromyAnglim's answer unfortunately – Silverfish Aug 31 '18 at 16:26

1 Answer


Nominal with nominal: There are a few measures of association designed for two or more nominal variables (i.e., 3 or more unordered categories for one variable; and two or more unordered categories for the other variable).

Here are two that come to mind.

Nominal with numeric: I agree that eta-squared or ICC are two common approaches to quantifying the association between a nominal and numeric variable.
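
As a rough illustration of the eta-squared approach, here is a minimal sketch in plain Python. It computes eta-squared as the between-group sum of squares divided by the total sum of squares, which is the R-square of a one-way ANOVA. The function name `eta_squared`, the dict-of-lists input format, and the toy data are assumptions made for this example and do not come from the question. It only describes the strength of association; it does not produce a p-value.

```python
from statistics import fmean


def eta_squared(groups):
    """Return eta-squared = SS_between / SS_total.

    `groups` is a hypothetical input format: a dict mapping each category
    of the nominal variable to a non-empty list of numeric values.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = fmean(all_values)

    # Total variation of the numeric variable around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(values) * (fmean(values) - grand_mean) ** 2
        for values in groups.values()
    )

    # If the numeric variable is constant there is nothing to explain.
    return ss_between / ss_total if ss_total > 0 else 0.0


# Toy data (made up for illustration): three categories, three values each.
data = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 3.0, 4.0],
    "c": [7.0, 8.0, 9.0],
}

eta2 = eta_squared(data)
eta = eta2 ** 0.5  # Eta, the correlation ratio, is the square root of eta-squared.
print(f"eta-squared = {eta2:.3f}, eta = {eta:.3f}")
```

The same function works for a binary variable (two keys) or for one with 37 categories (37 keys), as long as every category has at least one observation.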

Nick Cox
Jeromy Anglim
  • (+1) Aside on notation: As I recall, Cramér originally used the symbol $\nu$ (lower case Greek letter nu). The widespread but not universal present convention to use roman letters for sample statistics would lead to transliterating this as $n$, not a good idea. I suppose $V$ may be customary for the good reason of using a roman letter and/or as a misreading of $\nu$. There lies a very small historical question which may be of interest to some. – Nick Cox Aug 24 '18 at 07:52
  • Just an addition: Cramér's V is exactly related to the averaged squared canonical correlations (https://stats.stackexchange.com/a/140057/3277), which puts Cramér's V into the context of the linear model (performed on the dummy sets). – ttnphns Aug 24 '18 at 10:04
  • @JeromyAnglim Thanks a lot. I upvoted your answer, but I am still not sure about the p-value of Eta. – ebrahimi Aug 24 '18 at 15:16