Find relation between Categorical dependent variable and continuous independent variable

Question

I have one sample group of data in which I found the difference between two categorical data (this is my dependent variable) and continuous (numerical) data (this is my independent variable). I want to test my hypothesis with a statistical method. I am looking for a statistical method that tests the relation (significance) between a categorical dependent variable (ordinal) and a continuous independent variable. Any suggestions?

Does https://stats.stackexchange.com/questions/97/what-are-good-basic-statistics-to-use-for-ordinal-data help? — mdewey, Apr 01 '21 at 12:46
Hello @mdewey, unfortunately no it doesn't! My aim is to find the statistical approach to find the significance within one sample group for categorical dependent variable with continuous independent variable... Any suggestions? — user317257, Apr 01 '21 at 19:31
Thank you for editing your question but I still find what you've written unclear. By "two categorical data" do you mean you have two different categorical variables, or that you have one categorical variable which has only two categories (i.e. it's binary)? Then right at the end you mention "ordinal" for the first time. In fact it isn't even clear to me how your first sentence relates to your third. Could you try explaining again more clearly? It's probably best to describe each aspect of your data only once, rather than explain it twice but in different ways. Some context may help too. — Silverfish, Apr 02 '21 at 13:15

Patrick Bormann · Answer 1 · 2021-03-31T18:36:01.050

0

Just to be clear: you want the significance, not the strength, of the relationship/association? The strength of the relationship/association would be measured with:

Spearman rank correlation
Kendall's tau
point-biserial correlation (if your categorical variable is dichotomous)
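
As a rough illustration of the last measure in this list: the point-biserial coefficient is the Pearson correlation computed with the dichotomous variable coded as 0/1. The sketch below uses only the standard library; the data, the variable names `category` and `score`, and the helper `pearson_r` are made up for illustration and are not from the question.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: 8 participants, a binary category coded 0/1
# and a continuous measurement for each participant.
category = [0, 0, 0, 0, 1, 1, 1, 1]
score = [2.1, 3.4, 2.9, 3.8, 4.0, 5.2, 4.7, 5.9]

# Point-biserial r is Pearson's r with the 0/1 coding.
r_pb = pearson_r(category, score)
print(f"point-biserial r = {r_pb:.3f}")
```

This only measures strength of association; with so few observations the estimate is very noisy.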

But if you really care about the significance of the relationship: in the past, many used a t-test to check this. I would instead advise a permutation test. The idea is similar to what is known in ML as permutation importance (both rely on shuffling ;-)).

https://towardsdatascience.com/how-to-assess-statistical-significance-in-your-data-with-permutation-tests-8bb925b2113d

https://www.jwilber.me/permutationtest/

Excerpt:

Ideally, we'd calculate a test statistic for every possible permutation of treatment among our groups. This would create an exact distribution of all possible test statistics under our null hypothesis.

I would advise the latter source for the explanation and the first one for the code.
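
The idea in the excerpt can be sketched for a small sample, where enumerating every possible assignment is feasible. This is a minimal sketch, assuming the categorical variable is binary (coded 0/1) and using the difference in group means as the test statistic; the function name `exact_permutation_test` and the data are hypothetical, not taken from the linked articles.

```python
from itertools import combinations

def exact_permutation_test(values, labels):
    """Two-sided exact permutation test on the difference in group means.

    `labels` must contain only 0 and 1.
    Returns (observed difference, p-value).
    """
    n = len(values)
    n1 = sum(labels)
    total = sum(values)

    def mean_diff(idx1):
        # Mean of the group-1 values minus mean of the remaining values.
        s1 = sum(values[i] for i in idx1)
        return s1 / n1 - (total - s1) / (n - n1)

    observed = mean_diff([i for i, g in enumerate(labels) if g == 1])
    # Enumerate every way of assigning n1 of the n observations to group 1.
    diffs = [mean_diff(idx) for idx in combinations(range(n), n1)]
    # Small tolerance so floating-point ties count as "at least as extreme".
    extreme = sum(1 for d in diffs if abs(d) >= abs(observed) - 1e-12)
    return observed, extreme / len(diffs)

# Hypothetical data: binary category and a continuous measurement.
category = [0, 0, 0, 0, 1, 1, 1, 1]
score = [2.1, 3.4, 2.9, 3.8, 4.0, 5.2, 4.7, 5.9]

obs_diff, p_value = exact_permutation_test(score, category)
print(f"observed difference = {obs_diff:.2f}, exact p = {p_value:.4f}")
```

With 8 observations split 4/4 there are only 70 possible assignments, so the smallest attainable two-sided p-value here is 2/70; very small samples limit what any such test can show.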

edited Mar 31 '21 at 18:36

answered Mar 31 '21 at 18:29


Thank you for your answer. However, I would like to clarify that it is a small sample within-group test. I want to find the significance between them so what is the most recommended statistical method? – user317257 Mar 31 '21 at 21:34
Can you update your answer and give me a hint on the setup? If it really is a small sample with a group comparison, then maybe Welch's test would be sufficient, as Levene's test for checking variances is mostly distorted towards the t-test. I can also give you the paper source, as it is freely available in a psychology journal. – Patrick Bormann Apr 01 '21 at 17:43
Hi, I have one group sample data (less than 10 participants). I have categorical data (dependent variable) and continuous numerical data (independent variable). I am trying to test my hypothesis based on these two variables. Can you send me any resource that could help? Thank you so much for all your help! – user317257 Apr 01 '21 at 19:45
https://medium.com/@outside2SDs/an-overview-of-correlation-measures-between-categorical-and-continuous-variables-4c7f85610365 With this setup you don't need a t-test, as you only have one group. In summary: point-biserial or multinomial regression, but believe me, you do not have enough samples for anything beyond point-biserial. – Patrick Bormann Apr 02 '21 at 10:15

kjetil b halvorsen · Answer 2 · 2021-04-04T19:49:15.140

0

If your categorical variable is binary, you can use (binomial) logistic regression; if it has more than two levels, multinomial logistic regression. Maybe you should use a spline for the continuous independent variable. That, and other decisions, depend on context that you did not tell us. In all cases, plot your data! This answer assumes you have enough data, at least around 100 observations for logistic regression. Otherwise, use simpler methods like those in the answer from @Patrick Bormann.
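
For the binary case, a minimal standard-library sketch of logistic regression with one continuous predictor might look like the following. It fits the model by Newton-Raphson and reports a Wald test for the slope. The data are simulated (200 observations, in line with the sample-size caveat above), and the function name `fit_logistic` is hypothetical; in practice a statistics package would normally be used instead.

```python
import math
import random

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Returns (b0, b1, approximate standard error of b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Gradient (g0, g1) and information matrix entries (h00, h01, h11).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of b1 from the inverse information matrix of the last iteration.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1

# Simulated data (an assumption for illustration): true intercept 0.5, true slope 1.5.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(0.5 + 1.5 * xi))) else 0 for xi in x]

b0, b1, se_b1 = fit_logistic(x, y)
z = b1 / se_b1
# Two-sided Wald p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2.0))
print(f"slope = {b1:.3f}, SE = {se_b1:.3f}, Wald p = {p_value:.2g}")
```

This sketch does not guard against perfectly separated data, where the maximum-likelihood estimate does not exist and the Newton steps would diverge.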

Your question is really a FAQ, so see some stored searches, also this one.

edited Apr 04 '21 at 19:49

answered Apr 04 '21 at 16:59


He has 10 data points; how should he perform logistic regression? He couldn't even interpret p-values or odds. – Patrick Bormann Apr 04 '21 at 18:59
@Patrick Bormann: Where does it say he has ten data points? If so, you are clearly right! – kjetil b halvorsen Apr 04 '21 at 19:46
In his comments below my post, quote: "Hi, I have one group sample data (less than 10 participants)". That's the reason I no longer recommend permutation tests or anything else ^^ only the point-biserial. – Patrick Bormann Apr 04 '21 at 19:51
