
We are studying the admission of students to a university according to different criteria. The parameters x1, x2, x3 and x4 are quantitative. The parameter y is binary and is the outcome: y = 1 if the student is admitted and y = 0 if not. I'm trying to figure out which parameter (x1, x2, x3 or x4) has the biggest influence on y being 1. It would also be useful to know the weight of each parameter on y.

For that, I'm using rcorr as in the example below. I'm not sure I can use rcorr, because my y parameter is binary. If not, which function should I use?

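```r
library(Hmisc)

set.seed(1)  # make the simulated data reproducible
# Four quantitative predictors, each uniform on [0, 100]
x1 <- runif(50, min = 0, max = 100)
x2 <- runif(50, min = 0, max = 100)
x3 <- runif(50, min = 0, max = 100)
x4 <- runif(50, min = 0, max = 100)
# Binary outcome: 1 = admitted, 0 = not admitted
y <- sample(0:1, 50, replace = TRUE)

d <- data.frame(x1, x2, x3, x4, y)
m <- as.matrix(d)
# rcorr() computes one method per call: passing c("pearson", "spearman")
# silently selects the first (Pearson), so request each method explicitly.
rcorr(m, type = "pearson")
rcorr(m, type = "spearman")
```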
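With a 0/1 outcome, the Pearson coefficient reported for y is the point-biserial correlation: a rescaled difference between the mean of x among admitted (M1) and non-admitted (M0) students, r = (M1 − M0) / s_n · sqrt(p·q), where s_n is the standard deviation of x with divisor n and p is the proportion of 1s. A minimal base-R check of that identity, on simulated data shaped like the question's (the names `x`, `m1`, `m0`, `s_n`, `r_pb` are illustrative only):

```r
set.seed(42)
x <- runif(50, min = 0, max = 100)     # one quantitative predictor
y <- sample(0:1, 50, replace = TRUE)   # binary outcome

m1  <- mean(x[y == 1])                 # mean of x among admitted
m0  <- mean(x[y == 0])                 # mean of x among not admitted
p   <- mean(y)                         # proportion admitted
q   <- 1 - p
s_n <- sqrt(mean((x - mean(x))^2))     # SD of x with divisor n

# Point-biserial correlation computed from group means
r_pb <- (m1 - m0) / s_n * sqrt(p * q)

all.equal(r_pb, cor(x, y))             # TRUE: same as the Pearson r
```

So the Pearson value is computable and has a clear meaning, but it only describes a single predictor at a time.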
Tali
  • Questions solely about how software works are [off-topic](http://stats.stackexchange.com/help/on-topic) here, but you may have a real statistical question buried here. You may want to edit your question to clarify the underlying statistical issue. You may find that when you understand the statistical concepts involved, the software-specific elements are self-evident or at least easy to get from the documentation. – gung - Reinstate Monica Jun 16 '16 at 12:53
  • I guess you're asking if it's OK to use Spearman's correlation to relate 2 variables when 1 is binary. You may want to read [Pearson's or Spearman's correlation with non-normal data](http://stats.stackexchange.com/q/3730/7290), & [Correlations with categorical variables](http://stats.stackexchange.com/q/108007/7290). – gung - Reinstate Monica Jun 17 '16 at 11:19

2 Answers


In this case you may be better off using logistic regression rather than correlations for evaluating the relations between your continuous predictor variables and outcome. That will allow you to examine how all of the predictor variables together are related to admission success, and makes it possible to examine how interactions among the predictors might also be related to outcome.
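A minimal sketch of such a logistic regression with base R's `glm()`, on simulated data shaped like the question's. Standardizing the predictors first is an assumption on my part, not something the answer requires: it puts each coefficient on a "per 1 SD" scale so the four can be compared more fairly. The helper names `preds` and `ds` are illustrative.

```r
# Simulated data with the same layout as the question; y is random here,
# so no real effects should be expected.
set.seed(42)
n <- 50
d <- data.frame(x1 = runif(n, 0, 100), x2 = runif(n, 0, 100),
                x3 = runif(n, 0, 100), x4 = runif(n, 0, 100),
                y  = sample(0:1, n, replace = TRUE))

# Standardize the predictors so each coefficient is the change in the
# log-odds of admission per 1 SD increase in that predictor.
preds <- c("x1", "x2", "x3", "x4")
ds <- d
ds[preds] <- lapply(d[preds], function(v) as.numeric(scale(v)))

# All four predictors in one model; an interaction could be added with,
# e.g., y ~ x1 * x2 + x3 + x4
fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = ds)

summary(fit)               # coefficients with Wald z-tests
exp(coef(fit))             # odds ratios per 1 SD increase
drop1(fit, test = "LRT")   # likelihood-ratio test for dropping each predictor
```

The coefficients are estimated jointly, so each one describes a predictor's association with admission while holding the others fixed, which pairwise correlations cannot do.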

Your search for "which parameter ... has the biggest influence on y in order to have 1," while understandable, may be dangerous. In general, trying to find a single predictor variable throws away the useful information from the other variables and can lead to severe problems with reliability. In particular, if some of your predictors are correlated then the particular one most highly related to outcome in your present data sample may not be so closely related when you try to apply your model to new cases. This Cross Validated page discusses the problems with model selection in logistic regression, and contains links to similar discussion in other contexts.
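A small simulation can illustrate the reliability problem. It is a sketch on invented data: `x1` and `x2` are hypothetical predictors built from the same latent variable `z`, so they are strongly correlated and equally related to y, while `x3` is pure noise. Recomputing the correlations on bootstrap resamples and recording which predictor has the largest absolute correlation with y shows whether the "single best predictor" stays the same from sample to sample; with this setup you would expect the winner to switch between `x1` and `x2`.

```r
set.seed(1)
n  <- 50
z  <- rnorm(n)                    # hypothetical latent trait
x1 <- z + rnorm(n, sd = 0.3)      # x1 and x2 are both noisy copies of z
x2 <- z + rnorm(n, sd = 0.3)
x3 <- rnorm(n)                    # unrelated to the outcome
y  <- rbinom(n, 1, plogis(z))     # admission depends on z only
d  <- data.frame(x1, x2, x3, y)

# For each bootstrap resample, name the predictor most correlated with y
top <- replicate(200, {
  i <- sample(n, replace = TRUE)
  r <- cor(d[i, c("x1", "x2", "x3")], d$y[i])   # 3 x 1 matrix
  rownames(r)[which.max(abs(r))]
})

table(top)   # how often each predictor "wins"
```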

EdM

You will need to tell R which package the function comes from. I guess you'll want to use the Hmisc package, so

```r
library(Hmisc)
```

must be called before your code will work. It then runs without warnings or errors. Because y takes only two values, there will be many ties, but the manual describes how the function handles them (midranks):

```r
library(Hmisc)
help(rcorr)
```
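To see what midranks mean for a binary y, here is a base-R sketch on simulated data (the names `x`, `y`, `rho` are illustrative). `rank()` gives every 0 one shared average rank and every 1 another, so rank(y) is just a linear rescaling of y, and Spearman's rho reduces to the Pearson correlation between rank(x) and y. Using `cor.test()` with `exact = FALSE` is my choice here, since an exact p-value is not available when there are ties.

```r
set.seed(42)
x <- runif(50, min = 0, max = 100)
y <- sample(0:1, 50, replace = TRUE)

# Midranks: all 0s share one rank, all 1s share another
table(rank(y))

# Spearman's rho equals the Pearson correlation of rank(x) with y itself
rho <- cor(x, y, method = "spearman")
all.equal(rho, cor(rank(x), y))   # TRUE

# Asymptotic test of rho; exact = FALSE avoids the ties warning
cor.test(x, y, method = "spearman", exact = FALSE)
```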

If you are worried about ties, this may be of interest: https://stackoverflow.com/questions/10711395/spearman-correlation-and-ties

Cheers, Bernhard

Bernhard