Using Pearson's correlation coefficient for probability

Question

With a sample size of 400+, I obtained a Pearson correlation coefficient of .25. How am I supposed to break this down into a probability or a percentage? Rather, how can I explain my findings in layman's terms?

I should give some more information. We have two different tests. One of these tests has one question: "How satisfied are you?" It is scored from 1 to 4, with 4 being the most satisfied and 1 the least. The other test has 18 questions and is an internal review. It consists of questions such as: did the customer service rep use the customer's name? Did the rep give the correct technical answer? Did the rep teach the customer how to fix the problem themselves, if applicable?

The first test is given to the customer; the second test, with its 18 questions, is filled out by the supervisor. Each of those 18 questions is scored as Pass (-1), Not Applicable or Neither Pass/Fail (0), or Fail (1).

Our goal is to find which variables in the second test best predict scores on the first test. We want to do this to find the problem areas that we can fix internally, in order to better serve our customers and give them the best experience possible (more 3s and 4s on the first test).
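One common way to ask "which of the 18 items best predict satisfaction" is a multiple regression of the 1 to 4 score on all 18 item codes at once. The sketch below is only an illustration under loud assumptions: the data are simulated (not Carl's), the satisfaction score is treated as numeric, and only two made-up items are given a real effect. As the discussion below points out, ordinary linear regression may not be the right model for ordinal scores like these.

```python
import numpy as np

# HYPOTHETICAL simulated data: 400 customers, 18 supervisor items coded
# -1 (pass), 0 (N/A or neither), 1 (fail), and a satisfaction score 1..4.
rng = np.random.default_rng(0)
n, k = 400, 18
X = rng.integers(-1, 2, size=(n, k))        # item codes in {-1, 0, 1}
true_beta = np.zeros(k)
true_beta[[1, 4]] = [-0.6, -0.4]            # assumption: only items 2 and 5 matter
latent = 3 + X @ true_beta + rng.normal(0, 0.7, n)
y = np.clip(np.rint(latent), 1, 4)          # round and clip to the 1..4 scale

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Rank items by absolute coefficient (a crude screen, not a significance test).
order = np.argsort(-np.abs(coef[1:]))
for j in order[:5]:
    print(f"item {j + 1}: coefficient {coef[j + 1]:+.3f}")
```

With real data, the negative coefficients would be the items where a "Fail" is associated with lower satisfaction; standard errors and p-values (which statistical software reports) are needed before calling any item a problem area.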

In what sense do you want to "break this down into a probability or percentage"? That isn't something that is usually done w/ $r$. — gung - Reinstate Monica, Jan 20 '14 at 20:00
I would like to be able to report to the higher-ups something akin to: "Based on the history and the correlation, if we get a score of X on one variable, we can expect a score of Y on the other variable." I do not think this is possible. I am not a statistician, but I'm being pushed more and more into statistical work here at the office. I am trying to explain something in terms of percentages. They love percentages. — Carl, Jan 20 '14 at 20:17
Carl, you can't derive expected scores ($E(Y|X=x)$) *from* the correlation in that way. To make those kinds of statements requires a few [additional pieces of information](http://stats.stackexchange.com/questions/32464/how-does-the-correlation-coefficient-differ-from-regression-slope) -- and then corresponds to interpreting a regression relationship, rather than a correlation. You can make such statements in terms of *standardized scores* from a correlation, but that requires substantially more explanation. — Glen_b, Jan 20 '14 at 20:48
Thanks, Glen. I currently only have access to Excel, no SAS or SPSS. I see the Data Analytics Add-In for Excel; it has a regression option. Should I be using regression instead of correlation as a predictor, then? I understand that correlation only shows how well the two variables rise together (or how one falls as the other rises, etc.), whereas regression is used for predictive modeling? — Carl, Jan 20 '14 at 20:52
Yes, exactly: the regression equation with the estimated parameters can be used to predict. Correlation also does what you wrote. — Horst Grünbusch, Jan 20 '14 at 20:57
Is there a good link on this site or elsewhere on how to properly standardize scores from a correlation? Or is reporting the regression value of $r^2$ more accurate? In theory I think I would want to do a multiple regression, but I do not think that there is a way to do this with Excel. — Carl, Jan 20 '14 at 21:07
@Carl, for an overview of how regression & correlation are related, it may help you to read my answer here: [What is the difference between linear regression on y with x and x with y?](http://stats.stackexchange.com/questions/22718//22721#22721) — gung - Reinstate Monica, Jan 20 '14 at 21:09
Thanks Glen, Gung, and HG. I've added more information to my OP to give a better idea, so that with some tutelage I can improve my own understanding of the proper procedure, and thus make my bosses happy. — Carl, Jan 20 '14 at 21:13
Hmmm... doing this properly will actually require some fairly advanced statistical techniques. I don't doubt you could do them, but you should probably have a minimum of 3 stats classes under your belt & more sophisticated software than Excel. You may want to work w/ a statistical consultant. — gung - Reinstate Monica, Jan 20 '14 at 21:21
I've taken stat methods 1, stats for business, math stat 1, math stat 2, applied experimental design, and I am currently in regression analysis (we've yet to get to the regression part; it's still a review of stat methods). I hope that's enough stat courses; it sure feels like enough! Work is currently trying to get me access to SAS or SPSS. I realize after taking these courses that we were rarely asked to explain the data, or the statistics the software produced for us, to a lay audience. As I am the only remotely number-savvy person here, I get to run the data and attempt to translate it. — Carl, Jan 20 '14 at 21:26
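Glen_b's point about standardized scores can be made concrete. For the least-squares regression of $Y$ on $X$, the predicted value is $\hat y = \bar y + r\,\frac{s_y}{s_x}(x - \bar x)$, which in standardized units is simply $\hat z_y = r\,z_x$. So $r$ alone only gives predictions in standard-deviation units; the means and standard deviations are the "additional pieces of information" needed to get back to raw scores. The function name and the summary statistics below are hypothetical, chosen only for illustration.

```python
def predict_from_r(x, r, mean_x, sd_x, mean_y, sd_y):
    """Least-squares prediction of Y given X = x, from summary statistics.

    In standardized units this is simply z_y_hat = r * z_x.
    """
    z_x = (x - mean_x) / sd_x        # standardize x
    z_y_hat = r * z_x                # predicted z-score of y (regression toward the mean)
    return mean_y + z_y_hat * sd_y   # convert back to Y's original units

# Hypothetical summary statistics, not taken from the thread:
print(predict_from_r(x=2.0, r=0.25, mean_x=0.0, sd_x=1.0, mean_y=3.0, sd_y=0.8))
```

With $r = .25$, an observation two standard deviations above the mean of $X$ is predicted to be only half a standard deviation above the mean of $Y$. This is an expected (average) value, not a guarantee for any individual case.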

Horst Grünbusch · Answer 1 · 2014-01-20T21:29:41.463

You can take $R^2 = r^2 = 0.25^2 = 0.0625$ and conclude that 6.25% of the variation in one variable is explained by the variation in the other variable. This quantity comes from regression, which I'd suggest as a starting point for further reading.

In regression, you model the outcome of one variable (the dependent variable) as a function of the other variable and unknown parameters (the regression equation), plus an error term. The unknown parameters are estimated from the data; in ordinary least squares they are chosen so that the sum of the squared residuals is minimized.
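For a single predictor, the least-squares estimates have a closed form: slope $b = S_{xy}/S_{xx}$ and intercept $a = \bar y - b\bar x$. The small sketch below uses made-up numbers (not the OP's data) and the hypothetical helper name `ols_simple`; `np.polyfit(x, y, 1)` returns the same line.

```python
import numpy as np

def ols_simple(x, y):
    """Fit y = a + b*x by minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    b = np.sum(dx * dy) / np.sum(dx ** 2)   # slope = S_xy / S_xx
    a = y.mean() - b * x.mean()             # the line passes through (x_bar, y_bar)
    return a, b

# Small made-up example.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = ols_simple(x, y)
print(a, b)
```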

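How well this model explains the variation in the dependent variable is usually indicated by $R^2$, which can be interpreted just the way I wrote above. In the case of linear regression with one independent variable (and an intercept), this $R^2$ is identical to the squared correlation coefficient. In general, $R^2$ is defined as the explained share of the variation in the observed dependent variable, $R^2 = \frac{SS_\text{tot} - SS_\text{res}}{SS_\text{tot}} = 1 - \frac{SS_\text{res}}{SS_\text{tot}}$, where $SS_\text{tot}$ is the total sum of squares of the dependent variable and $SS_\text{res}$ is the residual sum of squares.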
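A quick numerical check that, for simple linear regression with an intercept, $R^2 = 1 - SS_\text{res}/SS_\text{tot}$ equals the squared Pearson correlation. The data are simulated purely for illustration (the slope of 0.25 and the sample size of 400 are arbitrary choices, not the OP's data).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=400)
y = 0.25 * x + rng.normal(size=400)    # simulated, weakly related data

r = np.corrcoef(x, y)[0, 1]            # Pearson correlation
slope, intercept = np.polyfit(x, y, 1) # least-squares line
residuals = y - (intercept + slope * x)
r_squared = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

print(f"r^2 = {r**2:.6f}, R^2 = {r_squared:.6f}")  # the two agree
```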

Note that linear regression might not be the appropriate analysis of your data.

Edit: After Carl's last edit, I see that ordinary regression and Pearson correlation are not appropriate. Instead, a method from nonparametric regression might be a better choice.
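As one simple rank-based (nonparametric) alternative to Pearson's $r$ for ordinal scores like a 1 to 4 satisfaction rating, Spearman's rank correlation is Pearson's $r$ computed on the ranks, with tied values given the average of their ranks. This is a swapped-in technique for illustration, not necessarily the nonparametric regression the answer has in mind; `scipy.stats.spearmanr` computes the same statistic. The helper names and the example data are hypothetical.

```python
import numpy as np

def average_ranks(a):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1                                  # extend the run of ties
        ranks[order[i:j + 1]] = (i + j) / 2 + 1     # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on tie-averaged ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]

# Hypothetical example: one supervisor item (-1 pass, 0 N/A, 1 fail) vs satisfaction.
item = [-1, -1, 0, 1, 1, 0]
satisfaction = [4, 3, 4, 2, 1, 3]
print(spearman(item, satisfaction))
```

With only three item codes and four satisfaction levels there are many ties, which the average-rank step handles but which also limits how much any single rank correlation can reveal.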

edited Jan 20 '14 at 21:29

answered Jan 20 '14 at 20:21

Horst Grünbusch

Nb., $.25^2 = .0625$. Also, I think you *could* explain the connection to regression methods & how they can help the OP achieve his goals; he isn't asking for a class, so we don't need to only provide hints. – gung - Reinstate Monica Jan 20 '14 at 20:39
Where is the justification for "conclude that 50% of the variation in one variable are explained by the variation in the other variable"? I am an Actuarial Science student. I understand how to calculate statistics pretty well, but I've never actually had to interpret the data, only give my results. – Carl Jan 20 '14 at 20:45
@Horst, Carl might be less confused by the "conclude that **50%** of the variation..." if you correct the inconsistency b/t it & the $R^2$ that you list. – gung - Reinstate Monica Jan 20 '14 at 21:14
@gung: You're completely right. It's already a bit late here. – Horst Grünbusch Jan 20 '14 at 21:16
The 6.25 percent of the variation in one variable being explained by another is something that I understand! At last! Thank you, that much I've already reported. What that means to me, and how I tried to relay it to my superiors, is that 6.25 percent of the variation in variable A can be explained by variable B, in terms of how A behaves as B rises or falls. – Carl Jan 20 '14 at 21:35
