
I have a formula (not mine):

NY/A = (Passing Yards - Sack Yards) / (Passes Attempted + Times Sacked)

and this formula correlates with wins at a correlation coefficient of 0.50.

Edit: I have the data for wins and passing yards, sack yards, passes attempted, and times sacked. NY/A stands for net yards per attempt, and I would like to correlate this with wins at a coefficient of at least 0.52. I can put some weight on times sacked or sack yards.

My goal is to increase the correlation to at least 0.52. How can I do this? Is there a regression I can run in R?
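One way to explore the "weight on times sacked or sack yards" idea is sketched below. It is a minimal illustration, not a recommended model: the function names (`pearson`, `weighted_nya`, `best_sack_weight`), the two weight parameters, and the grid of candidate weights are all assumptions introduced here, and the outcome is assumed to be wins per game played, as clarified in the comments. With both weights set to 1 the metric reduces to the NY/A formula above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def weighted_nya(pass_yards, sack_yards, attempts, sacks,
                 w_sack_yards=1.0, w_sacks=1.0):
    """NY/A with hypothetical weights on the two sack terms.

    Both weights equal to 1.0 gives the original formula.
    """
    return (pass_yards - w_sack_yards * sack_yards) / (attempts + w_sacks * sacks)


def best_sack_weight(rows, win_rates, candidates):
    """Try each candidate weight on times sacked; return (best r, best weight).

    rows: list of (pass_yards, sack_yards, attempts, sacks) tuples.
    win_rates: wins per game played, one value per row (an assumption).
    """
    scored = []
    for w in candidates:
        metric = [weighted_nya(*row, w_sacks=w) for row in rows]
        scored.append((pearson(metric, win_rates), w))
    return max(scored)
```

Because the best weight is chosen on the same data used to measure the correlation, the resulting coefficient is an in-sample figure and will tend to look better than it would on new seasons.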

MJ95
  • Please edit your question to explain what these variables are. Do you have a column with actual wins/losses, and how many rows? I guess you want a better formula, perhaps with some weights, but where do you want to put those weights? Also note that when you fit a model, checking whether the correlation improves is not validation, because correlation **will** improve with the degrees of freedom you introduce. – brumar Jun 19 '15 at 13:11
  • Can you expand on what you mean by "this formula correlates with wins at a correlation coefficient of 0.50"? Wins are binary, so how are you measuring correlation? – TrynnaDoStat Jun 19 '15 at 13:31
  • @TrynnaDoStat A higher NY/A score correlates with a higher number of wins per games played (0.5 correlation). – MJ95 Jun 19 '15 at 13:39
  • @MJ95 Correlation coefficient usually refers to Pearson's Correlation Coefficient which is used to measure the linear relationship between two continuous variables. Whether or not a team won is not a continuous variable. – TrynnaDoStat Jun 19 '15 at 13:50
  • @TrynnaDoStat Wins is a continuous variable, as the number of wins for a quarterback can range anywhere from 0 to infinity, in theory. – MJ95 Jun 19 '15 at 13:53
  • Please note that it is very hard to reliably distinguish a correlation coefficient of 0.50 from one of 0.52. Also, you seem much too focused on closely fitting the particular set of data that you already have, without paying enough attention to how valid predictions made from those data will be. See for example [this page](http://stats.stackexchange.com/questions/128618/what-is-cross-validation) about the danger of "overfitting" your data, and one way to minimize that danger. – EdM Jun 19 '15 at 16:26
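The overfitting concern raised in the comments can be checked with k-fold cross-validation. The sketch below is one possible setup, under stated assumptions: the helper names (`metric`, `fit_weight`, `cross_validated_r`), the single weight on times sacked, and the pooling of held-out values into one correlation are choices made here for illustration, not something taken from the question.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def metric(row, w):
    """NY/A with a hypothetical weight w on times sacked (w = 1 is plain NY/A)."""
    pass_yards, sack_yards, attempts, sacks = row
    return (pass_yards - sack_yards) / (attempts + w * sacks)


def fit_weight(rows, wins, candidates):
    """Pick the candidate weight with the highest in-sample correlation."""
    return max(candidates,
               key=lambda w: pearson([metric(r, w) for r in rows], wins))


def cross_validated_r(rows, wins, candidates, k=5, seed=0):
    """Choose the weight on k-1 folds, score the held-out fold, pool the results."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    held_out_metric, held_out_wins = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # The weight is fitted without seeing the held-out rows.
        w = fit_weight([rows[i] for i in train],
                       [wins[i] for i in train], candidates)
        for i in fold:
            held_out_metric.append(metric(rows[i], w))
            held_out_wins.append(wins[i])
    return pearson(held_out_metric, held_out_wins)
```

Comparing `cross_validated_r(...)` with the in-sample correlation from `fit_weight` on all rows gives a rough sense of how much of any improvement over 0.50 survives on data the weight was not tuned on.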

0 Answers