I am analyzing Stack Overflow posts. I have a database of 1,000,000 questions, each with its current score (upvote or downvote) and a flag indicating whether the question contains source code.
I want to test whether there is a correlation between the presence of source code and the votes, i.e. whether posts with code have a higher score than posts without.
So I created a cross table like this:
I then ran a chi-square test in R on these values. The result looks like this:
As you can see, the result is significant (see the p-value). Am I right to conclude that there is a correlation between the score and the presence of code?
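For reference, the kind of call I mean can be sketched roughly as follows. The counts are placeholders, not my actual data, and the 2x2 layout (code yes/no against positive/negative score) and the names `tab` and `res` are assumptions made only for illustration.

```r
# Placeholder counts only; NOT the real Stack Overflow data.
# Assumed layout: rows = code present or absent, columns = sign of the score.
tab <- matrix(c(300, 200,
                250, 250),
              nrow = 2, byrow = TRUE,
              dimnames = list(code  = c("yes", "no"),
                              score = c("positive", "negative")))

# Pearson chi-square test of independence.
# For 2x2 tables, chisq.test() applies Yates' continuity correction by default.
res <- chisq.test(tab)
print(res)

# The p-value and the degrees of freedom can also be read off directly.
res$p.value
res$parameter
```

The test only says whether the two variables look independent; it says nothing about which way the association points.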
I am not sure I am doing this correctly. Another question: how do I determine the direction of the correlation? Does the result mean that the presence of code leads to a higher score, or a lower one?
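One way I could imagine checking the direction is to compare the row proportions and the observed counts against the expected counts under independence. This is only a sketch under the same assumptions as above: the counts are placeholders, and the 2x2 layout and the names `tab`, `row_props`, and `obs_minus_exp` are hypothetical.

```r
# Placeholder counts only; NOT the real Stack Overflow data.
# Assumed layout: rows = code present or absent, columns = sign of the score.
tab <- matrix(c(300, 200,
                250, 250),
              nrow = 2, byrow = TRUE,
              dimnames = list(code  = c("yes", "no"),
                              score = c("positive", "negative")))

# Share of positive and negative scores within each row (each row sums to 1).
row_props <- prop.table(tab, margin = 1)
print(row_props)

res <- chisq.test(tab)

# Observed minus expected counts: a positive entry means that cell occurs
# more often than independence would predict.
obs_minus_exp <- res$observed - res$expected
print(obs_minus_exp)

# Standardized residuals put the same comparison on a common scale.
print(res$stdres)
```

If the "yes"/"positive" cell sits above its expected count, posts with code would tend towards positive scores in that table, but I am not sure whether that is a valid way to read the result.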