Can one just fit a linear model between $Y$ and $X$ and then take the p-value of the $t$-test on the slope (or beta) of that model? Presumably the correlation has the same sign as the slope/beta of the linear regression model, so wouldn't that test be equivalent?
Yes, the $t$ and $p$ values are the same. As for Pearson's correlation coefficient: denote by $b_{Y.X}$ the slope of the regression with $Y$ as the dependent and $X$ as the independent variable, and by $s_{X}, s_{Y}$ the standard deviations of the two variables. Then the correlation coefficient is $r = b_{Y.X}\cdot\frac{s_{X}}{s_{Y}}$. See [here](http://stats.stackexchange.com/a/104577/21054). – COOLSerdash Jul 04 '14 at 06:51
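A quick numerical check of both claims, as a sketch on simulated data (the sample size, true slope and seed are arbitrary choices for illustration): `scipy.stats.linregress` reports the two-sided p-value of the slope $t$-test, `scipy.stats.pearsonr` reports the two-sided p-value for $H_0: \rho = 0$, and rescaling the slope by $s_X/s_Y$ should recover $r$.

```python
import numpy as np
from scipy import stats

# simulated data (arbitrary example values)
rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 0.5 * x + rng.normal(size=30)

# OLS fit of Y on X: slope b_{Y.X} and the p-value of its t-test
fit = stats.linregress(x, y)

# Pearson correlation and the p-value of the test of rho = 0
r, p_corr = stats.pearsonr(x, y)

# r = b_{Y.X} * s_X / s_Y (any common ddof cancels in the ratio)
r_from_slope = fit.slope * np.std(x, ddof=1) / np.std(y, ddof=1)

print("slope p-value:", fit.pvalue, " correlation p-value:", p_corr)
print("r:", r, " r from slope:", r_from_slope)
```

The two p-values agree up to floating-point error, because both tests use the same $t$ statistic with $N-2$ degrees of freedom.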
Yes, that would work. You can also find the t statistic for a given correlation coefficient r based on a sample of size N as follows:

$$t_{(df=N-2)}=\frac r {\sqrt{\frac{1-r^2}{N-2}}}$$

This would need adjusting if your null hypothesis is not $\rho=0$. I trust you can find the p value for t.
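For completeness, here is a minimal sketch of turning that formula into a two-sided p-value with SciPy's Student's t distribution. The helper name `corr_t_test` is hypothetical; the result is cross-checked against `scipy.stats.pearsonr` on simulated data (arbitrary seed and effect size).

```python
import numpy as np
from scipy import stats

def corr_t_test(r, n):
    """Two-sided t-test of H0: rho = 0 for a sample Pearson r from n pairs."""
    df = n - 2
    t = r / np.sqrt((1 - r**2) / df)
    # two-sided p-value from the upper tail of Student's t with n - 2 df
    p = 2 * stats.t.sf(abs(t), df)
    return t, p

# cross-check against scipy's built-in Pearson test on simulated data
rng = np.random.default_rng(1)
x = rng.normal(size=25)
y = 0.4 * x + rng.normal(size=25)
r, p_scipy = stats.pearsonr(x, y)
t, p = corr_t_test(r, len(x))
print("t =", t, " p =", p, " scipy p =", p_scipy)
```

Using `stats.t.sf` rather than `1 - stats.t.cdf` avoids losing precision when the p-value is very small.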

Nick Stauner
Here is a Python function to perform a t-test, as well as a z-test for the significance of the correlation coefficient between two series.
```python
import warnings

import numpy as np
import pandas as pd
import scipy.stats as sps


def test_correlation(series1, series2, true_correlation, test_type='z'):
    """Test whether the population correlation equals a given value.

    A t-test is only valid when the hypothesised population correlation
    is zero; if test_type='t' is requested with a non-zero value, the
    function warns and falls back to a z-test (Fisher's z-transformation).
    The z-test can also be used when the hypothesised correlation is zero.

    Keyword arguments:
    1. series1, series2: the pandas Series whose correlation is tested
    2. true_correlation: the hypothesised population correlation
    3. test_type: 't' or 'z' (default 'z')
    """
    # number of pairs where both values are present (Series.corr drops NaN pairs)
    N = len(pd.concat([series1, series2], axis=1).dropna())
    sample_correlation = series1.corr(series2)
    dof = None

    # the t-test only applies to H0: rho = 0; otherwise switch to the z-test
    if test_type == 't' and true_correlation != 0:
        warnings.warn('A t-test cannot be computed with a non-zero '
                      'hypothesised correlation coefficient, switching to a z-test.')
        test_type = 'z'

    if test_type == 'z':
        # Fisher's z-transformation of the sample and hypothesised correlations
        sample_mean = 0.5*np.log((1 + sample_correlation)/(1 - sample_correlation))
        population_mean = 0.5*np.log((1 + true_correlation)/(1 - true_correlation))
        population_std = 1/np.sqrt(N - 3)
        statistic = (sample_mean - population_mean)/population_std
        pvalue = 2*sps.norm.sf(abs(statistic))
    elif test_type == 't':
        # t-statistic for H0: rho = 0 with N - 2 degrees of freedom
        statistic = np.sqrt(N - 2)*sample_correlation/np.sqrt(1 - sample_correlation**2)
        dof = N - 2
        pvalue = 2*sps.t.sf(abs(statistic), dof)
    else:
        raise ValueError("test_type must be 't' or 'z'")

    return {'statistic': statistic, 'p-value': pvalue, 'DoF': dof, 'test_type': test_type}
```
Here are examples of its use:
```python
import numpy as np
import pandas as pd

print("==============================================================")
print("Tests of population correlation")
print("==============================================================")

# under the null (independent series, rho = 0)
series1 = pd.Series(np.random.randn(10))
series2 = pd.Series(np.random.randn(10))
print(test_correlation(series1, series2, 0, 't'))

# under the alternative (rho = 0.7); testing H0: rho = 0.1 with 't'
# triggers the warning and falls back to the z-test
mX = np.random.multivariate_normal((0, 0), [[1, .7], [.7, 1]], size=10)
series1 = pd.Series(mX[:, 0])
series2 = pd.Series(mX[:, 1])
print(test_correlation(series1, series2, 0.1, 't'))
```
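The z-test branch relies on Fisher's transformation $z = \tfrac{1}{2}\ln\frac{1+r}{1-r} = \operatorname{artanh}(r)$, which is approximately normal with standard error $1/\sqrt{N-3}$. As a standalone sketch working from a summary $r$ and $N$ (the helper name `fisher_z_test` and the example values $r = 0.6$, $N = 28$, $\rho_0 = 0.1$ are hypothetical), `np.arctanh` gives the same transform more compactly:

```python
import numpy as np
from scipy import stats

def fisher_z_test(r, n, rho0=0.0):
    """Two-sided z-test of H0: rho = rho0 using Fisher's z-transformation."""
    # arctanh(r) equals 0.5 * log((1 + r) / (1 - r))
    z_r = np.arctanh(r)
    z_0 = np.arctanh(rho0)
    se = 1.0 / np.sqrt(n - 3)
    statistic = (z_r - z_0) / se
    pvalue = 2 * stats.norm.sf(abs(statistic))
    return statistic, pvalue

stat, p = fisher_z_test(0.6, 28, rho0=0.1)
print("z =", stat, " p =", p)
```

Unlike the $t$-test above, this works for any hypothesised $\rho_0$ strictly between $-1$ and $1$, at the cost of being a large-sample approximation.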

tchakravarty