How do you calculate the statistical significance of a correlation between $Y$ and $X$?

Question

Can one just fit a linear model between $Y$ and $X$ and then take the p-value of the $t$-test on the slope (or beta) of that model? Presumably the correlation has the same sign as the slope/beta of the linear regression model.

Yes, the $t$ and $p$ values are the same. As for the correlation coefficient (Pearson's): let $b_{Y.X}$ denote the slope of the regression with $Y$ as the dependent and $X$ as the independent variable, and let $s_{X},s_{Y}$ denote the standard deviations of the two variables. Then the correlation coefficient is $r = b_{Y.X}\cdot\frac{s_{X}}{s_{Y}}$. See [here](http://stats.stackexchange.com/a/104577/21054). — COOLSerdash, Jul 04 '14 at 06:51
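The equivalence described in this comment can be checked numerically. The following is a minimal sketch, assuming simulated data (the sample size, seed, and true slope are arbitrary choices for illustration, not part of the original question); it compares the slope test from `scipy.stats.linregress` with the correlation test from `scipy.stats.pearsonr`.

```python
import numpy as np
from scipy import stats

# Simulated data; seed, sample size and true slope are illustrative assumptions.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

# Simple linear regression of y on x: the reported p-value tests slope == 0.
fit = stats.linregress(x, y)

# Pearson correlation and its two-sided p-value for rho == 0.
r, p_r = stats.pearsonr(x, y)

# Recover r from the slope: r = b_{Y.X} * s_X / s_Y (sample standard deviations).
r_from_slope = fit.slope * np.std(x, ddof=1) / np.std(y, ddof=1)

print("p-value of slope test:      ", fit.pvalue)
print("p-value of correlation test:", p_r)
print("r from slope vs. r:         ", r_from_slope, r)
```

The two p-values should agree up to floating-point error, and the rescaled slope should reproduce $r$.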

Nick Stauner · Answer 1 · 2014-07-04T07:20:11.947

4

Yes, that would work. You can also find the $t$ statistic for a given correlation coefficient $r$ based on a sample of size $N$ as follows:

$$t_{(df=N-2)}=\frac{r}{\sqrt{\frac{1-r^2}{N-2}}}$$

This would need adjusting if your null hypothesis is not $\rho=0$. I trust you can find the $p$ value for $t$.
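As a rough sketch of that last step, the formula can be evaluated directly and the two-sided $p$ value read off the $t$ distribution with $N-2$ degrees of freedom. The helper name `correlation_t_test` and the example values $r = 0.5$, $N = 30$ are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy import stats


def correlation_t_test(r, n):
    """Two-sided t-test of H0: rho = 0 for a sample correlation r of size n.

    Hypothetical helper for illustration; valid only for a zero null.
    """
    # t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom
    t = r / np.sqrt((1 - r**2) / (n - 2))
    # survival function gives the upper tail; double it for a two-sided test
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    return t, p


t_stat, p_value = correlation_t_test(0.5, 30)
print(t_stat, p_value)
```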

edited Jul 04 '14 at 07:20

answered Jul 04 '14 at 07:14


score 1 · Accepted Answer · answered Jul 04 '14 at 07:20

Here is a Python function to perform a t-test, as well as a z-test for the significance of the correlation coefficient between two series.

```python
import warnings

import numpy as np
import scipy.stats as sps


def test_correlation(series1, series2, true_correlation, test_type='z'):
    """Test whether the population correlation equals a given value.

    A t-test is only valid when the hypothesised correlation is zero;
    otherwise a z-test (Fisher transformation) is used. The z-test can
    also be computed when the hypothesised population correlation is zero.

    Arguments:
    series1, series2: the pandas Series whose correlation is tested
    true_correlation: the hypothesised population correlation
    test_type: 't' or 'z'
    """
    N = series1.count()
    sample_correlation = series1.corr(series2)
    dof = None

    # switch to a z-test if a t-test is requested with a non-zero null
    if test_type == 't' and true_correlation != 0:
        warnings.warn('A t-test cannot be computed with a non-zero '
                      'hypothesised correlation coefficient, switching to a z-test.')
        test_type = 'z'

    if test_type == 'z':
        # z-test on the Fisher-transformed correlations
        sample_mean = 0.5*np.log((1 + sample_correlation)/(1 - sample_correlation))
        population_mean = 0.5*np.log((1 + true_correlation)/(1 - true_correlation))
        population_std = 1/np.sqrt(N - 3)
        statistic = (sample_mean - population_mean)/population_std
        # two-sided p-value from the upper tail of the standard normal
        pvalue = 2*sps.norm.sf(abs(statistic))
    elif test_type == 't':
        # t-test with N - 2 degrees of freedom
        statistic = np.sqrt(N - 2)*sample_correlation/np.sqrt(1 - sample_correlation**2)
        dof = N - 2
        pvalue = 2*sps.t.sf(abs(statistic), dof)

    return {'statistic': statistic, 'p-value': pvalue, 'DoF': dof, 'test_type': test_type}
```

Here are examples of its use:

```python
import pandas as pd

print("==============================================================")
print("Tests of population correlation")
print("==============================================================")
# under the null
series1 = pd.Series(np.random.randn(10))
series2 = pd.Series(np.random.randn(10))
print(test_correlation(series1, series2, 0, 't'))

# under the alternative
mX = np.random.multivariate_normal((0, 0), [[1, .7], [.7, 1]], size=10)
series1 = pd.Series(mX[:, 0])
series2 = pd.Series(mX[:, 1])
# a 't' test with a non-zero hypothesised correlation warns and falls back to 'z'
print(test_correlation(series1, series2, 0.1, 't'))
```
