
Can one just fit a linear model between $Y$ and $X$ and then take the $p$-value of the $t$-test on the slope (or beta) of that model? Presumably the correlation would be in the same direction as the slope/beta of the linear regression model.

COOLSerdash
InquilineKea
  • Yes, the $t$ and $p$ values are the same. As for Pearson's correlation coefficient: let $b_{Y.X}$ denote the slope of the regression with $Y$ as the dependent and $X$ as the independent variable, and let $s_{X}, s_{Y}$ denote the standard deviations of the two variables. Then the correlation coefficient is $r = b_{Y.X}\cdot\frac{s_{X}}{s_{Y}}$. See [here](http://stats.stackexchange.com/a/104577/21054). – COOLSerdash Jul 04 '14 at 06:51
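
The equivalence described in this comment can be checked numerically. The following is a minimal sketch on synthetic data (the data, the seed, and the variable names are assumptions made for illustration, not part of the original post), using `scipy.stats.linregress` and `scipy.stats.pearsonr`:

```python
import numpy as np
from scipy import stats

# Synthetic data, made up purely for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

# Simple linear regression of y on x; `pvalue` is the two-sided
# t-test of the null hypothesis that the slope is zero.
fit = stats.linregress(x, y)

# Pearson correlation with its two-sided p-value for rho = 0.
r, p_corr = stats.pearsonr(x, y)

# r recovered from the slope via r = b * s_X / s_Y.
r_from_slope = fit.slope * np.std(x, ddof=1) / np.std(y, ddof=1)

print("p-value (slope):      ", fit.pvalue)
print("p-value (correlation):", p_corr)
print("r directly:", r, " r from slope:", r_from_slope)
```

The two p-values should agree up to floating-point error, and so should the two values of $r$.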

2 Answers


Yes, that would work. You can also find the $t$ statistic for a given correlation coefficient $r$ based on a sample of size $N$ as follows: $$t_{(df=N-2)}=\frac r {\sqrt{\frac{1-r^2}{N-2}}}$$ This would need adjusting if your null hypothesis is not $\rho=0$. I trust you can find the $p$ value for $t$.
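
As a rough sketch of how the formula above translates into code, the following computes the $t$ statistic and a two-sided $p$ value from $r$ and $N$. The function name `correlation_t_test` and the example inputs are hypothetical, chosen only for illustration; the test assumes the null hypothesis $\rho=0$.

```python
import numpy as np
from scipy import stats

def correlation_t_test(r, n):
    """Two-sided t-test of H0: rho = 0 for a sample correlation r from n pairs."""
    dof = n - 2
    # t = r / sqrt((1 - r^2) / (N - 2)), as in the formula above
    t_stat = r / np.sqrt((1 - r**2) / dof)
    # sf is the survival function, 1 - cdf; doubled for a two-sided test
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    return t_stat, p_value

# Illustrative inputs: r = 0.5 observed on N = 30 pairs.
t_stat, p_value = correlation_t_test(0.5, 30)
print(t_stat, p_value)
```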

Nick Stauner

Here is a Python function that performs a t-test or a z-test (based on the Fisher transformation) for the significance of the correlation coefficient between two series.

import warnings
import numpy as np
import scipy.stats as sps

def test_correlation(series1, series2, true_correlation, test_type='z'):
    """Test whether the population correlation equals a given value.

    With test_type='t' and a hypothesised correlation of zero, a t-test
    is performed; otherwise a z-test is performed. The z-test can also be
    used when the hypothesised population correlation is zero.

    Arguments:
    series1, series2: the pandas Series whose correlation is tested
    true_correlation: the hypothesised population correlation
    test_type: 't' or 'z' (default 'z')
    """
    N = series1.count()
    sample_correlation = series1.corr(series2)
    dof = None

    # A t-test is only valid for a hypothesised correlation of zero;
    # otherwise warn the caller and fall back to the z-test.
    if test_type == 't' and true_correlation != 0:
        warnings.warn('A t-test cannot be computed with a non-zero '
                      'hypothesised correlation coefficient, switching to a z-test.')
        test_type = 'z'

    if test_type == 'z':
        # z-test on the Fisher-transformed correlations
        sample_mean = 0.5*np.log((1 + sample_correlation)/(1 - sample_correlation))
        population_mean = 0.5*np.log((1 + true_correlation)/(1 - true_correlation))
        population_std = 1/np.sqrt(N - 3)
        statistic = (sample_mean - population_mean)/population_std
        # two-sided p-value; sf(x) equals 1 - cdf(x) but is more accurate in the tail
        pvalue = 2*sps.norm.sf(abs(statistic))
    elif test_type == 't':
        # t-test of a zero correlation with N - 2 degrees of freedom
        statistic = np.sqrt(N-2)*sample_correlation/np.sqrt(1 - sample_correlation**2)
        dof = N - 2
        pvalue = 2*sps.t.sf(abs(statistic), dof)

    return {'statistic': statistic, 'p-value': pvalue, 'DoF': dof, 'test_type': test_type}
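
For reference, the z-test branch of the function above can be written out as a formula. This is a restatement of the code rather than something given in the original answer: $r$ stands for `sample_correlation`, $\rho_0$ for `true_correlation`, and $N$ for the sample size.

$$z = \frac{\frac{1}{2}\ln\frac{1+r}{1-r} - \frac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}}{1/\sqrt{N-3}} = \sqrt{N-3}\,\bigl(\operatorname{artanh} r - \operatorname{artanh} \rho_0\bigr)$$

The code compares $z$ against a standard normal distribution, which is an approximation and is usually justified by assuming the two series are jointly normal.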

Here are examples of its use:

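import pandas as pd  # the examples build pandas Series

print("==============================================================")
print("Tests of population correlation")
print("==============================================================")
# Under the null: two independent standard-normal samples, tested against rho = 0
series1 = pd.Series(np.random.randn(10))
series2 = pd.Series(np.random.randn(10))
print(test_correlation(series1, series2, 0, 't'))

# Under the alternative: samples drawn with true correlation 0.7, tested against rho = 0.1.
# The hypothesised value is non-zero, so requesting 't' triggers the warning
# and the function falls back to the z-test.
mX = np.random.multivariate_normal((0, 0), [[1, .7], [.7, 1]], size=10)
series1 = pd.Series(mX[:, 0])
series2 = pd.Series(mX[:, 1])
print(test_correlation(series1, series2, 0.1, 't'))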
tchakravarty