
Correction value to use to correct for continuity in the case of zero entry cell for tetrachoric, polychoric, polybi, and mixed.cor. See the examples for the effect of correcting versus not correcting for continuity

I read this in the help file for R's psych::tetrachoric, but I don't understand it.

What does it mean?

Glen_b
WhiteGirl
  • A search of either the internet or of [our site](https://stats.stackexchange.com/search?q=continuity+correction) gets many hits for the search term *continuity correction*. See [Wikipedia](https://en.wikipedia.org/wiki/Continuity_correction) for example. A continuity correction is an adjustment to the argument that is often used when approximating a discrete cdf by a continuous one, intended to improve that approximation. ...ctd – Glen_b Aug 03 '17 at 02:39
  • ctd... See the discussion in the longer answer here: https://stats.stackexchange.com/questions/213966/why-does-the-continuity-correction-say-the-normal-approximation-to-the-binomia and the information in the comment here: https://stats.stackexchange.com/questions/58393/results-on-continuity-corrections#comment115003_58393 Also see [the answer to this question](https://math.stackexchange.com/questions/416150/what-is-continuity-correction-in-statistics) on math.SE – Glen_b Aug 03 '17 at 02:39
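To make the idea in the comments concrete, here is a small base-R sketch of the textbook case: approximating a binomial cdf by a normal one. The values n = 20, p = 0.5 and k = 8 are arbitrary choices for illustration. The correction simply moves the argument from k to k + 0.5, halfway to the next attainable value.

```r
# Normal approximation to a binomial cdf, with and without
# the continuity correction (evaluating at k + 0.5 instead of k).
n <- 20
p <- 0.5
k <- 8
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

exact     <- pbinom(k, size = n, prob = p)          # exact P(X <= k)
approx    <- pnorm(k, mean = mu, sd = sigma)        # uncorrected approximation
corrected <- pnorm(k + 0.5, mean = mu, sd = sigma)  # continuity-corrected

round(c(exact = exact, approx = approx, corrected = corrected), 3)
```

The corrected approximation lands much closer to the exact probability than the uncorrected one. The psych `correct` argument borrows the same "add a half" idea, but applies it to empty cells of a table rather than to a cdf argument.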

2 Answers


This is actually stated in the documentation:

For tetrachoric, in the degenerate case of a cell entry with zero observations, a correction for continuity is applied and .5 is added to the cell entry.

This can also be seen in the source code:

tab[tab==0] <- correct
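To see why a zero cell matters, here is a small base-R sketch of a two-step tetrachoric correlation for a 2 x 2 table. It is my own illustration, not psych's implementation; `pbvnorm` and `tetrachoric_sketch` are hypothetical helper names. The thresholds come from the margins, and rho is chosen so the model reproduces the observed proportion in cell [1, 1]. If a cell is empty, the model can only match the data in the limit rho = ±1, so replacing the 0 with 0.5 is what keeps the estimate inside (-1, 1).

```r
# Bivariate standard normal cdf P(X <= a, Y <= b) for correlation rho,
# obtained by conditioning on X and integrating numerically.
pbvnorm <- function(a, b, rho) {
  integrate(function(x) dnorm(x) * pnorm((b - rho * x) / sqrt(1 - rho^2)),
            lower = -Inf, upper = a)$value
}

# Two-step tetrachoric sketch for a 2 x 2 table of counts:
# thresholds from the margins, then solve for the rho whose model
# probability of cell [1, 1] equals the observed proportion.
tetrachoric_sketch <- function(tab, correct = 0.5) {
  if (correct > 0) tab[tab == 0] <- correct  # same replacement as the psych line quoted above
  p <- tab / sum(tab)
  a <- qnorm(sum(p[1, ]))  # threshold of the row variable
  b <- qnorm(sum(p[, 1]))  # threshold of the column variable
  uniroot(function(rho) pbvnorm(a, b, rho) - p[1, 1],
          interval = c(-0.999, 0.999))$root
}

# Balanced margins: the exact answer is sin(0.3 * pi), about 0.81
tetrachoric_sketch(matrix(c(40, 10, 10, 40), 2))
# A zero cell: the estimate is finite only because 0.5 replaced the 0
tetrachoric_sketch(matrix(c(40, 15, 0, 45), 2))
```

With `correct = 0` on the second table, the root sits at the boundary, so `uniroot()` will typically either fail or return a value pinned near the edge of the search interval.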

See the "Why does the continuity correction (say, the normal approximation to the binomial distribution) work?" thread to learn more about continuity corrections in general. Here the authors simply mean that the correction value is substituted for zero cells.

Tim

The continuity-correction warning is sometimes not caused by genuinely empty cells at all. Having wasted time on this issue several times, I'm leaving a reference here for others.

These psych functions sometimes provide confusing output. Take this example:

> x = matrix(c(29, 89, 387, 77, 108, 251), nrow = 3)
> x
     [,1] [,2]
[1,]   29   77
[2,]   89  108
[3,]  387  251

This is a contingency table of two ordinal variables, taken from a paper I was reading. If we try the obvious way to get the latent (polychoric) correlation:

> psych::polychoric(x)
Call: psych::polychoric(x = x)
Polychoric correlations 
   C1   C2  
R1 1.00     
R2 0.32 1.00

 with tau of 
         1     2     3    4    5
[1,] -0.43 -0.43  0.43 0.43 0.43
[2,]  -Inf -0.43 -0.43 0.43  Inf
Warning message:
In matpLower(x, nvar, gminx, gmaxx, gminy, gmaxy) :
  1 cells were adjusted for 0 values using the correction for continuity. Examine your data carefully.

We get a warning about the continuity adjustment. However, there is no empty cell in our matrix, so what is going on? We can expand our data into a full dataset and try some other methods:

library(magrittr)  # for %>%
library(purrr)     # for map_df()
x2 = splitstackshape::expandRows(data.frame(as.table(x)), "Freq") %>% map_df(ordered)
table(x2$Var1, x2$Var2)

which gives the same counts as the matrix.
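If you would rather not depend on splitstackshape and purrr, the same expansion can be done in base R. This is a sketch; `cells` and `x2_base` are names I made up. Each cell of the table becomes one row per observation, and both variables are then marked as ordered factors.

```r
x <- matrix(c(29, 89, 387, 77, 108, 251), nrow = 3)

# One row per cell with its count (columns Var1, Var2, Freq)
cells <- as.data.frame(as.table(x))

# Repeat each cell's row Freq times, keep the two variables,
# and make them ordered factors (levels A < B < C and A < B)
x2_base <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("Var1", "Var2")]
x2_base[] <- lapply(x2_base, as.ordered)
rownames(x2_base) <- NULL

nrow(x2_base)                      # 941 observations
table(x2_base$Var1, x2_base$Var2)  # same counts as x
```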

If we try hetcor() from the polycor package instead, we get:

> polycor::hetcor(as.data.frame(x2))

Two-Step Estimates

Correlations/Type of Correlation:
       Var1       Var2
Var1      1 Polychoric
Var2 -0.342          1

Standard Errors:
  Var1   Var2 
       0.0462 
Levels:  0.0462

n = 941 

P-values for Tests of Bivariate Normality:
 Var1  Var2 
      0.287 
Levels:  0.287

This gives a different value, -0.34. So which is right, 0.32 or -0.34? We can try an alternative psych function, mixedCor() (note that it throws errors if not given numeric input: it won't auto-convert ordered factors, which is quite annoying):

> psych::mixedCor(x2 %>% map_df(as.numeric))
Call: psych::mixedCor(data = x2 %>% map_df(as.numeric))
     Var1  Var2 
Var1  1.00      
Var2 -0.34  1.00

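This agrees with the polycor result. The problem is that the psych functions expect input in a very particular format: if you want to pass a contingency table, it must be an object of class table, not a matrix. Given a plain matrix, polychoric() does not recognise it as a contingency table and instead treats it as raw data, with each row an observation and each column a variable. Here that means just 3 cases of 2 variables whose values are the cell counts. That is why the thresholds come out as ±0.43 (the normal quantiles of 1/3 and 2/3), why the cross-tabulation of those 3 cases has empty cells that trigger the continuity correction, and why the resulting 0.32 is meaningless. Converting the matrix with as.table() fixes this.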
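A quick base-R check of this reading: polychoric() appears to treat a plain matrix as raw data (rows = cases, columns = variables). This is my own illustration based on the printed output and the documented input formats, not a trace of psych's internals.

```r
x <- matrix(c(29, 89, 387, 77, 108, 251), nrow = 3)

# Read as raw data, x is 3 observations of 2 variables whose
# "scores" happen to be the cell counts.
dim(x)                 # 3 rows = 3 cases, 2 columns = 2 variables
table(x[, 1], x[, 2])  # a 3 x 3 cross-tab with only 3 non-empty cells

# With 3 cases the cumulative proportions are 1/3 and 2/3; their normal
# quantiles are the +/-0.43 thresholds printed by polychoric(x).
round(qnorm(c(1/3, 2/3)), 2)

# Read as a contingency table, the same numbers describe 941 cases.
sum(as.table(x))
```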

> psych::polychoric(as.table(x))
[1] "You seem to have a table, I will return just one correlation."
$rho
[1] -0.342

$objective
[1] 1.5

$tau.row
     A      B 
-1.213 -0.462 

$tau.col
    A 
0.092 

One can also use table(x2$Var1, x2$Var2) %>% psych::polychoric().
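As a check on the -0.342, here is a from-scratch sketch of the usual two-step estimator: thresholds from the margins, then maximum likelihood for rho with the thresholds held fixed. This is my own base-R illustration, not psych's or polycor's code. `pbvnorm` and `neg_loglik` are hypothetical helper names, and the bivariate normal cdf is computed by numerical integration.

```r
x <- matrix(c(29, 89, 387, 77, 108, 251), nrow = 3)
n <- sum(x)

# Step 1: thresholds are normal quantiles of the cumulative margins
tau_row <- qnorm(cumsum(rowSums(x))[-nrow(x)] / n)  # compare tau.row
tau_col <- qnorm(cumsum(colSums(x))[-ncol(x)] / n)  # compare tau.col

# Bivariate standard normal cdf, by conditioning on the first variable
pbvnorm <- function(a, b, rho) {
  if (a == -Inf || b == -Inf) return(0)
  if (a == Inf) return(pnorm(b))
  if (b == Inf) return(pnorm(a))
  integrate(function(u) dnorm(u) * pnorm((b - rho * u) / sqrt(1 - rho^2)),
            lower = -Inf, upper = a)$value
}

# Step 2: multinomial negative log-likelihood in rho, thresholds fixed
neg_loglik <- function(rho) {
  r <- c(-Inf, tau_row, Inf)
  s <- c(-Inf, tau_col, Inf)
  ll <- 0
  for (i in seq_len(nrow(x))) for (j in seq_len(ncol(x))) {
    # probability of the rectangle (r[i], r[i+1]] x (s[j], s[j+1]]
    p_ij <- pbvnorm(r[i + 1], s[j + 1], rho) - pbvnorm(r[i], s[j + 1], rho) -
            pbvnorm(r[i + 1], s[j], rho) + pbvnorm(r[i], s[j], rho)
    ll <- ll + x[i, j] * log(p_ij)
  }
  -ll
}

rho_hat <- optimize(neg_loglik, interval = c(-0.99, 0.99))$minimum
round(c(tau_row, tau_col, rho = rho_hat), 3)
```

The thresholds should match tau.row and tau.col above, and rho_hat should land at about -0.342, in line with both polycor::hetcor() and psych::polychoric(as.table(x)).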

CoderGuy123