Comparing frequency between groups

Question

I'm using SPSS and am new to statistics. I have two groups of patients (Group 1: 1000 patients; Group 2: 400 patients). Each patient received one of five different interventions (surgeries a, b, c, d, e). I want to compare the frequency of each intervention between the groups and obtain a P-value.

So I want to be able to say: there was/was not a statistically significant difference (P-value) in the number of 'intervention a' between the two groups.

Thanks

Welcome to Cross Validated! Do you mean that you want to know if group $A$ picks the five surgery options in the same distribution as group $B?$ – Dave, Feb 23 '22 at 18:01
Please show the $2\times 5$ table of counts (not percentages). [Or a similar fictitious one for illustration, if you can't reveal the exact data.] // State more clearly what you want to know: whether the two groups have similar proportions in the five interventions, or whether you are only interested in intervention a vs. non-a? // Most software programs will do the relevant chi-squared test, so this isn't limited to SPSS. – BruceET, Feb 24 '22 at 07:27
Hi, thanks. Yes, but I want to compare each surgery between the groups: 1. whether group A picks surgery A at the same rate as group B; 2. whether group A picks surgery B at the same rate as group B; 3. whether group A picks surgery C at the same rate as group B; and so on for each surgery. – user350382, Feb 24 '22 at 12:21
I hope my answer helps with some of that. Doing a chi-squared test on the entire $2 \times 5$ table first may save some time by showing where to look for differences. After you've read it (and maybe my link), ask any remaining questions. // Also, please edit the essence of your comment into your Question (not everyone reads all comments). // If you plan to ask other questions in the future, please take a few minutes to tour the site so things might go more smoothly next time. – BruceET, Feb 24 '22 at 15:44

score 1 · Answer 1 · answered Feb 23 '22 at 18:01


Use a 2-way chi-squared test, with group vs intervention as the two variables.


I tried, but I have 10 different surgeries, so it didn't work. – user350382 Feb 23 '22 at 18:17
This is what I'm trying to do (obviously this is not my data):

                              Total     Group a    Group b     p
    Single-chamber devices    43 (14%)   21 (18%)   22 (10%)   0.11
    Dual-chamber devices     199 (60%)   69 (59%)  130 (61%)   0.78
    Triple-chamber devices    85 (26%)   28 (24%)   57 (27%)   0.56

– user350382 Feb 23 '22 at 18:17

I see. In Stata terminology, you have what are called immediate data, i.e., the counts and percentages rather than the individual-level data. Some software packages can handle this and some can't. SPSS should, so search for "SPSS chi square test without data" or something similar. Be careful on Cross Validated: moderators may close questions about software usage, or about how to do something in a particular software package. Typical reasons given are "it's not a statistical question" or "it's a software usage question." – Feb 23 '22 at 18:48

This is more like a Comment than an Answer. – BruceET Feb 24 '22 at 09:11

BruceET · Answer 2 · answered Feb 24 '22 at 16:04

Speculating that the following table of counts is something like yours (and that the answer to @Dave's question is Yes), I will use it as an example. Columns are for interventions a, b, c, d, and e. [Using R.]

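Gp1 = c(81, 182, 203, 264, 270)   # counts for interventions a-e in Group 1
Gp2 = c(62,  26,  88, 117, 107)   # counts for interventions a-e in Group 2
TAB = rbind(Gp1, Gp2); TAB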
    [,1] [,2] [,3] [,4] [,5]
Gp1   81  182  203  264  270     # sum 1000
Gp2   62   26   88  117  107     # sum  400

Observed counts of subjects, $X_{ij}$ for $i = 1,2$ (groups) and $j = 1,\dots,5$ (interventions), are shown in the table TAB, with $\sum_{ij} X_{ij}=1400.$

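A chi-squared test of homogeneity uses the null hypothesis (homogeneous proportions for the five interventions) to find the corresponding expected counts $E_{ij} = \frac{X_{i\cdot}\,X_{\cdot j}}{X_{\cdot\cdot}},\ i=1,2;\ j = 1,\dots,5,$ where $X_{i\cdot}$ and $X_{\cdot j}$ are the row and column totals and $X_{\cdot\cdot} = 1400$ is the grand total. For details, see a statistics text or this link.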
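As a quick check of that formula, here is a minimal sketch in R that computes the expected counts directly from the row and column totals of the example table and compares them with what `chisq.test()` computes internally. It assumes only base R and the same illustrative counts used in this answer.

```r
# Example table from this answer (rows = groups, columns = interventions a-e)
TAB = rbind(Gp1 = c(81, 182, 203, 264, 270),
            Gp2 = c(62,  26,  88, 117, 107))

# Expected counts under homogeneity: E_ij = (row total i) * (column total j) / N
E = outer(rowSums(TAB), colSums(TAB)) / sum(TAB)
print(round(E, 5))

# Compare with the expected counts computed inside chisq.test()
print(all.equal(E, chisq.test(TAB)$expected, check.attributes = FALSE))   # TRUE
```

Each expected count just spreads a column total over the two groups in proportion to the group sizes (1000:400), which is what "same proportions in both groups" means.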

The P-value in the output of the chi-squared test below is very near $0$ (about $1.1 \times 10^{-8}$). So, there are some significant differences between the groups in the proportions of interventions.

chisq.test(TAB)

        Pearson's Chi-squared test

data:  TAB
X-squared = 42.899, df = 4, p-value = 1.086e-08

The expected counts are shown in the table below:

    chisq.test(TAB)$exp
         [,1]      [,2]      [,3]     [,4]     [,5]
Gp1 102.14286 148.57143 207.85714 272.1429 269.2857
Gp2  40.85714  59.42857  83.14286 108.8571 107.7143

To get a chi-squared statistic as large as 42.9, there must have been large disagreements in some cells of the table between observed and expected counts. If intervention choices were made in the same way for the two groups, then it would be almost impossible to get a chi-squared statistic as large as $42.9.$
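To put "almost impossible" in numbers, the sketch below (base R, same illustrative table) compares the observed statistic with the chi-squared distribution on $(2-1)(5-1) = 4$ degrees of freedom, which is its approximate null distribution. The 5% critical value is about 9.49, far below 42.9.

```r
TAB = rbind(Gp1 = c(81, 182, 203, 264, 270),
            Gp2 = c(62,  26,  88, 117, 107))

# Degrees of freedom for an r x c table: (r - 1)(c - 1) = (2 - 1)(5 - 1) = 4
dof  = (nrow(TAB) - 1) * (ncol(TAB) - 1)
stat = unname(chisq.test(TAB)$statistic)      # about 42.9

# 95th percentile of chi-squared(4): exceeded only 5% of the time under H0
crit = qchisq(0.95, dof)                       # about 9.49
# Upper-tail probability of a statistic at least as large as the observed one
p = pchisq(stat, dof, lower.tail = FALSE)      # same P-value chisq.test() reports
print(c(stat = stat, critical = crit, p.value = p))
```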

The squares of the ten Pearson residuals below add to give the chi-squared statistic. That is, $R_{ij} = \frac{X_{ij} - E_{ij}}{\sqrt{E_{ij}}}$ and $\sum_{ij} R_{ij}^2 = 42.899.$
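A minimal base-R sketch of that identity, using the same illustrative table: compute the Pearson residuals by hand from the expected counts and confirm that their squares sum to the X-squared statistic.

```r
TAB = rbind(Gp1 = c(81, 182, 203, 264, 270),
            Gp2 = c(62,  26,  88, 117, 107))

E   = chisq.test(TAB)$expected
Rij = (TAB - E) / sqrt(E)      # Pearson residuals R_ij = (X_ij - E_ij) / sqrt(E_ij)
print(round(Rij, 4))
print(sum(Rij^2))              # about 42.899, the X-squared statistic
```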

Looking at residuals with the largest absolute values will give you a clue where the important disagreements between observed and expected counts may lie.

From the table below, we see that they lie in the first two columns. So, proportions of interventions a and b may have been significantly different between the two groups.

chisq.test(TAB)$res
         [,1]      [,2]       [,3]       [,4]        [,5]
Gp1 -2.091990  2.742522 -0.3368980 -0.4936036  0.04352766
Gp2  3.307727 -4.336309  0.5326825  0.7804559 -0.06882327
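If you want software to do the scanning, one possible sketch is below. The cutoff $|R_{ij}| > 2$ is only a rough rule of thumb assumed here for illustration, not a formal test. `chisq.test()` also returns standardized (adjusted) residuals in `$stdres`, which are approximately standard normal under the null hypothesis and are often preferred for this purpose.

```r
TAB = rbind(Gp1 = c(81, 182, 203, 264, 270),
            Gp2 = c(62,  26,  88, 117, 107))
ct = chisq.test(TAB)

# Rule of thumb (an assumption, not a formal test): flag cells with |residual| > 2
big = which(abs(ct$residuals) > 2, arr.ind = TRUE)
print(big)    # row/column positions of the flagged cells

# Standardized (adjusted) residuals, approximately N(0, 1) under H0
print(round(ct$stdres, 2))
```

For this table the flagged cells are all in columns 1 and 2, matching what you see by eye in the residual table.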

If you do a post hoc test for just the first two columns, you get a highly significant result.

TAB.ab = rbind(c(81,182), c(62,26))
TAB.ab
     [,1] [,2]
[1,]   81  182
[2,]   62   26
chisq.test(TAB.ab)$p.val
[1] 1.290181e-10

If you are going to look at several sub-tables in this way, then to limit the risk of 'false discovery' you should require P-values smaller than the usual 5% before declaring a difference significant; for example, a Bonferroni correction over $k$ tests compares each P-value with $0.05/k$. (If you do lots of ad hoc tests on the same data, you may eventually get a significant P-value by chance alone, even if there is no real effect.) This is not a problem in my $2\times 2$ example just above, because the P-value is very nearly $0.$
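Since your comment asks about each surgery separately (intervention a vs. all others, b vs. all others, and so on), here is a hedged sketch of that approach in base R with a Bonferroni adjustment over the five tests. The column labels a-e are added only for readability, and the counts are still the illustrative ones from this answer, not your data.

```r
TAB = rbind(Gp1 = c(81, 182, 203, 264, 270),
            Gp2 = c(62,  26,  88, 117, 107))
colnames(TAB) = c("a", "b", "c", "d", "e")   # labels added for readability

# For each intervention j, a 2x2 table: (intervention j, any other) x (Gp1, Gp2)
p.raw = sapply(colnames(TAB), function(j) {
  sub = cbind(this = TAB[, j], other = rowSums(TAB) - TAB[, j])
  unname(chisq.test(sub)$p.value)   # 2x2 tables get Yates' correction by default
})

# Bonferroni: multiply by the number of tests (5), capped at 1, which is the
# same as comparing the raw P-values with 0.05 / 5 = 0.01
p.adj = p.adjust(p.raw, method = "bonferroni")
print(signif(cbind(p.raw, p.adj), 3))
```

With these counts, interventions a and b remain significant after adjustment, while e is nowhere near significant.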

Depending on the particulars of your real data, you may need to look at only one or two $2\times 2$ sub-tables out of the $\binom{5}{2} = 10$ possibilities.
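If you do end up scanning all ten column-pair sub-tables, a sketch like the one below (base R, illustrative counts) runs every pairwise $2\times 2$ test and adjusts the P-values. It uses Holm's step-down method, swapped in here as an alternative to plain Bonferroni: it controls the same family-wise error rate but is less conservative.

```r
TAB = rbind(Gp1 = c(81, 182, 203, 264, 270),
            Gp2 = c(62,  26,  88, 117, 107))
colnames(TAB) = c("a", "b", "c", "d", "e")   # labels added for readability

pairs = combn(colnames(TAB), 2)               # 2 x 10 matrix of column pairs
p.raw = apply(pairs, 2, function(pr) unname(chisq.test(TAB[, pr])$p.value))
names(p.raw) = apply(pairs, 2, paste, collapse = "-")

# Holm's method: same family-wise error control as Bonferroni, less conservative
p.adj = p.adjust(p.raw, method = "holm")
print(signif(cbind(p.raw, p.adj), 3))
```

The "a-b" entry reproduces the $2\times 2$ P-value shown above (about $1.29 \times 10^{-10}$).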
