You don't want to use the chi-squared test to analyze this situation, because it does not correspond to the question you want to answer. Instead, you should use logistic regression (LR). To learn more about the distinction between a test that assumes one variable is a predictor (e.g., logistic regression) and a test that does not assume any of the variables are predictors (e.g., the chi-squared test), it may help you to read my answer here.
If you are unfamiliar with LR, you may want to read through some of the threads on CV categorized under the logistic tag. For some basics, my answer here explains the ideas behind probabilities, odds, odds ratios, and log odds; my answer here is written in a different context, but ends up providing an overview of what LR is all about in order to answer the OP's question.
It has been a long time since I've used MATLAB, and I don't think I ever fit an LR model with it, but I gather the function to use is glmfit(); a walk-through of a simple example can be found in this blog post.
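To make the probability, odds, and log-odds vocabulary mentioned above concrete, here is a minimal sketch in R using two of the sector counts from the data analyzed below (Telecom: 60 of 100 bankrupt; Oil&Gas: 9 of 129). The variable names are purely illustrative.

```r
# Counts taken from the data table below; names are illustrative only
bankrupt = c(Telecom=60, OilGas=9)
total    = c(Telecom=100, OilGas=129)

prob     = bankrupt / total     # probability of bankruptcy in each sector
odds     = prob / (1 - prob)    # odds = p / (1 - p)
log.odds = log(odds)            # the scale on which LR is linear

# Odds ratio of bankruptcy, Telecom relative to Oil&Gas
odds.ratio = unname(odds["Telecom"] / odds["OilGas"])
odds.ratio
```

The difference between the two log odds equals the log of the odds ratio, which is the kind of quantity an LR coefficient estimates for a sector relative to a reference sector.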
If you were to analyze these data in R, it would be:
my.data = read.table(text="Sector Bankrupt Nothing total
BioTech 15 110 125
Airline 20 120 140
AutoCos 50 100 150
Telecom 60 40 100
Oil&Gas 9 120 129", header=TRUE)
# Expand the aggregated counts into one row per company
sector = rep(as.character(my.data$Sector), times=my.data$total)
# Within each sector: 1 = bankrupt, 0 = not, in the same order as 'sector'
bankrupt = rep(rep(c(1, 0), times=nrow(my.data)),
               times=c(rbind(my.data$Bankrupt, my.data$Nothing)))
lr.model = glm(formula=bankrupt~sector, family=binomial(link="logit"))
anova(lr.model, test="LRT")
# Analysis of Deviance Table
#
# Model: binomial, link: logit
#
# Response: bankrupt
#
# Terms added sequentially (first to last)
#
#
# Df Deviance Resid. Df Resid. Dev Pr(>Chi)
# NULL 643 708.5
# sector 4 111.09 639 597.4 < 2.2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
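As an aside, expanding the counts into one row per company is not strictly necessary. Here is a sketch of an equivalent fit, assuming the same counts as above, that passes a two-column matrix of bankrupt / not-bankrupt counts as the response; the likelihood ratio statistic for sector should match the one above. The object names (`counts`, `grouped.model`, etc.) are my own.

```r
counts = data.frame(Sector   = c("BioTech", "Airline", "AutoCos", "Telecom", "Oil&Gas"),
                    Bankrupt = c(15, 20, 50, 60, 9),
                    Nothing  = c(110, 120, 100, 40, 120))

# Response is cbind(successes, failures); one row per sector
grouped.model = glm(cbind(Bankrupt, Nothing) ~ Sector, data=counts,
                    family=binomial(link="logit"))

# Likelihood ratio statistic for sector: drop in deviance from the null model
lr.stat = grouped.model$null.deviance - grouped.model$deviance
lr.df   = grouped.model$df.null - grouped.model$df.residual
p.value = pchisq(lr.stat, df=lr.df, lower.tail=FALSE)

# Fitted probabilities reproduce the observed proportions bankrupt
fitted.props = setNames(fitted(grouped.model), counts$Sector)
```

Because sector is the only predictor, this grouped model is saturated, so its residual deviance is essentially zero and the whole null deviance is attributable to sector.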
Following a significant result from the LR model, you could conduct pairwise tests for equality of the proportions bankrupt. I gather the MATLAB function is ztest(). In R it would be:
props = with(my.data, Bankrupt/total)
names(props) = my.data$Sector
props
# BioTech Airline AutoCos Telecom Oil&Gas
# 0.12000000 0.14285714 0.33333333 0.60000000 0.06976744
my.table = as.table(cbind(my.data$Bankrupt, my.data$Nothing))
rownames(my.table) = my.data$Sector
colnames(my.table) = c("Bankrupt", "Nothing")
my.table
# Bankrupt Nothing
# BioTech 15 110
# Airline 20 120
# AutoCos 50 100
# Telecom 60 40
# Oil&Gas 9 120
prop.test(my.table[4:5,])
#
# 2-sample test for equality of proportions with continuity correction
#
# data: my.table[4:5, ]
# X-squared = 72.7321, df = 1, p-value < 2.2e-16
# alternative hypothesis: two.sided
# 95 percent confidence interval:
# 0.4157529 0.6447122
# sample estimates:
# prop 1 prop 2
# 0.60000000 0.06976744
If you didn't know, a priori, which comparisons you were interested in testing, but simply tested whichever were suggested by the observed proportions, you may want to adjust the critical alpha to control the familywise error rate. With all pairwise comparisons, there are $5 \times 4 / 2 = 10$ possible comparisons (which are not orthogonal), so you could use the Bonferroni correction by dividing alpha by 10 to determine the threshold you want to use for significance (i.e., $.05/10 = .005$).
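The adjustment described above can be sketched with `pairwise.prop.test()` from base R's stats package, assuming the same counts as in the table. It runs all 10 pairwise tests and, with `p.adjust.method="bonferroni"`, multiplies each p-value by 10 (capped at 1), so the adjusted values are compared against the usual .05 rather than .005. The object names are illustrative.

```r
bankrupt = c(BioTech=15, Airline=20, AutoCos=50, Telecom=60, "Oil&Gas"=9)
total    = c(BioTech=125, Airline=140, AutoCos=150, Telecom=100, "Oil&Gas"=129)

# All pairwise 2-sample tests of proportions, Bonferroni-adjusted
pw = pairwise.prop.test(x=bankrupt, n=total, p.adjust.method="bonferroni")
pw$p.value    # lower-triangular matrix of adjusted p-values

# Number of comparisons actually performed
n.tests = sum(!is.na(pw$p.value))
```

Comparing an adjusted p-value to .05 is equivalent to comparing the raw p-value to .005, so this is the same decision rule stated on a different scale.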
You can learn more about these sorts of issues by reading the threads on CV categorized under the multiple-comparisons tag.