If your interest is in comparing the proportion of requests that attract offers for the two vehicle types, then it's a two-sample proportions test.
That could indeed be done as a chi-square test.
However, it sounds like your alternative is one-tailed (you're interested in detecting bias in a particular direction). If that anticipated direction of bias wasn't chosen by looking at the data you use in the test, and you don't wish to pick up bias in the opposite direction, you might do a one-tailed test instead.
Otherwise the chi-square test does the same job and you could use it since you have some familiarity with it.
What is a typical number of requests for each type and what fraction of them overall result in offers?
The two-sample proportions test statistic can be found on this page (though it calls it the "Two-proportion z-test, pooled for $H_0\colon p_1=p_2$").
For that, your requests are the $n$'s and the offers are the $x$'s.
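As a sketch of the arithmetic (the counts here are the ones that appear later in this answer, so treat them purely as an illustration):

x <- c(54, 328); n <- c(76, 558)          # offers and requests for the two vehicle types
p.hat  <- x / n                           # sample proportions
p.pool <- sum(x) / sum(n)                 # pooled proportion under H0: p1 = p2
z <- (p.hat[1] - p.hat[2]) / sqrt(p.pool * (1 - p.pool) * (1 / n[1] + 1 / n[2]))
z^2                                       # about 4.21; this is also the chi-square statistic
2 * pnorm(abs(z), lower.tail = FALSE)     # two-sided p-value, about 0.040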
The corresponding chi-square test is discussed here.
There, the O's are the observed counts: the offers and the requests that got no offers, for each vehicle type. The E's are computed from the O's using the formula at the link.
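In R, for instance, you could form the E's and the statistic directly (again using the counts that appear later in this answer, as an illustration):

O <- matrix(c(54, 22, 328, 230), nrow = 2)   # rows: offer / no offer; columns: vehicle type
E <- outer(rowSums(O), colSums(O)) / sum(O)  # each E is row total * column total / grand total
sum((O - E)^2 / E)                            # Pearson X^2, about 4.21 (1 d.f. here)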
An explanation of p-values is here; the first sentence defines them.
If you need additional explanation please clarify what you need.
Major edit in response to questions in comments:
but it's good to know that that is the proper way to do it.
Whoah. I didn't say that.
And your calculations up there don't include a continuity correction.
With CC:
> chisq.test(matrix(c(54,22,328,230),nr=2))
Pearson's Chi-squared test with Yates' continuity correction
data: matrix(c(54, 22, 328, 230), nr = 2)
X-squared = 3.709, df = 1, p-value = 0.05412
Without CC:
> chisq.test(matrix(c(54,22,328,230),nr=2),correct=FALSE)
Pearson's Chi-squared test
data: matrix(c(54, 22, 328, 230), nr = 2)
X-squared = 4.2058, df = 1, p-value = 0.04029
Could you possibly point me in the direction of a formula to calculate the p-value for 1 degree of freedom?
A chi-square(1) is the square of a standard normal. You can evaluate its tail probabilities by taking the square root of the statistic and doubling the upper-tail area for a standard normal. So if you don't have a chi-square function or table, you can use normal tables:
> 2*pnorm(sqrt(4.2058),lower.tail=FALSE)
[1] 0.04028597
and get the same result. But that only works for 1 d.f.
I haven't been able to find any myself - all the sites either have a Java application or a pre-defined table
There's a reason for this. There's no exact closed-form function. You can approximate it in various ways (e.g. by series expansions or by using ratios of polynomials or by numerical integration).
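As one illustration (a sketch, not the only possible approach): the Abramowitz & Stegun polynomial approximation to the normal upper tail, doubled as above to give a chi-square(1) p-value, is accurate to better than $10^{-7}$ and is easy to type into a spreadsheet or code:

p.chisq1 <- function(x2) {
  z   <- sqrt(x2)
  t   <- 1 / (1 + 0.2316419 * z)
  phi <- exp(-z^2 / 2) / sqrt(2 * pi)      # standard normal density
  # Abramowitz & Stegun 26.2.17 approximation to P(Z > z)
  tail <- phi * t * (0.319381530 + t * (-0.356563782 + t * (1.781477937 +
            t * (-1.821255978 + t * 1.330274429))))
  2 * tail                                 # double the upper tail for chi-square(1)
}
p.chisq1(4.2058)                           # about 0.0403, agreeing with the result above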
and I do not know if Excel's in-built CHITEST() function applies Yates' continuity correction.
It doesn't do continuity correction by default. I'm not sure why you're quite so focused on it though.
About the result: You are saying that a p-value of 0.05 indicates that the situation occurs 1/20 by chance.
Not so. Please read carefully the first sentence of the p-value article I pointed to.
You need to pursue this until you comprehend why the answer to your next question:
Does that mean 19/20 are biased?
... is 'obviously not'.
if I want a precise and true p-value I need to use a formula
Nothing about the continuity correction makes the p-value either 'precise' or 'true', and in any case you don't need 'a formula' to calculate the p-value after applying the continuity correction.
I've tried plotting the p-value against X^2 of 1 df and creating an exponential trendline. The fit has an R^2-value of 0.993 - do you think I can use the function of the trendline as my formula for p-value?
Not in general, no. Not even if it weren't off by what looks like a factor of 10.
Major edit 2, in response to further comments:
The p-value was never going to be the essential factor in the report -
This sentence makes me happy. Significance tests are useful in the right context, but for some reason they get used much more often than I'd ever think is reasonable.
its purpose was to give the user a quick idea of the nature of the transactions that have taken place with the specific supplier.
That might perhaps be better served by a measure of the effect size (such as a difference in proportions, with an accompanying confidence interval to give some sense of whether the difference is explainable by chance).
I need to consider the "balance between the costs of the two types of error, and between the probabilities of the two types of error at your sample size." Could you explain what this means?
This was in relation to choosing a significance level; if p-values aren't particularly important to you it may not matter.
These are the two error types I was referring to:
http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
In your case, the probability of the second type of error is a function of the difference in proportions; often a particular effect size is chosen and the desired power at that effect size is used to pin the design down, usually by choosing the sample size for the study, but sometimes by moving the Type I error rate (significance level) instead.
Since you're trying to choose a Type I error rate with a sample size that's already fixed, what you have to trade off against is the power at some given effect size.
I lack an intuitive understanding of how the probabilities relate to the real world.
There is approximately a 1/25 chance (no Yates') of getting the REQUESTS/OFFERS combination in this case
Only if the null hypothesis is true. (Which it won't be.)
- why should this be a weak indication of bias? Why not 1/100?
I wish more people would ask such a question.
What is 'weak' or 'strong' depends on the context; the tradeoff I mentioned is part of that calculation. Let's say it's what is more typically regarded as weak evidence -- since a 1/20 event is hardly astonishing. Your own particular needs and circumstances should always trump conventional senses of what's weak or strong.
Incidentally, I think you should feel no embarrassment over not getting the notion of a p-value first go. It is a somewhat subtle, even counterintuitive idea and is one of the most misunderstood concepts in the whole of statistics. Indeed, I am often asked by people "Is this text any good?" - often one I've never seen.
One of the first things I do in evaluating a text is to check whether it screws up on explaining what a p-value is. That easily eliminates a quarter of textbooks (often with titles like "Introductory statistics for ________") on the spot. If a large fraction of the people teaching in some department or other that has an intro stats class still get it wrong after sometimes decades at it, you shouldn't feel too bad. In fact I was surprised when I first read the first sentence in wikipedia's p-value article to find it got it right. I expected to have to fix it.
I've also seen it wrong in videos for online courses. I've even seen it wrong in academic papers once or twice (fortunately actual statisticians don't get it wrong very often - it's usually someone in some other area doing it as a sideline).
Is it something that should be settled experimentally by analysing several cases we know are biased, and then using the average p-value of those as the significance level?
Not really - the typical size of bias in cases that are biased doesn't tell you that a bias one tenth as large isn't important. How much bias (as measured by a difference in probability, an odds ratio, or whatever) would matter? That's the sort of effect size you should focus on.
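For illustration, with the counts used earlier in this answer, two common ways of measuring the size of the effect would be:

x <- c(54, 328); n <- c(76, 558)
p <- x / n
p[1] - p[2]                                       # difference in proportions, about 0.123
(x[1] / (n[1] - x[1])) / (x[2] / (n[2] - x[2]))   # odds ratio, about 1.72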
Could one say that this new significance level had been adjusted to the specific need of the analysis: To pinpoint bias?
I don't know that I properly follow here, but if you decide on your type I and type II error rates together by comparing the relative loss of making the two types of error at no difference (type I) and at the minimum relevant effect size (type II), you can say it was chosen with regard to detecting bias of a size that's of practical importance.
Major edit 3, in response to additional questions:
Your suggestion of measuring the effect size sounds interesting. This is a screenshot of the report I am working on. There is a difference of 42.4% between OWN and T/C, with regards to the percentage of offers received relative to vehicle type. If I've understood you correctly, this is the effect size. This was actually what I was considering doing prior to the chi-squared test, but I wasn't sure how to deal with the fact that I must define an arbitrary threshold ("If the effect size is bigger than 20%, then the supplier is likely biased") -- as well as how to include sample size in my considerations.
You can perhaps get the best of both worlds if you compute a confidence interval for that difference, as I already mentioned.
If the CI includes zero, it's equivalent to saying that 'it could be explained by random variation' and thus gives the same kind of information as a hypothesis test. If you use the right choice of interval, the interval will even correspond exactly to a chi-square, like so --
Proportions interval and test:
> prop.test(x=c(54,328),n=c(76,558),alt="two.sided",correct=FALSE)
2-sample test for equality of proportions without continuity
correction
data: c(54, 328) out of c(76, 558)
X-squared = 4.2058, df = 1, p-value = 0.04029
alternative hypothesis: two.sided
95 percent confidence interval:
0.01287586 0.23254953
sample estimates:
prop 1 prop 2
0.7105263 0.5878136
Corresponding chi-square test:
> chisq.test(matrix(c(54,22,328,230),nr=2),correct=FALSE)
Pearson's Chi-squared test
data: matrix(c(54, 22, 328, 230), nr = 2)
X-squared = 4.2058, df = 1, p-value = 0.04029
(See how the p-value is the same? It is the same test; the $100(1-\alpha)\%$ confidence interval for the difference excludes zero exactly when the level-$\alpha$ test is significant.)
In any case the other common two-sample proportions intervals will come close to the same as the chi-square anyway (in the sense that the effective p-values will generally be pretty close).
The chi-squared test seems to package all of this neatly into a single quantity, so I would prefer to make the significance level useful for my purposes while also including the effect size in the report.
This is fine - but you can also do similar "neat packages" in other ways.
If I know the effect size I am testing for, using power analysis I can calculate the minimum sample size required for a good chi-square test. If my sample size exceeds this, I can then move on to finding significance level. Is this correct?
Forget the sample size calculation if you already have a sample whose size you can't change; what you do instead is work out the power associated with the sort of effect size you'd regard as important to pick up. That tells you not whether you can do the test (you can!) but how good the test will be at picking up interesting deviations.
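As a rough sketch of that calculation (normal-approximation power for the two-sided two-proportion test with the group sizes held fixed; the p1 and p2 values here are only illustrative stand-ins for an effect size you'd regard as important):

power2prop <- function(p1, p2, n1, n2, alpha = 0.05) {
  pbar <- (n1 * p1 + n2 * p2) / (n1 + n2)                # pooled proportion under H0
  se0  <- sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))    # SE the test uses under H0
  se1  <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # SE under the alternative
  pnorm((abs(p1 - p2) - qnorm(1 - alpha / 2) * se0) / se1)
}
power2prop(p1 = 0.71, p2 = 0.59, n1 = 76, n2 = 558)       # roughly 0.5

A result near 0.5 says that, at these sample sizes, an effect of roughly the observed magnitude would only show up as significant about half the time.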