For an assignment I am supposed to check whether two samples follow the same distribution. The task is basically "Generate two samples with n=40 from the same distribution (1000 times). Use KS and Chi Squared to test if they both follow the same distribution. Calculate alpha and beta." The KS test is straightforward: I simply feed both samples to ks.test and compare the p-value with the significance level. I have problems with the chi-squared test, though. We can either use chisq.test or implement our own function using information from the lecture. To my understanding, both approaches require me to split the range of possible values into intervals (bins, to use histogram terminology) and calculate, from one sample, the probability of a value falling into each bin. I then use those probabilities in the formula from the lecture, or pass them to chisq.test together with the counts of the other sample.
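For reference, the KS part described above can be sketched roughly as follows. This is only a minimal sketch: the standard normal distribution, the 0.05 significance level, and the names `p_values` and `alpha_hat` are my own assumptions for illustration and are not given in the task.

```r
# Sketch: estimate the type I error rate (alpha) of the two-sample KS test.
# Assumptions for illustration: standard normal data, significance level 0.05.
set.seed(1)
n_rep <- 1000   # number of repetitions, as in the assignment
n     <- 40     # sample size, as in the assignment
level <- 0.05   # assumed significance level

p_values <- replicate(n_rep, {
  sample1 <- rnorm(n)
  sample2 <- rnorm(n)
  ks.test(sample1, sample2)$p.value
})

# Share of rejections; both samples come from the same distribution,
# so every rejection here is a false one
alpha_hat <- mean(p_values < level)
alpha_hat
```

Estimating beta the same way would presumably require drawing the second sample from a different distribution (for example `rnorm(n, mean = 1)`, again an assumption on my part) and counting how often the test fails to reject.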
I can't seem to figure out how to automatically bin both samples using identical intervals (which, at least according to the lecture, I have to do). Also, I am not 100% sure I understood everything correctly, although the formula given in the lecture for comparing two samples with chi-squared does not seem too complicated and makes sense to me in a chi-squared context (compare expected frequency to actual frequency, but for two samples).
So I would like to know:
- Does my explanation of the concept of the test reveal some misunderstandings concerning the chi squared test?
- How do I go about binning two samples using the same intervals? The suggested approach seems to be using histograms, but this usually leaves me with different intervals for each sample. The following does not work either:
    h1 <- hist(sample1)
    h2 <- hist(sample2, breaks = h1$breaks)
since at some point I am confronted with values which do not fit into the intervals specified by h1.
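To make concrete what I mean by identical intervals, here is a minimal sketch of one possible approach. It assumes that it is acceptable to derive the breaks from the pooled data of both samples and to compare the two count vectors as a 2 x k contingency table; neither assumption comes from the lecture, and the normal data and variable names are only for illustration.

```r
# Sketch: bin two samples on identical intervals and compare the counts.
# Assumptions for illustration: normal data, breaks taken from the pooled sample.
set.seed(1)
sample1 <- rnorm(40)
sample2 <- rnorm(40)

# Breaks computed on the pooled data cover every value of both samples
pooled <- c(sample1, sample2)
breaks <- hist(pooled, plot = FALSE)$breaks

# cut() assigns each value to an interval; table() counts per interval,
# including empty ones, so both count vectors have the same length
counts1 <- table(cut(sample1, breaks, include.lowest = TRUE))
counts2 <- table(cut(sample2, breaks, include.lowest = TRUE))

# 2 x k contingency table: chisq.test then tests homogeneity of the rows
counts <- rbind(counts1, counts2)
counts <- counts[, colSums(counts) > 0, drop = FALSE]  # drop bins empty in both
result <- chisq.test(counts)
result$p.value
```

With n=40 the outer bins often hold very few values, so chisq.test may warn that the chi-squared approximation may be incorrect; merging sparse bins would be one way to deal with that.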
Since the lecture is in Russian, which is not my native language, something might very well simply have gone past me. Please let me know if you have the impression I missed a crucial point.
P.S. The code here tries to accomplish the same thing. When I run it, I immediately run into the same problem as before, and chisq.test stops with this error:
    'x' and 'p' must have the same number of elements