I've been using Google Sheets to calculate and understand how chi square works in paired coin tosses, as a model for the pairing of alleles in Punnett squares. I used the RANDBETWEEN(0,1) function to simulate 1000 pairs of coin tosses and then, using the AND, XOR, and NOT(OR) operators, counted the following three categories: 11, 10 + 01, and 00. I then calculated chi squared explicitly and used the CHIDIST function to find the p value for 2 degrees of freedom, as would be done in genetics analyses. I plotted the observed frequencies of all 3 categories and also the incremental chi square, computed in steps of 10 flips. By using Google Sheets' recalculation refresh, I can view multiple traces of category frequency and chi square.
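For anyone who wants to reproduce the setup without the spreadsheet, here is a rough Python sketch of the same procedure. It is an approximation of my sheet, not a copy of it: the function names are made up for illustration, and instead of CHIDIST it uses the closed form exp(-chi2/2) for the upper tail, which holds only for exactly 2 degrees of freedom.

```python
import math
import random


def simulate_pairs(n_pairs, rng):
    """Count outcomes of n_pairs paired fair-coin tosses.

    Returns [count of 11, count of 10 or 01, count of 00].
    """
    counts = [0, 0, 0]
    for _ in range(n_pairs):
        ones = rng.randint(0, 1) + rng.randint(0, 1)  # like two RANDBETWEEN(0,1) cells
        counts[2 - ones] += 1  # 2 ones -> "11", 1 one -> "10"/"01", 0 ones -> "00"
    return counts


def chi_square_121(counts):
    """Chi-square statistic of observed counts against the 1:2:1 expectation."""
    n = sum(counts)
    expected = [0.25 * n, 0.5 * n, 0.25 * n]
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))


def p_value_2df(chi2):
    """Upper-tail p value; this shortcut is valid only for 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


rng = random.Random(0)  # fixed seed so a run can be repeated; the sheet has no seed
counts = simulate_pairs(1000, rng)
chi2 = chi_square_121(counts)
print(counts, round(chi2, 3), round(p_value_2df(chi2), 3))
```

Changing the seed plays the role of a spreadsheet refresh.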
I'm surprised by how often the chi square p value touches or dips below 0.05 in the graph, particularly at higher numbers of flips, say greater than 800. This dipping occurs in perhaps one of every 5-10 refreshes. In research practice, a p value of 0.05 or less leads to rejection of the null hypothesis, which here is the 50:50 probability model. By eye the data seem to be converging, but the chi square analysis misses this more often than I would have thought. I had naively thought that the "law of averages" driving convergence toward the 1:2:1 ratio would lead to p values safely larger than 0.05. When the convergence is very tight, as sometimes happens, the p value is well above 0.05.
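My "once every 5-10 refreshes" figure is an eyeball estimate. One way to put a number on it would be to automate the refreshes, along the lines of the Python sketch below. The names, the default of 1000 pairs checked every 10, and the "beyond 800" window are assumptions chosen to mirror my graph, and the p value again uses the exp(-chi2/2) shortcut that is valid only for 2 degrees of freedom.

```python
import math
import random


def p_value_trace(n_pairs, step, rng):
    """Return [(pairs so far, p value)] recorded every `step` pairs of one run."""
    counts = [0, 0, 0]  # 11, 10 or 01, 00
    trace = []
    for i in range(1, n_pairs + 1):
        ones = rng.randint(0, 1) + rng.randint(0, 1)
        counts[2 - ones] += 1
        if i % step == 0:
            expected = (0.25 * i, 0.5 * i, 0.25 * i)
            chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
            trace.append((i, math.exp(-chi2 / 2.0)))  # 2 degrees of freedom only
    return trace


def fraction_of_runs_dipping(n_runs, n_pairs=1000, step=10, start=800,
                             alpha=0.05, seed=0):
    """Fraction of simulated runs whose p value is <= alpha at any
    checkpoint beyond `start` pairs."""
    rng = random.Random(seed)
    dips = 0
    for _ in range(n_runs):
        trace = p_value_trace(n_pairs, step, rng)
        if any(p <= alpha for n, p in trace if n > start):
            dips += 1
    return dips / n_runs


# Each run stands in for one spreadsheet refresh.
print(fraction_of_runs_dipping(200))
```

Setting `start` to `n_pairs - step` restricts the check to the final point only, which separates "p is below 0.05 at the end" from "p dipped below 0.05 somewhere in the window".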
Moreover, at the beginning of the graph, with only a few tens of flips, the convergence hasn't proceeded very far, yet the p value remains quite large.
In the attached figure, AA represents 11, Aa and aA represent 10 and 01, and aa represents 00. I'd really appreciate any comments/answers about why chi square is so sensitive to seemingly slight straying from 1:2:1 convergence for large numbers of flips.
More generally, any explanation of what is happening would be welcome. Is this just a feature of chi square that would work itself out after, say, 10,000 paired flips? Or are these p value potholes always out there waiting for an unlucky researcher?