observed frequency vs expected frequency plot interpretation

Question

I'm trying to understand the output of a plot of observed frequency against expected frequency. My understanding is that plotting the observed values against the expected values shows the degree to which pairs deviate from their expected co-occurrence frequency.

Here we can see that none of the phylogenetic pairs shows lower-than-expected co-occurrence. The pattern is the opposite for all functional GCF pairs and some of the random pairs, many of which have a low expected co-occurrence level.
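Reading such a plot rests on the identity line observed = expected: pairs above it co-occur more often than expected, pairs below it less often. A minimal sketch of that bookkeeping, assuming the data are a list of (expected, observed) pairs; the function name `side_of_identity` and the use of (O - E)/sqrt(E) as a standardized deviation are illustrative choices, not something from the question.

```python
import math

def side_of_identity(pairs, tol=1e-9):
    """Classify (expected, observed) pairs relative to the line observed = expected.

    Returns (label, raw_deviation, standardized_deviation) tuples, where
    raw_deviation = observed - expected and standardized_deviation is
    raw / sqrt(expected), a Pearson-style residual (None if expected is 0).
    """
    results = []
    for expected, observed in pairs:
        raw = observed - expected
        if raw > tol:
            label = "above"   # co-occur more often than expected
        elif raw < -tol:
            label = "below"   # co-occur less often than expected
        else:
            label = "on"      # matches expectation
        std = raw / math.sqrt(expected) if expected > 0 else None
        results.append((label, raw, std))
    return results

# Hypothetical (expected, observed) pairs, for illustration only.
for row in side_of_identity([(2.5, 6), (4.0, 4), (9.0, 3)]):
    print(row)
```

Dividing by sqrt(expected) puts pairs with very different expected counts on a roughly comparable footing, since a deviation of 3 means more when only 1 co-occurrence was expected than when 100 were.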

What else can I interpret or learn from a plot like this?

Nick Cox · Answer 1 · 2020-04-13T08:31:18.733

Such graphs often disappoint in practice as compared with the ideal that they might show interesting or helpful patterns.

First is the problem of over-plotting. Two or more (expected, observed) pairs may plot at the same position. Your observed frequencies are necessarily integers; your expected frequencies show some granularity too, so overlap may be common, especially in large datasets.

So, what to do about that?

Jitter points by adding a little random noise to their plotted positions (the data themselves are unchanged).
Change the symbol. Dots are hard work at best (I find your image hard to read). Although it has become unfashionable in some groups for some reason, the ancient advice to use open symbols like hollow circles still seems good to me. They tolerate overlap well: think Olympic rings or marks from beer or wine glasses.
Consider showing frequency at each position by different symbol sizes. Proportionality isn't essential: in fact about 7 classes on a logarithmic scale can work fine.
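A minimal sketch of the first and third suggestions, assuming points are (expected, observed) tuples; `jitter` and `size_classes` are hypothetical helper names. Classes double in width (1, 2-3, 4-7, ...), which is one way of getting about 7 classes on a logarithmic scale. The actual drawing is left out to keep the sketch dependency-free.

```python
import random
from collections import Counter

def jitter(values, width, seed=None):
    """Add uniform noise in [-width, width] to each value (for plotting only)."""
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]

def size_classes(pairs, n_classes=7):
    """Map each distinct (expected, observed) position to (count, size_class).

    Class k covers multiplicities 2**k to 2**(k+1) - 1, so with 7 classes:
    1, 2-3, 4-7, 8-15, 16-31, 32-63, and 64 or more (the last class is open).
    """
    counts = Counter(pairs)
    # n.bit_length() - 1 equals floor(log2(n)) for n >= 1, without float rounding.
    return {pos: (n, min(n.bit_length() - 1, n_classes - 1))
            for pos, n in counts.items()}

# Hypothetical data: one position shared by 5 pairs, one by 70, one unique.
pairs = [(2.5, 3)] * 5 + [(4.0, 4)] + [(1.0, 0)] * 70
print(size_classes(pairs))
print(jitter([1, 2, 3], 0.2, seed=1))
```

The size class can then be mapped to a marker size and drawn with hollow symbols; in matplotlib, for example, `scatter(x, y, s=sizes, facecolors="none", edgecolors="black")` gives open circles.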

Second, square root scales sometimes work well for plotting counts or similar variables. It's elementary but also fundamental that the square root of zero is zero, and that square root transformations can work well with counts (which ties in with the variance of a Poisson distribution equalling its mean).
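The Poisson connection can be checked numerically. For a Poisson count X with mean λ, Var(X) = λ grows with the mean, whereas Var(sqrt(X)) settles near 1/4 once λ is moderately large, so a square root scale tends to even out the scatter of counts across the plot. The sketch below computes both variances directly from the Poisson probabilities (truncating a negligible upper tail); the function name is illustrative.

```python
import math

def poisson_variances(lam, n_sd=20):
    """Return (Var(X), Var(sqrt(X))) for X ~ Poisson(lam), lam > 0, by summing the pmf.

    The sum stops at lam + n_sd*sqrt(lam) + 20; the probability beyond that
    point is negligible here.
    """
    upper = int(lam + n_sd * math.sqrt(lam)) + 20
    m1 = m2 = r1 = 0.0  # E[X], E[X^2], E[sqrt(X)]
    for k in range(upper + 1):
        p = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        m1 += k * p
        m2 += k * k * p
        r1 += math.sqrt(k) * p
    var_x = m2 - m1 ** 2
    var_sqrt = m1 - r1 ** 2  # E[(sqrt X)^2] = E[X]
    return var_x, var_sqrt

for lam in (1, 5, 10, 50, 100):
    var_x, var_sqrt = poisson_variances(lam)
    print(f"lambda={lam:>3}  Var(X)={var_x:8.3f}  Var(sqrt X)={var_sqrt:.3f}")
```

The stabilization is approximate: for small means (λ near 1) Var(sqrt(X)) is still noticeably above 1/4.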

Third, you are relying on colour alone to make distinctions. Changing the symbol too could help. But what happens if functional, phylogenetic and random plot at the same position? (I have no real idea of what those categories mean, but evidently the categories are expected to be different.) See the thread "Visualising many variables in one plot" for the suggestion to plot each group in turn with the other groups as backdrop.
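The data preparation behind the group-by-group idea is small: one panel per group, with that group's points drawn prominently and all other points drawn as a faint backdrop. A sketch assuming points arrive as (group, expected, observed) tuples; `foreground_backdrop` is a hypothetical helper, and the group labels in the example are taken from the question.

```python
from collections import defaultdict

def foreground_backdrop(points):
    """For each group, split all points into (foreground, backdrop).

    points: iterable of (group, expected, observed).
    Returns {group: (this_group_xy, other_groups_xy)}, one entry per panel.
    """
    by_group = defaultdict(list)
    for group, expected, observed in points:
        by_group[group].append((expected, observed))
    panels = {}
    for group in sorted(by_group):
        backdrop = [xy for g, xys in by_group.items() if g != group for xy in xys]
        panels[group] = (by_group[group], backdrop)
    return panels

# Hypothetical points, for illustration only.
points = [("functional", 1.0, 0), ("phylogenetic", 2.0, 5),
          ("random", 3.0, 3), ("functional", 0.5, 1)]
for group, (fg, bg) in foreground_backdrop(points).items():
    print(group, fg, bg)
```

Each panel would then draw the backdrop first (say in light grey) and the foreground on top, with the same axis limits in every panel so that the groups can be compared fairly.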

Hey Nick! Thank you very much for your answer on this, very detailed. I used a probabilistic model to calculate the probability (P) that two selected species co-occur at a frequency either less than (Plt) or greater than (Pgt) the observed frequency of co-occurrence. The Plt and Pgt values can be interpreted as p-values testing whether species co-occur significantly less often or significantly more often than expected by chance: P < 0.05 = significant association and P > 0.05 = random. If significant, Pgt < 0.05 = phylogenetic and Plt < 0.05 = negative. — Biohacker, Apr 13 '20 at 08:51
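A sketch of how probabilities of this kind can be computed, assuming the model is the usual hypergeometric null for presence/absence data: N sites, species 1 present at N1 of them, species 2 at N2, with the two placed independently at random. The number of shared sites Q then has P(Q = j) = C(N1, j) C(N - N1, N2 - j) / C(N, N2), with expected value N1 N2 / N. Whether the commenter's model matches this exactly is an assumption, as is treating Plt and Pgt as the inclusive tail sums P(Q <= observed) and P(Q >= observed); the function names are hypothetical and the labels in `classify` simply mirror the comment.

```python
from math import comb

def cooccurrence_probs(n_sites, n1, n2, observed):
    """Hypergeometric null for two species placed independently at random.

    Returns (expected, p_lt, p_gt) with p_lt = P(Q <= observed) and
    p_gt = P(Q >= observed), Q being the number of sites shared by chance.
    """
    total = comb(n_sites, n2)
    lo, hi = max(0, n1 + n2 - n_sites), min(n1, n2)
    pmf = {j: comb(n1, j) * comb(n_sites - n1, n2 - j) / total
           for j in range(lo, hi + 1)}
    expected = n1 * n2 / n_sites
    p_lt = sum(p for j, p in pmf.items() if j <= observed)
    p_gt = sum(p for j, p in pmf.items() if j >= observed)
    return expected, p_lt, p_gt

def classify(p_lt, p_gt, alpha=0.05):
    """Label a pair using the thresholds described in the comment."""
    if p_gt < alpha:
        return "phylogenetic"  # co-occur more often than expected
    if p_lt < alpha:
        return "negative"      # co-occur less often than expected
    return "random"

# Hypothetical example: 10 sites, each species at 5 of them, sharing all 5.
expected, p_lt, p_gt = cooccurrence_probs(10, 5, 5, 5)
print(expected, p_lt, p_gt, classify(p_lt, p_gt))
```

Under this null the expected frequency on the plot's horizontal axis would be N1 N2 / N, which need not be an integer and takes only certain values; that is one source of the granularity mentioned in the answer.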
That puts more weight on 0.05 as a threshold than I would like, but there you go. I don't work in your field. — Nick Cox, Apr 13 '20 at 09:17
