If I have a set of terms, each with an associated frequency (the number of times the term has appeared in a fixed corpus of papers), is the following method of significance testing valid?
1. Calculate the median absolute deviation (MAD) of the GO term frequencies in the given corpus. For a sample $S = \{x_1, \dots, x_n\}$: ${\rm MAD}(S) = 1.4826 \times {\rm median}(|x_{i} - {\rm median}(S)|)$.
2. Compute the threshold ${\rm thresh} = 2.7 \times {\rm MAD}(S) + {\rm median}(S)$.
3. Use ${\rm thresh}$ as a cut-off: GO terms whose frequency is above it are deemed significantly associated with the given corpus, and those below it are deemed non-significant.
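To make the procedure concrete, here is a minimal sketch of the three steps using only Python's standard library. The function names (`scaled_mad`, `mad_threshold`, `flag_terms`) and the example term counts are hypothetical, made up for illustration. The constant 1.4826 is approximately $1/\Phi^{-1}(3/4)$, which makes the scaled MAD a consistent estimate of the standard deviation when the data are normal. Terms whose frequency exactly equals the threshold are treated as non-significant here, which is an assumption since the steps leave that case open.

```python
from statistics import median


def scaled_mad(values):
    """Step 1: MAD(S) = 1.4826 * median(|x_i - median(S)|)."""
    m = median(values)
    return 1.4826 * median([abs(x - m) for x in values])


def mad_threshold(values, k=2.7):
    """Step 2: thresh = k * MAD(S) + median(S), with k = 2.7 as in the question."""
    return k * scaled_mad(values) + median(values)


def flag_terms(term_counts, k=2.7):
    """Step 3: return the threshold and the set of terms strictly above it.

    term_counts is a hypothetical dict mapping a term ID to its frequency
    in the corpus. Terms equal to the threshold count as non-significant
    (an assumption; the original steps do not say).
    """
    thresh = mad_threshold(list(term_counts.values()), k)
    above = {term for term, count in term_counts.items() if count > thresh}
    return thresh, above


# Hypothetical counts for five terms.
counts = {"term_A": 3, "term_B": 5, "term_C": 4, "term_D": 6, "term_E": 40}
thresh, significant = flag_terms(counts)
print(f"threshold = {thresh:.4f}")  # threshold = 9.0030
print(sorted(significant))          # ['term_E']
```

One property of this rule worth keeping in mind: if more than half of the terms share the same frequency, the MAD is 0 and the threshold collapses to the median, so every term above the median would be flagged.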