Questions tagged [cooccurrence]

11 questions
6 votes, 1 answer

"Normalising" joint probability of n events by taking the n-th root

I have a group of events which I guess you could call compound events. Each event is something like: $$A=A_1\cap A_2\cap\dots\cap A_{n_a}$$ I am estimating the probability of the overall event by assuming independence of the components: $$P(A) =…
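Under the independence assumption in the question, P(A) is the product of the component probabilities, and its n-th root is their geometric mean. Presumably that is the point of the normalisation: it puts events with different numbers of components on a per-component scale. Below is a minimal sketch with made-up probabilities. It assumes every component probability lies in (0, 1] and works in log space so that long products do not underflow.

```python
import math


def joint_probability(component_probs):
    """P(A) under the independence assumption: product of component probabilities."""
    return math.prod(component_probs)


def per_component_probability(component_probs):
    """n-th root of the joint probability, i.e. the geometric mean of the components.

    Computed as exp(mean(log p)) so long products do not underflow to 0.0.
    Assumes every probability is in (0, 1]; log(0) would raise an error.
    """
    n = len(component_probs)
    if n == 0:
        raise ValueError("need at least one component event")
    return math.exp(sum(math.log(p) for p in component_probs) / n)


# Hypothetical compound events with different numbers of components.
a = [0.9, 0.8, 0.7]  # n_a = 3
b = [0.9, 0.8]       # n_b = 2
print(joint_probability(a), per_component_probability(a))
print(joint_probability(b), per_component_probability(b))
```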
3 votes, 1 answer

Fisher exact test for mutation co-occurrence

I have a dataset of 492 samples. For each sample I have information on whether gene X has a germline mutation and a somatic mutation. I would like to test the co-occurrence of germline and somatic mutations in the same gene. What I did is that I counted…
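The usual form of this test places each sample in one cell of a 2x2 table (germline mutation yes/no by somatic mutation yes/no). It then compares that table with the hypergeometric distribution implied by the fixed margins. Below is a minimal standard-library sketch. The cell counts are hypothetical, and `scipy.stats.fisher_exact` should return the same two-sided p-value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows: germline mutation yes/no; columns: somatic mutation yes/no.
    The p-value sums the hypergeometric probabilities of every table with
    the same margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def table_prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = table_prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are not
    # dropped because of floating-point rounding.
    p_value = sum(
        p for p in (table_prob(x) for x in range(x_min, x_max + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts adding up to 492 samples:
# [[germline+ & somatic+, germline+ & somatic-],
#  [germline- & somatic+, germline- & somatic-]]
print(fisher_exact_two_sided(10, 20, 30, 432))
```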
3 votes, 0 answers

Cross Co-occurrence between two corpora

I've looked around quite a bit for a solution to this problem, specifically in nltk, but couldn't find much help on SO or elsewhere. My problem is as follows: I have a set of aligned pairs of sentences: [(p1, q1), (p2,…
user1669710
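One way to read "cross co-occurrence" for aligned pairs is this: for a word u from p_i and a word v from q_i, count the aligned pairs in which both appear. Below is a minimal sketch under that reading. It uses made-up sentence pairs and plain whitespace tokenisation rather than anything nltk-specific.

```python
from collections import Counter
from itertools import product


def cross_cooccurrence(pairs):
    """Count aligned pairs in which word u (from p) and word v (from q) both occur.

    Each (u, v) is counted at most once per aligned pair. Tokenisation is a
    lower-cased whitespace split, an assumption made for illustration only.
    """
    counts = Counter()
    for p, q in pairs:
        p_words = set(p.lower().split())
        q_words = set(q.lower().split())
        counts.update(product(p_words, q_words))
    return counts


# Hypothetical aligned sentence pairs (p_i, q_i).
pairs = [("the cat sleeps", "le chat dort"), ("the dog sleeps", "le chien dort")]
counts = cross_cooccurrence(pairs)
print(counts[("sleeps", "dort")])  # 2
print(counts[("cat", "chien")])    # 0
```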
2 votes, 0 answers

Accuracy of PMI (Pointwise Mutual Information) calculation from co-occurrence matrix

Background: When calculating PMI or PPMI from a co-occurrence matrix (COM), each row of the COM (the co-occurrence counts) is summed, e.g. 2 for pineapple, as in the formula in the snapshot. This question is about word co-occurrences in a corpus text…
mon
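A common textbook definition takes P(w, c) from the matrix cell, P(w) from the row sum and P(c) from the column sum, each divided by the grand total. PPMI then clips negative values to 0. Below is a minimal sketch of that definition using base-2 logs and a made-up matrix. Whether the formula in the question's snapshot uses exactly these marginals cannot be checked from the excerpt.

```python
import math


def ppmi(matrix):
    """PPMI for a word-context co-occurrence count matrix given as a list of lists.

    P(w, c) = C[w][c] / N, P(w) = row_sum[w] / N, P(c) = col_sum[c] / N,
    with N the sum of all counts. Zero-count cells get PPMI 0.
    """
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    result = []
    for i, row in enumerate(matrix):
        out_row = []
        for j, count in enumerate(row):
            if count == 0:
                out_row.append(0.0)
                continue
            # (count / N) / ((row / N) * (col / N)) simplifies to count * N / (row * col).
            pmi = math.log2(count * total / (row_sums[i] * col_sums[j]))
            out_row.append(max(pmi, 0.0))
        result.append(out_row)
    return result


# Hypothetical 3x3 word-context count matrix.
counts = [[0, 2, 1], [2, 0, 0], [1, 0, 3]]
for row in ppmi(counts):
    print([round(v, 3) for v in row])
```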
1 vote, 0 answers

Best way to visualize which few data points out of many occur together frequently

Problem statement: I am trying to construct a model that predicts stock price volatility on a given day based on data points represented as strings that may or may not be present on that day. My hypothesis is that certain combinations of these data…
1 vote, 2 answers

Entropy or co-occurrence matrix to compute the randomness of grayscale images?

I have an algorithm that outputs grayscale images (not normalized). These images often contain a lot of random noise and sometimes also contain spatial structures. I would like to have some kind of measurement to compute how random an image is.…
Samuel
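The two options in the title measure different things. The entropy of the gray-level histogram ignores where pixels are. The entropy of a gray-level co-occurrence matrix (GLCM) of neighbouring pixel pairs also reflects spatial structure. Below is a minimal sketch. Quantizing the unnormalized values into a fixed number of levels and using a single non-negative pixel offset are assumptions, not part of the question.

```python
import math
from collections import Counter


def quantize(image, levels=8):
    """Map raw (unnormalized) pixel values onto integer levels 0..levels-1."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0 for _ in row] for row in image]
    scale = levels / (hi - lo)
    return [[min(int((v - lo) * scale), levels - 1) for v in row] for row in image]


def shannon_entropy(counts):
    """Entropy in bits of the distribution given by a Counter of counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def histogram_entropy(q):
    """First-order entropy: depends only on how often each level occurs."""
    return shannon_entropy(Counter(v for row in q for v in row))


def glcm_entropy(q, dx=1, dy=0):
    """Entropy of the gray-level co-occurrence matrix for offset (dx, dy).

    Counts pairs (q[y][x], q[y + dy][x + dx]); dx and dy must be non-negative.
    """
    pairs = Counter()
    h, w = len(q), len(q[0])
    for y in range(h - dy):
        for x in range(w - dx):
            pairs[(q[y][x], q[y + dy][x + dx])] += 1
    return shannon_entropy(pairs)


# Hypothetical smooth image: two flat bands.
smooth = quantize([[0, 0, 0, 0], [9, 9, 9, 9]], levels=2)
print(histogram_entropy(smooth), glcm_entropy(smooth))  # 1.0 1.0
```

With 2 levels the GLCM entropy lies between 0 and 2 bits. Independent noise tends toward twice the histogram entropy, while strongly correlated neighbours keep it close to the histogram entropy.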
1 vote, 1 answer

Correlation Test for Non-Mutually-Exclusive Categorical Data

I have a table of 288 rows and 4 columns where each row corresponds to a tumour sample, and each column is a gene. All of the values are 0 or 1, to indicate whether there's a mutation in a particular gene for a particular sample, but obviously each…
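For 0/1 columns like these, one option is to build a 2x2 table for every pair of genes and compute the phi coefficient (Pearson's correlation applied to two binary variables). Below is a minimal sketch with made-up rows and hypothetical gene names. Each 2x2 table could then go to a Fisher exact or chi-squared test. Note that 4 genes give 6 pairs, so some multiple-testing correction would apply.

```python
import math
from itertools import combinations


def pairwise_phi(rows, gene_names):
    """Phi coefficient for every pair of 0/1 gene columns.

    rows: list of 0/1 lists, one per tumour sample, one column per gene.
    Returns {(gene_a, gene_b): (phi, ((n11, n10), (n01, n00)))}; phi is None
    when either column is constant, because the denominator is then zero.
    """
    results = {}
    for i, j in combinations(range(len(gene_names)), 2):
        n11 = sum(1 for r in rows if r[i] == 1 and r[j] == 1)
        n10 = sum(1 for r in rows if r[i] == 1 and r[j] == 0)
        n01 = sum(1 for r in rows if r[i] == 0 and r[j] == 1)
        n00 = sum(1 for r in rows if r[i] == 0 and r[j] == 0)
        denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
        phi = (n11 * n00 - n10 * n01) / denom if denom else None
        results[(gene_names[i], gene_names[j])] = (phi, ((n11, n10), (n01, n00)))
    return results


# Hypothetical samples x genes table (4 genes, named G1..G4 for illustration).
rows = [[1, 1, 0, 0], [0, 0, 1, 0], [1, 1, 0, 1], [0, 1, 1, 0], [0, 0, 0, 1]]
for pair, (phi, table) in pairwise_phi(rows, ["G1", "G2", "G3", "G4"]).items():
    print(pair, phi, table)
```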
0 votes, 1 answer

How to compute Mutual Information

I am absolutely new to MI (and just really bad at it too!) and was wondering if someone could explain to me how to resolve this question: "Say word A occurs once per 1,000 words (i.e., Freq(A) = 0.001) and word B once per 100,000 words. They…
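In corpus linguistics, the "MI" score of a word pair usually means pointwise mutual information, log2 P(A,B) / (P(A) P(B)). Assuming that is what the exercise means, below is a minimal sketch that uses the two frequencies quoted in the question. The exercise's joint frequency is cut off in the excerpt, so the sketch only computes the co-occurrence rate expected under independence, at which PMI is exactly 0.

```python
import math


def pmi(p_a, p_b, p_ab):
    """Pointwise mutual information in bits: log2(P(A,B) / (P(A) * P(B)))."""
    return math.log2(p_ab / (p_a * p_b))


p_a = 0.001    # word A: once per 1,000 words (from the question)
p_b = 0.00001  # word B: once per 100,000 words (from the question)
expected_joint = p_a * p_b  # co-occurrence rate expected if A and B are independent
print(expected_joint)
print(pmi(p_a, p_b, expected_joint))  # 0.0
```

Each doubling of the observed joint rate relative to `expected_joint` adds one bit of PMI.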
0 votes, 1 answer

Measure like relative frequency that also weights the total

Background: I see a lot of publications mentioning relative frequencies to indicate the presence of certain genes in a specific area. For example, a table with the columns Location, #Genomes having gene X, #Genomes and Relative frequency (%), whose first row starts: Sand, 1 …
KingBoomie
0 votes, 0 answers

Comparing species correlations between species in two habitats

I have two community data sets (samples as rows, species as columns, populated with abundance). This data comes from two habitats/sites, with differing numbers of samples at each site. What I want to do is test the hypothesis that species that tend…
Cody K
0 votes, 1 answer

Why are the two conditional entropies not comparable?

I am learning the basics of text mining. To find syntagmatic relations in a text, such as the word "technology" occurring whenever the word "information" occurs (i.e. the co-occurrence of the words in "Information Technology"), one measure used for…
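Below is a minimal sketch of conditional entropy over binary occurrence indicators (1 if the word occurs in a text segment), in the style of the setup described. The occurrence data are made up. One property visible here may be relevant to the question: H(X | Y) never exceeds H(X). Conditional entropies for different target words therefore have different upper bounds, which is one plausible reason comparisons across target words can mislead.

```python
import math
from collections import Counter


def entropy(values):
    """Entropy in bits of the empirical distribution of a sequence."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(x, y):
    """H(X | Y) in bits for two aligned sequences of observations."""
    n = len(x)
    total = 0.0
    for y_val, count in Counter(y).items():
        x_given_y = [xi for xi, yi in zip(x, y) if yi == y_val]
        total += (count / n) * entropy(x_given_y)
    return total


# Hypothetical occurrence indicators: 1 if the word appears in a text segment.
information = [1, 1, 0, 0, 1, 0, 1, 0]
technology = [1, 1, 0, 0, 1, 0, 0, 0]
print(entropy(technology), conditional_entropy(technology, information))
```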