I have a dataset of binary (0/1) data. Each Id represents a sample number, and 0 or 1 indicates whether the keyword (the columns on the left: Water, Soil, etc.) appears in the publication. The regional columns on the right (e.g. Africa, Asia) indicate where the paper was published from; however, the regions overlap (e.g. the same publication can have affiliations in several countries).

1. What kind of statistical tool do I need to find the correlation between the region (Europe, Africa, Asia) and the keywords (e.g. Water, Soil, Waste, etc.)?

2. What kind of statistical tool do I need to find out whether region influences the keywords?

[Image: screenshot of the dataset]

kjetil b halvorsen
akif
  • The keyword variables are binary, but they are not dummy variables. The region variables are dummy variables (they can be replaced by a single categorical variable Region). – ttnphns Sep 26 '21 at 14:37
  • It is unclear precisely which variables you want the correlation between. – ttnphns Sep 26 '21 at 14:38
  • If $x_1$ is (0, 1) and so is $x_2$ then the correlation between them is just the ... correlation between them (so long as both variables have both 0 and 1 values). Unusually, but predictably, the Pearson and Spearman correlations are identical. See also https://stats.stackexchange.com/questions/103801/is-it-meaningful-to-calculate-pearson-or-spearman-correlation-between-two-boolea – Nick Cox Sep 26 '21 at 16:00
  • I would start looking into some kind of [tag:correspondence-analysis]. Maybe you could add that tag? Please also include your data in a readable format: Hi, there are blind and visually impaired users of this site who interact with it using screen readers. The screen readers can't handle the equation in your screenshot. (https://stats.meta.stackexchange.com/a/1605/155836). – kjetil b halvorsen Sep 27 '21 at 14:57

1 Answer

I would start by looking into some kind of correspondence analysis. If you recode your data as a contingency table, with regions as rows and keywords as columns, you could then use a simple correspondence analysis.
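As a rough illustration of the recoding step, here is a minimal Python sketch that builds a region-by-keyword contingency table from 0/1 indicator columns. The data values are invented, and the variable names (`keywords`, `regions`, `table`) are hypothetical, not taken from the question. It assumes that a publication affiliated with several regions is counted once for each of them, which is only one possible way of handling the overlap.

```python
import numpy as np

# Hypothetical toy data, invented for illustration: one row per publication.
keyword_names = ["Water", "Soil", "Waste"]
region_names = ["Europe", "Africa", "Asia"]

# keywords[p, j] = 1 if publication p contains keyword j
keywords = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
])

# regions[p, i] = 1 if publication p has an affiliation in region i
regions = np.array([
    [1, 0, 0],
    [1, 1, 0],  # affiliated with both Europe and Africa
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],  # affiliated with both Europe and Asia
])

# table[i, j] = number of publications from region i that contain keyword j.
# Assumption: a multi-region publication contributes to every one of its regions.
table = regions.T @ keywords

for name, row in zip(region_names, table):
    print(name, dict(zip(keyword_names, row.tolist())))
```

Because of the overlap, the table total can exceed the number of publications, so the cells are not independent counts. That is worth keeping in mind before applying any test that assumes each publication appears exactly once.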

The eigenvalue associated with the first non-trivial eigenvector could serve as a measure of correlation (strictly speaking this is the second eigenvalue, since the first is always 1 and of no interest).
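To make the eigenvalue remark concrete, here is a minimal sketch of simple correspondence analysis done with a plain singular value decomposition in NumPy. The function name `ca_eigenvalues` and the example counts are hypothetical, and the sketch assumes every row and column of the table has a positive total. It is one way to obtain the eigenvalues, not necessarily the computation used by any particular package.

```python
import numpy as np

def ca_eigenvalues(table):
    """Eigenvalues of a simple correspondence analysis, trivial one first."""
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()   # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)           # row masses
    c = P.sum(axis=0)           # column masses
    # Scale each cell by 1 / sqrt(row mass * column mass).
    S = P / np.sqrt(np.outer(r, c))
    # Singular values are returned in decreasing order; the largest is the
    # trivial value 1, and the eigenvalues are the squared singular values.
    singular_values = np.linalg.svd(S, compute_uv=False)
    return singular_values ** 2

# Hypothetical region-by-keyword counts, invented for illustration.
table = np.array([
    [30, 10, 5],
    [10, 25, 10],
    [5, 10, 30],
])

eig = ca_eigenvalues(table)
print("trivial eigenvalue:", eig[0])
print("first non-trivial eigenvalue:", eig[1])
print("total inertia:", eig[1:].sum())
```

The sum of the non-trivial eigenvalues (the total inertia) equals the Pearson chi-squared statistic of the table divided by the grand total, and the square root of the first non-trivial eigenvalue is the largest canonical correlation between row and column scores, which makes it a natural summary of association strength.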

kjetil b halvorsen