
I would like to calculate the average over a list of many compositions of A, B, and C.

For example:

Comp1{A: 0.25, B: 0.25, C: 0.5}, Comp2{A: 0.35, B: 0.15, C: 0.5}, Comp3{A: 0.45, B: 0.25, C: 0.3}

And these compositions are based on different sample-sizes N. For example:

N(Comp1) = 4, N(Comp2) = 1000, N(Comp3) = 100000

Please give me hints on how to weight these compositions by their 'certainty'. Ideally this 'certainty' would depend only on N and the degrees of freedom, not on the actual composition. Ideally this measure of certainty would go from 0 to 1, where both Comp2 and Comp3 would be rather close to 1, so they would be weighted roughly equally and not by a factor of 100.

Ideally, I am asking for a reputable, established method. I also have a prior for what the composition should be, but I would rather not use it unless that is mandatory.
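For illustration, here is a minimal sketch (in Python) of the kind of behaviour I have in mind, using a hypothetical saturating weight $w(N) = N/(N + N_0)$, where $N_0$ is just a made-up tuning constant and not part of any established method:

```python
import numpy as np

# Hypothetical saturating "certainty": w(N) = N / (N + N0).
# N0 is an arbitrary tuning constant (an assumption, not an
# established choice). w -> 0 as N -> 0 and w -> 1 as N grows,
# so N = 1000 and N = 100000 receive nearly equal weight.
def certainty(n, n0=50):
    return n / (n + n0)

comps = np.array([
    [0.25, 0.25, 0.50],   # Comp1 {A, B, C}
    [0.35, 0.15, 0.50],   # Comp2
    [0.45, 0.25, 0.30],   # Comp3
])
ns = np.array([4, 1000, 100000])

w = certainty(ns)
avg = (w[:, None] * comps).sum(axis=0) / w.sum()
print(w)    # approx. [0.074, 0.952, 0.9995]
print(avg)  # weighted average composition over A, B, C
```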

asked by KaPy3141 (edited by kjetil b halvorsen)
  • It seems to me that we would need to define what "certainty" is before we could sort this out. Do you mean certainty in the Bayesian sense of a prior (or posterior) that is very "peaked"? The discussion of how to define peakedness is an ongoing one; moreover, if that's _not_ what you mean by "certainty", do you mean something like how different the composition is from the uniform distribution, in which all members of the support have equal probability? Ignoring that, creating a weighting scheme with the various N values seems doable, but it sounds as if you want "certainty" in that scheme. – call-in-co Sep 01 '20 at 17:23
  • Thanks a lot for your helpful comment! I definitely agree that building a new weighting system would only be possible with a more concrete definition of what certainty should be, which I unfortunately don't have. I was really just asking whether there is already a standard method that could work for my goal. A follow-up question: have you ever seen weighting by the square root of N? – KaPy3141 Sep 02 '20 at 18:36
  • No, weighting by $\sqrt{N}$ doesn't make a lot of sense. The point of using $N$ is that if you want a metric in the range $[0, 1]$, then counting how many out of $N$ gets you that exactly; plus it has the benefit that a weight of $\frac{2}{N}$ is exactly twice a weight of $\frac{1}{N}$. If I find the time, I'll take a stab at how to make a weight that has two "levels" to it: each composition, and all the compositions. – call-in-co Sep 02 '20 at 21:29
  • Something that just came to mind in terms of your search for "confidence" is information-theoretic approaches, i.e. maybe you could frame your confidence as low entropy/high information. [Here](https://kconrad.math.uconn.edu/blurbs/analysis/entropypost.pdf) is a good PDF on the topic of entropy w.r.t. probability distributions, and [this](https://stats.stackexchange.com/questions/295617/what-is-the-advantages-of-wasserstein-metric-compared-to-kullback-leibler-diverg) is a Cross Validated discussion about two common measures of differences (distances) between probability distributions. – call-in-co Sep 02 '20 at 22:07
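To make the entropy suggestion in the last comment concrete, here is a minimal sketch (an illustration only) that treats $1 - H(p)/\log K$ as a $[0, 1]$ "confidence", where $H$ is the Shannon entropy of a composition with $K$ parts. Note that, unlike what the question asks for, this depends on the composition itself rather than on $N$:

```python
import numpy as np

# Illustration of the entropy idea from the comment above: treat
# 1 - H(p)/log(K) as a [0, 1] "confidence", where H is the Shannon
# entropy of the composition and K its number of parts. A peaked
# composition scores near 1; the uniform composition scores 0.
def entropy_confidence(p):
    p = np.asarray(p, dtype=float)
    # log is evaluated only where p > 0; zero entries contribute 0
    log_p = np.log(p, where=p > 0, out=np.zeros_like(p))
    h = -np.sum(p * log_p)
    return 1.0 - h / np.log(len(p))

print(entropy_confidence([0.25, 0.25, 0.50]))  # Comp1, approx. 0.054
print(entropy_confidence([1/3, 1/3, 1/3]))     # uniform -> 0.0
```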

0 Answers