Problem: I am looking for a metric that quantifies how representative a sample is of a given distribution. By representativeness I mean the degree to which the sample exhibits the characteristics of a sample drawn from that distribution.

In my case, I am trying different methods that generate samples inside a polyhedron, and I would like to measure how good these methods are at generating a sample set that is representative of the whole feasible space (i.e. the polyhedron). I assume that a perfect sample set would be equivalent to one drawn from a random variable uniformly distributed over the feasible space.

For instance, consider two different methods that generate sample sets as shown in the image below: (image: sample sets produced by the two methods)

By mere observation, one could say that the method generating the samples in the left figure is less representative than the second method, as it over-samples the borders and returns no samples from the interior of the polygon. However, how would you quantify this? What metric would you suggest? It is worth mentioning, in case it helps, that the real feasible space is a convex polyhedron of relatively high dimension (n ~ [10, 100]).

Initial approach: My first intuition was to compute statistical moments (mean and variance) of the sample and of the given distribution and compare them, although this seems too simplistic, and I am sure there is a better metric. Even so, how would you deal with multi-dimensional data? What is the best way to measure how far apart two covariance matrices are?
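To make the moment comparison described above concrete, here is a minimal sketch. It rests on several assumptions that are not part of my actual problem: the unit hypercube stands in for the polyhedron, a large uniform reference sample stands in for the exact moments of the target distribution, a Beta(0.2, 0.2) draw per coordinate stands in for a border-heavy method, and the relative Frobenius norm is just one possible (and fairly crude) distance between covariance matrices. The function name `moment_gap` is hypothetical.

```python
import numpy as np

def moment_gap(sample, reference):
    """Return (mean gap, relative Frobenius covariance gap) between two point sets."""
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Euclidean distance between the two sample means.
    mean_gap = np.linalg.norm(sample.mean(axis=0) - reference.mean(axis=0))
    # Sample covariance matrices (rows are observations).
    cov_s = np.cov(sample, rowvar=False)
    cov_r = np.cov(reference, rowvar=False)
    # Frobenius distance, scaled by the size of the reference covariance.
    cov_gap = np.linalg.norm(cov_s - cov_r, ord="fro") / np.linalg.norm(cov_r, ord="fro")
    return mean_gap, cov_gap

rng = np.random.default_rng(0)
dim = 10

# Assumption: unit hypercube as a stand-in for the polyhedron.
reference = rng.uniform(size=(5000, dim))
interior = rng.uniform(size=(2000, dim))
# Assumption: Beta(0.2, 0.2) piles mass near 0 and 1, mimicking border over-sampling.
border = rng.beta(0.2, 0.2, size=(2000, dim))

mean_gap_interior, cov_gap_interior = moment_gap(interior, reference)
mean_gap_border, cov_gap_border = moment_gap(border, reference)

print("interior:", mean_gap_interior, cov_gap_interior)
print("border:  ", mean_gap_border, cov_gap_border)
```

In this toy setup both samples have roughly the same mean by symmetry, so only the covariance gap separates them, which illustrates why the mean alone seems too weak a criterion.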

I also had the idea of computing the likelihood of the different sample sets and comparing them, only to realise that, because the probability distribution is uniform over the polyhedron, all sample sets of the same size have the same likelihood. However, there might be a way to use the likelihood concept to quantify how different a sample is from a given distribution.
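To spell out why the likelihood does not help, write $P$ for the polyhedron and $x_1, \dots, x_N$ for the sample (both symbols are my own notation here). Assuming the points are treated as independent draws from the uniform density on $P$, the likelihood is

$$
\mathcal{L}(x_1, \dots, x_N) = \prod_{i=1}^{N} \frac{\mathbf{1}[x_i \in P]}{\operatorname{vol}(P)} = \operatorname{vol}(P)^{-N} \quad \text{whenever every } x_i \in P,
$$

which depends only on $N$ and not on where the points lie inside $P$.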

Any help, comment and/or suggestion is very much appreciated. Thank you very much in advance!

mcardoner
  • Welcome to Cross Validated! Do you know what distribution you're trying to approximate? – Dave Feb 17 '22 at 15:16
  • Would you perhaps be trying to assess whether the samples appear to be from the *uniform* distribution on the interior of the polyhedron? If so, https://stats.stackexchange.com/questions/30982 is the same question (with answers). If not, please explain how your question differs from that one. – whuber Feb 17 '22 at 15:23
  • Thanks for the welcome, @Dave. I am not exactly trying to approximate a distribution. There are methods that generate samples over a polyhedron. I want to evaluate these samples on different aspects, one being how representative they are of the polyhedron volume. For that, I want to measure how similar these samples are to samples drawn from a uniform distribution over the given feasible space. – mcardoner Feb 22 '22 at 22:46
  • Many thanks for guiding me to the other question, @whuber. However, after reading the answers carefully, I am still unsure whether a convex, but non-cubic, geometry could lead to heavy edge effects that modify Ripley's K function. Does it hold for all geometries that the uniform distribution is the horizontal line at zero for $L(\rho) - \rho$? If not, would it be a sensible approach to generate a random sample set uniformly distributed in the given polyhedron, compute $L(\rho) - \rho$ for that uniformly random sample set, and then compare it with the rest of the sample sets? Many thanks. – mcardoner Feb 22 '22 at 23:01
  • What you describe is exactly what implementations of the $K$ function do. This is how envelopes are constructed around the reference distribution, too. – whuber Feb 22 '22 at 23:28
  • Oh, I get it, many thanks. – mcardoner Feb 24 '22 at 11:22
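
The simulate-and-compare idea discussed in the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure of any particular $K$-function implementation: `pair_fraction` is a hypothetical, uncorrected pair-distance summary standing in for Ripley's K, `uniform_cube` is a hypothetical stand-in for a uniform sampler on the actual polyhedron, and the envelope is a plain pointwise min/max over simulated uniform samples. Because the reference samples live in the same region as the candidate, any edge effects are shared by both sides of the comparison.

```python
import numpy as np

def pair_fraction(points, radii):
    """Fraction of point pairs closer than each radius (uncorrected K-like summary)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep each unordered pair once.
    upper = dists[np.triu_indices(len(points), k=1)]
    return np.array([(upper <= r).mean() for r in radii])

def envelope(sampler, n_points, radii, n_sim=99, rng=None):
    """Pointwise min/max of the summary over n_sim simulated reference samples."""
    rng = np.random.default_rng() if rng is None else rng
    sims = np.array([pair_fraction(sampler(n_points, rng), radii) for _ in range(n_sim)])
    return sims.min(axis=0), sims.max(axis=0)

def uniform_cube(n, rng, dim=5):
    # Assumption: unit hypercube in place of the real polyhedron.
    return rng.uniform(size=(n, dim))

rng = np.random.default_rng(1)
radii = np.linspace(0.2, 1.0, 9)

low, high = envelope(uniform_cube, 200, radii, n_sim=99, rng=rng)

# Assumption: Beta(0.2, 0.2) coordinates mimic a method that over-samples the border.
candidate = rng.beta(0.2, 0.2, size=(200, 5))
stat = pair_fraction(candidate, radii)

# Radii at which the candidate leaves the envelope of the uniform reference.
outside = (stat < low) | (stat > high)
print(radii[outside])
```

A candidate sample whose curve stays inside the envelope at every radius is, by this summary, not distinguishable from the uniform reference; the number of simulations and the choice of radii are free parameters of the sketch.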

0 Answers