
I am modeling a continuous bivariate distribution of a random vector $(X_1,X_2)$ using a copula. I would like to assess how well I am doing. Given a data sample, I could probably do a bivariate Kolmogorov-Smirnov test (though it seems nontrivial, as discussed in other threads on this site). However, I have an alternative idea:

  1. Pick a large $n$ and make a grid of weights $w_i=i/n$ for $i=1,\dots,n-1$.
  2. Obtain $Y_i:=w_i X_1+(1-w_i)X_2$.
  3. Assess the distribution of each $Y_i$ against the distribution it should have under the hypothesized bivariate model, using the univariate Kolmogorov-Smirnov test.
  4. If the $Y_i$ fail the tests in step 3 in a large fraction of instances$\color{red}{^*}$, reject the null hypothesis that the sample comes from the hypothesized bivariate distribution.
    If they do not, fail to reject the null hypothesis. (A code sketch of this procedure follows the list.)
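
For concreteness, here is a minimal sketch of steps 1-4 in Python. It assumes the hypothesized bivariate model can be sampled from (as is typical for a copula-based model), so each projected sample is compared with a large simulated sample via the two-sample KS test; the function name `projection_ks_test` and the argument `model_sample` are hypothetical placeholders, not taken from any particular library.

```python
import numpy as np
from scipy.stats import ks_2samp

def projection_ks_test(data, model_sample, n=20, alpha=0.05):
    """Return the fraction of projection weights at which the two-sample KS test rejects.

    data:         (N, 2) array of observed (X1, X2) pairs.
    model_sample: (M, 2) array simulated from the hypothesized bivariate
                  distribution, with M much larger than N.
    """
    weights = np.arange(1, n) / n                                  # step 1: grid w_i = i/n
    rejections = 0
    for w in weights:
        y_obs = w * data[:, 0] + (1 - w) * data[:, 1]              # step 2: project the data
        y_mod = w * model_sample[:, 0] + (1 - w) * model_sample[:, 1]
        p = ks_2samp(y_obs, y_mod).pvalue                          # step 3: univariate KS test
        rejections += (p < alpha)
    return rejections / len(weights)                               # step 4: rejection fraction
```

How the resulting rejection fraction should be calibrated to an overall significance level is exactly Question 2 below.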

Questions:

  1. I wonder how sensible this idea is and what pitfalls I might be overlooking (aside from the coarseness of the grid, which may be a problem). If I am indeed overlooking something, a counterexample would be appreciated.
  2. $\color{red}{^*}$I am not sure what fraction should be large enough to achieve a desired significance level. Is it as simple as an $\alpha$ fraction for a nominal significance level of $\alpha$, or is it more complicated than that?
  3. I would also be interested in extending this idea beyond 2 dimensions. Do any qualitatively new problems arise there? (I do realize the computational time would grow exponentially with the number of dimensions and would quickly become prohibitive.)
Richard Hardy
  • Is it true that the bivariate distributions $(X,Y)$ and $(X',Y')$ are equal if and only if $aX+(1-a)Y$ has the same distribution as $aX'+(1-a)Y'$ for all $a \in (0,1)$? – John L Mar 31 '21 at 17:48
  • @JohnL, this is part of my Question 1 to which I do not have an answer (yet). Probably not, but it is not obvious to me, so I would be curious to see a counterexample. – Richard Hardy Mar 31 '21 at 18:02
  • This applies to other tests of fit: https://stats.stackexchange.com/q/2492/247274. What would you get from a p-value? It seems like you might benefit more from some kind of mean error from the theorized distribution. – Dave Apr 02 '21 at 12:01
  • @Dave, there are good points in the linked thread. To put it succinctly, one could do away with any and all goodness of fit tests. This comment thread is not the place to renew the discussion that can get pretty extensive :) Yet I maintain that there are things to learn from $p$-values of such tests (see e.g. the [answer by Harvey Motulsky](https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless/2501#2501)), hence my question. – Richard Hardy Apr 02 '21 at 12:27
