I'm reading Section 12.1 (page 161) of van der Vaart's Asymptotic Statistics (see the screenshot below). I have two questions about the permutation symmetry of the function $h$ mentioned there:

Question 1. I don't quite understand why we can always replace an asymmetric $h$ with a symmetric one, or how to do it. For example, suppose $h(X_1,X_2)=X_1^2+X_2$. How do I replace it with a new symmetric function $h^*(\cdot,\cdot)$ satisfying $h^*(X_1,X_2)=h^*(X_2,X_1)$?

Question 2. Related to the first question, I feel the highlighted sentence is a bit incomplete. The complete sentence should be something like: a given $h$ can always be replaced by a symmetric one without affecting a certain property. So I'm wondering which property that is. I have three guesses for its full meaning, and I'd like to know which one is correct:

Let $U_h=\frac{1}{\binom{n}{r}}\sum_{\beta}h(X_{\beta_1},\dots,X_{\beta_r})$, where $h$ is not necessarily symmetric.

Interpretation (1). If $h$ is not symmetric, we can always replace it with a symmetric $h^*$ such that $U_{h^*}=U_{h}$, i.e., they are always numerically equal in any sample;

Interpretation (2). If $h$ is not symmetric, we can always replace it with a symmetric $h^*$ such that $U_{h^*}$ is still unbiased for $\theta$, just like $U_{h}$;

Interpretation (3). If $h$ is not symmetric, we can always replace it with a symmetric $h^*$ such that $U_{h^*}$ and $U_{h}$ are asymptotically equivalent, in the sense that they are both consistent for $\theta$ and have exactly the same limiting distribution;

Thanks!

[Screenshot: van der Vaart, Asymptotic Statistics, Section 12.1, p. 161]

T34driver

1 Answer

Consider the $r=2$ case to simplify the notation. Because $X_1$ and $X_2$ are iid, and thus exchangeable, $$Eh(X_1,X_2)=Eh(X_2,X_1),$$ so the expectation is symmetric because of the exchangeability of $X_1$ and $X_2$ rather than because of any property of $h$. But since the expectation is symmetric, we can choose $h$ symmetric. Start with an arbitrary $h$ and define $h_s(x,y) =(h(x,y)+h(y,x))/2$. By definition $$Eh_s(X_1,X_2)=(Eh(X_2,X_1)+Eh(X_1,X_2))/2$$ and by exchangeability this is $$2Eh(X_1,X_2)/2=Eh(X_1,X_2)=\theta.$$
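
As a worked instance, take the example from the question, $h(x,y)=x^2+y$, and assume the second moment of $X_1$ exists. The symmetrisation gives $$h_s(x,y)=\frac{(x^2+y)+(y^2+x)}{2}=\frac{x^2+y^2+x+y}{2},$$ which clearly satisfies $h_s(x,y)=h_s(y,x)$, and $$Eh_s(X_1,X_2)=\frac{EX_1^2+EX_2^2+EX_1+EX_2}{2}=EX_1^2+EX_1=Eh(X_1,X_2)=\theta.$$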

For your second part: none of the above.

$U_h$ is not the same, it's $\theta$ that's the same. What he's saying is that we are interested in $\theta$; that for defining $\theta$ there is no loss of generality in taking $h$ symmetric; and that if we take $h$ symmetric, we can define $U_h$ as the average over unordered partitions.

If we wanted to allow non-symmetric $h$ in the definition of $U_h$, we could. We'd need to define $U_h$ as an average over ordered partitions. In the $r=2$ case above with a non-symmetric $h$ $$\frac{1}{n(n-1)} \sum_{i\neq j} h(X_i,X_j)=\frac{1}{n(n-1)} \sum_{i\neq j} h_s(X_i,X_j)= \frac{1}{n\choose 2}\sum_{i< j} h_s(X_i,X_j)$$
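
The chain of equalities can be checked numerically. The following is only an illustrative sketch, not part of the book's argument: the kernel `h` is the example from the question, and the helper names `h_s`, `ordered_average` and `unordered_average` are made up for this illustration.

```python
import itertools
import math
import random


def h(x, y):
    # Asymmetric kernel from the question: h(x, y) = x^2 + y.
    return x**2 + y


def h_s(x, y):
    # Symmetrised kernel: average of h over both argument orders.
    return (h(x, y) + h(y, x)) / 2


def ordered_average(kernel, xs):
    # Average of the kernel over all ordered pairs (i, j) with i != j.
    n = len(xs)
    pairs = itertools.permutations(range(n), 2)
    return sum(kernel(xs[i], xs[j]) for i, j in pairs) / (n * (n - 1))


def unordered_average(kernel, xs):
    # Average of the kernel over all pairs with i < j.
    n = len(xs)
    pairs = itertools.combinations(range(n), 2)
    return sum(kernel(xs[i], xs[j]) for i, j in pairs) / math.comb(n, 2)


random.seed(0)
xs = [random.gauss(0, 1) for _ in range(8)]

a = ordered_average(h, xs)      # left-hand side, raw asymmetric h
b = ordered_average(h_s, xs)    # middle term, symmetrised kernel
c = unordered_average(h_s, xs)  # right-hand side, pairs with i < j only

print(a, b, c)
print(math.isclose(a, b), math.isclose(b, c))

# Averaging the raw h over i < j only is a different quantity in general,
# and it depends on the order in which the sample is listed.
print(unordered_average(h, xs))
```

The three values `a`, `b` and `c` agree up to floating-point error, whereas the last printed value generally does not match them, which is why the average over pairs with $i<j$ is only used with a symmetric kernel.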

So, since $U$-statistics only give us symmetric $\theta$s, there's no loss of generality in symmetrising the $h$. Basically, the question is whether you demand symmetric $h$ in advance or construct the symmetrised $h_s$ inside your proofs.

If you wanted to study U-statistics on non-exchangeable data (e.g., in my case, multistage samples), then there would be a loss of generality in taking $h$ to be symmetric.

Thomas Lumley
  • Thanks, this is very helpful! One question: shouldn't the last long equation you wrote be $$\frac{1}{n(n-1)} \sum_{i\neq j} h(X_i,X_j)=\frac{2}{n(n-1)} \sum_{i< j} h_s(X_i,X_j)= \frac{1}{n\choose 2}\sum_{i< j} h_s(X_i,X_j)$$ since we defined $h_s(X_i,X_j)$ as half the sum over a pair? – T34driver Sep 06 '20 at 06:49
  • That would also be true, but I meant what I wrote. The first equality says that if you use all the pairs the ordering doesn't matter, and the second says that if you then have symmetry you can go to just the ordered pairs. – Thomas Lumley Sep 06 '20 at 07:04
  • Thanks! I see, and I agree the first equality is true. But I have difficulty seeing the second equality: the summation is identical, but the factors in front differ by a factor of 2, since $\frac{1}{{n}\choose{2}}=\frac{2}{n(n-1)}$, so you must halve the summation for the third expression (by using $i – T34driver Sep 06 '20 at 07:18
  • Sorry, yes. The **last** term should have $i<j$. – Thomas Lumley Sep 06 '20 at 07:23
  • Thanks! You really deepened my understanding of U-statistics! What I got in my problem is also an asymmetric h (summed over all possible permutations), but thanks to your explanation, I guess I know how to handle it now. – T34driver Sep 06 '20 at 07:28