
I'm dealing with two data vectors, $A$ and $B$, that, after some processing, need to be truncated (or rectified; I'm not sure which is the correct term), i.e.,

$A_{\text{trunc}} = |A|$ and $B_{\text{trunc}} = |B|$ (elementwise).

$A$ represents samples drawn from a uniform distribution and $B$ represents samples drawn from a Gaussian distribution. I am interested in analytically determining the relationship between the standard deviations of the original and truncated versions of the data.

Clearly the standard deviations should be smaller. I ran a simulation and, based on the results, I think that a truncated uniform distribution has ~50% of the standard deviation of the non-truncated version, and a truncated Gaussian distribution has ~60% of the standard deviation of the non-truncated version. Code below:

%In this script we simulate the effect that truncation/rectification has on the mean
%and variance of a uniform and a Gaussian distribution

%In both cases, we expect the mean to increase and the variance to decrease

num_sim = 1000;
u_mean = nan(num_sim, 2);
u_sigma = nan(num_sim, 2);
g_mean = nan(num_sim, 2);
g_sigma = nan(num_sim, 2);

for k = 1:num_sim
    %uniform distribution
    u_unrect = unifrnd(-1, 1, [1000, 1]);
    u_tmp = u_unrect;
    %u_tmp(u_tmp<0) = 0;
    u_tmp = abs(u_tmp);
    u_rect = u_tmp;

    u_mean(k,1) = mean(u_unrect);
    u_mean(k,2) = mean(u_rect);
    u_sigma(k,1) = std(u_unrect);
    u_sigma(k,2) = std(u_rect);

    %gaussian distribution
    g_unrect = normrnd(0, 1, [1000, 1]);
    g_tmp = g_unrect;
    %g_tmp(g_tmp<0) = 0;
    g_tmp = abs(g_tmp);
    g_rect = g_tmp;

    g_mean(k,1) = mean(g_unrect);
    g_mean(k,2) = mean(g_rect);
    g_sigma(k,1) = std(g_unrect);
    g_sigma(k,2) = std(g_rect);
end

u_ratio = mean(u_sigma(:,2))/mean(u_sigma(:,1))
g_ratio = mean(g_sigma(:,2))/mean(g_sigma(:,1))

But I'm having a really hard time figuring this out analytically for either case! How do you go about showing this result?

John Alperto
  • $|B|$ has a [half-normal distribution](https://en.wikipedia.org/wiki/Half-normal_distribution), whose standard deviation is $\sqrt{1-2/\pi}\approx 0.60281$ times that of the underlying Gaussian – in perfect agreement with your simulation. And $|A|$ is just a $\mathcal{U}(0,1)$ distribution. – corey979 Oct 06 '20 at 22:15
  • Ah, I've never heard of the half-normal. And your comment about |A| makes sense given that it stays symmetric even after truncation! Thanks!!! – John Alperto Oct 06 '20 at 22:30
  • "Truncation" of a distribution or random variable is a standard term that means something rather different than your usage here. You are *transforming* your variable by taking its *absolute value.* – whuber Oct 07 '20 at 22:44

2 Answers


Comment.

I believe the comments of @corey979 apply to your simulations, for which $A\sim\mathsf{Unif}(-1,1)$ and $B \sim\mathsf{Norm}(0,1).$ In that context, the comments are correct.

However, your general question, about taking the absolute value, seems to ask about any uniform and any normal distribution.

  • To explore this by simulation, you would have to look at a wider variety of uniform and normal distributions. I will show a couple of examples to illustrate.

  • To explore this analytically, you need to look at the distribution of the random variable $|X|$ that results from the distribution of $X.$

Simulation. First, look at $A \sim\mathsf{Unif}(-1,2).$

set.seed(107)
a = runif(10^5, -1, 2)
mean(a); sd(a)
[1] 0.503862
[1] 0.8682111       # aprx SD(A)
a.t = abs(a)
mean(a.t); sd(a.t)
[1] 0.83735
[1] 0.5536315       # aprx SD(|A|) < SD(A)

par(mfrow=c(1,2))
 hist(a, prob=T, br=30, col="skyblue2")
 hist(a.t, prob=T, br=30, col="skyblue2")    
par(mfrow=c(1,1))

[Figure: histograms of a (left) and a.t = abs(a) (right)]
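
For comparison, here is an added check (not part of the original output): the exact moments of $|A|$ follow directly from the density $1/3$ on $(-1,2),$ using $\operatorname{Var}(|A|) = E[A^2] - (E|A|)^2.$

# Added check: exact moments of |A| for A ~ Unif(-1, 2), density 1/3 on (-1, 2)
nu1 = 5/6                     # E|A|  = int(-x/3, -1..0) + int(x/3, 0..2) = 1/6 + 4/6
mu2 = 1                       # E[A^2] = int(x^2/3, -1..2) = (8 + 1)/9
nu1                           # 0.8333333, exact E|A|    (simulation above: 0.83735)
sqrt(mu2 - nu1^2)             # 0.5527708, exact SD(|A|) (simulation above: 0.5536315)

So for this asymmetric uniform the ratio $SD(|A|)/SD(A) \approx 0.553/0.866 \approx 0.64,$ not the 50% seen for $\mathsf{Unif}(-1,1).$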

Next, look at $B \sim\mathsf{Norm}(2,1).$

set.seed(2020)
b = rnorm(10^5, 2, 1)
mean(b); sd(b)
[1] 1.997227
[1] 0.9970017   # aprx SD(B)
b.t = abs(b)
mean(b.t); sd(b.t)
[1] 2.013744
[1] 0.9632047   # aprx SD(|B|) < SD(B)

par(mfrow=c(1,2))
 hist(b, prob=T, br=30, col="skyblue2")
 hist(b.t, prob=T, br=30, col="skyblue2")    
par(mfrow=c(1,1))

[Figure: histograms of b (left) and b.t = abs(b) (right)]
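
Again, an added check (not part of the original answer): $|B|$ has a folded-normal distribution, whose mean has the standard closed form $E|B| = \sigma\sqrt{2/\pi}\,e^{-\mu^2/(2\sigma^2)} + \mu\bigl(1-2\Phi(-\mu/\sigma)\bigr),$ with $\operatorname{Var}(|B|) = \mu^2+\sigma^2-(E|B|)^2.$

# Added check: exact moments of |B| via folded-normal formulas, B ~ Norm(2, 1)
mu = 2;  sg = 1
nu1 = sg*sqrt(2/pi)*exp(-mu^2/(2*sg^2)) + mu*(1 - 2*pnorm(-mu/sg))
nu1                           # 2.01698, exact E|B|    (simulation above: 2.013744)
sqrt(mu^2 + sg^2 - nu1^2)     # 0.96529, exact SD(|B|) (simulation above: 0.9632047)

With the mean this far from $0,$ taking absolute values barely changes the standard deviation: the ratio is about $0.965,$ nowhere near the 60% seen for the standard normal.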

BruceET

Ordinarily this question would just require an abstract calculation with different and not terribly informative results for every distribution. But yours are special, as your comments about symmetry note, and those special properties permit us to say something.

Consider any random variable $X$ whose distribution is symmetric about $0.$ This is the case both for the Uniform distribution on $[-1,1]$ and the standard Normal distribution in the question. You are interested in relationships between certain properties of $X$ and its absolute value $|X|.$

The first two moments of $X$ are defined to be

$$\mu_1 = E[X] = 0;\quad \mu_2 = E[X^2].$$

The symmetry of $X$ guarantees the first moment vanishes.

The first two absolute moments of $X$ are

$$\nu_1 = E[|X|];\quad \nu_2 = E[|X|^2] = E[X^2] = \mu_2.$$

By definition the standard deviation $\sigma(X)$ of a variable $X$ is the square root of its variance, defined as

$$\sigma^2(X)=\operatorname{Var}(X) = \mu_2 - \mu_1^2 = \mu_2.$$

The standard deviation of $|X|$ therefore is

$$\sigma^2(|X|)=\operatorname{Var}(|X|) = \nu_2 - \nu_1^2 = \mu_2 - \nu_1^2 = \sigma^2(X) - \nu_1^2.$$

There is your universal relationship. It confirms your intuition that $\sigma(|X|)$ must always be less than $\sigma(X)$ (at least when $\nu_1\ne 0,$ which is always the case for any non-constant random variable), because the variance of $|X|$ is always less than the variance of $X$ by the amount $\nu_1^2.$
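
(An added aside, not part of the original argument: this identity is easy to confirm empirically for any distribution symmetric about $0$; here is a quick check using a Student-t variable.)

# Added check of Var(|X|) = Var(X) - (E|X|)^2 for a symmetric X (Student-t, 5 df)
set.seed(1)
x <- rt(1e6, df = 5)                      # symmetric about 0, finite variance
c(var(abs(x)), var(x) - mean(abs(x))^2)   # the two values agree to within sampling error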


Evidently you need to compute $\nu_1$ and $\nu_2$ for your two distributions, so let's do that here.

  1. The uniformly distributed $X$ has a density equal to $1/2$ on the interval $[-1,1],$ so $$\nu_1 = \int_{-1}^1 |x| \frac{1}{2}\,\mathrm{d}x = \int_0^1 x\,\mathrm{d}x = \frac{1}{2}$$ and $$\mu_2 = \int_{-1}^1 x^2 \frac{1}{2}\,\mathrm{d}x = \frac{1}{3}.$$ The two variances differ by $1/4 = (1/2)^2$ and $\sigma(X) = \sqrt{\frac{1}{3}},$ $\sigma(|X|)=\sqrt{\frac{1}{12}} \approx 0.288675.$

  2. For the standard Normal $X$ (which by definition has variance $\sigma^2(X)=1$) compute $$\nu_1 = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty |x|e^{-x^2/2}\,\mathrm{d}x = \sqrt{\frac{2}{\pi}}.$$ Thus $\sigma(|X|) = \sqrt{1 - \frac{2}{\pi}} \approx 0.602810.$
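
(An added verification, not part of the original answer: the two integrals can also be checked by numerical quadrature, using the symmetry $E|X| = 2\int_0^\infty x f(x)\,\mathrm{d}x.$)

# Added numerical check of the two analytic values
nu1.unif <- 2 * integrate(function(x) x/2, 0, 1)$value              # = 1/2
sqrt(1/3 - nu1.unif^2)                                              # 0.2886751 = sqrt(1/12)
nu1.norm <- 2 * integrate(function(x) x * dnorm(x), 0, Inf)$value   # = sqrt(2/pi)
sqrt(1 - nu1.norm^2)                                                # 0.6028103 = sqrt(1 - 2/pi)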

As a check, here are Monte Carlo simulations in R using a million observations of each variable:

sd(abs(runif(1e6, -1, 1)))
sd(abs(rnorm(1e6)))

The output will vary, but when I ran these it was

[1] 0.2887712
[1] 0.6026868

which are comfortably close to the analytically computed values.

whuber