
I have been going through Neural Networks and Deep Learning. There, the weighted input to a neuron is written as:

$$z = \sum_j w_j x_j + b$$

where the $w_j$ and $b$ are the weights and the bias, with mean 0 and standard deviation 1.

The summation runs from 1 to 1000, which is the total number of inputs $x_j$. Half of the inputs are 0 and the other half are 1.

The question is to find the standard deviation of $z$, which is given as $\sqrt{3/2}$.

This question is one of the exercises. Since my statistics is weak, I have been stuck on this problem and cannot figure out how to get that value. I reasoned that since $W$ and $X$ are independent, the variance of $Z$ is Var(W) * Var(X). But Var(X) = 250, and that is far off from the given answer.

I would be thankful if someone can help me.

P.S.: If somebody wants more information, then please open the link and Ctrl-F for "Verify that the standard deviation". This will highlight the question there.

conquester

1 Answer


The author is taking $n_{in}$ binary variables ($x_j$ in the summation), multiplying them by $n_{in}$ weights ($w_j$ in the summation), each generated independently as $N(0,1/n_{in})$ (i.e., normal with mean zero and variance $1/n_{in}$), summing all the products together, and then adding a bias $b$ which is distributed as $N(0,1)$.

I think your confusion stems from two places. First, the author doesn't explicitly show the upper limit of the summation ($n_{in}$), which matters for understanding where the final variance in the example comes from. Second, you are treating the $x_j$ as random variables, but they aren't random in the hypothetical situation described by the author: exactly $500$ of the $x_j$ are assumed to be $1$ and the other $500$ are $0$. This means you can simplify the summation like so: $$\sum\limits_{j=1}^{n_{in}}x_jw_j{=}\sum\limits_{i=1}^{500}w_i^*$$

where $w^*_i$ is the $i$th weight in the set of all weights $\{w_j|x_j{=}1\}$.

Since each $w_i^* \sim N(0,1/1000)$ and the weights are independent, the sum of the $500$ terms $w_i^*$ is a random variable distributed as $N(0,500/1000) = N(0,1/2)$, because the variances of independent normal variables add. That means

$$b+\sum\limits_{i=1}^{500}w_i^* \sim N(0,1+1/2) = N(0,3/2)$$ where the variances add again because $b$ is generated independently of the weights. This gives you the standard deviation of $\sqrt{3/2}$. In fact, this will be the distribution of $z$ any time you generate the weights with a $N(0,1/n_{in})$ distribution, the bias with a $N(0,1)$ distribution, and assume half the $x_j$ are $1$.
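If it helps, the result can be checked numerically. The following is only a rough sketch: the function names `analytic_std` and `simulated_std`, the number of trials, and the seed are my own choices for illustration, not anything from the book. It computes the standard deviation of $z$ from the variance formula above and compares it with a Monte Carlo estimate for a fixed input with 500 ones and 500 zeros.

```python
import math
import random
import statistics


def analytic_std(x, weight_var, bias_var=1.0):
    """Std of z = sum_j w_j x_j + b for a FIXED input x,
    assuming independent zero-mean w_j (variance weight_var) and b (variance bias_var)."""
    return math.sqrt(weight_var * sum(xj * xj for xj in x) + bias_var)


def simulated_std(x, weight_var, bias_var=1.0, trials=2000, seed=0):
    """Monte Carlo estimate of the same quantity (hypothetical helper)."""
    rng = random.Random(seed)
    w_sd = math.sqrt(weight_var)
    b_sd = math.sqrt(bias_var)
    zs = []
    for _ in range(trials):
        z = rng.gauss(0.0, b_sd)  # the bias b
        for xj in x:
            if xj:  # terms with x_j = 0 contribute nothing, so skip them
                z += rng.gauss(0.0, w_sd) * xj
        zs.append(z)
    return statistics.pstdev(zs)


n_in = 1000
x = [1] * 500 + [0] * 500  # half the inputs are 1, half are 0

exact = analytic_std(x, 1.0 / n_in)
estimate = simulated_std(x, 1.0 / n_in)
print(exact, math.sqrt(3 / 2), estimate)
```

Calling `analytic_std(x, 1.0)` instead, i.e. weights with standard deviation 1 as written in the question, gives $\sqrt{501} \approx 22.4$, which shows that the $1/n_{in}$ weight variance is what brings the answer down to $\sqrt{3/2}$.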

Frank
  • Thank you! You don't know how helpful you have been! I do have one doubt, though, because the author didn't make this clear. Each $w_j$ is said to be generated as $N(0, 1/n)$. Does that mean each $w_j$ is a vector? A single number having a normal distribution doesn't make sense to me. – conquester Feb 04 '16 at 21:21
  • @conquester [This answer](http://stats.stackexchange.com/a/96000/85650) does more justice to your question than I can do here, but the gist is: $w_j$ has a distribution because while it is a single number, that number is a random variable that we can't know the value of for any particular $w_j$ before it is generated. All we can do is calculate probabilities for the possible values $w_j$ might take on based on the distribution that generates it, e.g. "since it is distributed $N(0,1/n)$ there is a $68\%$ chance that $w_3$ will take on a value in the range $(-\sqrt{1/n},\sqrt{1/n})$" – Frank Feb 05 '16 at 12:11
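
The $68\%$ figure in the comment above can be reproduced with Python's standard library. This is just an illustrative sketch; `n` and the variable names are arbitrary choices of mine. It computes the probability that a single weight drawn from $N(0,1/n)$ lands within one standard deviation of zero.

```python
import math
from statistics import NormalDist

n = 1000                      # number of inputs, as in the example
sd = math.sqrt(1 / n)         # standard deviation of a single weight w_j
w_dist = NormalDist(mu=0.0, sigma=sd)

# P(-sd < w_j < sd): probability that one weight falls within one standard deviation of 0
p_within_one_sd = w_dist.cdf(sd) - w_dist.cdf(-sd)
print(p_within_one_sd)
```

The probability does not depend on `n`; only the width of the interval $(-\sqrt{1/n},\sqrt{1/n})$ does.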