
Given that $x^{(i)} \in \mathbb{R}^{100}$, and the fully-connected layer $f(\cdot)$ is $$f(x^{(i)}) = \sigma(Wx^{(i)})$$ where $W$ is a $1000 \times 100$ weight matrix and $\sigma(\cdot)$ is a point-wise nonlinearity.

I was looking at this question: Number of parameters in an artificial neural network for AIC

For this specific example, is there an input layer, one hidden layer, and one output layer? How can I compute the number of learnable parameters here? The input dimension is $(100 \times 1)$ and the output dimension is $(1000 \times 1)$, I believe.

Is the number of learnable parameters just $(100 \times 1000) + (1000 \times 1) = 101000$? Or am I misunderstanding?


1 Answer


Usually there should be a bias term $b$ in addition to $W$. Suppose your hidden layer is $a_1 = \sigma(W_1 x^{(i)} + b_1)$ and your output layer is $y = a_2 = \sigma(W_2 a_1 + b_2)$.

The total number of parameters for $a_1$ is $1000 \times 100 + 1000 = 101000$; the total number of parameters for $y$ (i.e., $a_2$) is $1000 + 1 = 1001$.

Without the bias terms b1, b2, I would get the same answer as you.

The output dimension (for $y$ or $a_2$) is $1$ here, not $1000 \times 1$. Think of a classification target: $1000$ is instead the number of hidden units in the hidden layer.
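For concreteness, here is a small NumPy sketch of the parameter count. The shapes of $W_2$ and $b_2$ are assumptions based on a scalar output, as described above:

```python
import numpy as np

# Hypothetical two-layer network matching the answer's formulas:
# hidden layer: a1 = sigma(W1 @ x + b1), with W1 of shape (1000, 100)
# output layer: y = a2 = sigma(W2 @ a1 + b2), with a scalar output (assumed)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(1000, 100)), np.zeros(1000)
W2, b2 = rng.normal(size=(1, 1000)), np.zeros(1)

n_hidden = W1.size + b1.size  # 1000*100 + 1000 = 101000
n_output = W2.size + b2.size  # 1000 + 1 = 1001
print(n_hidden, n_output, n_hidden + n_output)  # 101000 1001 102001
```

Dropping `b1` and `b2` from the count recovers the $100 \times 1000 = 100000$ plus $1000$ figure from the question.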
