
I'm using Facebook's FastText algorithm to classify documents. Currently I'm following the best-practice method that the FastText GitHub README page suggests. I want to explore what would happen if I change the learning rate. In order to do that, I need to understand what the learning rate means.

Sycorax

2 Answers


The standard method of training a neural network is to use some variation of a first-order optimization method.

The simplest and most straightforward of these is gradient descent, which updates the model parameters (weights, biases) by taking linear steps in the direction of steepest descent, so the update rule has the form $$ x^{(t+1)} = x^{(t)} - \eta \nabla f\left(x^{(t)}\right) $$ where $\eta > 0$ is the learning rate, $f$ is your loss function, and $x^{(t)}$ is the value of the parameters at iteration $t$.
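The update rule above can be sketched in a few lines. This is a toy illustration on a quadratic loss $f(x) = \|x\|^2 / 2$ (whose gradient is simply $x$), not fastText's actual training loop; the loss, starting point, and learning rate are chosen only for demonstration.

```python
import numpy as np

def gradient_descent(grad, x0, eta=0.1, steps=100):
    """Apply x^(t+1) = x^(t) - eta * grad f(x^(t)) for a fixed number of steps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - eta * grad(x)  # the basic first-order update
    return x

# Toy loss f(x) = ||x||^2 / 2, so grad f(x) = x; the minimizer is the origin.
x_final = gradient_descent(lambda x: x, x0=[5.0, -3.0], eta=0.1, steps=100)
print(x_final)  # iterates shrink toward the origin
```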

There are more ornate variations on this idea, such as momentum and Adam, but they all have some direct connection to this basic update equation.

Picking a good learning rate is important because if you set it too large, the optimization can diverge and your model will get worse, and if you set it too small, progress will be painfully slow. So there is usually some experimentation involved in picking a good learning rate.
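To see both failure modes concretely, here is a minimal sketch running gradient descent on $f(x) = x^2$ (gradient $2x$) with three different learning rates. The specific $\eta$ values are chosen purely for illustration: a moderate rate converges, a tiny rate barely moves, and a rate above $1$ makes the iterates blow up on this loss.

```python
def run(eta, steps=50, x=1.0):
    """Distance from the minimum (at 0) after gradient descent on f(x) = x**2."""
    for _ in range(steps):
        x = x - eta * 2 * x  # gradient of x**2 is 2x
    return abs(x)

print(run(0.4))   # moderate rate: essentially at the minimum
print(run(1e-3))  # tiny rate: still far from the minimum after 50 steps
print(run(1.5))   # too large: each step overshoots and the iterates diverge
```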

Sycorax

So a learning rate is a variable (let's say $r$) that scales how much the algorithm changes the other variables on each iteration.

For instance, when I use them in perceptrons I will often start with $r=1$ and then, after each round of the algorithm, set $r=r/2$, so that the changes it makes shrink each time, in the hope that it will eventually terminate once the changes become negligible. If I didn't do this, it might never terminate, depending on the stopping criteria.
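The halving schedule described above can be sketched like this. The stopping threshold and the placeholder update step are made up for illustration; a real perceptron pass over the data would go where the comment indicates.

```python
def train_with_halving(threshold=1e-3):
    """Halve the learning rate each round until updates become negligible."""
    r = 1.0
    rounds = 0
    while r > threshold:
        # ... one pass of perceptron updates, each scaled by r, would go here ...
        r = r / 2  # shrink the step size after every round
        rounds += 1
    return r, rounds

r, rounds = train_with_halving()
print(r, rounds)  # r has dropped below the threshold after a handful of rounds
```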

I will also often play around with this schedule to see how well it classifies, for example trying $r=r/n$ or something similar, which is what you seem to be wanting to do.

Beavis