Vanishing Gradient Problem

  • When training a neural network, if the gradients of the error function with respect to the weights become very small, the weights effectively stop updating or update extremely slowly.
  • As a result, training converges very slowly or never finishes.
  • This is a problem sometimes seen with Sigmoid activation functions, among others, because they saturate at the extremes, where their derivative is close to zero; backpropagation multiplies these derivatives layer by layer, so the effect compounds in deep networks.
  • Faster hardware has helped with this issue, and non-saturating activation functions such as ReLU help as well (see the sketch below).

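As a rough illustration, here is a minimal NumPy sketch (not from the original article) that backpropagates a gradient through a deep stack of randomly initialized layers. The depth, width, and weight scale are arbitrary choices made only for the demonstration: with sigmoid activations, each layer scales the gradient by a derivative of at most 0.25, so the gradient reaching the input is many orders of magnitude smaller than with ReLU.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

depth, width = 20, 64  # arbitrary sizes, deep enough to show the effect
# Share the same random weights between both runs so only the activation differs.
weights = [rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
           for _ in range(depth)]
x0 = rng.normal(size=width)

def input_gradient(activation, activation_grad):
    """Run a forward pass through `depth` layers, then backpropagate a
    unit gradient from the output and return its mean magnitude at the input."""
    x, pre_activations = x0, []
    for W in weights:
        pre = W @ x
        pre_activations.append(pre)
        x = activation(pre)
    grad = np.ones(width)  # pretend dLoss/dOutput is all ones
    for W, pre in zip(reversed(weights), reversed(pre_activations)):
        # Chain rule: each layer scales the gradient by the local activation
        # derivative before passing it back through W^T.
        grad = W.T @ (grad * activation_grad(pre))
    return np.abs(grad).mean()

print("sigmoid:", input_gradient(sigmoid, lambda p: sigmoid(p) * (1 - sigmoid(p))))
print("relu:   ", input_gradient(lambda p: np.maximum(p, 0.0),
                                 lambda p: (p > 0).astype(float)))
```

Running this prints a mean gradient magnitude for the sigmoid stack that is many orders of magnitude smaller than for the ReLU stack, which is the vanishing gradient problem in miniature.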