Backpropagation has been explained to death. It really is as simple as applying the chain rule to compute gradients. However, in my recent adventures, I have found that this explanation isn’t intuitive to people who just want to get shit done. As part of my consultancy (hire me!) work, I provide a brief 1–3 day machine learning course to engineers who will maintain the algorithms I designed. Whilst most of the work I do doesn’t use neural networks, recently there was a case where deep neural networks were involved.
This blog post documents what I found useful for explaining neural networks, backpropagation and gradient descent. It’s not meant to be heavy on theory – think of it as an enabler for an engineer to hit the ground running when dealing with deep networks. I may elide some details, so some basic familiarity with neural networks is recommended.
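Since the claim above is that backpropagation is just the chain rule, here’s a minimal sketch of what that means on a toy scalar function (the function and names are my own illustration, not something from the course material):

```python
import math

# Toy chain-rule example: f(x) = sin(x**2), a composition of two functions.
# By the chain rule, df/dx = cos(x**2) * 2*x -- the gradient of the outer
# function evaluated at the inner result, times the gradient of the inner
# function. Backprop applies this same rule layer by layer through a network.
def f(x):
    return math.sin(x ** 2)

def grad_f(x):
    return math.cos(x ** 2) * 2 * x  # outer gradient * inner gradient

# Sanity check against a numerical (central finite-difference) approximation.
x, eps = 1.5, 1e-6
numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
print(abs(grad_f(x) - numeric) < 1e-5)  # the analytic and numeric gradients agree
```

A deep network is just a much longer composition of functions, so the same multiply-the-local-gradients recipe applies, one layer at a time.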
-  really, I need to pay the bills to get my startup funded ↩
-  people who think deep learning can solve every problem are either people with deep pockets aiming to solve a very general problem, or people who don’t understand the hype. I have found that most businesses do not have problems that involve a lot of non-linearities. In fact, a large majority of problems can be solved with linear regression. ↩