I started learning about neural networks a few months ago and quickly realized that certain mathematical topics are essential to understanding them. Calculus and linear algebra are especially important, and it had been years since I had taken those courses in university. I thought it would be useful to consolidate some learning resources in one place to help a beginner get started.
Do I need a math degree to build neural networks?
Short answer: absolutely not! There are tons of libraries out there that implement the complex mathematics for us. However, a basic understanding of the math behind neural networks is really important for understanding how they work, especially when you need to debug or optimize your algorithm.
The next section of this article describes the different pieces of a basic neural network, the mathematical concepts required, and links to videos explaining the concepts. If you’ve taken university level calculus and linear algebra, the resources I’ve linked will be a refresher for you. If you’ve never taken these courses, I recommend doing a more comprehensive course beyond the resources linked in this article so that you can gain the foundational knowledge required. These are two comprehensive online courses that are great options for learning the concepts in detail:
Where can I learn about neural networks?
If you’re interested in diving deeper into deep learning, I highly recommend the deep learning specialization on Coursera that I enrolled in. It’s a really well put together program taught by Andrew Ng, one of the leaders in the AI space.
Mathematics you should know to understand neural networks
What is a Neural Network?
A neural network is a model built from layers of simple connected units that learns to map inputs to outputs. Neural networks are often used to solve classification problems. For example, a neural network can be used to tell you if an image is showing a cat or a dog.
Figure 1 shows a very basic image of a neural network. The main pieces of a neural network are:
- Input layer
- Hidden layer (there can be several of these)
- Output layer
The input layer contains the inputs we want to feed into our algorithm, also known as a model. Our algorithm will do some calculations using our inputs and then spit out an answer.
The input layer is usually represented as a vector of numbers. For our image example, how do we turn an image into a vector of numbers? Since the image is just a matrix of RGB values, we can just unroll or flatten the matrix of values into a vector of numbers to be used in our algorithm.
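To make the flattening step concrete, here is a minimal sketch in plain Python. The tiny 2x2 image and the `flatten_image` helper are made up for illustration; real code would typically load an image with a library and use its own reshape function.

```python
# A hypothetical 2x2 image; each pixel is an (R, G, B) triple in the range 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def flatten_image(img):
    """Unroll rows, then pixels, then color channels into one flat list."""
    return [channel for row in img for pixel in row for channel in pixel]


x = flatten_image(image)
print(len(x))  # 2 rows * 2 pixels * 3 channels = 12 values
print(x[:6])   # [255, 0, 0, 0, 255, 0]
```

The resulting list `x` is the kind of input vector the rest of the network works with.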
The resources below will go over vectors and matrices, the first major mathematical concepts you should know to understand neural networks.
The hidden layer(s) is where a lot of the action happens in a neural network. In Figure 1, notice that the layer is split into two portions: the weighted sum calculation and the activation calculation. I won’t go into too much detail about why these calculations are done (deep learning courses will dive into this topic), but I will outline the mathematical concepts you should practice to understand how they work.
The weighted sum is computed with a dot product and produces the value labeled z in Figure 1. It uses 3 variables: w, x, and b. w holds the weights, which are initialized before computation (a vector of numbers for a single neuron, a matrix for a whole layer). Even though it looks like w and x are being multiplied like ordinary numbers, the operation is actually a dot product.
The T on w looks like an exponent, but it actually means w transpose. b is a bias, which is just a number that is initialized before computation. x is the input vector from the previous layer. You don’t have to worry about how w and b are initialized for now; the back propagation step described later in this article is what adjusts them. The resources below will go over the dot product, matrix transpose, and matrix multiplication.
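As a rough sketch of the weighted sum for a single neuron, assuming the calculation in Figure 1 is z = wᵀx + b, the snippet below uses plain Python lists. The helper names and the example numbers are my own, chosen only for illustration.

```python
def dot(u, v):
    """Dot product: multiply matching entries, then add the products up."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(ui * vi for ui, vi in zip(u, v))


def transpose(matrix):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def weighted_sum(w, x, b):
    """z = w^T x + b for one neuron; with flat lists the transpose is implicit."""
    return dot(w, x) + b


w = [0.5, -1.0, 2.0]  # hypothetical weights
x = [2.0, 1.0, 0.5]   # hypothetical inputs
b = 0.5               # hypothetical bias

z = weighted_sum(w, x, b)
print(z)  # 0.5*2.0 + (-1.0)*1.0 + 2.0*0.5 + 0.5 = 1.5
print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```

For a whole layer, w becomes a matrix and the same idea is repeated once per neuron, which is where matrix multiplication comes in.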
The activation calculation generates the a value in Figure 1. It applies a sigmoid function to the z value that was calculated previously, squashing it into a number between 0 and 1. Not all neural networks use a sigmoid function; it is normally used as a starting point for simple neural networks. The sigmoid function is also used as part of a logistic regression model. The resources below will go over the sigmoid function and logistic regression.
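Here is a small sketch of the sigmoid function, 1 / (1 + e^(-z)), using only Python's `math` module. Splitting on the sign of z is my own addition to avoid overflow for very negative inputs; it is not something the formula itself requires.

```python
import math


def sigmoid(z):
    """Squash any real number z into the range between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that keeps exp() from overflowing.
    e = math.exp(z)
    return e / (1.0 + e)


print(sigmoid(0.0))   # 0.5
print(sigmoid(5.0))   # close to 1
print(sigmoid(-5.0))  # close to 0
```

Large positive z values give activations near 1, large negative values give activations near 0, and z = 0 lands exactly in the middle.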
Forward propagation encompasses the calculations described above: passing data from the input layer to the hidden layer, making some calculations using our algorithm, then spitting out a prediction. However, our algorithm may not be accurate on the first try. It might tell us that a cat picture is a dog picture most of the time, which is not correct. We can improve our algorithm with a step called back propagation.
Back propagation is a step that allows us to make our algorithm better, and repeating it is part of what is known as training. Earlier I mentioned that there is a matrix of weights and a bias used in the weighted sum calculation. There are techniques for initializing the weights and bias, which are out of the scope of this article, but most of the time the initial values don’t give accurate results right away. We can use back propagation to help us adjust the weights and bias to make our algorithm more accurate.
A cost function is used to calculate how well the algorithm did by measuring how far its predictions are from the correct answers. Different neural networks can use different cost functions. Our neural network example uses logistic regression, which has an associated cost function, shown in Figure 2. Don’t worry if the cost function looks a little scary; if you take a comprehensive course on neural networks, the details will be explained. Even if you don’t understand the cost function right now, we can focus on some specific parts of it so you’ll be better prepared to understand it later on.
At the beginning of the function, there is a sigma symbol, which represents a sum. Another important part of the equation is the log or logarithm. The resources below will go over the sigma notation and logarithms.
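To show how the sum and the logarithm fit together, here is a sketch that assumes Figure 2 shows the standard logistic regression (cross-entropy) cost, J = -(1/m) Σ [y log(a) + (1 - y) log(1 - a)]. The function name and the sample predictions are made up for illustration.

```python
import math


def logistic_cost(predictions, labels):
    """Average cross-entropy cost over m examples (assumed form of Figure 2).

    predictions: sigmoid outputs, each strictly between 0 and 1
    labels: the correct answers, each 0 or 1
    """
    m = len(labels)
    total = 0.0
    # The loop plays the role of the sigma (sum) symbol.
    for a, y in zip(predictions, labels):
        total += y * math.log(a) + (1 - y) * math.log(1 - a)
    return -total / m


labels = [1, 0]
good_cost = logistic_cost([0.9, 0.1], labels)  # confident and correct
bad_cost = logistic_cost([0.1, 0.9], labels)   # confident and wrong
print(good_cost < bad_cost)  # True
```

Predictions close to the correct labels give a small cost, and confidently wrong predictions give a much larger one, which is exactly what we want a cost function to do.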
Using our cost function, we can find the weight and bias parameters that minimize our cost so that we can make our algorithm more accurate. To minimize cost, a step called gradient descent is used. If you imagine that our function is a surface like in Figure 3, the idea is that we want to get to the minimum (lowest point) of the surface to minimize cost.
Gradient descent requires some knowledge of calculus. You’ll need to understand derivatives and how they give you the slope of a function at a point. The resources below will give you an overview of derivatives that will help you understand gradient descent.
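The idea of following the slope downhill can be sketched on a toy one-dimensional function. The example below is not the neural network's cost function; it assumes a simple stand-in, f(w) = (w - 3)^2, whose derivative is 2(w - 3), and the learning rate and step count are arbitrary choices for illustration.

```python
def cost(w):
    """Toy cost with its lowest point at w = 3."""
    return (w - 3.0) ** 2


def cost_derivative(w):
    """Slope of the toy cost at w."""
    return 2.0 * (w - 3.0)


def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step against the slope to move toward the minimum."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * cost_derivative(w)
    return w


w_final = gradient_descent(start=0.0)
print(round(w_final, 4))  # 3.0
print(cost(w_final) < cost(0.0))  # True
```

Each step moves w a little in the direction that lowers the cost. In a real network the same update is applied to every weight and bias, using derivatives of the actual cost function.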
So that’s it! Those are the main math resources that I thought would be really useful for a beginner in machine learning. Once you have a good understanding of the mathematical concepts outlined in this article, you’ll be in really great shape to dive into neural networks and deep learning. A topic that I didn’t cover in this article is statistics, which is also used in machine learning. You can get away without knowing statistics as a beginner, and in the future I can add a statistics article as part of a math for machine learning series! I’d love to hear your questions or suggestions on future topics. Thanks for reading! :)
- Figure 2: https://stats.stackexchange.com/questions/278771/how-is-the-cost-function-from-logistic-regression-derivated
- Figure 3: https://blog.paperspace.com/intro-to-optimization-in-deep-learning-gradient-descent/
I’m an employee of IBM. The views expressed in this blog are mine and don’t necessarily reflect the positions, strategies, or opinions of the company.