The last Gradient Descent algorithm we will look at is called Mini-batch Gradient Descent. It is quite simple to understand once you know Batch and Stochastic Gradient Descent: at each step, instead of computing the gradients based on the full training set (as in Batch GD) or based on just one instance (as in Stochastic GD), Mini-batch GD computes the gradients on small random sets of instances called mini-batches. The main advantage of Mini-batch GD over Stochastic GD is that you can get a performance boost from hardware optimization of matrix operations, especially when using GPUs.

8. While the Normal Equation can only perform Linear Regression, the Gradient Descent algorithms can be used to train many other models, as we will see.
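The inner loop described above can be sketched in a few lines of NumPy (a minimal sketch on synthetic linear data; the variable names and hyperparameter values are our own illustration, not the book's code):

```python
import numpy as np

np.random.seed(42)
m = 100                                  # number of training instances
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)    # noisy linear data: y = 4 + 3x + noise
X_b = np.c_[np.ones((m, 1)), X]          # add x0 = 1 to each instance

n_epochs = 50
minibatch_size = 20
eta = 0.1                                # learning rate
theta = np.random.randn(2, 1)            # random initialization

for epoch in range(n_epochs):
    shuffled = np.random.permutation(m)  # reshuffle the training set each epoch
    X_shuf, y_shuf = X_b[shuffled], y[shuffled]
    for i in range(0, m, minibatch_size):
        xi = X_shuf[i:i + minibatch_size]
        yi = y_shuf[i:i + minibatch_size]
        # gradient of the MSE cost computed on the mini-batch only
        gradients = 2 / len(xi) * xi.T @ (xi @ theta - yi)
        theta = theta - eta * gradients

print(theta.ravel())  # should end up near [4, 3]
```

Note that each parameter update touches only `minibatch_size` instances, yet the gradient step is still a single matrix operation, which is what lets GPUs and optimized BLAS routines shine.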
The algorithm’s progress in parameter space is less erratic than with SGD, especially
with fairly large mini-batches. As a result, Mini-batch GD will end up walking
around a bit closer to the minimum than SGD. But, on the other hand, it may be
harder for it to escape from local minima (in the case of problems that suffer from
local minima, unlike Linear Regression as we saw earlier). Figure 4-11 shows the
paths taken by the three Gradient Descent algorithms in parameter space during
training. They all end up near the minimum, but Batch GD’s path actually stops at the
minimum, while both Stochastic GD and Mini-batch GD continue to walk around.
However, don’t forget that Batch GD takes a lot of time to take each step, and Stochastic GD and Mini-batch GD would also reach the minimum if you used a good learning schedule.

Figure 4-11. Gradient Descent paths in parameter space
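Such a learning schedule simply shrinks the learning rate as training progresses, so the stochastic walk settles down instead of bouncing around the minimum. A minimal sketch (the `t0 / (t + t1)` form and its constants are one common choice, not prescribed by the text):

```python
def learning_schedule(t, t0=5, t1=50):
    # learning rate decays from t0/t1 toward zero as the iteration count t grows
    return t0 / (t + t1)

# inside the training loop, the update would become:
#   theta = theta - learning_schedule(t) * gradients
for t in (0, 10, 100, 1000):
    print(t, learning_schedule(t))
```

Early iterations take large steps (fast progress), while late iterations take tiny steps (fine convergence).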
Let’s compare the algorithms we’ve discussed so far for Linear Regression8 (recall that
m is the number of training instances and n is the number of features); see Table 4-1.
Table 4-1. Comparison of algorithms for Linear Regression
Algorithm | Large m | Out-of-core support | Large n | Hyperparams | Scaling required | Scikit-Learn
There is almost no difference after training: all these algorithms
end up with very similar models and make predictions in exactly
the same way.
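This claim is easy to check: fitting the same data with the closed-form Normal Equation and with Batch Gradient Descent yields essentially identical parameters (a minimal sketch; the data and iteration count are our own illustration):

```python
import numpy as np

np.random.seed(0)
m = 200
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)
X_b = np.c_[np.ones((m, 1)), X]          # add x0 = 1 to each instance

# Normal Equation: closed-form least-squares solution
theta_ne = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Batch Gradient Descent: iterative solution of the same convex problem
theta_gd = np.zeros((2, 1))
eta = 0.1
for _ in range(2000):
    gradients = 2 / m * X_b.T @ (X_b @ theta_gd - y)
    theta_gd = theta_gd - eta * gradients

print(np.allclose(theta_ne, theta_gd, atol=1e-4))  # prints: True
```

Because the MSE cost for Linear Regression is convex with a single global minimum, any of these algorithms run long enough lands on (or very near) the same θ.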
9. A quadratic equation is of the form y = ax² + bx + c.

What if your data is actually more complex than a simple straight line? Surprisingly, you can actually use a linear model to fit nonlinear data. A simple way to do this is to add powers of each feature as new […]
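Adding powers of a feature as new features can be sketched with plain NumPy (a minimal sketch; the quadratic data and the `lstsq` fit are our own illustration of the idea):

```python
import numpy as np

np.random.seed(1)
m = 100
X = 6 * np.random.rand(m, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)   # quadratic data: y = 0.5x² + x + 2 + noise

# add x² as a new feature, then fit an ordinary *linear* model on [1, x, x²]
X_poly = np.c_[np.ones((m, 1)), X, X**2]
theta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)

print(theta.ravel())   # roughly [2, 1, 0.5] — the true coefficients
```

The model is still linear in its parameters θ; only the features are nonlinear in x, which is why ordinary least squares (or Gradient Descent) still applies unchanged.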