As we saw in Chapters 1 and 2, a good way to reduce overfitting is to regularize the model (i.e., to constrain it): the fewer degrees of freedom it has, the harder it will be for it to overfit the data. For example, a simple way to regularize a polynomial model is to reduce the […]

If you perform high-degree Polynomial Regression, you will likely fit the training data much better than with plain Linear Regression. For example, Figure 4-14 applies a 300-degree polynomial model to the preceding training data, and compares the result with a pure linear model and a quadratic model (2nd-degree polynomial). Notice how the 300-degree polynomial model […]
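The effect of the polynomial degree on the training fit can be sketched as follows. This is a minimal illustration, not the data behind Figure 4-14: the quadratic dataset, the noise level, and the degrees 1, 2, and 15 are all assumptions chosen for the example.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical noisy quadratic data (an assumption, not the book's dataset).
rng = np.random.default_rng(42)
m = 100
x = 6 * rng.random(m) - 3
y = 0.5 * x**2 + x + 2 + rng.normal(0.0, 1.0, m)

def training_mse(degree):
    """Fit a least-squares polynomial of the given degree and return its training MSE."""
    poly = Polynomial.fit(x, y, degree)  # rescales x internally for numerical stability
    return float(np.mean((poly(x) - y) ** 2))

# Degrees chosen for illustration: linear, quadratic, and a high-degree model.
train_mse = {degree: training_mse(degree) for degree in (1, 2, 15)}
for degree, mse in train_mse.items():
    print(f"degree {degree:>2}: training MSE = {mse:.3f}")
```

The training error can only go down as the degree grows, because each lower-degree model is a special case of the higher-degree one. A lower training error says nothing about how well the model generalizes, which is why the high-degree fit should be judged on held-out data.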

9 A quadratic equation is of the form y = ax² + bx + c.

What if your data is actually more complex than a simple straight line? Surprisingly, you can use a linear model to fit nonlinear data. A simple way to do this is to add powers of each feature as new […]
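As a minimal sketch of this idea, the snippet below adds the square of a single feature as a second column and then fits an ordinary linear model on the extended features. The dataset (y = 0.5x² + x + 2 plus noise) and the variable names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical nonlinear data: y = 0.5 x^2 + x + 2 + Gaussian noise.
rng = np.random.default_rng(0)
m = 200
x = 6 * rng.random(m) - 3
y = 0.5 * x**2 + 1.0 * x + 2.0 + rng.normal(0.0, 0.5, m)

# Extended feature matrix: bias column, the original feature, and its square.
X_poly = np.column_stack([np.ones(m), x, x**2])

# Plain least squares on the extended features: the model is still linear
# in its parameters, even though it is quadratic in x.
theta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
intercept, coef_x, coef_x2 = theta
print(f"intercept={intercept:.2f}, x coefficient={coef_x:.2f}, x^2 coefficient={coef_x2:.2f}")
```

The recovered coefficients should land close to the values used to generate the data, which shows that the linear solver handles the curved relationship once the squared feature is available.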

The last Gradient Descent algorithm we will look at is called Mini-batch Gradient Descent. It is quite simple to understand once you know Batch and Stochastic Gradient Descent: at each step, instead of computing the gradients based on the full training set (as in Batch GD) or based on just one instance (as […]
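A minimal sketch of Mini-batch Gradient Descent for Linear Regression is shown below. It assumes an MSE cost function; the dataset, the batch size of 20, the constant learning rate, and the number of epochs are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical linear data: y = 4 + 3x + Gaussian noise.
rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0.0, 0.5, m)
X_b = np.column_stack([np.ones(m), X])  # add x0 = 1 to every instance

batch_size = 20   # assumed value, small compared with the training set
eta = 0.05        # assumed constant learning rate
n_epochs = 200

theta = rng.normal(size=2)  # random initialization
for epoch in range(n_epochs):
    order = rng.permutation(m)  # reshuffle the training set every epoch
    for start in range(0, m, batch_size):
        batch = order[start:start + batch_size]
        X_batch, y_batch = X_b[batch], y[batch]
        # MSE gradient computed on the mini-batch only.
        gradients = (2 / len(batch)) * X_batch.T @ (X_batch @ theta - y_batch)
        theta = theta - eta * gradients

# Least-squares solution, computed only as a reference point.
theta_exact, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print("mini-batch GD:", theta)
print("least squares:", theta_exact)
```

With a constant learning rate the parameters keep jittering around the minimum rather than settling exactly on it, so the two printed vectors should be close but not identical.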

7 Out-of-core algorithms are discussed in Chapter 1.

The main problem with Batch Gradient Descent is the fact that it uses the whole training set to compute the gradients at every step, which makes it very slow when the training set is large. At the opposite extreme, Stochastic Gradient Descent just picks a random instance […]
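The single-instance update can be sketched as follows for Linear Regression with an MSE cost. The dataset, the number of epochs, and the gradually decreasing learning rate (with its hyperparameters t0 and t1) are assumptions made for this example.

```python
import numpy as np

# Hypothetical linear data: y = 4 + 3x + Gaussian noise.
rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0.0, 0.5, m)
X_b = np.column_stack([np.ones(m), X])  # add x0 = 1 to every instance

n_epochs = 50
t0, t1 = 5, 50  # assumed learning-schedule hyperparameters

def learning_schedule(t):
    """Learning rate that shrinks as the step count t grows."""
    return t0 / (t + t1)

theta = rng.normal(size=2)  # random initialization
for epoch in range(n_epochs):
    for i in range(m):
        idx = rng.integers(m)            # pick one random training instance
        xi, yi = X_b[idx], y[idx]
        gradient = 2 * xi * (xi @ theta - yi)  # MSE gradient on that instance only
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradient

# Least-squares solution, computed only as a reference point.
theta_exact, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print("stochastic GD:", theta)
print("least squares:", theta_exact)
```

Each step touches a single instance, so it is cheap, but the path toward the minimum is noisy. Shrinking the learning rate over time is one way to let the parameters settle.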

Batch Gradient Descent

To implement Gradient Descent, you need to compute the gradient of the cost function with regard to each model parameter θj. In other words, you need to calculate how much the cost function will change if you change θj just a little bit. This is called a partial derivative. It is […]
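For Linear Regression with an MSE cost function (an assumption here, since the excerpt does not name the cost), all the partial derivatives can be computed at once as (2/m) Xᵀ(Xθ − y). The sketch below applies that gradient over the full training set at every step; the dataset, the learning rate, and the iteration count are hypothetical.

```python
import numpy as np

# Hypothetical linear data: y = 4 + 3x + Gaussian noise.
rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0.0, 0.5, m)
X_b = np.column_stack([np.ones(m), X])  # add x0 = 1 to every instance

eta = 0.1            # assumed learning rate
n_iterations = 1000

theta = rng.normal(size=2)  # random initialization
for _ in range(n_iterations):
    # Vector of partial derivatives of the MSE with regard to each theta_j,
    # computed on the whole training set.
    gradients = (2 / m) * X_b.T @ (X_b @ theta - y)
    theta = theta - eta * gradients

# Least-squares solution, computed only as a reference point.
theta_exact, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print("batch GD:     ", theta)
print("least squares:", theta_exact)
```

Because every step uses all m instances, the descent is smooth and, with a suitable learning rate, converges to the same parameters as the direct least-squares solution.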

In Chapter 1, we looked at a simple regression model of life satisfaction: life_satisfaction = θ0 + θ1 × GDP_per_capita. This model is just a linear function of the input feature GDP_per_capita. θ0 and θ1 are the model’s parameters. More generally, a linear model makes a prediction by simply computing a weighted sum of […]
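A prediction of this kind can be sketched in a few lines. The helper name predict, the parameter values, and the inputs below are hypothetical; the only assumption about the model is that θ0 acts as a constant term added to the weighted sum of the features.

```python
import numpy as np

def predict(theta, X):
    """Return theta_0 + theta_1 * x_1 + ... + theta_n * x_n for each row of X."""
    X_b = np.column_stack([np.ones(len(X)), X])  # prepend x0 = 1 so theta_0 is the constant term
    return X_b @ theta

theta = np.array([4.0, 3.0])      # hypothetical values for theta_0 and theta_1
X_new = np.array([[0.0], [2.0]])  # two instances with a single feature each
y_pred = predict(theta, X_new)
print(y_pred)  # predictions 4.0 and 10.0
```

Prepending a column of ones lets the whole prediction be written as one matrix product, which is the form the Gradient Descent variants operate on.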

Just like other neural networks we have discussed, autoencoders can have multiple hidden layers. In this case they are called stacked autoencoders (or deep autoencoders). Adding more layers helps the autoencoder learn more complex codings. However, one must be careful not to make the autoencoder too powerful. Imagine an encoder so powerful that it just […]

Another kind of constraint that often leads to good feature extraction is sparsity: by adding an appropriate term to the cost function, the autoencoder is pushed to reduce the number of active neurons in the coding layer. For example, it may be pushed to have on average only 5% significantly active neurons in the coding […]
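One possible form for such a term is sketched below. It uses the Kullback-Leibler divergence between the target sparsity and the mean activation of each coding neuron, which is a common choice but an assumption here, since the excerpt does not say which penalty is used. The function name and the example activations are hypothetical.

```python
import numpy as np

def kl_sparsity_penalty(activations, target_sparsity=0.05, eps=1e-8):
    """Sum over coding neurons of KL(target || mean activation).

    activations: array of shape (batch_size, n_neurons) with values in (0, 1),
    for example the outputs of a sigmoid coding layer.
    """
    q = np.clip(activations.mean(axis=0), eps, 1 - eps)  # mean activation per neuron
    p = target_sparsity
    kl = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    return float(kl.sum())

# Hypothetical coding-layer activations for a batch of 32 instances.
on_target = np.full((32, 10), 0.05)   # neurons active 5% on average
too_active = np.full((32, 10), 0.5)   # neurons active 50% on average

print(kl_sparsity_penalty(on_target))   # close to zero
print(kl_sparsity_penalty(too_active))  # clearly positive
```

In training, a penalty like this would be multiplied by a weighting hyperparameter and added to the reconstruction loss, so the optimizer trades reconstruction quality against keeping most coding neurons inactive.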

3 “Extracting and Composing Robust Features with Denoising Autoencoders,” P. Vincent et al. (2008).

4 “Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion,” P. Vincent et al. (2010).

Denoising Autoencoders

Another way to force the autoencoder to learn useful features is to add noise to its inputs, […]
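The input-corruption step can be sketched on its own, independent of any particular network. The helper corrupt below is hypothetical and supports two kinds of noise as an assumption: additive Gaussian noise and randomly switched-off inputs. The network would then be fed the corrupted inputs while its reconstruction loss is measured against the clean ones.

```python
import numpy as np

def corrupt(inputs, rng, noise_std=0.0, dropout_rate=0.0):
    """Return a noisy copy of inputs: Gaussian noise added, then random entries zeroed."""
    noisy = inputs + rng.normal(0.0, noise_std, inputs.shape)
    keep_mask = rng.random(inputs.shape) >= dropout_rate  # False means "switch this input off"
    return noisy * keep_mask

rng = np.random.default_rng(0)
clean = rng.uniform(0.1, 1.0, size=(100, 100))  # hypothetical stand-in for real inputs

noisy = corrupt(clean, rng, dropout_rate=0.3)

# Training pair for a denoising autoencoder:
#   network input         -> noisy
#   reconstruction target -> clean
zeroed_fraction = float(np.mean(noisy == 0))
print(f"fraction of inputs switched off: {zeroed_fraction:.2f}")
```

Because the target is the clean input, simply copying the input to the output no longer minimizes the loss, which is what pushes the autoencoder toward learning structure in the data.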
