The Bias/Variance Trade-Off in ML

Abdella Solomon


Hello, fellow readers! I’m back with another article, this time a brief introduction to the bias/variance trade-off in data science and machine learning. From now on, I will keep publishing articles that simplify data science and machine learning concepts for my audience. Follow my page if you haven’t done so yet. With that said, let’s dive into the article.

What is this article about?

If you work in data science at any level, you have surely come across the word regularization, or you will sooner or later. Regularization is a set of techniques for managing the bias/variance trade-off. This article briefly explains bias and variance and then lists a few regularization techniques.

What is bias?

In the bias/variance trade-off, bias is the part of the generalization error that comes from wrong assumptions about the data, such as assuming that the data is linear when it is actually quadratic. (This is different from dataset bias, where some aspects of a dataset are given more weight or representation than others.)

A high-bias model is likely to underfit the training data, typically because it has too few degrees of freedom to capture the underlying pattern.
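To make the linear-versus-quadratic example concrete, here is a minimal sketch using NumPy. The data, the variable names, and the noise-free quadratic target are all made up for illustration; the point is only that a straight line cannot fit a curve, no matter how clean the data is.

```python
import numpy as np

# Hypothetical data: a noise-free quadratic relationship
x = np.linspace(-3, 3, 50)
y = x ** 2

# Degree-1 fit: the model wrongly assumes the data is linear
linear_coefs = np.polyfit(x, y, deg=1)
linear_mse = np.mean((np.polyval(linear_coefs, x) - y) ** 2)

# Degree-2 fit: the assumption now matches the data
quadratic_coefs = np.polyfit(x, y, deg=2)
quadratic_mse = np.mean((np.polyval(quadratic_coefs, x) - y) ** 2)

print(f"linear model training MSE:    {linear_mse:.4f}")
print(f"quadratic model training MSE: {quadratic_mse:.4f}")
```

The linear model keeps a large error even on the data it was trained on, which is what underfitting looks like in practice.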

What is variance?

Variance is the amount by which the estimate of the target function would change if the model were trained on different training data. In practice, it often shows up as a large gap between the model’s accuracy on the training data and its accuracy on the test data.

It comes from the model’s excessive sensitivity to small variations in the training data. A high-variance model memorizes the training data, noise included, instead of learning the underlying patterns, so its accuracy drops when it is tested or deployed to production.

A model with many degrees of freedom, such as a high-degree polynomial model, is likely to have high variance and thus overfit the training data.
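One way to see variance directly is to train the same kind of model on many different training sets and measure how much its predictions move. The sketch below does that with NumPy for a straight line and a degree-10 polynomial. The sine-shaped target, the noise level, and the helper name `prediction_variance` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x_test = np.linspace(-0.9, 0.9, 20)  # fixed points where we compare predictions

def prediction_variance(degree, n_datasets=200, n_points=20, noise=0.3):
    """Average variance of the predictions across many training sets."""
    preds = []
    for _ in range(n_datasets):
        # Draw a fresh training set from the same (assumed) distribution
        x = rng.uniform(-1, 1, n_points)
        y = np.sin(np.pi * x) + rng.normal(0, noise, n_points)
        coefs = np.polyfit(x, y, deg=degree)
        preds.append(np.polyval(coefs, x_test))
    # Variance over training sets at each test point, averaged over the points
    return np.var(np.array(preds), axis=0).mean()

low_complexity = prediction_variance(degree=1)
high_complexity = prediction_variance(degree=10)

print(f"prediction variance, degree 1:  {low_complexity:.4f}")
print(f"prediction variance, degree 10: {high_complexity:.4f}")
```

The degree-10 model changes far more from one training set to the next, which is the sensitivity to small variations described above.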

Last but not least, the third type of generalization error is called the irreducible error. It is due to the noisiness of the data itself. The only way to reduce this part of the error is to clean up the data, for example by fixing broken data sources or removing outliers.
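For readers who like formulas, the three parts of the generalization error can be written as a single sum. This is the standard decomposition for squared-error loss, and the symbols are assumptions introduced here: f is the true function, \hat{f} is the model fitted on a randomly drawn training set, y = f(x) + noise, and \sigma^2 is the variance of that noise. The expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The first two terms depend on the model we choose; the last one does not, which is why no model can remove it.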

The Bias/Variance Trade-off

In the sections above, we defined bias and variance and looked at the problems they cause in our models. The image below shows this trade-off graphically.

The Bias/Variance Trade-off Graphically

Increasing a model’s complexity will typically increase its variance and reduce its bias. Conversely, reducing a model’s complexity increases its bias and reduces its variance. This is why it is called a trade-off.
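The trade-off can also be reproduced numerically. The sketch below fits polynomials of increasing degree with NumPy and compares the error on the training data with the error on held-out validation data. The sine-shaped target, the sample sizes, and the range of degrees are assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample(n):
    """Draw n noisy points from an assumed sine-shaped target."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(np.pi * x) + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = sample(30)
x_val, y_val = sample(200)

train_mse, val_mse = {}, {}
for degree in range(1, 13):  # the degree plays the role of model complexity
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_mse[degree] = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    val_mse[degree] = np.mean((np.polyval(coefs, x_val) - y_val) ** 2)

best_degree = min(val_mse, key=val_mse.get)
for degree in train_mse:
    print(f"degree {degree:2d}  train MSE {train_mse[degree]:.3f}  "
          f"val MSE {val_mse[degree]:.3f}")
print("lowest validation error at degree", best_degree)
```

The training error can only go down as the degree grows, while the validation error is what tells us when added complexity stops helping.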

To manage this trade-off, we have something called regularization techniques. Regularization helps us find a middle ground between the first scenario of high bias and the latter scenario of high variance. We cannot push both all the way to zero, but we can keep each of them low enough that the model’s overall generalization error is as small as possible.

There are different types of regularization techniques, such as early stopping, dropout, and data augmentation, among many others. Let’s quickly go through these three and then wrap up the article.

Early stopping is a deep learning (artificial neural network) technique in which we stop training the model when its performance on a validation set starts to drop, or when that performance has not improved for a number of epochs. This is a very good technique to avoid overfitting the data.
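Here is a minimal sketch of the early stopping logic. To keep it self-contained, it trains a small polynomial regression model with plain gradient descent in NumPy instead of a neural network, but the stopping rule is the same idea. The data, the `patience` value, and the learning rate are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = np.sin(np.pi * x) + rng.normal(0, 0.3, n)
    # Degree-9 polynomial features give the model room to overfit
    return np.vander(x, 10), y

X_train, y_train = make_data(30)
X_val, y_val = make_data(100)

weights = np.zeros(X_train.shape[1])
best_weights, best_val, best_epoch = weights.copy(), np.inf, 0
patience, learning_rate, max_epochs = 50, 0.1, 5000

for epoch in range(max_epochs):
    # One gradient descent step on the training mean squared error
    grad = 2 * X_train.T @ (X_train @ weights - y_train) / len(y_train)
    weights -= learning_rate * grad

    # Monitor the validation loss and remember the best weights so far
    val_loss = np.mean((X_val @ weights - y_val) ** 2)
    if val_loss < best_val:
        best_val, best_weights, best_epoch = val_loss, weights.copy(), epoch
    elif epoch - best_epoch >= patience:
        break  # no improvement for `patience` epochs: stop training

stopped_epoch = epoch
print(f"stopped at epoch {stopped_epoch}, best validation MSE "
      f"{best_val:.3f} at epoch {best_epoch}")
```

After the loop, `best_weights` holds the model from the epoch with the lowest validation loss, and that is the one to keep rather than the final weights.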

Dropout is also a deep learning (artificial neural network) technique. During training, it randomly turns off some neurons in the network’s layers, so the network cannot rely too heavily on any single neuron and every neuron has to learn something useful about the data. Thus this technique is used to avoid overfitting the data.
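The mechanics of dropout fit in a few lines. The sketch below implements the common "inverted dropout" variant in NumPy, where the surviving activations are scaled up during training so that nothing has to change at prediction time. The function name and arguments are hypothetical; deep learning frameworks ship their own dropout layers.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero random units and rescale the rest."""
    if not training or drop_prob == 0.0:
        # At prediction time every neuron stays on
        return activations
    keep_prob = 1.0 - drop_prob
    # Each unit is kept independently with probability keep_prob
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones((1000, 100))  # stand-in for a layer's activations
dropped = dropout(hidden, drop_prob=0.5, rng=rng)

print("fraction of units turned off:", (dropped == 0).mean())
print("mean activation after dropout:", dropped.mean())
```

Because a different random set of neurons is switched off on every training step, the network is trained a bit like an ensemble of smaller networks.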

Data augmentation is a general machine learning technique that increases the size of our training data by adding modified copies of existing examples, such as flipped, rotated, or slightly noisy versions of an image. Since the augmented data introduces new variability, the model is pushed to learn the underlying patterns rather than memorize individual examples. This is a very good technique to avoid overfitting the data. Personally speaking, data augmentation is a must-use technique to avoid overfitting and to improve overall performance.
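As a small illustration, the sketch below triples an image dataset with NumPy by adding a horizontally flipped copy and a slightly noisy copy of every image. The random "images", the `augment` helper, and the noise level are all made up for the example, and it assumes the label does not change when an image is flipped, which is not true for every task (digits and text, for example).

```python
import numpy as np

def augment(images, labels, rng, noise_std=0.05):
    """Return the original images plus flipped and noisy copies."""
    flipped = images[:, :, ::-1]  # mirror each image left to right
    noisy = np.clip(images + rng.normal(0, noise_std, images.shape), 0.0, 1.0)
    aug_images = np.concatenate([images, flipped, noisy])
    # Every augmented copy keeps the label of the image it came from
    aug_labels = np.concatenate([labels, labels, labels])
    return aug_images, aug_labels

rng = np.random.default_rng(0)
images = rng.random((8, 28, 28))      # 8 fake grayscale images in [0, 1)
labels = rng.integers(0, 10, size=8)  # fake class labels

aug_images, aug_labels = augment(images, labels, rng)
print("before:", images.shape, "after:", aug_images.shape)
```

In a real project the transformations are usually applied on the fly during training, with fresh random choices each epoch, rather than stored up front.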

That’s all about the bias/variance trade-off, and I hope you gained something from my article. If you have any feedback, let me know in the comment section. And if you enjoyed the article, I guess I deserve a follow and a clap. Please share this article with your friends and family. Stay tuned!
