BIAS AND VARIANCE TRADEOFF

May 22, 2021

BIAS

Bias is the systematic error of a model: the difference between the actual values and the values the model predicts on average, which is already visible at training time. A model with high bias has a large error on the training data as well as on the testing data. An algorithm should therefore have low bias to avoid the problem of underfitting.

VARIANCE

Variance measures the variability of the results predicted by our model. Put simply, variance quantifies how much the predictions change when we change the training dataset. With high variance, the prediction for the same test case can be very different depending on which data the model was trained on. High variance therefore signifies that our model is overfitting.
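To make the two definitions concrete, here is a minimal sketch that estimates both quantities at a single test point by refitting a polynomial on many noisy datasets. The quadratic `true_fn`, the noise level, the fixed input grid and the helper name `bias_variance_at` are all assumptions made for this illustration, not something taken from a real dataset.

```python
import numpy as np

def true_fn(x):
    # Assumed ground-truth curve, chosen only for this illustration.
    return x ** 2

def bias_variance_at(x0, degree, n_datasets=200, noise=0.5, seed=0):
    """Estimate squared bias and variance of a polynomial fit at the point x0."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-3, 3, 30)  # inputs are held fixed across datasets
    preds = np.empty(n_datasets)
    for i in range(n_datasets):
        # Each "new dataset" has the same inputs but freshly drawn noise.
        y = true_fn(x) + rng.normal(0, noise, x.size)
        coeffs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coeffs, x0)
    bias_sq = (preds.mean() - true_fn(x0)) ** 2  # systematic error
    variance = preds.var()                       # spread across datasets
    return bias_sq, variance

results = {d: bias_variance_at(x0=3.0, degree=d) for d in (1, 2, 4)}
for d, (b, v) in results.items():
    print(f"degree={d}: bias^2={b:.3f}, variance={v:.3f}")
```

Under these assumptions the straight line should show by far the largest squared bias, while the spread of the predictions should grow as the degree goes up.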

Bias and variance for Regression

Initially, we split our dataset into a training set and a testing set. Here we use polynomial regression with three use cases to explain the behavior of bias and variance.
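The split itself can be sketched in a few lines. The helper `train_test_split` below is a hypothetical NumPy-only function written for this article (it is not the scikit-learn function of the same name), and the noisy quadratic data and the 70/30 ratio are assumptions.

```python
import numpy as np

def train_test_split(x, y, test_fraction=0.3, seed=0):
    """Shuffle the samples and hold out a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_test = int(round(len(x) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]

# Hypothetical data: a noisy quadratic curve.
rng = np.random.default_rng(42)
x = np.linspace(-3, 3, 100)
y = x ** 2 + rng.normal(0, 0.5, x.size)

x_train, y_train, x_test, y_test = train_test_split(x, y)
print(len(x_train), len(x_test))
```

Shuffling before splitting matters here because the inputs are sorted; taking the last 30 points as the test set would test the model only on the right-hand end of the curve.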

Usecase: 1

During training, we fit the training dataset with a polynomial of degree 1, which gives a straight best-fit line.

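Next, we compute the residuals between the actual and the predicted values. If the residual, or error, is high for both the training and the testing data, the overall performance of the model is poor. This is the case of underfitting, with high bias and low variance: the error is high on both sets because the line is too simple for the data, not because the predictions change much from one dataset to another.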
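As a rough sketch of this first case, the snippet below fits a straight line to curved data and reports the mean squared residual on both sets. The noisy quadratic data, the 70/30 split and the `mse` helper are assumptions made for the example.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, i.e. the average squared residual."""
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical data: a noisy quadratic curve, split 70/30 at random.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = x ** 2 + rng.normal(0, 0.5, x.size)
order = rng.permutation(x.size)
train, test = order[:70], order[70:]

# Degree 1: a straight line, too simple for curved data.
line = np.polyfit(x[train], y[train], 1)
train_mse = mse(y[train], np.polyval(line, x[train]))
test_mse = mse(y[test], np.polyval(line, x[test]))
print(f"train MSE={train_mse:.2f}, test MSE={test_mse:.2f}")
```

With a noise variance of 0.25, both errors should come out many times larger than the noise alone would explain, which is the signature of underfitting.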

Usecase: 3

During training, we fit the training dataset with a polynomial of degree 4, which gives a curved line.

Next, we compute the residuals between the actual and the predicted values. Here the residual on the training data is very low because the curve passes through, or very close to, all the data points.

On the testing data, however, the residual is high because the test points do not match the fitted curve. The performance of the model drops, which leads to the case of overfitting, with low bias and high variance.
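A small sketch can reproduce this behavior. It assumes a deliberately tiny training set of five points, so that a degree 4 polynomial (five coefficients) can pass through every one of them; the quadratic trend and the noise level are likewise assumptions.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, i.e. the average squared residual."""
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(1)

def noisy_quadratic(x):
    # Hypothetical data-generating process: a quadratic trend plus noise.
    return x ** 2 + rng.normal(0, 0.5, x.size)

x_train = np.linspace(-3, 3, 5)      # deliberately tiny training set
y_train = noisy_quadratic(x_train)
x_test = np.linspace(-2.5, 2.5, 50)  # unseen points from the same process
y_test = noisy_quadratic(x_test)

# Degree 4 has five coefficients, so it can pass through all five points.
curve = np.polyfit(x_train, y_train, 4)
train_mse = mse(y_train, np.polyval(curve, x_train))
test_mse = mse(y_test, np.polyval(curve, x_test))
print(f"train MSE={train_mse:.2e}, test MSE={test_mse:.2f}")
```

The training error should be zero up to floating-point rounding, while the test error stays clearly above it: the curve has memorized the noise in the five training points.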

Usecase: 2

Here we set the degree of the polynomial to 2, which gives a best-fit line with a slight curve.

The residual, or error, is low for both the training and the testing data. This is considered a generalized model, and the best of the three, with low bias and low variance.
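For comparison, here is a minimal sketch of the degree 2 fit. It assumes the underlying trend really is quadratic, which is why degree 2 comes out as the right choice here; on other data the best degree may differ.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, i.e. the average squared residual."""
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical data: a noisy quadratic curve, split 70/30 at random.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = x ** 2 + rng.normal(0, 0.5, x.size)
order = rng.permutation(x.size)
train, test = order[:70], order[70:]

# Degree 2 matches the shape of the assumed trend.
quad = np.polyfit(x[train], y[train], 2)
train_mse = mse(y[train], np.polyval(quad, x[train]))
test_mse = mse(y[test], np.polyval(quad, x[test]))
print(f"train MSE={train_mse:.2f}, test MSE={test_mse:.2f}")
```

Both errors should land close to the noise variance of 0.25, and close to each other, which is what a generalized model looks like in practice.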

Bias and variance for Classification

In classification, we generally check the accuracy on both the training and the testing data. The model with low training and testing error, that is, high accuracy on both, is considered the best model, with low bias and low variance.
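The same check can be sketched for a classifier. The example below uses a nearest-centroid classifier on two synthetic Gaussian clusters; both the classifier and the data are stand-ins chosen to keep the example short, and the helpers `make_blobs`, `predict` and `accuracy` are hypothetical names defined here.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return float(np.mean(y_true == y_pred))

rng = np.random.default_rng(0)

def make_blobs(n_per_class):
    # Hypothetical two-class data: one Gaussian cluster per class.
    x0 = rng.normal(-2.0, 1.0, size=(n_per_class, 2))
    x1 = rng.normal(2.0, 1.0, size=(n_per_class, 2))
    X = np.vstack([x0, x1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y

X_train, y_train = make_blobs(100)
X_test, y_test = make_blobs(50)

# Nearest-centroid classifier: assign each point to the closest class mean.
centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(X):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

train_acc = accuracy(y_train, predict(X_train))
test_acc = accuracy(y_test, predict(X_test))
print(f"train accuracy={train_acc:.2f}, test accuracy={test_acc:.2f}")
```

A large gap between the two numbers (high training accuracy, much lower testing accuracy) would point to overfitting, while low accuracy on both would point to underfitting.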

Bias and variance tradeoff

Initially, when the model complexity is low (the degree of the polynomial is 1), the error rate is high for both the training and the testing data. This is the case of underfitting, with high bias and low variance.

As the degree of the polynomial increases, the error rate on the training data keeps decreasing, while the error rate on the testing data eventually starts to increase. This is the case of overfitting, with low bias and high variance.

Somewhere in between, at a particular level of complexity, both bias and variance are low. The model at this point is said to be a generalized model.

We get this generalized model by tuning the hyperparameters of our algorithm (here, the degree of the polynomial) so that accuracy is high on both the training and the testing data, with low bias and low variance.
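One simple way to do this tuning is to sweep the degree and compare the two error curves. The sketch below assumes the same kind of noisy quadratic data as a stand-in for a real dataset, and it picks the degree with the lowest held-out error.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, i.e. the average squared residual."""
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical data: a noisy quadratic curve, split 70/30 at random.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = x ** 2 + rng.normal(0, 0.5, x.size)
order = rng.permutation(x.size)
train, test = order[:70], order[70:]

degrees = range(1, 7)
train_errors, test_errors = [], []
for d in degrees:
    coeffs = np.polyfit(x[train], y[train], d)
    train_errors.append(mse(y[train], np.polyval(coeffs, x[train])))
    test_errors.append(mse(y[test], np.polyval(coeffs, x[test])))
    print(f"degree={d}: train MSE={train_errors[-1]:.3f}, "
          f"test MSE={test_errors[-1]:.3f}")

# Pick the degree with the lowest held-out error.
best_degree = degrees[int(np.argmin(test_errors))]
print("best degree:", best_degree)
```

The training error can only go down as the degree grows, so it cannot be used to choose the degree on its own. In a real project the choice is usually made on a separate validation set or with cross-validation, keeping the test set for the final check.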


That’s all folks, Have a nice day :)