In this post, we will look at the error metrics used for classification and regression. Having a model is not enough; you also need to check that it performs well. This is where error metrics come into the picture.

First, we need to understand what error metrics are and why it is important to choose the right one for a machine learning model.

What are error metrics?

After building a machine learning model, we need to check how accurate its predictions or classifications are. Error metrics quantify this, so they play an important role in deciding whether the model is valid.

Why is it important to select the right one?

If we use the wrong error metric to check a model’s validity, even a score of 99% accuracy can be meaningless. Training or test accuracy may look very high, yet the model can still fail to give appropriate results in a real-world application.

First of all, why do we not use the same evaluation metrics for classification and regression models?

The answer is …

In classification, the output is discrete (a class label), while in regression the output is a continuous predicted value. Because the two problem types produce different kinds of output, we need different metrics to evaluate them.

Error metrics for classification

To understand evaluation methods for classification, let’s check some use cases first:

  • Suppose you have data on patients and have to predict whether a person has cancer or not.
  • Suppose we have to design a machine learning model to predict whether a day is a bad day to launch a satellite.
  • Suppose you have the Iris dataset and have to classify which category a flower belongs to.

The first two are binary classification problems and the third one is multi-class classification.

Evaluation Techniques:

  • Confusion Matrix
  • Classification Accuracy
  • ROC Curve (Area under the curve)
  • F1 score

Confusion Matrix

It is used to evaluate a classification model by showing whether each class is identified correctly or not.

To begin with the confusion matrix, first, we need to understand a few terms:

Let’s take an example of the detection of cancer where we have to figure out if a patient has cancer or not.

True Positive: If a patient actually has cancer and our machine learning model predicts cancer, this is a True Positive.

True Negative: If a patient does not have cancer and the model predicts ‘no cancer’, this is a True Negative.

False Negative: If a patient has cancer but the model predicts ‘no cancer’, this is a False Negative. The negative prediction (no cancer) is wrong, i.e. false.

False Positive: If a patient does not have cancer but the model predicts cancer, this is a False Positive.

              Predicted: Yes    Predicted: No
Actual: Yes   True Positive     False Negative
Actual: No    False Positive    True Negative
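The four cells of the matrix can be counted directly from the actual and predicted labels. The following is a minimal sketch in plain Python, not the code used for this post; the function name `confusion_counts` and the labels are made up for illustration (1 = cancer, 0 = no cancer).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN) for a binary classification problem."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1  # has cancer, predicted cancer
        elif actual == positive:
            fn += 1  # has cancer, predicted no cancer
        elif predicted == positive:
            fp += 1  # no cancer, predicted cancer
        else:
            tn += 1  # no cancer, predicted no cancer
    return tp, fn, fp, tn


# Made-up labels for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
# Same layout as the table above: rows are actual, columns are predicted
print([[tp, fn], [fp, tn]])  # [[3, 1], [1, 3]]
```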

Classification Accuracy

Classification accuracy is computed from the confusion matrix. It is the ratio of the sum of the diagonal elements of the confusion matrix (the correct predictions) to the sum of all its elements.


Accuracy = (TP + TN) / Total data points

Output:  0.8067796610169492  
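The code that produced the output above is not shown here. As a minimal sketch, assuming made-up labels (so the number will differ), accuracy can be computed in plain Python as the share of predictions that match the actual labels. The function name `accuracy` is illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for actual, predicted in zip(y_true, y_pred)
                  if actual == predicted)
    return correct / len(y_true)


# Made-up labels for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

print(accuracy(y_true, y_pred))  # 6 of 8 correct -> 0.75
```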

Precision and Recall

[Figure: illustration of precision and recall. Image source: Wikipedia]

Recall (Sensitivity/True Positive Rate)

Recall measures, among all actual positive cases, how many the model correctly predicted as positive.


Recall: True Positive/(True Positive + False Negative)

Let’s understand it with an example.

As an illustration, suppose you have to predict whether the weather is good or bad for launching a satellite (in the movie “Mission Mangal”, the team wanted to know when they could launch their satellite successfully).

In this case, a true positive is when the model predicts good weather and the weather is actually good, so the launch can go ahead. If the weather is actually good but the model predicts that it will be bad (a false negative), the launch might be postponed, but not much harm is done.

But if the weather is actually bad and the model predicts that it is good weather for a launch (a false positive), a mishap may happen, and years of hard work will be wasted.

So, in some cases, we cannot afford false positives. Note that recall does not take false positives into account; the metric that does is precision. For such cases, precision should be as high as possible.

 Output: 0.8067796610169492 
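The code behind the output above is not shown here. As a minimal sketch of the recall formula, assuming made-up binary labels (so the number will differ), recall can be computed in plain Python like this. The function name `recall` is illustrative.

```python
def recall(y_true, y_pred, positive=1):
    """TP / (TP + FN): share of actual positives that the model found."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    # Return 0.0 when there are no actual positives, to avoid dividing by zero
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up labels: 4 actual positives, 3 of them found by the model
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

print(recall(y_true, y_pred))  # 3 / (3 + 1) -> 0.75
```

Note that the two false positives in this toy data do not change the result, since recall only looks at the actual positive cases.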


Precision

Precision measures, among all cases predicted as positive, how many are actually positive.


Precision: True Positive/(True Positive + False Positive)

Let’s understand it with another example, which should make the difference between precision and recall clear.

Suppose you are dealing with a problem that asks you to identify whether a patient has cancer or not. If a person actually has cancer and the model predicts cancer, it is a true positive.

If a person does not actually have cancer but the model predicts cancer (a false positive), it will not do much harm except making that person worried.

But if a person actually has cancer and the model does not detect it (a false negative), it may cost the patient’s life because no treatment is given.

In such a case, false negatives are the costly error, so it is more important to focus on recall than on precision.

 Output: 0.8041111552992642 
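The code behind the output above is not shown here. As a minimal sketch of the precision formula, assuming made-up binary labels (so the number will differ), precision can be computed in plain Python like this. The function name `precision` is illustrative.

```python
def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP): share of positive predictions that are correct."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    # Return 0.0 when nothing was predicted positive, to avoid dividing by zero
    return tp / (tp + fp) if (tp + fp) else 0.0


# Made-up labels: 5 positive predictions, 3 of them correct
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

print(precision(y_true, y_pred))  # 3 / (3 + 2) -> 0.6
```

On the same toy labels, recall would be 3 / (3 + 1) = 0.75, which shows that the two metrics can differ for one and the same model.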

Relationship between recall and precision

In short, which error metric to analyze depends on domain knowledge and the problem statement.

Receiver Operating Characteristics (ROC)

The ROC curve plots the true positive rate against the false positive rate at different classification thresholds.

To evaluate a binary classification model, we usually check the area under the ROC curve (AUC). The larger the area, the better the model separates the two classes: a high AUC means that positive examples tend to receive higher predicted probabilities than negative ones.

Output:  0.7913095238095239
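The code behind the output above is not shown here. One way to see what AUC measures is its pairwise interpretation: it equals the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The sketch below uses that interpretation rather than tracing the curve; the function name `roc_auc` and the scores are made up for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(roc_auc(y_true, scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

The double loop is quadratic in the number of examples, so this is only meant to show the idea on small data.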

F1 score

F1 score is the harmonic mean of precision and recall.


F1 = 2 * (precision * recall) / (precision + recall)

How to decide whether to use accuracy_score or f1_score for evaluation?

When to use the accuracy score? Use it when the classes are roughly balanced and true positives and true negatives matter most.

If the data is imbalanced, or false negatives and false positives are the deciding factors, the F1 score should be used, as it takes both recall and precision into account.
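To see why accuracy can mislead on imbalanced data, here is a minimal sketch in plain Python with made-up labels: 5 positives out of 100, and a model that always predicts the negative class. The function names `accuracy` and `f1` are illustrative, not the scikit-learn functions mentioned above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    return sum(1 for a, p in zip(y_true, y_pred) if a == p) / len(y_true)


def f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


# Made-up imbalanced data: 5 positives, 95 negatives
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100  # a model that never predicts the positive class

print(accuracy(y_true, y_pred))  # 0.95, looks good
print(f1(y_true, y_pred))        # 0.0, no positive case was found
```

The accuracy of 0.95 hides the fact that the model misses every positive case, while the F1 score of 0.0 makes it obvious.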

Evaluation methods for regression

  • Mean Absolute Error(MAE)
  • Mean Squared Error(MSE)
  • Root Mean Squared Error(RMSE)

Let’s understand them one by one:

Mean Absolute Error(MAE)

It is the average of the absolute differences between the predicted values and the actual values of the target variable.

Output:  3.7252449081714416 
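The code behind the output above is not shown here. As a minimal sketch, assuming made-up values (so the number will differ), MAE can be computed in plain Python like this. The function name `mae` is illustrative.

```python
def mae(y_true, y_pred):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


# Made-up actual and predicted values for illustration only
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# Absolute errors are 0.5, 0.5, 0.0 and 1.0
print(mae(y_true, y_pred))  # 2.0 / 4 -> 0.5
```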

Mean Squared Error(MSE)

Sometimes the error is positive for some data points and negative for others, so simply averaging the raw differences lets them cancel out and gives a misleadingly small result.

Mean squared error avoids this by averaging the squares of the errors, i.e. the squared differences between the actual and predicted values.

Output: 24.92238672931211 
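The code behind the output above is not shown here. The sketch below, in plain Python with made-up values, shows how raw errors of opposite sign cancel out while squared errors do not. The function names `mean_error` and `mse` are illustrative.

```python
def mean_error(y_true, y_pred):
    """Mean of the raw (signed) differences; opposite signs can cancel."""
    return sum(p - a for a, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values: one prediction is 2 too high, the other 2 too low
y_true = [10.0, 10.0]
y_pred = [12.0, 8.0]

print(mean_error(y_true, y_pred))  # (+2 - 2) / 2 -> 0.0, hides the errors
print(mse(y_true, y_pred))         # (4 + 4) / 2 -> 4.0
```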

Root Mean Squared Error(RMSE)

Root Mean Squared Error is the square root of the mean squared error.

It is widely used and often preferred over other metrics because the errors are squared before averaging, so larger errors are penalized more, while the result is in the same units as the target variable.

 Output: RMSE is 4.9922326397426
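The code behind the output above is not shown here. The sketch below, in plain Python with made-up values, compares two sets of predictions that have the same MAE but different RMSE, to show how a single large error is penalized more. The function names `mae` and `rmse` are illustrative.

```python
import math


def mae(y_true, y_pred):
    """Mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the mean of the squared differences."""
    squared = [(a - p) ** 2 for a, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up values for illustration only
y_true = [0.0, 0.0, 0.0, 0.0]
small_errors = [1.0, 1.0, 1.0, 1.0]  # four errors of size 1
one_big_error = [0.0, 0.0, 0.0, 4.0]  # a single error of size 4

print(mae(y_true, small_errors), rmse(y_true, small_errors))    # 1.0 1.0
print(mae(y_true, one_big_error), rmse(y_true, one_big_error))  # 1.0 2.0
```

Both sets of predictions have an MAE of 1.0, but the RMSE doubles when the total error is concentrated in one data point.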

Note: Here I have focused on the error metrics only; the scores I am getting here are not the best possible. We can use pre-processing techniques, fine-tuning, and different machine learning models, depending on the problem, to achieve better results.

Here is the GitHub link for a detailed explanation and code:

Please let me know if you have any doubts or suggestions; I would be happy to discuss more.

