## Table of Contents

• What are evaluation metrics?
• Confusion Matrix
• Accuracy, Precision, Recall, F1 Score

### What are Evaluation Metrics ?

• You have built multiple models for your classification task, but how do you decide which one is better?
• How do you know how your model is performing? How good is it at predicting the outcome?
• All these questions can be answered with the help of evaluation metrics.

### Confusion matrix

• A Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model.
• For a binary classification problem, we would have a 2 x 2 matrix with 4 values:

|                     | Predicted Positive  | Predicted Negative  |
| ------------------- | ------------------- | ------------------- |
| **Actual Positive** | True Positive (TP)  | False Negative (FN) |
| **Actual Negative** | False Positive (FP) | True Negative (TN)  |
• A good model is one which has high TP and TN rates and low FP and FN rates.
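As a minimal sketch, the four counts can be tallied directly from paired label lists. The function name and the sample data below are illustrative, assuming binary 0/1 labels with 1 as the positive class:

```python
# Build the 4 values of a 2 x 2 confusion matrix by hand.
# Assumes binary labels where 1 = positive class, 0 = negative class.
def confusion_matrix(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # made-up model predictions
print(confusion_matrix(y_true, y_pred))  # (3, 3, 1, 1)
```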

### Accuracy

It is a measure of the overall correctness of the model. In simple words, it tells us how many predictions are correct out of all the predictions made, i.e. (TP + TN) / (TP + TN + FP + FN).

Accuracy is a valid choice of evaluation metric for classification problems that are well balanced, i.e. where there is no class imbalance.
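A minimal sketch of the formula above, with made-up confusion-matrix counts for illustration:

```python
# Accuracy = correct predictions / all predictions.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts: 940 correct predictions out of 1000 total.
print(accuracy(90, 850, 10, 50))  # 0.94
```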

### Precision

It is a measure of correctness that is achieved in true prediction. In simple words, it tells us how many of the observations predicted as positive are actually positive, i.e. TP / (TP + FP).

Precision is a valid choice of evaluation metric when we want to be very sure of our prediction.
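A minimal sketch of the TP / (TP + FP) formula, with illustrative counts:

```python
# Precision = true positives / all positive predictions.
def precision(tp, fp):
    return tp / (tp + fp)

# Illustrative counts: 80 of the 100 positive predictions were correct.
print(precision(80, 20))  # 0.8
```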

### Recall

It is a measure of how many of the actual positive observations are predicted correctly, i.e. how many observations of the positive class are actually predicted as positive: TP / (TP + FN).

It is also known as Sensitivity. Recall is a valid choice of evaluation metric when we want to capture as many positives as possible.
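A minimal sketch of the TP / (TP + FN) formula, with illustrative counts:

```python
# Recall (sensitivity) = true positives / all actual positives.
def recall(tp, fn):
    return tp / (tp + fn)

# Illustrative counts: 75 of the 100 actual positives were captured.
print(recall(75, 25))  # 0.75
```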

### F1 Score

• The F1 score is a number between 0 and 1 and is the harmonic mean of precision and recall.
• F1 score sort of maintains a balance between the precision and recall for your classifier. If your precision is low, the F1 is low and if the recall is low again your F1 score is low.
• We use the harmonic mean because, unlike the simple average, it heavily penalizes extreme values: if either precision or recall is very low, the F1 score is dragged down toward it.
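The points above can be sketched as follows; the precision and recall values are made up to show how the harmonic mean punishes an imbalanced classifier:

```python
# F1 = harmonic mean of precision and recall.
def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

# With precision 1.0 but recall only 0.25, a simple average would give
# (1.0 + 0.25) / 2 = 0.625, hiding the weak recall; the harmonic mean
# reports a much lower score.
print(f1_score(1.0, 0.25))  # 0.4
```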