How to Choose the Best Classification Model Based on Performance Metrics

When working on machine learning classification tasks, selecting the best model often involves analyzing various performance metrics like accuracy, precision, recall, and F1-score. In this post, I’ll walk you through how I evaluated and selected the best model for my dataset.

The Dataset and Models

I trained several machine learning models on my dataset, including:

Support Vector Classifier (SV)
Random Forest Classifier (RF)
Naive Bayes (NB)
K-Nearest Neighbors (KN4, i.e. KNN with 4 neighbors)

I used different resampling techniques such as:

ROS (Random Over-Sampling): Helps address class imbalance by increasing the number of minority class samples.
RUS (Random Under-Sampling): Reduces the majority class samples to balance the dataset.
Near Miss (nm): An under-sampling method that keeps the majority class samples lying closest to minority class samples, rather than dropping them at random.
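
To make the first two techniques concrete, here is a minimal sketch of random over-sampling and random under-sampling in plain Python. It is not the code I ran for the results below; the function names random_over_sample and random_under_sample and the toy data are assumptions made up for illustration, and Near Miss is left out because it needs a distance computation.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    """Collect the feature rows belonging to each class label."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    return by_class


def random_over_sample(X, y, seed=0):
    """ROS: duplicate random rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_under_sample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced data (hypothetical): 8 samples of class 0, 2 of class 1.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_ros, y_ros = random_over_sample(X, y)
X_rus, y_rus = random_under_sample(X, y)
print(sorted(Counter(y_ros).items()))  # [(0, 8), (1, 8)]
print(sorted(Counter(y_rus).items()))  # [(0, 2), (1, 2)]
```

Resampling is normally applied to the training split only, so that the test set keeps the original class distribution.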

The goal was to find the model that performed best on unseen data (test set).

Performance Metrics

To evaluate the models, I used the following metrics:

Accuracy: The ratio of correctly predicted instances to the total instances.
Precision: The ratio of correctly predicted positive observations to the total predicted positives.
Recall: The ratio of correctly predicted positive observations to all actual positives.
F1-Score: The harmonic mean of precision and recall, balancing both.
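
The four definitions above can be sketched for the binary case directly from the confusion counts. This is an illustrative example only: the function name binary_metrics and the labels are hypothetical, and the figures in my results table may have been averaged over classes, in which case they would not follow from a single set of counts like this.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 3 actual positives, 7 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

scores = binary_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in scores.items()})
# {'accuracy': 0.7, 'precision': 0.5, 'recall': 0.6667, 'f1': 0.5714}
```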

Model Performance Results

Below are the key performance metrics for each model. The -train and -test suffixes indicate which split the metrics were computed on:

Model	Accuracy	Precision	Recall	F1 Score
SV-train	0.7487	0.3893	0.4029	0.4029
RF-train	0.8441	0.4725	0.5045	0.4846
NB-train	0.0693	0.0292	0.3083	0.0519
KN4-train	0.7143	0.3788	0.3951	0.3815
RF-rs-ros-train	0.9819	0.9829	0.9819	0.9817
RF-rs-ros-test	0.9310	0.9465	0.8684	0.8842

Model Selection Process

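After reviewing the metrics, Random Forest with Random Over-Sampling stands out as the best-performing model, based on its test-set row, RF-rs-ros-test. Keep in mind that the other models are reported on training data only, so this is not a strict like-for-like comparison. Here’s why: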

High Accuracy: With an accuracy of 93.10%, this model predicts most instances correctly.
Strong Precision and Recall: Precision of 94.65% means few false positives, and recall of 86.84% means most actual positives are found, although recall trails precision by about 8 points.
Strong F1 Score: The F1 score of 88.42% confirms that this model maintains a good balance between precision and recall, making it reliable for scenarios where both metrics are important.
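
Since the goal is performance on unseen data, it also helps to check how far the test scores fall below the training scores for the same model. The sketch below does that using the two RF-rs-ros rows from the table; the variable names are just for illustration.

```python
# Metrics copied from the RF-rs-ros-train and RF-rs-ros-test rows of the table.
train = {"accuracy": 0.9819, "precision": 0.9829, "recall": 0.9819, "f1": 0.9817}
test = {"accuracy": 0.9310, "precision": 0.9465, "recall": 0.8684, "f1": 0.8842}

# A large positive gap (train minus test) is a common sign of overfitting.
gaps = {name: round(train[name] - test[name], 4) for name in train}

for name, gap in gaps.items():
    print(f"{name:<10} train={train[name]:.4f} test={test[name]:.4f} gap={gap:.4f}")
```

Here the largest drop is in recall (about 0.11), which is worth keeping an eye on if missed positives are costly.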

Why Precision and Recall Matter

Depending on your project, focusing on precision or recall may be more important. For example, in medical diagnostics, recall might be more critical because you want to catch as many positive cases as possible (minimize false negatives). In other scenarios, like fraud detection, precision might be key to avoid false positives.

In this case, RF-rs-ros-test strikes a good balance between both, making it suitable for general classification tasks where accuracy, precision, and recall are equally important.
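
If one of the two matters more for your project, one way to express that preference in a single number is the F-beta score, which generalizes F1: beta greater than 1 weights recall more heavily, and beta less than 1 weights precision more heavily. I did not use it for the results above; this is a small sketch with made-up precision and recall values, and f_beta is a hypothetical helper name.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical model: precise, but it misses many positives.
precision, recall = 0.9, 0.6

print(f"F1   = {f_beta(precision, recall, 1.0):.4f}")  # F1   = 0.7200
print(f"F2   = {f_beta(precision, recall, 2.0):.4f}")  # F2   = 0.6429 (recall-focused)
print(f"F0.5 = {f_beta(precision, recall, 0.5):.4f}")  # F0.5 = 0.8182 (precision-focused)
```

The same model looks worse under F2 and better under F0.5, which is the trade-off described above.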

Conclusion

From the various models and resampling techniques, Random Forest with ROS emerged as the best choice. Its accuracy, precision, recall, and F1 score were the highest in the table, on both its training and test rows. If you are dealing with imbalanced datasets, ROS can improve your model’s performance, as it did here for Random Forest.

When evaluating models, it’s important to look beyond accuracy and assess other metrics like precision, recall, and F1 score, particularly when working with imbalanced data. These metrics provide a more comprehensive understanding of how your model performs in different real-world scenarios.