
confusion-matrix eval-met


illustration 1

direction of change prediction 2

  • In time series forecasting, the task of a machine learning model is to predict the future values of a time series.
  • In one-step-ahead prediction, the forecast concerns the first future value in the series; this value will differ (more or less) from the eventual actual value, resulting in a forecast error.
  • In some cases we are more interested in whether the next day's value will move up or down, that is, in the direction of change.
  • In this case, we can use a confusion matrix.
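The bullets above can be sketched in a few lines of Python. The daily values and one-step-ahead forecasts below are hypothetical numbers for illustration only; the point is that up/down movements turn the forecasting task into a binary classification that a confusion matrix can summarize.

```python
# Hypothetical daily values and their one-step-ahead forecasts.
actual   = [100, 102, 101, 105, 104, 108]
forecast = [100, 103, 102, 104, 106, 107]

def directions(series):
    """1 if the value rose from the previous step, 0 otherwise."""
    return [1 if b > a else 0 for a, b in zip(series, series[1:])]

actual_dir = directions(actual)    # true direction of change
pred_dir = directions(forecast)    # predicted direction of change

# 2x2 confusion-matrix counts over the up/down labels.
tp = sum(1 for a, p in zip(actual_dir, pred_dir) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual_dir, pred_dir) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual_dir, pred_dir) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual_dir, pred_dir) if a == 1 and p == 0)
print(tp, tn, fp, fn)
```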

classification metrics 3

  • Most classification model evaluations begin with the construction of a confusion matrix.
  • A confusion matrix is a summary of prediction results on a classification problem.
  • The number of correct and incorrect predictions are summarized with count values and broken down by each class.
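As a minimal sketch of that counting step, assuming made-up spam/ham labels: each (actual, predicted) pair is tallied, and the resulting counts are the cells of the confusion matrix, broken down by class.

```python
from collections import Counter

# Hypothetical ground-truth and predicted labels for a binary spam classifier.
y_true = ["spam", "spam", "ham", "ham", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham",  "ham", "spam", "spam", "ham", "ham", "spam"]

# Count (actual, predicted) pairs: the confusion matrix as a dictionary.
matrix = Counter(zip(y_true, y_pred))
print(matrix[("spam", "spam")])  # correctly predicted spam
print(matrix[("ham", "spam")])   # ham misclassified as spam
```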

metrics from confusion matrix 4

  • For a simple binary classifier there are true positive (TP), true negative (TN), false positive (FP), and false negative (FN) categories.
  • From these four categories we can derive accuracy, error rate, precision, recall, true negative rate (TNR), false positive rate (FPR), false negative rate (FNR), and F1-score.
  • There are also the receiver operating characteristic (ROC) curve and the area under the curve (AUC) to consider.

the categories 5

  • True Positive (TP): The model correctly predicted a positive outcome, i.e., the actual outcome was positive.
  • True Negative (TN): The model correctly predicted a negative outcome, i.e., the actual outcome was negative.
  • False Positive (FP): The model incorrectly predicted a positive outcome, i.e., the actual outcome was negative. It is also known as a Type I error.
  • False Negative (FN): The model incorrectly predicted a negative outcome, i.e., the actual outcome was positive. It is also known as a Type II error.
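The four definitions above map directly onto a small helper; `outcome` is a hypothetical function name chosen here for illustration.

```python
def outcome(actual, predicted):
    """Name the confusion-matrix cell for one prediction (boolean labels)."""
    if actual and predicted:
        return "TP"                 # correctly predicted positive
    if not actual and not predicted:
        return "TN"                 # correctly predicted negative
    if predicted:
        return "FP"                 # predicted positive, actually negative: Type I error
    return "FN"                     # predicted negative, actually positive: Type II error

print(outcome(actual=False, predicted=True))   # FP
print(outcome(actual=True, predicted=False))   # FN
```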

tuning and metrics 6 7

  • Optimize model performance by tuning the decision threshold, improving data preprocessing, and addressing class imbalance; these factors can improve recall without necessarily reducing precision, depending on dataset characteristics.
  • For medical datasets, high recall is generally prioritized to minimize false negatives, making it more critical than overall accuracy or precision.
  • For applications such as spam detection or fraud detection, high precision is typically preferred to reduce false positives; this is achieved by threshold tuning while balancing the precision–recall tradeoff.
  • F1-score is the preferred metric for imbalanced datasets when both precision and recall are equally important, particularly when neither false positives nor false negatives can be overlooked.
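The threshold-tuning tradeoff mentioned above can be demonstrated with a toy example. The predicted probabilities and labels below are invented for illustration: raising the decision threshold trades recall for precision.

```python
# Hypothetical predicted probabilities with ground-truth labels (1 = positive).
scores = [(0.95, 1), (0.90, 1), (0.80, 0), (0.70, 1), (0.60, 0),
          (0.55, 1), (0.40, 0), (0.30, 1), (0.20, 0), (0.10, 0)]

def precision_recall(threshold):
    """Precision and recall when scores >= threshold are labelled positive."""
    tp = sum(1 for s, y in scores if s >= threshold and y == 1)
    fp = sum(1 for s, y in scores if s >= threshold and y == 0)
    fn = sum(1 for s, y in scores if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A low threshold favours recall; a high threshold favours precision.
for t in (0.25, 0.50, 0.85):
    print(t, precision_recall(t))
```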

notes

  • This note is used in 26a62 entry 10.

refs


  1. Rohit Kundu, “Confusion Matrix: How To Use It & Interpret Results [Examples]”, V7, 13 Sep 2022, url https://www.v7darwin.com/blog/confusion-matrix-guide [20260504]. ↩︎

  2. Ottavio Calzone, “MAE, MSE, RMSE, and F1 score in Time Series Forecasting”, Medium, 7 Apr 2022, url https://medium.com/p/d04021ffa7ce [20260503]. ↩︎

  3. Data On A Tangent - Jiji C., “Evaluation Metrics 101”, DataDrivenInvestor – Medium, 8 Feb 2021, url https://medium.com/p/7c8b4c3421c2 [20260503]. ↩︎

  4. Surya Gutta, “Machine Learning Metrics in simple terms”, Analytics Vidhya – Medium, 18 Jun 2021, url https://medium.com/p/d58a9c85f9f6 [20260503]. ↩︎

  5. Tanya Verma, “Understanding the Confusion Matrix in Machine Learning”, Medium, 21 Dec 2025, url https://medium.com/p/a8bbb97243a4 [20260503]. ↩︎

  6. Nikhil Dasari, “Confusion Matrix Made Simple: Accuracy, Precision, Recall & F1-Score”, Towards Data Science, 30 Jul 2025, url https://towardsdatascience.com/confusion-matrix-made-simple-accuracy-precision-recall-f1-score/ [20260503]. ↩︎

  7. GPT-5.3, “Model Tuning and Metrics”, ChatGPT, 3 May 2026, url https://chatgpt.com/share/69f74094-ed50-8323-8518-7f0a2272d22a [20260503]. ↩︎