confusion-matrix eval-met
illustration 1
- url https://framerusercontent.com/images/L1rUxpJ00yIfOffzFpQWSNpjEkA.png?width=1300&height=952 [20260504].
direction of change prediction 2
- In time series forecasting, the task of a machine learning model is to predict the future values of a series.
- A one-step-ahead forecast predicts the first future value in the series; that prediction will differ (more or less) from the eventual actual value, and the difference is the forecast error.
- Sometimes we care less about the exact value and more about whether tomorrow's value will move up or down, i.e. the direction of change.
- Predicting the direction of change is a binary classification problem, so we can evaluate it with a confusion matrix, as sketched below.
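A minimal sketch of turning a forecast into direction-of-change labels, assuming a toy NumPy series (all values below are made up):

```python
import numpy as np

# Toy one-step-ahead setup: at each day t we forecast the value for t+1
# (series values are made up for illustration).
actual   = np.array([100.0, 101.5, 101.2, 102.0, 101.8, 103.1])
forecast = np.array([100.4, 101.0, 101.8, 101.6, 101.5, 101.7])

# Direction of change: did the value rise relative to the previous
# actual value? (up = positive class)
actual_up   = np.diff(actual) > 0                # [ T  F  T  F  T]
forecast_up = (forecast[1:] - actual[:-1]) > 0   # [ T  T  T  F  F]

# Each day is now a binary prediction, so the usual confusion-matrix
# categories (TP/TN/FP/FN) apply.
print(np.mean(actual_up == forecast_up))  # fraction of correct directions: 0.6
```

The resulting boolean arrays can be fed straight into a confusion matrix, as in the next section.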
classification metrics 3
- Most classification model evaluations begin with the construction of a confusion matrix.
- A confusion matrix is a summary of prediction results on a classification problem.
- The numbers of correct and incorrect predictions are summarized as counts, broken down by class (see the sketch below).
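For instance, scikit-learn's `confusion_matrix` produces exactly this per-class count breakdown (the label arrays are toy examples):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # model predictions

# scikit-learn convention: rows = actual class, columns = predicted class:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [2 4]]
```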
metrics from confusion matrix 4
- For a simple binary classifier there are true positive (TP), true negative (TN), false positive (FP), and false negative (FN) categories.
- From these four categories we can derive accuracy, error rate, precision, recall, true negative rate (TNR), false positive rate (FPR), false negative rate (FNR), and F1-score, written out below.
- There are also the receiver operating characteristic (ROC) curve and the area under the curve (AUC) to consider.
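The standard definitions, written out:

```latex
\begin{aligned}
\text{accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{error rate} &= 1 - \text{accuracy}, \\
\text{precision} &= \frac{TP}{TP + FP}, &
\text{recall (TPR)} &= \frac{TP}{TP + FN}, \\
\text{TNR}       &= \frac{TN}{TN + FP}, &
\text{FPR}        &= \frac{FP}{FP + TN}, \\
\text{FNR}       &= \frac{FN}{FN + TP}, &
F_1               &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
\end{aligned}
```

The ROC curve plots TPR against FPR as the decision threshold varies; AUC condenses that curve into a single number between 0 and 1.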
the categories 5
- True Positive (TP): The model correctly predicted a positive outcome, i.e. the actual outcome was positive.
- True Negative (TN): The model correctly predicted a negative outcome, i.e. the actual outcome was negative.
- False Positive (FP): The model incorrectly predicted a positive outcome, i.e. the actual outcome was negative. It is also known as a Type I error.
- False Negative (FN): The model incorrectly predicted a negative outcome, i.e. the actual outcome was positive. It is also known as a Type II error.
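The four counts fall out directly from boolean masks, e.g. with NumPy (reusing toy labels like those above):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # correct positive call
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # correct negative call
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # Type I error
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # Type II error
print(tp, tn, fp, fn)  # 4 3 1 2
```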
tuning and metrics 6 7
- Optimize model performance by tuning the decision threshold, improving data preprocessing, and addressing class imbalance; these factors can improve recall without necessarily reducing precision, depending on dataset characteristics (see the threshold sketch after this list).
- For medical datasets, high recall is generally prioritized to minimize false negatives, making it more critical than overall accuracy or precision.
- For applications such as spam detection or fraud detection, high precision is typically preferred to reduce false positives; this is achieved by threshold tuning while balancing the precision–recall tradeoff.
- F1-score is the preferred metric for imbalanced datasets when both precision and recall are equally important, particularly when neither false positives nor false negatives can be overlooked.
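A hedged sketch of decision-threshold tuning, assuming an imbalanced toy dataset and a logistic-regression model (all names and numbers below are illustrative, not a recipe from the sources):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data (~10% positives), standing in for e.g. fraud labels.
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]  # P(positive class)

# Lower thresholds raise recall (fewer FN, the medical-screening priority);
# higher thresholds raise precision (fewer FP, the spam/fraud priority).
for t in (0.2, 0.5, 0.8):
    pred = (proba >= t).astype(int)
    print(f"threshold={t:.1f}  "
          f"precision={precision_score(y_te, pred):.2f}  "
          f"recall={recall_score(y_te, pred):.2f}")
```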
notes
- This note is used in 26a62 entry 10.
refs
Rohit Kundu, “Confusion Matrix: How To Use It & Interpret Results [Examples]”, V7, 13 Sep 2022, url https://www.v7darwin.com/blog/confusion-matrix-guide [20260504]. ↩︎
Ottavio Calzone, “MAE, MSE, RMSE, and F1 score in Time Series Forecasting”, Medium, 7 Apr 2022, url https://medium.com/p/d04021ffa7ce [20260503]. ↩︎
Data On A Tangent - Jiji C., “Evaluation Metrics 101”, DataDrivenInvestor – Medium, 8 Feb 2021, url https://medium.com/p/7c8b4c3421c2 [20260503]. ↩︎
Surya Gutta, “Machine Learning Metrics in simple terms”, Analytics Vidhya – Medium, 18 Jun 2021, url https://medium.com/p/d58a9c85f9f6 [20260503]. ↩︎
Tanya Verma, “Understanding the Confusion Matrix in Machine Learning”, Medium, 21 Dec 2025, url https://medium.com/p/a8bbb97243a4 [20260503]. ↩︎
Nikhil Dasari, “Confusion Matrix Made Simple: Accuracy, Precision, Recall & F1-Score”, Towards Data Science, 30 Jul 2025, url https://towardsdatascience.com/confusion-matrix-made-simple-accuracy-precision-recall-f1-score/ [20260503]. ↩︎
GPT-5.3, “Model Tuning and Metrics”, ChatGPT, 3 May 2026, url https://chatgpt.com/share/69f74094-ed50-8323-8518-7f0a2272d22a [20260503]. ↩︎