
confusion-matrix eval-met


illustration 1

direction of change prediction 2

  • In time series forecasting, the task of a machine learning model is to predict the future values of a time series.
  • In one-step-ahead prediction, the forecast concerns the first future value in the series; this value will differ (more or less) from the eventual actual value, resulting in a forecast error.
  • In some cases we are more interested in whether the next day's value will move up or down, that is, in the direction of change.
  • In this case, we can use a confusion matrix.
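The bullets above can be sketched in a few lines of Python. The daily values and one-step-ahead forecasts below are hypothetical numbers for illustration only; the point is that up/down movements turn the forecasting task into a binary classification that a confusion matrix can summarize.

```python
# Hypothetical daily values and their one-step-ahead forecasts.
actual   = [100, 102, 101, 105, 104, 108]
forecast = [100, 103, 102, 104, 106, 107]

def directions(series):
    """1 if the value rose from the previous step, 0 otherwise."""
    return [1 if b > a else 0 for a, b in zip(series, series[1:])]

actual_dir = directions(actual)    # true direction of change
pred_dir = directions(forecast)    # predicted direction of change

# 2x2 confusion-matrix counts over the up/down labels.
tp = sum(1 for a, p in zip(actual_dir, pred_dir) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual_dir, pred_dir) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual_dir, pred_dir) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual_dir, pred_dir) if a == 1 and p == 0)
print(tp, tn, fp, fn)
```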

classification metrics 3

  • Most classification model evaluations begin with the construction of a confusion matrix.
  • A confusion matrix is a summary of prediction results on a classification problem.
  • The number of correct and incorrect predictions are summarized with count values and broken down by each class.
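As a minimal sketch of that counting step, assuming made-up spam/ham labels: each (actual, predicted) pair is tallied, and the resulting counts are the cells of the confusion matrix, broken down by class.

```python
from collections import Counter

# Hypothetical ground-truth and predicted labels for a binary spam classifier.
y_true = ["spam", "spam", "ham", "ham", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham",  "ham", "spam", "spam", "ham", "ham", "spam"]

# Count (actual, predicted) pairs: the confusion matrix as a dictionary.
matrix = Counter(zip(y_true, y_pred))
print(matrix[("spam", "spam")])  # correctly predicted spam
print(matrix[("ham", "spam")])   # ham misclassified as spam
```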

metrics from confusion matrix 4

  • For a simple binary classifier there are true positive (TP), true negative (TN), false positive (FP), and false negative (FN) categories.
  • From these four categories we can derive accuracy, error rate, precision, recall, true negative rate (TNR), false positive rate (FPR), false negative rate (FNR), and F1-score.
  • There are also the receiver operating characteristic (ROC) curve and the area under the curve (AUC) to consider.

the categories 5

  • True Positive (TP): The model correctly predicted a positive outcome, i.e., the actual outcome was positive.
  • True Negative (TN): The model correctly predicted a negative outcome, i.e., the actual outcome was negative.
  • False Positive (FP): The model incorrectly predicted a positive outcome, i.e., the actual outcome was negative. It is also known as a Type I error.
  • False Negative (FN): The model incorrectly predicted a negative outcome, i.e., the actual outcome was positive. It is also known as a Type II error.
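The four definitions above map directly onto a small helper; `outcome` is a hypothetical function name chosen here for illustration.

```python
def outcome(actual, predicted):
    """Name the confusion-matrix cell for one prediction (boolean labels)."""
    if actual and predicted:
        return "TP"                 # correctly predicted positive
    if not actual and not predicted:
        return "TN"                 # correctly predicted negative
    if predicted:
        return "FP"                 # predicted positive, actually negative: Type I error
    return "FN"                     # predicted negative, actually positive: Type II error

print(outcome(actual=False, predicted=True))   # FP
print(outcome(actual=True, predicted=False))   # FN
```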

tuning and metrics 6 7

  • Optimize model performance by tuning the decision threshold, improving data preprocessing, and addressing class imbalance; these factors can improve recall without necessarily reducing precision, depending on dataset characteristics.
  • For medical datasets, high recall is generally prioritized to minimize false negatives, making it more critical than overall accuracy or precision.
  • For applications such as spam detection or fraud detection, high precision is typically preferred to reduce false positives; this is achieved by threshold tuning while balancing the precision–recall tradeoff.
  • F1-score is the preferred metric for imbalanced datasets when both precision and recall are equally important, particularly when neither false positives nor false negatives can be overlooked.
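The threshold-tuning tradeoff mentioned above can be demonstrated with a toy example. The predicted probabilities and labels below are invented for illustration: raising the decision threshold trades recall for precision.

```python
# Hypothetical predicted probabilities with ground-truth labels (1 = positive).
scores = [(0.95, 1), (0.90, 1), (0.80, 0), (0.70, 1), (0.60, 0),
          (0.55, 1), (0.40, 0), (0.30, 1), (0.20, 0), (0.10, 0)]

def precision_recall(threshold):
    """Precision and recall when scores >= threshold are labelled positive."""
    tp = sum(1 for s, y in scores if s >= threshold and y == 1)
    fp = sum(1 for s, y in scores if s >= threshold and y == 0)
    fn = sum(1 for s, y in scores if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A low threshold favours recall; a high threshold favours precision.
for t in (0.25, 0.50, 0.85):
    print(t, precision_recall(t))
```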

notes

  • This note is used in 26a62 entry 10.

refs


  1. Rohit Kundu, “Confusion Matrix: How To Use It & Interpret Results [Examples]”, V7, 13 Sep 2022, url https://www.v7darwin.com/blog/confusion-matrix-guide [20260504]. ↩︎

  2. Ottavio Calzone, “MAE, MSE, RMSE, and F1 score in Time Series Forecasting”, Medium, 7 Apr 2022, url https://medium.com/p/d04021ffa7ce [20260503]. ↩︎

  3. Data On A Tangent - Jiji C., “Evaluation Metrics 101”, DataDrivenInvestor – Medium, 8 Feb 2021, url https://medium.com/p/7c8b4c3421c2 [20260503]. ↩︎

  4. Surya Gutta, “Machine Learning Metrics in simple terms”, Analytics Vidhya – Medium, 18 Jun 2021, url https://medium.com/p/d58a9c85f9f6 [20260503]. ↩︎

  5. Tanya Verma, “Understanding the Confusion Matrix in Machine Learning”, Medium, 21 Dec 2025, url https://medium.com/p/a8bbb97243a4 [20260503]. ↩︎

  6. Nikhil Dasari, “Confusion Matrix Made Simple: Accuracy, Precision, Recall & F1-Score”, Towards Data Science, 30 Jul 2025, url https://towardsdatascience.com/confusion-matrix-made-simple-accuracy-precision-recall-f1-score/ [20260503]. ↩︎

  7. GPT-5.3, “Model Tuning and Metrics”, ChatGPT, 3 May 2026, url https://chatgpt.com/share/69f74094-ed50-8323-8518-7f0a2272d22a [20260503]. ↩︎