Basics

Precision & Recall

F1 Score, Confusion Matrix
Two complementary metrics for evaluating classifiers. Precision answers "of the items the model flagged as positive, how many actually are?" Recall answers "of all actual positives, how many did the model find?" A high-precision spam filter rarely marks real email as spam; a high-recall one catches most of the spam. The F1 score is their harmonic mean: a single number that balances the two.

Why It Matters

Accuracy alone is misleading. If only 0.1% of transactions are fraudulent, a model that never predicts "fraud" achieves 99.9% accuracy, yet it is completely useless. Precision and recall expose the trade-off: catching more fraud (higher recall) means more false alarms (lower precision), and vice versa. Every classification system in production is tuned around this trade-off.
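
A minimal sketch of this accuracy paradox in Python; the counts are hypothetical (1,000,000 transactions, 0.1% fraudulent):

```python
# Hypothetical counts: 1,000,000 transactions, 0.1% fraudulent.
n_total = 1_000_000
n_fraud = 1_000

# A degenerate "model" that never predicts fraud gets every
# legitimate transaction right and misses every fraudulent one.
tp, fp = 0, 0            # it makes no positive predictions at all
fn = n_fraud             # every fraud case is missed
tn = n_total - n_fraud   # every legitimate case is "correct"

accuracy = (tp + tn) / n_total
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.1%}")  # 99.9%
print(f"recall   = {recall:.1%}")    # 0.0%: it catches no fraud
# Precision is undefined here (TP + FP = 0): there are no
# positive predictions to be precise about.
```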

Deep Dive

The confusion matrix organizes predictions into four categories: True Positives (correctly flagged), False Positives (incorrectly flagged — Type I error), True Negatives (correctly passed), and False Negatives (missed — Type II error). Precision = TP / (TP + FP). Recall = TP / (TP + FN). F1 = 2 · (Precision · Recall) / (Precision + Recall).
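
These definitions translate directly into code. A minimal Python sketch that tallies the four cells and derives the three metrics; the label lists are invented for illustration:

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Tally TP, FP, TN, FN for binary labels (1 = positive)."""
    c = Counter(zip(y_true, y_pred))
    return c[(1, 1)], c[(0, 1)], c[(0, 0)], c[(1, 0)]

def precision_recall_f1(tp, fp, fn):
    """Compute the three metrics, guarding against zero denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy labels: 1 = positive class (e.g., "spam").
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
p, r, f1 = precision_recall_f1(tp, fp, fn)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")        # TP=3 FP=1 TN=3 FN=1
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # all 0.75
```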

The Trade-off in Practice

Most classifiers output a confidence score, and you choose a threshold above which to predict "positive." A low threshold catches more positives (high recall) but creates more false positives (low precision). A high threshold is more selective (high precision) but misses more positives (low recall). The optimal threshold depends on costs: in medical screening, missing a disease (false negative) is worse than a false alarm. In spam filtering, marking a real email as spam (false positive) is worse than letting spam through.
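
A short sketch of sweeping the threshold; the scores and labels below are invented classifier outputs, but the printed results show the usual pattern of precision rising and recall falling as the threshold climbs:

```python
# Invented classifier scores (descending) and ground-truth labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    1,    0,    1,    1,    0,    0,    1,    0]

for threshold in (0.25, 0.50, 0.75):
    # Predict "positive" whenever the score clears the threshold.
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, labels) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn)
    print(f"threshold={threshold:.2f}  "
          f"precision={precision:.2f}  recall={recall:.2f}")

# threshold=0.25  precision=0.62  recall=0.83   (lenient: catches more)
# threshold=0.50  precision=0.80  recall=0.67
# threshold=0.75  precision=1.00  recall=0.50   (strict: misses more)
```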

Beyond Binary

For multi-class problems, precision and recall are computed per class and then averaged. Macro-averaging treats all classes equally. Micro-averaging pools the TP/FP/FN counts across all instances before computing the metric, so frequent classes dominate. Weighted averaging averages the per-class scores weighted by each class's support. The choice matters: if 90% of your data is class A, the micro-average will be dominated by class A's performance, potentially hiding poor results on minority classes. In AI fairness work, per-class metrics are essential for ensuring the model works well for all groups.
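
A sketch contrasting macro- and micro-averaged precision on a hypothetical three-class problem where class A holds most of the data; all counts are made up:

```python
# Per-class (TP, FP, FN) counts; class "A" dominates the data.
counts = {
    "A": (90, 5, 5),  # majority class, strong performance
    "B": (3, 4, 5),   # minority class, weak performance
    "C": (2, 3, 6),   # minority class, weak performance
}

# Macro: compute precision per class, then average equally.
per_class = {c: tp / (tp + fp) for c, (tp, fp, fn) in counts.items()}
macro_precision = sum(per_class.values()) / len(per_class)

# Micro: pool all counts first, then compute one global precision.
total_tp = sum(tp for tp, fp, fn in counts.values())
total_fp = sum(fp for tp, fp, fn in counts.values())
micro_precision = total_tp / (total_tp + total_fp)

print({c: round(p, 2) for c, p in per_class.items()})
# {'A': 0.95, 'B': 0.43, 'C': 0.4}
print(f"macro={macro_precision:.2f}  micro={micro_precision:.2f}")
# macro=0.59  micro=0.89
```

The macro score is dragged down by the weak minority classes, while the micro score stays high because class A's counts dominate the pooled totals.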
