Basics

Precision & Recall

F1 Score, Confusion Matrix
Two complementary metrics for evaluating classifiers. Precision answers: of the items the model flagged as positive, how many actually are? Recall answers: of all the actual positives, how many did the model find? A high-precision spam filter rarely marks real email as spam; a high-recall one catches most of the spam. The F1 score is their harmonic mean, a single number that balances the two.

Why It Matters

Accuracy alone is misleading. If only 0.1% of transactions are fraudulent, a model that never predicts "fraud" achieves 99.9% accuracy yet is completely useless. Precision and recall expose the trade-off: catching more fraud (higher recall) means more false alarms (lower precision), and vice versa. Every classification system in production is tuned around this trade-off.
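A tiny numeric sketch of that failure mode; the dataset and the always-negative "model" below are invented for illustration:

```python
# Hypothetical imbalanced dataset: 1 fraud case in 1,000 transactions.
y_true = [1] + [0] * 999          # 1 = fraud, 0 = legitimate
y_pred = [0] * 1000               # a "model" that never predicts fraud

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)

print(f"accuracy: {accuracy:.1%}")  # 99.9% -- looks great
print(f"recall:   {recall:.1%}")    # 0.0%  -- catches no fraud at all
```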

Deep Dive

The confusion matrix organizes predictions into four categories: True Positives (correctly flagged), False Positives (incorrectly flagged — Type I error), True Negatives (correctly passed), and False Negatives (missed — Type II error). Precision = TP / (TP + FP). Recall = TP / (TP + FN). F1 = 2 · (Precision · Recall) / (Precision + Recall).
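A minimal Python sketch of these definitions; the function name and example labels are illustrative, not from any particular library:

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 from binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives (Type I)
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives (Type II)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example with 3 TP, 1 FP, 2 FN:
y_true = [1, 1, 1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.6, 0.666...)
```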

The Trade-off in Practice

Most classifiers output a confidence score, and you choose a threshold above which to predict "positive." A low threshold catches more positives (high recall) but creates more false positives (low precision). A high threshold is more selective (high precision) but misses more positives (low recall). The optimal threshold depends on costs: in medical screening, missing a disease (false negative) is worse than a false alarm. In spam filtering, marking a real email as spam (false positive) is worse than letting spam through.
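A sketch of sweeping the threshold over hypothetical confidence scores, reusing the precision_recall_f1 helper from the sketch above:

```python
# Hypothetical confidence scores; in practice these come from your classifier.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
y_true = [1,    1,   1,   0,   1,   0,   1,   0,   0,   0]

for threshold in (0.25, 0.5, 0.75):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    p, r, f1 = precision_recall_f1(y_true, y_pred)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}")

# threshold=0.25  precision=0.71  recall=1.00  -> low bar: catch everything, more false alarms
# threshold=0.50  precision=0.80  recall=0.80
# threshold=0.75  precision=1.00  recall=0.60  -> high bar: selective, but misses positives
```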

Beyond Binary

For multi-class problems, precision and recall are computed per class and then averaged. Macro-averaging takes the unweighted mean of per-class scores, treating all classes equally. Micro-averaging pools the TP/FP/FN counts across all classes before computing the metric, so frequent classes dominate. Weighted averaging takes the mean of per-class scores weighted by each class's support. The choice matters: if 90% of your data is class A, the micro-average will be dominated by class A's performance, potentially hiding poor performance on minority classes. In AI fairness work, per-class metrics are essential for ensuring the model works well for all groups.
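A sketch contrasting the three averaging schemes on an invented 90/10 two-class split; the class labels and helper names are illustrative:

```python
from collections import Counter

def per_class_counts(y_true, y_pred, cls):
    """One-vs-rest TP/FP/FN counts for a single class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return tp, fp, fn

y_true = ["A"] * 9 + ["B"]   # 90% class A, 10% class B
y_pred = ["A"] * 10          # model always predicts A

classes = sorted(set(y_true))
counts = {c: per_class_counts(y_true, y_pred, c) for c in classes}
per_class_prec = {c: tp / (tp + fp) if tp + fp else 0.0
                  for c, (tp, fp, fn) in counts.items()}

# Macro: unweighted mean of per-class precision -> poor class-B performance shows up.
macro = sum(per_class_prec.values()) / len(classes)

# Micro: pool TP/FP across classes before dividing -> dominated by class A.
tp_all = sum(tp for tp, fp, fn in counts.values())
fp_all = sum(fp for tp, fp, fn in counts.values())
micro = tp_all / (tp_all + fp_all)

# Weighted: per-class precision weighted by support (true count per class).
support = Counter(y_true)
weighted = sum(per_class_prec[c] * support[c] for c in classes) / len(y_true)

print(f"macro={macro:.2f}  micro={micro:.2f}  weighted={weighted:.2f}")
# macro=0.45  micro=0.90  weighted=0.81
```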
