Basics

Classification

Classifier, Categorization
The task of assigning an input to one of a set of predefined categories. "Is this email spam or not?" (binary classification). "Is this image a cat, a dog, or a bird?" (multi-class). "Which of these tags apply to this article?" (multi-label). Classification is the most common supervised learning task and the foundation of countless real-world AI applications.

Why It Matters

Classification is where most people first encounter machine learning in practice: spam filtering, content moderation, medical diagnosis, fraud detection, sentiment analysis. Understanding classification helps you understand the entire supervised learning pipeline: labeled data goes in, a model is trained, and predictions come out.

Deep Dive

A classifier outputs a probability distribution over classes. For binary classification, a single number between 0 and 1 suffices (the probability of the positive class). For multi-class, the model outputs a probability for each class, typically using a softmax function to ensure they sum to 1. The predicted class is usually the one with the highest probability, but you can adjust the decision threshold based on your tolerance for false positives vs. false negatives.
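The two ideas above, softmax over class scores and an adjustable decision threshold, can be sketched in a few lines. The logit values and the class ordering are illustrative assumptions, not from any particular model:

```python
import math

def softmax(logits):
    """Convert raw model scores (logits) into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Multi-class: the predicted class is usually the highest-probability one.
logits = [2.0, 1.0, 0.1]  # hypothetical scores for cat / dog / bird
probs = softmax(logits)
predicted = max(range(len(probs)), key=lambda i: probs[i])  # index 0: "cat"

# Binary: tune the threshold instead of always using 0.5.
# A lower threshold trades more false positives for fewer false negatives.
def classify(p_positive, threshold=0.5):
    return "positive" if p_positive >= threshold else "negative"
```

Lowering the threshold (e.g. `classify(0.4, threshold=0.3)`) flips borderline cases to positive, which is exactly the false-positive/false-negative trade-off described above.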

LLMs as Classifiers

Modern LLMs are surprisingly good classifiers. Instead of training a dedicated model, you can prompt an LLM: "Classify this customer review as positive, negative, or neutral." For many classification tasks, this zero-shot approach matches or exceeds purpose-built classifiers, especially when the task requires understanding nuance or context. The trade-off is cost and latency — an LLM API call is much more expensive than running a small classifier locally.
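A minimal sketch of zero-shot classification with an LLM. The prompt wording and the label set are assumptions for illustration, and the actual API call is omitted; the useful pattern is building a constrained prompt and mapping the model's free-form reply back onto a fixed label set:

```python
LABELS = ["positive", "negative", "neutral"]

def build_prompt(review: str) -> str:
    """Wrap the input in a zero-shot classification instruction."""
    return (
        "Classify this customer review as positive, negative, or neutral. "
        "Answer with a single word.\n\n"
        f"Review: {review}"
    )

def parse_label(llm_output: str) -> str:
    """Map the LLM's free-form text back onto the fixed label set."""
    text = llm_output.strip().lower()
    for label in LABELS:
        if label in text:
            return label
    return "neutral"  # fall back rather than crash on unexpected output
```

In practice you would send `build_prompt(review)` to your LLM API of choice and run `parse_label` on the response; the parsing step matters because even well-prompted models occasionally answer with extra words or punctuation.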

Metrics That Matter

Accuracy (percent correct) is the most intuitive metric but can be misleading. If 99% of emails are not spam, a model that always predicts "not spam" gets 99% accuracy but catches zero spam. Precision (of predicted positives, how many are correct), recall (of actual positives, how many were found), and F1 (harmonic mean of precision and recall) give a more complete picture. The right metric depends on the cost of errors in your specific application.
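The spam example above can be worked through directly. This sketch uses the 99%-ham class balance from the text and a degenerate model that always predicts "ham":

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall, and F1 for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# 99 ham emails, 1 spam email; the model always predicts "ham".
y_true = ["ham"] * 99 + ["spam"]
y_pred = ["ham"] * 100

# Accuracy looks great (0.99), but recall and F1 on spam are both 0:
# the model catches zero spam, exactly the failure mode described above.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

This is why imbalanced problems are usually reported with per-class precision/recall/F1 rather than overall accuracy.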

Related Concepts
