Infrastructure

Drift Detection

Data Drift, Model Drift, Concept Drift
Monitoring for changes over time in data distribution or model behavior that can degrade performance. Data drift: the input data changes (customer demographics shift, new product categories appear). Concept drift: the relationship between inputs and correct outputs changes (what constitutes spam evolves). Model drift: the model's predictions gradually become less accurate even though the model itself has not changed.

Why it matters

Models are trained on historical data, but the world keeps changing. A fraud detection model trained in 2024 will miss new fraud patterns that appear in 2025. A recommendation system trained on pre-pandemic behavior will make poor suggestions post-pandemic. Drift detection catches these degradations before they become costly, alerting you that the model needs retraining or updating.

Deep Dive

Data drift detection: compare the statistical distribution of current inputs to the training data distribution. If features shift significantly, as measured by the Kolmogorov-Smirnov (KS) test, the Population Stability Index (PSI), or Jensen-Shannon divergence, the model may be operating outside its training distribution. Example: a credit scoring model trained on applicants aged 25–55 starts receiving applications from 18-year-olds, a population it has never seen.
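As a rough illustration of the comparison described above, here is a minimal PSI sketch using only the Python standard library. The function name `psi`, the quantile-based binning, the `eps` floor for empty bins, and the synthetic age samples are all assumptions made for this example; production monitoring libraries may bin and smooth differently.

```python
import bisect
import math
import random


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index of `current` relative to `reference`."""
    # Interior bin edges are quantiles of the reference sample, so each
    # bin holds roughly the same share of the reference data.
    ref_sorted = sorted(reference)
    edges = [ref_sorted[len(ref_sorted) * i // n_bins] for i in range(1, n_bins)]

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect.bisect_left(edges, v)] += 1
        # Floor each fraction at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    p = bin_fractions(reference)
    q = bin_fractions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


rng = random.Random(0)  # synthetic data, for illustration only
training_ages = [rng.uniform(25, 55) for _ in range(5000)]
similar_ages = [rng.uniform(25, 55) for _ in range(5000)]
younger_ages = [rng.uniform(18, 40) for _ in range(5000)]

print(f"PSI, similar population: {psi(training_ages, similar_ages):.4f}")
print(f"PSI, younger population: {psi(training_ages, younger_ages):.4f}")
```

A commonly quoted rule of thumb (not taken from this article) treats PSI below 0.1 as stable and above 0.25 as a significant shift; the right thresholds depend on the feature and the cost of a false alarm.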

Concept Drift

Concept drift is harder to detect because the inputs look the same but the correct outputs change. During COVID, "normal" purchase patterns shifted dramatically — buying 100 rolls of toilet paper went from "probable fraud" to "Tuesday." The model's predictions became wrong not because the model degraded, but because reality changed. Detecting concept drift requires comparing predictions to ground truth, which often arrives with a delay.
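One way to handle delayed ground truth is to store each prediction under an ID and score it only when its label arrives. The sketch below is a hypothetical helper, not a standard API: the class name `DelayedLabelMonitor`, the window size, and the tolerance are illustrative assumptions.

```python
from collections import deque


class DelayedLabelMonitor:
    """Tracks rolling accuracy as ground-truth labels arrive late."""

    def __init__(self, baseline_accuracy, window=200, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.pending = {}                      # prediction_id -> predicted label
        self.outcomes = deque(maxlen=window)   # 1 = correct, 0 = wrong

    def record_prediction(self, prediction_id, predicted):
        self.pending[prediction_id] = predicted

    def record_label(self, prediction_id, actual):
        # Labels for unknown IDs are ignored rather than raising.
        if prediction_id not in self.pending:
            return
        predicted = self.pending.pop(prediction_id)
        self.outcomes.append(1 if predicted == actual else 0)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drift_suspected(self):
        # Only judge once the window is full, to avoid noisy early alerts.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance


monitor = DelayedLabelMonitor(baseline_accuracy=0.9, window=10)
for i in range(10):
    monitor.record_prediction(i, "legit")
for i in range(10):
    monitor.record_label(i, "legit")           # labels arrive later, all correct
stable = monitor.drift_suspected()

for i in range(10, 20):
    monitor.record_prediction(i, "legit")
for i in range(10, 20):
    # Half of the newer cases turn out to be fraud: the relationship changed.
    monitor.record_label(i, "fraud" if i % 2 == 0 else "legit")
drifted = monitor.drift_suspected()
```

A drop in rolling accuracy does not by itself distinguish concept drift from data drift; it only signals that predictions and ground truth have started to disagree more than the baseline allows.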

For LLMs

LLM drift manifests differently: user query patterns shift (new topics emerge), provider model updates change behavior (API model versions change silently), and the world changes (outdated training data). Monitoring strategies include: tracking output quality scores over time, detecting shifts in topic distribution of queries, alerting on increases in user-reported issues, and periodically re-evaluating on a fixed benchmark to detect provider-side changes.
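The fixed-benchmark strategy can be sketched as follows. Everything here is hypothetical: `generate` stands in for whatever function calls your provider, the substring match is a deliberately crude scoring rule, and the stub models and `max_drop` threshold exist only to make the example runnable.

```python
def benchmark_score(generate, benchmark):
    """Fraction of prompts whose output contains the expected answer."""
    hits = 0
    for prompt, expected in benchmark:
        output = generate(prompt)
        if expected.lower() in output.lower():
            hits += 1
    return hits / len(benchmark)


def provider_drift_suspected(baseline_score, current_score, max_drop=0.05):
    """True when the score fell by more than max_drop since the baseline."""
    return baseline_score - current_score > max_drop


# A small fixed benchmark; in practice it should stay unchanged between runs.
BENCHMARK = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
    ("Opposite of hot?", "cold"),
    ("Chemical symbol for water?", "H2O"),
]


def model_before_update(prompt):
    # Stub standing in for a provider call (assumption for this example).
    answers = dict(BENCHMARK)
    return f"The answer is {answers[prompt]}."


def model_after_update(prompt):
    # Same stub, but one answer has silently changed.
    if prompt == "Opposite of hot?":
        return "It depends on context."
    return model_before_update(prompt)


baseline = benchmark_score(model_before_update, BENCHMARK)
current = benchmark_score(model_after_update, BENCHMARK)
alert = provider_drift_suspected(baseline, current)
```

Running the same benchmark on a schedule and storing each score gives a time series in which a sudden step change points to a provider-side update rather than a gradual shift in user queries.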
