
Named Entity Recognition

NER, Entity Extraction
Identifies named entities in text and classifies them — people, organizations, locations, dates, monetary amounts, and other proper nouns. In "Apple announced a $3 billion investment in Munich on Tuesday," NER identifies Apple (organization), $3 billion (money), Munich (location), and Tuesday (date). It is a foundational NLP task used in information extraction, search, and knowledge graph construction.

Why It Matters

NER is the backbone of extracting structured information from unstructured text. Every search engine, news aggregator, and intelligence system uses NER to understand what a document is about. It is also the first step in building knowledge graphs from text — you cannot create relationships between entities you have not identified.

Deep Dive

NER is typically framed as a sequence labeling task: each token gets a label like B-PER (beginning of person name), I-PER (inside person name), O (not an entity). The BIO tagging scheme handles multi-word entities: "New" gets B-LOC, "York" gets I-LOC. Fine-tuned BERT models are the standard for high-accuracy NER, though spaCy's built-in NER is popular for quick, good-enough extraction.
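The BIO scheme above can be made concrete with a small decoder that groups labeled tokens back into entity spans. This is a minimal sketch assuming a tagger has already produced the tokens and labels; the `bio_to_spans` helper and the example labels are illustrative, not from any particular library.

```python
def bio_to_spans(tokens, labels):
    """Group BIO-labeled tokens into (entity_text, entity_type) spans."""
    spans = []
    current_tokens, current_type = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):          # a new entity begins
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)    # continue the current entity
        else:                               # "O" tag or inconsistent I- tag
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:                      # flush an entity ending at EOS
        spans.append((" ".join(current_tokens), current_type))
    return spans

tokens = ["Apple", "announced", "in", "New", "York", "on", "Tuesday"]
labels = ["B-ORG", "O", "O", "B-LOC", "I-LOC", "O", "B-DATE"]
print(bio_to_spans(tokens, labels))
# → [('Apple', 'ORG'), ('New York', 'LOC'), ('Tuesday', 'DATE')]
```

Note how "New York" is recovered as one LOC span from its B-LOC/I-LOC pair — this span-merging step is exactly what the BIO scheme exists to support.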

Domain-Specific NER

General NER models handle common entity types (person, org, location, date). Domain-specific applications need custom types: medical NER extracts drugs, symptoms, and dosages. Legal NER extracts case numbers, statutes, and parties. Financial NER extracts ticker symbols, financial metrics, and regulatory references. These require domain-specific training data, which is expensive to annotate but dramatically improves extraction quality in specialized contexts.
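Before investing in annotation, domain teams often start with rule-based extraction to bootstrap or sanity-check custom entity types. The sketch below shows this for a financial domain; the regex patterns and the TICKER/METRIC labels are illustrative assumptions, not a substitute for a trained model.

```python
import re

# Hypothetical patterns for two financial entity types. Real systems
# would combine rules like these with a trained statistical model.
PATTERNS = {
    "TICKER": re.compile(r"\$[A-Z]{1,5}\b"),          # e.g. $AAPL
    "METRIC": re.compile(r"\b(?:EPS|EBITDA|P/E)\b"),  # common metrics
}

def extract_financial_entities(text):
    """Return (matched_text, entity_type) pairs for each pattern hit."""
    entities = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append((match.group(), entity_type))
    return entities

text = "Q3 EPS beat estimates, sending $AAPL up 4%."
print(extract_financial_entities(text))
# → [('$AAPL', 'TICKER'), ('EPS', 'METRIC')]
```

Rules like these also make cheap weak labels for pre-annotating documents, which can cut the cost of the manual annotation the paragraph above describes.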

NER with LLMs

LLMs can perform NER through prompting: "Extract all person names and organizations from this text and return as JSON." This is slower and more expensive than dedicated NER models but handles novel entity types without training data and works across languages out of the box. For production systems processing millions of documents, dedicated models win on cost. For ad-hoc extraction or uncommon entity types, LLMs win on flexibility.
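The prompting approach can be sketched as a prompt-build / call / parse pipeline. Here `call_llm` is a hypothetical stand-in for whatever chat-completion client you use, stubbed with a canned response so the example runs offline; the JSON schema in the prompt is an illustrative choice.

```python
import json

def build_ner_prompt(text):
    """Ask the model for entities in a machine-parseable JSON shape."""
    return (
        "Extract all person names and organizations from the text below. "
        'Return JSON: {"entities": [{"text": "...", "type": "PER" or "ORG"}]}\n\n'
        f"Text: {text}"
    )

def call_llm(prompt):
    # Hypothetical stub: replace with a real API call in production.
    return (
        '{"entities": [{"text": "Tim Cook", "type": "PER"}, '
        '{"text": "Apple", "type": "ORG"}]}'
    )

def llm_ner(text):
    """Run the prompt and parse the model's JSON answer into a list."""
    raw = call_llm(build_ner_prompt(text))
    return json.loads(raw)["entities"]

print(llm_ner("Tim Cook said Apple will expand in Munich."))
# → [{'text': 'Tim Cook', 'type': 'PER'}, {'text': 'Apple', 'type': 'ORG'}]
```

Note that real model output is not guaranteed to be valid JSON, so production code wraps the parse in error handling or uses a structured-output mode when the API offers one.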

Related Concepts
