
Named Entity Recognition

NER, Entity Extraction
Identifies named entities in text and classifies them: people, organizations, locations, dates, monetary amounts, and other proper nouns. In "Apple announced a $3 billion investment in Munich on Tuesday", NER identifies Apple (organization), $3 billion (money), Munich (location), and Tuesday (date). It is a foundational NLP task used in information extraction, search, and knowledge graph construction.

Why It Matters

NER is the backbone of extracting structured information from unstructured text. Every search engine, news aggregator, and intelligence system uses NER to understand what a document is about. It is also the first step in building knowledge graphs from text: you cannot build relationships between entities you have not identified.

Deep Dive

NER is typically framed as a sequence labeling task: each token gets a label like B-PER (beginning of person name), I-PER (inside person name), O (not an entity). The BIO tagging scheme handles multi-word entities: "New" gets B-LOC, "York" gets I-LOC. Fine-tuned BERT models are the standard for high-accuracy NER, though spaCy's built-in NER is popular for quick, good-enough extraction.
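The BIO scheme above can be decoded into entity spans with a short routine. A minimal sketch in plain Python (the token/tag pairs are illustrative, not the output of a real tagger):

```python
def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) spans."""
    entities, current, ent_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity, closing any open one.
            if current:
                entities.append((" ".join(current), ent_type))
            current, ent_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == ent_type:
            # An I- tag continues the open entity of the same type.
            current.append(token)
        else:
            # "O", or an I- tag that doesn't legally continue anything.
            if current:
                entities.append((" ".join(current), ent_type))
            current, ent_type = [], None
    if current:
        entities.append((" ".join(current), ent_type))
    return entities

tokens = ["Apple", "opened", "an", "office", "in", "New", "York", "."]
tags   = ["B-ORG", "O", "O", "O", "O", "B-LOC", "I-LOC", "O"]
print(bio_to_entities(tokens, tags))
# → [('Apple', 'ORG'), ('New York', 'LOC')]
```

This decoding step is the same regardless of whether the tags come from a fine-tuned BERT model or a simpler tagger; only the quality of the tags differs.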

Domain-Specific NER

General NER models handle common entity types (person, org, location, date). Domain-specific applications need custom types: medical NER extracts drugs, symptoms, and dosages. Legal NER extracts case numbers, statutes, and parties. Financial NER extracts ticker symbols, financial metrics, and regulatory references. These require domain-specific training data, which is expensive to annotate but dramatically improves extraction quality in specialized contexts.
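Before annotated training data exists, one lightweight starting point for a domain is a gazetteer (dictionary) matcher combined with regexes for pattern-like types such as dosages. A minimal sketch with made-up term lists; the function name and entity labels are illustrative, not a standard API:

```python
import re

def extract_medical_entities(text, drug_terms):
    """Tiny domain extractor: gazetteer match for drugs, regex for dosages."""
    entities = []
    # Longest terms first so longer drug names win over substrings.
    drug_pat = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in sorted(drug_terms, key=len, reverse=True)) + r")\b",
        re.IGNORECASE,
    )
    dose_pat = re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|ml|mcg|g)\b", re.IGNORECASE)
    for m in drug_pat.finditer(text):
        entities.append((m.start(), m.group(0), "DRUG"))
    for m in dose_pat.finditer(text):
        entities.append((m.start(), m.group(0), "DOSAGE"))
    # Sort by position so entities come out in reading order.
    return [(surface, label) for _, surface, label in sorted(entities)]

print(extract_medical_entities(
    "Take 400 mg of ibuprofen; avoid aspirin.",
    ["ibuprofen", "aspirin"],
))
# → [('400 mg', 'DOSAGE'), ('ibuprofen', 'DRUG'), ('aspirin', 'DRUG')]
```

Gazetteers miss misspellings and novel terms, which is exactly the gap that annotated training data and a learned model close.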

NER with LLMs

LLMs can perform NER through prompting: "Extract all person names and organizations from this text and return as JSON." This is slower and more expensive than dedicated NER models but handles novel entity types without training data and works across languages out of the box. For production systems processing millions of documents, dedicated models win on cost. For ad-hoc extraction or uncommon entity types, LLMs win on flexibility.
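A prompt-based extraction pipeline mostly reduces to building the instruction and validating the model's JSON output. A minimal sketch; the prompt wording and entity labels are assumptions, and the model response is mocked rather than fetched from a real API:

```python
import json

ENTITY_TYPES = ["PERSON", "ORG", "LOC", "DATE", "MONEY"]

def build_prompt(text):
    """Ask the model for strict JSON, which is far easier to parse reliably."""
    return (
        f"Extract all entities of types {', '.join(ENTITY_TYPES)} from the text.\n"
        'Return ONLY a JSON list like [{"text": "...", "type": "..."}].\n\n'
        f"Text: {text}"
    )

def parse_entities(raw_response):
    """Validate model output: drop malformed items and unknown entity types."""
    try:
        items = json.loads(raw_response)
    except json.JSONDecodeError:
        return []
    return [
        (item["text"], item["type"])
        for item in items
        if isinstance(item, dict) and item.get("type") in ENTITY_TYPES and "text" in item
    ]

# Mocked response for illustration; a real system would send build_prompt(...)
# to an LLM API and pass the reply to parse_entities.
mock_response = '[{"text": "Apple", "type": "ORG"}, {"text": "Munich", "type": "LOC"}]'
print(parse_entities(mock_response))
# → [('Apple', 'ORG'), ('Munich', 'LOC')]
```

The validation step matters in practice: LLMs occasionally return prose around the JSON or invent entity types, and silently accepting that corrupts downstream data.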
