
Question Answering

QA, Reading Comprehension
A system that answers questions posed in natural language. Extractive QA locates the span of a given document that contains the answer ("according to paragraph 3, the answer is..."). Generative QA synthesizes an answer from one or more sources. Open-domain QA answers arbitrary questions without being given a specific document. RAG-based QA retrieves relevant documents and generates an answer from them.

Why It Matters

Question answering is the fundamental interaction pattern of AI assistants. Every chatbot, every enterprise knowledge base, every customer-service bot is at heart a QA system. Understanding the different QA paradigms (extractive, generative, retrieval-augmented) helps you choose the right architecture for your application and set realistic expectations about accuracy.

Deep Dive

Extractive QA (the SQuAD paradigm): given a document and a question, identify the exact span of text that answers the question. Fine-tuned BERT models excel at this — they read the document, understand the question, and highlight the answer. This is fast, accurate, and verifiable (the answer is always a direct quote). But it can only answer questions whose answers appear verbatim in the document.
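The span-selection step above can be sketched in miniature. This is a toy illustration, not real model output: the scores below are hypothetical stand-ins for the per-token start and end logits a fine-tuned QA head would emit, and the predicted answer is the highest-scoring valid span.

```python
def best_span(start_scores, end_scores, max_len=15):
    """Return (start, end) token indices maximizing start + end score,
    subject to start <= end and a maximum answer length."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best

tokens = ["The", "Eiffel", "Tower", "is", "in", "Paris"]
# Hypothetical logits, as if produced for "Where is the Eiffel Tower?"
start = [0.1, 0.2, 0.1, 0.0, 0.3, 4.0]
end   = [0.0, 0.1, 0.2, 0.1, 0.2, 4.5]
i, j = best_span(start, end)
print(" ".join(tokens[i:j + 1]))  # -> Paris
```

Because the answer is always a literal span of the document, the prediction is trivially verifiable against the source.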

RAG-Based QA

The dominant modern pattern: (1) user asks a question, (2) retrieve relevant documents from a knowledge base using semantic search, (3) include the retrieved documents in the LLM's context, (4) the LLM generates an answer based on the retrieved context. This combines the precision of retrieval with the fluency of generation. The key challenges are retrieval quality (finding the right documents) and faithfulness (generating answers that accurately reflect the source material).
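The four steps above can be sketched end-to-end. This is a minimal illustration under simplifying assumptions: a real system would use an embedding model for retrieval and an LLM for generation, whereas here the retriever is a toy bag-of-words cosine similarity and the pipeline stops at the assembled prompt (the LLM call itself is out of scope).

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    # Toy stand-in for an embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, docs, k=1):
    # Step 2: rank knowledge-base documents by similarity to the question.
    q = vectorize(question)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, docs):
    # Step 3: place retrieved documents in the model's context,
    # instructing it to stay faithful to the sources.
    context = "\n".join(f"- {d}" for d in docs)
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

docs = [
    "The refund window is 30 days from the date of purchase.",
    "Shipping normally takes 3-5 business days.",
]
top = retrieve("How long is the refund window?", docs)
print(build_prompt("How long is the refund window?", top))
```

The "answer using only the context" instruction is one common (if imperfect) lever for the faithfulness challenge; retrieval quality is governed by the ranking step.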

Evaluation

QA accuracy is measured differently for each paradigm. Extractive QA uses exact match (EM) and F1 score against ground-truth answer spans. Generative QA is harder to evaluate automatically — multiple valid phrasings exist for any answer. RAGAS and similar frameworks evaluate RAG-based QA on faithfulness (does the answer match the source?), relevance (did you retrieve the right documents?), and answer quality. Human evaluation remains the gold standard for generative QA.
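The extractive metrics are simple enough to sketch. The following follows the general shape of SQuAD-style scoring (lowercasing, dropping punctuation and articles before comparison); exact normalization rules vary between evaluation scripts, so treat this as an illustrative approximation.

```python
import re
from collections import Counter

def normalize(text):
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)          # drop punctuation
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # drop articles
    return " ".join(text.split())

def exact_match(pred, truth):
    # EM: 1 if the normalized strings are identical, else 0.
    return normalize(pred) == normalize(truth)

def f1(pred, truth):
    # Token-level F1: harmonic mean of precision and recall
    # over the overlapping tokens.
    p, t = normalize(pred).split(), normalize(truth).split()
    common = Counter(p) & Counter(t)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(t)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Paris", "Paris"))         # -> True (article ignored)
print(round(f1("in Paris France", "Paris"), 2))  # -> 0.5
```

F1 gives partial credit when the prediction overlaps but does not exactly match the gold span, which is why it is reported alongside the stricter EM. No such string-overlap metric transfers cleanly to generative QA, which is why frameworks like RAGAS and human judgment are used there instead.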

Related Concepts
