Extractive summarization identifies the most important sentences using techniques like TextRank (a graph-based algorithm inspired by PageRank) or BERT-based sentence scoring. The summary is a subset of the original sentences, which guarantees that every summary sentence appears verbatim in the source but can produce awkward, disconnected text. Abstractive summarization uses sequence-to-sequence models (T5, BART, or LLMs) to generate new text, producing more fluent summaries but risking hallucination — adding information not in the original.
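The TextRank idea can be sketched in a few dozen lines: sentences are graph nodes, edge weights are word-overlap similarity, and a PageRank-style power iteration scores each sentence. This is a minimal illustrative version (real implementations use better similarity measures and tokenization); the function name and parameters are my own, not from any library.

```python
import re

def textrank_summary(text, k=2, damping=0.85, iters=50):
    """Return the top-k sentences by TextRank score, in original order."""
    sents = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    words = [set(re.findall(r'\w+', s.lower())) for s in sents]
    n = len(sents)
    if n <= k:
        return sents
    # Edge weights: normalized word overlap between sentence pairs.
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and words[i] and words[j]:
                sim[i][j] = len(words[i] & words[j]) / (len(words[i]) + len(words[j]))
    row = [sum(r) for r in sim]  # out-degree mass per node
    scores = [1.0 / n] * n
    # PageRank-style power iteration over the similarity graph.
    for _ in range(iters):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / row[j] * scores[j] for j in range(n) if row[j] > 0
            )
            for i in range(n)
        ]
    top = sorted(sorted(range(n), key=lambda i: -scores[i])[:k])
    return [sents[i] for i in top]
```

Because the output is a subset of input sentences, the "no invented text" property of extractive summarization holds by construction.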
LLMs have made summarization nearly a solved problem for documents that fit in the context window. "Summarize this article in 3 bullet points" produces surprisingly good results with zero fine-tuning. The remaining challenges: summarizing documents longer than the context window (requiring chunking strategies), maintaining factual accuracy (LLMs sometimes "enhance" the summary with plausible but fabricated details), and controlling output length precisely.
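The chunking challenge starts with splitting the document without cutting sentences in half. A rough sketch, assuming a word-count budget as a stand-in for a real tokenizer's token count (the function name and threshold are illustrative):

```python
import re

def chunk_document(text, max_words=500):
    """Split text into chunks of at most max_words, on sentence boundaries."""
    sents = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    chunks, current, count = [], [], 0
    for s in sents:
        w = len(s.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + w > max_words:
            chunks.append(' '.join(current))
            current, count = [], 0
        current.append(s)
        count += w
    if current:
        chunks.append(' '.join(current))
    return chunks
```

In production you would count tokens with the model's own tokenizer and typically leave headroom for the prompt and the generated summary.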
Common summarization patterns in production: map-reduce (split long document into chunks, summarize each chunk, then summarize the summaries), hierarchical (summarize sections, then summarize section summaries), and rolling (maintain a running summary that gets updated as new content is added). For meeting transcripts, speaker-attributed summarization ("Sarah proposed X, Pierre raised concern Y") is more useful than generic summarization.
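The map-reduce pattern above reduces to two steps: summarize each chunk independently, then summarize the concatenation of those partial summaries. A minimal sketch, where the `summarize` stub (it just returns the first sentence) stands in for a real LLM call:

```python
import re

def summarize(text):
    # Placeholder for an LLM call; here it simply returns the first sentence.
    sents = re.split(r'(?<=[.!?])\s+', text.strip())
    return sents[0]

def map_reduce_summary(chunks):
    partials = [summarize(c) for c in chunks]  # map: one summary per chunk
    combined = ' '.join(partials)
    return summarize(combined)                 # reduce: summary of summaries
```

If the combined partial summaries are themselves too long for the context window, the reduce step can be applied recursively — which is essentially the hierarchical variant.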