A model that can understand and/or generate multiple types of data: text, images, audio, video, code. Claude can read images and text; some models can also produce images or speech. "Multimodal" contrasts with "unimodal" models that only handle one type.
Why it matters
Real-world tasks are multimodal. You want to show an AI a screenshot and ask "what's wrong here?" or give it a diagram and say "implement this." Multimodal models make that possible.