Jina VLM

Jina AI 🧠 Language Model

Jina AI's 2.4B vision-language model — state-of-the-art multilingual visual QA among 2B-scale VLMs. Processes images up to 4K resolution across 29 languages.

Specifications

Context Window32,768 tokens
Speed Fast
ModalitiesInput: text, image  ·  Output: text
FeaturesVision: Yes Tools: No Streaming: Yes

Pricing

Included with plan

Use Jina VLM on Zubnet →