Qwen Omni Turbo
Alibaba
🧠 Language Model
Multimodal model that accepts text, image, and audio input and produces text or audio output. Supports 49 voices for speech synthesis.
Specifications
Context Window: 32,768 tokens
Max Output: 8,192 tokens
Speed: Fast
Modalities: Input: text, image, audio · Output: text, audio
Features: Vision: Yes · Tools: No · Streaming: Yes
Pricing
Included with plan