Step 1o Turbo Vision
StepFun
🧠 Language Model
StepFun's vision-language reasoning model. It accepts text and images as input and uses extended thinking for complex visual analysis.
Specifications
Context Window: 32,000 tokens
Max Output: 8,192 tokens
Speed: Average
Modalities: Input: text, image · Output: text
Features: Vision: Yes · Tools: Yes · Streaming: Yes
Pricing
Included with plan