Together's inference stack is optimized for open models, offering competitive pricing by running models efficiently on their own GPU clusters. They support a wide range of models (often adding new releases within days) with OpenAI-compatible APIs, making it easy to switch from proprietary to open models. Their fine-tuning service lets you customize open models on your data without managing training infrastructure.
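Because the API is OpenAI-compatible, switching providers mostly means changing the base URL and model name. Here is a minimal sketch using only the standard library to build such a request; the endpoint path follows the OpenAI chat-completions convention, and the model string is an illustrative assumption (check Together's model list for current names):

```python
import json
import urllib.request

# Together exposes an OpenAI-compatible endpoint, so the request shape is
# identical across providers; only the base URL and model string change.
TOGETHER_BASE = "https://api.together.xyz/v1"

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Build an OpenAI-style chat completion request (constructed, not sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request(
    TOGETHER_BASE,
    "YOUR_TOGETHER_API_KEY",                          # placeholder key
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",        # example open model
    "Summarize attention in one sentence.",
)
# To send it: urllib.request.urlopen(req)
```

With the official OpenAI SDK, the same switch is just `OpenAI(base_url="https://api.together.xyz/v1", api_key=...)` with an open-model name in place of a proprietary one; the rest of the application code stays unchanged.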
Together positions itself as infrastructure for the open model ecosystem. They partner with model creators (Meta, Mistral, and others), contribute to research (FlashAttention was co-developed by Together researchers), and provide the serving layer that makes open models accessible to developers who don't want to manage GPUs. This "model cloud" layer matters more as open models approach proprietary quality on many tasks, since the bottleneck shifts from model capability to affordable, reliable serving.