SenseTime, the Hong Kong-based computer vision pioneer that has been on US sanctions lists since 2019, released SenseNova U1 on Tuesday under an open license on Hugging Face and GitHub. The model's pitch combines a technical claim and a supply-chain claim. Technical: U1 generates and interprets images without first translating them into text tokens; as co-founder and chief scientist Dahua Lin (also a CUHK professor of information engineering) puts it, "the model's entire reasoning process is no longer limited to text; it can reason with images as well." Supply chain: 10 Chinese chip designers, including Cambricon and Biren Technology, announced compatibility on release day. The model is positioned as a Chinese-stack alternative to US image and multimodal frontier models, both architecturally and at the silicon layer.

The technical claim is the more interesting half, even if vendor benchmarks haven't been independently verified yet. Most current vision-language models (GPT-4o, Claude with vision, Gemini) handle images by encoding them into a sequence of discrete or continuous tokens that get fed into the same transformer that processes text, effectively translating sight into a language the model already understands. Native image-reasoning architectures skip the translation step, processing visual representations directly through the model's reasoning trace. If SenseTime has actually shipped this at production quality, it pulls forward a research direction (think Anole, Chameleon-class fully native multimodal) into a usable open-source artifact. Lin frames it as foundational for future robotics: "models capable of processing images directly will enable robots to better understand the physical world." That's the same architectural bet behind embodied-AI work at Figure, Physical Intelligence, and DeepMind's Gemini Robotics, but with a Chinese open-source license.
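To make the distinction concrete, here is a toy PyTorch sketch of the two designs. Everything in it is illustrative (dimensions, module names, the loop structure); it is not SenseTime's architecture, just the shape of the difference: the conventional model projects images into the text token space once, while the native model keeps a visual output head and can feed image latents straight back into its own reasoning trace.

```python
# Toy contrast between the two VLM designs. All names and sizes are
# illustrative assumptions, not SenseTime's actual architecture.
import torch
import torch.nn as nn

D = 256  # shared hidden size for the toy example

class TranslateThenReason(nn.Module):
    """Conventional VLM: image patches are projected into the text token
    space, and the LM reasons over them as if they were ordinary tokens."""
    def __init__(self, patch_dim=768, vocab=32000):
        super().__init__()
        self.projector = nn.Linear(patch_dim, D)  # the "translation" step
        layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
        self.lm = nn.TransformerEncoder(layer, num_layers=2)
        self.text_head = nn.Linear(D, vocab)

    def forward(self, image_patches, text_embeds):
        visual_tokens = self.projector(image_patches)        # image -> pseudo-text
        seq = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.text_head(self.lm(seq))                  # output is text-only

class NativeImageReasoning(nn.Module):
    """Native design: the trunk has a second, visual output head, so an
    intermediate reasoning step can be an image latent that is appended
    to the sequence directly -- no detour through the text vocabulary."""
    def __init__(self, patch_dim=768, vocab=32000):
        super().__init__()
        self.embed_image = nn.Linear(patch_dim, D)
        layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)
        self.text_head = nn.Linear(D, vocab)
        self.image_head = nn.Linear(D, D)  # emits a visual latent, not a token

    def forward(self, image_patches, text_embeds, steps=3):
        seq = torch.cat([self.embed_image(image_patches), text_embeds], dim=1)
        for _ in range(steps):                        # iterative reasoning trace
            h = self.trunk(seq)
            visual_thought = self.image_head(h[:, -1:, :])
            seq = torch.cat([seq, visual_thought], dim=1)  # reason *with* images
        return self.text_head(h)

# Smoke test with random data
patches = torch.randn(1, 196, 768)   # e.g. 14x14 ViT patch grid
text = torch.randn(1, 16, D)         # toy text embeddings
print(TranslateThenReason()(patches, text).shape)   # logits over text vocab
print(NativeImageReasoning()(patches, text).shape)  # same, but the trace grew
```

The second class is the harder engineering problem: the model has to be trained to emit visual latents that are actually useful mid-trace, which is why this has so far lived mostly in research artifacts like Chameleon and Anole rather than production systems.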

The supply-chain story is what makes this geopolitically loaded. SenseTime fell behind in the post-ChatGPT race, losing the spotlight to newer Chinese startups DeepSeek and MiniMax, both of which shipped frontier-class language models with notable open-source releases. With U1, SenseTime is doing something distinctive: shipping a model that 10 Chinese silicon vendors (Cambricon, Biren, and presumably Huawei Ascend, Moore Threads, Iluvatar, Enflame, and others) validated on day one. That coordination is the actual product. US export controls restrict Chinese access to top Nvidia chips for training, but inference is increasingly the binding constraint for production AI economics, and an open-source model that runs natively on Chinese accelerators is a hedge against the entire training-time sanctions regime. Lin admits SenseTime "may still need to use the best chips to ensure the speed of our iteration," meaning training quietly happens on whatever Nvidia hardware the company can secure, but inference can be entirely sovereign.

For builders, three takeaways. First, watch the benchmark community: Hugging Face and Twitter ML accounts will likely have independent eval numbers within days, and U1's claim of being "far faster than top US models" needs verification on standardized vision-language benchmarks (MMMU, MMBench, ScienceQA) before it can be trusted; a rough self-check harness is sketched below. Second, the multi-chip-vendor support pattern is replicable and quietly important: if you're building open-source models, designing for portability across heterogeneous accelerators (not just Nvidia) is becoming a strategic feature, not an afterthought; see the device-selection sketch after the harness. Third, this is another data point in the broader "open source as iteration speed" thesis. Lin's quote ("being open source or closed source is not the winning factor; the speed of iteration is") echoes the strategic bet DeepSeek and Mistral have made. The Chinese AI strategy under sanctions has converged on the same answer: ship open weights, accept the loss of proprietary moats, win on iteration velocity and ecosystem breadth. That's a more durable position than US frontier labs currently occupy.
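On the first takeaway: a self-check doesn't need a full eval framework; a timed loop over benchmark-style items is enough to triage the speed claim. The sketch below is hypothetical throughout: the repo id, prompt format, and processor behavior are assumptions to be checked against the model card once SenseTime publishes it, not a documented API.

```python
# Hypothetical sketch: the repo id, prompt format, and processor behavior
# below are assumptions -- verify against the actual model card.
import time
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

MODEL_ID = "SenseTime/SenseNova-U1"  # hypothetical repo id

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto"
)

def answer(image_path: str, question: str) -> tuple[str, float]:
    """Run one MMBench-style multiple-choice item and time the generation."""
    image = Image.open(image_path)
    inputs = processor(images=image, text=question, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=32)
    latency = time.perf_counter() - start
    text = processor.batch_decode(out, skip_special_tokens=True)[0]
    return text, latency

# One benchmark-style item (placeholder image path):
pred, secs = answer("chart.png", "Which bar is tallest? (A) Q1 (B) Q2 (C) Q3 (D) Q4")
print(f"{pred!r} in {secs:.2f}s")
```

For real numbers you'd run the full MMMU/MMBench splits through an established harness rather than hand-rolled items, but a loop like this catches order-of-magnitude latency claims quickly.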
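On the second takeaway, the discipline is mostly structural: resolve the accelerator once, behind a single seam, instead of hardcoding "cuda" throughout the codebase. A minimal sketch using only core PyTorch follows; vendor plugins (for example torch_npu for Huawei Ascend, mentioned here as an assumption about your target stack) typically expose a similar availability probe that slots into the same function.

```python
import torch

def pick_device() -> torch.device:
    """Resolve the best available accelerator instead of assuming Nvidia.
    Vendor plugins (e.g. torch_npu for Ascend) generally register a similar
    is_available() probe once imported -- they would slot in here."""
    if torch.cuda.is_available():           # Nvidia (and AMD ROCm builds)
        return torch.device("cuda")
    if torch.backends.mps.is_available():   # Apple silicon
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(8, 8).to(device)    # every .to() goes through one seam
x = torch.randn(2, 8, device=device)
print(model(x).shape, "on", device)
```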