Netflix open-sourced VOID, a video diffusion model that removes objects from videos while understanding the physical interactions those objects had with the scene. Built on CogVideoX and fine-tuned with synthetic data from Google's Kubric and Adobe's HUMOTO, VOID handles causality—if you remove a person holding a guitar, the guitar falls naturally instead of floating in mid-air. The system requires 40GB+ VRAM and ships with two transformer checkpoints that can run separately or together for better temporal consistency.

This addresses a real production pain point that VFX teams know well. Standard inpainting models are sophisticated background painters, but they don't reason about physics. Remove an actor from a scene and you're left with floating props that defy gravity. Netflix has been dealing with this problem at scale across its content pipeline, and VOID represents its solution, built from actual production needs rather than academic curiosity.

The implementation details reveal the engineering complexity: VOID orchestrates multiple AI systems, including Meta's SAM2 for segmentation, Gemini 3 Pro for scene analysis, and optical-flow corrections in a second pass. The Apache 2.0 license means commercial use is allowed, which is significant given Netflix's typically protective approach to its internal tools. The 40GB VRAM requirement limits practical adoption to A100-class hardware, though the tutorial suggests T4/L4 cards may work with CPU offload.
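The release notes don't show VOID's actual API, but the multi-model orchestration described above can be sketched as a pipeline of stages. Every function below is a hypothetical placeholder for the real component (SAM2, the scene-analysis LLM, the CogVideoX-based inpainter, the flow-based second pass), not VOID's actual interface:

```python
# Hypothetical sketch of a VOID-style pipeline. All names are stand-ins,
# NOT the real VOID API; each stub marks where the real model would run.

def segment_object(frames, prompt):
    # Stand-in for SAM2: produce a per-frame binary mask for the target object.
    return [f"mask_{prompt}_{i}" for i, _ in enumerate(frames)]

def analyze_scene(frames, masks):
    # Stand-in for the scene-analysis step: determine which physical
    # interactions the object had (e.g. props it was holding or supporting).
    return {"supported_props": ["guitar"], "needs_physics": True}

def diffusion_inpaint(frames, masks, scene_info):
    # Stand-in for the CogVideoX-based removal pass, conditioned on the
    # physics analysis so supported props fall instead of floating.
    return [f"inpainted_{i}" for i, _ in enumerate(frames)]

def optical_flow_refine(frames):
    # Stand-in for the second pass: flow-based corrections to reduce
    # temporal flicker between independently edited frames.
    return [f"refined_{f}" for f in frames]

def remove_object(frames, prompt):
    masks = segment_object(frames, prompt)
    scene_info = analyze_scene(frames, masks)
    edited = diffusion_inpaint(frames, masks, scene_info)
    return optical_flow_refine(edited)

video = [f"frame_{i}" for i in range(4)]
result = remove_object(video, "person")
print(result[0])  # → refined_inpainted_0
```

The point of the sketch is the dependency order, not the stubs: segmentation must precede scene analysis, and the physics reasoning has to condition the inpainting pass rather than run after it, which is what separates this design from a plain inpainting loop.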

For developers, this is less about immediate deployment and more about understanding where video AI is heading. VOID shows that effective video editing AI requires multi-model orchestration and physics reasoning, not just better inpainting. The open-source release gives builders a reference implementation for production-grade video manipulation workflows, even if the compute requirements put it out of reach for most indie developers today.