Conntour raised $7 million from General Catalyst and Y Combinator to build what they're calling an "AI search engine" for security video systems. The startup lets security teams query camera feeds using natural language—think "show me all people wearing red jackets near the south entrance between 2 and 4 PM" instead of scrubbing through hours of footage manually.
This hits a real pain point in enterprise security. Most organizations have hundreds or thousands of cameras generating terabytes of footage that's essentially unsearchable without massive human effort. The computer vision tech to identify objects and people exists, but making it queryable through natural language is the interface breakthrough that could drive real adoption. It's the same pattern we've seen work in other domains—take existing AI capabilities and wrap them in a conversational interface that non-technical users can actually operate.
With only one source reporting this and no technical details about their AI models or accuracy benchmarks, there's a lot we don't know. Are they using existing vision models like CLIP or building custom ones? What's their false positive rate? How do they handle edge cases like poor lighting or partially obscured subjects? Security applications demand high accuracy—you can't have the system missing actual incidents or flagging innocent behavior.
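Conntour hasn't disclosed its architecture, but a common approach to this kind of natural-language video search is embedding-based retrieval: a joint text-image model such as CLIP maps both camera frames and text queries into a shared vector space, and frames are ranked by cosine similarity to the query. Here is a minimal sketch of that retrieval step—the toy vectors stand in for real model embeddings, and the function names are illustrative, not Conntour's API:

```python
import numpy as np

def cosine_scores(query_vec, frame_vecs):
    # Cosine similarity between one query vector and a matrix of frame vectors.
    q = query_vec / np.linalg.norm(query_vec)
    f = frame_vecs / np.linalg.norm(frame_vecs, axis=1, keepdims=True)
    return f @ q

def search_frames(query_vec, frame_vecs, timestamps, top_k=3):
    """Rank stored frame embeddings against a query embedding.

    In a production system both embeddings would come from a joint
    text-image model (e.g. CLIP); here they are hand-made toy vectors.
    """
    scores = cosine_scores(query_vec, frame_vecs)
    order = np.argsort(scores)[::-1][:top_k]
    return [(timestamps[i], float(scores[i])) for i in order]

# Toy demo: four "frame" embeddings with timestamps.
frames = np.array([
    [0.1, 0.9, 0.0],
    [0.8, 0.1, 0.1],   # most similar to the query below
    [0.0, 0.2, 0.9],
    [0.4, 0.4, 0.2],
])
stamps = ["13:58", "14:05", "14:22", "15:10"]
query = np.array([0.9, 0.1, 0.05])  # stands in for an embedded text query

results = search_frames(query, frames, stamps, top_k=2)
print(results[0][0])  # → 14:05
```

At production scale, the brute-force ranking above would be replaced by an approximate nearest-neighbor index over millions of frame embeddings, and the hard problems shift to exactly the unknowns noted above: embedding quality under poor lighting and occlusion, and thresholding scores to control false positives.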
For developers building similar systems, the lesson here is that interface design matters as much as model performance. The AI capabilities to analyze video feeds exist, but packaging them into something security teams will trust and use daily is the real challenge. Query accuracy, response time, and seamless integration with existing security infrastructure will determine whether this becomes a useful tool or expensive shelfware.
