A developer has successfully replaced vector databases with Google's Memory Agent pattern for their Obsidian note-taking system, storing structured memories in SQLite and feeding them directly to Claude Haiku 4.5. The system stores about 650 memories within the 250k context window at roughly 300 tokens per structured memory entry, eliminating the need for Pinecone, Redis, or embedding pipelines that were previously required to give AI assistants persistent memory.
This approach challenges the assumption that vector search is necessary for AI memory systems. The math has fundamentally changed: older models with 4K-8K token limits required embedding-based retrieval to find relevant documents without loading everything into context. But with Claude Haiku 4.5's 250k context window, you can simply dump hundreds of structured memories directly into the prompt and let the model reason over them. It's a return to simpler architecture that sidesteps the complexity of embedding pipelines, similarity search tuning, and vector database infrastructure.
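The core pattern can be sketched with Python's built-in sqlite3 module. The schema, table name, and memory format below are illustrative guesses, not details from the actual system; the ~300-token-per-memory figure comes from the article.

```python
import sqlite3

# Illustrative schema; the developer's actual layout isn't public.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE memories (
        id INTEGER PRIMARY KEY,
        created_at TEXT NOT NULL,   -- ISO 8601 date
        category TEXT NOT NULL,     -- e.g. 'meeting', 'idea', 'task'
        content TEXT NOT NULL       -- the structured memory itself
    )
""")
conn.execute(
    "INSERT INTO memories (created_at, category, content) VALUES (?, ?, ?)",
    ("2025-02-01", "meeting", "Discussed Q1 roadmap with Alex; ship v2 by March."),
)

# No vector search: dump every memory straight into the prompt and
# let the model reason over the full set.
rows = conn.execute(
    "SELECT created_at, category, content FROM memories ORDER BY created_at"
).fetchall()
memory_block = "\n".join(f"[{d}] ({cat}) {text}" for d, cat, text in rows)
prompt = f"Here are my stored memories:\n{memory_block}\n\nQuestion: ..."

# Rough budget check: ~650 memories at ~300 tokens each is ~195k tokens,
# which fits inside a 250k-token context window with room to spare.
assert 650 * 300 < 250_000
```

The prompt string would then be sent to the model as-is; the whole "retrieval" layer is one `SELECT *`.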
While this is a single developer's experiment rather than peer-reviewed research, it highlights a broader shift happening as context windows expand. The approach particularly shines for temporal queries like "what happened on Feb 1" or "recap my last meeting with X", exactly the kind of structured, date-based retrieval that embeddings handle poorly. However, the 650-memory limit means this pattern works for personal productivity tools but likely won't scale to enterprise knowledge bases with millions of documents.
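A query like "what happened on Feb 1" then reduces to an exact-match SQL filter rather than a similarity search. The schema and rows below are invented for illustration, under the assumption that each memory carries an ISO-date column:

```python
import sqlite3

# Hypothetical minimal schema: a date column plus free-text content.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (created_at TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO memories VALUES (?, ?)",
    [
        ("2025-02-01", "Standup: agreed to refactor the sync module."),
        ("2025-02-01", "Lunch with Sam; discussed the Obsidian plugin."),
        ("2025-01-28", "Drafted the Q1 planning doc."),
    ],
)

# Date-based recall is a deterministic WHERE clause, not a nearest-
# neighbor lookup; dates carry little semantic signal, so an embedding
# index can miss or over-retrieve exactly these queries.
feb_first = conn.execute(
    "SELECT content FROM memories WHERE created_at = '2025-02-01'"
).fetchall()
```

Only the matching rows need to be placed in the prompt, and the result set is exact by construction.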
For developers building AI assistants, this suggests it's worth questioning whether you actually need vector search infrastructure. If your use case involves hundreds rather than millions of memories, and you need precise temporal or structured retrieval, direct LLM reasoning over SQLite might be simpler and more reliable than building embedding pipelines. The key insight: sometimes the best architecture is the one you don't have to build.
