mgrep with Founding Engineer Rui Huang
I hosted a talk with Rui Huang, a founding engineer at Mixedbread, to discuss a tool that has become essential to my workflow: mgrep.
I automatically sync every repo I have locally, no exceptions. And I’ve seen companies replace their production semantic search systems with Mixedbread’s cloud service because it just finds more relevant results and ranks them properly.
Rui’s talk focused on how to use their semantic search technology, specifically mgrep, to build more effective coding agents. I was excited to hear from him directly, as he’s on the engineering team building the product.
This post covers the key points from our conversation, including the limitations of standard tools like grep, how mgrep provides a semantic alternative, and a look at the multi-vector and multimodal technology that powers it.
Sign up for the next Mixedbread talk for an even deeper dive into what makes mgrep work and the research that went into it.
The Problem with grep for AI Agents
The ‘RAG is dead’ headlines are bombastic, but common. The argument goes:
modern AI agents are so powerful that they don’t need fancy semantic search; they can just use basic tools like grep to find context and get results as good as, or better than, semantic search.
This argument usually targets older semantic search methods.
While agentic search using grep is powerful, it has significant drawbacks. Rui explained the problems his team observed from watching agents perform long-running tasks.
1. It’s Slow and Expensive.
grep is a pattern-matching tool. To find relevant context for a high-level task like “refactor the authentication flow,” an agent must guess all possible keywords (authentication, middleware, credentials) and run multiple tool calls. This dramatically increases latency and token consumption, stuffing your context window with noise.
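To make that concrete, here is a hedged sketch of the keyword-guessing loop (the keywords and directory are hypothetical, not from a real session); each guess is a separate tool call whose output lands in the context window:

grep -rn "authentication" src/
grep -rn "middleware" src/
grep -rn "credentials" src/
# ...and so on, until the agent happens to guess the terms the codebase actually uses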
2. It Degrades Quality.
As an agent makes more tool calls, the context window fills with partially useful information. The original intent gets diluted. Rui pointed out this is when you see agents start to hallucinate or get stuck, often responding with phrases like “You’re absolutely right” without making progress.
I’ve found this to be true in my own work. A faster feedback loop is critical. When an agent uses fewer tokens and gets to the point quicker, I can iterate faster and stay focused.
Introducing mgrep: Semantic Search for Code
Mixedbread’s solution is mgrep, a command-line tool designed to be a semantic version of grep. Instead of matching patterns, it understands the intent behind a natural language query.
# The semantic query
mgrep "how streaming is implemented"

The tool returns a list of relevant files with specific line numbers and a similarity score, which helps the agent gauge the confidence of each result.
This approach allows the agent to progressively discover context instead of stuffing the entire contents of multiple files into its prompt.
The impact is not subtle. In Mixedbread’s internal tests with Claude on complex coding tasks, mgrep delivered a massive performance advantage:
53% fewer tokens used
48% faster response times
3.2x better quality responses
Note: I’ve seen similar gains in my own experiments, which is what inspired me to host this talk.
Live Demo: mgrep vs. grep
Rui demonstrated the difference with a live playground that runs Claude in two environments side-by-side: one with standard grep and one with mgrep. The task was to query the large React codebase (over 6 million tokens) with the question: “Explain how the useEffect hook works in common patterns.”
It’s important to note that mgrep is complementary to grep, not a replacement. Agents are given both tools and can choose the right one for the job: mgrep for semantic exploration and grep for exact-match symbol searching.
Try the playground yourself to see the difference without any setup.
Getting Started: Setup & Terminal Demo
mgrep is a CLI tool that syncs a local directory to a cloud-backed search index. Rui walked through the setup.
First, install the tool via npm:
npm install -g @mixedbread/mgrep  # or pnpm / bun

Then, log in to connect to your Mixedbread account:

mgrep login

Once set up, you can sync any directory. Rui used Andrej Karpathy’s nanoGPT repository as an example. The watch command indexes the files in the current folder, respecting .gitignore, and syncs them to a remote Mixedbread store.

mgrep watch

Ingestion is fast and cheap: the entire React codebase, at over 6 million tokens, takes only about five minutes and roughly $20 to index.
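Back-of-the-envelope, using the numbers above: $20 for roughly 6 million tokens works out to about $3 per million tokens indexed, and 6 million tokens in about five minutes is over a million tokens per minute of ingestion.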
Once synced, you can query your code with natural language. mgrep also includes an -a (--answer) flag that uses an LLM to get a direct answer from the search results, complete with citations.
This gives the agent a concise summary, reducing the need to process large file snippets.
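For example, reusing the earlier query with the answer flag:

mgrep -a "how streaming is implemented"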
Integrating mgrep with Coding Agents
While you can use mgrep manually, its real power comes from integrating it with agents. Mixedbread provides plugins for popular coding agents like Claude Code.
A simple install command configures the agent to be aware of mgrep and how to use it:

mgrep install-claude-code

This command automatically sets up the necessary skills and prompts. The mgrep watch process runs in the background during an agent session, ensuring the index is always up to date.
These plugins are transparent. They are essentially wrappers around the mgrep CLI with a pre-written prompt. This makes it easy to customize the behavior by creating your own prompts in your agent’s configuration files if you need more control.
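For instance, here is a minimal sketch of that kind of customization, assuming a Claude Code setup that reads project guidance from a CLAUDE.md file (the exact wording, and whether you route this through the plugin’s own prompt instead, is up to you):

cat >> CLAUDE.md <<'EOF'
Use mgrep "<natural language question>" for conceptual queries like "where is retry logic handled?".
Use plain grep for exact symbol or string lookups.
Keep mgrep watch running so the index stays current.
EOF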
Under the Hood: Multi-Vector & Multimodal Search
mgrep is powered by the Mixedbread Search API. When you sync files, they are sent to a Mixedbread store where their pipeline takes over.
This is more than just a vector database. The service analyzes file types, applies appropriate chunking strategies (e.g., different logic for Markdown vs. code), and generates embeddings using state-of-the-art models.
The key innovation is multi-vector search. Traditional semantic search creates one vector for a large chunk of text. Mixedbread represents every single word as its own vector. This creates a much richer, more granular representation of the source material. Advanced quantization techniques make this approach scalable and affordable.
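As a rough mental model (this is the standard late-interaction formulation used by multi-vector retrievers, not necessarily Mixedbread’s exact scoring), a query q and a document d are compared token by token, keeping each query token’s best match:

score(q, d) = Σ_i max_j (q_i · d_j), where q_i and d_j are the per-token embedding vectors of the query and the document.

Because every word keeps its own vector, a strong match on a single term deep inside a long file can still surface that file instead of being averaged away in one chunk-level embedding.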
This system is also multimodal. It can natively index and search images, videos, audio, and PDFs without first transcribing them to text. Rui showed a fun demo where he searched a folder of cat pictures using queries like “sad cat” and “angry cat.”
This is powerful for codebases that include diagrams, visual assets, or complex PDF documentation. An agent can find relevant visual information that would be completely invisible to a text-only tool like grep.
For example, in legal domains, PDFs are often the source of truth for contracts. And on e-commerce platforms, there are often large libraries of product images that agents need to work with.
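As a sketch (these queries are illustrative, not from the talk), once such a folder is synced with mgrep watch, the same natural-language interface covers the non-text assets:

mgrep "architecture diagram of the checkout flow"
mgrep "termination clause in the supplier contract"
mgrep "product photo of the blue running shoe on a white background"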
Conclusion: Give AI the Best Tools
The rise of capable agents doesn’t mean semantic search is dead. It means agents need better search tools. Combining a semantic tool like mgrep with an exact-match tool like grep gives an agent a far more powerful and efficient way to understand a codebase.
The key takeaways are:
Agentic search needs semantic search. Relying on grep alone is slow, expensive, and leads to lower-quality results for complex tasks.
Better tools lead to better agents. mgrep improves agent performance by reducing token usage, increasing speed, and providing more relevant context.
The future is multimodal. As agents become more capable of understanding different data types, their tools must keep up. Native search across code, PDFs, and images is a significant advantage.
No matter how advanced AI agents become, their performance will always be constrained by the quality of their tools. Search is a fundamental part of the coding process, and it shouldn’t be the bottleneck.
Q&A Highlights
We ended with a few questions from the audience:
Multilingual Support: Mixedbread’s models are multilingual by default, supporting languages like Arabic, Chinese, and Korean. Try their art search gallery demo in different languages!
Cost: Indexing is priced per token. As an example, indexing the entire React codebase (6M+ tokens) costs roughly $20. For large-scale use, the team is open to discussing discounts.
Engineering Optimizations: The impressive latency is due to deep research in quantization and a fully optimized, end-to-end pipeline. The team plans to release more blog posts detailing these engineering efforts.