Agentic Code Search with GitHits

Lightning Talk with Olli-Pekka Heinisuo and Jaakko Timonen

Sep 16, 2025

At a recent Elite AI-assisted Coding session, Olli-Pekka Heinisuo and Jaakko Timonen from Softlandia introduced their new product, GitHits, a tool designed to tackle one of the most persistent frustrations in software development: finding accurate, up-to-date answers to complex coding problems. The talk detailed the motivation behind GitHits, demonstrated its current capabilities, and outlined a vision for its future as a specialized search engine for engineers.

The Developer's Wall

Olli-Pekka began by describing a scenario familiar to every developer. You are deep in a project, you encounter an unexpected error or an obscure API, and you hit a wall.

You're building something, you hit the wall, and you can't find an answer anywhere. Documentation is often out of date. AIs, while good at generating code, cannot cover every edge case you can encounter. So you have to jump out of your IDE, go to Google, go to Stack Overflow, or any other tool you can imagine, and start searching for answers that might solve your problem.

He argued that the answers to these problems often exist, but they are buried within the vast, unstructured knowledge base of open-source software on platforms like GitHub — hidden in pull requests, issue threads, discussions, or example code that has not yet made its way into official documentation.

The idea for GitHits was born from a real-world example of this problem. A colleague was struggling to initialize a class from Microsoft's Azure Speech SDK. The necessary information was not in the published documentation, but it was available within Microsoft's public GitHub repository.

The only way to find the description of how that thing worked was to go to GitHub, search it with specific filters, and then find the answers from the search results. Most people don't end up in GitHub search normally, but it's a very powerful tool.
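To make "search it with specific filters" concrete, here is a minimal sketch of how such a query could be assembled for GitHub's REST search endpoints. The repository and class names are hypothetical placeholders, not the ones from the Azure example, and the helper `build_search_url` is invented for illustration. The sketch only builds URLs; actually sending them would need an HTTP client and, for code search, an access token.

```python
from urllib.parse import urlencode

GITHUB_API = "https://api.github.com"


def build_search_url(kind: str, terms: str, qualifiers: dict[str, str]) -> str:
    """Build a GitHub REST search URL for 'code' or 'issues'.

    Qualifiers such as repo, language, or is are appended to the search
    terms in GitHub's `key:value` syntax, then URL-encoded as the `q` parameter.
    """
    if kind not in {"code", "issues"}:
        raise ValueError(f"unsupported search kind: {kind}")
    parts = [terms] + [f"{key}:{value}" for key, value in qualifiers.items()]
    return f"{GITHUB_API}/search/{kind}?" + urlencode({"q": " ".join(parts)})


# Hypothetical repository and class name, for illustration only.
code_url = build_search_url(
    "code",
    "ExampleRecognizer",
    {"repo": "example-org/example-sdk", "language": "python"},
)
issue_url = build_search_url(
    "issues",
    "ExampleRecognizer init",
    {"repo": "example-org/example-sdk", "is": "issue"},
)
print(code_url)
print(issue_url)
```

Narrowing by repository and language is what turns a noisy keyword search into something that can surface a single undocumented constructor.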

This experience crystallized the core mission for GitHits: to build a tool that systematically and intelligently taps into the fresh, living knowledge base of open-source repositories to provide developers with precise, actionable answers.

A Demo of GitHits in Action

Olli-Pekka then demonstrated the tool. The user interface is a clean, chat-like window where a developer can select a programming language (currently Python, TypeScript, and JavaScript are supported) and describe their problem. This can be a general question, a specific error message, or a few keywords.

Behind the scenes, a multi-agent system orchestrates the search and synthesis process:

Triage Agent: First, an agent determines if the user's query requires a search on GitHub or if it is a general question that can be answered directly.
Search Agent: If a search is needed, this agent runs a series of targeted queries against the GitHub API. It searches not only source code but also issues and discussions to find the most relevant information.
Reranking Agent: The search results are then passed to a reranking agent, which scores and combines the findings to identify the most promising sources. The ranking considers factors like repository stars and license permissiveness.
Example Agent: Finally, an agent generates a complete, ready-to-use code example based on the top-ranked sources, along with a natural language summary explaining the solution.
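
The four stages above can be sketched as a small pipeline. This is a toy illustration, not GitHits's implementation: every function is a stand-in for what would be an LLM or API call, the corpus is in-memory, and the scoring weights and license table are assumptions. The talk only says that stars and license permissiveness are among the ranking factors.

```python
import math
from dataclasses import dataclass

# Assumed license weights; the talk does not give actual values.
LICENSE_WEIGHT = {"mit": 1.0, "apache-2.0": 1.0, "gpl-3.0": 0.5}


@dataclass
class Hit:
    repo: str
    url: str
    snippet: str
    stars: int
    license: str
    relevance: float  # 0..1; would come from an LLM judgement in a real system


# Hypothetical repositories, for illustration only.
CORPUS = [
    Hit("example-org/sdk", "https://github.com/example-org/sdk",
        "initialize recognizer with config", 9999, "mit", 0.9),
    Hit("someone/fork", "https://github.com/someone/fork",
        "initialize recognizer legacy", 9, "gpl-3.0", 0.9),
    Hit("other/unrelated", "https://github.com/other/unrelated",
        "parse yaml files", 50000, "mit", 0.1),
]


def needs_search(query: str) -> bool:
    """Triage stand-in: one-word inputs are treated as not needing a search."""
    return len(query.split()) >= 2


def search(query: str, corpus: list[Hit]) -> list[Hit]:
    """Search stand-in: keep hits sharing at least one word with the query."""
    words = {w.lower() for w in query.split()}
    return [h for h in corpus if words & set(h.snippet.lower().split())]


def score(hit: Hit) -> float:
    """Blend relevance, popularity, and license (weights are assumptions)."""
    popularity = min(math.log10(hit.stars + 1) / 6, 1.0)
    license_weight = LICENSE_WEIGHT.get(hit.license, 0.0)
    return 0.6 * hit.relevance + 0.2 * popularity + 0.2 * license_weight


def rerank(hits: list[Hit], top_k: int = 2) -> list[Hit]:
    """Return the top_k hits, highest score first."""
    return sorted(hits, key=score, reverse=True)[:top_k]


def write_example(query: str, sources: list[Hit]) -> str:
    """Example-agent stand-in: a real system would prompt an LLM here."""
    cited = ", ".join(h.url for h in sources)
    return f"Answer to {query!r} based on: {cited}"


def answer(query: str, corpus: list[Hit]) -> str:
    """Run triage, search, reranking, and example generation in order."""
    if not needs_search(query):
        return "Answered directly, no search needed."
    sources = rerank(search(query, corpus))
    if not sources:
        return "No relevant sources found."
    return write_example(query, sources)


print(answer("initialize recognizer", CORPUS))
```

Keeping the source list alongside the generated answer, as `write_example` does with the URLs, is what makes the transparency described next possible.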

A key feature of the UI is its transparency. Alongside the generated answer, GitHits displays the list of sources it used. Users can hover over a source to see why it was selected and click a link to jump directly to the relevant file and line number on GitHub. This allows developers to verify the information and gain deeper context.

The Architecture: Semantic Search on a Keyword Foundation

During the Q&A, Olli-Pekka elaborated on the technical architecture. He explained that while the underlying GitHub search API is primarily keyword-based, GitHits layers semantic understanding on top of it.

Under the hood, the GitHub search is mostly a keyword search; it doesn't have any semantic capabilities. The semantic capability comes from the LLMs in this tool. The LLMs decide what's relevant and what's not based on the user input.
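
One way to picture this layering is a loop in which the model first rewrites the question as keyword queries and then judges each keyword hit for relevance. The sketch below assumes that shape; it is not taken from the talk. The `LLM` type, the prompts, `fake_llm`, and `fake_keyword_search` are all hypothetical stand-ins so the example runs offline.

```python
from typing import Callable

LLM = Callable[[str], str]  # prompt in, completion out; any provider could sit behind it


def expand_to_keywords(question: str, llm: LLM) -> list[str]:
    """Ask the model for keyword queries, one per line."""
    prompt = f"Rewrite as short keyword search queries, one per line:\n{question}"
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]


def is_relevant(question: str, snippet: str, llm: LLM) -> bool:
    """Ask the model for a yes/no relevance judgement on one search result."""
    prompt = f"Question: {question}\nResult: {snippet}\nRelevant? Answer yes or no."
    return llm(prompt).strip().lower().startswith("yes")


def semantic_search(
    question: str, llm: LLM, keyword_search: Callable[[str], list[str]]
) -> list[str]:
    """Keyword retrieval, with the model deciding what to search and what to keep."""
    seen: set[str] = set()
    kept: list[str] = []
    for query in expand_to_keywords(question, llm):
        for snippet in keyword_search(query):
            if snippet in seen:
                continue
            seen.add(snippet)
            if is_relevant(question, snippet, llm):
                kept.append(snippet)
    return kept


# --- Offline stand-ins (not a real model or a real search backend) ---
DOCS = [
    "fix: websocket reconnect with backoff",
    "websocket timeout docs typo",
    "update changelog",
]


def fake_llm(prompt: str) -> str:
    """Canned responses that mimic the two kinds of prompts used above."""
    if prompt.startswith("Rewrite"):
        return "websocket reconnect\nwebsocket timeout"
    return "yes" if "reconnect" in prompt.split("Result:")[1] else "no"


def fake_keyword_search(query: str) -> list[str]:
    """Return documents containing every keyword in the query."""
    words = query.lower().split()
    return [doc for doc in DOCS if all(word in doc.lower() for word in words)]


results = semantic_search(
    "How do I recover a dropped socket connection?", fake_llm, fake_keyword_search
)
print(results)
```

Note that the question shares no keywords with the result that is kept; the bridge between the two comes entirely from the model, which is the point Olli-Pekka makes above.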

The team made a conscious decision to query the GitHub API directly rather than building and maintaining their own index of open-source code. Tools like Sourcegraph do build their own indexes, but an index is only as current as its last refresh, so results can lag behind the repositories they describe.

The only way to get the fresh answers is to search GitHub directly... an indexing process, depending on how you do it, might take 24 hours for one repository. By the time the indexing is done, it's already outdated.

This direct-search approach means GitHits works from whatever code, issues, and discussions GitHub's search returns at query time, rather than from a separately maintained snapshot.

Extending, Not Replacing, AI Assistants

A crucial part of the GitHits strategy is to augment, rather than compete with, existing AI coding assistants like GitHub Copilot or Cursor. Olli-Pekka revealed that they are actively developing an MCP server. This will allow other AI agents to use GitHits as a specialized tool. When an agent like Copilot hits a roadblock because its training data is outdated, it could be prompted to query the GitHits API to find a solution based on real-time information.
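
For readers unfamiliar with MCP (Model Context Protocol), the sketch below shows roughly what exposing a search tool looks like at the message level: an agent lists the available tools, then calls one by name over JSON-RPC. The tool name, its schema, and `run_search` are hypothetical, since the talk did not describe the actual interface. A real server would also handle the `initialize` handshake and a transport, which are omitted here.

```python
import json

# Hypothetical tool definition; not the actual GitHits interface.
TOOL = {
    "name": "search_code_examples",
    "description": "Find up-to-date code examples for a programming problem.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "language": {"type": "string"},
        },
        "required": ["query"],
    },
}


def run_search(query: str, language: str = "python") -> str:
    """Placeholder for the actual search backend."""
    return f"[{language}] no backend wired up for: {query}"


def _ok(request_id, result: dict) -> dict:
    return {"jsonrpc": "2.0", "id": request_id, "result": result}


def _error(request_id, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}


def handle(request: dict) -> dict:
    """Answer the two MCP methods an agent uses to discover and call a tool."""
    request_id = request.get("id")
    method = request.get("method")
    if method == "tools/list":
        return _ok(request_id, {"tools": [TOOL]})
    if method == "tools/call":
        params = request.get("params", {})
        args = params.get("arguments", {})
        if params.get("name") != TOOL["name"]:
            return _error(request_id, -32602, "Unknown tool")
        if "query" not in args:
            return _error(request_id, -32602, "Missing required argument: query")
        text = run_search(args["query"], args.get("language", "python"))
        return _ok(request_id, {"content": [{"type": "text", "text": text}], "isError": False})
    return _error(request_id, -32601, "Method not found")


call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_code_examples",
        "arguments": {"query": "init speech client"},
    },
}
print(json.dumps(handle(call), indent=2))
```

From the calling agent's side, the tool is just another function it can choose to invoke when its own knowledge runs out.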

This positions GitHits as a foundational RAG (Retrieval-Augmented Generation) layer for the broader AI development ecosystem. While assistants built on models like GPT-4 can search the web, they typically query general-purpose search engines such as Google or Bing. These engines do not deeply index the source code, issues, and discussions where critical technical solutions are often found. GitHits fills this gap with its focused, high-precision search. As one audience member noted, this precision is the core value.

The Vision: A Search Engine for Software Engineers

Looking ahead, Jaakko described the "maximalist version" of the product.

We had the brand GitHits, and I just put "search engine for software engineers" [on a hoodie]. And then when I started interviewing developers, I noticed that people started referring to us as a code search. That's what we are, probably. Initially, a search engine for software engineers is the ultimate game.

This vision involves expanding beyond public GitHub repositories to include private repos, other platforms like GitLab and Bitbucket, and supplementary knowledge sources such as Reddit and Stack Overflow. The ultimate goal is to build the definitive, most up-to-date data source for software development problems — a resource that could eventually be used by other AI tools for both retrieval and model training.

The team is currently in a private beta, relentlessly collecting feedback to refine the product and its messaging. While the business model is not yet finalized, the goal is to keep the tool highly accessible, possibly with a freemium offering and a low-cost subscription. For developers tired of hitting a wall and spending hours searching for answers, GitHits presents a promising new path forward.
