Why Similarity Search Fails for Engineering Planning
Every major AI coding tool uses the same approach to find relevant context: embed the code into vectors, embed the query, and return the nearest neighbors. This is similarity search, and it's the backbone of how Copilot, Cursor, and most RAG-based coding assistants decide what code to show the AI model.
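At its core, that retrieval step is a nearest-neighbor lookup over embedding vectors. A minimal sketch, using tiny hand-written vectors in place of a real embedding model (the snippet names and vectors are invented for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings": in a real tool these come from an embedding model.
snippets = {
    "auth_controller": [0.9, 0.1, 0.0],
    "test_auth_helper": [0.8, 0.2, 0.1],
    "rate_limiter": [0.1, 0.9, 0.3],
}

def nearest(query_vec, k=2):
    """Return the k snippets whose embeddings are closest to the query."""
    ranked = sorted(snippets, key=lambda name: cosine(query_vec, snippets[name]),
                    reverse=True)
    return ranked[:k]

print(nearest([0.85, 0.15, 0.05]))  # ['auth_controller', 'test_auth_helper']
```

Note what the ranking rewards: vectors that point the same way, which in practice means code that uses similar words, regardless of whether it is connected to anything.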
For many tasks, this works well enough. If you're writing a new function and want to see how similar functions are written in your codebase, similarity search finds good examples. If you're looking for how error handling works in your project, it surfaces relevant patterns.
But similarity search has a fundamental limitation that becomes critical for engineering planning: it finds code that looks related, not code that is structurally connected. And for planning — scoping features, estimating impact, understanding what breaks when something changes — structural connection is what matters.
The similarity trap
Consider this scenario. You're planning a change to the user authentication flow. You ask your AI assistant to find relevant code, and it uses similarity search to pull context.
The results include:
- The `AuthController` in the API gateway — relevant
- A `testAuthHelper` utility in the test suite — looks similar but not relevant to the change
- An `AuthProvider` component from the frontend — uses similar words but is a different layer entirely
- A comment block in an old migration that mentions "authentication changes" — pure noise

What similarity search missed:

- The `SessionMiddleware` that depends on the auth controller's session format — structurally connected but doesn't use the word "auth"
- The `WebhookValidator` in a different repository that validates request signatures using the same auth token format — a cross-repo dependency invisible to single-repo search
- The `RateLimiter` that keys on the authenticated user ID — a downstream dependency that would break if the user ID format changes
The similarity search results look plausible. They contain code related to authentication. But they miss the dependencies that actually matter for scoping the change — and include noise that doesn't.
Similarity search vs structural resolution
The distinction is fundamental:
- Similarity search answers: "What code talks about similar things?"
- Structural resolution answers: "What code is connected to this specific code?"
Structural resolution uses the dependency graph — imports, function calls, type references, interface implementations, cross-service API calls, event subscriptions — to trace actual connections. It doesn't care whether the code uses similar words. It cares whether the code is linked through the system's architecture.
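The difference can be made concrete with a toy dependency index (all names below are hypothetical). A structural query asks who depends on a node, not who sounds like it:

```python
# Hypothetical "X depends on Y" edges: imports, calls, type references.
deps = {
    "SessionMiddleware": {"AuthController"},
    "RateLimiter": {"AuthController"},
    "WebhookValidator": {"AuthTokenFormat"},
    "AuthController": {"AuthTokenFormat"},
}

def dependents_of(target):
    """Everything structurally linked to `target`, regardless of its name."""
    return sorted(name for name, uses in deps.items() if target in uses)

print(dependents_of("AuthController"))  # ['RateLimiter', 'SessionMiddleware']
```

Neither result contains the word "auth", which is exactly the point: the connection lives in the edge, not the text.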
Where similarity search works
Similarity search excels at discovery tasks:
- Finding examples of a pattern: "Show me how retry logic is implemented"
- Locating features: "Where is the payment processing code?"
- Learning conventions: "How does this codebase handle validation?"
These are "find me something like X" queries, and embedding-based search handles them well. The results don't need to be structurally connected to anything — they just need to be relevant examples.
Where similarity search fails
Similarity search breaks down for impact analysis and scope validation:
- "What breaks if I change this function's return type?" — Requires tracing callers, not finding similar functions
- "Which services consume this API endpoint?" — Requires cross-repo interface mapping, not text similarity
- "What's the full scope of adding a field to this model?" — Requires traversing serializers, validators, tests, and API consumers
- "Is this change backward-compatible?" — Requires understanding the contract between producer and consumer, not pattern matching
For these questions, similarity search doesn't just underperform — it actively misleads. It returns code that looks relevant, which the AI model then uses as context for its response. The result is a plausible-sounding but structurally incorrect analysis.
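The first of those questions, tracing what breaks when a function changes, amounts to a transitive walk over a reversed call graph. A minimal sketch with invented function names:

```python
from collections import deque

# Hypothetical call graph: caller -> callees.
calls = {
    "handler": ["format_user_id"],
    "rate_limit_key": ["format_user_id"],
    "audit_log": ["rate_limit_key"],
}

def impacted_by(changed, calls):
    """Everything that transitively calls `changed`: the real blast radius."""
    # Invert the graph so we can walk callee -> callers.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    seen, queue = set(), deque([changed])
    while queue:
        fn = queue.popleft()
        for caller in callers.get(fn, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

print(sorted(impacted_by("format_user_id", calls)))
# ['audit_log', 'handler', 'rate_limit_key']
```

No embedding model ranks `audit_log` anywhere near a query about user ID formats, yet the traversal finds it in two hops.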
Why Copilot gets implementation wrong
This is the root cause of a frustration every engineer has experienced: your AI coding tool generates code that compiles, passes the linter, and looks correct — but doesn't integrate properly with the rest of the system. It uses the wrong import path, calls a deprecated version of a function, or follows a pattern from a different part of the codebase that doesn't apply here.
The AI isn't stupid. It's working with the wrong context. Similarity search gave it code that looks like what it needs, not code that's actually connected to where it's working. The AI faithfully follows the patterns in its context window, but those patterns are from the wrong neighborhood.
This is why adding more tokens to the context window doesn't fix the problem. More context from similarity search means more noise, not more signal. The issue isn't the quantity of context — it's the relevance model. Text similarity is the wrong relevance model for implementation tasks.
Structural resolution as the foundation
The alternative is to build the context window using structural resolution. Instead of "find code that talks about similar things," you start from the specific code being changed and traverse outward through the dependency graph:
- Start at the change point — The specific function, class, or module being modified
- Trace direct dependents — What imports this? What calls this? What extends this?
- Cross service boundaries — What other services consume the interface this code participates in?
- Apply confidence scoring — Not all connections are equally certain. Score each dependency by its resolution method.
- Build the context window from the graph — Include code that is structurally connected, ordered by proximity in the dependency graph
This produces a context window where every piece of code is there for a reason: it's connected to the change being made. The AI model generates code that fits because its context is structurally relevant, not just textually similar.
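The steps above can be sketched as a breadth-first traversal with per-edge confidence. The node names, edge kinds, and weights here are invented to show the shape of the pipeline, not ArcLume's actual implementation:

```python
from collections import deque

# Resolution method determines confidence: a direct import is more certain
# than an edge inferred from a string-matched event name.
CONFIDENCE = {"import": 1.0, "call": 0.9, "api_contract": 0.7, "event_name": 0.4}

# Hypothetical edges outward from the change point: (neighbor, how it was resolved).
edges = {
    "AuthController.login": [("SessionMiddleware", "import"),
                             ("RateLimiter", "call")],
    "SessionMiddleware": [("WebhookValidator", "api_contract")],
    "RateLimiter": [],
    "WebhookValidator": [],
}

def build_context(change_point):
    """BFS outward from the change point, scoring each node by hop distance
    and accumulated edge confidence, then order the context window by both."""
    scored = {}  # node -> (depth, confidence)
    queue = deque([(change_point, 0, 1.0)])
    while queue:
        node, depth, conf = queue.popleft()
        for neighbor, method in edges.get(node, []):
            edge_conf = conf * CONFIDENCE[method]
            if neighbor not in scored or edge_conf > scored[neighbor][1]:
                scored[neighbor] = (depth + 1, edge_conf)
                queue.append((neighbor, depth + 1, edge_conf))
    # Closest, highest-confidence code goes first in the context window.
    return sorted(scored, key=lambda n: (scored[n][0], -scored[n][1]))

print(build_context("AuthController.login"))
# ['SessionMiddleware', 'RateLimiter', 'WebhookValidator']
```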
The hybrid approach
The best results come from combining both approaches. Similarity search is good for discovery — finding the starting point, understanding patterns, exploring unfamiliar code. Structural resolution is good for precision — tracing impact, validating scope, building implementation context.
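A hybrid pipeline of this shape can be sketched in a few lines. The two callables below are toy stand-ins for a real semantic index and dependency graph:

```python
def hybrid_context(query, semantic_search, trace_dependencies, k=3):
    """Seed with similarity search (discovery), then expand each seed
    through structural traversal (precision)."""
    seeds = semantic_search(query, k)
    context = list(seeds)
    for seed in seeds:
        for dep in trace_dependencies(seed):
            if dep not in context:
                context.append(dep)
    return context

# Toy stand-ins for the two subsystems.
search = lambda q, k: ["AuthController"][:k]
trace = lambda node: {"AuthController": ["SessionMiddleware",
                                         "RateLimiter"]}.get(node, [])

print(hybrid_context("change the auth flow", search, trace))
# ['AuthController', 'SessionMiddleware', 'RateLimiter']
```

The design choice is the division of labor: similarity picks where to start, structure decides what else belongs.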
ArcLume uses this hybrid approach in its planning pipeline. When generating stories from a transcript or brief, the system uses semantic search to identify relevant areas of the codebase, then switches to structural resolution to trace dependencies, validate scope, and build precise implementation context.
Through the MCP server, both approaches are available to your AI coding tools. `search_code` provides semantic search for discovery. `get_interfaces` and the dependency metadata on search results provide structural context for precision. Your AI assistant can use both, choosing the right approach for each question.
Moving beyond similarity
Similarity search was a reasonable first step for AI coding tools. When the alternative was no context at all, finding code that looks related was a massive improvement. But as teams use AI for more complex tasks — planning features, scoping changes, understanding cross-service impact — the limitations of text similarity become the bottleneck.
The next generation of AI coding tools will need to understand code the way engineers do: not as text to be matched, but as a graph to be traversed. Structural scoping builds on this foundation, using the structural graph to ground every planning decision in the real architecture of your system.
The question isn't whether your code is similar. The question is whether your code is connected.
Ready to try ArcLume?
ArcLume is currently in beta. Connect your repos, build a knowledge graph, and start generating codebase-aware epics and stories.
Join the Beta