

Confidence Scoring for Code Dependencies: Why Your AI Tools Should Tell You What They Don't Know

March 13, 2026 | 5 min read | ArcLume Team

Static analysis can tell you a lot about how your code connects. Import graphs, function call chains, type hierarchies — these are all deterministic. If file A imports function B, that's a fact.

But real systems are full of connections that can't be resolved statically. A REST endpoint consumed by another service via HTTP. A Kafka topic name assembled from environment variables. A function dispatched through a strategy pattern where the concrete implementation is selected at runtime. A database query that references a table also written to by a completely different service.

Most code analysis tools handle this ambiguity by ignoring it: they show you the dependencies they can resolve and stay silent about the ones they can't. The result is a false sense of completeness. You think you're seeing the full picture, but you're only seeing the easy parts.

Confidence scoring for code dependencies is a different approach. Instead of binary "this dependency exists" or "it doesn't," every detected dependency carries a confidence score that reflects how certain the analysis is about the connection.

Why binary dependency resolution fails

Consider a typical microservices system. Service A publishes an event to a Kafka topic. Service B consumes that topic. From Service A's perspective, the relationship is: "I publish to order.completed." From Service B's perspective: "I subscribe to order.completed."

If both topic names are string literals in the code, a static analysis tool can match producer to consumer with certainty. But what if:

  • The topic name is constructed from a config variable: `${config.prefix}.order.completed`
  • The consumer uses a pattern subscription: `order.*`
  • The topic name is defined in a shared constants package that both services import, but with different version locks
  • The consumer is registered dynamically based on a feature flag

In each case, the dependency probably exists, but a static analysis tool can't be 100% certain. The question is: should it hide this information entirely, or should it show you the dependency with an honest assessment of its certainty?
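To make the ambiguity concrete, here is a minimal sketch of the first two cases. All names are hypothetical; the point is that the producer's topic name only exists at runtime, so a matcher can report a likely link to the consumer's pattern subscription but can't prove it:

```typescript
// Hypothetical producer: the topic name mixes a config value with a
// literal suffix, so a static analyzer sees only "<?>.order.completed".
const config = { prefix: "prod" }; // in reality loaded from env/config

function topicName(): string {
  return `${config.prefix}.order.completed`;
}

// Hypothetical consumer in another service: a pattern subscription.
// The matcher can say the producer's topic *probably* falls under
// "order.*", but only with partial confidence.
const subscriptionPattern = /(^|\.)order\./;

function probablyMatches(topic: string): boolean {
  return subscriptionPattern.test(topic);
}
```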

How confidence scoring works

ArcLume assigns a confidence score to every detected dependency based on the resolution method used to identify it. The scoring model considers several factors:

Resolution method

  • Direct import (95-100%) — The dependency is resolved through a static import statement or require call. The file path or package name is a literal string.
  • Type-based resolution (85-95%) — The dependency is inferred from type annotations, interface implementations, or generic type parameters.
  • String literal matching (70-90%) — The dependency is matched by comparing string literals across codebases. Endpoint paths, topic names, and queue names that appear as literals in both producer and consumer code.
  • Pattern matching (50-75%) — The dependency is inferred from naming conventions, file structure patterns, or partial string matches. For example, a file at handlers/order-completed.handler.ts is likely a handler for the order.completed event.
  • Semantic inference (30-60%) — The dependency is suggested by embedding similarity between code chunks. The code looks like it's related, but there's no structural proof.
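One way to encode the tiers above is a simple lookup from resolution method to a base score, taken here as the midpoint of each range. This is an illustrative sketch, not ArcLume's actual scoring model; the method names and numbers are assumptions:

```typescript
type ResolutionMethod =
  | "direct_import"       // static import / require with a literal path
  | "type_based"          // inferred from types and interfaces
  | "string_literal"      // matched literals across codebases
  | "pattern_match"       // naming conventions, file structure
  | "semantic_inference"; // embedding similarity only

// Midpoints of the ranges described above (illustrative values).
const BASE_SCORE: Record<ResolutionMethod, number> = {
  direct_import: 0.975,
  type_based: 0.9,
  string_literal: 0.8,
  pattern_match: 0.625,
  semantic_inference: 0.45,
};

function baseConfidence(method: ResolutionMethod): number {
  return BASE_SCORE[method];
}
```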

Contextual adjustments

The base score from the resolution method is adjusted by contextual signals:

  • Co-occurrence in documentation — If both sides of a dependency are mentioned together in README files, comments, or commit messages, confidence increases.
  • Historical co-change — If files on both sides of a dependency tend to change in the same commits or PRs, confidence increases.
  • Framework conventions — Known framework patterns (like NestJS decorator-based routing or AdonisJS controller binding) enable higher-confidence resolution even without direct imports.
  • Dynamic construction — If the dependency target is constructed dynamically (string interpolation, computed property access), confidence decreases proportionally to the amount of dynamic content.
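The adjustment step can be sketched as additive boosts for corroborating signals and a proportional penalty for dynamic construction, clamped to a valid range. The signal names and weights here are assumptions chosen for illustration:

```typescript
interface Signals {
  docCoOccurrence: boolean;     // both sides mentioned together in docs
  historicalCoChange: boolean;  // both sides change in the same commits/PRs
  frameworkConvention: boolean; // known framework routing/binding pattern
  dynamicFraction: number;      // share of the target that is computed, 0..1
}

// Boost for each corroborating signal, penalize in proportion to how
// much of the dependency target is built dynamically, then clamp.
function adjustConfidence(base: number, s: Signals): number {
  let score = base;
  if (s.docCoOccurrence) score += 0.05;
  if (s.historicalCoChange) score += 0.05;
  if (s.frameworkConvention) score += 0.1;
  score -= 0.3 * s.dynamicFraction;
  return Math.min(1, Math.max(0, score));
}
```

A string-literal match (base 0.8) corroborated by documentation co-occurrence would land around 0.85, while the same match against a fully dynamic target would drop toward 0.5.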

What this looks like in practice

When ArcLume analyzes your codebase and builds the dependency graph, every edge in the graph carries its confidence score. This shows up in several places:

In the MCP server, when you ask about cross-service dependencies, results include confidence levels. Your AI assistant can factor this into its recommendations: "The notification service likely consumes this event (82% confidence based on string literal matching), but the analytics service might also consume it (45% confidence based on naming convention)."

In story generation, confidence scores affect how ArcLume presents implementation context. High-confidence dependencies appear as definite impacts: "You will need to update X." Lower-confidence dependencies appear as advisories: "Verify whether the analytics service also consumes this event."

In downstream impact mapping, the confidence score determines whether an impact is shown as confirmed or potential. This prevents both false negatives (missing a real dependency) and false positives (treating a coincidental similarity as a hard dependency).
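The confirmed-versus-potential split above amounts to thresholding the score. The cutoffs in this sketch are assumptions, not ArcLume's actual values:

```typescript
type ImpactKind = "confirmed" | "potential" | "speculative";

// Illustrative thresholds: above 0.85 an edge is presented as a definite
// impact ("you will need to update X"), between 0.5 and 0.85 as an
// advisory ("verify whether..."), and below 0.5 as a weak lead.
function classifyImpact(confidence: number): ImpactKind {
  if (confidence >= 0.85) return "confirmed";
  if (confidence >= 0.5) return "potential";
  return "speculative";
}
```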

Why this matters for AI-assisted development

AI coding tools are confident by default. When you ask Copilot or Cursor to help you change an endpoint, they'll generate the change without mentioning that three other services might break. They don't know about those services, and even if they did, they have no way to express uncertainty.

Confidence scoring introduces a crucial concept: calibrated uncertainty. Instead of either showing a dependency or hiding it, the system tells you:

  • What it's sure about
  • What it thinks is likely but can't prove
  • What it doesn't know

This is especially important for structural scoping — the practice of grounding planning decisions in the real structure of your codebase. Without confidence scoring, structural scoping gives you either a false sense of certainty (showing only what it can prove) or a noisy mess (showing every possible connection). With confidence scoring, you get an honest, nuanced view of your system's dependency landscape.

The reliability gap in dependency resolution

Most dependency resolution tools were built for package managers and build systems, where dependencies are declared explicitly. They work well for "which npm packages does this project use?" but poorly for "which services are affected if I change this API response?"

The gap between what's declared and what's real is where bugs hide. Confidence scoring doesn't close this gap entirely — some dependencies are genuinely impossible to detect statically — but it makes the gap visible. And a known unknown is far less dangerous than an unknown unknown.

ArcLume's approach to dependency resolution accuracy is to be honest about what static analysis can and cannot determine, and to surface that honesty in every tool and workflow that uses the dependency graph. Because the most dangerous thing a planning tool can do is make you think you've found everything when you haven't.


Ready to try ArcLume?

ArcLume is currently in beta. Connect your repos, build a knowledge graph, and start generating codebase-aware epics and stories.

Join the Beta