

Mapping Kafka Consumers Across Repositories with AST Parsing

March 13, 2026 | 5 min read | ArcLume Team

You need to change the schema of a Kafka message. Maybe you're adding a field to the order.completed event. Maybe you're deprecating one. The first question is always the same: who consumes this topic?

If your Kafka producers and consumers live in the same repository, this is a solvable problem with grep. But in a microservices architecture — where the producer is in the order service, the consumers are scattered across notification, analytics, billing, and warehouse services, each in its own repository — grep doesn't help. You're left searching Confluence pages, asking in Slack, or hoping the schema registry has documentation someone remembered to update.

AST parsing offers a better approach: parse every repository, extract Kafka producer and consumer registrations from the syntax tree, match topics across repositories, and build a complete Kafka dependency map automatically.

The problem with text search

The naive approach to Kafka consumer tracking is searching for the topic name as a string. This breaks for several common reasons:

  • Topic names in variables — The topic is defined as a constant (const TOPIC = 'order.completed') and referenced by variable name, not string literal, everywhere else.
  • Topic name construction — Teams prefix topics with environment names: `${env}.order.completed`. The full topic name never appears as a literal in the code.
  • Framework abstractions — Libraries like KafkaJS, kafkajs-avro, or Spring Kafka wrap topic subscriptions in decorators, configuration objects, or builder patterns. The topic name is buried in a nested structure.
  • Shared constants packages — Topic names are defined in a shared npm package or Java module, then imported by each service. The string literal only appears once, in the shared package.

Text search finds the obvious cases and misses the rest. For cross-service Kafka dependencies, you need a tool that understands the code's structure, not just its text.
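To make the failure concrete, here is a hypothetical service snippet combining two of the patterns above: the full topic name never appears as a literal, so grepping for `prod.order.completed` finds nothing.

```typescript
// Illustrative (hypothetical) service code. The full topic name
// "prod.order.completed" is constructed at runtime, so a text search
// for it returns zero results even though the dependency exists.
const ORDER_COMPLETED = "order.completed"; // shared constant

function topicFor(env: string): string {
  // Environment-prefixed topic name, built with template interpolation
  return `${env}.${ORDER_COMPLETED}`;
}

console.log(topicFor("prod")); // "prod.order.completed"
```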

How AST parsing solves this

Abstract Syntax Tree (AST) parsing converts source code into a structured tree representation. Instead of searching for strings, you traverse the tree looking for specific patterns: function calls to known Kafka client methods, decorator arguments, configuration objects with topic keys.

Extracting producers

In a typical KafkaJS setup, a producer sends messages like this:

await producer.send({
  topic: 'order.completed',
  messages: [{ value: JSON.stringify(orderData) }],
})

The AST parser identifies the producer.send() call, extracts the topic property from the argument object, and resolves its value. If the value is a string literal, resolution is immediate. If it's a variable reference, the parser traces the variable to its definition — even across files through import chains.
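A minimal sketch of that extraction step, assuming a Babel-style node shape (the `callNode` below is a hand-written mock of what a parser like @babel/parser would produce for the `producer.send` call above, not real parser output):

```typescript
// Loosely typed AST node, mirroring Babel's { type, ...fields } shape.
type Node = { type: string; [key: string]: any };

// Mock AST for: producer.send({ topic: 'order.completed', ... })
const callNode: Node = {
  type: "CallExpression",
  callee: {
    type: "MemberExpression",
    object: { type: "Identifier", name: "producer" },
    property: { type: "Identifier", name: "send" },
  },
  arguments: [{
    type: "ObjectExpression",
    properties: [{
      type: "ObjectProperty",
      key: { type: "Identifier", name: "topic" },
      value: { type: "StringLiteral", value: "order.completed" },
    }],
  }],
};

// Extract the topic from a producer.send(...) call, if present.
function extractProducerTopic(node: Node): string | null {
  if (node.type !== "CallExpression") return null;
  const callee = node.callee;
  if (callee?.type !== "MemberExpression" || callee.property?.name !== "send") return null;
  const arg = node.arguments?.[0];
  if (arg?.type !== "ObjectExpression") return null;
  const topicProp = arg.properties.find((p: Node) => p.key?.name === "topic");
  // Only string literals resolve immediately; a variable reference would
  // need a separate resolution pass across files (not shown here).
  return topicProp?.value?.type === "StringLiteral" ? topicProp.value.value : null;
}

console.log(extractProducerTopic(callNode)); // "order.completed"
```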

Extracting consumers

Consumer patterns vary more widely. KafkaJS consumers subscribe to topics through consumer.subscribe():

await consumer.subscribe({
  topics: ['order.completed', 'order.cancelled'],
  fromBeginning: false,
})

NestJS applications use decorators:

@EventPattern('order.completed')
async handleOrderCompleted(data: OrderCompletedEvent) {
  // ...
}

Spring Kafka uses annotations:

@KafkaListener(topics = "order.completed", groupId = "billing")
public void handleOrderCompleted(OrderCompletedEvent event) {
  // ...
}

Each pattern requires a different AST extraction rule. The parser maintains a library of known patterns for popular Kafka client libraries and frameworks, and extracts topic names from the relevant AST nodes.
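One way such a pattern library might be organized is as a registry of extractor rules, each responsible for one framework's AST shape. This is a hypothetical sketch over mock nodes, not ArcLume's actual rule format:

```typescript
// Loosely typed AST node and an extractor rule signature.
type Node = { type: string; [key: string]: any };
type ExtractorRule = (node: Node) => string[];

const rules: Record<string, ExtractorRule> = {
  // consumer.subscribe({ topics: [...] }) — KafkaJS
  kafkajsSubscribe: (node) =>
    node.type === "CallExpression" && node.callee?.property?.name === "subscribe"
      ? (node.arguments?.[0]?.properties ?? [])
          .filter((p: Node) => p.key?.name === "topics")
          .flatMap((p: Node) => p.value.elements.map((e: Node) => e.value))
      : [],
  // @EventPattern('...') — NestJS decorator
  nestEventPattern: (node) =>
    node.type === "Decorator" && node.expression?.callee?.name === "EventPattern"
      ? node.expression.arguments.map((a: Node) => a.value)
      : [],
};

// Mock AST for the consumer.subscribe call shown earlier.
const subscribeNode: Node = {
  type: "CallExpression",
  callee: { type: "MemberExpression", property: { name: "subscribe" } },
  arguments: [{
    type: "ObjectExpression",
    properties: [{
      key: { name: "topics" },
      value: { elements: [{ value: "order.completed" }, { value: "order.cancelled" }] },
    }],
  }],
};

// Run every rule against a node and collect any topics it yields.
const topics = Object.values(rules).flatMap((rule) => rule(subscribeNode));
console.log(topics); // ["order.completed", "order.cancelled"]
```

New frameworks then become new entries in the registry rather than changes to the traversal itself.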

Resolving topic names across repos

Once producers and consumers are extracted from every repository, the matching step connects them. A topic name extracted from a producer in the order-service repo is matched to the same topic name in consumer registrations across all other repos.

This is where confidence scoring becomes important. A direct string literal match carries high confidence (90%+). A match where one side uses a constant and the parser successfully traced it to its value is still high confidence (85%+). A match based on partial string similarity or naming convention gets a lower score (50-70%).
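A toy version of that tiering, with thresholds chosen to mirror the bands above (real scoring would weigh more signals than a single match kind):

```typescript
// How the producer-consumer link was established.
type MatchKind = "literal" | "traced-constant" | "fuzzy";

function confidence(kind: MatchKind, similarity = 1): number {
  switch (kind) {
    case "literal": return 0.95;          // direct string literal match
    case "traced-constant": return 0.88;  // constant resolved to its value
    case "fuzzy":
      // Partial string similarity maps into the 0.5–0.7 band
      return 0.5 + 0.2 * Math.min(Math.max(similarity, 0), 1);
  }
}

console.log(confidence("literal"));     // 0.95
console.log(confidence("fuzzy", 0.6));  // ≈ 0.62
```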

Building the dependency map

The output of this analysis is a complete Kafka dependency map — a data structure that shows, for every detected topic:

  • Which service(s) produce to it, with file paths and function signatures
  • Which service(s) consume from it, with handler functions and consumer group IDs
  • The confidence score for each producer-consumer link
  • Whether the topic is defined in a shared constants package
  • The message schema (if defined as a TypeScript type, Avro schema, or protobuf message)

This map answers the original question instantly: "If I change the schema of order.completed, I need to update consumers in notification-service (src/handlers/order-completed.handler.ts:23), billing-service (src/kafka/consumers.ts:45), and analytics-service (src/events/order-events.ts:12). The warehouse-service might also consume it (62% confidence based on a pattern match)."
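One plausible shape for an entry in such a map, with illustrative field names (this is a sketch, not ArcLume's actual schema):

```typescript
// A single producer or consumer edge for a topic.
interface TopicLink {
  service: string;
  file: string;        // path within the service's repo
  symbol: string;      // producer or handler function name
  confidence: number;  // 0–1 link confidence
  consumerGroup?: string;
}

// Everything the map records about one topic.
interface TopicEntry {
  topic: string;
  producers: TopicLink[];
  consumers: TopicLink[];
  sharedConstant: boolean;  // defined in a shared constants package?
  schemaRef?: string;       // TypeScript type, Avro, or protobuf reference
}

const orderCompleted: TopicEntry = {
  topic: "order.completed",
  producers: [
    { service: "order-service", file: "src/kafka/producer.ts", symbol: "publishOrderCompleted", confidence: 0.95 },
  ],
  consumers: [
    { service: "billing-service", file: "src/kafka/consumers.ts", symbol: "handleOrderCompleted", confidence: 0.95, consumerGroup: "billing" },
    { service: "warehouse-service", file: "src/events/index.ts", symbol: "onOrderEvent", confidence: 0.62 },
  ],
  sharedConstant: true,
};

console.log(orderCompleted.consumers.length); // 2
```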

Beyond Kafka: the general pattern

The same AST-based approach works for any cross-service communication mechanism:

  • REST endpoints — Extract route definitions from controllers and match them to HTTP client calls in consumer services
  • BullMQ / SQS / RabbitMQ queues — Extract queue names from producer and consumer registrations
  • gRPC services — Parse proto files and match service definitions to client stubs
  • WebSocket events — Extract event names from emit and on/subscribe calls
  • GraphQL subscriptions — Parse schema definitions and match to subscription handlers

ArcLume's indexing pipeline applies this pattern across all detected communication mechanisms, building a unified interface map that covers every way your services talk to each other. This is what powers the get_interfaces tool in the MCP server — a single query that shows you all producers and consumers for any interface type, across all your repositories.

Practical applications

Once you have a cross-repo Kafka dependency map, several workflows become trivial:

  • Schema evolution — Before changing a message schema, see every consumer that needs updating. No more broken consumers discovered in production.
  • Dead topic detection — Find topics that have producers but no consumers (wasted compute) or consumers but no producers (dead code).
  • Impact scoping — When structurally scoping a feature that involves event-driven flows, automatically include all affected consumers in the scope.
  • Onboarding — New engineers can see the complete event flow for any domain without reading documentation that may be out of date.
  • Migration planning — When migrating from one message broker to another, the dependency map tells you exactly which services need updating and in what order.
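Dead topic detection, for instance, reduces to a pair of filters over the map. A sketch over a simplified map shape (hypothetical topic names and counts):

```typescript
// Simplified map: topic name → producer/consumer edge counts.
const map: Record<string, { producers: number; consumers: number }> = {
  "order.completed": { producers: 1, consumers: 3 },
  "order.audit":     { producers: 1, consumers: 0 }, // wasted compute
  "legacy.sync":     { producers: 0, consumers: 2 }, // dead code
};

// Topics produced but never consumed, and vice versa.
const unconsumed = Object.keys(map).filter((t) => map[t].consumers === 0);
const unproduced = Object.keys(map).filter((t) => map[t].producers === 0);

console.log(unconsumed); // ["order.audit"]
console.log(unproduced); // ["legacy.sync"]
```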

Try it on your repositories

ArcLume's indexing pipeline automatically detects and maps Kafka dependencies (along with REST, BullMQ, gRPC, and WebSocket interfaces) when you connect your repositories. No configuration required — the AST parsers identify the patterns automatically.

Connect your repositories and run get_interfaces through the MCP server to see your Kafka dependency map. You might be surprised by what you find.


Ready to try ArcLume?

ArcLume is currently in beta. Connect your repos, build a knowledge graph, and start generating codebase-aware epics and stories.

Join the Beta