Previously “StoryMaps”

Codebase Indexing

Last updated: March 11, 2026

How indexing works

When you connect a GitHub repository, ArcLume runs a multi-stage indexing pipeline:

  1. Clone — fetches your repository at the latest commit
  2. Parse — extracts code symbols (functions, classes, methods, modules) across all supported languages
  3. Chunk — splits code into semantic chunks with metadata (file path, symbol name, signature, line numbers, imports, exports)
  4. Embed — generates vector embeddings for each chunk using Voyage Code-3, stored in PostgreSQL with pgvector
  5. Detect interfaces — identifies API endpoints, message queue producers/consumers, database access patterns, and other interface points using pattern matching
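To make the Parse and Chunk stages concrete, here is a minimal sketch that splits a single Python file into symbol-level chunks carrying the metadata listed above. It uses Python's standard ast module purely for illustration. The Chunk class, its field names, and chunk_python_source are hypothetical and are not ArcLume's internal parser or schema.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One semantic chunk; field names are illustrative assumptions."""
    file_path: str
    symbol_name: str
    signature: str
    start_line: int
    end_line: int
    imports: list = field(default_factory=list)


def chunk_python_source(file_path: str, source: str) -> list:
    """Return one Chunk per function, method, or class found in the source."""
    tree = ast.parse(source)
    lines = source.splitlines()

    # Collect module names imported anywhere in the file.
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imported.add(node.module)
    imports = sorted(imported)

    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(
                Chunk(
                    file_path=file_path,
                    symbol_name=node.name,
                    # Use the definition line as a simple stand-in for the signature.
                    signature=lines[node.lineno - 1].strip(),
                    start_line=node.lineno,
                    end_line=node.end_lineno,
                    imports=imports,
                )
            )
    # ast.walk does not guarantee source order, so sort explicitly.
    return sorted(chunks, key=lambda chunk: chunk.start_line)
```

Each resulting chunk is the unit that would then be passed to the embedding stage. A production indexer would need a parser per supported language rather than a single-language module like ast.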

What gets indexed

The indexer processes every source file in your repository that is not excluded (see Exclusions) and extracts:

  • Code chunks — functions, classes, methods, interfaces, type definitions, and more
  • File metadata — language, file path, imports, exports
  • Symbol relationships — which files import which, call graphs, class hierarchies
  • Interface points — REST routes, Kafka topics, BullMQ queues, gRPC services, WebSocket endpoints, shared database access
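As an illustration of pattern-based interface detection, the sketch below finds Express-style REST route registrations with a regular expression. The pattern, the RestRoute type, and detect_rest_routes are assumptions made for this example. ArcLume's actual detection rules are not documented here and are likely broader.

```python
import re
from typing import NamedTuple


class RestRoute(NamedTuple):
    method: str
    path: str
    line: int


# Hypothetical pattern: matches calls such as app.get('/users') or router.post("/orders").
ROUTE_PATTERN = re.compile(
    r"""\b(?:app|router)\.(get|post|put|patch|delete)\(\s*(['"])(.+?)\2"""
)


def detect_rest_routes(source: str) -> list:
    """Scan source text line by line and return every route registration found."""
    routes = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in ROUTE_PATTERN.finditer(line):
            routes.append(
                RestRoute(match.group(1).upper(), match.group(3), line_number)
            )
    return routes
```

The same approach extends to the other interface types by adding one pattern per framework or client library, for example a pattern for queue constructors or topic subscriptions.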

Supported languages

Code parsing supports TypeScript, JavaScript, Python, Go, Java, Rust, Ruby, and C#, among other languages. Embedding and semantic search work with any language that Voyage Code-3 supports.

Exclusions

By default, common non-source directories such as node_modules, dist, .git, and vendor are excluded from indexing. You can customize exclusions per repository from the repo settings page:

  1. Go to Settings → Repos
  2. Click on a repository
  3. Under Exclusions, add or remove path patterns
  4. Re-index to apply changes
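The exact pattern syntax is not specified on this page. Assuming glob-style patterns, the following sketch shows how a list of exclusions could be applied to repository paths. DEFAULT_EXCLUSIONS mirrors the directories named above, and is_excluded is a hypothetical helper, not an ArcLume API.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Mirrors the default directories named in this section.
DEFAULT_EXCLUSIONS = ["node_modules", "dist", ".git", "vendor"]


def is_excluded(path: str, patterns: list) -> bool:
    """Return True if the path, or any single component of it, matches a pattern."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        # A pattern may match the whole path, e.g. "src/generated/*".
        if fnmatchcase(path, pattern):
            return True
        # Or it may name a directory or file anywhere in the path, e.g. "node_modules".
        if any(fnmatchcase(part, pattern) for part in parts):
            return True
    return False
```

Under these assumptions, node_modules/react/index.js is skipped and src/index.ts is indexed. Adding a pattern such as src/generated/* would also skip generated code.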

Re-indexing

Indexing runs automatically when a repository is first connected, and again whenever ArcLume receives a GitHub push webhook for new commits. You can also trigger a re-index manually at any time from the repo settings page. A full re-index clears the previous index and rebuilds it from scratch.
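For readers curious about what a push-triggered re-index involves, here is a rough sketch of the receiving side. It relies on two documented GitHub behaviors: deliveries are signed with HMAC-SHA256 in the X-Hub-Signature-256 header, and the push payload carries ref, after, and repository.full_name. The ReindexRequest type and parse_push_event function are hypothetical and do not describe ArcLume's actual handler.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReindexRequest:
    """What a re-index job would need to know; an illustrative shape only."""
    repository: str
    ref: str
    commit_sha: str


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check the X-Hub-Signature-256 header value against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information.
    return hmac.compare_digest(expected, signature_header)


def parse_push_event(
    secret: bytes, body: bytes, signature_header: str
) -> Optional[ReindexRequest]:
    """Return a ReindexRequest for a correctly signed push event, else None."""
    if not verify_signature(secret, body, signature_header):
        return None
    payload = json.loads(body)
    return ReindexRequest(
        repository=payload["repository"]["full_name"],
        ref=payload["ref"],
        commit_sha=payload["after"],
    )
```

The returned request would then be queued so that indexing starts from the Queued phase like any other run.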

Monitoring progress

The repo settings page shows indexing status in real time as the run moves through these phases:

  • Queued → Cloning → Parsing → Embedding → Detecting interfaces → Complete

You can cancel an in-progress index at any time.
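The phase sequence can be modeled as a simple ordered state machine. The sketch below uses the phase names shown above. The Phase enum and next_phase helper are illustrative assumptions and not part of any ArcLume API.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    # Members are declared in the order the status moves through them.
    QUEUED = "Queued"
    CLONING = "Cloning"
    PARSING = "Parsing"
    EMBEDDING = "Embedding"
    DETECTING_INTERFACES = "Detecting interfaces"
    COMPLETE = "Complete"


PHASE_ORDER = list(Phase)


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows the current one, or None after Complete."""
    index = PHASE_ORDER.index(current)
    if index + 1 < len(PHASE_ORDER):
        return PHASE_ORDER[index + 1]
    return None
```

A cancelled run would leave this sequence before reaching Complete. How that state is labeled in the settings page is not described here, so it is omitted from the sketch.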

