How indexing works
When you connect a GitHub repository, ArcLume runs a five-stage indexing pipeline:
- Clone — fetches your repository at the latest commit
- Parse — extracts code symbols (functions, classes, methods, modules) across all supported languages
- Chunk — splits code into semantic chunks with metadata (file path, symbol name, signature, line numbers, imports, exports)
- Embed — generates vector embeddings for each chunk using Voyage Code-3, stored in PostgreSQL with pgvector
- Detect interfaces — identifies API endpoints, message queue producers/consumers, database access patterns, and other interface points using pattern matching
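The Parse and Chunk stages can be illustrated with a minimal sketch. It is not ArcLume's parser: it handles Python files only, uses the standard-library `ast` module as a stand-in, and the `Chunk` record and `chunk_python_source` function are hypothetical names whose fields mirror the metadata listed above.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Chunk:
    # Hypothetical record shape; fields mirror the chunk metadata listed above.
    file_path: str
    symbol_name: str
    signature: str
    start_line: int
    end_line: int
    imports: list[str] = field(default_factory=list)


def chunk_python_source(file_path: str, source: str) -> list[Chunk]:
    """Split one Python file into per-symbol chunks (functions and classes)."""
    tree = ast.parse(source)
    lines = source.splitlines()

    # Collect module-level imports once; each chunk gets its own copy.
    imports: list[str] = []
    for node in tree.body:
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.append(node.module)

    chunks: list[Chunk] = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(
                Chunk(
                    file_path=file_path,
                    symbol_name=node.name,
                    # The first line of the definition serves as a rough signature.
                    signature=lines[node.lineno - 1].strip(),
                    start_line=node.lineno,
                    end_line=node.end_lineno,
                    imports=list(imports),
                )
            )
    return chunks
```

Chunking on symbol boundaries rather than fixed line counts keeps each chunk self-describing, which is what makes the attached metadata useful at search time.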
What gets indexed
The indexer processes every source file in your repository that is not excluded (see Exclusions) and extracts:
- Code chunks — functions, classes, methods, interfaces, type definitions, and more
- File metadata — language, file path, imports, exports
- Symbol relationships — which files import which, call graphs, class hierarchies
- Interface points — REST routes, Kafka topics, BullMQ queues, gRPC services, WebSocket endpoints, shared database access
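As a rough illustration of pattern-based interface detection, the sketch below finds Express-style REST routes with a single regular expression. The pattern, the `InterfacePoint` record, and `detect_rest_routes` are assumptions made for this example; ArcLume's actual detection rules are not documented here and presumably cover far more cases.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfacePoint:
    # Hypothetical record for one detected interface point.
    kind: str    # e.g. "rest_route"
    method: str  # HTTP method, upper-cased
    path: str    # route path as written in the source
    line: int    # 1-based line number


# Illustrative pattern only: matches calls such as app.get("/users", ...)
# or router.post('/orders', ...). The quote is captured so the closing
# quote must match the opening one.
EXPRESS_ROUTE = re.compile(
    r"""\b(?:app|router)\.(get|post|put|patch|delete)\(\s*(['"])(.+?)\2"""
)


def detect_rest_routes(source: str) -> list[InterfacePoint]:
    """Scan source text line by line and collect Express-style routes."""
    points = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in EXPRESS_ROUTE.finditer(line):
            points.append(
                InterfacePoint(
                    kind="rest_route",
                    method=match.group(1).upper(),
                    path=match.group(3),
                    line=line_no,
                )
            )
    return points
```

A line-oriented regex like this misses routes whose path is built dynamically or split across lines, which is the usual trade-off of pattern matching against full program analysis.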
Supported languages
Code parsing supports a wide range of languages including TypeScript, JavaScript, Python, Go, Java, Rust, Ruby, C#, and more. Embedding and semantic search work with any language that Voyage Code-3 supports.
Exclusions
By default, common non-source directories such as node_modules, dist, .git, and vendor are excluded from indexing. You can customize exclusions per repository from the repo settings page:
- Go to Settings → Repos
- Click on a repository
- Under Exclusions, add or remove path patterns
- Re-index to apply changes
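To make the effect of exclusions concrete, here is a minimal sketch of how a path filter could work. The exact pattern syntax ArcLume accepts is not specified above, so this example assumes shell-style globs (via Python's `fnmatch`) and a hypothetical `is_excluded` helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Default directory names taken from the examples above.
DEFAULT_EXCLUDED_DIRS = {"node_modules", "dist", ".git", "vendor"}


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Return True if a repo-relative path should be skipped by the indexer.

    Assumption: patterns are shell-style globs matched against the whole path,
    where * also matches path separators.
    """
    parts = PurePosixPath(path).parts
    # Any parent directory with a default-excluded name excludes the file.
    if any(part in DEFAULT_EXCLUDED_DIRS for part in parts[:-1]):
        return True
    # Otherwise fall back to the per-repository patterns.
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```

For example, with the custom pattern `src/generated/*`, the file `src/generated/api.ts` would be skipped while `src/app.ts` would still be indexed.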
Re-indexing
Indexing runs automatically when a repository is first connected, and ArcLume listens for GitHub push webhooks to re-index on new commits. You can also trigger a re-index manually at any time from the repo settings page. A full re-index clears the previous index and rebuilds it from scratch.
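The webhook-driven path can be sketched as follows. GitHub signs each delivery with an HMAC-SHA256 of the request body, sent in the `X-Hub-Signature-256` header as `sha256=<hex digest>`, and a push payload carries the new head commit in its `after` field. How ArcLume handles these deliveries internally is not described above, so `verify_signature` and `commit_to_reindex` are hypothetical helpers.

```python
import hashlib
import hmac
import json
from typing import Optional


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check the X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the mismatch.
    return hmac.compare_digest(expected, signature_header)


def commit_to_reindex(
    secret: bytes, body: bytes, signature_header: str, event: str
) -> Optional[str]:
    """Return the commit SHA to re-index, or None if the delivery is ignored.

    `event` is the value of the X-GitHub-Event header.
    """
    if event != "push":
        return None
    if not verify_signature(secret, body, signature_header):
        return None
    payload = json.loads(body)
    # "after" is the SHA of the most recent commit on the ref after the push.
    return payload.get("after")
```

Verifying the signature before parsing the payload means unauthenticated requests never trigger indexing work.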
Monitoring progress
The repo settings page shows indexing status in real time as the job moves through these phases:
- Queued → Cloning → Parsing → Embedding → Detecting interfaces → Complete
You can cancel an in-progress index at any time.
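The status progression can be modeled as a small state machine. This is a sketch only: the phase names come from the list above, while the `Cancelled` state, `next_phase`, and `cancel` are assumptions introduced to show how cancellation might interact with the phases.

```python
from enum import Enum


class Phase(Enum):
    QUEUED = "Queued"
    CLONING = "Cloning"
    PARSING = "Parsing"
    EMBEDDING = "Embedding"
    DETECTING_INTERFACES = "Detecting interfaces"
    COMPLETE = "Complete"
    CANCELLED = "Cancelled"  # hypothetical terminal state, not named in the docs


# Forward order of the documented phases.
ORDER = [
    Phase.QUEUED,
    Phase.CLONING,
    Phase.PARSING,
    Phase.EMBEDDING,
    Phase.DETECTING_INTERFACES,
    Phase.COMPLETE,
]


def next_phase(current: Phase) -> Phase:
    """Advance one step; terminal phases stay where they are."""
    if current in (Phase.COMPLETE, Phase.CANCELLED):
        return current
    return ORDER[ORDER.index(current) + 1]


def cancel(current: Phase) -> Phase:
    """Cancel an in-progress index; a completed index is left unchanged."""
    return current if current is Phase.COMPLETE else Phase.CANCELLED
```

Treating both `Complete` and the assumed `Cancelled` state as terminal keeps a late status update from moving a finished or cancelled job forward again.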