Graphify + code-review-graph: Build a Self-Updating Knowledge Graph for Claude Code and other AI ...
Every developer working with LLMs on a large codebase eventually hits the same wall: context windows are finite, but codebases are not.
You start a new AI coding session, ask about the payment flow — and your agent starts re-reading dozens of files just to get oriented. Twenty thousand tokens evaporated before a single line of code is written. Multiply that by every session, every team member, every day.
Two open-source tools solve this in different but complementary ways:
- Graphify — converts your folder into a queryable knowledge graph with community detection, Obsidian-compatible reports, and cross-file traversal
- code-review-graph (CRG) — builds a SQLite-backed AST graph with blast-radius analysis, embedding-based semantic search, ~25 MCP tools (allow-listable to a working set of 8), and sub-second incremental updates
This guide walks through installing both tools, connecting them to any AI coding agent — Claude Code, Cursor, Gemini CLI, Windsurf, GitHub Copilot, and more — wiring auto-updates for code edited by humans, git commits, or the agent itself, and pairing everything with an Obsidian vault as a persistent memory layer.
Pick one, both, or none — every section degrades gracefully. Each tool works standalone; the smart-grep-hook, session-start cheatsheet, and CLAUDE.md routing rules all detect what is present and adapt. Use just graphify if you want a pure CLI / zero-MCP setup. Use just CRG if you want embedding-aware semantic search and PR-grade impact tools. Use both for the full stack (CRG primary, graphify on miss). If neither is installed, the agent silently falls back to grep — no broken hooks, no errors.
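The detect-and-adapt behaviour described above comes down to a priority list. Below is a minimal Python sketch. It assumes a backend counts as present only when its CLI is on PATH and its graph artifact exists (.code-review-graph/graph.db or graphify-out/graph.json, both introduced later in this guide). pick_backends is a hypothetical helper and is not part of either tool.

```python
import shutil
from pathlib import Path


def pick_backends(root: str = ".", which=shutil.which) -> list[str]:
    """Return query backends in priority order: CRG first, graphify on miss, grep last.

    Hypothetical helper: a backend is used only if its CLI is on PATH
    AND its graph artifact exists under `root`.
    """
    base = Path(root)
    order = []
    if which("code-review-graph") and (base / ".code-review-graph" / "graph.db").is_file():
        order.append("code-review-graph")
    if which("graphify") and (base / "graphify-out" / "graph.json").is_file():
        order.append("graphify")
    order.append("grep")  # always-available fallback, so nothing ever breaks
    return order


if __name__ == "__main__":
    print(pick_backends())
```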
All commands in this guide were tested on Ubuntu and macOS across multiple real pnpm monorepos of varying sizes.
Real Numbers from Two Test Projects
Before diving in, here's what both tools produced across two real codebases — one a full-stack TypeScript monorepo with 5 packages, the other a lighter frontend-only repo:
| Metric | Graphify (AST-only) | code-review-graph |
|---|---|---|
| Files indexed (large) | 1,020 | 1,052 |
| Nodes (large) | 3,815 | 5,780 |
| Edges (large) | 4,830 | 30,611 |
| Files indexed (small) | 702 | 711 |
| Nodes (small) | 2,035 | 2,773 |
| Edges (small) | 2,357 | 15,037 |
| Communities (large / small) | 750 / 499 | 28 wiki pages per repo |
| Incremental update | ~10s (8 workers) | 0.425s |
| LLM tokens used | 0 | 0 |
| Storage | graphify-out/ (JSON) | .code-review-graph/ (SQLite) |
How Each Tool Works
Graphify — Two-Pass Graph with Communities
Your Code
│
▼
Pass 1: Tree-sitter AST ← 0 tokens, 25 languages
(classes, functions, imports, call graphs)
│
▼
Pass 2: AI Extraction ← only for PDFs, images, markdown (optional)
(semantic relationships via Claude subagents)
│
▼
NetworkX Graph + Leiden Clustering
│
├── graphify-out/graph.json (queryable)
├── graphify-out/GRAPH_REPORT.md (750 communities, Obsidian links)
├── graphify-out/graph.html (interactive visual)
└── graphify-out/cache/ (SHA256 per file)
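The per-file SHA256 cache is what makes re-runs cheap: if a file's content hash has not changed since the last run, that file can be skipped. The sketch below shows the general idea only. The JSON manifest layout and the changed_files helper are assumptions for illustration and do not reflect graphify's actual cache format.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash file contents in 64 KiB chunks so large files are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(files: list[Path], manifest_path: Path) -> list[Path]:
    """Return files whose hash differs from the stored manifest, then rewrite the manifest.

    Hypothetical manifest format: {"<path>": "<sha256 hex>"}.
    Files missing from `files` are dropped from the new manifest.
    """
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new, changed = {}, []
    for p in files:
        digest = sha256_of(p)
        new[str(p)] = digest
        if old.get(str(p)) != digest:
            changed.append(p)  # new or modified since the last run
    manifest_path.write_text(json.dumps(new, indent=2, sort_keys=True))
    return changed
```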
Each edge has a confidence tag:
| Tag | Source | Confidence |
|---|---|---|
| EXTRACTED | Directly in AST | 1.0 |
| INFERRED | Reasonable deduction | 0.7–0.9 |
| AMBIGUOUS | Needs review | <0.7 |
On the large monorepo: 87% EXTRACTED · 13% INFERRED · 0% AMBIGUOUS
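The thresholds in the table map directly onto a bucketing rule. graphify already attaches the tag to each edge, so this minimal Python sketch is only meant to show how the score ranges relate to the tags and how a percentage breakdown like the one above is computed. The function names and the toy edge dictionaries are hypothetical. One further assumption: the table leaves scores between 0.9 and 1.0 unassigned, and the sketch counts them as INFERRED.

```python
from collections import Counter


def confidence_tag(score: float) -> str:
    """Map a confidence score to a tag using the table's ranges.

    Assumption: any score >= 0.7 but below 1.0 is INFERRED,
    which also covers the unlisted 0.9-1.0 gap.
    """
    if score >= 1.0:
        return "EXTRACTED"
    if score >= 0.7:
        return "INFERRED"
    return "AMBIGUOUS"


def tag_breakdown(tags: list[str]) -> dict[str, int]:
    """Percentage of edges per tag, rounded to whole percent."""
    counts = Counter(tags)
    total = len(tags) or 1  # avoid division by zero on an empty graph
    return {t: round(100 * counts[t] / total) for t in ("EXTRACTED", "INFERRED", "AMBIGUOUS")}


edges = [{"confidence": 1.0}] * 87 + [{"confidence": 0.8}] * 13  # toy data, not real output
print(tag_breakdown([confidence_tag(e["confidence"]) for e in edges]))
# {'EXTRACTED': 87, 'INFERRED': 13, 'AMBIGUOUS': 0}
```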
code-review-graph — Blast-Radius Graph with MCP
Your Code (git-tracked files)
│
▼
Tree-sitter AST (23 languages, 0 tokens)
│
▼
SQLite (.code-review-graph/graph.db)
│
├── Nodes: functions, classes, files
├── Edges: imports, calls, inheritance
└── Full-text search index (FTS5)
│
▼
~25 MCP tools available (allow-list to ~8 via CRG_TOOLS env — see "Strip Unused CRG Tools")
(semantic_search_nodes, query_graph, get_impact_radius, list_communities, ...)
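Because the graph is a plain SQLite file, you can inspect it without going through MCP. This sketch relies only on the nodes and edges table names that the session-start hook later in this guide also queries, and it makes no assumptions about column layouts. The database is opened read-only through a URI. That way a missing file raises an error instead of being silently created as an empty database, which is what a plain sqlite3.connect would do. graph_stats is a hypothetical helper.

```python
import sqlite3
from pathlib import Path


def graph_stats(db_path: str = ".code-review-graph/graph.db") -> tuple[int, int]:
    """Return (node_count, edge_count) from a CRG graph database.

    Opened with mode=ro: the check never modifies the file and never
    creates an empty database when the path does not exist.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        nodes = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
        edges = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
    finally:
        conn.close()
    return nodes, edges
```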
Installation
Install one or both — the rest of the guide shows where each section applies.
Ubuntu
# Pick what you need:
pip install graphifyy # graphify CLI (note: two y's on PyPI)
pip install code-review-graph # CRG (CLI + MCP server, includes embeddings extra)
# Verify (only the lines for installed tools):
graphify --help | head -5
code-review-graph --version
macOS
# Via uv (fastest):
uv tool install graphifyy # graphify only
uv tool install code-review-graph # CRG only — or run both lines for both
# Or via pipx:
brew install pipx
pipx install graphifyy
pipx install code-review-graph
PyPI quirk: The package is graphifyy (two y's). The CLI command after install is graphify (one y).
Step 1: Create Ignore Files
Before building any graph, exclude noise from indexing. Place these at your project root.
.graphifyignore
node_modules/
dist/
build/
.pnpm-store/
coverage/
*.min.js
*.min.css
*.map
pnpm-lock.yaml
yarn.lock
*.lock
*.log
.env*
graphify-out/
.code-review-graph/
*.example.*
.code-review-graphignore (same content)
node_modules/
dist/
build/
.pnpm-store/
coverage/
*.min.js
*.min.css
*.map
pnpm-lock.yaml
yarn.lock
*.lock
*.log
.env*
graphify-out/
.code-review-graph/
*.example.*
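Both files use gitignore-style patterns. If you want to preview what a pattern list would exclude before running a build, Python's fnmatch gives a rough approximation. This sketch makes deliberate simplifications: it ignores negation, leading-slash anchoring and "**". It is not the matcher either tool uses, and is_ignored and load_patterns are hypothetical helpers.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Roughly approximate gitignore matching for simple patterns.

    - "dir/" matches any path that has a directory component named "dir"
    - any other pattern is matched against the file's basename
    Negation (!), leading-slash anchoring, and "**" are NOT handled.
    """
    parts = PurePosixPath(rel_path).parts
    if not parts:
        return False
    for raw in patterns:
        pat = raw.strip()
        if not pat or pat.startswith("#"):
            continue  # blank line or comment
        if pat.endswith("/"):
            name = pat.rstrip("/")
            if any(fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif fnmatchcase(parts[-1], pat):
            return True
    return False


def load_patterns(path: str) -> list[str]:
    """Read an ignore file such as .graphifyignore into a list of lines."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()
```

For example, is_ignored("apps/web/dist/main.js", load_patterns(".graphifyignore")) should report True with the pattern list above.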
Step 2: Build the Graphs Manually
code-review-graph
cd /path/to/your-project
# Full build (first time) — parses all files
code-review-graph build
# Output (large monorepo):
# Full build: 1052 files, 5780 nodes, 30611 edges (postprocess=full)
# Output (smaller frontend repo):
# Full build: 711 files, 2773 nodes, 15037 edges (postprocess=full)
Graphify (AST-only, no LLM cost)
cd /path/to/your-project
# AST-only update (no API key required)
graphify update .
# Output (large monorepo):
# Rebuilt: 3815 nodes, 4830 edges, 750 communities
# graph.json, graph.html and GRAPH_REPORT.md updated in graphify-out
# Output (smaller repo):
# Rebuilt: 2035 nodes, 2357 edges, 499 communities
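If you log graph sizes in CI, the two summary lines quoted above are easy to parse. This sketch assumes the output keeps exactly the format shown in this guide, which may change between releases. parse_build_summary is a hypothetical helper.

```python
import re

# Patterns for the summary lines shown above; the format is assumed, not guaranteed.
_CRG = re.compile(r"Full build: (\d+) files, (\d+) nodes, (\d+) edges")
_GFY = re.compile(r"Rebuilt: (\d+) nodes, (\d+) edges, (\d+) communities")


def parse_build_summary(output: str) -> dict[str, int]:
    """Extract counts from code-review-graph or graphify build output ({} if unrecognised)."""
    if m := _CRG.search(output):
        files, nodes, edges = map(int, m.groups())
        return {"files": files, "nodes": nodes, "edges": edges}
    if m := _GFY.search(output):
        nodes, edges, communities = map(int, m.groups())
        return {"nodes": nodes, "edges": edges, "communities": communities}
    return {}


crg = parse_build_summary("Full build: 1052 files, 5780 nodes, 30611 edges (postprocess=full)")
print(round(crg["edges"] / crg["nodes"], 1))  # 5.3 edges per node
```

The same ratio for graphify's large-repo run (4830 / 3815) comes out at about 1.3. That gap is one way to read the large difference in edge counts in the comparison table.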
For the richer semantic graph (PDFs, images, markdown — uses LLM):
# Full extraction with Claude subagents (requires ANTHROPIC_API_KEY)
graphify extract .
Step 3: Register with Your AI Agent
code-review-graph (auto-configures 5 platforms)
code-review-graph install
This single command:
- Writes .mcp.json (Claude Code MCP server config)
- Writes .cursor/mcp.json, .opencode.json, Zed settings, .cursorrules, GEMINI.md, AGENTS.md
- Creates .claude/skills/ for Claude Code tool integration
- Installs hooks in .claude/settings.json (move these out, see below)
- Installs a git pre-commit hook
- Updates .gitignore to exclude .code-review-graph/
Post-install housekeeping: code-review-graph install is aggressive — it writes configs for every AI IDE it knows about. Most teams only use one. Add the noise to .gitignore:
# .gitignore additions
AGENTS.md
GEMINI.md
# keep only .mcp.example.json committed
.mcp.json
.cursorrules
.windsurfrules
.opencode.json
.kiro/
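Re-running the installer, or a setup script, should not duplicate these entries. Below is a minimal sketch of an idempotent append, assuming you manage the list from a script. The helper name is hypothetical. Keep in mind that .gitignore has no trailing-comment syntax: # only starts a comment at the beginning of a line.

```python
from pathlib import Path


def ensure_gitignore_entries(path: Path, entries: list[str]) -> list[str]:
    """Append entries missing from a .gitignore file; return what was added.

    Comparison is on stripped lines, so running it twice is a no-op.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    present = {line.strip() for line in existing}
    # dict.fromkeys drops duplicates in `entries` while keeping their order
    missing = [e for e in dict.fromkeys(entries) if e.strip() not in present]
    if missing:
        text = "\n".join(existing)
        if text and not text.endswith("\n"):
            text += "\n"  # make sure new entries start on their own line
        path.write_text(text + "\n".join(missing) + "\n")
    return missing
```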
Three-file hook pattern: .claude/settings.json stays clean (permissions only, no hooks). Hooks go into the example and local files:
- .claude/settings.example.json (committed) documents the full hook structure for teammates. It contains all graph hooks (Stop, SessionStart, PreToolUse). Copy from here to activate.
- .claude/settings.local.json (gitignored) holds your actual personal hooks that fire at runtime. Copy from settings.example.json on first setup, then customize with env vars or secrets. You can also copy individual hooks into your global ~/.claude/settings.json if you want them active across all projects.
.claude/settings.example.json is the committed reference that shows the hook structure. Copy it to .claude/settings.local.json and customize it. JSON does not allow // comments, so keep notes like this outside the file itself:
{
"hooks": {
"PostToolUse": [],
"Stop": [
{
"_comment": "CRG auto-update after AI finishes turn — once per turn, PID-guarded.",
"hooks": [{
"type": "command",
"command": "command -v code-review-graph >/dev/null 2>&1 && [ -d .code-review-graph ] && { PF=/tmp/crg-claude.pid; if [ -f \"$PF\" ] && kill -0 \"$(cat \"$PF\")\" 2>/dev/null; then true; else nohup sh -c 'code-review-graph update --skip-flows && code-review-graph embed' </dev/null >/dev/null 2>&1 & echo $! > \"$PF\"; fi; } || true",
"timeout": 5
}]
}
],
"SessionStart": [
{
"_comment": "CRG graph status on session open",
"hooks": [{
"type": "command",
"command": "command -v code-review-graph >/dev/null 2>&1 && code-review-graph status 2>/dev/null || true",
"timeout": 10
}]
},
{
"_comment": "Setup nudge — prompt to initialize if CLI installed but no graph built yet",
"hooks": [{
"type": "command",
"command": "if command -v code-review-graph >/dev/null 2>&1 && git rev-parse --is-inside-work-tree >/dev/null 2>&1 && [ ! -d .code-review-graph ]; then printf '{\"systemMessage\":\"Graph tool installed but not yet initialized. Ask me to set up: code-review-graph (code-review-graph install)\"}'; fi || true",
"timeout": 10
}]
},
{
"_comment": "Graph query cheatsheet — injected once per session (~150 tokens)",
"hooks": [{
"type": "command",
"command": "if [ -f .code-review-graph/graph.db ] || [ -f graphify-out/graph.json ]; then STATS=\"\"; TOOL_LINES=\"\"; if [ -f .code-review-graph/graph.db ]; then STATS=$(python3 -c \"import sqlite3; c=sqlite3.connect('.code-review-graph/graph.db'); n=c.execute('SELECT COUNT(*) FROM nodes').fetchone()[0]; e=c.execute('SELECT COUNT(*) FROM edges').fetchone()[0]; print(f'{n} nodes, {e} edges'); c.close()\" 2>/dev/null || echo \"\"); TOOL_LINES=\" where is X defined → semantic_search_nodes_tool(query=X)\\n who calls X → query_graph_tool(pattern=callers_of, target=X)\\n pre-refactor blast → get_impact_radius_tool(changed_files=[...])\\n community/cluster → list_communities_tool()\\n code review context → get_review_context_tool(changed_files=[...])\"; fi; if [ -f graphify-out/graph.json ]; then GFY_STATS=$(python3 -c \"import json; g=json.load(open('graphify-out/graph.json')); nodes=g.get('nodes',[]); comms=len(set(n.get('community','') for n in nodes if n.get('community',''))); print(f'{len(nodes)} nodes, {comms} communities')\" 2>/dev/null || echo \"\"); [ -n \"$GFY_STATS\" ] && STATS=\"${STATS:+$STATS | }graphify: $GFY_STATS\"; TOOL_LINES=\"${TOOL_LINES:+$TOOL_LINES\\n} CRG miss / explore → graphify query '<term>' --graph graphify-out/graph.json\\n path A→B → graphify path '<from>' '<to>' --graph graphify-out/graph.json\"; fi; printf '{\"hookSpecificOutput\":{\"hookEventName\":\"SessionStart\",\"additionalContext\":\"GRAPH QUERY CHEATSHEET (%s): use BEFORE Read/Grep/Bash-find on code:\\\\n%s\\\\nSkip graph for: .md .json .yml .log .jsonl configs cross-repo paths.\\\\nOverride grep gate: append --graph-tried to any Bash command.\"}}' \"$STATS\" \"$TOOL_LINES\"; fi",
"timeout": 5
}]
}
]
}
}
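The cheatsheet one-liner assembles its JSON with printf, and that makes escaping fragile: newlines, quotes and backslashes all have to be escaped by hand. If you would rather keep that logic in a script that the hook calls, json.dumps handles the escaping for you. The Python sketch below mirrors the one-liner's output shape (hookSpecificOutput with hookEventName and additionalContext) and takes the stats string and tool lines as inputs, which come from the same queries the one-liner runs. The script and the cheatsheet_payload name are assumptions, not something either tool ships.

```python
import json
import sys


def cheatsheet_payload(stats: str, tool_lines: list[str]) -> str:
    """Build the SessionStart hook output; json.dumps escapes everything correctly."""
    context = "\n".join(
        [f"GRAPH QUERY CHEATSHEET ({stats}): use BEFORE Read/Grep/Bash-find on code:"]
        + tool_lines
        + [
            "Skip graph for: .md .json .yml .log .jsonl configs cross-repo paths.",
            "Override grep gate: append --graph-tried to any Bash command.",
        ]
    )
    # Default ensure_ascii=True escapes "→" as \u2192, so the output is valid JSON
    # regardless of the terminal's encoding.
    return json.dumps(
        {"hookSpecificOutput": {"hookEventName": "SessionStart", "additionalContext": context}}
    )


if __name__ == "__main__":
    lines = [
        " who calls X → query_graph_tool(pattern=callers_of, target=X)",
        " CRG miss / explore → graphify query '<term>' --graph graphify-out/graph.json",
    ]
    sys.stdout.write(cheatsheet_payload("5780 nodes, 30611 edges", lines) + "\n")
```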
# .claude/settings.local.json — gitignored, actual runtime hooks
# Copy from settings.example.json on first setup:
cp .claude/settings.example.json .claude/settings.local.json
# Then add any personal env vars or secrets
Why Stop, not PostToolUse?
PostToolUse fires after every individual file edit: if Claude makes 10 edits in one response, it fires 10 times. The pgrep guard that prevents double-spawning leaves a race window of a few milliseconds between the check and the spawn, and near-simultaneous invocations can all slip through it. The result: multiple CRG Python processes pile up, load average hits 12+, RAM saturates, and the machine slows to a crawl. Stop fires once when the AI finishes its entire turn, so all edits are batched and a single update is triggered. The PID-file guard then prevents overlap across turns (it skips the launch if the previous turn's update is still running). No pile-up by design.
Why graphify update is NOT in any Claude hook:
graphify update takes ~10s+ on a real monorepo (even with SHA256 caching). Running it in Stop or PostToolUse would either block the session or leave processes hanging in the background and accumulating across turns. It belongs only in git hooks (post-commit, post-checkout), where it runs as a fully detached nohup process after the commit returns. The developer has already moved on, so the latency is invisible.
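The PID-file guard in the Stop hook is a common pattern. The hook records the background job's PID, and on the next turn it skips the launch if that process is still alive. Signal 0 checks whether a process exists without actually delivering a signal. Below is a POSIX-only Python sketch of the same check. The helper names and the PID-file location are illustrative. Like the shell version, it has a small check-then-write race, which is acceptable here because Stop fires only once per turn.

```python
import os
import subprocess
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """True if a process with this PID exists (POSIX kill with signal 0)."""
    if pid <= 0:
        return False  # 0 / negative PIDs address process groups, not one process
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def launch_once(cmd: list[str], pid_file: Path) -> bool:
    """Start cmd detached unless the PID recorded in pid_file is still running.

    Returns True if a new process was started.
    """
    try:
        if pid_alive(int(pid_file.read_text().strip())):
            return False  # previous run still going: skip this turn
    except (FileNotFoundError, ValueError):
        pass  # no previous run, or an unreadable PID file
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # detach from the hook's session (POSIX)
    )
    pid_file.write_text(str(proc.pid))
    return True
```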
The .mcp.json it creates (keep as .mcp.example.json only):
{
"mcpServers": {
"code-review-graph": {
"command": "uvx",
"args": ["code-review-graph", "serve"],
"type": "stdio"
}
}
}
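Since only .mcp.example.json is committed, each teammate needs a local .mcp.json. The sketch below covers that onboarding step, assuming the example file has the structure shown above. It copies the example into place only if no local file exists yet, and then checks that a code-review-graph server entry is present. ensure_local_mcp is a hypothetical helper.

```python
import json
import shutil
from pathlib import Path


def ensure_local_mcp(root: Path = Path(".")) -> bool:
    """Copy .mcp.example.json to .mcp.json if missing; return True if a copy was made.

    Raises ValueError if the resulting config has no code-review-graph server.
    """
    example, local = root / ".mcp.example.json", root / ".mcp.json"
    copied = False
    if not local.exists():
        shutil.copyfile(example, local)  # never overwrite a teammate's local edits
        copied = True
    servers = json.loads(local.read_text()).get("mcpServers", {})
    if "code-review-graph" not in servers:
        raise ValueError(f"{local} has no 'code-review-graph' entry under mcpServers")
    return copied
```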
Graphify (AI Agent Integration)
# Adds graphify section to CLAUDE.md + PreToolUse hook (Claude Code)
graphify claude install
Graphify's claude install adds a PreToolUse hook that intercepts grep, rg, find commands and redirects the agent to graphify query instead — turning search interception into graph navigation. For other agents: code-review-graph install already writes equivalent config to GEMINI.md, AGENTS.md, .cursorrules, and Zed settings.
Note: graphify claude install writes the PreToolUse hook into .claude/settings.json. Move it to .claude/settings.example.json (committed as a reference) and copy it to .claude/settings.local.json to activate it locally. Not every teammate will have graphify installed, so it shouldn't fire automatically for everyone.
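To show what such a grep gate involves, here is a minimal Python sketch of a PreToolUse hook. It is not graphify's hook. It relies on Claude Code's hook contract: the event arrives as JSON on stdin, with tool_name set and, for Bash, tool_input.command; exit code 2 blocks the call and feeds stderr back to the agent. It also honours the --graph-tried override mentioned in the cheatsheet. Only the first word of the command is checked, so pipelines such as cat f | grep x pass through.

```python
import json
import shlex
import sys
from typing import TextIO

SEARCH_TOOLS = {"grep", "rg", "find"}


def should_redirect(command: str) -> bool:
    """True if a Bash command starts with a search tool and lacks the override flag."""
    if "--graph-tried" in command:
        return False
    try:
        words = shlex.split(command)
    except ValueError:  # unbalanced quotes: let the command through untouched
        return False
    return bool(words) and words[0] in SEARCH_TOOLS


def main(stream: TextIO) -> int:
    """Handle one PreToolUse event read from `stream`; return the hook's exit code."""
    try:
        event = json.load(stream)
    except json.JSONDecodeError:
        return 0  # never block on malformed input
    command = (event.get("tool_input") or {}).get("command", "")
    if event.get("tool_name") == "Bash" and should_redirect(command):
        print(
            "Search the graph first: graphify query '<term>' --graph graphify-out/graph.json "
            "(append --graph-tried to run the original command anyway).",
            file=sys.stderr,
        )
        return 2  # block this tool call; stderr goes back to the agent
    return 0
```

Saved as a script, it would end with sys.exit(main(sys.stdin)) and be registered as a PreToolUse hook command.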
Auto-Update on Commit (the full picture)
Three independent triggers keep the graph fresh — install all of them so no edit path leaves it stale:
| Trigger | Hook location | Tool | Notes |
|---|---|---|---|
| Claude finishes a turn | Stop hook in settings.example.json → settings.local.json | CRG only | ~0.425s, fast enough to run after every AI turn; PID-guarded |
| Any commit | .git/hooks/post-commit (or .husky/post-commit) | graphify + CRG | terminal commits, IDE commits, other AI tools; graphify runs as background nohup |
| Branch switch | .git/hooks/post-checkout (or .husky/post-checkout) | graphify + CRG | smart: ≤5 files diff → incremental update; >5 files or new branch → full rebuild; background nohup |
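The branch-switch rule in the table is a small decision function. Git calls post-checkout with three arguments: the previous HEAD, the new HEAD, and a flag that is 1 for a branch checkout. After git clone, the previous HEAD is the all-zero null ref. This Python sketch treats that null ref as the "new branch" case, which is an assumption about what the table means. It counts changed files with git diff --name-only. Both helper names are hypothetical.

```python
import subprocess


def count_changed(prev_head: str, new_head: str) -> int:
    """Number of files that differ between two commits (run inside the repo)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", prev_head, new_head],
        capture_output=True, text=True, check=True,
    ).stdout
    return len([line for line in out.splitlines() if line])


def checkout_action(prev_head: str, changed_files: int, threshold: int = 5) -> str:
    """Return "incremental" or "full" for a post-checkout graph refresh.

    An all-zero prev_head (40 or 64 zeros) means there was no previous HEAD,
    e.g. the checkout done by `git clone`; treated here as a new branch (assumption).
    """
    if not prev_head or set(prev_head) == {"0"}:
        return "full"
    return "incremental" if changed_files <= threshold else "full"
```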
graphify is NOT in the Claude Stop hook.
graphify update takes ~10s+ on large monorepos, which is too slow for an AI turn hook. It would pile up, hang in the background, and saturate CPU/RAM (observed: 3 stuck processes at 65–73% CPU each). Use git hooks instead: the developer has already moved on by the time graphify finishes.
Graphify ships its own installer for the git side:
graphify hook install # writes post-commit + post-checkout
Code-review-graph does not. Its install command writes a pre-commit hook that runs detect-changes --brief (a status warning before the commit lands), but no post-commit hook to update the SQLite graph afterwards. Without one, .code-review-graph/graph.db only refreshes when Claude edits code, and every commit made from a terminal, an IDE, or another AI tool leaves it stale.
Add the post-commit update yourself. Both forms are detached so git commit returns immediately.
Resource guard required. Graph rebuilds are CPU-intensive. Without guards, multiple rebuilds can pile up (observed: 3 concurrent graphify processes at 65–73% CPU each, load average 12+, RAM saturated). Every hook below includes a _resources_ok check (1-minute load average ≤ 50% of logical cores, at least 2 GB of available memory) and a pgrep process-deduplication guard. If either check fails, the rebuild is skipped with a one-line notice, and the next commit triggers a fresh attempt.
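The guard's thresholds are easy to test on their own. Below is a Python sketch of the same rule as the shell _resources_ok function. The measurements are passed in as arguments so the decision can be checked directly, and the Linux reader parses MemAvailable from /proc/meminfo the same way the hook's awk line does. The function names are illustrative.

```python
import os


def resources_ok(load_1min: float, ncpu: int, mem_available_mb: float,
                 max_load_ratio: float = 0.50, min_mem_mb: float = 2048) -> bool:
    """Same thresholds as the hook: load <= 50% of cores and >= 2 GB available."""
    return load_1min / max(ncpu, 1) <= max_load_ratio and mem_available_mb >= min_mem_mb


def linux_mem_available_mb(meminfo: str = "/proc/meminfo") -> float:
    """Read MemAvailable (reported in kB) and convert it to MB. Linux only."""
    with open(meminfo) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / 1024
    raise RuntimeError("MemAvailable not found")


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    print(resources_ok(os.getloadavg()[0], os.cpu_count() or 1, linux_mem_available_mb()))
```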
Plain git project — append to .git/hooks/post-commit:
#!/bin/sh
# Knowledge graph tools are optional — each section silently skips if tools are absent.
# Returns 0 (ok) when CPU load and free memory are within acceptable limits.
# CPU: 1-min load average must be <= 50% of logical core count (Linux/macOS)
# or aggregate load percentage <= 50% (Windows).
# Memory: effectively available memory must be >= 2048 MB on all platforms.
_resources_ok() {
_os=$(uname -s 2>/dev/null)
case "$_os" in
Linux)
_nproc=$(nproc 2>/dev/null || grep -c '^processor' /proc/cpuinfo 2>/dev/null || echo 1)
_cpu_ok=$(awk -v n="$_nproc" 'NR==1 { print ($1 / n <= 0.50) ? "1" : "0" }' /proc/loadavg 2>/dev/null)
if [ "${_cpu_ok:-1}" != "1" ]; then
echo "[graph hook] Skipping rebuild — CPU load above 50% threshold"
return 1
fi
_mem_ok=$(awk '/^MemAvailable:/ { print ($2 / 1024 >= 2048) ? "1" : "0" }' /proc/meminfo 2>/dev/null)
if [ "${_mem_ok:-1}" != "1" ]; then
echo "[graph hook] Skipping rebuild — available memory below 2 GB threshold"
return 1
fi
;;
Darwin)
_nproc=$(sysctl -n hw.logicalcpu 2>/dev/null || echo 1)
_cpu_ok=$(sysctl -n vm.loadavg 2>/dev/null | awk -v n="$_nproc"
