Graphify + code-review-graph: Build a Self-Updating Knowledge Graph for Claude Code and other AI ...

Every developer working with LLMs on a large codebase eventually hits the same wall: context windows are finite, but codebases are not.

You start a new AI coding session, ask about the payment flow, and watch your agent re-read dozens of files just to get oriented. Twenty thousand tokens evaporate before a single line of code is written. Multiply that by every session, every team member, every day.

Two open-source tools solve this in different but complementary ways:

  • Graphify — converts your folder into a queryable knowledge graph with community detection, Obsidian-compatible reports, and cross-file traversal
  • code-review-graph (CRG) — builds a SQLite-backed AST graph with blast-radius analysis, embedding-based semantic search, ~25 MCP tools (allow-listable to a working set of 8), and sub-second incremental updates

This guide walks through installing both tools, connecting them to AI coding agents (Claude Code, Cursor, Gemini CLI, Windsurf, GitHub Copilot, and more), wiring up auto-updates so the graph stays fresh whether code changes come from a human, a git commit, or the agent itself, and pairing everything with an Obsidian vault as a persistent memory layer.

Pick one, both, or none — every section degrades gracefully. Each tool works standalone; the smart-grep-hook, session-start cheatsheet, and CLAUDE.md routing rules all detect what is present and adapt. Use just graphify if you want a pure CLI / zero-MCP setup. Use just CRG if you want embedding-aware semantic search and PR-grade impact tools. Use both for the full stack (CRG primary, graphify on miss). If neither is installed, the agent silently falls back to grep — no broken hooks, no errors.
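The degrade-gracefully behaviour boils down to a presence check before each query. Below is a minimal Python sketch of that decision, assuming the same signals the hooks in this guide test for (the CLI on PATH plus .code-review-graph/graph.db or graphify-out/graph.json on disk). The function name query_backends is hypothetical and not part of either tool.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Callable, Optional


def query_backends(
    root: Path, which: Callable[[str], Optional[str]] = shutil.which
) -> list[str]:
    """Return the backends to try, in priority order, for the repo at root.

    Hypothetical helper mirroring the routing described above:
    CRG primary, graphify on miss, grep as the universal fallback.
    """
    backends: list[str] = []
    # CRG counts only if the CLI is installed AND a graph has been built.
    if which("code-review-graph") and (root / ".code-review-graph" / "graph.db").is_file():
        backends.append("crg")
    # graphify is consulted when CRG is absent or returns nothing useful.
    if which("graphify") and (root / "graphify-out" / "graph.json").is_file():
        backends.append("graphify")
    backends.append("grep")  # always available, so the agent never hits a dead end
    return backends
```

On a fresh clone with neither graph built, this returns only ["grep"], which is the "no broken hooks, no errors" case.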

All commands in this guide were tested on Ubuntu and macOS across multiple real pnpm monorepos of varying sizes.


Real Numbers from Two Test Projects

Before diving in, here's what both tools produced across two real codebases — one a full-stack TypeScript monorepo with 5 packages, the other a lighter frontend-only repo:

| Metric | Graphify (AST-only) | code-review-graph |
|---|---|---|
| Files indexed (large) | 1,020 | 1,052 |
| Nodes (large) | 3,815 | 5,780 |
| Edges (large) | 4,830 | 30,611 |
| Files indexed (small) | 702 | 711 |
| Nodes (small) | 2,035 | 2,773 |
| Edges (small) | 2,357 | 15,037 |
| Communities (large / small) | 750 / 499 | 28 wiki pages each |
| Incremental update | ~10s (8 workers) | 0.425s |
| LLM tokens used | 0 | 0 |
| Storage | graphify-out/ (JSON) | .code-review-graph/ (SQLite) |

How Each Tool Works

Pipeline Architecture: Graphify vs code-review-graph

Graphify — Two-Pass Graph with Communities

Your Code
    │
    ▼
Pass 1: Tree-sitter AST   ← 0 tokens, 25 languages
(classes, functions, imports, call graphs)
    │
    ▼
Pass 2: AI Extraction     ← only for PDFs, images, markdown (optional)
(semantic relationships via Claude subagents)
    │
    ▼
NetworkX Graph + Leiden Clustering
    │
    ├── graphify-out/graph.json       (queryable)
    ├── graphify-out/GRAPH_REPORT.md  (750 communities, Obsidian links)
    ├── graphify-out/graph.html       (interactive visual)
    └── graphify-out/cache/           (SHA256 per file)
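The graphify-out/cache/ entry is what makes re-runs cheap: a file whose content hash has not changed can skip re-parsing. The sketch below illustrates that idea under an assumed cache format (a plain dict mapping relative path to SHA-256 hex digest). graphify's real on-disk cache layout may differ, and file_digest and changed_files are hypothetical names.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, read in chunks so large files stay cheap on memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(
    root: Path, cache: dict[str, str]
) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare every file under root with a {relative_path: sha256} cache.

    Returns (new_or_modified, deleted, refreshed_cache). Only the first list
    needs re-parsing; deleted paths need their nodes dropped from the graph.
    """
    fresh: dict[str, str] = {}
    modified: list[str] = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        fresh[rel] = file_digest(path)
        if cache.get(rel) != fresh[rel]:
            modified.append(rel)
    deleted = sorted(set(cache) - set(fresh))
    return modified, deleted, fresh
```

Hashing content rather than trusting modification times means a checkout that touches timestamps without changing bytes costs only the hash pass, not a re-parse.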

Each edge has a confidence tag:

| Tag | Source | Confidence |
|---|---|---|
| EXTRACTED | Directly in AST | 1.0 |
| INFERRED | Reasonable deduction | 0.7–0.9 |
| AMBIGUOUS | Needs review | < 0.7 |

On the large monorepo: 87% EXTRACTED · 13% INFERRED · 0% AMBIGUOUS
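A percentage split like this one is a simple count over edge tags. The sketch assumes each edge is a dict with a tag field and a numeric confidence field; those field names are illustrative, not graphify's documented schema. edges_needing_review applies the table's < 0.7 cut-off.

```python
from __future__ import annotations

from collections import Counter

TAGS = ("EXTRACTED", "INFERRED", "AMBIGUOUS")


def tag_breakdown(edges: list[dict]) -> dict[str, int]:
    """Whole-number percentage of edges carrying each confidence tag."""
    counts = Counter(edge["tag"] for edge in edges)
    total = len(edges)
    if total == 0:
        return {tag: 0 for tag in TAGS}
    return {tag: round(100 * counts[tag] / total) for tag in TAGS}


def edges_needing_review(edges: list[dict], threshold: float = 0.7) -> list[dict]:
    """Edges below the AMBIGUOUS cut-off from the table (confidence < 0.7)."""
    return [edge for edge in edges if edge["confidence"] < threshold]
```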

code-review-graph — Blast-Radius Graph with MCP

Your Code (git-tracked files)
    │
    ▼
Tree-sitter AST (23 languages, 0 tokens)
    │
    ▼
SQLite (.code-review-graph/graph.db)
    │
    ├── Nodes: functions, classes, files
    ├── Edges: imports, calls, inheritance
    └── Full-text search index (FTS5)
    │
    ▼
~25 MCP tools available (allow-list to ~8 via CRG_TOOLS env — see "Strip Unused CRG Tools")
(semantic_search_nodes, query_graph, get_impact_radius, list_communities, ...)
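get_impact_radius answers the question "what else could break if these files change?". Conceptually that is a breadth-first walk over edges in reverse, from the changed nodes to everything that calls or imports them. Below is a minimal sketch over an in-memory SQLite database. The table columns (name, file, src, dst, kind) and the helper names are hypothetical; CRG's real schema and traversal rules may differ.

```python
from __future__ import annotations

import sqlite3
from collections import deque


def demo_db() -> sqlite3.Connection:
    """In-memory graph with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, file TEXT);
        CREATE TABLE edges (src INTEGER, dst INTEGER, kind TEXT);  -- src calls/imports dst
        """
    )
    return conn


def add_node(conn: sqlite3.Connection, name: str, file: str) -> int:
    return conn.execute("INSERT INTO nodes (name, file) VALUES (?, ?)", (name, file)).lastrowid


def add_edge(conn: sqlite3.Connection, src: int, dst: int, kind: str = "calls") -> None:
    conn.execute("INSERT INTO edges (src, dst, kind) VALUES (?, ?, ?)", (src, dst, kind))


def impact_radius(conn: sqlite3.Connection, changed_files: list[str], max_depth: int = 2) -> set[str]:
    """Names of nodes that transitively call/import anything defined in changed_files."""
    if not changed_files:
        return set()
    marks = ",".join("?" * len(changed_files))
    seeds = {row[0] for row in conn.execute(f"SELECT id FROM nodes WHERE file IN ({marks})", changed_files)}
    seen = set(seeds)
    queue = deque((node, 0) for node in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        # Reverse direction: whoever points AT this node is affected by changing it.
        for (caller,) in conn.execute("SELECT src FROM edges WHERE dst = ?", (node,)):
            if caller not in seen:
                seen.add(caller)
                queue.append((caller, depth + 1))
    affected = list(seen - seeds)
    if not affected:
        return set()
    marks = ",".join("?" * len(affected))
    return {name for (name,) in conn.execute(f"SELECT name FROM nodes WHERE id IN ({marks})", affected)}
```

The depth cap matters on a dense graph: with ~30k edges, an uncapped walk from a shared utility can pull in most of the repo, which is no longer a useful "blast radius".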

Installation

Install one or both — the rest of the guide shows where each section applies.

Ubuntu

# Pick what you need:
pip install graphifyy         # graphify CLI (note: two y's on PyPI)
pip install code-review-graph # CRG (CLI + MCP server, includes embeddings extra)

# Verify (only the lines for installed tools):
graphify --help | head -5
code-review-graph --version

macOS

# Via uv (fastest):
uv tool install graphifyy           # graphify only
uv tool install code-review-graph   # CRG only — or run both lines for both

# Or via pipx:
brew install pipx
pipx install graphifyy
pipx install code-review-graph

PyPI quirk: The package is graphifyy (two y's). The CLI command after install is graphify (one y).


Step 1: Create Ignore Files

Before building any graph, exclude noise from indexing. Place these at your project root.

.graphifyignore

node_modules/
dist/
build/
.pnpm-store/
coverage/
*.min.js
*.min.css
*.map
pnpm-lock.yaml
yarn.lock
*.lock
*.log
.env*
graphify-out/
.code-review-graph/
*.example.*

.code-review-graphignore (same content)

node_modules/
dist/
build/
.pnpm-store/
coverage/
*.min.js
*.min.css
*.map
pnpm-lock.yaml
yarn.lock
*.lock
*.log
.env*
graphify-out/
.code-review-graph/
*.example.*
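Both files use gitignore-style patterns. To preview which paths a pattern list would drop before running a build, a rough approximation with Python's fnmatch is enough for simple patterns like these. It is not a full gitignore implementation (no ! negation, no anchored /patterns, no **), and load_patterns and is_ignored are hypothetical helpers, not part of either tool.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def load_patterns(text: str) -> list[str]:
    """Non-empty, non-comment lines of an ignore file."""
    return [ln.strip() for ln in text.splitlines() if ln.strip() and not ln.strip().startswith("#")]


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Approximate gitignore matching for simple patterns like the ones above."""
    parts = PurePosixPath(rel_path).parts
    for pat in patterns:
        if pat.endswith("/"):
            # Directory pattern: drop anything with this directory somewhere above it.
            if any(fnmatchcase(part, pat.rstrip("/")) for part in parts[:-1]):
                return True
        elif fnmatchcase(parts[-1], pat):
            # Slash-free file pattern: match the basename at any depth.
            return True
    return False
```

Because the two ignore files share the same content, one pattern list can be checked once and applies to both tools.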

Step 2: Build the Graphs Manually

code-review-graph

cd /path/to/your-project

# Full build (first time) — parses all files
code-review-graph build

# Output (large monorepo):
# Full build: 1052 files, 5780 nodes, 30611 edges (postprocess=full)

# Output (smaller frontend repo):
# Full build: 711 files, 2773 nodes, 15037 edges (postprocess=full)
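To confirm a build from a script (a CI step, a status line), you can read the counts back directly. The sketch relies only on graph.db having nodes and edges tables, the same assumption the SessionStart cheatsheet hook later in this guide makes. It opens the database read-only, so a mistyped path fails loudly instead of silently creating an empty file. crg_counts is a hypothetical name.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path


def crg_counts(db_path: str = ".code-review-graph/graph.db") -> tuple[int, int]:
    """(node_count, edge_count) from a code-review-graph SQLite database."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"  # read-only: never creates the file
    conn = sqlite3.connect(uri, uri=True)
    try:
        nodes = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
        edges = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
    finally:
        conn.close()
    return nodes, edges
```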

Graphify (AST-only, no LLM cost)

cd /path/to/your-project

# AST-only update (no API key required)
graphify update .

# Output (large monorepo):
# Rebuilt: 3815 nodes, 4830 edges, 750 communities
# graph.json, graph.html and GRAPH_REPORT.md updated in graphify-out

# Output (smaller repo):
# Rebuilt: 2035 nodes, 2357 edges, 499 communities
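The graphify side can be checked the same way by reading graph.json. This sketch assumes the layout the SessionStart cheatsheet hook reads (a top-level nodes list whose entries may carry a community field); graphify_counts is a hypothetical name. It skips only missing or empty community values, so a community numbered 0 is still counted.

```python
from __future__ import annotations

import json
from pathlib import Path


def graphify_counts(graph_path: str = "graphify-out/graph.json") -> tuple[int, int]:
    """(node_count, community_count) from a graphify graph.json."""
    data = json.loads(Path(graph_path).read_text(encoding="utf-8"))
    nodes = data.get("nodes", [])
    # `not in (None, "")` rather than truthiness, so community 0 is not dropped.
    communities = {n.get("community") for n in nodes if n.get("community") not in (None, "")}
    return len(nodes), len(communities)
```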

For the richer semantic graph (PDFs, images, markdown — uses LLM):

# Full extraction with Claude subagents (requires ANTHROPIC_API_KEY)
graphify extract .

Step 3: Register with Your AI Agent

code-review-graph (auto-configures 5 platforms)

code-review-graph install

This single command:

  • Writes .mcp.json (Claude Code MCP server config)
  • Writes .cursor/mcp.json, .opencode.json, Zed settings, .cursorrules, GEMINI.md, AGENTS.md
  • Creates .claude/skills/ for Claude Code tool integration
  • Installs hooks in .claude/settings.json (move these out — see below)
  • Installs a git pre-commit hook
  • Updates .gitignore to exclude .code-review-graph/

Post-install housekeeping: code-review-graph install is aggressive — it writes configs for every AI IDE it knows about. Most teams only use one. Add the noise to .gitignore:

# .gitignore additions
AGENTS.md
GEMINI.md
# keep only .mcp.example.json in git
.mcp.json
.cursorrules
.windsurfrules
.opencode.json
.kiro/

Three-file hook pattern: settings.json stays clean — permissions only, no hooks. Hooks go into example/local:

  • .claude/settings.example.json (committed) — documents the full hook structure for teammates. Contains all graph hooks (Stop, SessionStart, PreToolUse). Copy from here to activate.
  • .claude/settings.local.json (gitignored) — your actual personal hooks that fire at runtime. Copy from settings.example.json on first setup, then customize with env vars or secrets. You can also copy individual hooks into your global ~/.claude/settings.json if you want them active across all projects.
.claude/settings.example.json (committed reference showing the hook structure; copy it to .claude/settings.local.json and customize). JSON has no comment syntax, so notes live in "_comment" keys rather than // lines:
{
  "hooks": {
    "PostToolUse": [],
    "Stop": [
      {
        "_comment": "CRG auto-update after AI finishes turn — once per turn, PID-guarded.",
        "hooks": [{
          "type": "command",
          "command": "command -v code-review-graph >/dev/null 2>&1 && [ -d .code-review-graph ] && { PF=/tmp/crg-claude.pid; if [ -f \"$PF\" ] && kill -0 \"$(cat \"$PF\")\" 2>/dev/null; then true; else { code-review-graph update --skip-flows && nohup code-review-graph embed; } </dev/null >/dev/null 2>&1 & echo $! > \"$PF\"; fi; } || true",
          "timeout": 5
        }]
      }
    ],
    "SessionStart": [
      {
        "_comment": "CRG graph status on session open",
        "hooks": [{
          "type": "command",
          "command": "command -v code-review-graph >/dev/null 2>&1 && code-review-graph status 2>/dev/null || true",
          "timeout": 10
        }]
      },
      {
        "_comment": "Setup nudge — prompt to initialize if CLI installed but no graph built yet",
        "hooks": [{
          "type": "command",
          "command": "if command -v code-review-graph >/dev/null 2>&1 && git rev-parse --is-inside-work-tree >/dev/null 2>&1 && [ ! -d .code-review-graph ]; then printf '{\"systemMessage\":\"Graph tool installed but not yet initialized. Ask me to set up: code-review-graph (code-review-graph install)\"}'; fi || true",
          "timeout": 10
        }]
      },
      {
        "_comment": "Graph query cheatsheet — injected once per session (~150 tokens)",
        "hooks": [{
          "type": "command",
          "command": "if [ -f .code-review-graph/graph.db ] || [ -f graphify-out/graph.json ]; then STATS=\"\"; TOOL_LINES=\"\"; if [ -f .code-review-graph/graph.db ]; then STATS=$(python3 -c \"import sqlite3; c=sqlite3.connect('.code-review-graph/graph.db'); n=c.execute('SELECT COUNT(*) FROM nodes').fetchone()[0]; e=c.execute('SELECT COUNT(*) FROM edges').fetchone()[0]; print(f'{n} nodes, {e} edges'); c.close()\" 2>/dev/null || echo \"\"); TOOL_LINES=\"  where is X defined    → semantic_search_nodes_tool(query=X)\\n  who calls X           → query_graph_tool(pattern=callers_of, target=X)\\n  pre-refactor blast    → get_impact_radius_tool(changed_files=[...])\\n  community/cluster     → list_communities_tool()\\n  code review context   → get_review_context_tool(changed_files=[...])\"; fi; if [ -f graphify-out/graph.json ]; then GFY_STATS=$(python3 -c \"import json; g=json.load(open('graphify-out/graph.json')); nodes=g.get('nodes',[]); comms=len(set(n.get('community') for n in nodes if n.get('community') not in (None, ''))); print(f'{len(nodes)} nodes, {comms} communities')\" 2>/dev/null || echo \"\"); [ -n \"$GFY_STATS\" ] && STATS=\"${STATS:+$STATS | }graphify: $GFY_STATS\"; TOOL_LINES=\"${TOOL_LINES:+$TOOL_LINES\\n}  CRG miss / explore    → graphify query '<term>' --graph graphify-out/graph.json\\n  path A→B              → graphify path '<from>' '<to>' --graph graphify-out/graph.json\"; fi; printf '{\"hookSpecificOutput\":{\"hookEventName\":\"SessionStart\",\"additionalContext\":\"GRAPH QUERY CHEATSHEET (%s) — use BEFORE Read/Grep/Bash-find on code:\\\\n%s\\\\nSkip graph for: .md .json .yml .log .jsonl configs cross-repo paths.\\\\nOverride grep gate: append --graph-tried to any Bash command.\"}}' \"$STATS\" \"$TOOL_LINES\"; fi",
          "timeout": 5
        }]
      }
    ]
  }
}
# .claude/settings.local.json — gitignored, actual runtime hooks
# Copy from settings.example.json on first setup:
cp .claude/settings.example.json .claude/settings.local.json
# Then add any personal env vars or secrets
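The cheatsheet hook assembles its JSON with printf, which stays valid only while no value contains a double quote, a backslash, or a raw newline. If you ever move that hook into a script file, building the payload with a JSON library removes this whole class of escaping bug. A sketch follows: cheatsheet_payload and its tool_lines argument are hypothetical, while the hookSpecificOutput / hookEventName / additionalContext keys are the ones the hook above already emits.

```python
from __future__ import annotations

import json


def cheatsheet_payload(stats: str, tool_lines: list[str]) -> str:
    """SessionStart hook output built with json.dumps, so quotes, backslashes
    and newlines in any value are escaped correctly."""
    context = "\n".join(
        [f"GRAPH QUERY CHEATSHEET ({stats}): use BEFORE Read/Grep/Bash-find on code:"]
        + tool_lines
        + [
            "Skip graph for: .md .json .yml .log .jsonl configs cross-repo paths.",
            "Override grep gate: append --graph-tried to any Bash command.",
        ]
    )
    return json.dumps(
        {"hookSpecificOutput": {"hookEventName": "SessionStart", "additionalContext": context}},
        ensure_ascii=False,  # keep the arrow characters readable in the transcript
    )
```

A hook script would simply print(cheatsheet_payload(...)) to stdout.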

Why Stop, not PostToolUse? PostToolUse fires after every individual file edit — if Claude makes 10 edits in one response, it fires 10 times. The pgrep guard that prevents double-spawning has a race window of a few milliseconds, which is not enough to block concurrent spawns reliably. The result: multiple CRG Python processes pile up, load average hits 12+, RAM saturates, and the machine slows to a crawl. Stop fires once when the AI finishes its entire turn — all edits batched, single update triggered. The PID-file guard then prevents overlap across turns (skip if the previous turn's update is still running). Zero pile-up by design.
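The PID-file guard in the Stop hook rests on two shell idioms: kill -0 asks whether a process still exists, and $! records the PID of the job just backgrounded. Here is the same logic in Python, as a POSIX-only sketch for anyone who would rather keep the hook in a script file. update_running and maybe_spawn are hypothetical names, and like the shell version the check can be fooled if the OS reuses a stale PID.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path


def update_running(pid_file: Path) -> bool:
    """True if pid_file names a live process (the `kill -0 "$(cat "$PF")"` check)."""
    try:
        pid = int(pid_file.read_text().strip())
    except (OSError, ValueError):
        return False  # no PID file yet, or garbage in it
    if pid <= 0:
        return False  # 0 and negative PIDs address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks existence/permission
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def maybe_spawn(pid_file: Path, argv: list[str]) -> bool:
    """Start argv detached unless the previous run is still alive. True if spawned."""
    if update_running(pid_file):
        return False  # previous turn's update still running: skip, as the hook does
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # own session, so it outlives the hook's shell
    )
    pid_file.write_text(str(proc.pid))
    return True
```

The PID recorded must belong to the process doing the work for the whole duration; recording the PID of a wrapper that exits immediately would make the guard always report "not running".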

Why graphify update is NOT in any Claude hook: graphify update takes ~10s+ on a real monorepo (even with SHA256 caching). Running it in Stop or PostToolUse would block the session or cause the process to hang in the background, accumulating across turns. It belongs only in git hooks (post-commit, post-checkout) where it runs as a fully detached nohup process after the commit returns — the developer has already moved on, so the latency is invisible.


The .mcp.json it creates (keep as .mcp.example.json only):

{
  "mcpServers": {
    "code-review-graph": {
      "command": "uvx",
      "args": ["code-review-graph", "serve"],
      "type": "stdio"
    }
  }
}

Graphify (AI Agent Integration)

# Adds graphify section to CLAUDE.md + PreToolUse hook (Claude Code)
graphify claude install

Graphify's claude install adds a PreToolUse hook that intercepts grep, rg, and find commands and points the agent to graphify query instead, turning text search into graph navigation. For other agents, code-review-graph install already writes its own config to GEMINI.md, AGENTS.md, .cursorrules, and Zed settings.

Note: graphify claude install writes the PreToolUse hook into .claude/settings.json. Move it to .claude/settings.example.json (committed as a reference) and copy to .claude/settings.local.json to activate it locally — not every teammate will have graphify installed, so it shouldn't fire automatically for everyone.
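Moving the hooks by hand works, but it is easy to drop an entry or leave invalid JSON behind. Below is a small Python sketch that moves every entry under the top-level hooks key of .claude/settings.json into .claude/settings.example.json, appending per event and skipping exact duplicates. move_hooks is a hypothetical helper, and the layout it assumes is the one shown in the settings.example.json earlier in this guide.

```python
from __future__ import annotations

import json
from pathlib import Path


def move_hooks(settings: Path, example: Path) -> int:
    """Move all hook groups from settings into example. Returns how many were moved."""
    src = json.loads(settings.read_text(encoding="utf-8"))
    dst = json.loads(example.read_text(encoding="utf-8")) if example.exists() else {}
    hooks = src.pop("hooks", {})  # settings.json keeps everything else (permissions, ...)
    dst_hooks = dst.setdefault("hooks", {})
    moved = 0
    for event, groups in hooks.items():
        bucket = dst_hooks.setdefault(event, [])
        for group in groups:
            if group not in bucket:  # re-running the script must not duplicate entries
                bucket.append(group)
                moved += 1
    settings.write_text(json.dumps(src, indent=2) + "\n", encoding="utf-8")
    example.write_text(json.dumps(dst, indent=2) + "\n", encoding="utf-8")
    return moved
```

Run it once after graphify claude install, review the diff, then copy settings.example.json to settings.local.json as described above.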

Auto-Update on Commit (the full picture)

Auto-Update Triggers: PostToolUse, git commit, branch switch

Three independent triggers keep the graph fresh — install all of them so no edit path leaves it stale:

| Trigger | Hook location | Tool | Notes |
|---|---|---|---|
| Claude finishes a turn | Stop hook in settings.example.json → settings.local.json | CRG only | ~0.425s, fast enough to run after every AI turn; PID-guarded |
| Any commit | .git/hooks/post-commit (or .husky/post-commit) | graphify + CRG | catches terminal commits, IDE commits, and other AI tools; graphify runs as a background nohup process |
| Branch switch | .git/hooks/post-checkout (or .husky/post-checkout) | graphify + CRG | ≤5 changed files → incremental update; >5 files or new branch → full rebuild; background nohup |
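The branch-switch rule fits in a few lines. Git calls post-checkout with three arguments: the previous HEAD, the new HEAD, and a flag that is 1 for a branch checkout and 0 for a file checkout. The sketch below turns those into a decision. How the hook detects a brand-new branch is not specified here, so new_branch is left as an input; rebuild_mode and changed_file_count are hypothetical names.

```python
from __future__ import annotations

import subprocess


def changed_file_count(prev_head: str, new_head: str) -> int:
    """Number of paths that differ between two commits (git diff --name-only)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", prev_head, new_head],
        capture_output=True, text=True, check=True,
    ).stdout
    return sum(1 for line in out.splitlines() if line.strip())


def rebuild_mode(prev_head: str, branch_flag: str, changed: int,
                 new_branch: bool, threshold: int = 5) -> str:
    """'skip', 'incremental' or 'full', following the branch-switch row of the table."""
    if branch_flag != "1":
        return "skip"  # file checkout (git checkout -- path), not a branch switch
    if new_branch or set(prev_head) == {"0"}:
        return "full"  # new branch, or the all-zero ref git passes after a fresh clone
    return "incremental" if changed <= threshold else "full"
```

Call changed_file_count only when prev_head is not the all-zero ref, since git diff cannot resolve it.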

graphify is NOT in the Claude Stop hook. graphify update takes ~10s+ on large monorepos — too slow for an AI turn hook. It would pile up, hang in the background, and saturate CPU/RAM (observed: 3 stuck processes at 65–73% CPU each). Use git hooks instead — the developer has already moved on by the time graphify finishes.

Graphify ships its own installer for the git side:

graphify hook install   # writes post-commit + post-checkout

CRG, by contrast, does not. Its install command writes a pre-commit hook that runs detect-changes --brief (a status warning shown before the commit lands), but no post-commit hook to update the SQLite graph afterwards. Without one, .code-review-graph/graph.db only refreshes when the Claude Stop hook fires, so every commit made from a terminal, an IDE, or another AI tool leaves the graph stale.

Add the post-commit update yourself. Both forms are detached so git commit returns immediately.

Resource guard required. Graph rebuilds are CPU-intensive. Without guards, multiple rebuilds can pile up (observed: 3 concurrent graphify processes at 65–73% CPU each, load average 12+, RAM saturated). Every hook below includes a _resources_ok check (CPU ≤ 50% of cores, memory ≥ 2 GB free) and a pgrep process deduplication guard. If either check fails, the rebuild silently skips — the next commit retriggers a fresh attempt.
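The _resources_ok thresholds reduce to two comparisons. Here is a Linux-oriented Python sketch of the same check (1-minute load average per logical core at or below 0.50, MemAvailable at or above 2048 MB), with the arithmetic kept in a pure function so it can be tested. The function names are hypothetical, and like the shell version it lets the rebuild proceed when a metric cannot be read.

```python
from __future__ import annotations

import os


def resources_ok(load_1min: float, cpus: int, mem_available_mb: float,
                 max_load_per_cpu: float = 0.50, min_mem_mb: float = 2048) -> bool:
    """Same thresholds as _resources_ok: load/cpu <= 0.50 and >= 2 GB available."""
    return load_1min / max(cpus, 1) <= max_load_per_cpu and mem_available_mb >= min_mem_mb


def mem_available_mb(meminfo_text: str) -> float:
    """Parse MemAvailable (reported in kB) out of /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / 1024
    raise ValueError("MemAvailable not found")


def resources_ok_here() -> bool:
    """Live check on Linux; fails open (returns True) if a metric cannot be read."""
    try:
        load_1min = os.getloadavg()[0]  # not available on Windows (AttributeError)
        with open("/proc/meminfo", encoding="ascii") as f:
            mem_mb = mem_available_mb(f.read())
    except (OSError, ValueError, AttributeError):
        return True
    return resources_ok(load_1min, os.cpu_count() or 1, mem_mb)
```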

Plain git project — append to .git/hooks/post-commit:

#!/bin/sh
# Knowledge graph tools are optional — each section silently skips if tools are absent.

# Returns 0 (ok) when CPU load and free memory are within acceptable limits.
# CPU: 1-min load average must be <= 50% of logical core count (Linux/macOS)
#      or aggregate load percentage <= 50% (Windows).
# Memory: effectively available memory must be >= 2048 MB on all platforms.
_resources_ok() {
  _os=$(uname -s 2>/dev/null)
  case "$_os" in
    Linux)
      _nproc=$(nproc 2>/dev/null || grep -c '^processor' /proc/cpuinfo 2>/dev/null || echo 1)
      _cpu_ok=$(awk -v n="$_nproc" 'NR==1 { print ($1 / n <= 0.50) ? "1" : "0" }' /proc/loadavg 2>/dev/null)
      if [ "${_cpu_ok:-1}" != "1" ]; then
        echo "[graph hook] Skipping rebuild — CPU load above 50% threshold"
        return 1
      fi
      _mem_ok=$(awk '/^MemAvailable:/ { print ($2 / 1024 >= 2048) ? "1" : "0" }' /proc/meminfo 2>/dev/null)
      if [ "${_mem_ok:-1}" != "1" ]; then
        echo "[graph hook] Skipping rebuild — available memory below 2 GB threshold"
        return 1
      fi
      ;;
    Darwin)
      _nproc=$(sysctl -n hw.logicalcpu 2>/dev/null || echo 1)
      _cpu_ok=$(sysctl -n vm.loadavg 2>/dev/null | awk -v n="$_nproc"
