Karpathy LLM Wiki + Graphify: The Truth Behind the Graph View

The real-world impact and pitfalls of automated knowledge archiving — is that graph view on Instagram Reels actually useful, or just visual theater?

Andrej Karpathy — former Director of AI at Tesla — sparked roughly 16 million Twitter views with a single concept: the LLM wiki. His analogy was precise: Obsidian is the IDE, the LLM is the programmer, the wiki is the codebase. Elegant framing. But practitioners who actually built the system tell a consistent story: "The more notes accumulated, the more directionless it became." This post dissects whether that experience reflects user error or a structural design flaw — and whether the promised workflow ("throw keywords at it and let the AI organize everything") is technically achievable.

🧩 What Is an LLM Wiki — and How It Differs from RAG

The core distinction between traditional RAG (retrieval-augmented generation) and an LLM wiki comes down to when knowledge is processed. RAG retrieves and assembles chunks on every query — live, at inference time. An LLM wiki pre-compiles knowledge into human-readable Markdown at ingestion time, producing stable wiki pages that persist across sessions. The inconsistency problem inherent in RAG — different chunk combinations on every call — is addressed by committing to pre-edited, versioned pages. This matters because RAG consistency degrades with corpus size, while a pre-compiled wiki's quality is bounded by the compilation pass, not the retrieval pass.

Dimension	Traditional RAG	LLM Wiki
Processing Time	Live retrieval on every query	Pre-compiled at ingestion time
Storage Format	Vector embeddings (opaque)	Human-readable Markdown
Knowledge Linkage	Similarity-based retrieval	Explicit [[links]] between concepts
Consistency	Low (different combination every query)	High (edit history tracked)
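
The "explicit [[links]]" row is what makes the wiki layer mechanically inspectable. As a minimal sketch, assuming Obsidian-style wikilinks (`[[Target]]`, `[[Target|alias]]`, `[[Target#heading]]`), the link structure can be recovered with a regular expression. The page titles and bodies below are hypothetical and only illustrate the idea.

```python
import re

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures only Target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def build_link_graph(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each page title to the set of page titles it links to."""
    graph: dict[str, set[str]] = {}
    for title, body in pages.items():
        # setdefault keeps pages with no outgoing links in the graph.
        targets = graph.setdefault(title, set())
        for match in WIKILINK.finditer(body):
            targets.add(match.group(1).strip())
    return graph


# Hypothetical pages, for illustration only.
pages = {
    "RAG": "Retrieves chunks at query time. Contrast with [[LLM Wiki]].",
    "LLM Wiki": "Pre-compiled pages; see [[RAG|retrieval]] and [[Schema#rules]].",
    "Schema": "Rules for ingestion.",
}
graph = build_link_graph(pages)
```

Because the links are plain text in Markdown, this kind of check needs no embedding model and gives the same answer on every run, which is the consistency property the table contrasts with similarity-based retrieval.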

The architecture has three parts. The source layer holds immutable raw inputs that are never modified. The wiki layer holds the LLM-synthesized summary and concept pages. Governing both is the schema, a contract such as CLAUDE.md that defines how to ingest and maintain the wiki. The schema is what separates a principled wiki from an unmanaged note dump.


graph TD
  S["Source Layer<br/>PDFs · Articles · Notes<br/>Immutable Truth"] --> W["Wiki<br/>LLM-Compiled<br/>Summaries & Concept Synthesis"]
  C["Schema<br/>CLAUDE.md Contract<br/>Collection & Maintenance Rules"] --> W
  W --> A["AI Agent<br/>Cross-Session Reference"]
  style S fill:#e8f8f5,stroke:#16a085,color:#117a65
  style W fill:#fef9e7,stroke:#f39c12
  style C fill:#eaf2f8,stroke:#2980b9
  style A fill:#eafaf1,stroke:#27ae60,color:#1e8449

🔗 Diagram summary: Immutable sources and schema rules both feed into the wiki, where the LLM synthesizes concept pages. The AI agent references only the compiled wiki across sessions — it does not re-read raw sources each time.
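
The flow in the diagram can be sketched in a few lines. This is an illustration under stated assumptions, not any particular implementation: the class names, the `Schema` fields, and the `first_sentence` stand-in for the LLM call are all hypothetical. The point is only that sources are frozen, pages are compiled once at ingestion, and the schema is checked before a page is committed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Source:
    """Immutable raw input; frozen so it cannot be modified after creation."""
    source_id: str
    text: str


@dataclass(frozen=True)
class Schema:
    """Stand-in for a CLAUDE.md-style contract (fields are hypothetical)."""
    max_summary_chars: int = 200
    required_sections: tuple[str, ...] = ("Summary", "Sources")


@dataclass
class Wiki:
    pages: dict[str, str] = field(default_factory=dict)


def ingest(source: Source, wiki: Wiki, schema: Schema,
           summarize: Callable[[str], str]) -> str:
    """Compile one source into a wiki page at ingestion time."""
    summary = summarize(source.text)[: schema.max_summary_chars]
    page = f"## Summary\n{summary}\n\n## Sources\n- {source.source_id}\n"
    # Check the page against the contract before committing it.
    for section in schema.required_sections:
        if f"## {section}" not in page:
            raise ValueError(f"page is missing required section: {section}")
    wiki.pages[source.source_id] = page
    return page


def first_sentence(text: str) -> str:
    """Trivial stand-in for the LLM summarization call."""
    return text.split(".")[0].strip() + "."


wiki = Wiki()
src = Source("paper-001", "Pre-compiled pages persist across sessions. More detail follows.")
ingest(src, wiki, Schema(), first_sentence)
```

Later sessions would read `wiki.pages` only; the `Source` objects are never touched again, which mirrors the "immutable truth" box in the diagram.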

Karpathy's own research wiki reportedly spans roughly 100 documents and 400,000 words. Adding one new document triggers automatic updates to 10–15 existing pages. In theory, the architecture is genuinely elegant.

⚙️ Graphify — The Other Half of the Stack

Graphify is an open-source tool that converts an entire directory into a queryable knowledge graph (pip install graphifyy). It uses Tree-sitter to parse ASTs, NetworkX to build the graph, and Leiden clustering to group related nodes — extracting structure mechanically from code, SQL schemas, papers, images, and video. It integrates with Claude Code, Cursor, and Gemini CLI.

The key design principle is a clean separation of concerns. Graphify owns the code-structure layer — file dependencies, function calls, schema relationships — preserving structural facts independently of LLM hallucinations. Obsidian owns the human-readable wiki layer — decisions, in-progress context, concept explanations. Neither layer does the other's job.


graph TD
  X[Code · Papers · Schemas] --> G["Graphify<br/>AST Code Graph"]
  Y[Notes · Decisions] --> O["Obsidian<br/>Human-Readable Wiki"]
  G --> E["AI Agent<br/>No Re-read Per Session"]
  O --> E
  style X fill:#eaf2f8,stroke:#2980b9
  style Y fill:#eaf2f8,stroke:#2980b9
  style G fill:#e8f8f5,stroke:#16a085,color:#117a65
  style O fill:#fef9e7,stroke:#f39c12
  style E fill:#eafaf1,stroke:#27ae60,color:#1e8449

🔗 Diagram summary: Code and papers go through Graphify as an AST graph; notes and decisions go through Obsidian as a readable wiki. The AI agent combines both layers — no need to re-read raw source files from scratch each session.
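
To make "extracting structure mechanically" concrete, here is a minimal sketch of the idea, not Graphify's actual code. It swaps in Python's built-in `ast` module for Tree-sitter and a plain dictionary for NetworkX, and it skips clustering entirely. The sample source and function names are hypothetical.

```python
import ast


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the names it calls directly."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls = graph.setdefault(node.name, set())
            for child in ast.walk(node):
                # Only plain-name calls such as foo() are recorded;
                # attribute calls such as obj.foo() are ignored in this sketch.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    calls.add(child.func.id)
    return graph


# Hypothetical module, for illustration only.
SAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''

graph = build_call_graph(SAMPLE)
```

No model is involved at any step, so the resulting edges are facts about the code rather than summaries of it. That is the property the separation of concerns relies on.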

📊 Market Reality — Large Numbers, Skeptical Reading Required

The AI-enhanced personal knowledge management (PKM) market is projected to grow from roughly $1.65 billion in 2025 to $6.15 billion by 2030, a CAGR of 30.3%. The Obsidian community plugin count exceeds 2,700, and Instagram "Obsidian Notes" Reels number over 550.

Chart: AI-enhanced PKM market size, $1.65B (2025) to $6.15B (2030, projected). CAGR 30.3%, approximately 3.7× growth over five years.
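
The growth figures are easy to sanity-check. The short calculation below uses only the two endpoint values quoted above and the standard compound annual growth rate formula. It comes out near 30.1%, slightly under the quoted 30.3%; the gap may reflect rounding or a different base period in the source, which is one more reason to read the headline number as approximate.

```python
start_value = 1.65  # USD billions, 2025 (figure quoted above)
end_value = 6.15    # USD billions, 2030 projection (figure quoted above)
years = 5

growth_multiple = end_value / start_value
# CAGR = (end / start) ** (1 / years) - 1
cagr = growth_multiple ** (1 / years) - 1

print(f"growth multiple: {growth_multiple:.2f}x")
print(f"implied CAGR:    {cagr:.1%}")
```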

The performance claims from major open-source implementations are striking. claude-obsidian reports 80–200 wiki pages auto-generated within 30 days. claude-code-memory-setup claims up to 71.5× reduction in token consumption per session. Before accepting these at face value, stop.

🧠 Read the numbers critically: The "71.5× reduction" is derived from a single case study — a 126-file TypeScript React project — and assumes LLM call cost is zero in AST mode. There is no factor-by-factor decomposition. Treat it as a directional signal pointing toward "significant savings possible," not a reproducible benchmark.

🔍 Why First-Hand Reports Are Negative — Four Structural Flaws

The recurring pattern of "grows too large and loses coherence" is not user error — it is a predictable consequence of design gaps that the critical community has identified repeatedly. Four systemic problems:

① No Epistemic Filter
When the LLM ingests and synthesizes, it does not ask "is this claim verifiable?" Trustworthy insight and plausible error accumulate with equal authority. The wiki has no mechanism to distinguish a validated fact from a confident hallucination — both enter the system with identical formatting and weight.

② No Knowledge Lifecycle
"A fresh speculation added yesterday" and "a fact validated a hundred times over" share the same layout. A year-old concept and today's insight are visually indistinguishable. There is no TTL, no staleness marker, no decay signal — stale knowledge and current knowledge look identical.

③ No Entropy Resistance
There is no automatic duplicate detection or contradiction surfacing. The structure is write-only — the wiki grows under its own weight until it collapses. As critics put it: "A wiki that only grows is a wiki that will eventually break."

④ Compounding Errors
Once an LLM-generated error enters the wiki, the next synthesis pass builds on top of it. Hallucinations compound with interest — each new page is synthesized from previously synthesized (and potentially flawed) pages, amplifying the original error with each pass.
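
Two of these gaps, lifecycle and entropy, can at least be surfaced with small checks. The sketch below is an assumption about what such checks might look like, not a feature of any tool named here: the `PageMeta` fields, the status values, the 180-day TTL, and the 0.8 similarity threshold are all hypothetical choices. It flags pages for human review rather than fixing anything automatically.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from difflib import SequenceMatcher
from itertools import combinations


@dataclass
class PageMeta:
    title: str
    status: str          # hypothetical values: "speculation" or "verified"
    last_reviewed: date


def is_stale(page: PageMeta, today: date, ttl_days: int = 180) -> bool:
    """A page is stale when it has not been reviewed within the TTL."""
    return today - page.last_reviewed > timedelta(days=ttl_days)


def likely_duplicates(pages: list[PageMeta],
                      threshold: float = 0.8) -> list[tuple[str, str]]:
    """Flag page pairs whose titles are nearly identical."""
    pairs = []
    for a, b in combinations(pages, 2):
        ratio = SequenceMatcher(None, a.title.lower(), b.title.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a.title, b.title))
    return pairs


# Hypothetical pages and dates, for illustration only.
today = date(2026, 6, 8)
pages = [
    PageMeta("Context window limits", "verified", date(2026, 5, 1)),
    PageMeta("Context-window limit", "speculation", date(2025, 3, 1)),
    PageMeta("Leiden clustering", "verified", date(2025, 1, 15)),
]
stale = [p.title for p in pages if is_stale(p, today)]
dupes = likely_duplicates(pages)
```

Title similarity is a crude proxy. It catches renamed or re-ingested pages but not two differently titled pages that contradict each other, so it narrows the pruning work without replacing it.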

Two hard practical ceilings compound these design gaps.

The first is the context window. The effective single-context ceiling is roughly 100K tokens (about 300–500 files), and a wiki larger than that no longer fits in one window. Splitting it into two tiers to work around the limit means you have re-invented RAG, the very approach the LLM wiki was meant to escape.

The second is the graph view itself. Obsidian's native graph view supports only click, zoom, and pan; it does not support in-graph editing, creating new connections, or persisting node rearrangements. Past 300 notes the graph becomes a tangled mass better suited to screenshots than navigation. Add the overhead of configuring cron jobs, git hooks, and Python scripts, and the community has self-diagnosed the irony: members spend more time browsing other people's setups than doing actual work.
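
Whether a given wiki is approaching the context-window ceiling can be estimated before it becomes a problem. The sketch below assumes about four characters per token, a rough rule of thumb for English prose and not a real tokenizer, and uses a hypothetical wiki of 400 uniform pages. Actual counts depend on the model's tokenizer.

```python
CHARS_PER_TOKEN = 4              # rough heuristic for English prose (assumption)
CONTEXT_BUDGET_TOKENS = 100_000  # the ceiling discussed above


def estimate_tokens(text: str) -> int:
    """Estimate token count by ceiling division on character length."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(pages: dict[str, str],
                    budget: int = CONTEXT_BUDGET_TOKENS) -> tuple[int, bool]:
    """Return the estimated total tokens and whether they fit the budget."""
    total = sum(estimate_tokens(body) for body in pages.values())
    return total, total <= budget


# Hypothetical wiki: 400 pages of 1,200 characters each.
pages = {f"page-{i}": "x" * 1200 for i in range(400)}
total, ok = fits_in_context(pages)
```

Under these assumptions, 400 short pages come to about 120,000 estimated tokens, already over the budget. Running a check like this on each ingest gives an early warning well before the wiki has to be split.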

⚖️ Where It Works — and Where It Lets You Down

🟢 Where Real Value Exists
▶ Small-scale personal research (50–200 notes) — reading notes, paper summaries, decision logs. Cross-session context cost reduction and unexpected concept connections are genuine, measurable benefits.
▶ Long-running codebase maintenance (Graphify + Claude Code) — the AST graph preserves structural truth independently of LLM hallucinations, making it a reliable structural source of record.
▶ AI agent memory layer — structured context injection via schemas like CLAUDE.md demonstrably improves session-to-session coherence.

🔴 Where Expectations Need Adjustment
▶ Replicating the Instagram graph view — the visual is the result of 3–6 months of patient accumulation. Reels show the finished gold mine, not the mining process.
▶ Enterprise-scale team knowledge management — access control, concurrent edit conflict resolution, and audit logs are all absent from the current tooling.
▶ Fast-moving domains — in fields like AI, regulation, or markets where truth has a short half-life, synthesized pages go stale quickly and the system has no automatic staleness detection.

🧠 The graph view paradox: The range where the graph is visually compelling (roughly 100–400 notes) is real. But the prettier the graph, the more it is optimized for display rather than navigation. Practitioners consistently report using search and [[links]] far more than the graph view for actual work.

🎯 The Diagnosis Is Right — The Maintenance Burden Is Not

Karpathy's core diagnosis is correct. LLMs have no cross-session memory. RAG lacks consistency at scale. Pre-compiling knowledge is more efficient than repeated ad-hoc retrieval. The insight itself is sound. However, implementation complexity and ongoing maintenance cost are substantially higher than advertised — and successful deployments are overwhelmingly from experienced engineers working on long-term, single-domain projects.

"Just give it keywords and the AI will organize everything" is technically achievable. Working open-source implementations exist. The trap is in the word "just." Sustaining value requires periodic human intervention — pruning, error correction, reorientation. Full automation is not possible with current technology. The negative first-hand reports were accurate observations, not misuse.

Whether to adopt breaks into three scenarios based on scale and purpose.


flowchart TD
  A([Evaluating a Knowledge System]) --> B{What is the
archive for?}
  B -->|Agent memory layer| C[Recommended Now
CLAUDE.md + Graphify]
  B -->|Single domain
under 200 notes| D[Conditionally Recommended
Regular pruning required]
  B -->|All-domain second brain| E[Not Recommended
Bloat and drift inevitable]
  style A fill:#3498db,stroke:#2980b9,color:#ffffff
  style B fill:#fef9e7,stroke:#f39c12
  style C fill:#eafaf1,stroke:#27ae60,color:#1e8449
  style D fill:#fef9e7,stroke:#f39c12,color:#e67e22
  style E fill:#fdedec,stroke:#e74c3c,color:#c0392b

🔁 Diagram summary: Agent memory layer — recommended immediately. Single-domain under 200 notes with regular pruning — conditionally recommended. An all-domain "second brain" — not recommended, as no automation today reliably prevents bloat and loss of direction.

Scenario	Use Case	Key Requirements
A · Recommended	Per-project agent memory	CLAUDE.md + decision notes + Graphify code graph
B · Conditional	Single-domain personal accumulation	Under 200 notes + regular pruning + graph view used for health checks only
C · Not Recommended	All-domain "second brain"	No automation today prevents bloat and drift at this scope
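
The three scenarios reduce to a small decision function. The purpose keys below are hypothetical labels for the rows of the table. The table gives no verdict for a single-domain archive at or above 200 notes, so the sketch returns an explicit "not covered" answer there rather than guessing one.

```python
def recommend(purpose: str, note_count: int = 0) -> str:
    """Return the scenario label for a purpose, following the table above."""
    if purpose == "agent_memory":
        return "A: Recommended"
    if purpose == "single_domain":
        if note_count < 200:
            return "B: Conditional (regular pruning required)"
        # The table only covers single-domain archives under 200 notes.
        return "Not covered by the table"
    if purpose == "all_domain_second_brain":
        return "C: Not Recommended"
    raise ValueError(f"unknown purpose: {purpose!r}")


verdicts = [
    recommend("agent_memory"),
    recommend("single_domain", note_count=120),
    recommend("all_domain_second_brain"),
]
```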

🧠 A closing observation: The people behind impressive graph views share one trait — discipline, not tooling. They focused on a single domain and built slowly over months. The tool is a prerequisite, not the cause. The first step toward a successful adoption is resetting expectations: not "an AI-grown second brain," but "a garden you commit to tending."

References

▶ AI Critique — Karpathy LLM Wiki
▶ Innobu — Enterprise Reality Check
▶ GitHub — lucasrosati/claude-code-memory-setup
▶ GitHub — safishamsi/graphify
▶ decodethefuture — LLM Wiki Three-Tier Architecture

This content is provided for informational purposes on knowledge management tools and technology trends, and does not constitute a recommendation to adopt any specific tool or make any investment. Cited figures and performance claims reflect source data at time of publication and may differ in real-world environments.

SW Develope

Software Development Notes

Curated from a software engineering perspective — each post is verified before publishing.

Written based on publicly available data and cited sources. Last updated: June 8, 2026
