Karpathy LLM Wiki + Graphify: The Truth Behind the Graph View
The real-world impact and pitfalls of automated knowledge archiving — is that graph view on Instagram Reels actually useful, or just visual theater?
Andrej Karpathy, former Director of AI at Tesla, drew roughly 16 million views on Twitter with a single concept: the LLM wiki. His analogy was precise: Obsidian is the IDE, the LLM is the programmer, the wiki is the codebase. Elegant framing. But practitioners who actually built the system tell a consistent story: "The more notes accumulated, the more directionless it became." This post examines whether that experience reflects user error or a structural design flaw, and whether the promised workflow ("throw keywords at it and let the AI organize everything") is technically achievable.
🧩 What Is an LLM Wiki — and How It Differs from RAG
The core distinction between traditional RAG (retrieval-augmented generation) and an LLM wiki comes down to when knowledge is processed. RAG retrieves and assembles chunks on every query — live, at inference time. An LLM wiki pre-compiles knowledge into human-readable Markdown at ingestion time, producing stable wiki pages that persist across sessions. The inconsistency problem inherent in RAG — different chunk combinations on every call — is addressed by committing to pre-edited, versioned pages. This matters because RAG consistency degrades with corpus size, while a pre-compiled wiki's quality is bounded by the compilation pass, not the retrieval pass.
| Dimension | Traditional RAG | LLM Wiki |
|---|---|---|
| Processing Time | Live retrieval on every query | Pre-compiled at ingestion time |
| Storage Format | Vector embeddings (opaque) | Human-readable Markdown |
| Knowledge Linkage | Similarity-based retrieval | Explicit [[links]] between concepts |
| Consistency | Low (different combination every query) | High (edit history tracked) |
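The "explicit [[links]]" row is the easiest difference to make concrete. The following is a minimal sketch, assuming Obsidian-style `[[Target]]`, `[[Target|alias]]`, and `[[Target#Heading]]` syntax. The function names and sample pages are hypothetical and are not taken from any of the tools discussed here.

```python
import re
from collections import defaultdict

# Matches [[Target]], [[Target|alias]] and [[Target#Heading]];
# only the target page name is captured.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def extract_links(markdown: str) -> list[str]:
    """Return link targets in order of first appearance, without duplicates."""
    seen = []
    for match in WIKILINK.finditer(markdown):
        target = match.group(1).strip()
        if target and target not in seen:
            seen.append(target)
    return seen

def build_backlinks(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each link target to the set of pages that reference it."""
    backlinks = defaultdict(set)
    for title, body in pages.items():
        for target in extract_links(body):
            backlinks[target].add(title)
    return dict(backlinks)

# Hypothetical three-page wiki used only for illustration.
pages = {
    "RAG": "Retrieval happens at query time. Compare [[LLM Wiki]].",
    "LLM Wiki": "Compiled at ingestion; see [[RAG]] and [[Schema|the schema]].",
    "Schema": "Rules that govern the [[LLM Wiki]].",
}
backlinks = build_backlinks(pages)
```

Because the links are plain text, the relationship between pages can be inspected and diffed without any embedding model. A backlink index like this is also one plausible way for an ingestion pass to locate existing pages that mention a concept.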
The architecture has three parts. The source layer holds immutable raw inputs that are never modified. The wiki layer holds the LLM-synthesized summary and concept pages. Governing both is the schema, a contract such as CLAUDE.md that defines how the wiki is ingested and maintained. The schema is what separates a principled wiki from an unmanaged note dump.
graph TD
S["Source Layer<br/>PDFs · Articles · Notes<br/>Immutable Truth"] --> W["Wiki<br/>LLM-Compiled<br/>Summaries & Concept Synthesis"]
C["Schema<br/>CLAUDE.md Contract<br/>Collection & Maintenance Rules"] --> W
W --> A["AI Agent<br/>Cross-Session Reference"]
style S fill:#e8f8f5,stroke:#16a085,color:#117a65
style W fill:#fef9e7,stroke:#f39c12
style C fill:#eaf2f8,stroke:#2980b9
style A fill:#eafaf1,stroke:#27ae60,color:#1e8449
🔗 Diagram summary: Immutable sources and schema rules both feed into the wiki, where the LLM synthesizes concept pages. The AI agent references only the compiled wiki across sessions — it does not re-read raw sources each time.
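The three-part layout can be sketched in a few lines. This is an illustrative sketch only: `Source`, `WikiPage`, `SCHEMA`, `summarize`, and `compile_page` are hypothetical names, the schema is reduced to two made-up rules, and `summarize` is a placeholder that truncates text where a real system would call an LLM.

```python
from dataclasses import dataclass, field
from types import MappingProxyType

@dataclass(frozen=True)
class Source:
    """A raw input. Frozen, so it cannot be edited after ingestion."""
    source_id: str
    text: str

@dataclass
class WikiPage:
    title: str
    body: str
    source_ids: list[str] = field(default_factory=list)

# Hypothetical schema. In practice the rules would be prose in CLAUDE.md.
SCHEMA = MappingProxyType({"max_summary_words": 12, "require_source_ids": True})

def summarize(text: str, max_words: int) -> str:
    """Placeholder for the LLM call: keeps the first max_words words."""
    return " ".join(text.split()[:max_words])

def compile_page(title: str, sources: list[Source]) -> WikiPage:
    """Compile sources into one page at ingestion time, following SCHEMA."""
    body = " ".join(summarize(s.text, SCHEMA["max_summary_words"]) for s in sources)
    page = WikiPage(title=title, body=body, source_ids=[s.source_id for s in sources])
    if SCHEMA["require_source_ids"] and not page.source_ids:
        raise ValueError(f"page {title!r} has no provenance")
    return page

sources = [
    Source("paper-01", "RAG retrieves chunks at query time, so answers can vary between calls."),
]
page = compile_page("RAG", sources)
wiki = {page.title: page}  # the agent reads this, not the raw sources
```

The design point is that compilation happens once, at ingestion, and every page records which sources it came from. Whether a given implementation actually enforces provenance is a property of its schema, not of the pattern itself.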
Karpathy's own research wiki reportedly spans roughly 100 documents and 400,000 words, and adding one new document reportedly triggers automatic updates to 10–15 existing pages. In theory, the architecture is genuinely elegant.
⚙️ Graphify — The Other Half of the Stack
Graphify is an open-source tool that converts an entire directory into a queryable knowledge graph (pip install graphifyy). It uses Tree-sitter to parse ASTs, NetworkX to build the graph, and Leiden clustering to group related nodes — extracting structure mechanically from code, SQL schemas, papers, images, and video. It integrates with Claude Code, Cursor, and Gemini CLI.
The key design principle is a clean separation of concerns. Graphify owns the code-structure layer — file dependencies, function calls, schema relationships — preserving structural facts independently of LLM hallucinations. Obsidian owns the human-readable wiki layer — decisions, in-progress context, concept explanations. Neither layer does the other's job.
graph TD
X["Code · Papers · Schemas"] --> G["Graphify<br/>AST Code Graph"]
Y["Notes · Decisions"] --> O["Obsidian<br/>Human-Readable Wiki"]
G --> E["AI Agent<br/>No Re-read Per Session"]
O --> E
style X fill:#eaf2f8,stroke:#2980b9
style Y fill:#eaf2f8,stroke:#2980b9
style G fill:#e8f8f5,stroke:#16a085,color:#117a65
style O fill:#fef9e7,stroke:#f39c12
style E fill:#eafaf1,stroke:#27ae60,color:#1e8449
🔗 Diagram summary: Code and papers go through Graphify as an AST graph; notes and decisions go through Obsidian as a readable wiki. The AI agent combines both layers — no need to re-read raw source files from scratch each session.
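To show what "extracting structure mechanically" means, here is a minimal sketch of an AST-derived call graph. It is not Graphify's API. It substitutes Python's standard-library `ast` module for Tree-sitter and a plain dictionary for NetworkX so that it runs without dependencies, and it skips clustering entirely. `SAMPLE` and `call_graph` are hypothetical names.

```python
import ast
from collections import defaultdict

# Hypothetical source file used only for illustration.
SAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''

def call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls directly.

    Method calls such as text.split() are ignored, because resolving them
    would need type information that a syntax tree alone does not carry.
    """
    tree = ast.parse(source)
    graph = defaultdict(set)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # register the function even if it calls nothing
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    graph[node.name].add(inner.func.id)
    return dict(graph)

graph = call_graph(SAMPLE)
```

Every edge here comes from the parser, not from a model, which is why a graph of this kind cannot hallucinate a dependency. It can still miss one, as the ignored method calls show.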
📊 Market Reality — Large Numbers, Skeptical Reading Required
The AI-enhanced personal knowledge management (PKM) market is projected to grow from roughly $1.65 billion in 2025 to $6.15 billion by 2030, a CAGR of 30.3%. The Obsidian community plugin count exceeds 2,700, and Instagram "Obsidian Notes" Reels number over 550.
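The growth rate can be sanity-checked from the two endpoints. The sketch below applies the standard compound annual growth rate formula and assumes five compounding years (2025 to 2030). The helper name `cagr` is hypothetical.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between a start and end value."""
    return (end / start) ** (1 / years) - 1

# Endpoints in billions of USD, as cited above.
implied = cagr(1.65, 6.15, 2030 - 2025)
print(f"implied CAGR: {implied:.1%}")
```

The rounded endpoints imply a rate of about 30.1%, slightly below the cited 30.3%. A gap of that size is consistent with the source having used unrounded figures, though the source does not say how its rate was derived.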
The performance claims from major open-source implementations are striking. claude-obsidian reports 80–200 wiki pages auto-generated within 30 days. claude-code-memory-setup claims up to a 71.5× reduction in token consumption per session. Both figures are self-reported by the projects, so they deserve scrutiny before being taken at face value.
🔍 Why First-Hand Reports Are Negative — Four Structural Flaws
The recurring pattern of "grows too large and loses coherence" is not user error — it is a predictable consequence of design gaps that the critical community has identified repeatedly. Four systemic problems:
▶ No verification at ingestion. When the LLM ingests and synthesizes, it does not ask "is this claim verifiable?" Trustworthy insight and plausible error accumulate with equal authority. The wiki has no mechanism to distinguish a validated fact from a confident hallucination: both enter the system with identical formatting and weight.
▶ No time decay. "A fresh speculation added yesterday" and "a fact validated a hundred times over" share the same layout. A year-old concept and today's insight are visually indistinguishable. There is no TTL, no staleness marker, and no decay signal, so stale knowledge and current knowledge look identical.
▶ No pruning. There is no automatic duplicate detection or contradiction surfacing. The structure is write-only, and the wiki grows under its own weight until it collapses. As critics put it: "A wiki that only grows is a wiki that will eventually break."
▶ Compounding errors. Once an LLM-generated error enters the wiki, the next synthesis pass builds on top of it. Hallucinations compound with interest: each new page is synthesized from previously synthesized (and potentially flawed) pages, amplifying the original error with each pass.
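None of these gaps is hard to express in code, which is partly why critics treat them as design omissions. The sketch below shows one hedged way to add the missing signals: a review flag, a staleness window, a marker for pages derived from other wiki pages, and a crude duplicate check. `PageMeta`, `review_reasons`, `near_duplicates`, the 180-day window, and the 0.8 threshold are all hypothetical choices, and `difflib` measures textual rather than semantic similarity, so it only catches near-verbatim duplicates.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from difflib import SequenceMatcher
from itertools import combinations

@dataclass
class PageMeta:
    """Hypothetical per-page metadata; the wiki pattern has none by default."""
    title: str
    last_reviewed: date
    verified: bool = False           # set True only after a human checks the page
    derived_from_wiki: bool = False  # built from other wiki pages, not raw sources

def review_reasons(meta: PageMeta, today: date, ttl_days: int = 180) -> list[str]:
    """List the reasons a page should be queued for human review."""
    reasons = []
    if today - meta.last_reviewed > timedelta(days=ttl_days):
        reasons.append("stale")
    if not meta.verified:
        reasons.append("unverified")
        if meta.derived_from_wiki:
            reasons.append("may compound earlier errors")
    return reasons

def near_duplicates(pages: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return title pairs whose bodies are textually similar (case-insensitive)."""
    pairs = []
    for (a, text_a), (b, text_b) in combinations(sorted(pages.items()), 2):
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs

today = date(2026, 6, 8)
old_page = PageMeta("RAG", last_reviewed=date(2025, 6, 1), verified=True)
new_page = PageMeta("Leiden", last_reviewed=date(2026, 6, 1), derived_from_wiki=True)
pages = {
    "RAG": "RAG retrieves chunks at query time.",
    "Retrieval": "rag retrieves chunks at query time.",
    "Leiden": "Leiden clustering groups related graph nodes.",
}
```

Checks like these only surface candidates. Deciding whether a stale page is wrong, or which of two duplicates to keep, still falls to a human reviewer.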
Two hard physical ceilings compound these design gaps.
The first is context size. A wiki exceeding roughly 100K tokens no longer fits in a single context window. Resolving this with a two-tier split means you have re-invented RAG, the very approach the LLM wiki was meant to replace.
The second is the graph view itself. Obsidian's native graph view supports only click, zoom, and pan. It does not support in-graph editing, creating new connections, or persisting node rearrangements. Past 300 notes the graph becomes a tangled mass better suited to screenshots than navigation.
Add the overhead of configuring cron jobs, git hooks, and Python scripts, and the community has self-diagnosed the irony: members spend more time browsing other people's setups than doing actual work.
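The context ceiling is easy to estimate. The sketch below assumes about 1.3 tokens per English word, a rough rule of thumb that varies by tokenizer and language, and uses the 100K-token budget and the 400,000-word wiki size quoted in this post. The function names are hypothetical.

```python
TOKENS_PER_WORD = 1.3  # assumed heuristic; real ratios depend on the tokenizer

def estimated_tokens(word_count: int) -> int:
    """Rough token estimate for a body of English text."""
    return round(word_count * TOKENS_PER_WORD)

def fits_in_context(word_count: int, budget_tokens: int = 100_000) -> bool:
    """True if the whole wiki could be loaded into one context window."""
    return estimated_tokens(word_count) <= budget_tokens

wiki_words = 400_000  # size reported for Karpathy's research wiki
print(estimated_tokens(wiki_words), fits_in_context(wiki_words))
```

Under these assumptions a 400,000-word wiki is several times over a 100K-token budget, so even the reference example would need some form of selective loading.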
⚖️ Where It Works — and Where It Lets You Down
Where it works:
▶ Small-scale personal research (50–200 notes): reading notes, paper summaries, decision logs. Cross-session context cost reduction and unexpected concept connections are genuine, measurable benefits.
▶ Long-running codebase maintenance (Graphify + Claude Code): the AST graph preserves structural truth independently of LLM hallucinations, making it a reliable structural source of record.
▶ AI agent memory layer: structured context injection via schemas like CLAUDE.md demonstrably improves session-to-session coherence.
Where it lets you down:
▶ Replicating the Instagram graph view: the visual is the result of 3–6 months of patient accumulation. Reels show the finished gold mine, not the mining process.
▶ Enterprise-scale team knowledge management: access control, concurrent edit conflict resolution, and audit logs are all absent from the current tooling.
▶ Fast-moving domains: in fields like AI, regulation, or markets where truth has a short half-life, synthesized pages go stale quickly and the system has no automatic staleness detection.
🎯 The Diagnosis Is Right — The Maintenance Burden Is Not
Karpathy's core diagnosis is correct. LLMs have no cross-session memory. RAG lacks consistency at scale. Pre-compiling knowledge is more efficient than repeated ad-hoc retrieval. The insight itself is sound. However, implementation complexity and ongoing maintenance cost are substantially higher than advertised — and successful deployments are overwhelmingly from experienced engineers working on long-term, single-domain projects.
"Just give it keywords and the AI will organize everything" is technically achievable. Working open-source implementations exist. The trap is in the word "just." Sustaining value requires periodic human intervention — pruning, error correction, reorientation. Full automation is not possible with current technology. The negative first-hand reports were accurate observations, not misuse.
Whether to adopt breaks into three scenarios based on scale and purpose.
flowchart TD
A([Evaluating a Knowledge System]) --> B{"What is the<br/>archive for?"}
B -->|Agent memory layer| C["Recommended Now<br/>CLAUDE.md + Graphify"]
B -->|"Single domain<br/>under 200 notes"| D["Conditionally Recommended<br/>Regular pruning required"]
B -->|All-domain second brain| E["Not Recommended<br/>Bloat and drift inevitable"]
style A fill:#3498db,stroke:#2980b9,color:#ffffff
style B fill:#fef9e7,stroke:#f39c12
style C fill:#eafaf1,stroke:#27ae60,color:#1e8449
style D fill:#fef9e7,stroke:#f39c12,color:#e67e22
style E fill:#fdedec,stroke:#e74c3c,color:#c0392b
🔁 Diagram summary: Agent memory layer — recommended immediately. Single-domain under 200 notes with regular pruning — conditionally recommended. An all-domain "second brain" — not recommended, as no automation today reliably prevents bloat and loss of direction.
| Scenario | Use Case | Key Requirements |
|---|---|---|
| A · Recommended | Per-project agent memory | CLAUDE.md + decision notes + Graphify code graph |
| B · Conditional | Single-domain personal accumulation | Under 200 notes + regular pruning + graph view used for health checks only |
| C · Not Recommended | All-domain "second brain" | No automation today prevents bloat and drift at this scope |
References
▶ AI Critique — Karpathy LLM Wiki
▶ Innobu — Enterprise Reality Check
▶ GitHub — lucasrosati/claude-code-memory-setup
▶ GitHub — safishamsi/graphify
▶ decodethefuture — LLM Wiki Three-Tier Architecture
This content is provided for informational purposes on knowledge management tools and technology trends, and does not constitute a recommendation to adopt any specific tool or make any investment. Cited figures and performance claims reflect source data at time of publication and may differ in real-world environments.
Curated from a software engineering perspective — each post is verified before publishing.
Written based on publicly available data and cited sources. Last updated: June 8, 2026