
From Vector RAG to LightRAG to Graphiti: My Journey Building Smarter AI Systems

I lost a €2k/month client because I didn't understand RAG well enough. Here's how that failure pushed me to go deeper, from vector search to graph-based memory, and what I learned building through all three approaches.

Dheeraj Jha
March 9, 2026
7 min read

Since the start of my product-building journey, one quote has always stuck with me:

You don't need to build a product for millions of users on day one. First build it for your first 10 users — then think about 100, and then millions.

I applied the same mindset to learning RAG systems. Start simple. Understand what you have. Then evolve.


Why I Started Taking RAG Seriously

A few months ago, "RAG" felt like an overwhelming buzzword. Retrieval-Augmented Generation. It sounded smart, but I didn't understand it end-to-end.

That ignorance had a real cost: I lost a €2k/month client because I wasn't confident enough to build a RAG system at the time. So I didn't pretend that I knew how to build one for them.

That was the moment I decided to properly learn RAG systems and automation. I went back to basics, rebuilt my understanding from scratch, and experimented with each approach one by one.


Stage 1 — Vector RAG (Where I Started)

The first thing that clicks when you're learning RAG is the vector-based approach. It's simple, fast, and a great entry point.

How it works:

  1. Chunk your documents into smaller pieces
  2. Convert each chunk into a high-dimensional vector (embedding)
  3. Store vectors in a vector database (Pinecone, Weaviate, Supabase pgvector)
  4. At query time, embed the user's question and find the most similar chunks
  5. Pass those chunks as context to the LLM

User Question → Embed → Vector Search → Top-K Chunks → LLM → Answer
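The five steps above can be sketched end to end in plain Python. This is a toy sketch: the bag-of-words embedding stands in for a real embedding model, and an in-memory list stands in for a vector database; all names here are illustrative.

```python
import math

def embed(text, vocab):
    # Toy bag-of-words embedding; a real pipeline would call an
    # embedding model (OpenAI, sentence-transformers, etc.) here.
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_chunks(question, chunks, vocab, k=2):
    # Embed the question and rank chunks by similarity (the "vector search").
    q_vec = embed(question, vocab)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c, vocab)),
                    reverse=True)
    return ranked[:k]

chunks = [
    "the pump seal failed after overheating",
    "invoice payment terms are net 30 days",
    "replace the pump seal every six months",
]
vocab = sorted({w for c in chunks for w in c.lower().split()})

# The maintenance chunks rank above the unrelated invoice chunk.
print(top_k_chunks("how often to replace the pump seal", chunks, vocab, k=2))
```

The top-k chunks would then be concatenated into the LLM prompt as context, which is the entire trick behind vector RAG.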

What I liked:

  • Fast to build: a working prototype comes together quickly
  • Extremely low latency (~120ms per query)
  • Great when queries are simple and self-contained
  • Works well when you just need to "find the right passage and answer"

Where it broke down:

As the dataset grew, I started noticing answers that were technically correct but frustratingly incomplete. The system could find individual relevant passages, but it couldn't connect facts spread across multiple documents.

If a user asked something like "Which clients have had recurring issues with component X across their maintenance history?" — the vector approach would return loosely related chunks without linking the relationships between them.

Vector RAG is great. Until your queries start needing to reason across connected information.


Stage 2 — LightRAG (The Middle Ground)

After hitting the wall with vector-only RAG, I started exploring LightRAG — a system that combines graph structures with vector indexing.

How it works:

LightRAG maintains two levels of representation:

  • Low-level nodes: extracted entities and relationships from individual chunks
  • High-level nodes: aggregated concepts across the entire corpus

Each entity still has a vector embedding, so you get the best of both worlds — fast similarity search and the ability to traverse relationships between entities.

User Question → Hybrid Retrieval (Vector + Graph) → Connected Facts → LLM → Answer

What I liked:

  • Significant improvement on multi-hop reasoning tasks
  • ~30% faster than GraphRAG alternatives due to lightweight traversal
  • Incremental updates — new documents are merged into the existing graph without full recomputation
  • Much lower token cost compared to full GraphRAG (which can consume 600k+ tokens per retrieval)

What I found in practice:

In my experiments, LightRAG felt slower than a pure vector pipeline when queries were simple. The graph layer adds overhead that's only justified when you're actually dealing with relational queries.

The key signal I learned: if your users are asking comparison questions, multi-entity questions, or "how does X relate to Y" questions — that's when LightRAG earns its place.


Stage 3 — Graphiti (Where I Am Now)

Currently, I'm exploring graph-based RAG using Graphiti, built by Zep AI. It's a step up in sophistication, and honestly, it's the most interesting architecture I've worked with.

How it's different:

Graphiti doesn't just build a knowledge graph at indexing time — it treats data as events. Every document or interaction is processed as an "episode" that extracts entities and relationships with explicit temporal context.

This means:

  • Each fact records when it occurred and when it was ingested
  • Facts can be invalidated over time (e.g., "this was true in 2023 but not now")
  • You can query: "what was the relationship between these two entities in Q1 2024?"

User Question → Temporal Graph Search → Time-Aware Facts → LLM → Answer
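A minimal sketch of what time-aware facts look like, in the spirit of Graphiti's episodes. The field names and the `as_of` query are my own illustration, not Graphiti's actual API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    valid_from: date                   # when this fact became true
    invalid_at: Optional[date] = None  # None means it is still true

facts = [
    Fact("Alice", "works_at", "AcmeCorp", date(2023, 1, 1), date(2024, 3, 1)),
    Fact("Alice", "works_at", "Globex", date(2024, 3, 1)),
]

def as_of(facts, subject, relation, when):
    """Return what was true for (subject, relation) at a point in time."""
    return [
        f.obj for f in facts
        if f.subject == subject and f.relation == relation
        and f.valid_from <= when
        and (f.invalid_at is None or when < f.invalid_at)
    ]

print(as_of(facts, "Alice", "works_at", date(2024, 2, 1)))  # → ['AcmeCorp']
print(as_of(facts, "Alice", "works_at", date(2024, 6, 1)))  # → ['Globex']
```

Note that the old fact isn't deleted when Alice changes jobs; it's invalidated with a timestamp, which is exactly what makes "what was true in Q1 2024?" answerable.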

Why this matters:

For most simple chatbots, this is overkill. But for agent-based systems — applications where an AI needs persistent memory across sessions and the underlying data is always changing — Graphiti is a different category of tool.

Think of it less as "retrieval" and more as memory for agents.

I'm currently building a demo: a knowledge graph chatbot that indexes project documentation, with Graphiti handling the memory layer. The results on complex relational questions are noticeably better than the vector-only version.


The Real Differences: A Quick Comparison

| Category | Vector RAG | LightRAG | Graphiti |
| --- | --- | --- | --- |
| Query latency | ~120ms | ~80ms | < 1s |
| Simple factual queries | Great | Great | Great |
| Multi-hop reasoning | Struggles | Good | Great |
| Temporal awareness | None | Limited | First-class |
| Incremental updates | Expensive | Efficient | Event-driven |
| Setup complexity | Low | Medium | Higher |
| Best for | Chatbots, search | Enterprise data, reasoning | Agents, dynamic memory |

When to Use Which

Stick with Vector RAG when:

  • You're building a chatbot, FAQ bot, or docs search
  • Queries are mostly "find the right passage and answer"
  • Latency is critical and reasoning depth isn't
  • You want the fastest path to production

Move to LightRAG when:

  • Your users start asking questions that span multiple entities or documents
  • You're indexing structured enterprise data (supply chain, financials, org charts)
  • Data changes frequently and you need incremental updates without full re-indexing

Consider Graphiti when:

  • You're building AI agents that need long-term memory across sessions
  • Temporal reasoning is core to your use case (audits, history, compliance)
  • The underlying data evolves over time and you need to track what was true when

What Production Actually Looks Like

The most sophisticated RAG systems I've seen use hybrid approaches:

  1. Vector RAG + reranking — fast broad retrieval, then semantic reranking for precision
  2. LightRAG + cross-encoder — graph-based retrieval with SBERT/ColBERT reranking
  3. Graphiti + vector fallback — temporal memory for agents, vector search for general information
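One way to glue a hybrid stack together is a routing layer in front of the retrievers. The sketch below uses naive keyword cues; the cue lists and retriever names are assumptions for illustration, and a production system would more likely use an LLM or a trained classifier to route.

```python
# Hypothetical query router: temporal cues go to the Graphiti-style
# memory, relational cues to the graph retriever, everything else
# takes the fast vector path.
TEMPORAL_CUES = ("when", "was", "in 2023", "in 2024", "q1", "q2")
RELATIONAL_CUES = ("relate", "between", "compare", "across", "history")

def route(question):
    q = question.lower()
    if any(cue in q for cue in TEMPORAL_CUES):
        return "graphiti"
    if any(cue in q for cue in RELATIONAL_CUES):
        return "lightrag"
    return "vector"

print(route("What are the payment terms?"))             # → vector
print(route("How do PumpX failures relate to LineB?"))  # → lightrag
print(route("Who was the account owner in Q1 2024?"))   # → graphiti
```

The design point is the same one the comparison table makes: pay the graph or temporal overhead only for the queries that need it.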

For most SaaS products, my current recommendation is:

  • MVP: Vector RAG with hybrid BM25 + embeddings
  • Production at scale: LightRAG for cost-efficient reasoning
  • Agent/enterprise tier: Graphiti for temporal knowledge and persistent memory

The tech evolves fast. Vector RAG, LightRAG, Graphiti — these are today's options. But the mental model of matching retrieval complexity to query complexity stays constant.

Start simple. Understand your failure modes. Then add complexity only where it earns its place.