By The DDH Team · Digital Dashboard Hub

Build GraphRAG (2026): Knowledge Graph Construction, Community Search, and Claude Generation



Standard RAG — chunk a corpus, embed chunks, retrieve top-K by similarity — answers lookup queries well: 'What is the return policy?', 'What does function X do?', 'What happened on date Y?' But it fails on a class of questions that require synthesizing information across many documents: 'What are the major themes in this corpus?', 'What organizations are connected to person X across all these reports?', 'What are all the causal chains linking event A to outcome B?' These queries require a global understanding of the corpus, not local retrieval of similar chunks.

GraphRAG (Microsoft Research, Edge et al. 2024, arxiv.org/abs/2404.16130) addresses this gap by building a knowledge graph from the corpus during an offline indexing step. An LLM extracts entities (people, places, organizations, concepts) and relationships from each chunk. A graph is constructed. Community detection (Leiden algorithm) partitions the graph into clusters. LLM-generated summaries are produced for each community. At query time, global search aggregates community summaries across the entire graph; local search focuses on the entity-level subgraph around query-relevant entities. The result: answers to multi-hop and corpus-wide questions that chunk-based RAG cannot produce.

This tutorial covers: the Microsoft GraphRAG v2 pipeline setup, the entity extraction prompt pattern, community detection, the two search modes (global vs local), cost modeling ($100-500 per 1M corpus tokens for graph construction), and a direct comparison to vanilla RAG, LightRAG, and Neo4j+vector alternatives. Related: Build RAG with Pinecone · Build RAG with pgvector · Hybrid search BM25 + dense · RAG architecture decision tree · RAG cost per query calculator.


GraphRAG stack components and cost (2026)

| Component | Role | Pricing |
|---|---|---|
| Microsoft GraphRAG (open-source) | Entity extraction, community detection, index pipeline | Open-source (MIT), github.com/microsoft/graphrag; pip install graphrag |
| Anthropic Claude Sonnet 4.6 (graph construction) | Entity + relationship extraction LLM | $3/1M input tokens, $15/1M output tokens; construction is LLM-heavy: $100-500/1M corpus tokens typical |
| OpenAI text-embedding-3-small (node embeddings) | Entity node embeddings for local search | $0.02/1M tokens |
| Neo4j Community / AuraDB | Optional: persistent graph database for entity/relation storage | Community: open-source; AuraDB Free: 50K nodes; Professional: $65/mo (neo4j.com/pricing) |
| LightRAG (alternative) | Lighter-weight graph RAG with same entity/relation extraction pattern | Open-source (MIT), github.com/HKUDS/LightRAG; lower indexing cost than Microsoft GraphRAG |
| Anthropic Claude Sonnet 4.6 (query time) | Global/local search generation | $3/1M input, $15/1M output; community summaries can be large (10K-50K tokens per global query) |

GraphRAG construction cost is dominated by LLM calls during entity extraction and community summarization. Published estimates (Microsoft Research blog, 2024; community benchmarks, 2025) range from $0.10-$0.50 per 1,000 corpus tokens for construction with GPT-4-class models. Construction is a one-time cost per corpus version; query-time cost is per query. Claude Sonnet 4.6 pricing from docs.anthropic.com/en/docs/about-claude/pricing, June 2026.

Phase 1: When GraphRAG wins and when vanilla RAG wins

GraphRAG is not a universal upgrade over standard RAG. It has a significantly higher offline indexing cost (LLM-heavy extraction over every chunk), more complex infrastructure, and higher per-query cost (community summaries are large). The decision depends on your query distribution and corpus characteristics.

**GraphRAG wins when:**

- Queries require multi-hop reasoning: 'What companies are connected to the CEO of X through board memberships?' Dense retrieval cannot follow multi-hop chains — it retrieves documents mentioning X but does not traverse relationships.

- Queries ask about corpus-wide patterns: 'What are the major themes in this collection of 500 earnings calls?' This requires synthesizing across the entire corpus, not retrieving similar chunks.

- Queries involve entity relationships: 'What is the relationship between Organization A and Person B across all these reports?' The knowledge graph captures these relationships explicitly.

- Your corpus is a coherent domain (e.g., all SEC filings for a company, all papers in a research area, all contracts in a legal matter) where entities and relationships are meaningful.

**Vanilla RAG wins when:**

- Queries are lookup-style: 'What is the return policy?', 'What does function X do?' Standard retrieval is fast, cheap, and accurate for these.

- Cost is constrained: GraphRAG construction at $100-500/1M corpus tokens is expensive for large corpora. A 100M-token corpus could cost $10K-50K to index.

- Corpus updates frequently: re-indexing after corpus changes requires re-running entity extraction over changed documents (incremental re-indexing is supported in GraphRAG v2 but adds complexity).

- Your team lacks experience with graph data structures: debugging entity extraction errors, community detection failures, and graph query logic requires graph expertise that most ML teams don't have on day 1.

**Decision rule (simplified):** if more than 20% of your queries involve multi-hop reasoning, entity-relationship questions, or corpus-wide synthesis, benchmark GraphRAG. If fewer than 20%, standard hybrid RAG (see hybrid search BM25 + dense) is likely the better investment.
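
The decision rule above can be checked against a labeled sample of real queries. The sketch below is a minimal illustration, assuming you have already tagged each query with a category; the label names (`multi_hop`, `entity_relationship`, `corpus_synthesis`, `lookup`) and the function names are hypothetical, not part of GraphRAG.

```python
from collections import Counter

# Labels that the decision rule treats as graph-style queries (hypothetical names).
GRAPH_STYLE = {"multi_hop", "entity_relationship", "corpus_synthesis"}


def graph_query_share(labels: list[str]) -> float:
    """Fraction of labeled queries that fall into a graph-style category."""
    if not labels:
        return 0.0
    counts = Counter(labels)
    return sum(counts[name] for name in GRAPH_STYLE) / len(labels)


def should_benchmark_graphrag(labels: list[str], threshold: float = 0.20) -> bool:
    """Apply the simplified rule: benchmark GraphRAG when the share exceeds the threshold."""
    return graph_query_share(labels) > threshold


# Illustrative sample: 100 hand-labeled queries from a query log.
sample = (
    ["lookup"] * 70
    + ["multi_hop"] * 15
    + ["entity_relationship"] * 10
    + ["corpus_synthesis"] * 5
)
print(f"graph-style share: {graph_query_share(sample):.0%}")
print(f"benchmark GraphRAG: {should_benchmark_graphrag(sample)}")
```

A sample of a few hundred labeled queries is usually enough to tell which side of the 20% line a workload falls on.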


Phase 2: Install Microsoft GraphRAG v2 and initialize a project

Microsoft GraphRAG is an open-source Python package. GraphRAG v2 (released late 2024) added incremental indexing, better community detection, and CLI improvements over the v1 release.

```bash
# Install GraphRAG
pip install graphrag==2.1.0

# Initialize a new GraphRAG project
mkdir my_graphrag_project && cd my_graphrag_project
python -m graphrag init --root .
```

The `init` command creates the directory structure:

```
my_graphrag_project/
├── .env              # API keys
├── settings.yaml     # pipeline configuration
├── input/            # your source documents go here
│   └── (your .txt or .md files)
└── output/           # generated knowledge graph artifacts
```

Configure `.env` with your API keys:

```bash
# .env
GRAPHRAG_API_KEY=your_anthropic_api_key_here  # or OpenAI key
```

Configure `settings.yaml` to use Claude Sonnet 4.6. GraphRAG v2 added Anthropic as a supported LLM provider alongside OpenAI. Edit the `llm` section:

```yaml
# settings.yaml (key sections)
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: anthropic_chat
  model: claude-sonnet-4-6
  max_tokens: 4000
  temperature: 0
  request_timeout: 180.0
  max_retries: 10
  requests_per_minute: 50
  tokens_per_minute: 50000

embeddings:
  api_key: ${OPENAI_API_KEY}  # embeddings still via OpenAI for now
  async_mode: threaded
  llm:
    api_type: openai
    api_key: ${OPENAI_API_KEY}
    model: text-embedding-3-small

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

entity_extraction:
  max_gleanings: 1  # number of LLM retries per chunk for extraction completeness

community_reports:
  max_length: 2000  # tokens per community summary

claim_extraction:
  enabled: false  # disable claim extraction to reduce cost
```

Setting `temperature: 0` for the extraction LLM is important for reproducibility: it makes it far more likely that the same chunk yields the same entities on every run, although hosted models are not guaranteed to be fully deterministic even at temperature 0. `max_gleanings: 1` means GraphRAG makes one follow-up extraction pass per chunk to pick up entities the first pass missed; higher values improve completeness but increase cost.
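
To make the `chunks` settings concrete, here is a minimal sketch of overlapping-window chunking with `size: 1200` and `overlap: 100`. It is an illustration, not GraphRAG's chunker: the function name `chunk_tokens` is hypothetical, and whitespace-split words stand in for real tokenizer tokens.

```python
def chunk_tokens(tokens: list[str], size: int = 1200, overlap: int = 100) -> list[list[str]]:
    """Split a token list into windows of `size` tokens that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # each new window starts this many tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the current window already reaches the end of the text
    return chunks


# Assumption: whitespace tokens approximate model tokens for illustration only.
text = " ".join(f"word{i}" for i in range(3000))
windows = chunk_tokens(text.split())
print(len(windows), [len(w) for w in windows])
```

Because consecutive windows share 100 tokens, the window count is slightly higher than corpus size divided by chunk size: a 1M-token input yields 909 windows here, versus the ~833 you get by ignoring overlap.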


Phase 3: Document preparation and the entity extraction pipeline

Place your source documents in the `input/` directory as `.txt` or `.md` files. GraphRAG v2 supports CSV and JSON input formats as well (configure `input.file_type` in settings.yaml).

Run the indexing pipeline:

```bash
# Run the full indexing pipeline
python -m graphrag index --root .

# For large corpora, run with verbose logging to monitor progress
python -m graphrag index --root . --verbose
```

The indexing pipeline runs these phases sequentially: (1) text chunking, (2) entity extraction per chunk using the LLM, (3) entity resolution (deduplication of the same entity mentioned with different names), (4) relationship extraction between co-occurring entities, (5) community detection with Leiden algorithm, (6) community summary generation with the LLM.

The entity extraction prompt in GraphRAG uses a few-shot pattern. The default extraction prompt is in `graphrag/prompt_tune/defaults.py`. For production, customize the entity types to your domain:

```python
# custom_entity_extraction_prompt.py
# Used to override the default extraction prompt in settings.yaml
# (prompt_tune.extraction.entity_types)
CUSTOM_ENTITY_EXTRACTION_PROMPT = """
-Goal-
Given a text document, identify all entities of the specified types and all relationships among them.

-Entity Types-
{entity_types}

-Steps-
1. Identify all entities of the listed types. For each entity, extract:
- entity_name: Name of the entity
- entity_type: One of: {entity_types}
- entity_description: Comprehensive description of entity attributes

2. Identify all relationships between recognized entities. For each relationship:
- source_entity: name of the source entity
- target_entity: name of the target entity
- relationship_description: description of the relationship between source and target
- relationship_strength: a numeric score 1-10 indicating strength

3. Return output as a well-formed JSON list.

######################
-Examples-
######################
{examples}

######################
-Real Data-
######################
Entity_types: {entity_types}
Text: {input_text}
Output:
"""
```

Customizing entity types for your domain is the highest-impact tuning step. The default entity types are `["organization", "person", "geo", "event"]`. For a financial corpus, add `["company", "person", "financial_instrument", "regulatory_body", "metric"]`. For a technical codebase, use `["module", "function", "class", "concept", "dependency"]`. Generic entity types produce noisy extractions that degrade community quality.
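
When tuning entity types it helps to inspect what the model actually returns for a few chunks. The sketch below assumes the model returns the JSON list that the custom prompt above asks for, using the field names from that prompt. The function `parse_extraction` and the sample records are hypothetical; GraphRAG's own output parser and entity-resolution step are not shown here, and the case-insensitive name merge is only a crude stand-in for real entity resolution.

```python
import json


def parse_extraction(raw: str) -> tuple[dict[str, dict], list[dict]]:
    """Split one chunk's JSON output into merged entities and valid relationships."""
    entities: dict[str, dict] = {}
    relationships: list[dict] = []
    for record in json.loads(raw):
        if "entity_name" in record:
            name = record["entity_name"].strip()
            key = name.casefold()  # naive normalization so "Acme Corp" == "acme corp"
            merged = entities.setdefault(
                key, {"name": name, "type": record["entity_type"], "descriptions": []}
            )
            merged["descriptions"].append(record["entity_description"])
        elif "source_entity" in record:
            strength = float(record["relationship_strength"])
            if 1 <= strength <= 10:  # drop scores outside the prompt's 1-10 scale
                relationships.append(record)
    return entities, relationships


# Made-up model output for a single chunk.
raw_output = json.dumps([
    {"entity_name": "Acme Corp", "entity_type": "company",
     "entity_description": "A manufacturer."},
    {"entity_name": "acme corp ", "entity_type": "company",
     "entity_description": "Files quarterly reports."},
    {"entity_name": "Jane Doe", "entity_type": "person",
     "entity_description": "CEO of Acme Corp."},
    {"source_entity": "Jane Doe", "target_entity": "Acme Corp",
     "relationship_description": "Jane Doe leads Acme Corp.", "relationship_strength": 9},
    {"source_entity": "Jane Doe", "target_entity": "Acme Corp",
     "relationship_description": "Malformed score.", "relationship_strength": 42},
])

entities, relationships = parse_extraction(raw_output)
print(sorted(entities))
print(len(relationships))
```

Counting entities per type across a sample of chunks is a quick way to spot types that are too generic (one type absorbs most entities) or unused.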


Phase 4: Community detection with Leiden algorithm

After entity extraction, GraphRAG builds a graph where nodes are entities and edges are relationships with strength weights. Leiden community detection (Traag et al., 2019, nature.com/articles/s41598-019-41695-z) partitions this graph into communities — groups of entities that are more densely connected to each other than to the rest of the graph. Leiden is preferred over Louvain for this task because it produces communities with stronger guarantees of internal connectivity.
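
The phrase "more densely connected to each other than to the rest of the graph" is usually quantified with modularity, a quality function that Leiden is commonly run with. The sketch below computes weighted modularity for a given partition in plain Python. It does not implement Leiden itself, and whether a particular GraphRAG build optimizes modularity is an assumption here; the graph and entity names are made up.

```python
from collections import defaultdict


def modularity(edges: list[tuple[str, str, float]], community_of: dict[str, int]) -> float:
    """Weighted modularity Q of a partition of an undirected graph."""
    total_weight = sum(weight for _, _, weight in edges)
    internal = defaultdict(float)  # edge weight that stays inside each community
    degree = defaultdict(float)    # summed weighted degree of each community's nodes
    for u, v, weight in edges:
        degree[community_of[u]] += weight
        degree[community_of[v]] += weight
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += weight
    return sum(
        internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


# Two triangles of entities joined by a single bridging relationship.
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
two_communities = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_community = {node: 0 for node in two_communities}

print(f"split into triangles: Q = {modularity(edges, two_communities):.4f}")
print(f"everything together:  Q = {modularity(edges, one_community):.4f}")
```

The split along the bridge scores higher than lumping every entity together, which is the behavior a community detection algorithm is searching for.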

GraphRAG v2 implements multi-level community detection: the graph is partitioned into large top-level communities (C0), then each community is sub-partitioned into finer communities (C1, C2, ...). Global search uses top-level community summaries for corpus-wide queries; local search uses fine-grained communities for entity-level queries.

The community structure is stored in `output/communities.parquet` and `output/community_reports.parquet` after indexing. Inspect it:

```python
import pandas as pd

# Load community reports
community_reports = pd.read_parquet("output/community_reports.parquet")
print(community_reports.columns.tolist())
# ['id', 'title', 'community', 'level', 'rank', 'rank_explanation',
#  'full_content', 'summary', 'findings', 'full_content_json']

# How many communities at each level?
print(community_reports.groupby("level")["id"].count())
# level 0: 12   (broad clusters)
# level 1: 47   (medium clusters)
# level 2: 180  (fine clusters)

# Inspect a community summary
print(community_reports.iloc[0]["summary"][:500])
```

Each community report contains: a `title` (entity cluster name generated by the LLM), a `summary` (2-3 sentence overview), and `findings` (structured list of key claims with citations to source chunks). These are what global search queries use as context for Claude generation.

Community detection settings in `settings.yaml`:

```yaml
cluster_graph:
  max_cluster_size: 10  # max entities per leaf community

summarize_descriptions:
  max_length: 500  # tokens per entity description summary

community_reports:
  max_length: 2000        # tokens per community report
  max_input_length: 8000  # max tokens of entity context fed to community summarizer
```

`max_cluster_size: 10` controls community granularity. Smaller values produce more communities (finer granularity, better for precise queries) but increase the number of LLM calls for community summarization. For corpora under 100 documents, 10 is reasonable; for large corpora, increase to 20-30 to limit LLM costs during community summarization.


Phase 5: Global search — corpus-wide synthesis queries

Global search is used for corpus-wide questions: 'What are the major themes in this corpus?', 'What are the most important entities across all documents?' It works by: (1) selecting all community summaries at a target level, (2) shuffling them and batching into context windows, (3) asking Claude to identify query-relevant points from each batch, (4) aggregating and ranking the intermediate answers, (5) generating a final synthesis.

```python
import asyncio
import os

import pandas as pd
import tiktoken
from graphrag.query.indexer_adapters import (
    read_indexer_community_reports,
    read_indexer_entities,
)
from graphrag.query.llm.oai.chat_openai import ChatOpenAI
from graphrag.query.structured_search.global_search.community_context import GlobalCommunityContext
from graphrag.query.structured_search.global_search.search import GlobalSearch

# NOTE: graphrag.query module paths, loader names, and Parquet file names have
# changed between GraphRAG releases; check them against your installed version.

# Load indexed artifacts
entity_df = pd.read_parquet("output/create_final_entities.parquet")
entity_embedding_df = pd.read_parquet("output/create_final_entities.parquet")
community_df = pd.read_parquet("output/create_final_communities.parquet")
community_report_df = pd.read_parquet("output/community_reports.parquet")

entities = read_indexer_entities(entity_df, entity_embedding_df, community_level=2)
community_reports = read_indexer_community_reports(
    community_report_df, community_df, community_level=2
)

# Use Claude Sonnet 4.6 via the Anthropic-compatible interface
llm = ChatOpenAI(
    api_key=os.environ["ANTHROPIC_API_KEY"],
    api_base="https://api.anthropic.com/v1",
    model="claude-sonnet-4-6",
    max_retries=10,
)

token_encoder = tiktoken.get_encoding("cl100k_base")

context_builder = GlobalCommunityContext(
    community_reports=community_reports,
    entities=entities,
    token_encoder=token_encoder,
)

search_engine = GlobalSearch(
    llm=llm,
    context_builder=context_builder,
    token_encoder=token_encoder,
    max_data_tokens=12_000,  # tokens per intermediate generation pass
    map_llm_params={
        "max_tokens": 1000,
        "temperature": 0,
        "response_format": {"type": "json_object"},
    },
    reduce_llm_params={"max_tokens": 2000, "temperature": 0},
    allow_general_knowledge=False,
    json_mode=True,
)


async def main() -> None:
    # asearch is a coroutine, so it must run inside an event loop
    result = await search_engine.asearch("What are the major themes in this corpus?")
    print(result.response)


asyncio.run(main())
```
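
The shuffle-and-batch part of the map step described above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not GraphRAG's implementation: `batch_summaries` is a hypothetical name, token counts are passed in rather than measured, and the 1,200 tokens per summary in the example is a made-up figure.

```python
import random


def batch_summaries(
    summaries: list[tuple[str, int]],  # (community id, token count of its summary)
    max_data_tokens: int = 12_000,
    seed: int = 0,
) -> list[list[str]]:
    """Shuffle summaries, then pack them greedily into token-limited batches."""
    shuffled = summaries[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the batching is repeatable
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for community_id, tokens in shuffled:
        # Start a new batch when adding this summary would exceed the budget.
        if current and used + tokens > max_data_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(community_id)
        used += tokens
    if current:
        batches.append(current)
    return batches


# 50 community summaries of 1,200 tokens each (illustrative sizes).
summaries = [(f"c{i}", 1_200) for i in range(50)]
batches = batch_summaries(summaries)
print(f"{len(batches)} map calls, batch sizes: {[len(b) for b in batches]}")
```

Each batch corresponds to one map-step LLM call, so the batch count is the main driver of global search cost. A summary larger than the budget still gets a batch of its own in this sketch rather than being truncated.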

The `community_level` parameter determines which community hierarchy level is used. Level 0 gives the broadest community summaries (fewest, largest communities). Level 2 gives finer granularity. For corpus-wide 'themes' queries, level 0-1 is usually better; for entity-specific queries, level 2-3 is more precise. If you're unsure, level 2 is a reasonable default.

Global search token cost: each intermediate map pass uses `max_data_tokens` tokens of community context plus ~500 tokens of prompt. With 50 communities at level 2 and 12K tokens per batch, expect 4-5 LLM calls for the map step plus 1 reduce call. Total: ~60K input tokens + ~6K output tokens per global query at Sonnet 4.6 pricing = 60K × $0.000003 + 6K × $0.000015 = $0.18 + $0.09 = **$0.27 per global query**. This is expensive compared to standard RAG (~$0.01/query). Use global search only for queries that require corpus-wide synthesis.
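
The per-query arithmetic above generalizes to a small helper that is handy when comparing search modes. This is a minimal sketch; the function name `llm_cost_usd` is hypothetical and the default prices are the Sonnet 4.6 figures quoted in this article.

```python
def llm_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 3.0,    # USD per 1M input tokens (figure from the text)
    output_price_per_m: float = 15.0,  # USD per 1M output tokens (figure from the text)
) -> float:
    """Cost in USD of one query, summed over all of its LLM calls."""
    return (
        input_tokens / 1_000_000 * input_price_per_m
        + output_tokens / 1_000_000 * output_price_per_m
    )


# Global search example from the text: ~60K input tokens, ~6K output tokens.
global_query = llm_cost_usd(60_000, 6_000)
print(f"global query: ${global_query:.3f}")
print(f"1,000 global queries per day: ${global_query * 1_000:.0f}/day")
```

Swapping in measured token counts from your own logs gives a more reliable figure than the rounded estimates used here.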


Phase 6: Local search — entity-level queries

Local search is used for entity-specific questions: 'What is known about Person X?', 'What projects is Organization Y involved in?' It retrieves the entity's community context, relationships, and associated text chunks using vector similarity on entity embeddings.

```python
import asyncio
import os

import pandas as pd
import tiktoken
from graphrag.query.indexer_adapters import (
    read_indexer_covariates,
    read_indexer_relationships,
    read_indexer_text_units,
)
from graphrag.query.llm.oai.embedding import OpenAIEmbedding
from graphrag.query.llm.oai.typing import OpenaiApiType  # verify this path for your version
from graphrag.query.structured_search.local_search.mixed_context import LocalSearchMixedContext
from graphrag.query.structured_search.local_search.search import LocalSearch

# Reuses `llm`, `entities`, and `community_reports` from the Phase 5 example.

relationship_df = pd.read_parquet("output/create_final_relationships.parquet")
text_unit_df = pd.read_parquet("output/create_final_text_units.parquet")

relationships = read_indexer_relationships(relationship_df)
text_units = read_indexer_text_units(text_unit_df)

# Local search needs entity embeddings for similarity matching
text_embedder = OpenAIEmbedding(
    api_key=os.environ["OPENAI_API_KEY"],
    api_base="https://api.openai.com/v1",
    api_type=OpenaiApiType.OpenAI,
    model="text-embedding-3-small",
    max_retries=20,
)

token_encoder = tiktoken.get_encoding("cl100k_base")

context_builder = LocalSearchMixedContext(
    community_reports=community_reports,
    text_units=text_units,
    entities=entities,
    relationships=relationships,
    covariates=None,  # disable covariate extraction (reduces cost)
    entity_text_embeddings=pd.read_parquet("output/create_final_entities.parquet"),
    embedding_vectorstore_key="entity.title",
    text_embedder=text_embedder,
    token_encoder=token_encoder,
)

local_search_engine = LocalSearch(
    llm=llm,
    context_builder=context_builder,
    token_encoder=token_encoder,
    llm_params={"max_tokens": 2000, "temperature": 0},
    context_builder_params={
        "use_community_summary": False,  # use full community reports for local
        "shuffle_data": True,
        "include_community_rank": True,
        "min_community_rank": 0,
        "community_rank_name": "rank",
        "include_entity_rank": True,
        "entity_rank_description": "number of relationships",
        "include_relationship_weight": True,
        "relationship_ranking_attribute": "rank",
        "max_tokens": 12_000,
    },
    response_type="multiple paragraphs",
)


async def main() -> None:
    # asearch is a coroutine, so it must run inside an event loop
    result = await local_search_engine.asearch(
        "Tell me about the relationships between Microsoft and OpenAI"
    )
    print(result.response)


asyncio.run(main())
```

Local search cost: ~15K-25K input tokens per query (entity context + community context + relationships + source chunks) plus generation output. At Sonnet 4.6 pricing: 20K × $0.000003 + 1K × $0.000015 = $0.06 + $0.015 = **$0.075 per local query**. More expensive than standard RAG (~$0.01/query) but cheaper than global search. For entity-specific lookup queries, consider adding a classification step: route simple lookup queries to standard RAG, route entity-relationship questions to GraphRAG local search.


Phase 7: GraphRAG construction cost model

Construction cost is the largest barrier to GraphRAG adoption. It is dominated by LLM calls during entity extraction and community summarization. Here is a worked cost model for a 1M-token corpus (roughly 750K words, or about 1,500 pages at 500 words per page).

```
Construction pipeline cost breakdown (1M corpus tokens)
────────────────────────────────────────────────────────────────────────────
Step                  LLM calls     Input tokens   Output tokens  Cost
────────────────────────────────────────────────────────────────────────────
Chunking (1200 tok)   n/a           n/a            n/a            $0
Entity extraction     ~833 chunks   1M (chunks)    ~200K (JSON)   $3.00 + $3.00
Entity summarization  ~1000 ents    ~2M (context)  ~200K (summ)   $6.00 + $3.00
Community detection   n/a (grph)    n/a            n/a            $0
Community summaries   ~50-200 com   ~500K (ents)   ~100K (summ)   $1.50 + $1.50
────────────────────────────────────────────────────────────────────────────
Total (approx)                                                    ~$18.00
```

This estimate uses Claude Sonnet 4.6 at $3/1M input and $15/1M output. The published range of $100-500/1M corpus tokens applies to GPT-4-class models at GPT-4 pricing ($10-30/1M input). Sonnet 4.6 input tokens are roughly 3-10x cheaper than that, and the $18 estimate here lands about 5-28x below the published range. At $18 per 1M corpus tokens, indexing a 10M-token corpus (a large enterprise knowledge base) costs ~$180, a one-time setup cost that is reasonable for many use cases.
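
The breakdown above can be turned into a reusable estimator. The sketch below encodes each step as input and output tokens per corpus token, using the ratios from the table (for example, entity summarization reads about 2 tokens and writes about 0.2 tokens per corpus token). The names `Step`, `STEPS`, and `construction_cost_usd` are hypothetical, and the ratios are this article's estimates rather than measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    input_per_corpus_token: float   # LLM input tokens per corpus token
    output_per_corpus_token: float  # LLM output tokens per corpus token


# Ratios taken from the cost table: chunking and community detection use no LLM.
STEPS = [
    Step("entity_extraction", 1.0, 0.2),
    Step("entity_summarization", 2.0, 0.2),
    Step("community_summaries", 0.5, 0.1),
]


def construction_cost_usd(
    corpus_tokens: int,
    input_price_per_m: float = 3.0,    # USD per 1M input tokens
    output_price_per_m: float = 15.0,  # USD per 1M output tokens
) -> dict[str, float]:
    """Estimated indexing cost per step, plus a 'total' entry."""
    millions = corpus_tokens / 1_000_000
    costs = {
        step.name: millions * (
            step.input_per_corpus_token * input_price_per_m
            + step.output_per_corpus_token * output_price_per_m
        )
        for step in STEPS
    }
    costs["total"] = sum(costs.values())
    return costs


for name, usd in construction_cost_usd(1_000_000).items():
    print(f"{name:22s} ${usd:8.2f}")
```

With the same ratios and the original GPT-4 8K list prices (`input_price_per_m=30`, `output_price_per_m=60`), the estimate comes to about $135 per 1M corpus tokens, which sits inside the published $100-500 range.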

The most expensive step is entity summarization — the LLM re-reads all text associated with each entity to produce a consolidated description. `claim_extraction.enabled: false` in settings.yaml eliminates the claim extraction step (disabled by default in v2), saving ~20-30% of construction cost. Reducing `entity_extraction.max_gleanings` from 1 to 0 eliminates the retry step, saving another 10-20% at the cost of lower extraction completeness.

Re-indexing strategy for corpus updates: GraphRAG v2 supports incremental indexing — run `python -m graphrag index --root . --resume` to pick up new documents without re-processing existing ones. Changed documents require re-extraction (the pipeline detects changes via content hash). Full re-indexing is needed when you change the extraction prompt or entity types — the graph structure changes and existing summaries become inconsistent.
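
Before re-indexing, it is useful to know how many documents will actually need re-extraction. The sketch below shows content-hash change detection in plain Python. It is an illustration of the general technique, not GraphRAG's internal mechanism: the function names and the idea of storing a `doc id -> hash` manifest from the previous run are assumptions.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_corpus(previous: dict[str, str], current_docs: dict[str, str]) -> dict[str, list[str]]:
    """Compare the last run's hashes (doc id -> hash) with current texts (doc id -> text)."""
    plan: dict[str, list[str]] = {"new": [], "changed": [], "unchanged": [], "deleted": []}
    for doc_id, text in current_docs.items():
        digest = content_hash(text)
        if doc_id not in previous:
            plan["new"].append(doc_id)
        elif previous[doc_id] != digest:
            plan["changed"].append(doc_id)  # needs re-extraction
        else:
            plan["unchanged"].append(doc_id)
    plan["deleted"] = sorted(set(previous) - set(current_docs))
    return plan


# Hypothetical manifest saved after the previous indexing run.
previous_run = {
    "a.txt": content_hash("alpha"),
    "b.txt": content_hash("beta"),
    "c.txt": content_hash("gamma"),
}
current = {"a.txt": "alpha", "b.txt": "beta v2", "d.txt": "delta"}

print(diff_corpus(previous_run, current))
```

Multiplying the token count of the `new` and `changed` documents by the per-token construction cost gives a rough budget for the update before you run it.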


Phase 8: LightRAG — a lighter-weight alternative

LightRAG (HKUDS, arxiv.org/abs/2410.05779, github.com/HKUDS/LightRAG) follows the same entity-relationship extraction pattern as Microsoft GraphRAG but with a simpler pipeline that reduces construction cost by approximately 60-70%. The trade-off: LightRAG's community detection and summarization are less sophisticated, and global search quality is lower than GraphRAG's on complex corpus-wide queries.

```bash
pip install lightrag-hku==1.2.0
```

```python
import asyncio
import os

import numpy as np
from lightrag import LightRAG, QueryParam
from lightrag.llm.anthropic import claude_complete_if_cache
from lightrag.llm.openai import openai_embedding
from lightrag.utils import EmbeddingFunc

# NOTE: helper names under lightrag.llm have changed between LightRAG releases;
# confirm these imports against your installed version.


async def claude_model(prompt, system_prompt=None, history_messages=None, **kwargs):
    return await claude_complete_if_cache(
        "claude-sonnet-4-6",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages or [],  # avoid a shared mutable default
        api_key=os.environ["ANTHROPIC_API_KEY"],
        **kwargs,
    )


async def embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embedding(
        texts,
        model="text-embedding-3-small",
        api_key=os.environ["OPENAI_API_KEY"],
    )


rag = LightRAG(
    working_dir="./lightrag_cache",
    llm_model_func=claude_model,
    embedding_func=EmbeddingFunc(
        embedding_dim=1536,
        max_token_size=8192,
        func=embedding_func,
    ),
)


async def main() -> None:
    # Insert documents
    with open("./my_document.txt") as f:
        await rag.ainsert(f.read())

    # Query in different modes
    local_result = await rag.aquery(
        "What is known about Entity X?", param=QueryParam(mode="local")
    )
    global_result = await rag.aquery(
        "What are the main themes?", param=QueryParam(mode="global")
    )
    hybrid_result = await rag.aquery(
        "Detailed question", param=QueryParam(mode="hybrid")
    )
    print(local_result)


asyncio.run(main())
```

LightRAG's `mode` parameter (`local`, `global`, `hybrid`, `naive`) maps approximately to Microsoft GraphRAG's search modes. `hybrid` mode runs both local entity-level search and global community-level search in parallel, fusing results — this is LightRAG's recommended default. `naive` mode falls back to chunk-based vector retrieval with no graph traversal (comparable to standard RAG).

LightRAG vs Microsoft GraphRAG summary: LightRAG is 60-70% cheaper to index, easier to set up (no settings.yaml, fewer dependencies), and has a simpler codebase for customization. Microsoft GraphRAG has more mature multi-level community detection, better global search quality on complex corpus-wide queries, and an active enterprise adoption path (Azure AI Search integration). For new projects, start with LightRAG to validate that graph-based retrieval adds value for your query distribution before committing to the full GraphRAG infrastructure.


Phase 9: Neo4j + vector as a production graph store alternative

Microsoft GraphRAG stores the knowledge graph in Parquet files. For production deployments where the graph needs to be queried, updated, and integrated with other systems, a dedicated graph database provides transactional integrity and rich query capabilities. Neo4j is the most widely deployed graph database in production.

Neo4j AuraDB has built-in vector index support (as of Neo4j 5.11), enabling combined graph traversal + vector similarity queries in a single Cypher query:

```python
import os

from neo4j import GraphDatabase

# Assumes the APOC plugin is installed and a vector index named
# 'entity_embeddings' already exists on Entity.embedding.
driver = GraphDatabase.driver(
    os.environ["NEO4J_URI"],
    auth=(os.environ["NEO4J_USER"], os.environ["NEO4J_PASSWORD"]),
)


# Store entities as nodes with embeddings
def create_entity(tx, name: str, entity_type: str, description: str, embedding: list[float]):
    tx.run(
        """
        MERGE (e:Entity {name: $name})
        SET e.type = $entity_type,
            e.description = $description,
            e.embedding = $embedding
        """,
        name=name, entity_type=entity_type, description=description, embedding=embedding,
    )


# Store relationships as edges
def create_relationship(tx, source: str, target: str, rel_type: str,
                        description: str, strength: float):
    tx.run(
        """
        MATCH (s:Entity {name: $source})
        MATCH (t:Entity {name: $target})
        MERGE (s)-[r:RELATES_TO {type: $rel_type}]->(t)
        SET r.description = $description,
            r.strength = $strength
        """,
        source=source, target=target, rel_type=rel_type,
        description=description, strength=strength,
    )


# Combined graph traversal + vector similarity (Cypher)
def retrieve_entity_neighbors(
    query_embedding: list[float],
    top_k_entities: int = 5,
    hop_depth: int = 2,
) -> list[dict]:
    with driver.session() as session:
        # Step 1: find top-K most similar entities by vector
        # Step 2: retrieve their graph neighborhood (up to hop_depth hops)
        result = session.run(
            """
            CALL db.index.vector.queryNodes('entity_embeddings', $top_k, $query_embedding)
            YIELD node AS seed, score
            CALL apoc.path.subgraphNodes(seed, {maxLevel: $hop_depth})
            YIELD node
            RETURN DISTINCT node.name AS name, node.type AS type,
                   node.description AS description, score
            ORDER BY score DESC
            """,
            top_k=top_k_entities,
            query_embedding=query_embedding,
            hop_depth=hop_depth,
        )
        return [dict(record) for record in result]
```
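
The second step of the Cypher query above, expanding from seed entities to everything within `hop_depth` hops, is a bounded breadth-first search. The sketch below shows that expansion on an in-memory adjacency map so the behavior can be checked without a database. It is an illustration only: `k_hop_neighborhood` and the example entities are hypothetical, and relationships are treated as undirected.

```python
from collections import deque


def k_hop_neighborhood(
    adjacency: dict[str, set[str]],
    seeds: list[str],
    hop_depth: int = 2,
) -> dict[str, int]:
    """Return every node within `hop_depth` hops of a seed, mapped to its hop distance."""
    distance = {seed: 0 for seed in seeds if seed in adjacency}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == hop_depth:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# A chain of board-membership style links between made-up entities.
graph = {
    "Jane Doe": {"Company X"},
    "Company X": {"Jane Doe", "Board Member A"},
    "Board Member A": {"Company X", "Company Y"},
    "Company Y": {"Board Member A", "Company Z"},
    "Company Z": {"Company Y"},
}

print(k_hop_neighborhood(graph, ["Jane Doe"], hop_depth=2))
print(k_hop_neighborhood(graph, ["Jane Doe"], hop_depth=3))
```

Raising `hop_depth` by one can multiply the number of returned entities on a dense graph, so it is worth measuring neighborhood sizes before increasing it in production.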

The Neo4j approach gives you: (1) persistent graph storage across application restarts (no re-loading Parquet files); (2) Cypher queries for complex multi-hop traversal; (3) full ACID transactions for graph updates; (4) built-in vector indexes for entity similarity search. The trade-off: operational overhead of running Neo4j (AuraDB free tier is limited to 50K nodes — use the Professional tier at $65/mo for production corpora).

For teams already using Neo4j for other purposes (recommendation systems, fraud detection, network analysis), extending it to support GraphRAG is natural. For teams that don't have Neo4j, the Parquet-based Microsoft GraphRAG storage is simpler to start with — migrate to Neo4j if you need more advanced graph querying later.


Phase 10: Combining GraphRAG with standard RAG — query routing

The production pattern for most teams is not GraphRAG-only or vanilla-RAG-only but a hybrid pipeline with query routing. A lightweight classifier routes each incoming query to the appropriate retrieval strategy based on query type.

```python
import anthropic

# standard_rag_index and generate_with_claude are placeholders for your own
# retrieval and generation code; search_engine (Phase 5) and
# local_search_engine (Phase 6) are the GraphRAG engines built earlier.

ROUTER_PROMPT = """Classify this query into one of three retrieval strategies.

strategies:
- LOOKUP: single-fact, exact-value, or definition query
  ("What is X?", "How do I do Y?", "What is the value of Z?")
- ENTITY_GRAPH: query about relationships, connections, or facts about a specific entity
  ("What are all connections between A and B?", "What is known about person X?")
- GLOBAL_SYNTHESIS: corpus-wide pattern or theme query
  ("What are the major themes?", "What are the key trends across all documents?")

Return ONLY the strategy name, nothing else.

Query: {query}"""


def route_query(query: str) -> str:
    """Classify query into retrieval strategy.

    Returns 'LOOKUP', 'ENTITY_GRAPH', or 'GLOBAL_SYNTHESIS'.
    """
    client = anthropic.Anthropic()
    resp = client.messages.create(
        model="claude-haiku-4-5",  # cheap fast model for routing
        max_tokens=20,
        messages=[{"role": "user", "content": ROUTER_PROMPT.format(query=query)}],
    )
    return resp.content[0].text.strip()


async def smart_retrieve(query: str, standard_rag_index, graphrag_root: str) -> str:
    """Route query to appropriate retrieval strategy and generate answer."""
    strategy = route_query(query)
    print(f"Query strategy: {strategy}")

    if strategy == "LOOKUP":
        # Standard dense/hybrid RAG (fast, cheap, accurate for lookup)
        # Use your Pinecone or pgvector retrieval function here
        chunks = await standard_rag_index.retrieve(query, top_k=5)
        return await generate_with_claude(query, chunks)
    elif strategy == "ENTITY_GRAPH":
        # GraphRAG local search (entity-relationship queries)
        result = await local_search_engine.asearch(query)
        return result.response
    elif strategy == "GLOBAL_SYNTHESIS":
        # GraphRAG global search (corpus-wide synthesis)
        result = await search_engine.asearch(query)
        return result.response
    else:
        # Fallback to standard RAG if the router returns an unexpected label
        chunks = await standard_rag_index.retrieve(query, top_k=5)
        return await generate_with_claude(query, chunks)
```

Using Claude Haiku 4.5 for the routing classifier costs $1/1M input tokens × ~100 tokens per routing call = $0.0001/call — negligible. The routing step adds ~100-200ms latency but prevents expensive GraphRAG global search calls ($0.27/query) from being triggered by simple lookup queries. At 100K daily queries with 10% needing global synthesis: without routing = 100K × $0.27 = $27,000/day; with routing = 90K × $0.01 + 10K × $0.27 = $900 + $2,700 = $3,600/day. Routing saves $23,400/day at this scale — it is not optional for production GraphRAG systems.
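
The routing arithmetic above can be wrapped in a small function for testing other traffic mixes. This sketch uses the per-query figures quoted in this article and, like the text, models only lookup and global traffic (no local search share). The function name `daily_cost_usd` is hypothetical. Unlike the figure in the text, the routed total includes the router's own cost.

```python
def daily_cost_usd(
    queries_per_day: int,
    global_share: float,          # fraction of queries that need global synthesis
    routed: bool,
    lookup_cost: float = 0.01,    # standard RAG, USD per query (figure from the text)
    global_cost: float = 0.27,    # GraphRAG global search, USD per query
    router_cost: float = 0.0001,  # Haiku classification, USD per query
) -> float:
    """Daily spend with or without a routing classifier in front of retrieval."""
    if not routed:
        # Without routing, every query takes the global search path.
        return queries_per_day * global_cost
    global_queries = queries_per_day * global_share
    lookup_queries = queries_per_day - global_queries
    return (
        lookup_queries * lookup_cost
        + global_queries * global_cost
        + queries_per_day * router_cost
    )


unrouted = daily_cost_usd(100_000, 0.10, routed=False)
routed = daily_cost_usd(100_000, 0.10, routed=True)
print(f"unrouted: ${unrouted:,.0f}/day")
print(f"routed:   ${routed:,.0f}/day")
print(f"savings:  ${unrouted - routed:,.0f}/day")
```

Including the router adds about $10/day at this volume, which does not change the conclusion.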

Production checklist

  1. 1

    Validate GraphRAG adds value before full corpus construction

    Build a small pilot: 50-100 representative documents, run the full GraphRAG pipeline, evaluate answer quality on 20 multi-hop and 20 lookup queries. Compare to standard hybrid RAG on the same queries. If GraphRAG wins on fewer than 30% of queries, standard RAG is likely the better investment. GraphRAG construction costs $18-180 per 1M corpus tokens — validate the ROI before committing.

    → Open the RAG cost per query calculator
  2. 2

    Customize entity types before indexing

    The default entity types (organization, person, geo, event) are generic. For domain-specific corpora, customize entity_types in settings.yaml before the first run. Bad entity types produce noisy extractions that cascade into poor community quality. Re-extraction after changing entity types requires a full re-index.

  3. 3

    Set temperature: 0 for all extraction LLM calls

    Entity extraction must be deterministic for reproducible graph construction. temperature: 0 ensures the same text produces the same entities on every run. Nonzero temperature causes drift in entity names and relationship descriptions across re-indexes, breaking entity resolution.

  4. 4

    Disable claim extraction in settings.yaml

    claim_extraction.enabled is false by default in GraphRAG v2 but verify it. Claim extraction adds 20-30% to construction cost and is rarely needed for standard RAG-style generation. Enable it only if your use case requires attributed factual claims with source citations per claim.

  5. 5

    Implement query routing before scaling beyond pilot

    Route LOOKUP queries to standard RAG, ENTITY_GRAPH queries to local search, and GLOBAL_SYNTHESIS queries to global search. Without routing, every query runs the most expensive path. Use Claude Haiku 4.5 for classification (~$0.0001/call) — routing pays for itself after one redirected global search call.

  6. Monitor graph construction token usage with a cost estimator

    Run a dry-run cost estimate before indexing a large corpus: count total corpus tokens with `tiktoken` (an approximation for Claude models, which use a different tokenizer), then multiply by 2.5x for entity extraction overhead, since the pipeline processes each token multiple times across extraction, summarization, and community steps. For corpora over 10M tokens, set up cost alerts in your Anthropic console before running the indexer.
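The dry-run estimate from the last checklist item might look like the sketch below. The helper names are hypothetical. The 2.5x input multiplier and the $3 / $15 per 1M token prices come from this article; `output_ratio=0.7` is an assumption chosen so the default lands near the ~$18 per 1M corpus tokens quoted for Sonnet 4.6, and should be replaced with a ratio measured on your own pilot run. The character-based fallback in `count_tokens` is a rough heuristic for English text, not a tokenizer.

```python
def count_tokens(texts):
    """Approximate the corpus token count (hypothetical helper).

    Uses tiktoken's cl100k_base encoding when the package is installed;
    otherwise falls back to a rough ~4 characters per token heuristic.
    """
    try:
        import tiktoken
    except ImportError:
        return sum(len(t) for t in texts) // 4
    enc = tiktoken.get_encoding("cl100k_base")
    # disallowed_special=() treats strings like "<|endoftext|>" as plain text.
    return sum(len(enc.encode(t, disallowed_special=())) for t in texts)


def estimate_index_cost(corpus_tokens: int,
                        input_multiplier: float = 2.5,   # from the checklist item
                        output_ratio: float = 0.7,       # ASSUMPTION: measure on a pilot
                        input_price: float = 3.0,        # $ per 1M input tokens
                        output_price: float = 15.0):     # $ per 1M output tokens
    """Return an estimated graph construction cost in dollars."""
    input_tokens = corpus_tokens * input_multiplier
    output_tokens = corpus_tokens * output_ratio
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    for size in (1_000_000, 10_000_000):
        print(f"{size:,} corpus tokens: ~${estimate_index_cost(size):,.2f}")
```

Running the estimate before the indexer gives a budget figure to compare against the cost alerts configured in the provider console.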

Frequently Asked Questions

What is GraphRAG and how is it different from standard RAG?

Standard RAG retrieves chunks by embedding similarity — it answers lookup queries well but cannot synthesize across many documents or follow entity relationships. GraphRAG builds a knowledge graph from the corpus: entities (people, orgs, concepts) and their relationships are extracted with an LLM, organized into communities via Leiden clustering, and summarized. At query time, global search aggregates community summaries for corpus-wide questions; local search traverses entity-level subgraphs for relationship queries. The key difference: GraphRAG answers 'what are all connections between X and Y across the entire corpus' — a query that standard RAG cannot produce a coherent answer for. Source: Edge et al. (2024), arxiv.org/abs/2404.16130.

How much does GraphRAG indexing cost?

With Claude Sonnet 4.6 at $3/1M input + $15/1M output, construction costs approximately $15-25 per 1M corpus tokens. With GPT-4-class models at their pricing, the same pipeline costs $100-500/1M tokens (the published range). The most expensive steps are entity summarization (reading all text for each entity) and community summarization (reading all entity descriptions for each community). A 1M-token corpus (roughly 750K words) costs approximately $18 with Sonnet 4.6.

When should I use LightRAG instead of Microsoft GraphRAG?

LightRAG is faster to set up, 60-70% cheaper to index, and has a simpler codebase. Use it for: prototyping and pilot evaluation, corpora under 500 documents, teams without graph database experience, cost-sensitive workloads. Microsoft GraphRAG has more mature multi-level community detection, better global search quality on complex corpus-wide queries, and an Azure integration path. Use it for: production enterprise deployments, corpora where global synthesis quality is critical, teams with existing Parquet/dataframe infrastructure.

What is the Leiden algorithm and why does GraphRAG use it?

Leiden (Traag et al., 2019, nature.com/articles/s41598-019-41695-z) is a community detection algorithm for graphs. It partitions a graph into communities (densely connected subgraphs) by optimizing modularity. Leiden is preferred over the older Louvain algorithm because it guarantees that communities are internally connected — Louvain can produce disconnected communities in some cases. GraphRAG uses multi-level Leiden to produce a hierarchy of communities (broad at level 0, fine at higher levels) that supports both global and local search at different granularities.
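To make the two ideas in this answer concrete, the sketch below computes modularity for a given partition and checks whether a community is internally connected. It does not implement Leiden itself: it only illustrates the objective being optimized and the guarantee Leiden adds over Louvain. The function names and the toy graph are hypothetical, and the graph is assumed to be undirected and unweighted.

```python
from collections import defaultdict, deque


def modularity(edges, communities):
    """Q = sum over communities of L_c/m - (d_c/(2m))^2.

    L_c is the number of edges inside community c, d_c the total degree of
    its nodes, and m the number of edges in the graph.
    """
    m = len(edges)
    member = {node: i for i, comm in enumerate(communities) for node in comm}
    internal = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[member[u]] += 1
        degree[member[v]] += 1
        if member[u] == member[v]:
            internal[member[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2
               for c in range(len(communities)))


def is_internally_connected(community, edges):
    """True if every node in the community is reachable using only its own edges."""
    nodes = set(community)
    if not nodes:
        return True
    adjacency = defaultdict(set)
    for u, v in edges:
        if u in nodes and v in nodes:
            adjacency[u].add(v)
            adjacency[v].add(u)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        for neighbor in adjacency[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == nodes


# Toy graph: two triangles joined by a single bridge edge (c-d).
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

if __name__ == "__main__":
    split = [{"a", "b", "c"}, {"d", "e", "f"}]
    print("two triangles:", modularity(EDGES, split))
    print("one community:", modularity(EDGES, [{"a", "b", "c", "d", "e", "f"}]))
    print("connected {a, e}?", is_internally_connected({"a", "e"}, EDGES))
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, and a community such as {a, e} fails the connectedness check, which is the kind of grouping Leiden is designed to rule out.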

Can I use GraphRAG with an open-source LLM instead of Claude or GPT?

Yes, but with caveats. GraphRAG's entity extraction requires the LLM to reliably output structured JSON. Smaller open-source models (Llama 3 8B, Mistral 7B) frequently produce malformed JSON and require prompt engineering and retry logic beyond what GraphRAG's default pipeline handles. Models at the 70B parameter range (Llama 3 70B, Mixtral 8x22B) perform well enough for entity extraction with appropriate prompting. For production quality comparable to GPT-4 or Sonnet 4.6, use a 70B+ model or a GPT-4/Sonnet-class hosted model. The construction cost savings from switching to a self-hosted 70B model are real but require GPU infrastructure investment.
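The retry logic mentioned above could be sketched as follows. This is a generic wrapper, not part of GraphRAG: `generate` is a hypothetical callable that sends a prompt to whatever model you host and returns its raw text, and the function names are illustrative only.

```python
import json
from typing import Callable

FENCE = "`" * 3  # Markdown code fence that smaller models often wrap JSON in


def strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown fence and optional language tag."""
    text = text.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        text = text[len(FENCE):-len(FENCE)]
        first_newline = text.find("\n")
        # Drop a language tag such as "json" on the first line.
        if first_newline != -1 and text[:first_newline].strip().isalpha():
            text = text[first_newline + 1:]
    return text.strip()


def generate_json(generate: Callable[[str], str], prompt: str,
                  max_attempts: int = 3) -> dict:
    """Call the model until it returns a parseable JSON object or attempts run out."""
    last_error = None
    attempt_prompt = prompt
    for _ in range(max_attempts):
        raw = generate(attempt_prompt)
        try:
            parsed = json.loads(strip_code_fence(raw))
        except json.JSONDecodeError as exc:
            last_error = exc
            # Feed the parse error back so the model can correct itself.
            attempt_prompt = (f"{prompt}\n\nYour previous reply was not valid "
                              f"JSON ({exc.msg}). Reply with JSON only.")
            continue
        if isinstance(parsed, dict):
            return parsed
        last_error = ValueError("expected a JSON object")
        attempt_prompt = f"{prompt}\n\nReply with a single JSON object only."
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error
```

Capping the attempts matters for cost: every retry is another full extraction call, so a model that needs retries on most chunks erodes the savings of self-hosting.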

What query types are NOT good for GraphRAG?

GraphRAG is overkill (and more expensive) for: (1) simple factual lookups — 'what is X's phone number', 'what is the capital of Y'; (2) queries over small corpora (<20 documents) where standard RAG produces good results; (3) queries where the answer is always in a single document; (4) real-time or frequently updated corpora where re-indexing overhead is prohibitive. Always run standard RAG as a baseline before building GraphRAG — many teams are surprised by how well standard hybrid RAG handles their query distribution.

How do I update the knowledge graph when documents change?

GraphRAG v2 supports incremental indexing: run `graphrag update --root .` to process only new and changed documents (the CLI has changed between releases, so confirm the command with `graphrag --help` for your installed version). Changed documents are detected by content hash. The pipeline re-extracts entities from changed documents and updates the graph. Any graph update triggers full community re-detection, which can be expensive for large corpora. For corpora that change often, consider serving recent documents from a separate standard RAG index and running GraphRAG indexing weekly or monthly on stable corpus segments.
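Hash-based change detection, as described above, can be illustrated with a small sketch. This is a generic example and not GraphRAG's internal implementation; the function names and the manifest layout (document id mapped to SHA-256 digest) are assumptions made for illustration.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_corpus(previous_manifest: dict, documents: dict) -> dict:
    """Compare current documents against the manifest saved by the last index run.

    previous_manifest maps doc id -> hash; documents maps doc id -> text.
    Returns the ids that are new, changed, or removed, plus the new manifest.
    """
    current = {doc_id: content_hash(text) for doc_id, text in documents.items()}
    new = sorted(d for d in current if d not in previous_manifest)
    changed = sorted(d for d in current
                     if d in previous_manifest and previous_manifest[d] != current[d])
    removed = sorted(d for d in previous_manifest if d not in current)
    return {"new": new, "changed": changed, "removed": removed, "manifest": current}


if __name__ == "__main__":
    last_run = {"a.txt": content_hash("alpha"), "b.txt": content_hash("beta")}
    today = {"a.txt": "alpha", "b.txt": "beta v2", "c.txt": "gamma"}
    result = diff_corpus(last_run, today)
    print(result["new"], result["changed"], result["removed"])
```

A diff like this is also a cheap way to decide whether a scheduled re-index is worth running: if only a handful of documents changed, the weekly or monthly batch can wait.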

How does GraphRAG compare to LlamaIndex KnowledgeGraphIndex?

LlamaIndex KnowledgeGraphIndex (in llama-index-core) offers a similar entity-relationship extraction pattern but without Leiden community detection or the global/local search dichotomy. It extracts entity triplets (subject, predicate, object), stores them in a NetworkX graph or a graph database (Neo4j, Nebula), and uses graph traversal for retrieval. It is simpler and cheaper than Microsoft GraphRAG but produces lower-quality global synthesis answers. LlamaIndex KnowledgeGraphIndex is a good starting point if you're already using LlamaIndex; Microsoft GraphRAG or LightRAG is the better choice if you're building a new GraphRAG-specific pipeline.
