Added Token and LLM Cost Estimation to Microsoft's GraphRAG Indexing Pipeline

KhaledAlam · 5/6/2025, 9:11:35 PM · blog.khaledalam.net

Comments (1)

KhaledAlam · 6h ago
Microsoft’s open-source GraphRAG project lacked a way to estimate token usage and LLM cost prior to running the indexing pipeline. I recently contributed a feature that adds a CLI flag (--estimate-cost) which previews token counts and cost estimates for both embedding and summarization steps.

It simulates chunking with the same logic as GraphRAG’s actual pipeline, pulls live model pricing from a hosted JSON file, and projects output token counts. After seeing the estimates, the user is prompted to confirm whether to proceed with full indexing.
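A hedged sketch of that surrounding flow (the URL, chunk parameters, and function names here are illustrative assumptions, not GraphRAG's API): simulate a sliding-window token chunker, fetch prices from a JSON endpoint, then ask before kicking off the real run.

```python
# Illustrative flow: chunk simulation -> live pricing -> user confirmation.
import requests
import tiktoken

def simulate_chunks(text: str, chunk_size: int = 1200, overlap: int = 100) -> list[int]:
    """Return per-chunk token counts for an overlapping sliding-window chunker."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = chunk_size - overlap
    return [len(tokens[i:i + chunk_size]) for i in range(0, len(tokens), step)]

def fetch_pricing(url: str) -> dict:
    """Pull current per-model prices from a hosted JSON file (URL is assumed)."""
    return requests.get(url, timeout=10).json()

def confirm_and_run(estimate: dict) -> bool:
    """Show the estimate and ask whether to proceed with full indexing."""
    print(f"Estimated cost: ${estimate['total_cost']:.4f} "
          f"({estimate['input_tokens']} input tokens)")
    return input("Proceed with indexing? [y/N] ").strip().lower() == "y"
```

Simulating the chunker matters because token totals change with chunk overlap: each overlapping window re-counts the shared tokens, so estimating from raw document length alone would undercount what the pipeline actually sends.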

This is particularly useful when working with large corpora or limited OpenAI quotas.

Blog post (with technical deep dive and lessons learned): https://blog.khaledalam.net/how-i-added-token-llm-cost-estim...

GitHub PR: https://github.com/microsoft/graphrag/pull/1917