CocoIndex – open-source ETL saves you >90% compute for AI workloads

4 georgehe9 1 8/12/2025, 3:11:08 PM github.com ↗

Comments (1)

georgehe9 · 1d ago
Hi HN, I’m George — I left Google after 10 years working on infrastructure and am building CocoIndex https://cocoindex.io with my friend Linghua.

CocoIndex is an open-source ETL framework for incremental processing, designed for AI workloads. By processing only what has changed, we cut more than 90% of compute costs and keep the context for AI fresh with little effort.
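
To illustrate the general idea of processing only what has changed, here is a minimal sketch in plain Python. It is not CocoIndex's API or implementation; the names `fingerprint` and `incremental_run`, and the in-memory dict used as a cache, are hypothetical and chosen only for illustration. It assumes each source document can be fingerprinted by a hash of its content, and that an unchanged fingerprint means the expensive transform (for example, embedding) can be skipped.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical cache layout: source id -> (content fingerprint, transform output).
Cache = Dict[str, Tuple[str, str]]


def fingerprint(content: str) -> str:
    """Return a stable hash of the content, used to detect changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def incremental_run(
    sources: Dict[str, str],
    cache: Cache,
    transform: Callable[[str], str],
) -> List[str]:
    """Re-run `transform` only for new or changed sources.

    Updates `cache` in place and returns the ids that were recomputed.
    """
    recomputed = []
    for source_id, content in sources.items():
        fp = fingerprint(content)
        cached = cache.get(source_id)
        if cached is not None and cached[0] == fp:
            continue  # unchanged: reuse the cached output, skip the transform
        cache[source_id] = (fp, transform(content))
        recomputed.append(source_id)

    # Drop outputs whose source no longer exists, so the target stays in sync.
    for stale_id in set(cache) - set(sources):
        del cache[stale_id]
    return recomputed


if __name__ == "__main__":
    cache: Cache = {}
    docs = {"a.md": "hello", "b.md": "world"}
    print(incremental_run(docs, cache, str.upper))  # first run: both processed
    docs["b.md"] = "world!"
    print(incremental_run(docs, cache, str.upper))  # second run: only b.md
```

In this toy version the saving comes from skipping `transform` whenever the fingerprint matches; a real system would also need to persist the cache and track changes to the transform logic itself.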

It is easy to build scalable, production-grade pipelines in hours by snapping blocks together like Lego. Think of it as n8n with Python blocks, but for large-scale RAG pipelines.

You can build a vector index, a knowledge graph, or custom logic for any modality in the pipeline, with AI. To get started, run `pip install -U cocoindex`.

We’ve built 10+ examples to get you started. If you prefer to read blog posts: https://cocoindex.io/blogs/tags/examples

If you prefer to read code: https://github.com/cocoindex-io/cocoindex?tab=readme-ov-file...

We’ve made 75+ releases since our last launch.

Looking forward to hearing your thoughts! Best, George