Show HN: CocoIndex – Open-Source Data transformation for AI, only process delta

4 badmonster 1 6/15/2025, 5:24:01 PM github.com ↗
Hi HN community,

I’ve been working on CocoIndex, an open-source data transformation framework for AI. It supports incremental processing: it only reprocesses what has changed, which keeps data fresh even when compute resources can't keep up with reprocessing everything.

With about 100 lines of Python you can set up a production-grade pipeline for vector embeddings, knowledge graphs, structured extraction, etc.

Once you run the pipeline in live mode, it automatically detects source changes and keeps the target store updated with a minimal amount of processing.
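To make "only process the delta" concrete, here is a minimal sketch in plain Python. It is not CocoIndex's API or implementation: the names `fingerprint` and `sync`, and the use of in-memory dicts as the source and target store, are assumptions made purely for illustration. The idea shown is to remember a content hash per source item and re-run the transformation only for items whose hash has changed, while removing target entries whose source has disappeared.

```python
import hashlib
from typing import Callable


def fingerprint(content: str) -> str:
    """Return a stable hash of the content, used to detect changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def sync(
    source: dict[str, str],
    seen: dict[str, str],
    target: dict[str, str],
    transform: Callable[[str], str],
) -> dict[str, list[str]]:
    """Update `target` from `source`, transforming only new or changed items.

    `seen` maps each key to the fingerprint recorded on the previous run.
    (All names here are hypothetical, not part of any real library.)
    """
    processed: list[str] = []
    deleted: list[str] = []

    # Re-run the transformation only when the content hash differs.
    for key, content in source.items():
        fp = fingerprint(content)
        if seen.get(key) != fp:
            target[key] = transform(content)
            seen[key] = fp
            processed.append(key)

    # Drop target entries whose source item no longer exists.
    for key in list(seen):
        if key not in source:
            del seen[key]
            target.pop(key, None)
            deleted.append(key)

    return {"processed": processed, "deleted": deleted}


# Example: the second run touches only the changed and removed items.
source = {"a.md": "hello", "b.md": "world"}
seen: dict[str, str] = {}
target: dict[str, str] = {}

first = sync(source, seen, target, str.upper)

source["a.md"] = "hello again"
del source["b.md"]
second = sync(source, seen, target, str.upper)
```

In this sketch `str.upper` stands in for an expensive step such as computing embeddings. A real pipeline would also need to persist the fingerprints and handle changes to the transformation logic itself, which this example leaves out.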

I’m building this framework to be as simple as possible to develop with, yet production-ready from day one, combining the best practices and lessons I’ve learned over the past years.

Previously, I was a tech lead at Google for 8 years, working on projects like search indexing and ETL infra. After I left Google last year, I went through pivoting hell. In every project I’ve built, data still sits at the center of the problem: I need to write custom logic to prepare data for AI, and I need that data to stay fresh so the AI can make effective decisions or recommendations. So I started CocoIndex.

Getting started: https://cocoindex.io/docs/getting_started/quickstart

Examples: https://github.com/cocoindex-io/cocoindex/tree/main/examples

If you prefer to read: https://cocoindex.io/blogs/tags/examples/

Would love to hear your thoughts. Thanks! Linghua

Comments (1)

badmonster · 18h ago
The core framework is written in Rust because it is performant, robust, and can bind to different languages. Currently it has a Python SDK, and I’m thinking of supporting TypeScript soon. Would love to hear your thoughts, thanks.