Show HN: Per row context understanding for data transformations

abhijithneil | 9/4/2025, 8:57:30 PM | github.com
Traditional approaches rely on RAG with vector databases, or on SQL-based transformations and analytics. But can they preserve per-row contextual understanding?

We’ve released Agents as part of Datatune:

https://github.com/vitalops/datatune

In a single prompt, you can define multiple transformation tasks; Datatune applies them to your data row by row, with contextual understanding.

Example prompt:

"Extract categories from the product description and name. Keep only electronics products. Add a column called ProfitMargin = (Total Profit / Revenue) * 100"

Datatune interprets the prompt and applies the right operation (map, filter, or an LLM-powered agent pipeline) on your data using OpenAI, Azure, Ollama, or other LLMs via LiteLLM.
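As an illustration of the kind of per-row pipeline that prompt decomposes into, here is a minimal sketch in plain Python. This is not Datatune's actual API: `classify_category` is a hypothetical stand-in for the LLM call, and the map, filter, and derived-column steps mirror the three tasks in the example prompt above.

```python
def classify_category(name: str, description: str) -> str:
    """Hypothetical stand-in for an LLM call that extracts a category per row."""
    text = f"{name} {description}".lower()
    return "Electronics" if any(k in text for k in ("laptop", "phone", "tv")) else "Other"

rows = [
    {"Name": "UltraBook 14", "Description": "A lightweight laptop",
     "Total Profit": 120.0, "Revenue": 600.0},
    {"Name": "Oak Desk", "Description": "Solid wood desk",
     "Total Profit": 80.0, "Revenue": 400.0},
]

# Map step: add a Category column per row (the real tool uses an LLM here,
# not keyword rules).
for row in rows:
    row["Category"] = classify_category(row["Name"], row["Description"])

# Filter step: keep only electronics products.
electronics = [r for r in rows if r["Category"] == "Electronics"]

# Derived-column step: ProfitMargin = (Total Profit / Revenue) * 100.
for row in electronics:
    row["ProfitMargin"] = row["Total Profit"] / row["Revenue"] * 100

print(electronics)
```

The point is that each task operates on one row at a time with that row's full context in hand, which is what distinguishes this from a set-oriented SQL transformation.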

Key Features

- Row-level map() and filter() operations using natural language

- Agent interface for auto-generating multi-step transformations

- Built-in support for Dask DataFrames (for scalability)

- Works with multiple LLM backends (OpenAI, Azure, Ollama, etc.)

- Compatible with LiteLLM for flexibility across providers

- Auto-token batching, metadata tracking, and smart pipeline composition
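To make the agent interface concrete, here is a hedged sketch of how a single prompt might be planned into an ordered list of map/filter steps. The `Step` dataclass, `plan_steps` function, and the sentence-splitting heuristic are all illustrative assumptions; in Datatune the planning is done by an LLM, not by string matching.

```python
import re
from dataclasses import dataclass

@dataclass
class Step:
    kind: str         # "map" or "filter"
    instruction: str  # natural-language task for this step

def plan_steps(prompt: str) -> list[Step]:
    """Hypothetical stand-in for the agent's LLM planning call: split the
    prompt into tasks and tag each as a map or a filter."""
    steps = []
    for task in re.split(r"\.\s+", prompt.strip()):
        kind = "filter" if task.lower().startswith("keep only") else "map"
        steps.append(Step(kind, task))
    return steps

plan = plan_steps(
    "Extract categories from the product description and name. "
    "Keep only electronics products. "
    "Add a column called ProfitMargin = (Total Profit / Revenue) * 100"
)
for step in plan:
    print(step.kind, "->", step.instruction)
```

Once planned, the steps compose into a pipeline that can run over Dask partitions, which is where the scalability claim above comes from.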

Token & Cost Optimization

Datatune gives you explicit control over which columns are sent to the LLM, reducing token usage and API cost:

- Use input_fields to send only relevant columns

- Automatically handles batching and metadata internally

- Supports setting tokens-per-minute and requests-per-minute limits

- Defaults to known model limits (e.g., GPT-3.5) if not specified

This makes it possible to run LLM-based transformations over large datasets without incurring runaway costs.
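The following sketch shows why restricting `input_fields` matters. The function names here are assumptions for illustration, not Datatune's API: it serializes only the requested columns per row, batches rows into fewer requests, and compares payload sizes against sending every column.

```python
def serialize(rows, input_fields=None):
    """Build the per-batch text payload, keeping only input_fields columns."""
    fields = input_fields or (list(rows[0].keys()) if rows else [])
    return "\n".join(" | ".join(f"{k}: {r[k]}" for k in fields) for r in rows)

def batches(rows, batch_size):
    """Yield fixed-size batches so many rows share one LLM request."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

rows = [
    {"Name": "UltraBook 14", "Description": "A lightweight laptop",
     "SKU": "UB-14-2025", "WarehouseNotes": "aisle 7, bin 3, restock weekly"},
] * 10

full = sum(len(serialize(b)) for b in batches(rows, 5))
trimmed = sum(len(serialize(b, input_fields=["Name", "Description"]))
              for b in batches(rows, 5))
print(full, trimmed)  # trimmed payload is smaller: irrelevant columns never hit the LLM
```

Payload size is a rough proxy for token count, so dropping columns like internal SKUs or warehouse notes translates directly into fewer tokens per request, on top of the rate limits described above.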
