Weiser: A Lightweight, OSS, AI-Friendly Data Quality Framework

1 pacofvf 1 7/2/2025, 6:12:23 PM weiser.ai ↗

Comments (1)

pacofvf · 18h ago
After becoming frustrated with the difficulty of implementing reliable and transparent data quality checks, I developed a new framework called Weiser. It’s inspired by tools like Soda and Great Expectations, but built with a different philosophy: simplicity, openness, and zero lock-in.

If you’ve tried Soda, you’ve probably noticed that many of the more useful checks (change over time, anomaly detection, etc.) are hidden behind their cloud product. Great Expectations, while powerful, can feel overly complex and brittle for modern analytics workflows. I wanted something in between: lightweight, expressive, and flexible enough to integrate into any analytics stack.

Weiser is config-based; you define checks in YAML, and it runs them as SQL against your data warehouse. There’s no SaaS platform, no telemetry, no signup: just a CLI tool and some opinionated YAML.
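
To make the "declarative config in, SQL out" idea concrete, here is a minimal sketch in Python. It is not Weiser's code and does not use Weiser's YAML schema: the `check` fields, `compile_check`, and `run_check` are all hypothetical names chosen for illustration, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical check definition. The field names are illustrative only and
# are NOT Weiser's actual config schema.
check = {
    "name": "orders_not_empty",
    "table": "orders",
    "measure": "COUNT(*)",
    "condition": "gt",
    "threshold": 0,
}

# Map the condition keyword in the config to a comparison.
OPERATORS = {
    "gt": lambda value, threshold: value > threshold,
    "ge": lambda value, threshold: value >= threshold,
    "lt": lambda value, threshold: value < threshold,
    "le": lambda value, threshold: value <= threshold,
    "eq": lambda value, threshold: value == threshold,
}


def compile_check(check):
    """Render the declarative check into a SQL string."""
    return f"SELECT {check['measure']} FROM {check['table']}"


def run_check(conn, check):
    """Run the compiled SQL and compare the single result to the threshold."""
    value = conn.execute(compile_check(check)).fetchone()[0]
    passed = OPERATORS[check["condition"]](value, check["threshold"])
    return {"name": check["name"], "value": value, "passed": passed}


def demo():
    # In-memory SQLite stands in for a real warehouse connection.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
    return run_check(conn, check)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the separation: the check is plain data that a human or an LLM can write and review, and the SQL is generated from it rather than written by hand.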

Some examples of built-in checks:

1. Row count drops compared to a historical window

2. Unexpected nulls or category values

3. Distribution shifts

4. Anomaly detection

5. Cardinality changes
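
As a rough illustration of what the first and fourth checks in this list mean, here is a sketch of the comparison logic in plain Python. It assumes the historical row counts have already been fetched, and it is not how Weiser implements these checks: the function names, the 50% drop tolerance, and the z-score threshold of 3 are all assumptions picked for the example.

```python
import statistics


def row_count_dropped(history, today, max_drop=0.5):
    """Return True when today's count is more than max_drop below the historical mean."""
    baseline = statistics.mean(history)
    return today < baseline * (1 - max_drop)


def is_anomaly(history, today, z_threshold=3.0):
    """Return True when today's value is more than z_threshold sample
    standard deviations away from the historical mean (a simple z-score test)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return today != mean
    return abs(today - mean) / stdev > z_threshold


if __name__ == "__main__":
    history = [1000, 1020, 980, 1010, 990]  # daily row counts, made up for the demo
    print(row_count_dropped(history, 400), is_anomaly(history, 400))
    print(row_count_dropped(history, 1005), is_anomaly(history, 1005))
```

A z-score is only one of several ways to flag anomalies; it is used here because it fits in a few lines, not because the framework is known to use it.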

The framework is fully open-source (MIT license), and its goal is to be both human- and machine-readable. I’ve been using LLMs to help generate and refine Weiser configs, which works surprisingly well, far better than trying to wrangle pandas or SQL directly via prompt. I already have an MCP server that works well, but it's a pain in the ass to install in Claude Desktop, and I don't want you to waste time doing that. Once Anthropic fixes their dxt format, I will release an MCP tool for Claude Desktop.

Currently it supports only PostgreSQL and Cube as data sources, and PostgreSQL and DuckDB (S3) as destinations for check results. I will add Snowflake and Databricks as data sources in the next few days. It doesn’t do orchestration; you can run it via cron, Airflow, GitHub Actions, whatever you want.

If you’ve ever duct-taped together dbt tests, SQL scripts, or ad hoc dashboards to catch data quality issues, Weiser might be helpful. I would love any feedback or ideas. It’s early days, but I’m trying to keep it clean and useful for both analysts and engineers. I'm also working on a better GUI.

GitHub: https://github.com/weiser-ai/weiser
Docs: https://weiser.ai/docs/tutorial/getting-started

I'm happy to answer questions or hear about what other folks are doing to address this problem.