Show HN: A real-world streaming data generator in Python

1 ashishbagri 0 5/9/2025, 7:21:48 AM github.com
I built GlassGen to solve a common problem: generating real-time synthetic data for testing, demos, and ML datasets. Faker is great for producing individual data points; GlassGen builds on it and adds:

- Configurable data publishing (CSV, Kafka, Webhooks)

- Precise rate control (records/second)

- Controlled data duplication

- Extensible architecture for custom generators and sinks
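To make the rate control and duplication points concrete, here is a minimal standard-library sketch of the general idea. It is not GlassGen's API or implementation: `paced_records`, `duplicate_ratio`, `window`, and the injectable `sleep` are hypothetical names chosen for illustration, and a plain callable stands in for a Faker-backed record factory.

```python
import itertools
import random
import time
from collections import deque
from typing import Callable, Dict, Iterator, Optional


def paced_records(
    make_record: Callable[[], Dict],
    rps: float,
    total: int,
    duplicate_ratio: float = 0.0,
    window: int = 1000,
    seed: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict]:
    """Yield `total` records at roughly `rps` records per second.

    With probability `duplicate_ratio`, a copy of one of the last `window`
    fresh records is emitted instead of a new one. All parameter names are
    hypothetical; this is a sketch, not GlassGen's interface.
    """
    if rps <= 0:
        raise ValueError("rps must be positive")
    rng = random.Random(seed)
    recent = deque(maxlen=window)   # bounded pool of duplicate candidates
    interval = 1.0 / rps
    next_due = time.monotonic()
    for _ in range(total):
        # Pace against a fixed schedule so per-record overhead does not
        # accumulate into drift.
        delay = next_due - time.monotonic()
        if delay > 0:
            sleep(delay)
        if recent and rng.random() < duplicate_ratio:
            record = dict(rng.choice(recent))   # controlled duplicate
        else:
            record = make_record()
            recent.append(record)
        yield record
        next_due += interval


if __name__ == "__main__":
    counter = itertools.count(1)
    for rec in paced_records(
        lambda: {"id": next(counter)}, rps=50, total=5,
        duplicate_ratio=0.2, seed=42,
    ):
        print(rec)
```

Scheduling against `next_due` rather than sleeping a fixed interval after each record keeps the average rate close to the target even when record creation itself takes time.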

Key features:

- Built on top of Faker for reliable data generation

- Simple JSON/YAML configuration

- Support for complex data relationships

- Real-time data streaming to Kafka

- Custom sink implementations
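For readers wondering what a pluggable sink might look like, the following is a rough sketch under my own assumptions, not GlassGen's actual sink interface. The `Sink` protocol, `CsvSink`, `ListSink`, and `run` are all hypothetical names; see the docs linked in this post for the real extension points.

```python
import csv
import io
from typing import Dict, Iterable, List, Protocol, TextIO


class Sink(Protocol):
    """Hypothetical sink contract: accept one record at a time, then close."""

    def publish(self, record: Dict) -> None: ...

    def close(self) -> None: ...


class CsvSink:
    """Writes records as CSV rows to any text stream (file, StringIO, ...)."""

    def __init__(self, stream: TextIO, fieldnames: List[str]) -> None:
        self._stream = stream
        self._writer = csv.DictWriter(stream, fieldnames=fieldnames)
        self._writer.writeheader()

    def publish(self, record: Dict) -> None:
        self._writer.writerow(record)

    def close(self) -> None:
        self._stream.flush()


class ListSink:
    """Collects records in memory, which is handy in tests."""

    def __init__(self) -> None:
        self.records: List[Dict] = []

    def publish(self, record: Dict) -> None:
        self.records.append(record)

    def close(self) -> None:
        pass


def run(records: Iterable[Dict], sinks: Iterable[Sink]) -> int:
    """Fan each record out to every sink; return the number of records sent."""
    sinks = list(sinks)
    count = 0
    try:
        for record in records:
            for sink in sinks:
                sink.publish(record)
            count += 1
    finally:
        # Close sinks even if a publish call raised.
        for sink in sinks:
            sink.close()
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    sent = run([{"id": 1, "name": "a"}], [CsvSink(buffer, ["id", "name"])])
    print(sent, buffer.getvalue())
```

Keeping the contract to `publish` and `close` means a Kafka or webhook sink would only need to wrap its client in those two methods.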

GitHub: https://github.com/glassflow/glassgen

Docs: https://glassgen.glassflow.dev/

Would love feedback from the community, especially on:

1. Additional sink types that would be useful

2. Performance optimization opportunities

3. Ideas for handling more complex data relationships
