I've built an open source streaming library for async pipelines

ju-bezdek · 6/2/2025, 10:07:56 AM · github.com

Comments (1)

ju-bezdek · 1d ago
I’ve been working with LLMs a lot lately, and one consistent UX bottleneck is inference speed.

Many tasks follow this pattern: process small chunks → batch for inference → split results again. Parallelizing helps, but naive asyncio.gather approaches often backfire: every downstream stage waits on the slowest item in the batch, killing responsiveness. Mixing fast per-item logic with slower batch steps needs smarter coordination.
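To make the bottleneck concrete, here is a minimal sketch of the naive pattern (the function names and steps are hypothetical, just stand-ins for per-item preprocessing and a slow batched model call):

```python
import asyncio

# Hypothetical stages, for illustration only.
async def preprocess(item: int) -> int:
    return item * 2

async def infer_batch(batch: list[int]) -> list[int]:
    # Simulate a slow batched inference call.
    await asyncio.sleep(0.01 * len(batch))
    return [x + 1 for x in batch]

async def naive(items: list[int]) -> list[int]:
    # Stage 1: every item is preprocessed before anything moves on.
    pre = await asyncio.gather(*(preprocess(i) for i in items))
    # Stage 2: one big batch; the caller sees nothing until the
    # slowest element of the entire batch has finished.
    return await infer_batch(list(pre))

print(asyncio.run(naive([1, 2, 3])))
```

The gather-per-stage shape means latency is governed by the slowest element at each barrier, and memory grows with the whole dataset rather than with a batch.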

Technical approach: I built a pipeline library that handles the streaming coordination automatically. It uses async generators throughout, with intelligent queuing for order preservation when needed.
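To illustrate the async-generator approach (this is a generic sketch of the technique, not the library's actual API): stages are chained generators, so a batch's results are yielded downstream as soon as that batch completes, instead of after the whole dataset.

```python
import asyncio
from typing import AsyncIterator

async def source(n: int) -> AsyncIterator[int]:
    for i in range(n):
        yield i

async def batcher(items: AsyncIterator[int], size: int) -> AsyncIterator[list[int]]:
    # Group a per-item stream into fixed-size batches for inference.
    batch: list[int] = []
    async for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch

async def infer(batches: AsyncIterator[list[int]]) -> AsyncIterator[int]:
    # Run the slow batch step, then split results back into items,
    # yielding each one as soon as its batch is done.
    async for batch in batches:
        await asyncio.sleep(0.01)  # stand-in for model latency
        for result in (x * 10 for x in batch):
            yield result

async def main() -> list[int]:
    out = []
    async for r in infer(batcher(source(5), size=2)):
        out.append(r)  # results arrive batch-by-batch, not all at the end
    return out

print(asyncio.run(main()))
```

Because generators pull items lazily, only one batch is in flight per stage here; order preservation comes for free in this single-worker sketch, and needs explicit queuing only once batches run concurrently.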

Architecture decisions:

- Stream-first design: results flow downstream by default, with optional collection
- Flexible ordering: choose between speed (unordered) and sequence (ordered)
- Memory efficiency: O(batch_size) memory usage, not O(dataset_size)
- Backpressure handling: automatic coordination between fast and slow stages
- Error boundaries: configurable failure strategies at the task level
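The backpressure and memory points above can be sketched with a bounded asyncio.Queue between a fast and a slow stage (again a generic illustration, not the library's internals; the stage functions are hypothetical):

```python
import asyncio

async def producer(queue: asyncio.Queue, n: int) -> None:
    for i in range(n):
        # put() suspends once the queue is full, so a fast stage can
        # never run more than `maxsize` items ahead of a slow one.
        await queue.put(i)
    await queue.put(None)  # sentinel: end of stream

async def slow_consumer(queue: asyncio.Queue) -> list[int]:
    out = []
    while (item := await queue.get()) is not None:
        await asyncio.sleep(0.005)  # stand-in for slow batch inference
        out.append(item)
    return out

async def main() -> list[int]:
    # Buffering is bounded by maxsize, so memory stays O(batch_size)
    # no matter how large the input stream is.
    queue: asyncio.Queue = asyncio.Queue(maxsize=4)
    _, results = await asyncio.gather(producer(queue, 10), slow_consumer(queue))
    return results

print(asyncio.run(main()))
```

The bounded queue is what turns "fast producer, slow consumer" from an unbounded buffer into automatic flow control: the producer simply parks on put() until the consumer catches up.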