Ask HN: How are Kafka or event-driven systems used in LLM infrastructure?

3 points by pella_may | 7/12/2025, 4:41:15 PM | 0 comments
I'm curious how event-driven technologies like Kafka (or alternatives) fit into the backend and/or infrastructure of large LLM providers.

Some of the questions I have in mind:

1. How do large LLM providers handle the flow of training data, evaluation results, and human feedback? Are these managed through event streams (like Kafka) for real-time processing, or do they rely more on batch processing and traditional ETL pipelines?
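To make the streaming-vs-batch distinction in question 1 concrete, here's a toy sketch. It uses a minimal in-memory append-only log as a stand-in for a Kafka topic (all names and event shapes are invented for illustration): the streaming path reacts to each feedback event as it arrives, while the batch/ETL path re-reads the whole log periodically.

```python
from collections import defaultdict

# Toy in-memory "topic" standing in for a Kafka topic (invented, not a real API).
class Topic:
    def __init__(self):
        self.log = []          # append-only event log, like a Kafka partition
        self.subscribers = []  # callbacks invoked per event (streaming path)

    def publish(self, event: dict):
        self.log.append(event)
        for cb in self.subscribers:
            cb(event)          # real-time processing as events arrive

    def subscribe(self, cb):
        self.subscribers.append(cb)

# Streaming path: aggregate human-feedback scores per model as events arrive.
feedback = Topic()
running_scores = defaultdict(list)
feedback.subscribe(lambda e: running_scores[e["model"]].append(e["score"]))

feedback.publish({"model": "model-a", "score": 4})
feedback.publish({"model": "model-a", "score": 2})

# Batch/ETL path: periodically re-read the whole log instead of reacting per event.
batch_avg = sum(e["score"] for e in feedback.log) / len(feedback.log)
print(running_scores["model-a"], batch_avg)  # [4, 2] 3.0
```

The same event log serves both consumers, which is part of why I'm wondering whether providers unify these paths on one stream or keep them as separate systems.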

2. For complex ML pipelines with dependencies (e.g., data ingestion -> preprocessing -> training -> evaluation -> deployment), do they use event-driven orchestration, where each stage publishes a completion event, or traditional workflow orchestrators like Airflow with polling-based dependency management?
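Here's a rough sketch of the event-driven orchestration pattern I'm imagining for question 2, with a tiny in-memory dispatcher standing in for Kafka topics plus consumers (event names and stage functions are all made up): each stage subscribes to the previous stage's completion event and emits its own when done.

```python
# Toy event-driven orchestration: each stage publishes a completion event
# that triggers the next stage. Stand-in for Kafka topics + consumer groups.
handlers = {}   # event name -> list of stage functions triggered by it
trace = []      # records execution order for inspection

def on(event):
    """Register a stage function as a subscriber to an event."""
    def register(fn):
        handlers.setdefault(event, []).append(fn)
        return fn
    return register

def emit(event, payload=None):
    """Deliver an event to every subscribed stage."""
    for fn in handlers.get(event, []):
        fn(payload)

@on("data.ingested")
def preprocess(payload):
    trace.append("preprocess")
    emit("data.preprocessed")

@on("data.preprocessed")
def train(payload):
    trace.append("train")
    emit("model.trained")

@on("model.trained")
def evaluate(payload):
    trace.append("evaluate")
    emit("model.evaluated")

@on("model.evaluated")
def deploy(payload):
    trace.append("deploy")

emit("data.ingested")  # kick off the pipeline
print(trace)  # ['preprocess', 'train', 'evaluate', 'deploy']
```

The contrast I'm curious about is this push model versus an Airflow-style DAG where a scheduler polls for upstream task completion.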

3. How do they handle real-time performance monitoring and safety signals? Are these event-driven systems that can trigger immediate responses (like model rollbacks), or are they primarily batch analytics with delayed reactions?
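For question 3, the event-driven version I have in mind looks roughly like this sketch: a monitor consumes per-request metric events and triggers an immediate rollback when an error-rate threshold is crossed. The threshold, event fields, and rollback mechanics are all invented placeholders.

```python
# Toy safety monitor: consume metric events and trigger an immediate
# rollback when an error-rate threshold is crossed (all values invented).
THRESHOLD = 0.2
deployed = {"version": "v2", "previous": "v1"}
actions = []

def rollback():
    # Swap the serving version back to the last known-good one.
    deployed["version"] = deployed["previous"]
    actions.append("rollback")

def handle_metric(event):
    # Event-driven path: react per event rather than waiting for a batch job.
    if event["error_rate"] > THRESHOLD:
        rollback()

for evt in [{"error_rate": 0.05}, {"error_rate": 0.31}]:
    handle_metric(evt)

print(deployed["version"], actions)  # v1 ['rollback']
```

The batch alternative would compute the same error rate hourly or daily, so the rollback could lag the incident by a long time, which is why I'd expect safety signals specifically to be event-driven.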

I'm basically trying to understand how far the event-driven paradigm extends into modern AI infra, and I'd love any high-level insights from anyone who is (or has been) working with it.
