Show HN: LLM Based Spark Profiler
27 ambrood 5 4/9/2025, 8:42:45 PM datasre.ai ↗
Hey HN,
Spark event logs run into 100s of MBs and offer a wealth of insight into your workloads, but making sense of them has always been prohibitively hard. We’ve recently built a lightweight tool that automatically parses Spark event logs and surfaces targeted insights to help you optimize your data jobs.
Whether you’re chasing down a bottleneck or balancing performance vs. cost, the profiler has you covered with real-time configuration recommendations, data skew analysis, and more.
Curious how it works in action? Check out this quick Loom video for a walk-through: https://www.loom.com/share/07348eb54f6b440da93f96753937792a?...
We’d love your feedback — check it out at https://app.datasre.ai and let us know what you think!
Does it suffer from the same issue as other LLMs, where it will always identify potential optimizations or improvements even if none are truly needed?
We do quite a bit of aggregation over the log file, generate summary stats, and choose which bits to feed to the LLM. We plan to support more platforms than just Spark.
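For readers wondering what "aggregate, then summarise" could look like, here is a minimal sketch. It is not the tool's actual code. It assumes the standard JSON-lines Spark event log, where each `SparkListenerTaskEnd` event carries a `Stage ID` and a `Task Metrics` object with `Executor Run Time` in milliseconds. The function name `summarize_task_times` and the `skew_ratio` heuristic (max over median) are made up for illustration.

```python
import json
from collections import defaultdict
from statistics import median


def summarize_task_times(lines):
    """Aggregate per-stage task run times from Spark event log lines.

    Hypothetical helper: reduces a large log to a few numbers per stage,
    which is the kind of compact summary one might pass to an LLM.
    """
    run_times = defaultdict(list)  # stage id -> executor run times (ms)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        # Failed tasks may have no metrics, so treat a null value as empty.
        metrics = event.get("Task Metrics") or {}
        run_time = metrics.get("Executor Run Time")
        if run_time is not None:
            run_times[event["Stage ID"]].append(run_time)

    summary = {}
    for stage_id, times in sorted(run_times.items()):
        med = median(times)
        summary[stage_id] = {
            "tasks": len(times),
            "median_ms": med,
            "max_ms": max(times),
            # Assumed heuristic: max/median well above 1 hints at stragglers.
            "skew_ratio": max(times) / med if med else None,
        }
    return summary
```

Calling it with an open file handle, e.g. `summarize_task_times(open(path))`, streams the log line by line, so a log of several hundred MB never has to sit in memory at once.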
> Does it suffer from the same issue as other LLMs, where it will always identify potential optimizations or improvements even if none are truly needed?
Funnily enough, instructing sonnet-3.7 to not suggest unnecessary optimisations seems to have done the trick!
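As a rough illustration of that kind of guard instruction (the wording and the `build_prompt` helper below are hypothetical, not the prompt the tool actually uses), the instruction can simply be placed ahead of the summary stats:

```python
# Hypothetical guard text; the real prompt is not given in this thread.
NO_OP_INSTRUCTION = (
    "Only recommend a change when the statistics clearly support it. "
    "If the job already looks healthy, say that no optimisation is needed."
)


def build_prompt(summary_stats):
    """Combine the guard instruction with compact summary stats."""
    stats_lines = [f"- {name}: {value}" for name, value in summary_stats.items()]
    return NO_OP_INSTRUCTION + "\n\nSummary stats:\n" + "\n".join(stats_lines)
```

The resulting string would be sent as the system or user message of whichever LLM API is in use.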