Ask HN: How does your team triage an alert that spans multiple tools?
1 gurmehar_kaur 0 6/9/2025, 2:39:21 PM
I’m researching day-to-day incident workflows, specifically how engineering teams triage incidents that span multiple tools.
If you’re on-call, I’d love to hear:
- First 5 minutes: which tabs/CLIs do you open and in what order?
- How do you jump from “Datadog spike” → “suspect commit” (or infra change)?
- Where does the investigation usually stall or loop?
- Any small hack that shaved off a meaningful amount of time?
- If you could delegate one step to an intern (human or software), which would it be?
Context for transparency: mid-size SaaS, k8s, Datadog, PagerDuty, GitHub. Trying to map real pain points before I build anything.
War stories about your last 2 a.m. incident, screenshots, “we tried X and it bombed” are all welcome. Thanks so much!