Ask HN: How does your team triage an alert that spans multiple tools?

1 gurmehar_kaur 0 6/9/2025, 2:39:21 PM
I’m researching day-to-day incident workflows and trying to learn how engineering teams triage multi-tool incidents today.

If you’re on-call, I’d love to hear:

- First 5 minutes: which tabs/CLIs do you open and in what order?

- How do you jump from “Datadog spike” → “suspect commit” (or infra change)?

- Where does the investigation usually stall or loop?

- Any small hack that shaved off meaningful time?

- If you could delegate one step to an intern (human or software), which would it be?

Context for transparency: mid-size SaaS, k8s, Datadog, PagerDuty, GitHub. Trying to map real pain points before I build anything.

War stories about your last 2 a.m. incident, screenshots, and “we tried X and it bombed” accounts are all welcome. Thanks so much!
