STAR method · behavioral interview · worked example
“Tell me about a time you handled a crisis or a production incident.”
STAR answer for crisis management and incident response questions — example for engineering and DevOps interviews.
Use this as a model — then adapt it with your own specific situation and measurable outcomes.
Situation
At 11:45 PM on a Friday, our primary database ran out of disk space because a log table had grown unchecked. All write operations across the application failed, affecting approximately 8,000 active users.
Task
I was the on-call engineer. I had to restore service as quickly as possible, communicate transparently with stakeholders, and prevent recurrence — all without making changes that could cause secondary failures.
Action
Within five minutes of the alert firing, I confirmed the root cause using our monitoring dashboard and connected to the database host. I identified that the audit log table had grown from 40GB to 650GB in 72 hours because of a misconfigured logging verbosity setting deployed that Thursday. I did not delete logs blindly; I first archived them to S3 using a non-blocking copy, then truncated the table. Service was restored in 22 minutes. I immediately posted a status update to our incident Slack channel, emailed a brief summary to the affected enterprise client at 12:30 AM, and filed a detailed post-mortem by 9 AM Saturday covering what happened, why the alert didn't fire sooner, and three specific changes to prevent recurrence.
Result
Service was restored in 22 minutes with no data loss. The enterprise client replied to my email, thanking me for the proactive communication. The post-mortem changes, including a disk usage alert at 70% and a log retention policy, were implemented the following week. No disk-related incident has occurred since.
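If an interviewer follows up on how the 70% disk alert works, a minimal sketch like the one below can help you explain it. It assumes Python and the standard-library `shutil.disk_usage`; the function names, the checked path, and the print-based alert are illustrative placeholders, not the tooling used in the incident.

```python
import shutil

ALERT_THRESHOLD = 0.70  # alert at 70% disk usage, as in the post-mortem change


def used_fraction(used_bytes: int, total_bytes: int) -> float:
    """Return the fraction of disk capacity in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def should_alert(used_bytes: int, total_bytes: int,
                 threshold: float = ALERT_THRESHOLD) -> bool:
    """Return True when usage is at or above the threshold."""
    return used_fraction(used_bytes, total_bytes) >= threshold


def check_path(path: str = "/") -> bool:
    """Check the volume holding `path` (the path is a hypothetical example)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free, in bytes
    return should_alert(usage.used, usage.total)


if __name__ == "__main__":
    if check_path("/"):
        # A real setup would page the on-call engineer instead of printing.
        print("ALERT: disk usage at or above 70%")
```

The threshold sits well below 100% so the alert fires while there is still time to act. In a real answer, name the monitoring system you actually used rather than a script.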