January 10, 2024

7 Incident Response Metrics and How to Use Them

by Jeff Aldorisio

In a world where cybercriminals continuously evolve their threat methodologies, most security professionals believe that it’s no longer a question of “if” an organization will experience a data security event but rather “when” it will happen. As you work to better secure your IT stack, you need to ensure that you establish a robust incident response plan that provides quantitative data. As you look to mature your cybersecurity resiliency, understanding these seven incident response metrics and how to use them can provide you with a way to reduce risk and respond to incidents more efficiently.

1. Mean time to detect (MTTD)

MTTD is the average amount of time your team needs to detect a security incident. To measure MTTD, add up the total amount of time it takes your team to detect incidents during a given period and divide that by the number of incidents. You can use this metric to compare effectiveness across teams or to measure how well your current monitoring controls are working.

For example, if Team A reports 10 incidents in a month and takes a total of 1,000 minutes to detect them, then this is how you calculate their MTTD:

1000/10 = 100 minutes to detect

Meanwhile, Team B might report 8 incidents in a month but take a total of 1,500 minutes to detect them. Their MTTD looks like this:

1500/8 = 187.5 minutes to detect

Based on the MTTD, Team B takes 87.5 minutes longer on average to detect a security incident than Team A. Using this metric, you can look for ways to reduce the MTTD so that malicious actors spend less time in your systems.
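The comparison above can be reproduced in a few lines of Python. This is a minimal sketch rather than a prescribed implementation; the function name `mean_time_to_detect` is made up for illustration, and the inputs are the Team A and Team B figures from the example.

```python
def mean_time_to_detect(total_detection_minutes: float, incident_count: int) -> float:
    """Average detection time: total minutes spent detecting / number of incidents."""
    if incident_count <= 0:
        raise ValueError("incident_count must be a positive number")
    return total_detection_minutes / incident_count


# Figures from the example: Team A and Team B over the same month.
team_a_mttd = mean_time_to_detect(1000, 10)
team_b_mttd = mean_time_to_detect(1500, 8)

# How much longer Team B takes, on average, to detect an incident.
gap_minutes = team_b_mttd - team_a_mttd

print(f"Team A MTTD: {team_a_mttd} minutes")
print(f"Team B MTTD: {team_b_mttd} minutes")
print(f"Difference:  {gap_minutes} minutes")
```

Guarding against a zero incident count matters in practice, since a quiet month would otherwise cause a division error.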

2. Mean time to acknowledge (MTTA)

MTTA measures the amount of time between a system generating an alert and a member of your IT staff responding to the alert. While MTTD focuses on how long it takes to detect or get alerted to an incident, MTTA focuses on how long it takes to notice and start working on the problem.

The higher the MTTA, the longer it takes your IT staff to acknowledge an incident report. You can use this metric to track how well your security team is prioritizing alerts. If your team is unable to prioritize high-risk alerts, then it can take them longer to start working on remediating the risk. A lower MTTA means that your team is responding rapidly to security alerts, showing that they are prioritizing them well.
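If your alerting tool records when each alert was raised and when someone acknowledged it, you can compute MTTA directly from those timestamps. The sketch below assumes you can export the data as pairs of timestamps; the function name and the sample alerts are hypothetical and only illustrate the averaging.

```python
from datetime import datetime


def mean_time_to_acknowledge(alerts):
    """Average minutes between an alert being raised and being acknowledged.

    `alerts` is a list of (raised_at, acknowledged_at) datetime pairs.
    """
    if not alerts:
        raise ValueError("at least one alert is required")
    total_seconds = sum((acked - raised).total_seconds() for raised, acked in alerts)
    return total_seconds / len(alerts) / 60


# Hypothetical sample data: three alerts acknowledged after 12, 8, and 40 minutes.
sample_alerts = [
    (datetime(2024, 1, 10, 9, 0), datetime(2024, 1, 10, 9, 12)),
    (datetime(2024, 1, 10, 13, 30), datetime(2024, 1, 10, 13, 38)),
    (datetime(2024, 1, 11, 2, 15), datetime(2024, 1, 11, 2, 55)),
]

mtta_minutes = mean_time_to_acknowledge(sample_alerts)
print(f"MTTA: {mtta_minutes} minutes")
```

In this sample, the overnight alert takes far longer to acknowledge than the daytime ones, which is the kind of pattern a rising MTTA can prompt you to look for.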

3. Mean time to recovery (MTTR)

MTTR is the amount of time it takes your staff to get an affected system back up and running again. MTTR gives you insight into how rapidly your incident response team can get you and any impacted customers back to normal.

To calculate MTTR, take the sum of downtime for a given period and divide it by the number of incidents. For example, if you had a total of 20 minutes of downtime caused by 2 different events over a period of two days, your MTTR looks like this:

20/2 = 10 minutes

Now, if the downtime was 40 minutes total for the 2 incidents, your MTTR looks like this:

40/2 = 20 minutes

Since the second MTTR is twice as large, you can tell that recovery is taking longer and that something in your incident response processes needs attention. You can’t tell what the problem is from the metric alone, but you at least know where to focus further investigation.
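The same calculation can be sketched in Python. The totals (20 and 40 minutes across 2 incidents) come from the example; how the downtime splits between the individual incidents is an assumption made here for illustration, as is the function name.

```python
def mean_time_to_recovery(downtime_minutes):
    """Average downtime per incident: sum of downtime / number of incidents."""
    if not downtime_minutes:
        raise ValueError("at least one incident is required")
    return sum(downtime_minutes) / len(downtime_minutes)


# Assumed per-incident splits that add up to the totals in the example.
first_period = [12, 8]     # 20 minutes of downtime in total
second_period = [25, 15]   # 40 minutes of downtime in total

first_mttr = mean_time_to_recovery(first_period)
second_mttr = mean_time_to_recovery(second_period)

print(f"First period MTTR:  {first_mttr} minutes")
print(f"Second period MTTR: {second_mttr} minutes")
```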

4. Mean time to contain (MTTC)

MTTC brings together MTTD, MTTA, and MTTR for a more holistic look at how well your organization responds to incidents. MTTC focuses on how long your incident response team takes to detect an incident, acknowledge the incident, and effectively prevent a cybercriminal from doing more harm.

To calculate MTTC, take the sum of the hours spent detecting, acknowledging, and resolving an alert, and divide it by the number of incidents.

For example, if your organization experienced 2 incidents that each took 3 hours to detect, 2 hours to acknowledge, and 5 hours to resolve, your MTTC would look like this:

(3+3+2+2+5+5)/2 = 20/2 = 10 hours

Many consider MTTC one of the most important incident response metrics because it gives a holistic look at how your team works together. If the MTTC is high, then you want to start drilling down into which area (detection, acknowledgment, or recovery) is the weakest link.
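To show both the overall MTTC and the drill-down into individual phases, here is a minimal Python sketch. The `Incident` class, the function names, and the sample hours are all hypothetical; the sample incidents are deliberately different from the figures in the example above.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    detect_hours: float
    acknowledge_hours: float
    resolve_hours: float


def mean_time_to_contain(incidents):
    """Sum of detect + acknowledge + resolve hours, divided by the number of incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum(i.detect_hours + i.acknowledge_hours + i.resolve_hours for i in incidents)
    return total / len(incidents)


def phase_averages(incidents):
    """Average hours per phase, used to find the weakest link when MTTC is high."""
    count = len(incidents)
    return {
        "detection": sum(i.detect_hours for i in incidents) / count,
        "acknowledgment": sum(i.acknowledge_hours for i in incidents) / count,
        "recovery": sum(i.resolve_hours for i in incidents) / count,
    }


# Hypothetical incidents for illustration only.
incidents = [
    Incident(detect_hours=4, acknowledge_hours=1, resolve_hours=6),
    Incident(detect_hours=2, acknowledge_hours=1, resolve_hours=4),
]

mttc_hours = mean_time_to_contain(incidents)
averages = phase_averages(incidents)
slowest_phase = max(averages, key=averages.get)

print(f"MTTC: {mttc_hours} hours")
print(f"Slowest phase: {slowest_phase} ({averages[slowest_phase]} hours on average)")
```

Keeping the three phases as separate fields, rather than storing only a total per incident, is what makes the drill-down possible.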

5. System availability

Your interconnected IT ecosystem also includes vendors whose incident response programs need monitoring. Since you’re not “on the ground” with your vendors, you need to find ways to measure third-party processes.

System availability gives you insight into how well your cloud service providers are managing their products. This metric focuses less on how well your organization manages incidents and more on how your vendors manage incidents.

Often, a security incident such as a Distributed Denial of Service (DDoS) attack takes cloud services offline. The system availability metric gives you a way to measure the reliability of your service provider as well as visibility into how rapidly they contain an incident. The higher the system availability metric, the more reliable the service is. For example, a service with a system availability of 90% is more reliable than a service with 80% availability.
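The article does not spell out a formula, but availability is commonly expressed as the percentage of a period during which a service was up. The sketch below assumes that definition; the function name and the downtime figure are hypothetical.

```python
def system_availability_pct(total_minutes: float, downtime_minutes: float) -> float:
    """Percentage of the period the service was up (assumed definition)."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return 100 * (total_minutes - downtime_minutes) / total_minutes


# Hypothetical vendor: 432 minutes of downtime in a 30-day month.
minutes_in_month = 30 * 24 * 60
availability = system_availability_pct(minutes_in_month, 432)

print(f"Availability: {availability:.2f}%")
```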

6. Service level agreement (SLA) compliance

Another way to measure third-party incident response maturity is to compare the clauses in your SLAs with the reality of the service provided. Your SLA might include service levels and expected responses. As part of the SLA, you might be measuring availability and recovery time. You can then compare the actual service availability and actual MTTR to those listed in the SLA. If the vendor is not meeting the requirements, either having availability issues that exceed the maximum allowance or taking too long to resolve an incident, you may want to start looking for another vendor.
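A comparison like this is easy to automate once you have the measured numbers. The following sketch assumes an SLA that specifies a minimum availability and a maximum MTTR; the `SlaTerms` class, the function name, and the thresholds are hypothetical, and a real SLA may define its terms differently.

```python
from dataclasses import dataclass


@dataclass
class SlaTerms:
    min_availability_pct: float  # lowest availability the vendor commits to
    max_mttr_minutes: float      # longest average recovery time the vendor commits to


def sla_breaches(terms: SlaTerms, actual_availability_pct: float, actual_mttr_minutes: float):
    """Return the names of the SLA terms the vendor failed to meet."""
    breaches = []
    if actual_availability_pct < terms.min_availability_pct:
        breaches.append("availability")
    if actual_mttr_minutes > terms.max_mttr_minutes:
        breaches.append("mttr")
    return breaches


# Hypothetical SLA and measured values for one reporting period.
terms = SlaTerms(min_availability_pct=99.9, max_mttr_minutes=60)
breaches = sla_breaches(terms, actual_availability_pct=99.5, actual_mttr_minutes=45)

print("Breached terms:", breaches or "none")
```

Here the vendor recovers quickly enough but falls short on availability, so only that term is flagged.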

7. Mean time between failures (MTBF)

MTBF measures the average time between system failures during normal operations. You can measure MTBF by taking the total number of hours that systems operate during a specified period, then dividing that number by how many failures occurred during the same period.

For example, if your database runs for 5000 hours during the measurement period and experiences 3 failures, your MTBF is:

5000/3 ≈ 1,667 hours between failures

A different application running for the same 5000 hours has 8 failures during the same period. For that application, your MTBF is:

5000/8 = 625 hours

Because the MTBF is lower for the second system, you can see that it fails more often than the first system over the same period. The lower the MTBF, the more maintenance the system needs.
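The two MTBF calculations can be checked with a short Python sketch. The function name is made up for illustration; the operating hours and failure counts are the ones from the examples above.

```python
def mean_time_between_failures(operating_hours: float, failure_count: int) -> float:
    """Operating hours in the period divided by the number of failures."""
    if failure_count <= 0:
        raise ValueError("failure_count must be positive; MTBF is undefined with no failures")
    return operating_hours / failure_count


database_mtbf = mean_time_between_failures(5000, 3)
application_mtbf = mean_time_between_failures(5000, 8)

print(f"Database MTBF:    {database_mtbf:,.0f} hours")
print(f"Application MTBF: {application_mtbf:,.0f} hours")
```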

While MTBF is often used when looking to compare technology life expectancies prior to purchase, you can use MTBF after you purchase the product as an indicator of end of life. The older a system is, the more often it may fail.

When applying this to cybersecurity and incident response, you can treat a low MTBF as a sign that a system is nearing end of life and may be at higher risk of being the weakest link in your IT stack. This can help you find a starting point when engaging in forensic research.

SecurityScorecard enables organizations to measure incident response effectiveness

SecurityScorecard’s security ratings platform continuously monitors your IT ecosystem, providing easy-to-read A-F security ratings. Our platform takes ten categories of risk factors into account, including IP reputation, DNS health, endpoint security, network security, patching cadence, web application security, social engineering, hacker chatter, and information leakage.

Our continuous monitoring tool provides real-time alerts about new risks affecting your ecosystem, prioritizes the alerts based on risk level, and suggests remediation activities. With our platform, you can more effectively respond to security incidents or potential incidents and support your strategies with traditional incident response metrics.
