June 7, 2021

KPIs for Security Operations & Incident Response

Creating a resilient cybersecurity program means understanding your current security posture and knowing what you want your future cybersecurity posture to look like. In today’s constantly changing threat landscape, cybersecurity maturity is the best way to mitigate security incident risks. However, understanding maturity means knowing where you started so that you can evaluate where you end up. With that in mind, senior leadership needs to work with the security team to create key performance indicators (KPIs) for security operations and incident response.

Mean Time to Detect (MTTD)

This metric may be the most important both for measuring your security program’s effectiveness and for mitigating risk overall. MTTD is the average amount of time it takes your security team and technologies to notice abnormal activity that indicates potentially malicious, suspicious, or risky behavior in your ecosystem. The lower the MTTD, the more effective your program is.

As part of measuring this, you want to make sure that you continuously fine-tune the systems that alert your security staff because too many false positives can increase the time it takes to detect a true risk.
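
As a rough illustration of how MTTD might be calculated, the sketch below averages the gap between when suspicious activity began and when an alert was raised. The function name `mean_time_to_detect` and the sample timestamps are hypothetical, and it assumes your tooling can estimate when the activity actually started.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_time_to_detect(incidents):
    """Return the average gap between activity start and detection."""
    gaps = [
        (detected - started).total_seconds()
        for started, detected in incidents
    ]
    return timedelta(seconds=mean(gaps))


# Hypothetical incidents: (activity started, alert raised)
incidents = [
    (datetime(2021, 6, 1, 9, 0), datetime(2021, 6, 1, 13, 0)),    # 4 hours
    (datetime(2021, 6, 2, 22, 0), datetime(2021, 6, 3, 0, 0)),    # 2 hours
    (datetime(2021, 6, 4, 8, 30), datetime(2021, 6, 4, 14, 30)),  # 6 hours
]

print(mean_time_to_detect(incidents))  # 4:00:00
```

Tracking this number before and after you tune your alerting rules is one way to see whether fewer false positives are translating into faster detection.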

Alarm Time to Triage (TTT)

According to one Palo Alto research report, the average security operations team received over 11,000 alerts per day. However, not all risks and alerts are created equal. Your team needs to be able to rapidly prioritize the alerts that indicate the highest risk to your organization’s data. During this stage, the security team sorts alerts into high, medium, and low risk. The faster they can triage and prioritize the alerts, the sooner the incident response team can acknowledge them. All of this ultimately leads to a faster process, reduced risk, and a more resilient program.
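
To make the prioritization step concrete, here is a minimal sketch of a triage queue that orders alerts by severity and then by how long they have been waiting. The `Alert` class, the `SEVERITY_RANK` mapping, and the sample alerts are assumptions made for illustration, not part of any particular product.

```python
from dataclasses import dataclass

# Assumed ordering: a lower number means the alert is handled first.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Alert:
    alert_id: str
    severity: str         # "high", "medium", or "low"
    minutes_waiting: int  # how long the alert has sat in the queue


def triage_queue(alerts):
    """Order alerts by severity, then by longest wait within a severity."""
    return sorted(
        alerts,
        key=lambda a: (SEVERITY_RANK[a.severity], -a.minutes_waiting),
    )


# Hypothetical alerts waiting for triage.
alerts = [
    Alert("a1", "low", 50),
    Alert("a2", "high", 5),
    Alert("a3", "medium", 20),
    Alert("a4", "high", 30),
]

for alert in triage_queue(alerts):
    print(alert.alert_id, alert.severity)
```

In this example the two high-risk alerts come first, with the one that has waited longest at the top, followed by the medium-risk and low-risk alerts.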

Alarm Time to Qualify (TTQ)

The triage process naturally leads to the qualification process, in which your security operations team determines whether an alert qualifies to be moved to the incident response team. In some cases, this KPI overlaps with the mean time to acknowledge (MTTA) because the incident response team can’t start acknowledging the alert and moving the investigation forward until the security operations team qualifies it.

Mean Time to Acknowledge (MTTA)

Related to MTTD, this metric tells you the average amount of time it takes your security operations and incident response team to acknowledge an alert before they begin an investigation. This is another reason you want the MTTD to be low: the better the alerts your team gets, the sooner they can acknowledge real threats. You want to keep the MTTA as low as possible as part of creating KPIs.

Mean Time to Investigate (MTTI)

This metric is the average amount of time it takes the incident response team to investigate an alert after acknowledging it. This is the second most important KPI for your security operations and incident response teams. The longer it takes to investigate an alert, the more time malicious actors have to embed themselves in the organization’s systems. Once malicious actors have found a way to hide in your ecosystem, your team will have a harder time resolving the incident and recovering from it.

Mean Time to Resolve (MTTR)

Once your team acknowledges the threat, they need to resolve it. MTTR focuses on the amount of time it takes the incident response team to get from the investigation to the recovery step. This is another KPI that you want to keep low. The resolution may be that the alert was a false positive, or it may be that the team had to eradicate a threat and recover a system. In either case, the sooner the team can complete all the necessary resolution steps, the stronger your cybersecurity posture is.

Mean Time to Contain (MTTC)

MTTC is the amount of time it takes the security team to locate the threat actors and prevent them from moving further into your systems and networks. For example, containment may be quarantining an email account, resetting a user password, or shutting down a server. Containment is the first step toward recovery. The faster your team contains the threat actor, the more resilient your program is.

Mean Time to Recover (MTTR)

Your teams haven’t completed their security and incident response processes until they have restored the affected system to its pre-incident state. At the end of the day, this is probably the third most important metric because it functionally incorporates the MTTD, MTTI, MTTC, and mean time to resolve. Because it shares the acronym MTTR with mean time to resolve, spell out which one you mean when you report it.
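
Because the response metrics above are all averages of time spent between milestones, they can be derived from the same incident timeline. The sketch below is one way to do that. It assumes every incident records the same five timestamps in the order shown; the milestone names, their order, and the sample data are hypothetical and should be adapted to your own workflow.

```python
from datetime import datetime
from statistics import mean

# Assumed milestone order for one incident; real workflows may differ.
MILESTONES = ["detected", "acknowledged", "investigated", "contained", "recovered"]


def mean_phase_minutes(incidents):
    """Average minutes spent between each pair of consecutive milestones."""
    result = {}
    for start, end in zip(MILESTONES, MILESTONES[1:]):
        durations = [
            (inc[end] - inc[start]).total_seconds() / 60 for inc in incidents
        ]
        result[f"{start}->{end}"] = mean(durations)
    return result


# Hypothetical timestamps for two incidents.
incidents = [
    {
        "detected": datetime(2021, 6, 1, 9, 0),
        "acknowledged": datetime(2021, 6, 1, 9, 10),
        "investigated": datetime(2021, 6, 1, 10, 10),
        "contained": datetime(2021, 6, 1, 10, 40),
        "recovered": datetime(2021, 6, 1, 12, 40),
    },
    {
        "detected": datetime(2021, 6, 3, 14, 0),
        "acknowledged": datetime(2021, 6, 3, 14, 20),
        "investigated": datetime(2021, 6, 3, 16, 20),
        "contained": datetime(2021, 6, 3, 17, 20),
        "recovered": datetime(2021, 6, 3, 21, 20),
    },
]

for phase, minutes in mean_phase_minutes(incidents).items():
    print(f"{phase}: {minutes:.0f} min")
```

With this sample data the averages come out to 15 minutes to acknowledge, 90 to investigate, 45 to contain, and 180 to recover, which makes it easy to see which phase is slowing the overall response.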

Cost per incident

According to the NetDiligence 2020 Claims Study, the average cost per incident was $175,000 for small and medium-sized organizations and $9.2 million for large enterprises. Because not all security incidents are data breaches, it’s important to consider the amount of downtime, resources, and other activities associated with security incidents when calculating this amount. If your organization can show a reduction in costs over time, you can use this as a KPI that aligns with other metrics like MTTR as evidence of your program’s maturity.
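
One simple way to put a number on this is to add up downtime, staff time, and other expenses for each incident and compare the average across reporting periods. The sketch below assumes those three cost components; the `IncidentCost` class, the rates, and the quarterly figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IncidentCost:
    downtime_hours: float
    downtime_cost_per_hour: float  # assumed revenue or productivity loss rate
    responder_hours: float
    responder_hourly_rate: float   # assumed loaded labor rate
    other_costs: float = 0.0       # e.g. forensics, legal, notification

    def total(self):
        return (
            self.downtime_hours * self.downtime_cost_per_hour
            + self.responder_hours * self.responder_hourly_rate
            + self.other_costs
        )


def average_cost(incidents):
    """Average total cost across a list of incidents."""
    return sum(i.total() for i in incidents) / len(incidents)


# Hypothetical figures for two quarters.
q1 = [IncidentCost(4, 2000, 30, 100, 5000), IncidentCost(2, 2000, 10, 100)]
q2 = [IncidentCost(1, 2000, 8, 100), IncidentCost(2, 2000, 12, 100, 1000)]

print(average_cost(q1))  # 10500.0
print(average_cost(q2))  # 4500.0
```

Here the average cost per incident falls from one quarter to the next, which is the kind of trend you would report alongside your response-time metrics.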

Number of incidents per device or host

This KPI gives you visibility into how well your organization is monitoring and mitigating risks. For example, if you have certain devices or hosts that are more likely to experience an incident, then your teams should be considering the reason these are an increased risk. They might need to install security patches or place anti-virus monitoring on the endpoint.
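
If your incident records include the affected host, this KPI can be produced with a simple count. The following sketch assumes a log of (host, incident type) pairs; the host names and incident types are made up for illustration.

```python
from collections import Counter


def incidents_per_host(incident_log):
    """Count incidents per host, most affected first."""
    return Counter(host for host, _ in incident_log).most_common()


# Hypothetical log entries: (host, incident type)
incident_log = [
    ("laptop-014", "malware"),
    ("web-01", "brute force"),
    ("laptop-014", "phishing"),
    ("laptop-014", "malware"),
    ("web-01", "vulnerable package"),
    ("db-02", "misconfiguration"),
]

for host, count in incidents_per_host(incident_log):
    print(host, count)
```

Hosts at the top of the list are the ones to review first for missing patches or endpoint monitoring.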

Mean Time Between Failures (MTBF)

Although generally considered a measure of a system’s reliability, MTBF is the average time between system outages or repairable failures. From a security standpoint, organizations can look at MTBF as a metric for understanding both system reliability and the ability to mitigate data security incident risk, since certain attack types, like a Distributed Denial of Service (DDoS) attack, result in a system outage. As long as the organization can separate software bugs from security incidents, this can be a useful metric.
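
MTBF is commonly calculated as operating time divided by the number of failures. The sketch below applies that formula and lets you filter outages by cause, so security-related outages can be looked at separately from software bugs. The `mtbf_hours` function, the `cause` labels, and the 30-day sample window are assumptions for illustration. As a simplification, the filtered figure subtracts only the downtime of the selected outages.

```python
def mtbf_hours(observation_hours, outages, cause=None):
    """MTBF = operating (up) time divided by the number of failures.

    If `cause` is given, only outages with that cause count as failures.
    """
    selected = [o for o in outages if cause is None or o["cause"] == cause]
    if not selected:
        return None  # no failures observed in the window
    downtime = sum(o["downtime_hours"] for o in selected)
    return (observation_hours - downtime) / len(selected)


# Hypothetical 30-day window (720 hours) with three outages.
outages = [
    {"cause": "software_bug", "downtime_hours": 2},
    {"cause": "security", "downtime_hours": 6},  # e.g. a DDoS attack
    {"cause": "software_bug", "downtime_hours": 4},
]

print(mtbf_hours(720, outages))                    # 236.0 (all failures)
print(mtbf_hours(720, outages, cause="security"))  # 714.0 (security only)
```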

SecurityScorecard: Enabling security operations and incident response teams

SecurityScorecard’s security ratings platform enables security operations and incident response teams by giving them visibility into cybersecurity risks across their hyperconnected ecosystem. Our security ratings give at-a-glance visibility using an A-F rating scale so that teams can more rapidly detect security issues and take proactive risk mitigation steps.

Our security ratings platform continuously monitors your ecosystem across ten categories of risk and prioritizes alerts so that your teams can strengthen security before malicious actors can gain access to your systems. With our prioritized alerts, your organization can reduce the amount of time it takes your security teams to triage and qualify alerts, ultimately leading to a more streamlined, resilient security operations and incident response program.
