Escaping Dante’s SOC Inferno: The Violence of Destructive Metrics
Welcome to our third post in the Dante’s SOC Inferno series, in which we’ll be exploring the horrific acts that can happen with SOC metrics that lead to bad behavior and poor security practices.
SOC analysts are often measured by what are frankly bullsh*t metrics, driven by out-of-date thinking and tools that don’t drive a change of practice. Even worse, bad metrics mean threats aren’t remediated.
If you’re being measured by “number of ticket closures” or “mean time to resolution” or “same-day ticket closure” you’ve probably at least been tempted to focus on closing incidents that are easy to resolve and not related to real threats. Or closing out incidents that weren’t completely investigated, never mind resolved. This type of action clearly does nothing to drive improvements in your organization’s security posture — nor does it make you feel good about your work quality.
All too often the excuse for not changing metrics is “Well, this is what we’ve always measured” — or worse, “Well this is what we report to leadership, and they haven’t asked for anything different.” Or “This is what I used to track in my previous job,” or, “I asked around, and this is what many SOCs are tracking,” or even worse for the vendor community, “These are the only metrics that my tool can give me.”
Poor metrics, and the behaviors they drive, can cause considerable harm. We’re calling this the Seventh Circle of SOC Hell, and we’re going to help you escape it.
As a SOC analyst, here are some B.S. metrics which are hurting both you and your org
B.S. Metric #1: Whack-a-mole security caused by MTTR. Mean time to resolution (MTTR) is supposed to show the time it takes to eradicate a threat once it’s discovered. For some analysts, this means closing out incidents without any real investigation just to drive down their MTTR. For example, it’s faster to take a compromised machine off the network, wipe the data and then redeploy it rather than to identify what caused the system to fail and determine whether other devices are exposed to the same threat. Sure, the MTTR numbers will look good, but your organization’s security posture is worse. This is a BAD metric that leads to BAD behavior.
B.S. Metric #2: Closing more incidents ≠ better security. Juicing up stats is not a winning move. Some analysts under pressure to prove to their bosses that they’re actually working can get pretty creative with metrics. I once discovered an analyst who had juiced up her stats by repeatedly closing the same incident. Worse, the incident she was closing was based on alerts that fired whenever one of her executives simply badged into the facility. Eventually, this type of behavior is going to be found out, so don’t do it!
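One way to make this kind of stat-juicing visible is to compare raw closure events against distinct incidents closed. The sketch below is only an illustration: the closure log, the analyst names and the incident IDs are all made up, and a real ticketing system will expose this data differently.

```python
from collections import Counter

# Hypothetical closure log of (analyst, incident_id) pairs.
# The names and IDs are invented for illustration only.
closures = [
    ("analyst_a", "INC-101"),
    ("analyst_a", "INC-101"),
    ("analyst_a", "INC-101"),
    ("analyst_b", "INC-102"),
    ("analyst_b", "INC-103"),
]


def closure_counts(events):
    """Return {analyst: (raw closure events, distinct incidents closed)}."""
    raw = Counter(analyst for analyst, _ in events)
    # set() drops repeated closures of the same incident by the same analyst.
    distinct = Counter(analyst for analyst, _ in set(events))
    return {analyst: (raw[analyst], distinct[analyst]) for analyst in raw}


counts = closure_counts(closures)
for analyst, (raw, distinct) in sorted(counts.items()):
    print(f"{analyst}: {raw} closures, {distinct} distinct incidents")
```

A large gap between the two numbers does not prove anyone is gaming the system, since incidents can legitimately reopen, but it is a prompt to look closer.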
B.S. Metric #3: Metrics used in isolation are dangerous. Another common mistake is using metrics without any context. For example, dwell time measures the length of time a threat actor goes undetected in an environment. Typically, the lower the number, the better. But focusing only on operational metrics for some aspects of the SOC, such as dwell time or MTTR, will paint a partial and often misleading picture of the SOC’s efficiency for the organization.
B.S. Metric #4: Counting SIEM data feeds creates a false sense of security. It may seem like measuring the number of data feeds into a SIEM is a good idea, and the more feeds, the better. But more data doesn’t mean better protection. More information can often lead to more false positives, especially if you’re heavily reliant on correlation rules.
B.S. Metric #5: True positive rates aren’t always better. Some security leaders incorrectly believe that reporting high true positive rates is good enough. But the outcome that matters for any alert is the set of positive steps taken to eliminate the threat, not just a metric that says you’ve correctly identified a threat. And true positives are only one side of the coin. What about false negatives?
Escaping the Bullsh*t Metrics Circle of SOC Hell
Here are some tips for better metrics:
Tip #1: Share operations and impact-focused metrics. Metrics should show how you’ve uncovered problems and taken actions that had a positive impact on the business. A less mature SOC views the R in MTTR as the initial response. A more mature SOC treats the R as final remediation, which includes the investigation, root cause analysis and final resolution. Think about your day-to-day: would you rather be truly resolving a security incident, or closing incidents out quickly just to keep your stats up?
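To see how much the meaning of R changes the number, here is a minimal sketch that computes MTTR both ways from the same incidents. The records, field names and timestamps are hypothetical and are not taken from any particular tool.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names and timestamps are assumptions.
incidents = [
    {
        "detected": datetime(2024, 1, 1, 9, 0),
        "first_response": datetime(2024, 1, 1, 9, 30),
        "remediated": datetime(2024, 1, 2, 9, 0),
    },
    {
        "detected": datetime(2024, 1, 3, 12, 0),
        "first_response": datetime(2024, 1, 3, 12, 30),
        "remediated": datetime(2024, 1, 3, 20, 0),
    },
]


def mean_hours(records, end_field):
    """Mean hours from detection to the given milestone."""
    return mean(
        (r[end_field] - r["detected"]).total_seconds() / 3600 for r in records
    )


mttr_response = mean_hours(incidents, "first_response")  # R = initial response
mttr_remediation = mean_hours(incidents, "remediated")   # R = final remediation

print(f"MTTR (response):    {mttr_response:.1f} h")
print(f"MTTR (remediation): {mttr_remediation:.1f} h")
```

With this sample data the response-based figure is half an hour and the remediation-based figure is 16 hours, so it pays to state which definition a report uses.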
Tip #2: Tie SOC metrics to overall risk management. Ideally, your SOC metrics should ladder up to your company’s risk management framework. That framework should identify systems that are critical to ensure the ongoing operations of the business. For example, let’s say your company conducts some of its business online. Your SOC effectiveness metrics could tell the story of how your company was exposed to a specific threat that could have caused up to $10M in damages by exploiting a vulnerability in your online business application. However, based on the actions taken by the SOC team, the company only suffered a minor loss of $250K due to the time it took to patch and remediate the vulnerability.
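The arithmetic behind that story is simple enough to sketch. The figures below are the ones from the example above, and the $10M is treated as an upper-bound estimate rather than a measured loss; the variable names are illustrative.

```python
# Figures from the example above; the potential loss is an upper-bound estimate.
potential_loss = 10_000_000  # "up to $10M in damages"
actual_loss = 250_000        # loss incurred while patching and remediating

loss_avoided = potential_loss - actual_loss
reduction_pct = 100 * loss_avoided / potential_loss

print(f"Loss avoided: ${loss_avoided:,}")
print(f"Reduction vs. worst case: {reduction_pct:.1f}%")
```

Because the starting figure is a worst case, the result is best presented as "up to" the computed amount rather than as a precise saving.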
Tip #3: Use metrics to demonstrate your value to the business. Executives and business leaders aren’t impressed by the number of security alerts you responded to in an hour. Instead, they want to hear how your efforts ensured the adoption of a new capability for one of your company’s subsidiaries, or how you helped your sales team close a deal by being involved with the client evaluation or how you helped fulfill customer requests.
Tip #4: Use trend data to spot patterns. Rather than relying on point-in-time metrics like the number of incidents closed in a month, start collecting trend data. For example, by tracking incidents over a period of time, you may be able to spot a type of incident that keeps recurring and could be a good candidate for automation.
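As a rough sketch of what collecting trend data could look like, the snippet below counts incidents per type per month and flags types that show up month after month. The incident list, the type labels and the three-month threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical (month, incident_type) records; labels are invented.
incidents = [
    ("2024-01", "phishing"), ("2024-01", "phishing"), ("2024-01", "malware"),
    ("2024-02", "phishing"), ("2024-02", "lost_badge"),
    ("2024-03", "phishing"), ("2024-03", "phishing"), ("2024-03", "malware"),
]


def monthly_counts(records):
    """Return {incident_type: {month: count}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for month, kind in records:
        counts[kind][month] += 1
    return counts


def automation_candidates(records, min_months=3):
    """Incident types seen in at least `min_months` distinct months."""
    counts = monthly_counts(records)
    return sorted(
        kind for kind, by_month in counts.items() if len(by_month) >= min_months
    )


candidates = automation_candidates(incidents)
print(candidates)
```

Here only phishing appears in all three months, so it would be the first type to review for automation. The threshold is arbitrary and should be tuned to your own incident volume.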
Tip #5: Track mean time to answer (MTTA). This is something we talk to organizations about a lot. Say someone comes to you with a question such as “I read about <pick your threat/attack/breach of choice> in the <pick your publication of choice>. Are we impacted or at risk?” How long does it take you to answer that question? Can you answer that question?
Think about how many questions are also asked every day as part of normal response activities. Let’s take the example of a malware infection. Organizations should consider understanding MTTA by figuring out how long it takes to answer the following questions:
- What triggered the malware?
- Was AV running? If not, why not?
- What are the malware components?
- Has the malware been seen before, inside or outside the network?
- Were there any abnormal network activities after the infection?
- Were there any IOCs reported by other security tools?
- Was there any evidence of the endpoint being changed (Registry, Scheduled Tasks, etc.)?
- Where did the malware come from (web, USB, email)?
- Was any data taken?
This is not an all-encompassing list but should get you started by measuring the time it takes to answer each of these questions. Understanding these metrics will help reduce MTTR (remediate) and allow SOC management to understand which questions take the most time to answer. Once we know what is taking our team the longest, we can evaluate if we are asking the right questions. We can also determine if there are opportunities to bring automation in to assist with answering these questions and driving overall MTTA down.
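Measuring the time to answer each question can start as something very simple. The sketch below assumes you log, per investigation, how many minutes each question took; the log entries and timings are invented, and the questions are a subset of the list above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log of (question, minutes to answer) across investigations.
answer_log = [
    ("What triggered the malware?", 20),
    ("What triggered the malware?", 40),
    ("Was any data taken?", 180),
    ("Was any data taken?", 240),
    ("Was AV running?", 5),
    ("Was AV running?", 15),
]


def mtta_by_question(log):
    """Return (question, mean minutes) pairs, slowest question first."""
    times = defaultdict(list)
    for question, minutes in log:
        times[question].append(minutes)
    averages = {q: mean(values) for q, values in times.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


ranking = mtta_by_question(answer_log)
for question, minutes in ranking:
    print(f"{minutes:6.1f} min  {question}")
```

Sorting slowest-first puts the questions that are most worth reviewing, or automating, at the top of the list.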
A SOC should never fall into the trap of sharing metrics for the sake of reporting. And analyst friends: definitely (obviously!) don’t try to game the system, even if it seems to allow or even reward you for doing so. Instead, show your boss and your boss’ boss that you know what you’re doing and how you’re making a positive impact. And please do go ahead and use this post to have a discussion with your leadership about implementing changes to how security operations are being measured today, to help them escape the Hell of poor metrics too.
(Note to your boss: Hi! We hope this post is useful to you. And BTW we’re not suggesting you immediately scrap all our metrics and start over, but why not kick off an assessment of whether our metrics drive improvements in our security posture? Find out if the measurements you have in place today are really helping drive operational improvements and are representing the brilliant capabilities and efforts of our team.)
At Exabeam, we’ve helped a lot of SOC teams transform their operational metrics, which in turn creates a healthier environment for analysts to truly thrive and put their awesomeness to the best use.
Harmful metrics should not be your Hell.
Fancy taking a peek into what a malware investigation looks like with Exabeam? Check out this video to learn more.