Hidden SOC Killers: Three Signs Your SOC is Broken
I’m passionate about the development of effective security organizations, especially security operations centers (SOCs). As I’ve built my ethos over the years, I’ve found that looking closely at organizational beliefs, behaviors, industry norms, and prior decisions is critical. Leading with vision, with emotional intelligence (being more than a SOC technician), and with an understanding of the big picture is required.
I’ve also learned to recognize the warning signs of what could end up killing your SOC. Here are three that I’ve seen a lot recently:
The investigative failures of using Notepad
Yup, Notepad. With all the money spent on information security departments, it’s interesting how often Notepad turns out to be the most widely used application in the SOC. Why?
Observe how often analysts use Notepad to copy and paste details from their SIEM and numerous point solutions to manually reconstruct what happened before, during, and after an event, otherwise known as manual timeline creation.
It’s a symptom of two larger issues:
- There’s no single investigative pane of glass. Why? The investigative process is spread across too many point solutions.
- The full scope of an incident is often never known. Why? The investigations can’t be closed because you’re unable to see the complete security story.
The result is error-prone investigations, analytic inconsistencies, and exhausted analysts.
Figure 1: An example of an investigation using Notepad
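To make the alternative concrete, here is a minimal Python sketch of what automated timeline creation could look like, assuming each tool’s events can be exported into a common record with a timestamp, source, host, and description. The `Event` type, the `build_timeline` and `phase` functions, and the sample hosts are hypothetical and do not reflect the API of any particular SIEM.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized event exported from any security tool."""
    timestamp: datetime
    source: str  # e.g. "siem" or "edr"; names are illustrative only
    host: str
    detail: str


def build_timeline(event_lists, host, pivot, window=timedelta(hours=1)):
    """Merge one host's events from several tools into a single ordered list
    covering `window` before and after the pivot (alert) time."""
    start, end = pivot - window, pivot + window
    merged = [
        e
        for events in event_lists
        for e in events
        if e.host == host and start <= e.timestamp <= end
    ]
    return sorted(merged, key=lambda e: e.timestamp)


def phase(event, pivot):
    """Label an event as before, during, or after the pivot time."""
    if event.timestamp < pivot:
        return "before"
    if event.timestamp > pivot:
        return "after"
    return "during"


# Usage with made-up sample data from two tools.
pivot = datetime(2024, 1, 1, 12, 0)
siem = [
    Event(datetime(2024, 1, 1, 11, 50), "siem", "ws-042", "logon by jdoe"),
    Event(datetime(2024, 1, 1, 12, 0), "siem", "ws-042", "malware alert"),
]
edr = [
    Event(datetime(2024, 1, 1, 12, 5), "edr", "ws-042", "powershell spawned"),
    Event(datetime(2024, 1, 1, 9, 0), "edr", "ws-042", "outside the window"),
    Event(datetime(2024, 1, 1, 12, 1), "edr", "ws-007", "different host"),
]
for e in build_timeline([siem, edr], "ws-042", pivot):
    print(phase(e, pivot), e.timestamp.time(), e.source, e.detail)
```

In practice, normalizing every source into one schema is the hard part; once that exists, the before/during/after view becomes a filter and a sort rather than an afternoon of copy and paste.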
Now, there’s nothing inherently wrong with Notepad, but if it’s your primary investigative tool, expect less-than-favorable outcomes, and be aware that it will lead to analyst burnout. That takes us to our next topic.
Alert fatigue: The analytic failure of having too many high severity alerts
High alerts should be addressed first, right? Maybe, or maybe not.
If you receive 1,000+ alerts a day, how are they prioritized and investigated? What does your investigative process look like? In many cases, trying to respond to mountains of high-severity alerts is the worst way to run a SOC! It’s also a recipe for madness and staff turnover.
If an investigation is triggered by every high alert from your point solutions, you have a problem. It’s even worse if every alert opens a ticket. Too often, this is the practice I see in the field.
This formula creates a whack-a-mole investigative culture. Your staff will end up being so busy closing tickets, they won’t be able to perform a complete investigation. While they’re putting out one fire, they can’t see the full picture of the larger attack.
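One common way to break the one-alert, one-ticket cycle is to correlate alerts into incidents before anyone opens a ticket. The Python sketch below assumes a simple, hypothetical rule: alerts on the same host belong to one incident unless that host has been quiet for longer than a configurable gap. The `Alert` and `Incident` types and `group_alerts` are illustrative names; real correlation logic would also weigh users, IP addresses, and attack techniques.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    timestamp: datetime
    host: str
    name: str


@dataclass
class Incident:
    host: str
    alerts: list = field(default_factory=list)

    @property
    def last_seen(self):
        # Alerts are appended in time order, so the last one is the newest.
        return self.alerts[-1].timestamp


def group_alerts(alerts, gap=timedelta(minutes=30)):
    """Fold alerts into per-host incidents. A host gets a new incident only
    when more than `gap` has passed since its previous alert."""
    by_host = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        incidents = by_host[alert.host]
        if incidents and alert.timestamp - incidents[-1].last_seen <= gap:
            incidents[-1].alerts.append(alert)
        else:
            incidents.append(Incident(alert.host, [alert]))
    return [inc for host_incidents in by_host.values() for inc in host_incidents]


# Usage: five made-up alerts collapse into three incidents (three tickets).
t0 = datetime(2024, 1, 1, 12, 0)
alerts = [
    Alert(t0, "ws-042", "malware detected"),
    Alert(t0 + timedelta(minutes=10), "ws-042", "suspicious powershell"),
    Alert(t0 + timedelta(minutes=35), "ws-042", "outbound beacon"),
    Alert(t0 + timedelta(hours=2), "ws-042", "malware detected"),
    Alert(t0 + timedelta(minutes=5), "ws-007", "malware detected"),
]
for inc in group_alerts(alerts):
    print(inc.host, len(inc.alerts), [a.name for a in inc.alerts])
```

The point is not the specific rule but the order of operations: correlate first, then investigate the incident as a whole.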
Whack-a-mole cultures also tend to neglect good investigative hygiene. I often observe a failure to review which credentials are involved and whether the adversary has moved on and infected other assets. As a result, lateral movement and the low-and-slow attack process (the core of any major breach) aren’t even part of the investigation.
Another favorite is getting a high-severity ticket from a managed security service provider (MSSP) with only a malware alert and an IP address. This is an unenriched event that your staff must enrich manually. I, too, have received unenriched tickets, only to ask these questions:
- What’s the host name and who was logged into the system at that time?
- Why doesn’t the MSSP provide this important information?
- What happened before, during, and after the event in question?
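The first question is often answerable from data many organizations already collect. This minimal Python sketch assumes hypothetical DHCP lease and logon-session records (`Lease`, `Logon`) and shows how an IP-only ticket could be enriched with the host name and the accounts logged in at alert time. The record layouts and the `enrich` helper are illustrative, not any MSSP’s or vendor’s format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Lease:
    """Hypothetical DHCP lease: which host held an IP during a time range."""
    ip: str
    host: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Logon:
    """Hypothetical logon session; end=None means the session is still open."""
    host: str
    user: str
    start: datetime
    end: Optional[datetime]


def host_for_ip(leases, ip, when):
    for lease in leases:
        if lease.ip == ip and lease.start <= when < lease.end:
            return lease.host
    return None  # no lease covers this IP at that time


def users_on_host(logons, host, when):
    return sorted({
        s.user for s in logons
        if s.host == host and s.start <= when and (s.end is None or when < s.end)
    })


def enrich(ticket, leases, logons):
    """Return a copy of the ticket with host name and logged-in users added."""
    host = host_for_ip(leases, ticket["ip"], ticket["time"])
    users = users_on_host(logons, host, ticket["time"]) if host else []
    return {**ticket, "host": host, "users": users}


# Usage with made-up records.
t = datetime(2024, 1, 1, 12, 0)
leases = [Lease("10.0.0.15", "ws-042", datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 20))]
logons = [
    Logon("ws-042", "jdoe", datetime(2024, 1, 1, 9), None),
    Logon("ws-042", "svc_backup", datetime(2024, 1, 1, 1), datetime(2024, 1, 1, 2)),
]
ticket = {"alert": "malware detected", "ip": "10.0.0.15", "time": t}
print(enrich(ticket, leases, logons))
```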
High-severity tickets lacking enrichment are a great way for everyone to miss lunch and dinner, unless you blindly reimage and move along, which leads to the next common failure.
A response failure: Reimaging
When events aren’t enriched and the baseline behavior isn’t known, each investigation ends up taking too much time to do manually, while the clock keeps ticking. With a whack-a-mole culture, analysts are overwhelmed and unhappy, which leads to the only available option: “just reimage it.”
I call this the “ignorant reimage”: too much useful detail will remain forever unknown unless you have help. A system reimage might be part of your response, but it shouldn’t be confused with an investigative procedure unless you can answer these questions:
- What occurred before, during, and after the alert?
- Did this infection or attack jump to other systems?
- What credentials were involved, and were they used on other systems?
Never ignore the impact of compromised credentials. Your SOC must:
- Understand if an account was involved in an incident.
- Have a baseline of normal behavior.
- Perform a complete response procedure.
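Parts of the credential review can be automated. This Python sketch assumes hypothetical authentication records (`Auth`) and a deliberately simple baseline: the set of hosts each account has authenticated to historically. It lists the other hosts an account touched in the 24 hours after an alert and separates the ones that fit the baseline from the ones that don’t. It only looks forward from the alert; a fuller review would also check activity before it, and a real baseline would cover far more than host names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Auth:
    """Hypothetical authentication record: an account logging on to a host."""
    timestamp: datetime
    user: str
    host: str


def build_baseline(history):
    """Map each account to the set of hosts it has authenticated to before."""
    baseline = defaultdict(set)
    for a in history:
        baseline[a.user].add(a.host)
    return baseline


def credential_spread(auths, user, alert_host, alert_time, baseline,
                      window=timedelta(hours=24)):
    """Hosts (other than the alerting host) that `user` authenticated to within
    `window` after the alert, split into (in baseline, outside baseline)."""
    later = {
        a.host for a in auths
        if a.user == user and a.host != alert_host
        and alert_time <= a.timestamp <= alert_time + window
    }
    known = baseline.get(user, set())
    return sorted(later & known), sorted(later - known)


# Usage with made-up data.
history = [
    Auth(datetime(2023, 12, 1, 9), "jdoe", "ws-042"),
    Auth(datetime(2023, 12, 2, 9), "jdoe", "fileserver-01"),
]
auths = [
    Auth(datetime(2024, 1, 1, 12, 10), "jdoe", "fileserver-01"),
    Auth(datetime(2024, 1, 1, 12, 20), "jdoe", "dc-01"),
    Auth(datetime(2024, 1, 1, 11, 0), "jdoe", "hr-db"),  # before the alert
    Auth(datetime(2024, 1, 1, 12, 30), "asmith", "dc-01"),  # different account
]
alert_time = datetime(2024, 1, 1, 12, 0)
baseline = build_baseline(history)
normal, unusual = credential_spread(auths, "jdoe", "ws-042", alert_time, baseline)
print("expected:", normal, "investigate:", unusual)
```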
Until you understand the entire picture, you’ll never be able to respond effectively.
Leadership and the costs of creating friction on your team
I believe your job is to make your staff happier—and more expensive. By this, I mean your team becomes increasingly valuable as you develop and invest in their skills over time.
Allowing operationally wasteful processes to continue day after day, causing friction and burning through unnecessary analyst cycles, doesn’t make anyone valuable. Ignoring the long-term human effects of too much information, too much wasted time, too many alerts, and too little context only serves to stunt the growth of your staff—and your career as a leader.
Listen to your staff’s complaints. Here are some common ones:
- “The log query took four hours. I didn’t get to bed last night until 3:00 am.”
- “We don’t have logs for that, so I don’t know which system is infected.”
- “The ticket from the MSSP is junk and not helpful.”
- “The second shift didn’t put the machine name in the ticket.”
- “The new guy didn’t follow our process.”
As a leader, it’s your job to reach for the aspirational. Create capabilities that differentiate your security program from those of your industry peers, but first, reconsider the high-friction, manual, and incomplete processes in your team’s operations.