The size of hard drives, logs, and other data sources has grown immensely in the past few years. I’ve held many different roles within the DFIR (digital forensics and incident response) space, including SOC analyst, incident responder, and forensic examiner, and this massive increase in available data poses challenges in all of those areas. Fully combing through a multi-terabyte hard drive takes far longer than it did for the smaller drives of years past. Intrusion investigations can rapidly balloon from one computer to many as attackers grow more sophisticated and move around their victims’ environments; many intrusion or breach investigations now span dozens (or even hundreds) of devices. Companies are also getting better at logging, both by collecting from more sources and by logging more verbosely.
Much of this increase is great from an investigator’s perspective; more data means we can potentially answer more of our questions. However, this deluge of data is a double-edged sword. Investigators increasingly have to find ways to limit the data they are looking at to keep it manageable. For example, creating a “super timeline” is a common technique when doing a detailed investigation of an endpoint, but these timelines can now become so voluminous that a whole array of methods has popped up to reduce the number of entries and home in on data that is actually relevant to the investigation. Likewise, incident responders sometimes have no choice but to grab “triage” information (a handful of artifacts with the most “bang for your buck”) from hosts, because doing full acquisitions takes too long and produces too much data to review in the time available. SOC analysts are often drowning in logs and alerts, and those logs are frequently scattered across a wide range of locations and formats, which can make answering seemingly simple questions very time consuming.
My personal approach to tackling this challenge as an investigator has been a combination of scripting, data science, and experience. On many engagements I created Python scripts to parse artifacts or otherwise transform data from one form to another. I used various techniques to try to find trends or spot anomalies in the data. I spent hours poring over logs, searching for what didn’t belong or for what my experience had taught me “didn’t feel right”. I would start with one source of data and use any leads I found to pivot into another data set to answer my questions. The whole process can become very convoluted, with lots of twists and turns, dead ends, and “Eureka!” moments, which is why the field can be so exciting and frustrating at the same time. I think this contributes to the feeling that DFIR is both an art and a science, as a combination of the two can seem essential to closing a case.
I increasingly came to realize that the DFIR industry needs to embrace more methodical ways of spotting outliers. In many respects, an investigation can be distilled down to spotting what doesn’t belong. Data science is a field that is growing every day, as a wide range of industries realize that the vast amounts of data available to them are too valuable not to use to the fullest. DFIR is no different. I read a few data science books and took some classes, and the more I learned, the more convinced I became that applying these kinds of techniques would be essential for anyone hoping to be successful in this field.
Until a few months ago, I didn’t know the user and entity behavior analytics (UEBA) field even existed, but once I learned about Exabeam and what their product does, I was intrigued. Some companies I had worked with previously struggled to create general baselines for their environments; Exabeam creates individual baselines for each user. During past IR engagements, we would often find something that looked wrong or unusual and call our client, just to be told it was expected behavior (sometimes this was true, and sometimes our contact was mistaken). Exabeam can answer the question of what is expected and what is anomalous with actual data, without being influenced or colored by opinion. This pushes DFIR more toward science, and in my opinion that is a great thing. Scientific, fact-based approaches can be automated and sped up, while art takes both time and experience. Speed is essential both for wading through oceans of data and for catching threats before they can spread. Unifying different log sources and formats, compiling statistics on typical behaviors, and performing calculations to determine what is “abnormal” are all things computers can do very well and very fast. I saw in Exabeam a fusion of data science, automation, and security knowledge that could greatly speed up investigations and identify threats that would otherwise be missed. I saw a great product that could push the evolution of DFIR forward, and I wanted to add my knowledge and experience to make it even better. I wanted to be part of the solution.
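To make that last idea concrete, here is a minimal sketch of statistical baselining and outlier detection. This is just an illustration of the general technique, not Exabeam’s actual method, and the per-user daily login counts are invented for the example:

```python
import statistics

# Invented baseline: daily login counts for one user over ten days
# (standing in for any per-user behavioral metric)
baseline = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]

mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)

def is_anomalous(observation, threshold=3.0):
    """Flag values more than `threshold` standard deviations from this
    user's own mean -- a simple z-score test against their baseline."""
    z = abs(observation - mean) / stdev
    return z > threshold

print(is_anomalous(4))   # a typical day -> False
print(is_anomalous(40))  # a sudden burst of logins -> True
```

The key point is that the threshold is computed from each user’s own history rather than a single environment-wide baseline, so the same observation can be normal for one user and anomalous for another.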
And that is why I joined Exabeam.
(Ryan Benson, a Senior Threat Researcher at Exabeam, previously worked as a Manager on the Digital Forensics team at Stroz Friedberg, and as an Incident Analyst at Mandiant. He is a GIAC Certified Forensic Analyst and Certified Incident Handler.)