Data Science and Security Research: Two Parts of the Whole

Published
January 26, 2016

My friend and Exabeam’s Chief Data Scientist, Derek Lin, previously wrote a blog post about the right and wrong ways for data science and security operations teams to interact. In this post, I would like to expand on that idea and discuss the nature of the two disciplines, their complementary aspects, and why each is indispensable for meaningful security analytics.

Data Science

Data science has done wonders in many vertical domains, from retail to marketing to biology. It is only natural to try to apply it to the security domain, especially now that traditional rule-based approaches have reached their limits. However, the security domain has some unique aspects that make it harder for data science to be applied directly.

A major difference between security and other domains is that elsewhere the result required from data science is usually well-defined, e.g. the presence or absence of a disease, or the list of movies a user is likely to watch. In security analytics, however, the required output is the identification of “malicious behavior.” This definition is subjective and much vaguer. Analyzing results is also difficult, as there is usually more than one way to explain an unusual behavior.

Another challenge is the wide gap between the raw security data and the data that algorithms can successfully consume. My favorite example is that a simple logon in a Windows environment produces at least four different log events of different types. Without an understanding of the intricacies of the Kerberos protocol, how it is implemented in Windows, and how it interacts with the security logs, these events will be treated independently, when in fact they are all part of a single transaction.
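To make the logon example concrete, here is a minimal Python sketch of collapsing several related log events into one logical transaction before any algorithm sees them. Everything in it is an assumption for illustration: the `LogEvent` record, the `group_logon_transactions` helper, the five-second window, and the choice of Windows Security event IDs (4768, 4769, 4624 and 4672) are not taken from the original discussion, and a real implementation would correlate on richer fields than user name and time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative set of Windows Security event IDs that can accompany a logon
# (assumption: chosen for this sketch, not an exhaustive or authoritative list).
LOGON_EVENT_IDS = {4768, 4769, 4624, 4672}


@dataclass
class LogEvent:
    """Hypothetical, heavily simplified log record."""
    timestamp: datetime
    event_id: int
    user: str


def group_logon_transactions(events, window=timedelta(seconds=5)):
    """Merge logon-related events for the same user into one transaction
    when each event follows the previous one within `window`."""
    transactions = []
    open_tx = {}  # user -> transaction currently being built for that user
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.event_id not in LOGON_EVENT_IDS:
            continue  # not part of a logon, ignore in this sketch
        tx = open_tx.get(ev.user)
        if tx is not None and ev.timestamp - tx[-1].timestamp <= window:
            tx.append(ev)  # same user, close in time: same transaction
        else:
            tx = [ev]  # start a new transaction for this user
            open_tx[ev.user] = tx
            transactions.append(tx)
    return transactions


t0 = datetime(2016, 1, 26, 9, 0, 0)
events = [
    LogEvent(t0, 4768, "alice"),
    LogEvent(t0 + timedelta(seconds=1), 4769, "alice"),
    LogEvent(t0 + timedelta(seconds=1), 4624, "alice"),
    LogEvent(t0 + timedelta(seconds=2), 4672, "alice"),
    LogEvent(t0 + timedelta(minutes=30), 4624, "alice"),
    LogEvent(t0 + timedelta(seconds=1), 4624, "bob"),
]
transactions = group_logon_transactions(events)
for tx in transactions:
    print(tx[0].user, [ev.event_id for ev in tx])
```

In this toy data, the four events Alice generates within two seconds become a single transaction, while her logon half an hour later and Bob's logon each stand alone. A downstream model then counts three logons rather than six unrelated events.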

Trying to digest raw security data directly will almost surely produce results that are useless, albeit statistically significant.

Security Research

Security research can fill many of the gaps mentioned above. Good security researchers will not only understand the meaning of the logs, they will also know how to extract information that is concealed in them or present only indirectly. They will understand things like the difference between the logging mechanisms of domain controllers and member servers, and whether an event is likely to appear often or seldom. They will know the meaning of hundreds of event types and sub-types, and which events are must-haves and which can be safely ignored.

However, when it comes to identifying servers or users that are frequently observed together, or applying collective inference to decisions, most security researchers will have a harder time. In most cases, security researchers will not be familiar with concepts such as Probabilistic Graphical Models, Bayesian Networks and Collaborative Filtering, or with the problems they can solve. This is where the security domain ends and the data science domain starts again.
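As a small illustration of the "frequently observed together" problem, the Python sketch below scores pairs of users by how much the sets of servers they access overlap, using Jaccard similarity. This is a deliberately simple co-occurrence measure, not a full collaborative filtering or graphical model, and the `user_similarity` helper, the `(user, server)` input format, and the sample data are all assumptions made up for this example.

```python
from collections import defaultdict
from itertools import combinations


def user_similarity(access_log):
    """Return the Jaccard similarity of accessed-server sets for every
    pair of users. `access_log` is an iterable of (user, server) pairs."""
    servers_by_user = defaultdict(set)
    for user, server in access_log:
        servers_by_user[user].add(server)

    scores = {}
    for a, b in combinations(sorted(servers_by_user), 2):
        shared = servers_by_user[a] & servers_by_user[b]
        union = servers_by_user[a] | servers_by_user[b]
        # union is never empty: every user here has at least one server
        scores[(a, b)] = len(shared) / len(union)
    return scores


# Hypothetical access records, already cleaned up by security research.
access_log = [
    ("alice", "db01"), ("alice", "web01"),
    ("bob", "db01"), ("bob", "web01"),
    ("carol", "hr01"), ("carol", "db01"),
]
scores = user_similarity(access_log)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 2))
```

Here Alice and Bob access exactly the same servers and score 1.0, while Carol shares only one of three servers with each of them. Scores like these can serve as a starting point for peer grouping, where a user's behavior is judged against that of similar users.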

Bottom Line

As you can see, there is a symbiotic relationship between data science and security research. Neither will be fully effective without the other. Both good security research and top-notch data science are needed to provide meaningful security analytics. Each practitioner will have their own strengths and shortcomings, but bringing together the right mix of both will create the complete solution that is so desperately needed today.

Tags: Data Science
