A big topic at recent security conferences has been the use of user behavior analytics (UBA) to assess cyber security risk. This approach is enabled by the recent application of data science and data modeling. However, any data science has to be supported by a platform designed from the ground up to enable this effort. In this blog, I’ll share thoughts on the areas where data science is most effective for UBA, and how data science is best supported by a data platform capable of tracking the states of entities (user and machine accounts) to maximize informational value. In addition, with a stateful, context-based tracking platform, security analysts can perform alert triage and incident response effectively and efficiently.
Data science for UBA
As Gartner coined the term, user and entity behavior analytics (UEBA or UBA) is about signature-less, user-centric, behavior-based anomaly detection. We can appreciate the idea of modeling users’ behaviors with data science tools in order to find deviations from the norm as anomalies. Conceptually, that’s all well and fine. In fact, a wealth of machine learning and statistical profiling techniques exists. But naïve or simplistic applications of out-of-the-box generic anomaly detection algorithms to find anomalous events in user credential activities are prone to false-positive alerts. As a result, false positives end up wasting an analyst’s precious time and resources on fruitless investigations.
UBA generally falls in the realm of unsupervised learning, where there is little or no labeled training data to guide the learning. As such, the key to detecting anomalies is to understand their context. Every user’s anomaly needs to be evaluated in its proper context. Is the user’s current anomalous event also anomalous for a peer group of other users who behaved similarly in the past? If yes, perhaps we should increase the risk score. Can a user’s access to a large number of assets be explained by the context that she is likely an IT administrator? If so, maybe we should suppress the alert. A simple-minded approach of building one generic behavior classifier for all users will not work.
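To make the peer-group question concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function name contextual_risk, the shape of the history data, and the scoring formula are hypothetical and are not taken from any particular product. It treats a first-time asset access as the raw anomaly, then damps or amplifies the score depending on how many peers have accessed the same asset before.

```python
from collections import Counter

def contextual_risk(user, asset, history, peers, base_score=1.0):
    """Score a user's access to an asset using peer-group context.

    history: dict mapping user -> Counter of assets accessed in the past
    peers:   list of users in the same (hypothetical) peer group
    The scoring formula is an illustrative assumption, not a standard.
    """
    # If the user has accessed this asset before, it is not anomalous.
    if history.get(user, Counter())[asset] > 0:
        return 0.0
    others = [p for p in peers if p != user]
    if not others:
        # No peer context available: fall back to the raw score.
        return base_score
    # Fraction of peers who have accessed the same asset before.
    seen = sum(1 for p in others if history.get(p, Counter())[asset] > 0)
    peer_rate = seen / len(others)
    # Common among peers -> damp the score; unseen among peers -> amplify.
    return base_score * (2.0 - 2.0 * peer_rate)
```

With this sketch, an access that every peer has made before scores 0.0, while an access that no peer has ever made scores twice the base value. A real system would learn the peer groups and the weighting from data rather than hard-code them.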
Whether factual or probabilistic, contextual information is important for improving the precision of an anomaly detection system.
The problem is that accurate context information is not always available. In security analytics, a common source of context information is directory data accessed through the Lightweight Directory Access Protocol (LDAP), commonly used for network user validation. Unfortunately, the manually entered user, asset, and group data are not always accurate, up to date, or sufficient for our purpose. But the need to deal with noisy or imperfect information is where data science shines. By mining historical log data, we can gain insights to improve, infer, or create new contextual information. For example, given a host machine, one could mine its past user logon events to determine its function in the infrastructure, based on behavior metrics including the volume of log events. Or, given an account’s past network access event log, one could model and predict the role of the account owner from a number of behavior cues. This derived information provides valuable context for both increasing precision and reducing false positives.
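As a rough illustration of deriving context from logs, the sketch below labels a host from its historical logon events. The function name infer_host_roles, the two labels, and the threshold of ten distinct users are all hypothetical choices made for this example. A production system would use richer behavior metrics and a learned model instead of a fixed cutoff.

```python
from collections import defaultdict

def infer_host_roles(logon_events, shared_user_threshold=10):
    """Guess each host's function from historical logon events.

    logon_events: iterable of (user, host) tuples mined from past logs.
    shared_user_threshold: assumed cutoff; hosts with at least this many
    distinct users are labeled as shared servers.
    """
    users_per_host = defaultdict(set)
    volume = defaultdict(int)
    for user, host in logon_events:
        users_per_host[host].add(user)
        volume[host] += 1

    roles = {}
    for host, users in users_per_host.items():
        if len(users) >= shared_user_threshold:
            label = "shared_server"
        else:
            label = "workstation"
        roles[host] = {
            "label": label,
            "distinct_users": len(users),
            "events": volume[host],
        }
    return roles
```

The derived label can then serve as context for later scoring. For example, a logon to a host labeled as a workstation by someone other than its usual user may deserve more attention than the same logon to a shared server.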
However, analyzing the log data to do the above is predicated on a data platform that can provide clean signals for analysis in the first place. Unless the input data has informational coherence and consistency, subsequent data science-based learning risks suffering from a high degree of noise or information loss.
Stateful User Tracking™
This is easier said than done. Security log data is stateless. To maximize the informational value of log information, a platform must be able to piece events together to track the state of data, and ultimately, a user’s behavior. Let’s see some examples of stateful tracking to provide coherent data for subsequent learning tasks.
Per-host state tracking: In order to do a good job at modeling users, a UBA solution must also perform detailed modeling of hosts’ behaviors. Hosts sharing similar behavior profiles over logged events may be grouped as peers to provide context information. Because of the use of the Dynamic Host Configuration Protocol (DHCP) on a network, a host’s IP address can change over time. As logged events are usually tied to IP addresses, it is difficult to build an accurate behavior profile for a given host across its events for anomaly detection. Unless each IP address is resolved to its host in a stateful manner, that is, according to which host held the address when the event was logged, behavior profiling at the host level will be noisy and alert quality will be questionable.
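One way to picture stateful IP-to-host resolution is a per-address timeline of assignments that is consulted with each event's timestamp. The sketch below is only an illustration under assumed inputs: the class IpHostTracker and its methods are hypothetical, timestamps are plain numbers, and lease expiry is ignored for brevity.

```python
import bisect
from collections import defaultdict

class IpHostTracker:
    """Remember which host held each IP address over time."""

    def __init__(self):
        self._times = defaultdict(list)  # ip -> sorted assignment start times
        self._hosts = defaultdict(list)  # ip -> host names aligned with _times

    def record_lease(self, ts, ip, host):
        """Note that `host` was assigned `ip` starting at time `ts`."""
        i = bisect.bisect_right(self._times[ip], ts)
        self._times[ip].insert(i, ts)
        self._hosts[ip].insert(i, host)

    def resolve(self, ts, ip):
        """Return the host that held `ip` at time `ts`, or None if unknown."""
        times = self._times.get(ip)
        if not times:
            return None
        # Find the latest assignment that started at or before ts.
        i = bisect.bisect_right(times, ts) - 1
        return self._hosts[ip][i] if i >= 0 else None
```

With such a lookup, an event logged against 10.0.0.5 at time 150 is attributed to whichever host held that address at time 150, not to the host that holds it when the analysis runs.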
Per-user state tracking: Lateral movement detection is about finding the anomalous account-switching activities of an actor, an actual person who logs in to multiple systems with one or more user credentials for malicious intent. The ability to group an actor’s activities into a session, including activities that involve switching machines and switching identities, is as important as identifying a user’s lateral movement. Active Directory (AD) logs record user-to-device logon events. The events are stateless and are logged independently, with no indication of how an event from one account is related to an event from another account. With the right logic, per-user state tracking pieces together all events associated with an actor into a single user session, from the time of logon to the time of logoff. When done correctly, account-switching activities, whether benign or malicious, are readily observed in the stateful user session. Only then do these user sessions become clean and usable informational units for subsequent modeling to determine whether a lateral movement has occurred. Without per-user state tracking, inferring lateral movement at the later stages over a noisy data space is sub-optimal.
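The following sketch shows the general idea of stitching stateless logon events into sessions. It is a deliberately simplified illustration, not a description of how any product does it: the function stitch_sessions and the event tuple layout are hypothetical, a logon is attached to a session purely because it originates from a host that the session already reached, and logoff events and multiple actors on one host are not handled.

```python
def stitch_sessions(events):
    """Group logon events into per-actor sessions.

    events: iterable of (ts, account, src_host, dst_host) logon records.
    Returns a list of sessions, each a dict holding the accounts used,
    the hosts touched, and the events in chronological order.
    """
    sessions = []
    host_to_session = {}  # host -> session that most recently reached it

    for ts, account, src, dst in sorted(events):
        session = host_to_session.get(src)
        if session is None:
            # No known session on the source host: a new actor starts here.
            session = {"accounts": [], "hosts": [src], "events": []}
            sessions.append(session)
            host_to_session[src] = session
        if account not in session["accounts"]:
            session["accounts"].append(account)
        if dst not in session["hosts"]:
            session["hosts"].append(dst)
        session["events"].append((ts, account, src, dst))
        # Later logons originating from dst belong to the same session.
        host_to_session[dst] = session

    return sessions
```

For instance, if alice logs on from wk1 to srv1 and a different account then logs on from srv1 to db1, both events land in one session whose accounts list contains two names. That account switch becomes visible only because the events were linked.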
In short, these examples of stateful tracking at the host and user level in the platform layer stitch events together to get the most coherent facts before the learning takes place. Without stateful tracking, behavior modeling results in a high false-positive rate because it learns from the wrong signals.
The Big Data challenge of tracking users across identity and machine switching, or hosts across dynamic IP assignment, can’t be addressed just by directly using a traditional processing system such as Kafka/RabbitMQ and Spark Streaming/Storm. Those systems perform and scale very well when the data can be sharded by keys such as user or host. A more sophisticated system using an actor model approach is required to implement a scalable stateful tracking data processing platform that accounts for the more complex relationships among the various entities involved (users, hosts, multiple user accounts, etc.). Exabeam’s data platform leverages open-source technologies based on the Actor Model to provide an unparalleled ability to piece information together in real time for stateful tracking and correlation, maximizing the information gain for behavior learning and UBA analysis.
A platform that tracks all activities in chronological order within a user session naturally also provides an ideal environment for alert triage and incident response.
Alert triage and incident response
A stateful-tracking platform identifies and maintains the full history of all movements for each user and ‘connects the dots’ automatically and in real time. Currently, this is a hard, manual, and error-prone task performed by security engineers and forensics experts using log management systems, often by manually searching log files from multiple machines. With a stateful-tracking platform, analysts can view all normalized events and activities within a user session, which makes user history easy to interpret, including, for example, account switching that involves multiple credentials. Such a platform gives analysts a single environment to work in, with no need to move back and forth among different queries and tools, which increases productivity.
What I describe is the beginning of what a next-generation security platform is about: a behavioral state engine built in equal parts for security and data science. The clear advantage is working in an integrated environment for alert triage and incident response that makes the detection and analysis processes much more effective and efficient.