Understanding UEBA: From Raw Events to Scored Events - Exabeam

Published
November 18, 2022

In the last two posts of this series, I talked about evaluation criteria and the technical components required to build a user and entity behavior analytics (UEBA) risk engine. In this post, let’s examine how we here at Exabeam actually turn volumes of security and network events into alerts with risk scores for prioritization in New-Scale SIEM™.

Event parsing, normalization, and enrichment

Raw event logs from various security and network devices need to be parsed and normalized to extract meaning before they are ingested by a risk engine. At Exabeam, more than 8,000 parsers are in place to cope with the multitude of commercially available security and network logs, whose formats change from version to version over time. These regular expression-based parsers are knowledge-heavy and require the dedicated effort of a small team.
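To make the idea of a regular expression-based parser concrete, here is a minimal sketch. The log line format, the field names, and the `parse_vpn_login` helper are all made up for illustration; they are not an actual Exabeam parser or a real vendor format.

```python
import re
from typing import Optional

# Hypothetical log format and field names, for illustration only.
VPN_LOGIN = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) vpn: "
    r"login user=(?P<user>\S+) src=(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})$"
)


def parse_vpn_login(line: str) -> Optional[dict]:
    """Return the extracted fields, or None if the line does not match."""
    match = VPN_LOGIN.match(line.strip())
    return match.groupdict() if match else None


event = parse_vpn_login(
    "2022-11-18T09:15:02Z gw01 vpn: login user=alice src=203.0.113.7"
)
unparsed = parse_vpn_login("some line in a format this parser does not know")
```

A line that does not match returns `None` rather than a partial event, which is one way to notice that a product has changed its log format and a parser needs updating.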

Parser development must be based on a framework that specifies the desired standards for logs and events. The Exabeam Common Information Model (CIM) is one such framework, which enforces events’ consistency and compliance internally and across security and network products. CIM introduces and formalizes conventions and concepts for types of events. This provides a common language for referencing data fields in events, which is what makes it possible to develop risk indicators that generate anomalies.
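As a rough illustration of what a common language for data fields buys you, the sketch below renames vendor-specific fields to shared names. The vendor names, field names, and the `normalize` helper are hypothetical and are not taken from the actual Exabeam CIM.

```python
# Hypothetical vendor-to-common field mappings; not the actual Exabeam CIM.
FIELD_MAPS = {
    "vendor_a": {"usr": "user", "srcip": "src_ip", "ts": "time"},
    "vendor_b": {
        "account_name": "user",
        "client_address": "src_ip",
        "event_time": "time",
    },
}


def normalize(vendor: str, parsed: dict) -> dict:
    """Rename vendor-specific keys to common names; drop unmapped keys."""
    mapping = FIELD_MAPS[vendor]
    return {mapping[key]: value for key, value in parsed.items() if key in mapping}


event_a = normalize(
    "vendor_a", {"usr": "alice", "srcip": "203.0.113.7", "ts": "t1"}
)
event_b = normalize(
    "vendor_b",
    {
        "account_name": "alice",
        "client_address": "203.0.113.7",
        "event_time": "t1",
        "session_cookie": "ignored",
    },
)
```

After normalization both events carry the same keys, so a risk indicator can refer to `user` or `src_ip` without knowing which product produced the log.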

Once parsed and normalized, events are further enriched to enhance their information value. This gives us richer and more accurate data for assessing event risk. For example, a username missing from an event may be inferred from neighboring events that share the same source host; similarly, a missing host name can be filled in by stateful tracking of information from events that are close in time.
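The username example can be sketched with a small piece of stateful tracking. This is a simplified illustration under several assumptions of mine: events arrive sorted by time, only earlier events are consulted, and the field names (`src_host`, `user`, `time`) and the five-minute window are hypothetical.

```python
from typing import Iterable, Iterator


def fill_missing_users(
    events: Iterable[dict], max_gap_seconds: int = 300
) -> Iterator[dict]:
    """Fill a missing 'user' from the most recent event on the same 'src_host'.

    Assumes events arrive sorted by 'time' (epoch seconds).
    """
    last_seen = {}  # src_host -> (user, time of the last event that carried it)
    for event in events:
        enriched = dict(event)  # copy so the input events are not mutated
        host = enriched.get("src_host")
        user = enriched.get("user")
        if user:
            last_seen[host] = (user, enriched["time"])
        elif host in last_seen:
            prev_user, prev_time = last_seen[host]
            if enriched["time"] - prev_time <= max_gap_seconds:
                enriched["user"] = prev_user
                enriched["user_inferred"] = True  # mark the value as inferred
        yield enriched


raw_events = [
    {"time": 0, "src_host": "ws-17", "user": "alice"},
    {"time": 60, "src_host": "ws-17", "user": None},
    {"time": 1000, "src_host": "ws-17", "user": None},
]
enriched_events = list(fill_missing_users(raw_events))
```

Marking inferred values, as the `user_inferred` flag does here, keeps it possible to tell observed data from enrichment later on.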

Risk indicators

Once normalized and enriched, an event is ingested and examined against a collection of risk indicators. Risk indicators calculate measurable properties of an event. Some risk indicators are based on indicators of compromise (IoCs) or signatures. Others are behavior-based: for example, whether the source country of this VPN event is unusual for this user, whether the number of bytes transferred in the last 24 hours is unusual for this device, or whether this VPN login event followed a physical badge access event in the last N hours. Note that a behavior-based risk indicator is not your typical correlation rule. A correlation rule (for example, whether the number of email bytes is more than 10 MB) is static and takes no historical context into account. A behavior-based risk indicator, on the other hand, compares the number against its historical values and triggers only if it is unusually large relative to that history.
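The contrast between a static correlation rule and a behavior-based indicator can be shown in a few lines. The z-score test and its threshold of 3 are my own simplification for illustration, not the statistical model Exabeam uses, and the byte counts are invented.

```python
from statistics import mean, stdev

MB = 1024 * 1024
STATIC_LIMIT_BYTES = 10 * MB  # the 10 MB correlation rule from the text


def static_rule(email_bytes: int) -> bool:
    """Static correlation rule: ignores history entirely."""
    return email_bytes > STATIC_LIMIT_BYTES


def behavior_indicator(
    email_bytes: int, history: list, z_threshold: float = 3.0
) -> bool:
    """Trigger only if the value is unusually large relative to history."""
    if len(history) < 2:
        return False  # not enough history to judge what is usual
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return email_bytes > mu
    return (email_bytes - mu) / sigma > z_threshold


# A heavy sender who routinely sends about 50 MB, and a light sender.
heavy_history = [48 * MB, 52 * MB, 50 * MB, 51 * MB, 49 * MB]
light_history = [1 * MB, 2 * MB, 1 * MB, 2 * MB, 1 * MB]

heavy_static = static_rule(52 * MB)
heavy_behavior = behavior_indicator(52 * MB, heavy_history)
light_static = static_rule(8 * MB)
light_behavior = behavior_indicator(8 * MB, light_history)
```

In this example the static rule fires on the heavy sender's ordinary 52 MB and misses the light sender's sudden 8 MB, while the behavior-based indicator does the opposite.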

At Exabeam, more than 800 risk indicators are constructed from a multitude of commercially available security and network products. This constitutes a body of knowledge that encodes the collective expertise and experience of security researchers — an intellectual property that was developed over years.

These indicators, wherever applicable, are calculated for each event. Needless to say, answering questions such as whether an asset access is unusual for a user, or whether the number of bytes transferred is anomalous, is extremely challenging in environments of high-volume, high-velocity events, typically from 8K to 70K or potentially reaching 1 million events per second (EPS). Exabeam uses a cloud-native computing environment with a patent-pending architecture for scalability.

Event scoring

Knowing that an event triggered one or more risk indicators is interesting, but ultimately a numeric risk score is needed for prioritization. One option is to manually assign a score to each risk indicator, then let the event’s risk score be the sum of the scores of the triggered indicators. While simple, manual score assignment is subjective and may require significant post-production tuning, since every network environment is different. This manual tuning is imprecise and can be laborious. What is needed is an automated scoring method, one that derives the event risk by learning from data. I sketch out the idea below.

In real life, our brains are capable of processing pieces of evidence jointly in assessing the value or risk of a situation. A rare combination of evidence is always more interesting than a collection of frequently seen evidence. Scoring an event based on the observed values of risk indicators, a.k.a. machine-learning features, works the same way. A risk engine learns from historical data which combinations of triggered risk indicators are rarely or commonly seen. This mathematically formalized learning can then be used to score a future event based on its state of triggered indicators. This, in essence, is machine learning from data to perform predictive analytics. The event scoring is data-driven and fully automated.

The benefits are threefold:

  1. Manual score assignment is eliminated.
  2. The score is reflective of the degree of interest.
  3. Scoring is dynamic, always learning from and adapting to the environment.
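The data-driven scoring idea described above can be illustrated with a toy model that scores an event by how rarely its combination of triggered indicators has been seen. This is only a sketch under my own assumptions (a simple frequency count with add-one smoothing and a negative log score); it is not the patented Exabeam scoring method, and the indicator names are hypothetical.

```python
import math
from collections import Counter
from typing import FrozenSet, Iterable


class RarityScorer:
    """Score an event by the rarity of its set of triggered indicators."""

    def __init__(self) -> None:
        self.counts = Counter()
        self.total = 0

    def fit(self, history: Iterable[FrozenSet[str]]) -> "RarityScorer":
        """Count how often each combination occurred; can be called again
        with newer events to keep adapting to the environment."""
        for combo in history:
            self.counts[combo] += 1
            self.total += 1
        return self

    def score(self, combo: FrozenSet[str]) -> float:
        """Return -log2 of the smoothed probability; rarer means higher."""
        # Add-one smoothing so unseen combinations get a finite, high score.
        p = (self.counts[combo] + 1) / (self.total + len(self.counts) + 1)
        return -math.log2(p)


history = (
    [frozenset()] * 90
    + [frozenset({"unusual_country"})] * 9
    + [frozenset({"unusual_country", "large_transfer"})] * 1
)
scorer = RarityScorer().fit(history)

common = scorer.score(frozenset())
single = scorer.score(frozenset({"unusual_country"}))
rare_pair = scorer.score(frozenset({"unusual_country", "large_transfer"}))
never_seen = scorer.score(frozenset({"large_transfer", "new_asset"}))
```

No score is assigned by hand here: the ordering of the four scores comes entirely from how often each combination appears in the history.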

More details of this patented risk scoring system are available in this paper published by the Institute of Electrical and Electronics Engineers.

Conclusion

The journey of transforming a raw event into a score in a UEBA engine begins with event parsing, normalization, and enrichment. The event is then examined by a comprehensive collection of risk indicators, before finally going through a machine learning stage for scoring. This analytical transformation is the result of applying cybersecurity knowledge, machine learning, and cloud-scale computing. In my next post, I’ll discuss how to turn scored events into presentable threat stories.
