In this chapter of the Essential Guide to SIEM, we explain how SIEM systems are built, how they go from raw event data to security insights, and how they manage event data on a huge scale. We cover both traditional SIEM platforms and modern SIEM architecture based on data lake technology.
Security information and event management (SIEM) platforms collect log and event data from security systems, networks and computers, and turn it into actionable security insights. SIEM technology can help organizations detect threats that individual security systems cannot see, investigate past security incidents, perform incident response and prepare reports for regulation and compliance purposes.
The Log Management Process - data collection, data management and log retention history
The Log Flow - from millions of events to a handful of meaningful alerts
SIEM Log Sources - security systems, network devices, cloud systems and more
SIEM Hosting Models - self-hosted self-managed, cloud-hosted self-managed, hybrid-managed, and fully-managed
SIEM Sizing - event velocity, calculating EPS and total event volume, hardware requirements and deployment options, including data lake
SIEM Outputs - reporting, dashboards, visualizations and advanced analytics
Collects and aggregates data from security systems and network devices
Gathers log data for standards like HIPAA, PCI DSS, HITECH, SOX and GDPR and generates reports
Combines internal data with third-party data on threats and vulnerabilities
Stores long-term historical data, useful for compliance and forensic investigations
Links events and related data into security incidents, threats or forensic findings
Enables exploration of log and event data to discover details of a security incident
Uses statistical models and machine learning to identify deeper relationships between data elements
Enables security staff to run queries on log and event data to proactively uncover threats
Analyzes events and sends alerts to notify security staff of immediate issues
Helps security teams identify and respond to security incidents, bringing in all relevant data rapidly
Creates visualizations to let staff review event data, identify patterns and anomalies
Advanced SIEMs can automatically respond to incidents by orchestrating security systems in an approach known as security orchestration, automation and response (SOAR)
A SIEM server, at its root, is a log management platform. Log management involves collecting the data, managing it to enable analysis, and retaining historical data.
SIEMs collect logs and events from hundreds of organizational systems (for a partial list, see Log Sources below). Each device generates an event every time something happens, and collects the events into a flat log file or database. The SIEM can collect data in four ways:
The SIEM is tasked with collecting data from the devices, standardizing it and saving it in a format that enables analysis.
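As a rough illustration of what standardizing can mean, the sketch below parses one syslog-style line into a flat record with fixed field names. The regular expression, the output field names and the choice to treat timestamps as UTC are assumptions made for this example; real SIEM parsers handle many more formats and edge cases.

```python
import re
from datetime import datetime, timezone

# Assumed pattern for a BSD-syslog-style line; real devices vary widely.
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<app>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<msg>.*)$"
)

def normalize_syslog(line: str, year: int) -> dict:
    """Turn one raw syslog line into a normalized event record."""
    m = SYSLOG_RE.match(line)
    if m is None:
        raise ValueError("line does not match the assumed syslog pattern")
    pri = int(m["pri"])
    # This syslog style omits the year, so the caller supplies it.
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    return {
        # Assumption: the device clock is in UTC.
        "timestamp": ts.replace(tzinfo=timezone.utc).isoformat(),
        "host": m["host"],
        "app": m["app"],
        "facility": pri // 8,   # priority = facility * 8 + severity
        "severity": pri % 8,
        "message": m["msg"],
    }

event = normalize_syslog(
    "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8",
    year=2024,
)
```

Once every source is mapped to the same field names, later stages can query and correlate events without knowing which device produced them.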
Next-generation SIEMs come pre-integrated with common cloud systems and data sources, allowing you to pull log data directly. Many managed cloud services and SaaS applications do not allow you to install traditional SIEM collectors, making direct integration between SIEM and cloud systems critical for visibility.
SIEMs, especially at large organizations, can store mind-boggling amounts of data. The data needs to be:
Next-generation SIEMs are increasingly based on modern data lake technology such as Amazon S3, Hadoop or Elasticsearch, enabling practically unlimited data storage at low cost.
Industry standards like PCI DSS, HIPAA and SOX require that logs be retained for between 1 and 7 years. Large enterprises create a very high volume of logs every day from IT systems (see SIEM Sizing below). SIEMs need to be smart about which logs they retain for compliance and forensic requirements. SIEMs use the following strategies to reduce log volumes:
Historic logs are not only useful for compliance and forensics. They can also be used for deep behavioral analysis. Next-generation SIEMs provide user and entity behavior analytics (UEBA) technology, which uses machine learning and behavioral profiling to intelligently identify anomalies or trends, even if they weren’t captured in the rules or statistical correlations of the traditional SIEMs.
Next-generation SIEMs leverage low-cost distributed storage, allowing organizations to retain full source data. This enables deep behavioral analysis of historic data, to catch a broader range of anomalies and security issues.
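To make the idea of behavioral profiling on historic data more concrete, here is a deliberately simple sketch that compares today's activity for one user against that user's own history using a z-score. This is a stand-in technique chosen for illustration, not how any particular UEBA product works; the function name, the threshold and the sample numbers are assumptions.

```python
from statistics import mean, pstdev

def is_anomalous(history, today, threshold=3.0):
    """Flag `today` if it deviates from the user's historic baseline.

    history: past daily counts for one user (e.g. files accessed per day).
    threshold: number of standard deviations treated as anomalous (assumed).
    """
    if len(history) < 2:
        return False  # not enough history to form a baseline
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return today != mu  # perfectly flat baseline: any change stands out
    return abs(today - mu) / sigma > threshold

# Hypothetical history: a user who normally touches about 10 files a day.
baseline = [10, 12, 11, 9, 10, 11, 12]
print(is_anomalous(baseline, 11))  # ordinary day
print(is_anomalous(baseline, 60))  # sudden spike
```

The longer the retained history, the more stable the baseline, which is one reason keeping full source data helps this kind of analysis.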
A SIEM captures 100 percent of log data from across your organization. But then data starts to flow down the log funnel, and hundreds of millions of log entries can be whittled down to only a handful of actionable security alerts.
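One stage of that funnel, correlation, can be illustrated with a minimal sketch: many individual login events are reduced to a single alert when a user has several failed logins followed by a success within a short window. The event fields (`time`, `user`, `outcome`), the thresholds and the rule itself are hypothetical and only show the shape of a correlation rule.

```python
from collections import defaultdict

def correlate_bruteforce(events, min_failures=3, window=60):
    """Return alerts for users with >= min_failures failed logins
    followed by a success within `window` seconds.

    events: iterable of dicts with 'time' (seconds), 'user', 'outcome'.
    """
    recent_failures = defaultdict(list)
    alerts = []
    for ev in sorted(events, key=lambda e: e["time"]):
        user, t = ev["user"], ev["time"]
        if ev["outcome"] == "failure":
            recent_failures[user].append(t)
        elif ev["outcome"] == "success":
            # Count only failures close enough to the successful login.
            in_window = [f for f in recent_failures[user] if t - f <= window]
            if len(in_window) >= min_failures:
                alerts.append({"user": user, "time": t, "failures": len(in_window)})
            recent_failures[user].clear()
    return alerts

# Hypothetical sample: six raw events collapse into one alert.
sample = [
    {"time": 0, "user": "alice", "outcome": "failure"},
    {"time": 10, "user": "alice", "outcome": "failure"},
    {"time": 20, "user": "alice", "outcome": "failure"},
    {"time": 30, "user": "alice", "outcome": "success"},
    {"time": 5, "user": "bob", "outcome": "success"},
    {"time": 0, "user": "carol", "outcome": "failure"},
]
print(correlate_bruteforce(sample))
```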
SIEM platforms integrate with a large variety of security and organizational data sources, and can parse, aggregate and analyze the data for security significance. Here are just a few examples of data sources.
Until recently, SIEMs couldn’t access log and event data from cloud infrastructure like AWS or Microsoft Azure, or SaaS applications like Salesforce and Google Apps. This created a huge blind spot in security monitoring. Some next-generation solutions come with pre-built connectors and SIEM integrations with modern cloud technology.
Next-generation SIEMs incorporate automated incident response technology. This requires a new kind of integration: instead of drawing data from other organizational systems (inbound), the SIEM performs automated changes to other organizational systems (outbound).
Here are examples of IT systems that next-gen SIEMs integrate with to run automated security playbooks in response to security incidents:
Most SIEM systems come with a good number of existing connectors to common security and IT tools. For example, the Exabeam Security Management Platform provides over 350 integrations with inbound data sources and outbound systems for incident response automation.
Exabeam’s built-in integrations include popular vendors of authentication and access management systems, cloud access security brokers (CASB), cloud infrastructure platforms, data loss prevention (DLP) systems, email security tools, endpoint security platforms (EPP) and endpoint detection and response (EDR), firewalls, malware analysis, network analysis and monitoring, physical access and monitoring tools, security analytics, and more.
Some SIEMs provide APIs that allow you to add custom integrations to home-grown security or IT systems, or vendors as yet not supported by the SIEM tool.
Over the past few years, organizations have transitioned from operating SIEMs exclusively in-house, to a range of managed SIEM models. Hosting is moving to the cloud, and security management can be provided by managed security service providers (MSSPs).
This is the traditional SIEM deployment model—host the SIEM in your data center, often with a dedicated SIEM appliance, maintain storage systems, and manage it with trained security personnel. This model made SIEM a notoriously complex and expensive infrastructure to maintain.
MSSP Handles: Receiving events from organizational systems, collection and aggregation.
You Handle: Correlation, analysis, alerting and dashboards, security processes leveraging SIEM data.
You Handle: Purchasing software and hardware infrastructure.
MSSP Together with Your Security Staff: Deploys SIEM event collection / aggregation, correlation, analysis, alerting and dashboards.
MSSP Handles: Event collection, aggregation, correlation, analysis, alerting and dashboards.
You Handle: Security processes leveraging SIEM data.
The following considerations can help you select a SIEM deployment model:
A majority of SIEMs today are deployed on-premises. This requires organizations to carefully consider the size of log and event data they are generating, and the system resources required to manage it.
A common measure of velocity is events per second (EPS), defined as the number of security events divided by the time period in seconds. EPS can vary between normal and peak times. For example, a Cisco router might generate 0.6 events per second on average, but during peak times, such as during an attack, it can generate as many as 154 EPS.
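The difference between average and peak EPS can be sketched in a few lines. The function below is an illustrative assumption, not a SANS-defined method: it takes raw event timestamps, computes the average over the whole observed span, and takes the busiest one-second bucket as the peak.

```python
from collections import Counter

def eps_stats(timestamps):
    """Return (average_eps, peak_eps) for a list of event times in seconds."""
    if not timestamps:
        return 0.0, 0
    # Bucket events by whole second to find the busiest second.
    per_second = Counter(int(t) for t in timestamps)
    # Observed span in whole seconds, counting both end buckets.
    duration = int(max(timestamps)) - int(min(timestamps)) + 1
    return len(timestamps) / duration, max(per_second.values())

# Hypothetical sample: a short burst followed by a quiet period.
average, peak = eps_stats([0, 0.5, 0.9, 3, 9])
print(average, peak)
```

In practice the peak would be measured over a longer interval and across a realistic traffic sample, but the same distinction applies.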
According to the SIEM Benchmarking Guide by the SANS Institute, organizations should strike a balance between normal and peak EPS measurements. It’s not practical, or necessary, to build a SIEM to handle peak EPS for all network devices, because it’s unlikely all devices will hit their peak at once. On the other hand, you must plan for crisis situations, in which the SIEM will be most needed.
The sum of these two numbers is the total estimated velocity.
In addition, the SANS guide recommends adding:
The final number of Events Per Day is then:
(Total Peak Events per Day + Total Normal Events Per Day) * 110% headroom * 110% growth
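The formula above translates directly into code. In this sketch the function and parameter names are our own, and the two input totals are hypothetical figures that you would replace with numbers from your own inventory.

```python
def sized_events_per_day(total_peak_events_per_day,
                         total_normal_events_per_day,
                         headroom=0.10,
                         growth=0.10):
    """Apply the headroom and growth margins to the combined daily total."""
    base = total_peak_events_per_day + total_normal_events_per_day
    return base * (1 + headroom) * (1 + growth)

# Hypothetical inputs: 1 million peak events and 9 million normal events per day.
estimate = sized_events_per_day(1_000_000, 9_000_000)
print(round(estimate))
```

Because the two 110% factors multiply, the combined margin is 21% over the base total, not 20%.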
The following table, provided by SANS, shows typical Average EPS (normal EPS) and Peak EPS for selected network devices. The data is several years old but can provide ballpark figures for your initial estimates.
To size your SIEM, conduct an inventory of the devices you intend to collect logs from. Multiply the number of similar devices by their estimated EPS to get a total EPS, then multiply by 86,400 seconds per day to get the total number of Events Per Day across your network.
A rule of thumb is that an average event occupies 300 bytes. So for every 1,000 EPS (86.4 million Events Per Day), the SIEM needs to store:
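The arithmetic behind this rule of thumb can be sketched as follows. The helper names and the inventory format are assumptions for illustration, and the result is raw, uncompressed volume that ignores indexing overhead and compression.

```python
SECONDS_PER_DAY = 86_400
AVG_EVENT_BYTES = 300  # rule-of-thumb average event size from the text

def daily_events(inventory):
    """inventory: list of (device_count, eps_per_device) pairs."""
    total_eps = sum(count * eps for count, eps in inventory)
    return total_eps * SECONDS_PER_DAY

def storage_bytes(events_per_day, days=1, event_bytes=AVG_EVENT_BYTES):
    """Raw bytes needed to keep `days` worth of events."""
    return events_per_day * event_bytes * days

# A single source producing 1,000 EPS yields 86.4 million events per day.
events = daily_events([(1, 1000)])
per_day = storage_bytes(events)
print(events, per_day / 1e9, "GB per day (decimal, uncompressed)")
```

Passing a retention period as `days` scales the same figure to the window your compliance requirements call for.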
After you determine your event velocity and volume, consider the following factors to size hardware for your SIEM:
In the past decade, networks have grown, the number of connected devices has exploded, and data volumes have risen exponentially. In addition, there is a growing need to have access to all historical data—not just a filtered, summarized version of the data—to enable deeper analysis. Modern SIEM technology can make sense of huge volumes of historic data and use it to discover new anomalies and patterns.
Another benefit of data lake storage is that hardware costs become predictable. You can simply add nodes to the data lake, running on commodity or cloud hardware, to grow data storage linearly. SIEMs based on data lake technology can easily add new data sources or expand data retention at low cost.
In 2015 O’Reilly released a report named The Security Data Lake, which offered a robust approach for storing SIEM data in a Hadoop data lake. The report clarifies that data lakes do not replace SIEMs—the SIEM is still needed for its ability to parse and make sense of log data from many different systems, and later analyze and extract insights and alerts from the data.
The data lake, as a companion to a SIEM, provides:
Today, additional technical options exist for implementing data lakes besides the heavyweight Hadoop, including Elasticsearch, Cassandra and MongoDB.
The main purpose of a SIEM is to generate actionable insights for security teams. These come in several forms:
Next-generation SIEMs use behavioral profiling and machine learning techniques to identify security incidents and help teams collect pertinent data for the incident, across devices, user profiles and time periods.
A dashboard and automatically-created incident ticket, provided by Exabeam’s next-generation SIEM platform.
Historically, SIEMs were expensive, monolithic enterprise infrastructures, built with proprietary software and custom hardware provisioned to handle their large data volumes. Along with the software industry in general, SIEMs are evolving to become more agile and lightweight, and much smarter than they were before.
Next-generation SIEM solutions use a modern architecture that is more affordable, easier to implement, and helps security teams discover real security issues faster:
To see an example of a modern SIEM architecture, see Exabeam’s Security Intelligence Platform.