02

SIEM Architecture: Technology, Process and Data

In this chapter of the Essential Guide to SIEM, we explain how SIEM systems are built, how they go from raw event data to security insights, and how they manage event data on a huge scale. We cover both traditional SIEM platforms and modern SIEM architecture based on data lake technology.

Security information and event management (SIEM) platforms collect log and event data from security systems, networks and computers, and turn it into actionable security insights. SIEM technology can help organizations detect threats that individual security systems cannot see, investigate past security incidents, perform incident response and prepare reports for regulation and compliance purposes.

12 Components and Capabilities in a SIEM Architecture

  • 01

    Data aggregation

    Collects and aggregates data from security systems and network devices

  • 02

    Threat intelligence feeds

    Combines internal data with third-party data on threats and vulnerabilities

  • 03

    Correlation and security monitoring

    Links events and related data into security incidents, threats or forensic findings

  • 04

    Analytics

    Uses statistical models and machine learning to identify deeper relationships between data elements

  • 05

    Alerting

    Analyzes events and sends alerts to notify security staff of immediate issues

  • 06

    Dashboards

    Creates visualizations to let staff review event data, identify patterns and anomalies

  • 07

    Compliance

    Gathers log data for standards like HIPAA, PCI/DSS, HITECH, SOX and GDPR and generates reports

  • 08

    Retention

    Stores long-term historical data, useful for compliance and forensic investigations

  • 09

    Forensic analysis

    Enables exploration of log and event data to discover details of a security incident

  • 10

    Threat hunting

    Enables security staff to run queries on log and event data to proactively uncover threats

  • 11

    Incident response

    Helps security teams identify and respond to security incidents, bringing in all relevant data rapidly

  • 12

    SOC automation

    Advanced SIEMs can automatically respond to incidents by orchestrating security systems in an approach known as security orchestration, automation and response (SOAR)

SIEM Logging Process

A SIEM server, at its root, is a log management platform. Log management involves collecting the data, managing it to enable analysis, and retaining historical data.

Data Collection

SIEMs collect logs and events from hundreds of organizational systems (for a partial list, see Log Sources below). Each device generates an event every time something happens, and collects the events into a flat log file or database. The SIEM can collect data in four ways:

  1. Via an agent installed on the device (the most common method)
  2. By directly connecting to the device using a network protocol or API call
  3. By accessing log files directly from storage, typically in Syslog format
  4. Via an event streaming protocol like SNMP, NetFlow or IPFIX

The SIEM is tasked with collecting data from the devices, standardizing it and saving it in a format that enables analysis.
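As an illustration of the "standardizing" step, here is a minimal Python sketch that parses a BSD-style syslog line into a normalized event record. The field names of the output record and the `normalize_syslog` function are assumptions made for this example, not the schema of any particular SIEM product. Real collectors handle many more formats and edge cases.

```python
import re
from datetime import datetime

# BSD-style syslog line: "<PRI>Mmm dd hh:mm:ss host tag[pid]: message"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<msg>.*)$"
)


def normalize_syslog(line, year):
    """Return a normalized event dict, or None if the line does not parse.

    BSD syslog timestamps carry no year, so the caller supplies one.
    The output field names are hypothetical, chosen for this sketch.
    """
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    pri = int(m.group("pri"))
    ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
    return {
        "timestamp": ts.isoformat(),
        "host": m.group("host"),
        "app": m.group("tag"),
        "facility": pri // 8,   # PRI = facility * 8 + severity
        "severity": pri % 8,
        "message": m.group("msg"),
    }


sample = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
event = normalize_syslog(sample, 2023)
```

Once every source is mapped into one record shape like this, downstream correlation and search no longer need to know which device produced an event.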

Next Gen SIEM

Next-generation SIEMs come pre-integrated with common cloud systems and data sources, allowing you to pull log data directly. Many managed cloud services and SaaS applications do not allow you to install traditional SIEM collectors, making direct integration between SIEM and cloud systems critical for visibility.

Data Management

SIEMs, especially at large organizations, can store mind-boggling amounts of data. The data needs to be:

  • Stored—either on-premises, in the cloud or both
  • Optimized and indexed—to enable efficient analysis and exploration
  • Tiered—hot data necessary for live security monitoring should be on high-performance storage, whereas cold data, which you may one day want to investigate, should be relegated to high-volume inexpensive storage mediums
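The tiering rule above can be sketched as a simple age check. The 30-day hot window is an assumption for illustration only. The right boundary depends on your monitoring and investigation needs.

```python
from datetime import datetime, timedelta

# Assumed boundary between hot and cold data; tune per organization.
HOT_WINDOW = timedelta(days=30)


def storage_tier(event_time, now, hot_window=HOT_WINDOW):
    """Return "hot" for recent events and "cold" for older ones."""
    return "hot" if now - event_time <= hot_window else "cold"


now = datetime(2024, 6, 30)
recent = storage_tier(datetime(2024, 6, 20), now)
old = storage_tier(datetime(2023, 1, 1), now)
```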

Next Gen SIEM

Next-generation SIEMs are increasingly based on modern data lake technology such as Amazon S3, Hadoop or Elasticsearch, enabling practically unlimited data storage at low cost.

Log Retention

Industry standards like PCI DSS, HIPAA and SOX require that logs be retained for between 1 and 7 years. Large enterprises create a very high volume of logs every day from IT systems (see SIEM Sizing below). SIEMs need to be smart about which logs they retain for compliance and forensic requirements. SIEMs use the following strategies to reduce log volumes:

  • Syslog servers—syslog is a standard which normalizes logs, retaining only essential information in a standardized format. Syslog lets you compress logs and retain large quantities of historical data.
  • Deletion schedules—SIEMs automatically purge old logs that are no longer needed for compliance.
  • Log filtering—not all logs are needed for the compliance requirements faced by your organization, or for forensic purposes. Logs can be filtered by the source system, times, or by other rules defined by the SIEM administrator.
  • Summarization—log data can be summarized to maintain only important data elements such as the count of events, unique IPs, etc.
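To make log filtering and summarization concrete, here is a small Python sketch. The event fields (`source`, `src_ip`) and the `summarize` function are hypothetical. It keeps only events from selected source systems and reduces them to a count of events and a count of unique IPs, as described above.

```python
from collections import Counter


def summarize(events, keep_sources):
    """Filter events by source system, then keep only summary statistics."""
    kept = [e for e in events if e["source"] in keep_sources]
    return {
        "event_count": len(kept),
        "unique_ips": len({e["src_ip"] for e in kept}),
        "events_per_source": dict(Counter(e["source"] for e in kept)),
    }


events = [
    {"source": "firewall", "src_ip": "10.0.0.1"},
    {"source": "firewall", "src_ip": "10.0.0.2"},
    {"source": "firewall", "src_ip": "10.0.0.1"},
    {"source": "printer", "src_ip": "10.0.0.9"},   # filtered out
    {"source": "vpn", "src_ip": "10.0.0.2"},
]
summary = summarize(events, keep_sources={"firewall", "vpn"})
```

Note that summarization is lossy: once only the counts are retained, the individual events can no longer be investigated.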

Next Gen SIEM

Historic logs are not only useful for compliance and forensics. They can also be used for deep behavioral analysis. Next-generation SIEMs provide user and entity behavior analytics (UEBA) technology, which uses machine learning and behavioral profiling to intelligently identify anomalies or trends, even if they weren’t captured in the rules or statistical correlations of the traditional SIEMs.


Next-generation SIEMs leverage low-cost distributed storage, allowing organizations to retain full source data. This enables deep behavioral analysis of historic data, to catch a broader range of anomalies and security issues.
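As a rough illustration of behavioral profiling on historic data, the sketch below flags a value that deviates strongly from a user's own baseline using a z-score. This is a deliberately simple stand-in chosen for this example, not the method used by any specific UEBA product. The threshold of 3 standard deviations is an assumption.

```python
from statistics import mean, stdev


def is_anomalous(history, today, threshold=3.0):
    """Flag `today` if it is more than `threshold` standard deviations
    away from the mean of the entity's own history."""
    if len(history) < 2:
        return False  # not enough data to build a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold


# Hypothetical daily counts of servers accessed by one user.
baseline = [4, 5, 6, 5, 4, 6, 5]
spike = is_anomalous(baseline, 40)
normal_day = is_anomalous(baseline, 6)
```

The longer the retained history, the more stable the baseline, which is one reason full-fidelity retention helps behavioral analysis.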

The Log Flow

A SIEM captures 100 percent of log data from across your organization. But then data starts to flow down the log funnel, and hundreds of millions of log entries can be whittled down to only a handful of actionable security alerts.

SIEMs filter out noise in logs to keep pertinent data only. Then they index and optimize the relevant data to enable analysis. Finally, around 1% of data, which is the most relevant for your security posture, is correlated and analyzed in more depth. Of those correlations, the ones which exceed security thresholds become security alerts.
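The funnel can be sketched in a few lines of Python: drop noise, correlate what remains, and alert only on correlations that cross a threshold. The event types, the failed-login correlation and the threshold of 5 are assumptions for illustration.

```python
from collections import defaultdict

NOISE_TYPES = {"heartbeat", "debug"}   # assumed noise categories
FAILED_LOGIN_THRESHOLD = 5             # assumed alert threshold


def log_funnel(events, threshold=FAILED_LOGIN_THRESHOLD):
    """Filter noise, correlate failed logins per user, emit alerts."""
    # Stage 1: filter out noise.
    relevant = [e for e in events if e["type"] not in NOISE_TYPES]

    # Stage 2: correlate related events (failed logins grouped by user).
    failures = defaultdict(int)
    for e in relevant:
        if e["type"] == "login_failed":
            failures[e["user"]] += 1

    # Stage 3: only correlations that reach the threshold become alerts.
    return [
        {"user": user, "failed_logins": count}
        for user, count in sorted(failures.items())
        if count >= threshold
    ]


events = (
    [{"type": "heartbeat"}] * 10
    + [{"type": "login_failed", "user": "alice"}] * 6
    + [{"type": "login_failed", "user": "bob"}] * 2
)
alerts = log_funnel(events)
```

Here 18 input events are reduced to a single alert, mirroring the narrowing from raw logs to a handful of actionable alerts.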


SIEM Integrations

SIEM platforms integrate with a large variety of security and organizational data sources, and can parse, aggregate and analyze the data for security significance. Here are just a few examples of data sources.

  • Security Events
    • Intrusion Detection Systems
    • Endpoint Security (Antivirus, antimalware)
    • Data Loss Prevention
    • VPN Concentrators
    • Web Filters
    • Honeypots
    • Firewalls
  • Network Logs
    • Routers
    • Switches
    • DNS Servers
    • Wireless Access Points
    • WAN
    • Data Transfers
    • Private Cloud Networks (VPC)
  • Applications and Devices
    • Application Servers
    • Databases
    • Intranet Applications
    • Web Applications
    • SaaS Applications
    • Cloud-Hosted Servers
    • End-User Laptops or Desktops
    • Mobile Devices
  • IT Infrastructure
    • Configuration
    • Locations
    • Owners
    • Network Maps
    • Vulnerability Reports
    • Software Inventory

Next Gen SIEM

Until recently, SIEMs couldn’t access log and event data from cloud infrastructure like AWS or Microsoft Azure, or SaaS applications like Salesforce and Google Apps. This created a huge blind spot in security monitoring. Some next-generation solutions come with pre-built connectors and SIEM integrations with modern cloud technology.

Next-generation SIEMs incorporate automated incident response technology. This requires a new kind of integration: instead of drawing data from other organizational systems (inbound), the SIEM makes automated changes to those systems (outbound).

Here are examples of IT systems that next-gen SIEMs integrate with to run automated security playbooks in response to security incidents:

  • Authentication and access management—automatically disabling user accounts, resetting passwords, making changes to user groups on access control systems like Active Directory
  • Cloud infrastructure—modifying tags applied to cloud services, disabling accounts, stopping or destroying instances, on public cloud systems like AWS and Microsoft Azure.
  • Email security—deleting or quarantining emails, sending an email, notifying users of security events, on SMTP email servers and enterprise systems like Microsoft Exchange.
  • Endpoint security—isolating devices from the network, wiping and reimaging endpoints, deleting files, listing files or active processes, on Linux, Windows, Mac, and mobile endpoints.
  • Firewalls—blocking or unblocking IPs and domains on firewalls like CheckPoint and Palo Alto Networks.
  • Forensics—running virus scans on devices, scanning files, detonating suspected malware in sandboxes.
  • Information Technology Service Management (ITSM)—creating tickets, changing ticket status, adding comments or data to tickets, reassigning tickets, closing incidents on ITSM systems like Jira or ServiceNow.
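A security playbook of this kind can be thought of as a mapping from incident type to an ordered list of outbound actions. The sketch below is purely illustrative: the incident kinds, action functions and `run_playbook` are hypothetical names, and each action only returns a description of what it would do rather than calling any real system.

```python
# Hypothetical outbound actions. A real integration would call the target
# system's API; here each one only describes the intended change.
def disable_account(incident):
    return f"disable account {incident['user']}"


def isolate_endpoint(incident):
    return f"isolate endpoint {incident['host']}"


def block_ip(incident):
    return f"block IP {incident['src_ip']}"


def open_ticket(incident):
    return f"open ticket for {incident['kind']}"


# Assumed mapping of incident kind to an ordered list of response steps.
PLAYBOOKS = {
    "compromised_credentials": [disable_account, open_ticket],
    "malware_detected": [isolate_endpoint, block_ip, open_ticket],
}


def run_playbook(incident):
    """Run every step for the incident's kind; default to opening a ticket."""
    steps = PLAYBOOKS.get(incident["kind"], [open_ticket])
    return [step(incident) for step in steps]


actions = run_playbook(
    {"kind": "malware_detected", "host": "laptop-42", "src_ip": "203.0.113.7"}
)
```

Keeping playbooks as data makes it straightforward to review and change the response steps without touching the integration code.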

Existing SIEM Integrations vs Custom Integrations

Most SIEM systems come with a good number of existing connectors to common security and IT tools. For example, the Exabeam Security Management Platform provides over 350 integrations with inbound data sources and outbound systems for incident response automation.

Exabeam’s built-in integrations include popular vendors of authentication and access management systems, cloud access security brokers (CASB), cloud infrastructure platforms, data loss prevention (DLP) systems, email security tools, endpoint security platforms (EPP) and endpoint detection and response (EDR), firewalls, malware analysis, network analysis and monitoring, physical access and monitoring tools, security analytics, and more.

Some SIEMs provide APIs that allow you to add custom integrations to home-grown security or IT systems, or vendors as yet not supported by the SIEM tool.

SIEM Hosting Models

Over the past few years, organizations have transitioned from operating SIEMs exclusively in-house, to a range of managed SIEM models. Hosting is moving to the cloud, and security management can be provided by managed security service providers (MSSPs).

  • Self-Hosted, Self-Managed

    This is the traditional SIEM deployment model—host the SIEM in your data center, often with a dedicated SIEM appliance, maintain storage systems, and manage it with trained security personnel. This model made SIEM a notoriously complex and expensive infrastructure to maintain.

  • Cloud SIEM, Self-Managed

    MSSP Handles: Receiving events from organizational systems, collection and aggregation.

    You Handle: Correlation, analysis, alerting and dashboards, security processes leveraging SIEM data.

  • Self-Hosted, Hybrid-Managed

    You Handle: Purchasing software and hardware infrastructure.

    MSSP Together with Your Security Staff: Deploys SIEM event collection / aggregation, correlation, analysis, alerting and dashboards.

  • SIEM as a Service

    MSSP Handles: Event collection, aggregation, correlation, analysis, alerting and dashboards.

    You Handle: Security processes leveraging SIEM data.

Which Hosting Model is Right for You?

The following considerations can help you select a SIEM deployment model:

  • Do you have an existing SIEM infrastructure? If you’ve already purchased the hardware and software, opt for self-hosted self-managed, or leverage an MSSP’s expertise to jointly manage the SIEM with your local team.
  • Are you able to move data off-premises? If so, a cloud-hosted or fully managed model can reduce costs and management overhead.
  • Do you have security staff with SIEM expertise? The human factor is crucial in getting true value from a SIEM. If you don’t have trained security staff, rent the analysis services via a hybrid-managed or SIEM-as-a-Service model.

SIEM Sizing: Velocity, Volume and Hardware Requirements

A majority of SIEMs today are deployed on-premises. This requires organizations to carefully consider the size of log and event data they are generating, and the system resources required to manage it.

Calculating Velocity: Events Per Second (EPS)

A common measure of velocity is events per second (EPS), defined as:

EPS = Number of Security Events / Time Period in Seconds

EPS can vary between normal and peak times. For example, a Cisco router might generate 0.6 events per second on average, but during peak times, such as during an attack, it can generate as many as 154 EPS.

According to the SIEM Benchmarking Guide by the SANS Institute, organizations should strike a balance between normal and peak EPS measurements. It’s not practical, or necessary, to build a SIEM to handle peak EPS for all network devices, because it’s unlikely all devices will hit their peak at once. On the other hand, you must plan for crisis situations, in which the SIEM will be most needed.

A Simple Model for Predicting EPS During Normal and Peak Times

  1. Measure Normal EPS and Peak EPS, by looking at 90 days of data for the target system
  2. Estimate the Number of Peaks per Day
  3. Estimate the Duration in Seconds of a Peak, and by extension, Total Peak Seconds Per Day
  4. Calculate Total Peak Events per Day = (Total Peak Seconds Per Day) * Peak EPS
  5. Calculate Total Normal Events per Day = (Total Seconds Per Day – Total Peak Seconds Per Day) * Normal EPS, where Total Seconds Per Day is 86,400

The sum of these two numbers is the total estimated number of events per day.

In addition, the SANS guide recommends adding:

  • 10% for headroom
  • 10% for growth

The final number of events per day is then:

(Total Peak Events per Day + Total Normal Events Per Day) * 110% headroom * 110% growth
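The model above translates directly into a short Python function. In the example call, the normal and peak EPS values are the Cisco router figures quoted earlier. The number of peaks per day and the peak duration are assumptions picked only to show the arithmetic.

```python
SECONDS_PER_DAY = 86_400


def events_per_day(normal_eps, peak_eps, peaks_per_day, peak_duration_s,
                   headroom=0.10, growth=0.10):
    """Estimate daily event volume from normal and peak EPS."""
    peak_seconds = peaks_per_day * peak_duration_s
    if peak_seconds > SECONDS_PER_DAY:
        raise ValueError("total peak time exceeds one day")
    peak_events = peak_seconds * peak_eps
    normal_events = (SECONDS_PER_DAY - peak_seconds) * normal_eps
    # Apply the 10% headroom and 10% growth allowances multiplicatively.
    return (peak_events + normal_events) * (1 + headroom) * (1 + growth)


# Assumed: 3 peaks per day, 600 seconds each.
estimate = events_per_day(normal_eps=0.6, peak_eps=154,
                          peaks_per_day=3, peak_duration_s=600)
print(f"{estimate:,.0f} events per day")
```

With these inputs the peaks last only 30 minutes a day in total, yet they account for most of the estimated volume, which is why peak behavior deserves careful measurement.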

The following table, provided by SANS, shows typical Average EPS (normal EPS) and Peak EPS for selected network devices. The data is several years old but can provide ballpark figures for your initial estimates.

 

To size your SIEM, conduct an inventory of the devices you intend to collect logs from. Multiply the number of similar devices by their estimated EPS and sum the results to get the total EPS across your network, then multiply by 86,400 seconds to get the total number of events per day.
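A device inventory can be rolled up into a network-wide estimate as follows. The inventory entries and their EPS values are hypothetical placeholders, not figures from the SANS table. Substitute your own measurements.

```python
SECONDS_PER_DAY = 86_400

# Hypothetical inventory: device type, how many, estimated EPS per device.
inventory = [
    {"device": "router", "count": 10, "eps": 0.6},
    {"device": "firewall", "count": 2, "eps": 50.0},
]


def total_eps(devices):
    """Sum count * per-device EPS over all device types."""
    return sum(d["count"] * d["eps"] for d in devices)


network_eps = total_eps(inventory)
network_events_per_day = network_eps * SECONDS_PER_DAY
```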

Storage Needs

A rule of thumb is that an average event occupies 300 bytes. So for every 1,000 EPS (86.4 million events per day), the SIEM needs to store about 25.9 GB of raw log data per day (86.4 million events * 300 bytes).
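The arithmetic behind this rule of thumb can be sketched in Python. The 300-byte average comes from the text above. The function name is an assumption for this example, and the result is raw volume before any compression or indexing overhead.

```python
SECONDS_PER_DAY = 86_400


def daily_storage_bytes(eps, avg_event_bytes=300):
    """Raw bytes of log data generated per day at a given EPS."""
    return eps * SECONDS_PER_DAY * avg_event_bytes


raw_bytes = daily_storage_bytes(1_000)
raw_gb = raw_bytes / 1e9   # decimal gigabytes
print(f"{raw_gb:.2f} GB per day")
```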

Hardware Sizing

After you determine your event velocity and volume, consider the following factors to size hardware for your SIEM:

  • Storage format--how will files be stored: in a flat file format, a relational database, or an unstructured data store like Hadoop?
  • Storage deployment and hardware--is it possible to move data to the cloud? If so, cloud services like Amazon S3 and Azure Blob Storage will be highly attractive for storing most SIEM data. If not, consider what storage resources are available locally, and whether to use commodity storage with Hadoop or NoSQL DBs, or high-performance storage appliances.
  • Log compression--what technology is available to compress log data? Many SIEM vendors advertise compression ratios of 1:8 or more.
  • Encryption--is there a need to encrypt data as it enters the SIEM data store? Determine software and hardware requirements.
  • Hot storage (short-term data)--needs high performance to enable real-time monitoring and analysis.
  • Long-term storage (data retention)--needs high-volume, low-cost storage media to enable maximum retention of historic data.
  • Failover and backup--as a mission-critical system, the SIEM should be built with redundancy, and be backed by a clear business continuity plan.
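Several of these factors can be combined into a rough capacity estimate. The sketch below assumes that hot data is stored uncompressed and that cold data is compressed at the advertised ratio. Both are simplifying assumptions for illustration, as are the function and parameter names. Replication, indexing and backup overhead are ignored.

```python
SECONDS_PER_DAY = 86_400


def retention_storage_gb(eps, retention_days, hot_days,
                         compression_ratio=8, avg_event_bytes=300):
    """Estimate hot and cold storage (decimal GB) for a retention period."""
    daily_gb = eps * SECONDS_PER_DAY * avg_event_bytes / 1e9
    hot_span = min(hot_days, retention_days)
    cold_span = max(retention_days - hot_days, 0)
    return {
        "hot_gb": daily_gb * hot_span,                       # uncompressed
        "cold_gb": daily_gb * cold_span / compression_ratio,  # compressed
    }


# Assumed scenario: 1,000 EPS, one year of retention, 30 days kept hot.
sizing = retention_storage_gb(eps=1_000, retention_days=365, hot_days=30)
```

In this scenario 30 days of hot data take roughly as much space as two thirds of the remaining 335 compressed days, so the length of the hot window is a major cost lever.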

Scalability and Data Lakes

In the past decade, networks have grown, the number of connected devices has exploded, and data volumes have risen exponentially. In addition, there is a growing need to have access to all historical data—not just a filtered, summarized version of the data—to enable deeper analysis. Modern SIEM technology can make sense of huge volumes of historic data and use it to discover new anomalies and patterns.

Next Gen SIEM

Another benefit of data lake storage is that hardware costs become predictable. You can simply add nodes to the data lake, running on commodity or cloud hardware, to grow data storage linearly. SIEMs based on data lake technology can easily add new data sources or expand data retention at low cost.

In 2015 O’Reilly released a report named The Security Data Lake, which offered a robust approach for storing SIEM data in a Hadoop data lake. The report clarifies that data lakes do not replace SIEMs—the SIEM is still needed for its ability to parse and make sense of log data from many different systems, and later analyze and extract insights and alerts from the data.

The data lake, as a companion to a SIEM, provides:

  • Nearly unlimited, low cost storage based on commodity devices.
  • New ways of processing big data—tools in the Hadoop ecosystem, such as Hive and Spark, enable fast processing of huge quantities of data, while enabling traditional SIEM infrastructure to query the data via SQL.
  • The possibility of retaining all data across a multitude of new data sources, like cloud applications, IoT and mobile devices.

Today, additional technical options exist for implementing data lakes besides the heavyweight Hadoop, including Elasticsearch, Cassandra and MongoDB.

SIEM Reporting, Dashboards and Visualization

The main purpose of a SIEM is to generate actionable insights for security teams. These come in several forms:

  • Dashboards--display status of security-related systems and metrics and highlight potential security issues
  • Alerts and notifications--prompt security staff to investigate an anomaly or apparent security issue
  • APIs and web services--enable the use of external systems, such as BI and behavioral analytics tools, to access SIEM data and analyze it from new perspectives
  • Data exploration--enable security staff to freely explore data to actively hunt for threats, or investigate a known security incident

Next Gen SIEM

Next-generation SIEMs use behavioral profiling and machine learning techniques to identify security incidents and help teams collect pertinent data for the incident, across devices, user profiles and time periods.

A dashboard and automatically-created incident ticket, provided by Exabeam’s next-generation SIEM platform.

SIEM Architecture: Then and Now

Historically, SIEMs were expensive, monolithic enterprise infrastructures, built with proprietary software and custom hardware provisioned to handle their large data volumes. Along with the software industry in general, SIEMs are evolving to become more agile and lightweight, and much smarter than they were before.

Next-generation SIEM solutions use a modern architecture that is more affordable, easier to implement, and helps security teams discover real security issues faster:

  • Modern data lake technology--offering big data storage with unlimited scalability, low cost and improved performance.
  • New managed hosting and management options--MSSPs are helping organizations implement SIEM, by running part of the infrastructure (on premises or on the cloud), and by providing expertise to manage security processes.
  • Dynamic scalability and predictable costs--SIEM administrators no longer need to meticulously calculate sizing, and make architectural changes when data volumes grow. SIEM storage can now grow dynamically and predictably when volumes increase.
  • Data enrichment with context--adding context to event data is essential for filtering out false positives, so that the SIEM can analyze the data and effectively detect and respond to real threats.
  • New insights with User and Entity Behavior Analytics (UEBA)--SIEM architectures today include advanced analytics components such as machine learning and behavioral profiling, which go beyond traditional correlations to discover new relationships and anomalies across huge data sets. Read more in our chapter on UEBA.
  • Powering incident response--modern SIEMs leverage Security Orchestration, Automation and Response (SOAR) technology that helps identify and automatically respond to security incidents, and supports incident investigation by Security Operations Center staff. Read more in our chapter on incident response.

To see an example of a modern SIEM architecture, see Exabeam’s Security Intelligence Platform.
