02

SIEM Architecture: Technology, Process and Data

In this chapter of the Essential Guide to SIEM, we explain how SIEM systems are built, how they go from raw event data to security insights, and how they manage event data on a huge scale. We cover both traditional SIEM platforms and newer SIEM architectures based on data lake technology.

Security Information and Event Management (SIEM) platforms collect log and event data from security systems, networks and computers, and turn it into actionable security insights. SIEM technology can help organizations detect threats that individual security systems cannot see, investigate past security incidents, perform incident response and prepare reports for regulation and compliance purposes.

In this Chapter You Will Learn:

  • The Log Management Process - data collection, data management and historic log retention

  • The Log Flow - from millions of events to a handful of meaningful alerts

  • SIEM Log Sources - security systems, network devices, cloud systems and more

  • SIEM Hosting Models - self-hosted self-managed, cloud-hosted self-managed, hybrid-managed, and fully-managed

  • SIEM Sizing - event velocity, calculating EPS and total event volume, hardware requirements and deployment options, including data lakes

  • SIEM Outputs - reporting, dashboards, visualizations and advanced analytics

12 Components and Capabilities in a SIEM Architecture

  • 01

    Data aggregation

    Collects and aggregates data from security systems and network devices

  • 02

    Threat intelligence feeds

    Combines internal data with third-party data on threats and vulnerabilities

  • 03

    Correlation and security monitoring

    Links events and related data into security incidents, threats or forensic findings

  • 04

    Analytics

    Uses statistical models and machine learning to identify deeper relationships between data elements

  • 05

    Alerting

    Analyses events and sends alerts to notify security staff of immediate issues

  • 06

    Dashboards

    Creates visualizations that let staff review event data and identify patterns and anomalies

  • 07

    Compliance

    Gathers log data for standards like HIPAA, PCI DSS, HITECH, SOX and GDPR and generates reports

  • 08

    Retention

    Stores long-term historical data, useful for compliance and forensic investigations

  • 09

    Forensic analysis

    Enables exploration of log and event data to discover details of a security incident

  • 10

    Threat hunting

    Enables security staff to run queries on log and event data to proactively uncover threats

  • 11

    Incident response

    Helps security teams identify and respond to security incidents, bringing in all relevant data rapidly

  • 12

    SOC automation

    Advanced SIEMs can automatically respond to incidents by orchestrating security systems, a capability known as Security Orchestration, Automation and Response (SOAR)

The Log Management Process

A SIEM server, at its root, is a log management platform. Log management involves collecting the data, managing it to enable analysis, and retaining historical data.

Data Collection

SIEMs collect logs and events from hundreds of organizational systems (for a partial list, see SIEM Logging Sources below). Each device generates an event every time something happens and records the events in a flat log file or database. The SIEM can collect data in four ways:

  1. Via an agent installed on the device (the most common method)
  2. By directly connecting to the device using a network protocol or API call
  3. By accessing log files directly from storage, typically in Syslog format
  4. Via an event streaming protocol like SNMP, NetFlow or IPFIX

The SIEM is tasked with collecting data from the devices, standardizing it and saving it in a format that enables analysis.
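As a rough illustration of the standardizing step, the Python sketch below parses one classic BSD-style (RFC 3164) syslog line into a flat record with named fields. It is a minimal sketch under stated assumptions, not how any particular SIEM parses data: the `normalize_syslog` function, its field names, and the assumption of English month abbreviations are illustrative choices. Because RFC 3164 timestamps omit the year, the caller passes it in.

```python
import re
from datetime import datetime
from typing import Optional

# Classic BSD-style (RFC 3164) syslog line, e.g.
# "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def normalize_syslog(line: str, year: int) -> Optional[dict]:
    """Parse one syslog line into a standardized record; return None if it doesn't match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    pri = int(m.group("pri"))
    # RFC 3164 timestamps have no year, so it is supplied by the caller.
    ts = datetime.strptime(f"{year} {m.group('timestamp')}", "%Y %b %d %H:%M:%S")
    return {
        "timestamp": ts.isoformat(),
        "facility": pri // 8,  # PRI = facility * 8 + severity
        "severity": pri % 8,
        "host": m.group("host"),
        "program": m.group("tag"),
        "pid": int(m.group("pid")) if m.group("pid") else None,
        "message": m.group("message"),
    }
```

Once every source is mapped to the same field names, downstream correlation and search can treat a router message and a server message the same way, which is the point of normalization.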

Next Gen SIEM

Next-generation SIEMs come pre-integrated with common cloud systems and data sources, allowing you to pull log data directly. Many managed cloud services and SaaS applications do not allow you to install traditional SIEM collectors, making direct integration between SIEM and cloud systems critical for visibility.

Data Management

SIEMs, especially at large organizations, can store mind-boggling amounts of data. The data needs to be:

  • Stored—either on-premises, in the cloud or both
  • Optimized and indexed—to enable efficient analysis and exploration
  • Tiered—hot data necessary for live security monitoring should be on high performance storage, whereas cold data, which you may one day want to investigate, should be relegated to high-volume inexpensive storage mediums
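A minimal sketch of the tiering decision, assuming a purely age-based policy: events newer than a hypothetical seven-day `HOT_WINDOW` stay on hot storage and everything older moves to cold storage. The window length and the `storage_tier` function are illustrative; a real deployment would derive the boundary from its monitoring and investigation needs.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical boundary between hot and cold storage.
HOT_WINDOW = timedelta(days=7)


def storage_tier(event_time: datetime, now: datetime) -> str:
    """Return 'hot' for recent events used in live monitoring, 'cold' for older ones."""
    if now - event_time <= HOT_WINDOW:
        return "hot"   # high-performance storage for live security monitoring
    return "cold"      # high-volume, inexpensive storage for later investigation


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(storage_tier(now - timedelta(days=1), now))   # recent event
print(storage_tier(now - timedelta(days=30), now))  # older event
```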

Next Gen SIEM

Next-generation SIEMs are increasingly based on modern data lake technology such as Amazon S3, Hadoop or Elasticsearch, enabling practically unlimited data storage at low cost.

Log Retention

Industry standards like PCI DSS, HIPAA and SOX require that logs be retained for between 1 and 7 years. Large enterprises create a very high volume of logs every day from IT systems (see SIEM Sizing below). SIEMs need to be smart about which logs they retain for compliance and forensic requirements. SIEMs use the following strategies to reduce log volumes:

  • Syslog servers—syslog is a standard which normalizes logs, retaining only essential information in a standardized format. Syslog lets you compress logs and retain large quantities of historical data.
  • Deletion schedules—SIEMs automatically purge old logs that are no longer needed for compliance.
  • Log filtering—not all logs are really needed for the compliance requirements faced by your organization, or for forensic purposes. Logs can be filtered by source system, times, or by other rules defined by the SIEM administrator.
  • Summarization—log data can be summarized to maintain only important data elements such as the count of events, unique IPs, etc.
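Three of these strategies can be sketched in a few lines of Python. This is an illustrative sketch only: the `RETENTION` periods, the `NOISE_TYPES` filter rule, and the event fields (`source`, `type`, `src_ip`, `time`) are hypothetical placeholders, not values required by any standard or product.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical deletion schedule: how long raw events from each source are kept.
RETENTION = {"firewall": timedelta(days=365), "dns": timedelta(days=90)}
DEFAULT_RETENTION = timedelta(days=30)
# Hypothetical filter rule: event types treated as noise by the administrator.
NOISE_TYPES = {"heartbeat", "debug"}


def apply_retention(events, now):
    """Deletion schedule plus log filtering: keep in-retention, non-noise events."""
    kept = []
    for e in events:
        limit = RETENTION.get(e["source"], DEFAULT_RETENTION)
        if now - e["time"] > limit:
            continue  # past its retention period, so purge it
        if e["type"] in NOISE_TYPES:
            continue  # dropped by an administrator-defined filter rule
        kept.append(e)
    return kept


def summarize(events):
    """Summarization: keep only aggregates such as event counts and unique IPs."""
    return {
        "event_count": len(events),
        "unique_src_ips": len({e["src_ip"] for e in events}),
        "by_type": dict(Counter(e["type"] for e in events)),
    }
```

Summaries are far smaller than raw events, but they cannot be re-examined later in ways nobody anticipated, which is the trade-off the Next Gen SIEM note below addresses.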

Next Gen SIEM

Historic logs are not only useful for compliance and forensics. They can also be used for deep behavioral analysis. Next-generation SIEMs provide User and Entity Behavioral Analytics (UEBA) technology, which uses machine learning and behavioral profiling to intelligently identify anomalies or trends, even if they weren’t captured in the rules or statistical correlations of traditional SIEMs.


Next-generation SIEMs leverage low-cost distributed storage, allowing organizations to retain full source data. This enables deep behavioral analysis of historic data, to catch a broader range of anomalies and security issues.
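To make "behavioral profiling" concrete, the sketch below builds a per-user baseline from historic daily event counts and flags a day that sits far above it. This is a deliberately simple statistical stand-in (a z-score test with a hypothetical threshold of 3), not the machine learning used by UEBA products; it only shows why a long history of full source data is useful for spotting deviations.

```python
from statistics import mean, stdev


def is_anomalous(history, today, threshold=3.0):
    """Flag today's count if it is more than `threshold` std devs above the baseline."""
    if len(history) < 2:
        return False  # not enough history to build a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today > mu  # flat history: any increase is unusual
    return (today - mu) / sigma > threshold


# Hypothetical daily login counts for one user over eight days.
baseline = [10, 12, 11, 9, 10, 11, 12, 10]
print(is_anomalous(baseline, 40))  # far above this user's normal range
print(is_anomalous(baseline, 12))  # within normal variation
```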

The Log Flow

A SIEM captures 100% of log data from across your organization. But then data starts to flow down the log funnel, and hundreds of millions of log entries can be whittled down to only a handful of actionable security alerts.

SIEMs filter out noise in logs to keep pertinent data only. Then they index and optimize the relevant data to enable analysis. Finally, around 1% of data, which is the most relevant for your security posture, is correlated and analyzed in more depth. Of those correlations, the ones which exceed security thresholds become security alerts.
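The funnel can be illustrated with a toy pipeline: drop noise, index what remains, correlate related events inside a time window, and raise an alert only when a threshold is crossed. The sketch assumes a single hypothetical correlation rule (five or more failed logins from one source IP within 300 seconds); the rule, constants, and event fields are illustrative, not taken from any SIEM.

```python
from collections import defaultdict

# Hypothetical noise filter and correlation rule.
NOISE = {"heartbeat", "health_check"}
FAILED_LOGIN_THRESHOLD = 5  # failed logins from one IP...
WINDOW_SECONDS = 300        # ...within this many seconds


def log_funnel(events):
    """events: dicts with 'time' (epoch seconds), 'type' and 'src_ip'."""
    # Stage 1: filter out noise.
    relevant = [e for e in events if e["type"] not in NOISE]
    # Stage 2: index relevant events (failed logins grouped by source IP).
    by_ip = defaultdict(list)
    for e in relevant:
        if e["type"] == "failed_login":
            by_ip[e["src_ip"]].append(e["time"])
    # Stages 3 and 4: correlate in a sliding window, alert above the threshold.
    alerts = []
    for ip, times in by_ip.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            while times[end] - times[start] > WINDOW_SECONDS:
                start += 1
            if end - start + 1 >= FAILED_LOGIN_THRESHOLD:
                alerts.append({"src_ip": ip, "count": end - start + 1,
                               "first_seen": times[start]})
                break  # one alert per IP is enough for this sketch
    return alerts
```

Fed 100 heartbeat events plus a handful of failed logins, the function returns at most one alert per noisy IP, which mirrors the "many events in, few alerts out" shape of the funnel.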

[Figure: Log flow diagram]

SIEM Logging Sources

Which organizational systems feed their logs to the SIEM? And which other business data is of interest to a SIEM?

  • Security Events

    • Intrusion Detection Systems
    • Endpoint Security (Antivirus, antimalware)
    • Data Loss Prevention
    • VPN Concentrators
    • Web Filters
    • Honeypots
    • Firewalls
  • Network Logs

    • Routers
    • Switches
    • DNS Servers
    • Wireless Access Points
    • WAN
    • Data Transfers
    • Private Cloud Networks (VPC)
  • Applications and Devices

    • Application Servers
    • Databases
    • Intranet Applications
    • Web Applications
    • SaaS Applications
    • Cloud-Hosted Servers
    • End-User Laptops or Desktops
    • Mobile Devices
  • IT Infrastructure

    • Configuration
    • Locations
    • Owners
    • Network Maps
    • Vulnerability Reports
    • Software Inventory

Next Gen SIEM

Until recently, SIEMs couldn’t access log and event data from cloud infrastructure like AWS or Microsoft Azure, or SaaS applications like Salesforce and Google Apps. This created a huge blind spot in security monitoring. Some next-generation solutions come with pre-built connectors and SIEM integrations with modern cloud technology.

SIEM Hosting Models

Where is the SIEM hosted, and who manages it? The following models range from fully in-house to fully outsourced to a managed security service provider (MSSP).

  • Self-Hosted, Self-Managed

    This is the traditional SIEM deployment model—host the SIEM in your data center, often with a dedicated SIEM appliance, maintain storage systems, and manage it with trained security personnel. This model made SIEM a notoriously complex and expensive infrastructure to maintain.

  • Cloud SIEM, Self-Managed

    MSSP Handles: Receiving events from organizational systems, collection and aggregation.

    You Handle: Correlation, analysis, alerting and dashboards, security processes leveraging SIEM data.

  • Self-Hosted, Hybrid-Managed

    You Handle: Purchasing software and hardware infrastructure.

    MSSP Together with Your Security Staff: Deploying SIEM event collection / aggregation, correlation, analysis, alerting and dashboards.

  • SIEM as a Service

    MSSP Handles: Event collection, aggregation, correlation, analysis, alerting and dashboards.

    You Handle: Security processes leveraging SIEM data.

Which Hosting Model is Right for You?

The following considerations can help you select a SIEM deployment model:

  • Do you have an existing SIEM infrastructure? If you’ve already purchased the hardware and software, opt for self-hosted self-managed, or leverage an MSSP’s expertise to jointly manage the SIEM with your local team.
  • Are you able to move data off-premises? If so, a cloud-hosted or fully managed model can reduce costs and management overhead.
  • Do you have security staff with SIEM expertise? The human factor is crucial in getting true value from a SIEM. If you don’t have trained security staff, rent the analysis services via a hybrid-managed or SIEM as a Service model.

SIEM Sizing: Velocity, Volume and Hardware Requirements

A majority of SIEMs today are deployed on-premises. This requires organizations to carefully consider the size of log and event data they are generating, and the system resources required to manage it.

Calculating Velocity: Events Per Second (EPS)

A common measure of velocity is Events Per Second (EPS), defined as:

EPS = Number of Events Logged / Time Period in Seconds

EPS can vary between normal and peak times. For example, a Cisco router might generate 0.6 events per second on average, but during peak times, such as during an attack, it can generate as many as 154 EPS.

According to the SIEM Benchmarking Guide by the SANS Institute, organizations should strike a balance between normal and peak EPS measurements. It’s not practical, or necessary, to build a SIEM to handle peak EPS for all network devices, because it’s unlikely all devices will hit their peak at once. On the other hand, you must plan for crisis situations, in which the SIEM will be most needed.

A Simple Model for Predicting EPS During Normal and Peak Times

  1. Measure Normal EPS and Peak EPS, by looking at 90 days of data for the target system
  2. Estimate the Number of Peaks per Day
  3. Estimate the Duration in Seconds of a Peak, and by extension, Total Peak Seconds Per Day
  4. Calculate Total Peak Events per Day = (Total Peak Seconds Per Day) * Peak EPS
  5. Calculate Total Normal Events per Day = (86,400 - Total Peak Seconds Per Day) * Normal EPS, where 86,400 is the number of seconds in a day

The sum of these two numbers is the estimated total number of events per day.

In addition, the SANS guide recommends adding:

  • 10% for headroom
  • 10% for growth

The final estimate of Events Per Day is therefore:

(Total Peak Events per Day + Total Normal Events Per Day) * 110% headroom * 110% growth
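The model above translates directly into a small calculator. This is a sketch: the `events_per_day` function and its parameter names are hypothetical, and the example reuses the Cisco router figures from the text (0.6 normal EPS, 154 peak EPS) with an assumed single 60-second peak per day.

```python
SECONDS_PER_DAY = 86_400


def events_per_day(normal_eps, peak_eps, peaks_per_day, peak_duration_s,
                   headroom=0.10, growth=0.10):
    """Estimate daily event volume with the normal/peak model plus headroom and growth."""
    total_peak_seconds = peaks_per_day * peak_duration_s
    if total_peak_seconds > SECONDS_PER_DAY:
        raise ValueError("total peak time cannot exceed one day")
    peak_events = total_peak_seconds * peak_eps
    normal_events = (SECONDS_PER_DAY - total_peak_seconds) * normal_eps
    return (peak_events + normal_events) * (1 + headroom) * (1 + growth)


print(round(events_per_day(0.6, 154, peaks_per_day=1, peak_duration_s=60)))
```

For this router the estimate is about 73,863 events per day: 9,240 peak events plus 51,804 normal events, multiplied by 1.1 twice (a combined factor of 1.21).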


The following table, provided by SANS, shows typical Average EPS (normal EPS) and Peak EPS for selected network devices. The data is several years old but can provide ballpark figures for your initial estimates.

 

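To size your SIEM, conduct an inventory of the devices you intend to collect logs from. Multiply the number of similar devices by their estimated EPS and add up the results across device types to get the total EPS for your network; then multiply by 86,400 seconds, or apply the normal/peak model above, to get the total number of Events Per Day.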
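The inventory step can be sketched as a dictionary of device types. The router figures echo the Cisco example in the text; the firewall and server counts and EPS values are hypothetical placeholders, not SANS data, and should be replaced with figures from your own environment.

```python
# Hypothetical inventory: device type -> (device count, average EPS, peak EPS per device).
INVENTORY = {
    "router": (20, 0.6, 154.0),
    "firewall": (4, 40.0, 1200.0),        # placeholder values
    "windows_server": (150, 0.5, 60.0),   # placeholder values
}


def total_eps(inventory):
    """Return (total normal EPS, total peak EPS) across all devices."""
    normal = sum(count * avg for count, avg, _ in inventory.values())
    peak = sum(count * pk for count, _, pk in inventory.values())
    return normal, peak


normal, peak = total_eps(INVENTORY)
print(f"normal ~ {normal:.0f} EPS, all-devices-peak ~ {peak:.0f} EPS")
```

The summed peak assumes every device peaks at the same moment, so it is an upper bound. As the SANS guidance notes, building for that worst case is usually neither practical nor necessary.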

Storage Needs

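A rule of thumb is that an average event occupies 300 bytes. So for every 1,000 EPS (86.4 million Events Per Day), the SIEM needs to store about 25.9 GB of raw event data per day (86.4 million × 300 bytes), or roughly 9.5 TB per year, before compression.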
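The same rule of thumb as a function, with an optional compression ratio so that vendor-advertised figures can be plugged in. The `storage_bytes` name and its defaults are illustrative assumptions, and the result uses decimal units (1 GB = 10^9 bytes).

```python
SECONDS_PER_DAY = 86_400


def storage_bytes(eps, days, bytes_per_event=300, compression_ratio=1.0):
    """Storage needed to keep `days` of events arriving at a steady `eps`."""
    raw = eps * SECONDS_PER_DAY * days * bytes_per_event
    return raw / compression_ratio


print(storage_bytes(1000, 1) / 1e9)                          # GB per day, uncompressed
print(storage_bytes(1000, 365) / 1e12)                       # TB per year, uncompressed
print(storage_bytes(1000, 365, compression_ratio=8) / 1e12)  # TB per year at 8:1
```

At 1,000 EPS this gives 25.92 GB per day and about 9.46 TB per year uncompressed, or about 1.18 TB per year if an 8:1 compression ratio is actually achieved.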

Hardware Sizing

After you determine your event velocity and volume, consider the following factors to size hardware for your SIEM:

  • Storage format-how will files be stored? Using a flat file format, a relational database or an unstructured data store like Hadoop?
  • Storage deployment and hardware-is it possible to move data to the cloud? If so, cloud services like Amazon S3 and Azure Blob Storage will be highly attractive for storing most SIEM data. If not, consider what storage resources are available locally, and whether to use commodity storage with Hadoop or NoSQL DBs, or high-performance storage appliances.
  • Log compression-what technology is available to compress log data? Many SIEM vendors advertise compression ratios of 8:1 or more.
  • Encryption-is there a need to encrypt data as it enters the SIEM data store? Determine software and hardware requirements.
  • Hot storage (short-term data)-needs high performance to enable real-time monitoring and analysis.
  • Long-term storage (data retention)-needs high volume, low cost storage media to enable maximum retention of historic data.
  • Failover and backup-as a mission-critical system, the SIEM should be built with redundancy and be backed by a clear business continuity plan.
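Before relying on an advertised compression ratio, it can help to measure one on a sample of your own logs. The sketch below uses Python's standard `zlib` module as a stand-in for whatever codec a SIEM actually uses; the synthetic firewall-style lines are hypothetical and far more repetitive than most real logs, so the ratio they produce says nothing about a specific product.

```python
import zlib


def compression_ratio(lines):
    """Raw size divided by zlib-compressed size (8.0 means 8:1)."""
    raw = "\n".join(lines).encode("utf-8")
    compressed = zlib.compress(raw, level=9)
    return len(raw) / len(compressed)


# Synthetic, highly repetitive firewall-style lines (hypothetical sample data).
sample = [
    f"<134>Jun  1 12:{i // 60 % 60:02d}:{i % 60:02d} fw01 kernel: DROP "
    f"SRC=10.0.{i % 4}.{i % 250} DST=192.168.1.10 PROTO=TCP DPT=443"
    for i in range(5000)
]
print(f"{compression_ratio(sample):.1f}:1")
```

Running the same function over an export of real events gives a more realistic input for the storage calculation than a vendor headline figure.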

Scalability and Data Lakes

In the past decade, networks have grown, the number of connected devices has exploded, and data volumes have shot up exponentially. In addition, there is a growing need to have access to all historic data—not just a filtered, summarized version of the data—to enable deeper analysis. Modern SIEM technology can make sense of huge volumes of historic data and use it to discover new anomalies and patterns.

Next Gen SIEM

Another benefit of data lake storage is that hardware costs become predictable. You can simply add nodes to the data lake, running on commodity or cloud hardware, to grow data storage linearly. SIEMs based on data lake technology can easily add new data sources or expand data retention at low cost.

In 2015 O’Reilly released a report named The Security Data Lake, which offered a robust approach for storing SIEM data in a Hadoop data lake. The report clarifies that data lakes do not replace SIEMs—the SIEM is still needed for its ability to parse and make sense of log data from many different systems, and later analyze and extract insights and alerts from the data.

The data lake, as a companion to a SIEM, provides:

  • Nearly unlimited, low cost storage based on commodity devices.
  • New ways of processing big data—tools in the Hadoop ecosystem, such as Hive and Spark, enable fast processing of huge quantities of data, while enabling traditional SIEM infrastructure to query the data via SQL.
  • The possibility of retaining all data across a multitude of new data sources, like cloud applications, IoT and mobile devices.
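The idea of querying stored security data with SQL can be shown in miniature. In this sketch Python's built-in `sqlite3` stands in for a SQL-on-data-lake engine such as Hive or Spark SQL, purely for illustration; the `events` table, its columns, and the rows are hypothetical.

```python
import sqlite3

# In-memory database standing in for a data lake table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, source TEXT, src_ip TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("2024-06-01T10:00:00", "firewall", "10.0.0.5", "deny"),
        ("2024-06-01T10:00:05", "firewall", "10.0.0.5", "deny"),
        ("2024-06-01T10:01:00", "vpn", "10.0.0.7", "login"),
        ("2024-06-01T10:02:00", "firewall", "10.0.0.9", "allow"),
    ],
)

# A hunting-style question: which source IPs were denied most often?
rows = conn.execute(
    """
    SELECT src_ip, COUNT(*) AS denies
    FROM events
    WHERE action = 'deny'
    GROUP BY src_ip
    ORDER BY denies DESC
    """
).fetchall()
print(rows)  # [('10.0.0.5', 2)]
```

The same query shape works in Hive or Spark SQL over far larger tables, which is why SQL access lets existing SIEM tooling reach data kept in a data lake.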

Today, additional technical options exist for implementing data lakes besides the heavyweight Hadoop, including Elasticsearch, Cassandra and MongoDB.

SIEM Reporting, Dashboards and Visualization

The main purpose of a SIEM is to generate actionable insights for security teams. These come in several forms:

  • Dashboards--display status of security-related systems and metrics and highlight potential security issues
  • Alerts and notifications--prompt security staff to investigate an anomaly or apparent security issue
  • APIs and web services--enable the use of external systems, such as BI and behavioral analytics tools, to access SIEM data and analyze it from new perspectives
  • Data exploration--enable security staff to freely explore data to actively hunt for threats, or investigate a known security incident

Next Gen SIEM

Next-generation SIEMs use behavioral profiling and machine learning techniques to identify security incidents and help teams collect pertinent data for the incident, across devices, user profiles and time periods.

A dashboard and automatically-created incident ticket, provided by Exabeam’s next-generation SIEM platform.

SIEM Architecture: Then and Now

Historically, SIEMs were an expensive, monolithic enterprise infrastructure, built with proprietary software and custom hardware provisioned to handle its large data volumes. Along with the software industry in general, SIEMs are evolving to become more agile and lightweight, and much smarter than they were before.

Next-generation SIEM solutions use a modern architecture that is more affordable, easier to implement, and helps security teams discover real security issues faster:

  • Modern data lake technology--offering big data storage with unlimited scalability, low cost and improved performance.
  • New managed hosting and management options--MSSPs are helping organizations implement SIEM, by running part of the infrastructure (on-premises or on the cloud), and by providing expertise to manage security processes.
  • Dynamic scalability and predictable costs--SIEM administrators no longer need to meticulously calculate sizing or make architectural changes when data volumes grow. SIEM storage can now grow dynamically and predictably when volumes increase.
  • New insights with User and Entity Behavioral Analytics (UEBA)--SIEM architectures today include advanced analytics components such as machine learning and behavioral profiling, which go beyond traditional correlations to discover new relationships and anomalies across huge data sets. Read more in our chapter on UEBA.
  • Powering incident response--modern SIEMs leverage Security Orchestration, Automation and Response (SOAR) technology that helps identify and automatically respond to security incidents, and supports incident investigation by Security Operations Center (SOC) staff. Read more in our chapter on incident response.

To see an example of a modern SIEM architecture, see Exabeam’s Security Intelligence Platform.
