Log Aggregation, Processing and Analysis for Security - Exabeam

Logs and events are a foundation of modern security monitoring, investigation, forensics and SIEM systems. In this chapter you’ll learn in depth how logs are aggregated, processed and stored, and how they are used in the security operations center (SOC).

What is Log Aggregation?

Log aggregation is the process of collecting logs from multiple computing systems, parsing them and extracting structured data, and putting them together in a format that is easily searchable and explorable by modern data tools.

There are four common ways to aggregate logs – many log aggregation systems combine multiple methods.


Syslog

Syslog is a standard logging protocol. Network administrators can set up a Syslog server that receives logs from multiple systems, storing them in an efficient, condensed format which is easily queryable.

Log aggregators can directly read and process Syslog data.
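As a rough illustration of what an aggregator reads from a Syslog message, the sketch below decodes the priority value at the start of a message. It assumes the classic layout in which a message begins with `<PRI>`; the function name `decode_priority` and the sample message are illustrative only.

```python
import re

# Severity names defined by the syslog protocol, indexed by severity code 0-7.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)

def decode_priority(raw: str) -> dict:
    """Split a raw syslog message into facility, severity and remaining text."""
    match = PRI_PATTERN.match(raw)
    if match is None:
        raise ValueError("message does not start with a <PRI> field")
    pri = int(match.group(1))
    return {
        "facility": pri // 8,             # facility code is PRI divided by 8
        "severity": SEVERITIES[pri % 8],  # severity code is PRI modulo 8
        "message": match.group(2),
    }

sample = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
print(decode_priority(sample))
```

A priority of 34 decodes to facility 4 (security/authorization messages) and severity 2 (critical).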

Event Streaming

Protocols like SNMP, NetFlow and IPFIX allow network devices to provide standard information about their operations, which can be received by the log aggregator, parsed and added to central log storage.

Log Collectors

Log collectors are software agents that run on network devices, capture log information, parse it and send it to a centralized aggregator component for storage and analysis.
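The core loop of such an agent might look like the following minimal sketch. It is not taken from any particular collector: `read_new_lines`, `to_payload`, the host name and the JSON envelope are all hypothetical, and the actual network send is left out.

```python
import json
import os
import tempfile
from typing import List, Tuple

def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended to `path` since byte `offset`, plus the new offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Consume only up to the last newline so a half-written line is kept for the next pass.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end

def to_payload(host: str, lines: List[str]) -> str:
    """Wrap collected lines in a JSON envelope that an aggregator could ingest."""
    return json.dumps({"host": host, "events": lines})

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")
    with open(log_path, "wb") as f:
        f.write(b"service started\nuser login fai")  # the last line is incomplete
    lines, offset = read_new_lines(log_path, 0)
    print(to_payload("web-01", lines), offset)
```

Remembering the byte offset between passes lets the agent resume where it stopped without re-sending lines it has already forwarded.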

Direct Access

Log aggregators can directly access network devices or computing systems, using an API or network protocol to directly receive logs. This approach requires custom integration for each data source.

What is Log Processing?

Log processing is the art of taking raw system logs from multiple sources, identifying their structure or schema, and turning them into a consistent, standardized data source.

The Log Processing Flow


Log Parsing

Each log has a repeating data format which includes data fields and values. However, the format varies between systems, even between different logs on the same system.

A log parser is a software component that can take a specific log format and convert it to structured data. Log aggregation software includes dozens or hundreds of parsers written to process logs for common systems.
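A parser for one specific format can be sketched with a regular expression. The pattern below is an assumption made for illustration: it targets lines shaped like `<timestamp> <host> <process>[<pid>]: <message>`, and the sample line is invented.

```python
import re
from typing import Optional

# Hypothetical pattern for "<timestamp> <host> <process>[<pid>]: <message>" lines.
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)

def parse_line(line: str) -> Optional[dict]:
    """Convert one raw log line to a dict of named fields, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None

print(parse_line("Jan 18 11:07:53 dsmhost sshd[4321]: Accepted password for alice"))
```

Returning `None` for lines that do not match lets the caller try the next parser, which is one way an aggregator can hold many parsers side by side.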


Log Normalization and Categorization

Normalization merges events containing different data into a reduced format which contains common event attributes. Most logs capture the same basic information – time, network address, operation performed, etc.

Categorization involves adding meaning to events – identifying log data related to system events, authentication, local/remote operations, etc.
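The two steps can be sketched as a field mapping followed by a category lookup. Everything here is an assumption for illustration: the source field names, the common schema (`time`, `source_ip`, `operation`) and the category table are invented, although 4625 is the Windows event ID for a failed logon.

```python
# Two made-up parsed events from different sources, with different field names.
firewall_event = {"ts": "2018-05-11T10:29:47Z", "src": "10.0.0.5", "action": "deny"}
windows_event = {"TimeGenerated": "2018-05-11T10:30:02Z", "IpAddress": "10.0.0.5", "EventID": 4625}

# Per-source mapping from original field names to the common schema.
FIELD_MAPS = {
    "firewall": {"ts": "time", "src": "source_ip", "action": "operation"},
    "windows": {"TimeGenerated": "time", "IpAddress": "source_ip", "EventID": "operation"},
}

# Hypothetical category table keyed by the normalized operation value.
CATEGORIES = {"deny": "network", 4625: "authentication"}

def normalize(event: dict, source: str) -> dict:
    """Rename source-specific fields to common names, then attach a category."""
    mapping = FIELD_MAPS[source]
    normalized = {common: event[original] for original, common in mapping.items() if original in event}
    normalized["source_type"] = source
    normalized["category"] = CATEGORIES.get(normalized.get("operation"), "other")
    return normalized

print(normalize(firewall_event, "firewall"))
print(normalize(windows_event, "windows"))
```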


Log Enrichment

Log enrichment involves adding important information that can make the data more useful.

For example, if the original log contained IP addresses, but not actual physical locations of the users accessing a system, a log aggregator can use a geolocation data service to find out locations and add them to the data.
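A minimal sketch of the geolocation example follows. The lookup table is a hypothetical stand-in for a real geolocation data service; the networks are reserved documentation ranges and the locations are invented.

```python
import ipaddress

# Hypothetical lookup table standing in for a geolocation data service.
GEO_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "Sydney, AU",
    ipaddress.ip_network("198.51.100.0/24"): "Toronto, CA",
}

def enrich_with_location(event: dict) -> dict:
    """Return a copy of the event with a `location` field derived from its source IP."""
    address = ipaddress.ip_address(event["source_ip"])
    location = next((place for net, place in GEO_TABLE.items() if address in net), "unknown")
    return {**event, "location": location}

print(enrich_with_location({"source_ip": "203.0.113.42", "operation": "login"}))
```

The original event is left unchanged and a new dict is returned, so the raw record can still be stored alongside the enriched one.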


Log Indexing

Modern networks generate huge volumes of log data. To effectively search and explore log data, you need to create an index of common attributes across all log data.

Searches or data queries that use the index keys can be an order of magnitude faster, compared to a full scan of all log data.
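The difference between an indexed lookup and a full scan can be shown with a small in-memory sketch. The event records and function names are illustrative; a production index would live in a search engine or database rather than a Python dict.

```python
from collections import defaultdict

events = [
    {"id": 0, "user": "alice", "action": "login"},
    {"id": 1, "user": "bob", "action": "login"},
    {"id": 2, "user": "alice", "action": "file_read"},
    {"id": 3, "user": "carol", "action": "login"},
]

def build_index(events, field):
    """Map each value of `field` to the positions of the events that contain it."""
    index = defaultdict(list)
    for position, event in enumerate(events):
        index[event[field]].append(position)
    return index

def full_scan(events, field, value):
    """Inspect every event: cost grows with the total number of events."""
    return [event for event in events if event[field] == value]

def indexed_lookup(events, index, value):
    """Jump straight to matching events: cost grows with the number of matches."""
    return [events[position] for position in index.get(value, [])]

user_index = build_index(events, "user")
print(indexed_lookup(events, user_index, "alice"))
```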


Log Storage

Because of the massive volumes of logs, and their exponential growth, log storage is rapidly evolving. Historically, log aggregators would store logs in a centralized repository. Today, logs are increasingly stored on data lake technology, such as Amazon S3 or Hadoop.

Data lakes can support unlimited storage volumes with low incremental storage cost, and can provide access to the data via distributed processing engines like MapReduce, or modern high performance analytics tools.

Log Types

Almost every computing system generates logs. Below are a few of the most common sources of log data.

Endpoint Logs

An endpoint is a computing device within a network – such as a desktop, laptop, smartphone, server or workstation. Endpoints generate multiple logs, from different levels of their software stack – hardware, operating system, middleware and database, and applications. Endpoint logs are taken from the lower levels of the stack, and used to understand the status, activity and health of the endpoint device.

Router Logs

Network devices like routers, switches and load balancers are the backbone of network infrastructure. Their logs provide critical data about traffic flows, including destinations visited by internal users, sources of external traffic, traffic volumes, protocols used, and more. Routers typically transmit data via the Syslog format, and data can be captured and analyzed via your network’s Syslog servers.

Application Event Logs

Applications running on servers or end user devices generate and log events. The Windows operating system provides a centralized event log that collects startup, shutdown, heartbeat and run-time error events from running applications. In Linux, application log messages can be found in the /var/log folder. In addition, log aggregators can directly collect and parse logs from enterprise applications, such as email, web or database servers.

IoT Logs

A new and growing source of log data is Internet of Things (IoT) connected devices. IoT devices may log their own activity and/or sensor data captured by the device. IoT visibility is a major challenge for most organizations, as many devices have no logging at all, or save log data to local file systems, limiting the ability to access or aggregate it. Advanced IoT deployments save log data to a central cloud service; many use syslog-ng, an open source implementation of the syslog protocol that focuses on portability and central log collection.

Common Log Formats

Common log formats include CSV, JSON, key-value pairs and Common Event Format (CEF).

  • CSV Log Format
    5:39:55 → Time
    [Fname, Lname, email address] → User Credentials
    Sign-in Failed → Authentication Event
    IP address → IP
    /app/office365 → App User Signed Into
  • JSON Log Format
    MachineName → User’s host
    Message → The event is a Kerberos service ticket (user already authenticated and sending access request for specific service)
    TimeGenerated → Time of event
    TargetUserName → Username attempting to login
    TargetDomainName → Domain user attempted to login to
    ServiceName → Service user attempted to log into
  • Common Event Format (CEF)
    CEF is an open log management standard that makes it easier to share security-related data from different network devices and applications. It also provides a common event log format, making it easier to collect and aggregate log data. CEF uses the syslog message format.
  • CEF Header Fields
    CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension

    Trend Micro|Deep Security Manager|3.5.4 → Device Vendor, Device Product and Device Version, which together uniquely identify the sending device. No two products may use the same vendor-product pair.
    600 → Signature ID, a unique identifier per event type. For example, in IDS systems each signature or rule has a unique Signature ID.
    Administrator SignedIn → Name, a human-readable description of the event
    4 → Severity of the event, from 0 to 10
    suser=Master… → Extension, a collection of key-value pairs which allow the log entry to contain additional info, from an extensive Extension Dictionary including keys like deviceAction, applicationProtocol, deviceHostName, destinationAddress and destinationPort, or custom keys.
  • Sample Log Entry
    Jan 18 11:07:53 dsmhost CEF:0|Trend Micro|Deep Security Manager|3.5.4|600|Administrator SignedIn|4|suser=Master…
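The sample entry above can be split into its header fields with a few lines of code. This is a simplified sketch rather than a complete CEF parser: it assumes no escaped pipe characters appear in the header, it leaves the extension as one unparsed string, and the field names in `HEADER_FIELDS` are chosen here for readability.

```python
HEADER_FIELDS = [
    "version", "device_vendor", "device_product", "device_version",
    "signature_id", "name", "severity",
]

def parse_cef(line: str) -> dict:
    """Split a syslog line carrying a CEF message into header fields and extension."""
    start = line.find("CEF:")
    if start == -1:
        raise ValueError("no CEF header found")
    # Seven header fields, then everything after the seventh pipe is the extension.
    parts = line[start + len("CEF:"):].split("|", 7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    event = dict(zip(HEADER_FIELDS, parts[:7]))
    event["syslog_prefix"] = line[:start].strip()
    event["extension"] = parts[7] if len(parts) > 7 else ""
    return event

sample = (
    "Jan 18 11:07:53 dsmhost CEF:0|Trend Micro|Deep Security Manager|3.5.4|600|"
    "Administrator SignedIn|4|suser=Master…"
)
print(parse_cef(sample))
```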

What is Log Monitoring?

There is a wealth of information in log files that can help identify problems and patterns in production systems. Log monitoring involves scanning log files, searching for patterns, rules or inferred behavior that indicates important events, and triggering an alert sent to operations or security staff.

Log monitoring can help identify problems before they are experienced by users. It can uncover suspicious behavior that might represent an attack on organizational systems. It can also help record baseline behavior of devices, systems or users, in order to identify anomalies that require investigation.

Common Security Log Events

  • Report from antivirus software that a device is infected by malware
  • Report from firewall about traffic to/from a prohibited network address
  • Attempt to access a critical system from an unknown host or IP address
  • Repeated failed attempts to access a critical system
  • Change in user privileges
  • Usage of insecure or prohibited protocols / ports
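One of the events listed above, repeated failed attempts to access a system, can be detected with a sliding time window. The sketch below assumes events have already been normalized into dicts with `user`, `time` and `outcome` keys; those names, the threshold of 3 and the 5-minute window are arbitrary illustrative choices.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

def find_repeated_failures(events, threshold=3, window=timedelta(minutes=5)):
    """Return (user, time) pairs where a user reached `threshold` failures inside `window`."""
    recent = defaultdict(deque)  # per-user timestamps of recent failures
    alerts = []
    for event in sorted(events, key=lambda e: e["time"]):
        if event["outcome"] != "failure":
            continue
        times = recent[event["user"]]
        times.append(event["time"])
        # Drop failures that have fallen out of the time window.
        while times and event["time"] - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            alerts.append((event["user"], event["time"]))
    return alerts

base = datetime(2018, 5, 11, 10, 0)
sample_events = [
    {"user": "alice", "time": base + timedelta(minutes=i), "outcome": "failure"}
    for i in range(3)
] + [
    {"user": "bob", "time": base, "outcome": "failure"},
    {"user": "bob", "time": base + timedelta(minutes=30), "outcome": "failure"},
]
print(find_repeated_failures(sample_events))
```

Only alice triggers an alert here, because bob's two failures are 30 minutes apart.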

Common Security Incidents

  • Malicious email received and activated by organizational users
  • Malicious website accessed by organizational users (e.g., drive by download)
  • Improper or prohibited usage by an authorized user
  • Unauthorized access
  • An attempt to compromise, deny access to, or delete organizational systems
  • Loss or theft of equipment, such as employee laptops, servers
  • Data leak or malware infection via removable media

Log Analysis for Security with SIEM

In the security world, the primary system that aggregates logs, monitors them and generates alerts about possible security incidents is a Security Information and Event Management (SIEM) solution.

SIEM platforms aggregate historical log data and real-time alerts from security solutions and IT systems like email servers, web servers and authentication systems.

They analyze the data and establish relationships that help identify anomalies, vulnerabilities and incidents. The SIEM’s main focus is on security-related events such as suspicious logins, malware or escalation of privileges.

The SIEM’s goal is to identify which events have security significance and should be reviewed by a human analyst, and send notifications for those events. Modern SIEMs also provide extensive dashboards and data visualization tools, allowing analysts to actively seek data points that might indicate a security incident, a practice known as threat hunting.

Traditional SIEM Log Analysis

Traditionally, the SIEM used two techniques to generate alerts from log data: correlation rules, specifying a sequence of events that indicates an anomaly, which could represent a security threat, vulnerability or active security incident; and vulnerabilities and risk assessment, which involves scanning networks for known attack patterns and vulnerabilities.

The drawback of these older techniques is that they generate a lot of false positives, and are not successful at detecting new and unexpected event types.
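A correlation rule of the kind described above can be sketched as a check for a fixed sequence of events. The rule here (several consecutive failed logins followed by a success for the same user) and the event field names are hypothetical, and are not taken from any SIEM product.

```python
def brute_force_then_success(events, min_failures=3):
    """Return users with at least `min_failures` consecutive failed logins followed by a success.

    `events` is assumed to be a list of dicts in time order with `user` and `outcome` keys.
    """
    streak = {}   # current run of consecutive failures per user
    flagged = []
    for event in events:
        user = event["user"]
        if event["outcome"] == "failure":
            streak[user] = streak.get(user, 0) + 1
        else:
            if streak.get(user, 0) >= min_failures:
                flagged.append(user)
            streak[user] = 0  # a success resets the run
    return flagged

login_events = (
    [{"user": "alice", "outcome": "failure"}] * 4
    + [{"user": "alice", "outcome": "success"}]
    + [{"user": "bob", "outcome": "failure"}, {"user": "bob", "outcome": "success"}]
)
print(brute_force_then_success(login_events))
```

The rule fires only on the exact pattern it encodes, which is why rules like this tend to miss attack behavior that nobody anticipated.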

Next-Generation SIEM Log Analysis

Advanced SIEMs use technology called User and Entity Behavior Analytics (UEBA). UEBA leverages machine learning to look at patterns of human behavior, automatically establish baselines, and intelligently identify suspicious or anomalous behavior.

This can help detect risks that are unknown or difficult to define with correlation rules, such as insider threats, targeted attacks, fraud, and anomalies across long periods of time or across multiple organizational systems.
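The idea of a baseline can be illustrated with a deliberately simple statistical stand-in. Real UEBA products use far richer machine learning models; the sketch below only compares a new observation with the mean and standard deviation of past values, and the function name, threshold and sample numbers are assumptions.

```python
from statistics import mean, stdev

def is_anomalous(history, observed, threshold=3.0):
    """Flag `observed` if it lies more than `threshold` standard deviations from the baseline.

    `history` needs at least two values so that a standard deviation can be computed.
    """
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return observed != baseline
    return abs(observed - baseline) / spread > threshold

# Hypothetical number of files a user accessed on each of the last seven days.
daily_file_accesses = [20, 22, 19, 21, 23, 20, 22]
print(is_anomalous(daily_file_accesses, 45))
print(is_anomalous(daily_file_accesses, 22))
```

No rule about "45 file accesses" has to be written in advance; the threshold follows from each user's own history.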

Using Endpoint Logs for Security

Traditionally, monitoring and security efforts focused on network traffic to identify threats. Today, there is a growing focus on endpoints, such as desktop computers, servers and mobile devices. Endpoints are frequently targeted by threat actors who can bypass traditional security measures—for example, a laptop forgotten on a train can be stolen by an attacker and used to penetrate organizational systems. Without careful monitoring of the laptop’s activity, this and similar attacks could go undetected.

Windows Event Logs

The Windows operating system provides an event logging protocol that allows applications, and the operating system itself, to log important hardware and software events. The events can be viewed directly by an administrator using the Windows Event Viewer.

Which events are logged?

Events logged in Windows event logs include application installations, security management (see Windows Security Logs below), initial startup operations, and problems or errors. All these event types can have security significance, and should be monitored by log aggregation and monitoring tools.

Example of Windows Event Log

Level    Date and Time          Source                Event ID  Task Category
Warning  5/11/2018 10:29:47 AM  Kernel-Event Tracing  1         Logging

Windows Security Logs

The Windows Security Log is a part of the Windows Event Log framework. It contains security-related events specified by administrators using the system’s audit policy. Microsoft describes the Security Log as “Your Best and Last Defense” when investigating security breaches on Windows systems.

Which events are logged?

The following types of Windows log events can be defined as security events: account logon, account management, directory service access, logon, object access (for example, file access), policy change, privilege use, tracking of system processes, system events.
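For monitoring purposes, these categories are usually reached through event IDs. The sketch below maps a small, illustrative subset of well-known Security Log event IDs to the categories named above; the table is far from complete and the `categorize` helper is hypothetical.

```python
# Illustrative subset of Security Log event IDs and the category each belongs to.
EVENT_CATEGORIES = {
    4624: ("logon", "An account was successfully logged on"),
    4625: ("logon", "An account failed to log on"),
    4769: ("account logon", "A Kerberos service ticket was requested"),
    4720: ("account management", "A user account was created"),
    4663: ("object access", "An attempt was made to access an object"),
    4719: ("policy change", "System audit policy was changed"),
    4688: ("process tracking", "A new process has been created"),
}

def categorize(event_id: int) -> dict:
    """Look up the category and description for a Security Log event ID."""
    category, description = EVENT_CATEGORIES.get(
        event_id, ("uncategorized", "Unknown event ID")
    )
    return {"event_id": event_id, "category": category, "description": description}

print(categorize(4625))
```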

iOS Logs and iOS Crash Reports

Unlike Windows and Linux, the iOS operating system does not log system and application events by default, with the exception of application crash reports. iOS 10.0 onwards offers a logging API that allows specific applications to log application events and store them to a centralized location on disk. Log messages can be viewed using the Console app or the log command-line tool.

Because iOS does not provide convenient remote access to logs, several third-party solutions have emerged that allow for remote collection and aggregation of iOS logs.

Linux Event Logs

Linux logs record a timeline of events that occur in the Linux operating system and applications. Central system logs are stored in the /var/log directory, and logs for specific applications may be stored in the application folder, for example ‘~/.chrome/Crash Reports’ for Google Chrome.

Which events are logged?

There are Linux log files for system events, kernel, package managers, boot processes, Xorg, Apache, MySQL, and other common services. As in Windows, all these events could possibly have security significance.

Which are the most critical Linux logs to monitor?

  • /var/log/syslog or /var/log/messages – stores all activity data across the Linux system.
  • /var/log/auth.log or /var/log/secure – stores authentication logs
  • /var/log/boot.log – messages logged during startup
  • /var/log/maillog or /var/log/mail.log – events related to email servers
  • /var/log/kern.log – Kernel logs
  • /var/log/dmesg – device driver logs
  • /var/log/faillog – failed login attempts
  • /var/log/cron – events related to cron jobs or the cron daemon
  • /var/log/yum.log – events related to installation of yum packages
  • /var/log/httpd/ – HTTP errors and access logs containing all HTTP requests
  • /var/log/mysqld.log or /var/log/mysql.log – MySQL log files
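As an example of using one of these files, the sketch below counts failed SSH password attempts per source address in authentication log lines. It assumes the message wording OpenSSH commonly writes ("Failed password for ... from ... port ..."); the sample lines, host names and addresses are invented.

```python
import re
from collections import Counter

FAILED_PASSWORD = re.compile(
    r"sshd\[\d+\]: Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\S+) port \d+"
)

def count_failed_logins(lines):
    """Count failed SSH password attempts per source IP address."""
    counts = Counter()
    for line in lines:
        match = FAILED_PASSWORD.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts

auth_lines = [
    "Jan 18 11:07:53 web01 sshd[2211]: Failed password for root from 203.0.113.7 port 51000 ssh2",
    "Jan 18 11:07:55 web01 sshd[2211]: Failed password for invalid user admin from 203.0.113.7 port 51002 ssh2",
    "Jan 18 11:08:10 web01 sshd[2250]: Accepted password for alice from 198.51.100.4 port 40022 ssh2",
]
print(count_failed_logins(auth_lines))
```

In practice the lines would be read from /var/log/auth.log or /var/log/secure, which usually requires elevated privileges.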

Managing Endpoint Detection and Response (EDR) Logs

Endpoint Detection and Response (EDR) technology helps to detect, investigate and mitigate security incidents on organizational endpoints. EDR is complementary to traditional security tools such as antivirus, Data Loss Prevention (DLP) and SIEM. EDR technology provides visibility into events taking place on endpoints, including application access and activity, operating system operations, creation, modification, copying and movement of data, memory usage, and user access to predefined sensitive data.

EDR systems provide aggregated logs that allow security teams to analyze and explore events from across the enterprise endpoint portfolio.

Symantec Endpoint Protection Logs

Symantec Endpoint Protection is a security suite that includes intrusion prevention, firewall, and anti-malware. Endpoint Protection logs contain information about configuration changes, security-related activities such as virus detections, errors on specific endpoints, and traffic that enters and exits the endpoint.

Which events are logged?

Symantec Endpoint Protection log types include:

  • Policy modifications
  • Application and device control—events on endpoint devices where some behavior was blocked
  • Compliance logs
  • Computer status – operational status such as computer name, IP address, infection status
  • Deception logs – attacker interaction with “honeypots” deployed by the security solution
  • Network and host exploit mitigation
  • Virus scan events
  • Risk events detected by Symantec
  • System log – information about operating system and services.

McAfee Endpoint Security

McAfee Endpoint Security provides centralized management for endpoint devices, anti-malware protection, application containment, web security, threat forensics and machine learning analysis for detection of unknown threats.

The solution allows you to set each endpoint device to one of three log levels: no logging, event logging, and debug logging. Logs are saved on the endpoints in the McAfee folder.

Which events are logged?

McAfee Endpoint Security saves several log files on each endpoint device:

  • myAgent.log – aggregate log file containing historic logs
  • myNotices.log – notices and warnings generated by the McAfee agent
  • myUninstall.log – software uninstall events
  • myUpdate.log – software update events
  • myInstall.log – software installation events

Managing Firewall Logs

Firewall logs are extremely valuable for security analysis, because they contain trails of almost all traffic flowing into and out of your network. If malicious activity is occurring, even if it cannot be detected by known malware or attack signatures, it will be captured by the firewall and can probably be seen by analyzing firewall logs for unusual behavior.

For example, when a zero-day virus infects computers on your network, even if it cannot be detected yet by antivirus software, firewall logs may show unusually high numbers of denied connections, or allowed connections, with suspicious remote hosts. A routine review of firewall logs can discover trojans or rootkits trying to connect to their command and control systems via IRC, through the firewall.
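A first pass at this kind of review can be sketched as a count of denied connections per remote host. The record layout (`remote_host`, `action`) and the threshold are assumptions for illustration; real firewall records would first need to be parsed into this shape.

```python
from collections import Counter

def hosts_with_many_denies(records, threshold=100):
    """Return remote hosts whose denied-connection count meets or exceeds `threshold`."""
    denies = Counter(r["remote_host"] for r in records if r["action"] == "deny")
    return {host: count for host, count in denies.items() if count >= threshold}

# Invented records: one host is denied far more often than the other.
records = (
    [{"remote_host": "203.0.113.9", "action": "deny"}] * 150
    + [{"remote_host": "198.51.100.4", "action": "deny"}] * 3
    + [{"remote_host": "198.51.100.4", "action": "allow"}] * 40
)
print(hosts_with_many_denies(records))
```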

Cisco Syslog and Logging Levels

Cisco routers save logs in syslog format, and also allow logs to be viewed through the admin interface. Messages are tagged with message codes. For example, most denied connections have a message code in the 106001 to 106023 range. Most firewall devices do not have local storage space, so logs must be configured to be sent elsewhere. Cisco allows saving logs to a syslog server on the network, via SMTP, via console port, telnet, or several other options.
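Message codes make it straightforward to pick out denied connections programmatically. The sketch below assumes the `%ASA-<level>-<code>: <text>` layout used by Cisco ASA firewalls; the sample lines are invented, and the `in_deny_range` flag simply applies the 106001 to 106023 range mentioned above.

```python
import re
from typing import Optional

ASA_PATTERN = re.compile(r"%ASA-(?P<level>\d)-(?P<code>\d{6}): (?P<text>.*)")

def parse_asa(line: str) -> Optional[dict]:
    """Extract severity level, message code and text from an ASA-style syslog line."""
    match = ASA_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return {
        "level": int(match.group("level")),
        "code": code,
        "text": match.group("text"),
        "in_deny_range": 106001 <= code <= 106023,
    }

deny_line = (
    "May 11 10:29:47 fw01 %ASA-4-106023: Deny tcp src outside:203.0.113.7/51000 "
    "dst inside:10.0.0.5/22 by access-group \"outside_in\""
)
other_line = "May 11 10:30:01 fw01 %ASA-6-302013: Built inbound TCP connection"
print(parse_asa(deny_line))
print(parse_asa(other_line))
```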

What log entries are important to analyze?

  • Connections allowed by firewall security policies—these can help spot “holes” in the security policies
  • Connections denied by firewall security policies—might contain suspicious or attack behavior
  • Using the deny rate logging feature can show DoS or brute force attacks
  • IDS activity messages—show attacks identified by Cisco Intrusion Detection features
  • User authentication and command usage—let you review and audit firewall policy changes
  • Bandwidth usage—shows connections by duration and traffic volume—outliers could be interesting to investigate
  • Protocol usage messages—show protocols and port numbers—can show unusual or insecure protocols used on the network
  • NAT or PAT connections – check these if you receive a report of malicious activity coming from inside your network

Check Point Logging

Check Point routers can save logs in syslog format, and also allow logs to be viewed over an admin interface. Check Point routers maintain a security log which saves events that are deemed to have security significance.

Categories of events saved to security log:

  • Connection Accepted
  • Connection Decrypted
  • Connection Dropped
  • Connection Encrypted
  • Connection Rejected
  • Connection Monitored—a security event was monitored but not blocked according to current firewall policy
  • URL allowed—URL allowed for access by internal users
  • URL Filtered—URL disallowed for access by internal users
  • Virus Detected—virus detected in an email
  • Potential Spam Stamped—email marked as potential spam
  • Potential Spam Detected—email rejected as potential spam
  • Mail Allowed—non-spam email was logged
  • VStream Antivirus blocked a connection.

Severity levels in the Check Point security log:

  • Red—connection attempts blocked by the firewall, by security policy downloaded from the Service Center or user-defined rules
  • Orange—traffic detected as suspicious but accepted by the firewall
  • Green—traffic accepted by the firewall

The impact of a next-gen SIEM on the SOC can be significant:

  • Reduce Alert Fatigue via User and Entity Behavior Analytics (UEBA) that goes beyond correlation rules, helps reduce false positives and discover hidden threats.
  • Improve MTTD by helping analysts discover incidents faster and gather all relevant data.
  • Improve MTTR by integrating with security systems and leveraging Security Orchestration, Automation and Response (SOAR) technology.
  • Enable Threat Hunting by giving analysts fast and easy access and powerful exploration of unlimited volumes of security data.

Exabeam is an example of a next-generation SIEM which combines data lake technology, visibility into cloud infrastructure, behavioral analytics, an automated incident responder and a threat hunting module with powerful data querying and visualization.

Log Management and Next Generation SIEMs

Log management has always been complex, and is becoming more so with the proliferation of network devices, endpoints, microservices and cloud services, and exponentially increasing traffic and data volumes.

In a security environment, next-generation Security Information and Event Management (SIEM) solutions can help manage and extract value from security-relevant log events:

  • Next-generation SIEMs are based on data lake technology which can store unlimited data volumes of historical logs
  • Next-generation SIEMs come with User and Entity Behavior Analytics (UEBA) technology which can automatically establish baseline activity for devices and users, and identify anomalous or suspicious behavior
  • Next-generation SIEMs provide advanced data exploration capabilities which can help security analysts perform threat hunting by actively searching through logs

Exabeam is an example of a next-generation SIEM platform that provides these capabilities. It can pull together logs from enterprise systems and security tools and perform the complete log management process, including log collection and aggregation, log processing, log analysis using advanced analytics and UEBA technology, and alerting about security incidents.
