Log Aggregation: How it Works, Methods, and Tools

Even if you aren’t aware of it, most of the devices your organization uses produce, or are capable of producing, event log data with valuable information about their activity and your systems’ health and functionality. Log aggregation can help you get the most out of these logs and minimize the time and headaches of manually sifting through them. To do this, you will want a log management solution, such as a SIEM, that includes log aggregation capabilities.

What are logs?

Logs are continuous streams of time-stamped event records generated by systems and applications. They record event types, times, origins, and whatever finer level of detail has been specified. They are used for debugging software, identifying security breaches, and providing insight into system operations. The file type of logs and the data structure vary by developer, application, and system.

Logs are crucial to understanding the health of applications, network infrastructure, and security issues. When used correctly, they help IT teams identify and address issues more quickly and ensure that faulty systems are not impeding worker productivity or customer experience. Log data is especially critical when using third-party applications or infrastructures, such as clouds, because the combination of added layers of complexity and the inability to modify functionality can lead to IT issues.

Even if you don’t want to make use of logs, most organizations are required to retain them for compliance with regulatory bodies and as proof of operations in the case of financial or forensic audits.


What is log aggregation?

Log aggregation is collecting logs from multiple computing systems, parsing them and extracting structured data, and putting them together in a format that is easily searchable and explorable by modern data tools.

There are four common ways to aggregate logs — many log aggregation systems combine multiple methods.

Syslog

A standard logging protocol. Network administrators can set up a Syslog server that receives logs from multiple systems, storing them in an efficient, condensed format which is easily queryable.

Log aggregators can directly read and process Syslog data.
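
As a rough illustration of what reading Syslog data involves: in the traditional BSD syslog format (RFC 3164), each message begins with a priority value in angle brackets that encodes the facility and severity as facility × 8 + severity. The Python sketch below decodes that prefix. The function name and the sample message are illustrative only and are not taken from any particular aggregator.

```python
import re

# Severity names defined by the syslog protocol, indexed by numeric code 0-7.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_syslog_priority(message: str) -> dict:
    """Split the leading <PRI> value of a syslog message into facility and severity."""
    match = re.match(r"<(\d{1,3})>(.*)", message, re.DOTALL)
    if match is None:
        raise ValueError("message does not start with a <PRI> prefix")
    pri = int(match.group(1))
    facility, severity = divmod(pri, 8)  # PRI = facility * 8 + severity
    return {
        "facility": facility,
        "severity": severity,
        "severity_name": SEVERITIES[severity],
        "rest": match.group(2),
    }


# Illustrative message: priority 34 means facility 4, severity 2.
event = decode_syslog_priority("<34>Oct 11 22:14:15 mymachine su: 'su root' failed")
print(event["facility"], event["severity_name"])
```

A real aggregator would go on to parse the timestamp, host, and message body as well; this sketch stops at the priority field.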

Event Streaming

Protocols like SNMP, Netflow and IPFIX allow network devices to provide standard information about their operations, which can be intercepted by the log aggregator, parsed and added to central log storage.

Log Collectors

Software agents that run on network devices, capture log information, parse it and send it to a centralized aggregator component for storage and analysis.
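
To make the idea of a collector agent more concrete, here is a minimal Python sketch of the "capture" half: it reads whatever has been appended to a log file since the last poll and wraps the new lines in a JSON payload. The file name, host name, and payload layout are assumptions made for illustration, and actually sending the payload to an aggregator is left out.

```python
import json
import os
import tempfile


def collect_new_lines(path: str, offset: int) -> tuple[list[str], int]:
    """Return lines appended to `path` since byte `offset`, plus the new offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Forward only complete lines; a partial trailing line waits for the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def to_payload(host: str, lines: list[str]) -> str:
    """Wrap collected lines in a JSON envelope for a hypothetical central aggregator."""
    return json.dumps({"host": host, "events": lines})


# Demonstration against a temporary file standing in for an application log.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")
    with open(log_path, "w") as f:
        f.write("service started\nuser login ok\npartial")
    first, offset = collect_new_lines(log_path, 0)
    with open(log_path, "a") as f:
        f.write(" line done\n")
    second, offset = collect_new_lines(log_path, offset)
    payload = to_payload("web-01", first + second)

print(payload)
```

Tracking a byte offset is one simple way to avoid resending old lines. Production agents also have to deal with log rotation, retries, and buffering, which this sketch ignores.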

Direct Access

Log aggregators can directly access network devices or computing systems, using an API or network protocol to directly receive logs. This approach requires custom integration for each data source.


What is log processing?

Log processing is the art of taking raw system logs from multiple sources, identifying their structure or schema, and turning them into a consistent, standardized data source.

The Log Processing Flow

01 – LOG PARSING

Each log has a repeating data format which includes data fields and values. However, the format varies between systems, even between different logs on the same system.

A log parser is a software component that can take a specific log format and convert it to structured data. Log aggregation software includes dozens or hundreds of parsers written to process logs for common systems.
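
A parser for a single format can be as small as one regular expression. The Python sketch below assumes a made-up line layout (timestamp, process name with PID, level, message); real systems each need their own pattern, which is why aggregators ship with so many parsers.

```python
import re
from datetime import datetime
from typing import Optional

# Pattern for a hypothetical format: "YYYY-MM-DD HH:MM:SS process[pid] LEVEL message"
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<process>[\w.-]+)\[(?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)"
)


def parse_line(line: str) -> Optional[dict]:
    """Convert one raw log line into structured fields, or None if it does not match."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert text fields into typed values so later stages can compare and sort them.
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%Y-%m-%d %H:%M:%S")
    fields["pid"] = int(fields["pid"])
    return fields


parsed = parse_line("2024-01-18 11:07:53 sshd[812] ERROR Failed password for admin from 203.0.113.7")
print(parsed)
```

Returning None for unrecognized lines lets the caller decide whether to drop them, store them raw, or try another parser.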

02 – LOG NORMALIZATION AND CATEGORIZATION

Normalization merges events containing different data into a reduced format which contains common event attributes. Most logs capture the same basic information – time, network address, operation performed, etc.

Categorization involves adding meaning to events – identifying log data related to system events, authentication, local/remote operations, etc.
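
One way to picture these two steps is a field-mapping table plus a small set of keyword rules. The Python sketch below is an illustration under assumed names: the source types, raw field names, and category keywords are all hypothetical.

```python
# Maps each source's raw field names onto a shared schema (all names are hypothetical).
FIELD_MAPS = {
    "firewall": {"ts": "time", "src": "source_ip", "action": "operation"},
    "webapp": {"TimeGenerated": "time", "client_addr": "source_ip", "event": "operation"},
}

# Keyword rules that attach a category to a normalized event.
CATEGORY_RULES = [
    ("authentication", ("login", "logout", "sign-in")),
    ("network", ("allow", "deny", "drop")),
]


def normalize(source: str, event: dict) -> dict:
    """Keep only the common attributes, renamed to the shared schema."""
    mapping = FIELD_MAPS[source]
    normalized = {common: event[raw] for raw, common in mapping.items() if raw in event}
    normalized["source_type"] = source
    return normalized


def categorize(event: dict) -> str:
    """Return the first category whose keywords appear in the operation field."""
    operation = event.get("operation", "").lower()
    for category, keywords in CATEGORY_RULES:
        if any(keyword in operation for keyword in keywords):
            return category
    return "uncategorized"


fw = normalize("firewall", {"ts": "2024-01-18T11:07:53Z", "src": "203.0.113.7", "action": "DENY", "rule": 12})
web = normalize("webapp", {"TimeGenerated": "2024-01-18T11:08:02Z", "client_addr": "198.51.100.4", "event": "Sign-in Failed"})
print(fw, categorize(fw))
print(web, categorize(web))
```

After normalization both events share the keys time, source_ip, and operation, so they can be searched and correlated together even though they started out in different shapes.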

03 – LOG ENRICHMENT

Log enrichment involves adding important information that can make the data more useful.

For example, if the original log contained IP addresses, but not actual physical locations of the users accessing a system, a log aggregator can use a geolocation data service to find out locations and add them to the data.
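
The geolocation example might look roughly like this in Python. The lookup table is a hypothetical stand-in for a real geolocation service, and the address ranges (reserved documentation ranges) and place names are invented for illustration.

```python
import ipaddress

# Hypothetical lookup table standing in for a real geolocation data service.
GEO_TABLE = [
    (ipaddress.ip_network("203.0.113.0/24"), "Sydney, AU"),
    (ipaddress.ip_network("198.51.100.0/24"), "Toronto, CA"),
]


def enrich_with_location(event: dict) -> dict:
    """Return a copy of the event with a 'location' field derived from its source IP."""
    enriched = dict(event)
    address = ipaddress.ip_address(event["source_ip"])
    enriched["location"] = next(
        (place for network, place in GEO_TABLE if address in network),
        "unknown",
    )
    return enriched


enriched = enrich_with_location({"source_ip": "203.0.113.7", "operation": "Sign-in Failed"})
print(enriched)
```

The original event is copied rather than modified, so the raw record stays available alongside the enriched one.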

04 – LOG INDEXING

Modern networks generate huge volumes of log data. To effectively search and explore log data, you need to create an index of common attributes across all log data.

Searches or data queries that use the index keys can be an order of magnitude faster, compared to a full scan of all log data.
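
The difference between an indexed lookup and a full scan can be sketched with a plain dictionary. This Python example is a simplified illustration, not how any particular search engine stores its index; the events and field names are made up.

```python
from collections import defaultdict

# A tiny, made-up set of normalized events.
events = [
    {"source_ip": "203.0.113.7", "operation": "Sign-in Failed"},
    {"source_ip": "198.51.100.4", "operation": "Sign-in OK"},
    {"source_ip": "203.0.113.7", "operation": "Sign-in OK"},
]


def build_index(events: list, field: str) -> dict:
    """Map each value of `field` to the positions of the events that contain it."""
    index = defaultdict(list)
    for position, event in enumerate(events):
        index[event[field]].append(position)
    return index


def full_scan(events: list, field: str, value: str) -> list:
    """Check every event: work grows with the total number of events."""
    return [event for event in events if event[field] == value]


def indexed_lookup(events: list, index: dict, value: str) -> list:
    """Jump straight to matching events: work grows with the number of matches."""
    return [events[position] for position in index.get(value, [])]


ip_index = build_index(events, "source_ip")
print(indexed_lookup(events, ip_index, "203.0.113.7"))
```

Both functions return the same events; the index simply trades extra storage and write-time work for faster reads.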

05 – LOG STORAGE

Because of the massive volumes of logs, and their exponential growth, log storage is rapidly evolving. Historically, log aggregators would store logs in a centralized repository. Today, logs are increasingly stored on data lake technology, such as Amazon S3 or Hadoop.

Data lakes can support unlimited storage volumes with low incremental storage cost, and can provide access to the data via distributed processing engines like MapReduce, or modern high performance analytics tools.


Log types

Almost every computing system generates logs. Below are a few of the most common sources of log data.

Endpoint logs

An endpoint is a computing device within a network – such as a desktop, laptop, smartphone, server or workstation. Endpoints generate multiple logs, from different levels of their software stack – hardware, operating system, middleware and database, and applications. Endpoint logs are taken from the lower levels of the stack, and used to understand the status, activity and health of the endpoint device.

Router logs

Network devices like routers, switches and load balancers are the backbone of network infrastructure. Their logs provide critical data about traffic flows, including destinations visited by internal users, sources of external traffic, traffic volumes, protocols used, and more. Routers typically transmit data via the Syslog format, and data can be captured and analyzed via your network’s Syslog servers.

Application event logs

Applications running on servers or end user devices generate and log events. The Windows operating system provides a centralized event log that collects startup, shutdown, heartbeat and run-time error events from running applications. In Linux, application log messages can be found in the /var/log folder. In addition, log aggregators can directly collect and parse logs from enterprise applications, such as email, web or database servers.

IoT logs

A new and growing source of log data is Internet of Things (IoT) connected devices. IoT devices may log their own activity and/or sensor data captured by the device. IoT visibility is a major challenge for most organizations, as many devices have no logging at all, or save log data to local file systems, limiting the ability to access or aggregate it. Advanced IoT deployments save log data to a central cloud service; many are adopting syslog-ng, an open-source log collection tool that focuses on portability and central log collection.

Proxy logs

Many networks maintain a transparent proxy, providing visibility over the traffic of internal users. Proxy server logs contain requests made by users and applications on a local network, and application or service requests made over the Internet, such as application updates. For proxy logs to be reliable, proxies must be enforced across all, or at least the critical, segments of user traffic, and measures must be in place to decrypt and interpret HTTPS traffic.

Common log formats

Common log formats: CSV, JSON, key-value pair, Common Event Format (CEF)

  • CSV Log Format
    5:39:55 → Time
    [Fname, Lname, name@company] → User Credentials
    Sign-in Failed → Authentication Event
    173.0.0.0 → IP
    /app/office365 → App User Signed Into
  • JSON Log Format
    MachineName → User’s host
    Message → The event is a Kerberos service ticket (user already authenticated and sending access request for specific service)
    TimeGenerated → Time of event
    TargetUserName → Username attempting to login
    TargetDomainName → Domain user attempted to login to
    ServiceName → Service user attempted to log into
  • Common Event Format (CEF)
    CEF is an open log management standard that makes it easier to share security-related data from different network devices and applications. It also provides a common event log format, making it easier to collect and aggregate log data. CEF uses the syslog message format.

    CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension

    In the sample log entry:
    Trend Micro|Deep Security Manager|3.5.4 → Device Vendor, Device Product, and Device Version, which uniquely identify the sending device. No two products may use the same vendor-product pair.
    600 → Signature ID, a unique identifier per event type; for example, in IDS systems each signature or rule has a unique Signature ID.
    4 → Severity of the event, from 0-10
    suser=Master… → Extension, a collection of key-value pairs which allow the log entry to contain additional info, drawn from an extensive Extension Dictionary including fields like deviceAction, applicationProtocol, deviceHostName, destinationAddress and destinationPort, or custom fields.
  • Sample Log Entry
    Jan 18 11:07:53 dsmhost CEF:0|Trend Micro|Deep Security Manager|3.5.4|600|Administrator SignedIn|4|suser=Master…
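
To show how the pipe-delimited CEF header lends itself to parsing, here is a minimal Python sketch applied to the sample entry above, kept exactly as it appears (including the truncated suser value). The function and key names are illustrative, and the extension handling is deliberately simplified: it does not handle escaped "=" characters inside values.

```python
import re

HEADER_FIELDS = [
    "version", "device_vendor", "device_product", "device_version",
    "signature_id", "name", "severity",
]


def parse_cef(line: str) -> dict:
    """Split a CEF log line into its syslog prefix, header fields, and extension pairs."""
    start = line.find("CEF:")
    if start == -1:
        raise ValueError("not a CEF message")
    # Split on pipes that are not escaped with a backslash; at most 7 splits,
    # so any further pipes stay inside the extension.
    parts = re.split(r"(?<!\\)\|", line[start + len("CEF:"):], maxsplit=7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    event = dict(zip(HEADER_FIELDS, parts))
    event["syslog_prefix"] = line[:start].strip()
    extension = parts[7] if len(parts) > 7 else ""
    # Each value runs until the next " key=" or the end of the line.
    event["extension"] = dict(re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", extension))
    return event


sample = "Jan 18 11:07:53 dsmhost CEF:0|Trend Micro|Deep Security Manager|3.5.4|600|Administrator SignedIn|4|suser=Master…"
cef_event = parse_cef(sample)
print(cef_event["device_vendor"], cef_event["signature_id"], cef_event["severity"])
```

Header values are kept as strings here; a fuller parser would also validate the severity and unescape special characters.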

Log aggregation methods

As organizations expand and adopt a wider variety of applications, services, and infrastructures, logs become dispersed across locations and their usefulness drastically decreases due to inaccessibility and difference in data format. This issue can be solved with log aggregation, which centralizes log data, making it easier to analyze and search.

When logs are aggregated, the time you need to spend tracking down files, deciphering data formats, and searching for specific errors within logs, let alone connecting information between logs, drastically decreases. Aggregated logs are easier to analyze and provide a more robust view of your operations than can be accomplished through individual examination.

There are several methods you can choose to aggregate your logs, depending on your technical abilities and needs. These include:

  • Syslog — collects log data through a standard logging protocol; requires a central network daemon and client daemons for each log source to be forwarded
  • Event streaming — collects log data from network devices through streaming protocols like SNMP, Netflow, or IPFIX
  • Log collectors — collects logs from sources in real-time through an agent, typically a third-party option
  • Direct access — collects log data directly from network devices or systems through API or network protocol integration

Open source log aggregation tools

There are several third-party tools for log aggregation, and the ones you choose will depend on your specific needs. If you’re looking for solutions that you can completely customize, the following open-source tools might be for you. Keep in mind that although many of these tools are free to use, you must manage and maintain the system yourself, so you pay in operational complexity.

1. Elastic (Formerly ELK)

A popular solution involves creating a log management service with the Elastic Stack, also known as ELK due to its makeup of the following tools:

  • Elasticsearch — a near real-time RESTful search and analytics engine that indexes data for faster use and can integrate with Hadoop
  • Logstash — a log ingestor and processing pipeline system that transforms data and loads it into Elasticsearch for analysis
  • Kibana — a data visualization tool that includes machine learning functionality

Beats, an optional addition to the stack, is a set of agents that collect data and send it, along with metadata for context, to Elasticsearch directly or through Logstash.

Elastic is highly flexible and customizable and can even provide some of the capability of a Security Information and Event Management (SIEM) system, but it cannot generate alerts without a paid add-on. Elastic can be hosted on-premises or in the cloud, and its popularity means that it is well supported, including by third-party services that can operate the system on your behalf for a fee.

2. Fluentd

Recommended by AWS and Google Cloud, Fluentd is a local aggregator that is often used as a replacement for Logstash in an Elastic stack. It uses a plugin system to create a Unified Logging Layer that integrates a variety of data sources from which it collects logs and sends them to a central storage system.

Fluentd currently has around 500 plugins available, and its open-source nature allows you to create new ones as needed. Part of its popularity is due to its low resource requirements: it runs on only 30-40MB of memory and can process 13,000 events per second per core. It can also be used with an even lighter-weight data forwarder called Fluent Bit.


Considerations for choosing log aggregation and management tools

Log management solutions must be powerful enough to ingest massive amounts of data from a wide variety of sources regardless of log format and be scalable to accommodate spikes in log volume. Provided these conditions are met, you should consider the following before selecting your solution.

Log collection

The solution you choose should grant control over how and when logs are collected and centralize collected log data outside of live applications. You need to be able to automate the process of log collection to reduce the impact on system resources, ensure that all errors are collected, and protect from data loss due to server failures.

Log ingestion

Solutions must be able to collect, format, and import data from external sources, including applications, servers, and platforms. Collected logs need to include all necessary data, be efficiently stored and indexed, and be easily accessible to teams for analysis and monitoring.

Log analysis and search

A good solution enables users to search using natural language structures and returns results quickly. Solutions should present log activity as close to real-time as possible and allow point-in-time searches.

Log monitoring and alerts

Solutions need to include the ability to set up customized alerts, including rules for when alerts are created and to whom they are sent. You should be able to trigger alerts from a wide variety of events using criteria such as the number of errors per minute and have them sent to a variety of destinations, from Slack groups to personal email addresses.
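
An errors-per-minute rule of the kind described above can be sketched as a sliding window over error timestamps. The Python class below is an illustration with made-up names and thresholds; delivering the alert to Slack or email is outside its scope.

```python
from collections import deque


class ErrorRateAlert:
    """Fires when more than `threshold` errors fall within the last `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float = 60.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def observe(self, timestamp: float) -> bool:
        """Record one error at `timestamp` (in seconds); return True if the alert should fire."""
        self._timestamps.append(timestamp)
        # Discard errors that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= timestamp - self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold


rule = ErrorRateAlert(threshold=3)
# Four errors within 30 seconds trip the rule; a later, isolated error does not.
fired = [rule.observe(t) for t in [0, 10, 20, 30, 95]]
print(fired)
```

In practice the same pattern would be keyed per application or per host, so that one noisy source does not mask or trigger alerts for another.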

Visualization and reporting

Effective solutions provide visualizations and reporting on system states and log volume, both for point-in-time analysis and over user-defined periods. The shift to DevSecOps teams necessitates tools that simplify reporting and make it easy to share and view requested reports, including graphs that visualize the data.

Cost-effectiveness

Efficient solutions should be able to meet your data requirements, in terms of volume and length of retention, at a reasonable cost. Solutions that offer flexibility and scalability, and those that provide granular pricing, are the best options.


Conclusion

Log aggregation can mean the difference between identifying an application issue within an hour and having that application out of commission for a week while you struggle to connect the dots between an uncountable number of logs. A sound log management system can simplify your search for errors and even alert you to possible issues before they impact productivity, allowing you to make the most of your time and energy.
