Panacea or Money Pit? Six Design Considerations for Your Security Data Lake

October 18, 2022


Large numbers of enterprises, such as retail conglomerates, consumer banks, airlines, and insurance companies, are joining the rush to set up data lakes for handling petabytes of security data and logs. But many executives and architects assume that once they finish setting up log sources, applying parsers, and arming their security operations analysts with reports, their data lake will deliver the goods. Alas, if only that were true!

We’ve heard the war stories: internal politics on building versus buying, years to deploy, millions of dollars invested, and, unfortunately, multiple security threats missed or detected too late. If you’re considering investing in a new security data lake or replacing your existing one, take a pause. Be mindful of how you prepare — especially how you design yours.

To help with this exercise, here are six design considerations for your security data lake based on our interactions with successful Exabeam customers.

In this article:

  1. Design with the end in mind
  2. Listen to your users
  3. Identify all your data sources and retention needs
  4. Regarding disaster recovery, high availability, and fault tolerance — know what you need and why
  5. Don’t put the cart before the horse
  6. Think long-term and design for growth

1. Design with the end in mind

Clearly define your business goals, constraints, and use cases before designing your data lake. Many businesses find it easiest to carry these over from their legacy data lake. This is possibly the biggest mistake you can make: you want to design for the future so that you don’t miss out on new opportunities. Giving informed thought to how your new data lake will be used is a prerequisite for its design.

What’s more, your business goals and constraints will also influence the fundamental architecture decisions you’ll have to make. Here are some questions for you to consider:

  • Do you have strategic goals related to migrating to the cloud?
  • Do you foresee growth in certain types of business applications and infrastructure resources over the next 3–5 years?
  • Do you have limited bandwidth and data source connectivity challenges, or do you cluster infrastructure in some locations?
  • Do you need geographical isolation of logs for compliance with regulations?
  • Did you realize significant cost savings through Google Cloud Platform or long-term leases of data centers in some locations?

Make certain to capture all such details before you start designing. They will help you answer questions regarding cluster sizing and decentralization.

Determine the core value of your security data lake. Is it…

  1. Acquiring and aggregating data?
  2. Curating and enriching data?
  3. Supporting insight generation?

Or will your data lake be a combination of all three? This will help you identify the integrations and advanced features you really need. Otherwise you could be forced to painstakingly build and maintain custom capabilities.

2. Listen to your users

Perhaps you’re wondering why this is being called out so high on the list. Here’s why:

What comes naturally to the initial business and technical evaluators of a data lake vendor doesn’t necessarily carry over to the real-world analysts who are called upon to investigate security threats under a mountain of stress.

Your users (security operations analysts, admins, auditors), the people who’ve been using your existing system, will also be the users of the new security log management solution. Working on the front line, they’re the ones who have the insights to inform you about what works and what doesn’t. Learn what a day in their life looks like. This will enable you to place a premium on user experience and performance.

Here are some additional considerations:

  • Are your users analysts or engineers?
  • Do they have any unique data preparation or transformation requirements?
  • How many team members are there? How much time do they have to allocate to daily tasks and repetitive actions that could be better handled through automation?
  • What level of automation do they expect? In what areas?
  • What is their level of comfort in learning a new query language?

When we originally created Exabeam Data Lake, we fully understood these challenges. Since then, we have changed our core backbone tools and expanded our commercial offering in this space. When we moved to the Google Cloud Platform, we had an opportunity to pick the best combination of search speed and visualization as part of building a new security log management infrastructure, and we went to great lengths to fully optimize it for the security operations team experience. A new search experience offers a simple point-and-click user interface with options to choose from, like correlation rule creation, custom dashboard and report development, and pre-built compliance reporting.

3. Identify all your data sources and retention needs

Four dimensions of logs — volume, velocity, variety, and retention — require your focus when designing a robust logging infrastructure.

  • Identify those data sources that generate a variable volume of logs and are prone to spikes due to traffic volume, seasonality, and other reasons. Exabeam Log Stream can pull the fields you need from logs to build meaningful events and store them long term, offering faster indexing, context enrichment, and more affordable long-term storage.
  • Understand different log formats and the proportion of structured vs. unstructured data. This will help you plan and prepare parsing requirements before you begin deployment. With the Exabeam Collectors and Log Stream, you don’t need to think about writing regular expressions to parse your content: we provide thousands of pre-built parsers (actually 7,937) to help you make sense of your security logs. The Exabeam Common Information Model is built into the Collector and Log Stream functions, making it faster to ingest, enrich, and build events from newly parsed logs.
  • Know your log retention requirements upfront. Logs can be more easily searched using “searchable retention.” Be ready to talk with your vendor about your long-term search and long-term storage needs. They may not be identical; make sure that your purchasing and quote details are in alignment with your governance needs.
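To make the volume and retention questions concrete before talking to a vendor, a rough back-of-the-envelope calculation can help. The sketch below is only an illustration: the `LogSource` class, the source names, and every number in it are hypothetical assumptions, not Exabeam sizing guidance, and it ignores compression and indexing overhead.

```python
from dataclasses import dataclass


@dataclass
class LogSource:
    """Hypothetical description of one log source (all values are assumptions)."""
    name: str
    avg_gb_per_day: float    # typical daily volume
    peak_multiplier: float   # how much larger a spike day is than an average day
    searchable_days: int     # days kept in fast, searchable storage
    archive_days: int        # additional days kept in cheaper long-term storage


def searchable_gb(src: LogSource) -> float:
    """Raw volume held in searchable retention for one source."""
    return src.avg_gb_per_day * src.searchable_days


def archive_gb(src: LogSource) -> float:
    """Raw volume held in long-term storage for one source."""
    return src.avg_gb_per_day * src.archive_days


def peak_ingest_gb_per_day(sources: list[LogSource]) -> float:
    """Worst-case daily ingest if every source spikes on the same day."""
    return sum(s.avg_gb_per_day * s.peak_multiplier for s in sources)


# Made-up example inventory
sources = [
    LogSource("firewall", 200.0, 3.0, 90, 275),
    LogSource("web_proxy", 50.0, 2.0, 30, 335),
]

total_searchable = sum(searchable_gb(s) for s in sources)
total_archive = sum(archive_gb(s) for s in sources)
peak = peak_ingest_gb_per_day(sources)

print(f"Searchable retention: {total_searchable:,.0f} GB")
print(f"Long-term storage:    {total_archive:,.0f} GB")
print(f"Peak daily ingest:    {peak:,.0f} GB/day")
```

Keeping searchable and long-term retention as separate fields mirrors the point above that the two needs may not be identical, and makes it easier to check a quote against your governance requirements.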

4. Regarding disaster recovery, high availability, and fault tolerance — know what you need and why

With the rise of global distributed systems, businesses are quick to outline all three as must-have requirements for their security log management. But often they fall prey to specifying “the works” without really understanding what their business requires. When quizzed about these capabilities, many IT and DevOps teams admit they don’t have an answer. Teams have invested millions in creating a fully operational mirror site — when all they needed was to meet compliance requirements pertaining to a set of log copies in a secondary location. You don’t want to make the same mistake.

Disaster recovery typically refers to a set of policies and procedures that restore operations and mission-critical system availability. Banks and other regulated industries require detailed disaster recovery mechanisms. But each business must determine the level of automation and sophistication it requires for itself. Some teams operate mirror sites with full replication of logs, context data, and user defined data; this can become very complex and costly.

High availability means your system offers a high level of operational performance for a given period of time. You want to be certain there is no single point of failure across your infrastructure and data pipelines. But businesses have different SLAs for availability. What makes sense for a bank may not necessarily apply to a SaaS business. So don’t fall into the “more nines than we require” trap.
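To see what each extra "nine" actually buys, it can help to translate an availability target into allowed downtime. The following is a minimal sketch of that arithmetic; the function name and the list of targets are illustrative assumptions, and it uses a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


# Compare a few common SLA targets
for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% availability allows about {minutes:,.1f} minutes of downtime per year")
```

Moving from 99.9% to 99.999% shrinks the yearly downtime budget from roughly 526 minutes to roughly 5 minutes, so it is worth asking whether your business actually needs that difference before paying for it.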

Fault tolerance means you can continue to ingest data and secure your organization regardless of any failures within or outside of Exabeam. Exabeam products offer a high degree of availability and fault tolerance throughout the pipeline. As a cloud-managed service, Exabeam will deploy, maintain, scale, upgrade, and patch its service so you can focus on what’s important: securing your organization.

5. Don’t put the cart before the horse

No doubt you’ve at least heard of Apache Spark, Amazon Kinesis, Elasticsearch, Kafka, Hadoop, and/or Cloudera from your engineers or IT. These are all very powerful technologies that leading innovative companies have used to solve data storage and modeling for infrastructure and business problems. But that doesn’t mean they provide a prepackaged solution for all your security data lake or event store needs. So the next time someone on your team informs you about this terrific new technology or the next big thing, ask them why it matters to your organization and which specific business problems it solves. Also ask how long it will take to become operational for core use cases such as insider threats, compromised insiders, or external threats like ransomware and phishing.

6. Think long-term and design for growth

As your business grows, your production environments correspondingly increase in scale, variety, and complexity. Your log volume will also expand, as you’ll want to add logs from newer applications and infrastructure that weren’t online when you initially designed your data lake. In selecting an architecture that can grow with your business, you want a system that lets you scale in a predictable and cost-efficient manner. As a cloud-managed service, Exabeam offers you that flexibility. It easily scales without having to re-architect your infrastructure. And what if your business expands into new geographies? Will your data lake easily accommodate reimaging at a German, Singaporean, or French data center?
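One simple way to check whether a design will scale predictably is to project daily log volume forward under an assumed growth rate. The sketch below uses plain compound growth; the starting volume and the 30% annual rate are made-up inputs for illustration, not benchmarks.

```python
def projected_daily_gb(current_gb_per_day: float,
                       annual_growth_rate: float,
                       years: int) -> float:
    """Daily log volume after `years` of compound growth (rate given as 0.30 for 30%)."""
    return current_gb_per_day * (1 + annual_growth_rate) ** years


# Hypothetical inputs: 250 GB/day today, growing 30% per year
for year in range(0, 6):
    volume = projected_daily_gb(250.0, 0.30, year)
    print(f"Year {year}: about {volume:,.0f} GB/day")
```

Running the projection over the same 3–5 year horizon you use for business planning gives you a number to test against a vendor's pricing tiers and scaling limits.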

If you would like to share feedback with the Exabeam product management team, feel free to reach out through your technical account manager or leave a message on Exabeam Community.
