Cybersecurity tools and techniques have to move fast to keep up with new and evolving cyber threats. The cutting edge of cybersecurity technology leverages machine learning, which makes it possible for organizations to stay ahead of those threats. Machine learning algorithms and systems use data from networks and applications to detect, investigate, and even respond to cyber threats automatically.
In this article you will learn:
- What is machine learning for cybersecurity and how is it used?
- The need for machine learning to defend against evolving threats
- Four applications of machine learning for cybersecurity
- How machine learning for cybersecurity is used in the modern SOC
What is machine learning for cybersecurity and how is it used?
Machine learning loosely approximates how people learn: it allows computers to analyze information, make decisions, and improve based on past experience.
In cybersecurity, machine learning algorithms help security teams save time by automatically identifying security incidents and threats, analyzing them, and in some cases responding to them automatically. Machine learning is built into many modern security tools. It is gradually replacing older methods of inference, such as manually defined rules and statistical correlations.
Machine learning algorithms come in many shapes and forms, but most of them perform one of three tasks:
| Machine Learning Task | How it Works | Cybersecurity Example |
|---|---|---|
| Regression | Models how one or more input variables relate to an output value, and to what degree, so that the output can be predicted for new inputs. | Regression can be used to predict the next system call of an operating system process, and compare it to the actual call to identify anomalies. |
| Classification | Usually performed by supervised learning algorithms, which "train" on a dataset of previous, labeled observations and try to apply what they learn to new, unseen data. Involves taking artifacts, which may be textual or multimedia content, and assigning each one to one of several labels. | Classification can be used to classify a binary file into categories like legitimate software, spyware, adware and ransomware. |
| Clustering | Usually performed by unsupervised learning algorithms, which work directly on unlabeled data without training on previously labeled examples. Clustering involves identifying commonalities between artifacts and grouping them according to their common features. | Clustering can be used to analyze traffic sessions and identify groups of sessions that may originate from the same source, to identify DDoS attacks. |
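To make the regression row more concrete, here is a minimal sketch of the "predict, then compare with the actual value" idea. It does not model system calls; instead it assumes a hypothetical dataset relating active sessions to outbound traffic, fits a least-squares line, and flags observations that fall far from the prediction. The data, the function names, and the threshold of three standard deviations are all illustrative assumptions.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def is_anomalous(x, y, slope, intercept, residual_std, threshold=3.0):
    """Flag an observation whose actual value is far from the predicted one."""
    predicted = slope * x + intercept
    return abs(y - predicted) > threshold * residual_std


# Hypothetical history: active sessions vs. outbound megabytes per minute.
sessions = [10, 20, 30, 40, 50, 60]
megabytes = [21, 39, 62, 80, 99, 121]

slope, intercept = fit_line(sessions, megabytes)
residuals = [y - (slope * x + intercept) for x, y in zip(sessions, megabytes)]
residual_std = pstdev(residuals)

print(is_anomalous(35, 71, slope, intercept, residual_std))   # False: close to the trend
print(is_anomalous(35, 400, slope, intercept, residual_std))  # True: far above the trend
```

A production system would use many more features and a more robust model, but the pattern of comparing predicted and observed behavior is the same.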
The need for machine learning for cybersecurity
Cyber threats are growing in complexity. Hundreds of millions of new strains of malware are identified each year. New types of malware programs can avoid detection by traditional antivirus, or even operate without using binary files at all (fileless attacks).
Attacks are becoming more multilayered, involving a combination of network-based techniques, malware, and web application attacks. Insider threats are a growing problem, and insider attacks are very difficult to distinguish from legitimate user activity. Attackers are also leveraging devices like mobile phones, connected devices in the office and home, and IoT infrastructure to carry out large-scale attacks.
Artificial intelligence (AI) systems based on machine learning algorithms can help detect and mitigate many of these new threats. They are able to analyze a much larger volume of data than human security professionals, intelligently identify anomalies and suspicious behavior, and investigate threats by correlating many data points.
AI security systems are not perfect, and require human monitoring, setup and tuning, but are becoming an essential part of the cybersecurity arsenal in the 21st century.
Four uses of machine learning for cybersecurity
1. Network threat identification
Machine learning algorithms can be used to analyze large volumes of network traffic, both internal and external, and identify patterns that might be part of a security incident. There are two general approaches:
- User and entity behavior analytics (UEBA), which establishes a baseline of normal network behavior and identifies anomalies that have security significance.
- Threat intelligence, which correlates network traffic with known attack techniques, IP addresses and other identifiers of known threat actors, and alerts when a known attack or attacker may be active on the network.
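The threat intelligence approach can be sketched as a lookup of observed traffic against a list of known indicators. Note that this part is plain matching rather than a learning algorithm; in practice it is typically combined with behavioral models. The indicator list, the flow records, and the `match_indicators` helper below are hypothetical, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical indicator list. Real feeds are far larger and updated continuously.
KNOWN_BAD_NETWORKS = [
    ipaddress.ip_network(cidr) for cidr in ("203.0.113.0/24", "198.51.100.7/32")
]


def match_indicators(flows, bad_networks):
    """Return the flows whose destination falls inside a known-bad network."""
    alerts = []
    for flow in flows:
        dst = ipaddress.ip_address(flow["dst_ip"])
        if any(dst in net for net in bad_networks):
            alerts.append(flow)
    return alerts


# Hypothetical flow records, for example parsed from firewall or NetFlow logs.
flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "192.0.2.10"},
    {"src_ip": "10.0.0.8", "dst_ip": "203.0.113.45"},
    {"src_ip": "10.0.0.9", "dst_ip": "198.51.100.7"},
]

alerts = match_indicators(flows, KNOWN_BAD_NETWORKS)
print([a["src_ip"] for a in alerts])  # ['10.0.0.8', '10.0.0.9']
```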
2. Automated application security
Automated application security tools can use machine learning to identify anomalous traffic, and even block it or respond automatically to an attack. They can identify malicious behavior such as unauthorized access and misuse of privileged accounts. They can also help automatically detect and avoid software vulnerabilities through static or dynamic code analysis.
3. Email monitoring
Machine learning can be used to improve the accuracy of existing approaches for detecting spam, malware or social engineering in email messages. Machine learning-based classification of images or other email attachments can help identify threats. Natural language processing (NLP) approaches analyze the text of an email to judge whether it may be part of a phishing campaign, and examine the links within the email to decide whether they are safe.
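As a rough illustration of text-based email classification, the sketch below implements a small multinomial naive Bayes classifier with Laplace smoothing. Naive Bayes is chosen here only because it is compact; the article does not say which algorithm any particular product uses. The training messages and labels are made up, and a real system would train on a large labeled corpus with much richer features.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training messages
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        counts = self.word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            self.vocab.add(token)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + len(self.vocab)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled examples.
model = NaiveBayes()
model.train("verify your account password now", "phishing")
model.train("urgent: your account is suspended, click here", "phishing")
model.train("confirm your password to avoid suspension", "phishing")
model.train("meeting notes attached for review", "legitimate")
model.train("lunch on friday with the team", "legitimate")
model.train("quarterly report draft for your review", "legitimate")

print(model.predict("please verify your password"))  # phishing
print(model.predict("notes from the team meeting"))  # legitimate
```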
4. Next-gen antivirus
Traditional antivirus depends on signatures of known malware variants. This approach cannot deal with zero-day malware, meaning new viruses that are not yet known to researchers or antivirus providers. It is also vulnerable to malware "mutations", where a virus is deliberately modified to evade detection. New AI-based approaches analyze a program's code or runtime activity to determine whether the software is legitimate or may be doing something malicious.
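Before a model can judge a file, the file has to be turned into numeric features. The sketch below shows one such step under simple assumptions: it computes the file size, the Shannon entropy of the bytes (packed or encrypted payloads tend to have high entropy), and the share of printable characters. The feature set and the `static_features` helper are illustrative only; these values are inputs that a classifier might consume, not a malware verdict on their own.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def static_features(data: bytes) -> dict:
    """Build a small, hypothetical feature vector from raw file contents."""
    printable = sum(32 <= b < 127 for b in data)
    return {
        "size": len(data),
        "entropy": byte_entropy(data),
        "printable_ratio": printable / len(data) if data else 0.0,
    }


print(static_features(b"hello world"))
print(byte_entropy(bytes(range(256))))  # 8.0: every byte value equally likely
```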
Machine Learning in the Modern SOC
Many organizations are building security operations centers (SOCs), with a security information and event management (SIEM) system at their heart. A SIEM aggregates security data and events from across the organization, then correlates and analyzes them to help identify and respond to security incidents.
Modern SIEM technology uses machine learning in a few ways to help identify threats more accurately and respond to them faster:
- Addressing unknown risks: identifying zero-day attacks and insider threats, which can appear very similar to regular user activity.
- Identifying anomalies in user or device behavior: modeling the normal behavior of users, network devices, or groups of peers, and identifying when a user or device deviates from the norm and exhibits suspicious behavior.
- Controlling the false positive rate: machine learning algorithms used within behavior analytics can help keep the false positive rate in check by monitoring and tuning the rules that anomalies trigger.
- Phishing URL detection: many public and commercial data providers offer blacklisting services or databases for looking up potential phishing domains and URLs. However, like any signature-based approach, these lookups cannot identify newly crafted phishing URLs. Machine learning models can instead predict whether a previously unseen URL is likely to be malicious.
- Malicious domain detection: malware has used algorithmically generated domain names for some time. Sophisticated attackers have moved on from random strings of 30 or more alphanumeric characters to domain names generated from dictionary words. Machine learning approaches can detect malicious domains by combining techniques such as graph analysis and behavior modeling of domains.
- Detecting network anomalies: modeling normal network behavior and identifying when activity is unusual for a specific network segment, traffic type, time of day or period.
- Automated incident response: executing automated security playbooks in response to threats detected by machine learning techniques.
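The behavior modeling items in this list share one pattern: learn what is normal, then measure how far a new observation deviates from it. A minimal sketch of that pattern, assuming a hypothetical history of daily file-access counts per user and a simple z-score, might look like this. Real SIEM products use considerably richer models; the names, data, and threshold here are illustrative only.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Map each user to the (mean, standard deviation) of their past activity."""
    return {user: (mean(values), stdev(values)) for user, values in history.items()}


def anomaly_score(baseline, user, observed):
    """Z-score: how many standard deviations the observation is from the user's norm."""
    mu, sigma = baseline[user]
    if sigma == 0:
        return 0.0 if observed == mu else float("inf")
    return (observed - mu) / sigma


# Hypothetical history: files accessed per day over the past week.
history = {
    "alice": [12, 15, 11, 14, 13, 12, 14],
    "bob": [3, 4, 2, 3, 5, 4, 3],
}
baseline = build_baseline(history)

THRESHOLD = 3.0  # assumed cutoff; tuning it trades false positives against misses

for user, observed in [("alice", 14), ("alice", 90)]:
    score = anomaly_score(baseline, user, observed)
    print(user, observed, "suspicious" if score > THRESHOLD else "normal")
```

Raising or lowering the threshold is one simple lever for controlling the false positive rate mentioned above.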
To see an example of a SIEM system that uses machine learning to detect and respond to security threats, learn more about the Exabeam Security Management Platform.