When it comes to artificial intelligence (AI) and machine learning (ML), there’s no shortage of buzz and hype. Often referred to interchangeably, artificial intelligence and machine learning are part of our daily reality and technology lexicon—whether it’s in a product marketing pitch or a Netflix recommendation for which movie to see.
AI, ML, and Deep Learning in Cybersecurity
In cybersecurity, as these and other emerging technologies like deep learning (DL) evolve, their capabilities have become a driving force shaping modern cybersecurity solutions. At the same time, security practitioners, fatigued by the barrage of artificial intelligence and machine learning messaging, are growing suspicious of vendor claims.
At the recent InteropITX conference, panelists echoed the same sentiment about the hype, asking what can be legitimately claimed as artificial intelligence. The audience was encouraged to look beyond the marketing spin and find out what’s really being offered.
I’m glad to see the hype cycle has reached its peak. It’s a healthy sign that security practitioners are asking the right questions and demanding to know what constitutes reality.
In order to ask the right questions, let’s start with a correct understanding of the terminology. Despite all the marketing messaging, for many of us it’s not always clear what some terms mean.
Figure 1: The relationship between artificial intelligence, machine learning, and deep learning
AI is often misunderstood, and not everyone agrees on what it means. The term artificial intelligence first appeared in the 1950s to describe systems comprising a set of human-defined, if/then decision rules—which have always been easily broken and hard to maintain.
For example, static correlation rules that raise alerts, as used in traditional security information and event management (SIEM), cannot learn and adapt. This results in a high number of false positives. Such AI systems appear to be intelligent simply because they make decisions. But in reality, those decisions are 100% predetermined by static rules that humans drafted.
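To make the point concrete, here is a minimal sketch of a static correlation rule. The event fields, the account names, and the threshold of five failed logins are all hypothetical, chosen only for illustration; they are not taken from any particular SIEM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user: str
    failed_logins: int  # failed logins seen in the last hour (hypothetical field)


# Hypothetical fixed threshold, picked once by an analyst and never updated.
FAILED_LOGIN_THRESHOLD = 5


def static_rule_alert(event: Event) -> bool:
    """Fire whenever the count reaches the fixed threshold; nothing is learned."""
    return event.failed_logins >= FAILED_LOGIN_THRESHOLD


events = [
    Event("alice", 1),
    Event("svc-backup", 40),  # imagine a noisy service account that always looks like this
    Event("bob", 6),
]

alerts = [e.user for e in events if static_rule_alert(e)]
print(alerts)
```

Because the rule never looks at past behavior, an account that routinely produces dozens of failures would trip it on every run, which is one way static rules generate false positives.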
But the word “intelligence” has stuck with the public since AI’s introduction. Why not? It sounds cool. Yet today AI is often little more than a catchy marketing label, liberally applied to any system that performs tasks having some semblance of automated decision-making.
Contrast this with the modern Exabeam SIEM that dynamically learns from the behavioral patterns in data in order to make its decisions.
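As a rough illustration of what learning from behavioral patterns can mean, the sketch below compares each account against its own history instead of a global threshold. This is not a description of how Exabeam or any other product works internally; the history data, the z-score approach, and the threshold of 3.0 are assumptions made for the example.

```python
from statistics import mean, stdev

# Hypothetical history: failed logins per day for each account.
history = {
    "alice": [0, 1, 0, 2, 1, 0, 1],
    "svc-backup": [38, 41, 40, 39, 42, 40, 41],
}


def is_anomalous(user: str, today: int, z_threshold: float = 3.0) -> bool:
    """Flag a count that sits far above the account's own historical baseline."""
    past = history[user]
    baseline = mean(past)
    # Floor the spread at 1.0 so very quiet accounts do not alert on tiny changes.
    spread = max(stdev(past), 1.0)
    return (today - baseline) / spread > z_threshold


print(is_anomalous("alice", 6))        # far above alice's usual 0 to 2
print(is_anomalous("svc-backup", 40))  # normal for this noisy service account
```

Here the same raw count can be alarming for one account and routine for another, because the decision boundary comes from the data and not from a hand-written constant.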
Machine learning is often mentioned in the same breath as AI. But machine learning is more specific. To learn from collected data, it uses algorithms for prediction, classification, and insight generation.
Machine learning is a formal body of methods grounded in solid mathematical foundations. Applied to cybersecurity, it only works when the right problems are matched with the right machine learning tools.
But not all problems require advanced machine learning tools. For example, some popular indicators used in user behavior analytics (UBA) are based on simple statistical analysis, such as p-value hypothesis testing used for rare event detection.
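A simple statistical test of this kind might look like the sketch below. It assumes a hypothetical scenario in which a user's historical rate of off-hours logins is known, and asks how surprising today's count would be under that rate using a one-sided binomial tail. The baseline rate, the counts, and the significance level are illustrative assumptions, not values from any real deployment.

```python
from math import comb


def binomial_tail_p_value(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_rare_event(k: int, n: int, baseline_rate: float, alpha: float = 0.001) -> bool:
    """Reject the 'business as usual' hypothesis when the p-value falls below alpha."""
    return binomial_tail_p_value(k, n, baseline_rate) < alpha


# Hypothetical baseline: 1% of this user's logins happen outside business hours.
BASELINE_RATE = 0.01

print(is_rare_event(k=1, n=20, baseline_rate=BASELINE_RATE))  # one off-hours login out of 20
print(is_rare_event(k=4, n=20, baseline_rate=BASELINE_RATE))  # four off-hours logins out of 20
```

No model training is involved here. The only inputs are a baseline rate and today's counts, which is why indicators like this are cheap to compute and easy to explain to an analyst.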
On the other hand, many cybersecurity problems cannot be solved without machine learning. Consider the phishing scam domain detection shown in Figure 2. Here, the URLs, WHOIS data, and other properties, as well as the known labels of the URLs (legitimate or malicious), are examined in a supervised learning setting to predict whether a domain is malicious. The model does so without resorting to conventional, but less effective, blacklist-based matching.
Figure 2: Supervised learning for phishing domain detection
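The supervised setup can be sketched in a few lines. The example below trains a tiny logistic regression classifier from scratch on made-up labeled domains. Everything in it is an assumption for illustration: the domains (all under the reserved .example TLD), the WHOIS ages, the two hand-picked features, and the choice of logistic regression. A real system would use far more data and features, and the article does not say which algorithm is behind Figure 2.

```python
import math

# Hypothetical labeled examples: (domain, WHOIS age in days, label), 1 = malicious.
TRAINING = [
    ("login-secure-account7.example", 4, 1),
    ("paypa1-verify-id.example", 12, 1),
    ("free-gift-2024-claim.example", 2, 1),
    ("bank0famerica-alert.example", 20, 1),
    ("university.example", 3200, 0),
    ("news.example", 2900, 0),
    ("my-shop.example", 2400, 0),
    ("weather.example", 3600, 0),
]


def features(domain, age_days):
    """Two illustrative features, both scaled to the range 0..1."""
    suspicious_chars = sum(ch.isdigit() or ch == "-" for ch in domain)
    return [min(age_days, 3650) / 3650, min(suspicious_chars, 5) / 5]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=5000, lr=0.5):
    """Fit logistic regression weights with plain batch gradient descent."""
    rows = [(features(d, a), y) for d, a, y in samples]
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in rows:
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_malicious(model, domain, age_days):
    """Return True when the predicted probability of 'malicious' is at least 0.5."""
    weights, bias = model
    z = sum(w * xi for w, xi in zip(weights, features(domain, age_days))) + bias
    return sigmoid(z) >= 0.5


model = train(TRAINING)
print(predict_malicious(model, "secure-update-account9.example", 6))
print(predict_malicious(model, "library.example", 3000))
```

Neither test domain appears in the training list, so a blacklist lookup would have nothing to match against. The classifier decides from the properties it learned to associate with each label.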
Deep learning is all the rage today. As with AI, deep learning evokes an air of sophistication, but it’s also subject to misunderstandings. As a tool within machine learning, deep learning is highly dependent on matching the right problems to the right tools.
Deep learning applications are best suited to the image processing and natural language processing fields. In cybersecurity, deep learning has found a home in packet stream and malware binary analysis. These applications benefit most from supervised learning, when labeled (i.e., legitimate vs. malicious) data is available.
But for insider threat detection, deep learning doesn’t enjoy wide adoption for several technical reasons. One is the black box nature of the model, where it’s impossible to explain the causes of the alerts. This renders investigations difficult.
Peer behind the messaging and examine what’s under the hood
The cybersecurity marketplace is buzzing with AI and ML terminology. This isn’t surprising, as data-driven approaches do lead to exciting applications that were never possible before. That said, it’s all too easy to get confused—and lost in the hype.
Ask how the problems or use cases are being framed. Ask which analytical approaches are being used, and why. Transparency and a thorough understanding of the terms and their use cases will help you demystify the hype.