MITRE Publishes Domain Generation Algorithm T1483 in the ATT&CK Framework

May 02, 2019
Shubham Goel

MITRE released a significant update to the MITRE ATT&CK framework that included several new and updated techniques to help mitigate cybersecurity threats. One of those techniques, domain generation algorithms (DGAs), was submitted by our research team. We are excited to be contributors, and this post explains the technique in more detail.

Domain Generation Algorithm (T1483)

Introduction

Domain generation algorithms have remained a main channel of communication for malware over the past 10 years. DGAs are designed to quickly generate pseudo-random domains from inputs such as dictionary words, DWORD values, random digits, or gibberish strings (hcbhjbdjbjhsb.ru). Those domains can be used to give malware instructions to exfiltrate data, deliver updates, and execute commands on a system remotely. Earlier families of malware used a static list of IPs and domains, which defenders eventually blocked. Attackers now write sophisticated DGA code to circumvent defenders and generate thousands of domains, of which only a few carry true command and control instructions, so that their connection persists over the long term and stays resilient against enforcement actions. Malware families such as Kraken, Conficker, Murofet, and Chopstick showcase DGAs whose seeds vary: date dependent, static, or dynamic.
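To make the idea of a date-dependent seed concrete, here is a minimal sketch of a toy generator. It is purely illustrative and is not the algorithm of Kraken, Conficker, Murofet, Chopstick, or any other real family; the function name `toy_dga`, the shared seed string, and the reserved `.example` suffix are all assumptions made for this example.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 5, tld: str = ".example") -> list[str]:
    """Hypothetical date-seeded generator, for illustration only."""
    domains = []
    for i in range(count):
        # The same seed and date always produce the same candidates, so two
        # parties that share the seed can derive the same list independently.
        material = f"{seed}-{day.isoformat()}-{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits to letters a..p to get a gibberish label.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga("shared-secret", date(2019, 5, 2)):
        print(name)
```

Because the list changes whenever the date changes, a static blocklist of yesterday's names does not cover today's, which is the property that makes these domains hard to block one by one.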

Detection

Detecting dynamically generated domains can be challenging due to the rapid rotation of DGA seeds, constantly evolving malware families, sinkhole awareness, and the complexity of the algorithms themselves.

This makes signature-based detection ineffective against these signals and calls for a machine learning approach to detect them efficiently.

Using machine learning, DGA detection becomes a solvable problem. Some may be familiar with n-grams from natural language processing, where analysts count how often words follow each other in normal speech or writing. Similarly, n-grams can be used to analyze the character sequences in a domain name. If the sequences in a suspected domain name never follow each other in common use on the internet, the name has a high probability of being random.

One approach is to segment the domains into substrings of size “n”. Each substring of length n is called a gram. The larger the value of n, the smaller the number of substrings, and vice versa, as can be seen in the figure below. According to research, n values of 3, 4, and 5 have the best accuracy when predicting randomness. For example, a 3-gram approach for the word “youarepwned” would be you, oua, uar, are, rep, epw, pwn, wne, ned. We then test the substrings (such as you, uar, epw) for randomness, using the top 1 million ranked domains, such as the Alexa top million websites, as the baseline of known-legitimate names to exclude. We then check domains with high randomness against CDN whitelists, and if the domain is not present, we compare its randomness to a threshold value. If the level of randomness is higher than the threshold, the domain is deemed DGA-generated. This helps us increase the chance of detection and predict the range of random domains that may be generated.
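The steps in this paragraph can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the production model: the four-name benign list stands in for a top-million ranking, the score is simply the fraction of grams never seen in that list, and the names `randomness_score` and `looks_generated`, the 0.7 threshold, and the empty allowlist are all hypothetical.

```python
def ngrams(text: str, n: int = 3) -> list[str]:
    """Return every overlapping substring of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_known_grams(benign_labels: list[str], n: int = 3) -> set[str]:
    """Collect the grams that occur in a list of known-legitimate names."""
    known: set[str] = set()
    for label in benign_labels:
        known.update(ngrams(label.lower(), n))
    return known


def randomness_score(label: str, known: set[str], n: int = 3) -> float:
    """Fraction of the label's grams that never occur in the benign list."""
    grams = ngrams(label.lower(), n)
    if not grams:
        return 0.0
    unseen = sum(1 for gram in grams if gram not in known)
    return unseen / len(grams)


def looks_generated(label: str, known: set[str], allowlist: set[str],
                    threshold: float = 0.7, n: int = 3) -> bool:
    """Skip allowlisted names (for example CDNs), then apply the threshold."""
    if label.lower() in allowlist:
        return False
    return randomness_score(label, known, n) > threshold


if __name__ == "__main__":
    # Tiny stand-in for a top-million list (assumption for this sketch).
    known = build_known_grams(["google", "youtube", "wikipedia", "amazon"])
    print(ngrams("youarepwned"))
    print(looks_generated("hcbhjbdjbjhsb", known, allowlist=set()))
    print(looks_generated("youtube", known, allowlist=set()))
```

The allowlist is checked first here only to avoid scoring names that will be excluded anyway; the outcome is the same as scoring first and filtering afterward. In practice the threshold would be tuned against labeled data rather than fixed by hand.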

We apply the same method to second- and third-level domains to detect the occurrence of the word, for example, xuxu(dot)youarepwned(dot)net. Our behavior-based analytics approach, mixed with deep learning, helps detect DGA activity before it causes any damage. In a previous post, our Chief Data Scientist Derek Lin discusses behavioral modeling for DGAs using a random forest model. For example, in the case of ransomware, the attacker will encrypt the files, request encryption keys, and send sensitive data. Any malicious request in addition to a random DGA domain provides substantial evidence of abnormal activity that can be further investigated. Inputs include logs that contain domain access information, such as DNS NX domain logs, DNS request/response logs, WHOIS information, passive DNS traffic, proxy traffic, and EDR activity.
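As a rough sketch of scoring each level of a name separately, the snippet below splits a fully qualified domain into labels and scores every label except the last. It naively assumes the final label is the TLD; a real implementation would more likely consult a public suffix list. The helper name `label_scores` and the "fraction of unseen 3-grams" score are assumptions for illustration.

```python
def ngrams(text: str, n: int = 3) -> list[str]:
    """Return every overlapping substring of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def label_scores(fqdn: str, known: set[str], n: int = 3) -> dict[str, float]:
    """Score each label of a domain name by its share of unseen grams."""
    labels = fqdn.lower().rstrip(".").split(".")
    scores: dict[str, float] = {}
    # Naive assumption: the last label is the TLD and is not scored.
    for label in labels[:-1]:
        grams = ngrams(label, n)
        if grams:  # labels shorter than n have no grams and are skipped
            unseen = sum(1 for gram in grams if gram not in known)
            scores[label] = unseen / len(grams)
    return scores


if __name__ == "__main__":
    # Pretend the grams of "youarepwned" are already known to be common.
    known = set(ngrams("youarepwned"))
    print(label_scores("xuxu.youarepwned.net", known))
```

Scoring per label lets a random-looking third-level name stand out even when the second-level name beneath it looks ordinary.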

Conclusion

Signature-based detection is not considered an effective measure against DGAs because the algorithms change rapidly. In addition to techniques like entropy change, frequency analysis, and Markov chains, Exabeam provides extensive detection techniques for behavior analytics using n-grams and machine learning. As DGA techniques evolve, predicting an adversary’s actions will remain challenging, which underscores the need to better prepare ourselves by sharing intelligence and working collaboratively.
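For readers unfamiliar with the entropy measure mentioned above, one common formulation is the Shannon entropy of the characters in a label. The sketch below is a generic illustration of that formula, not a description of how any particular product computes it; the function name `shannon_entropy` is an assumption.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Shannon entropy, in bits per character, of the characters in a label."""
    if not label:
        return 0.0
    total = len(label)
    counts = Counter(label)
    # Sum -p * log2(p) over the observed character frequencies.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    for name in ("aaaa", "abcd", "hcbhjbdjbjhsb"):
        print(name, round(shannon_entropy(name), 3))
```

Entropy alone tends to misfire on short names and on dictionary-based DGAs, which is one reason it is usually combined with the other signals described in this post.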
