MITRE Publishes Domain Generation Algorithm T1483 in the ATT&CK Framework

  • May 02, 2019
  • Shubham Goel
  • 3 minutes to read

    MITRE released a significant update to the MITRE ATT&CK framework that included several new and updated techniques to help defenders understand and mitigate cybersecurity threats. One of those techniques, Domain Generation Algorithms (DGAs), was submitted by our research team. We are excited to be contributors, and in this post we explain the technique in more detail.

    Domain Generation Algorithm (T1483)

    Introduction

    Domain generation algorithms have remained a primary communication mechanism for malware over the past 10 years. A DGA takes a seed, such as a DWORD value or the current date, and uses it to quickly generate domain names that may consist of dictionary words, random digits, or gibberish strings (hcbhjbdjbjhsb.ru). Malware uses these domains to reach servers that instruct it to exfiltrate data, deliver updates, and execute commands on a system remotely. Earlier malware families used a static list of IP addresses or domains, which defenders eventually blocked. Attackers now write sophisticated DGA code to circumvent defenders, generating thousands of domains of which only a few actually carry command-and-control instructions. This keeps their connection persistent over the long term and resilient against enforcement actions. Malware such as Kraken, Conficker, Murofet, and Chopstick use DGAs whose seeds range from date-dependent to static and dynamic.
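    To make the idea concrete, here is a toy, date-seeded DGA in Python. It is a hypothetical sketch, not the algorithm used by Kraken, Conficker, Murofet, Chopstick, or any other real family. The function name `toy_dga`, the SHA-256 hashing, the 12-letter labels, and the `.ru` TLD are all illustrative assumptions. The property it demonstrates is determinism: anyone who knows the algorithm and the seed (here, the date) computes the same domain list, so an attacker only needs to register one or two of those domains ahead of time.

```python
import hashlib
from datetime import date


def toy_dga(seed_date: date, count: int = 10, tld: str = ".ru") -> list[str]:
    """Hypothetical date-seeded DGA, for illustration only."""
    domains = []
    for i in range(count):
        # Mix the date seed with a counter so each candidate domain differs.
        material = f"{seed_date.isoformat()}-{i}".encode()
        digest = hashlib.sha256(material).digest()
        # Map the first 12 hash bytes onto a-z to build a gibberish label.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:12])
        domains.append(label + tld)
    return domains


# Same seed, same list: the malware and its operator agree without communicating.
for domain in toy_dga(date(2019, 5, 2), count=3):
    print(domain)
```

    Defenders can exploit the same determinism: once a family's algorithm has been reverse-engineered, its future domains can be precomputed and blocked or sinkholed, which is one reason some DGAs rely on seeds that are harder to predict in advance.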

    Detection

    Detecting dynamically generated domains can be challenging because of the rapid rotation of DGA seeds, constantly evolving malware families, sinkhole awareness, and the complexity of the algorithms themselves.

    This makes signature-based detection of these domains largely ineffective and calls for a machine learning approach to detect them efficiently.

    With machine learning, DGA detection becomes a solvable problem. Some readers may be familiar with n-grams from natural language processing, where analysts count how often words follow each other in normal speech or writing. Similarly, character n-grams can be used to analyze the substrings of a domain name. If the character sequences in a suspected domain name rarely or never follow each other in common use on the internet, the domain has a high probability of being random.
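    The "how often one unit follows another" idea can be sketched with a character-level bigram model (equivalently, a first-order Markov chain over characters). This is an illustrative assumption, not Exabeam's model: the small `BENIGN` list is a hypothetical stand-in for a large corpus of popular domain labels, the helper names are invented for the example, and add-one smoothing keeps unseen character pairs from producing log(0). Labels whose transitions are common in the corpus score higher (closer to zero) than gibberish.

```python
import math
from collections import Counter

# Characters allowed in a domain label; used as the vocabulary size for smoothing.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"

# Hypothetical benign corpus; a real model would use a very large set of popular labels.
BENIGN = ["google", "facebook", "youtube", "amazon", "wikipedia",
          "microsoft", "twitter", "instagram", "linkedin", "netflix"]


def train_bigrams(labels: list[str]) -> tuple[Counter, Counter]:
    """Count character pairs (a, b) and how often each character a starts a pair."""
    pair_counts: Counter = Counter()
    start_counts: Counter = Counter()
    for label in labels:
        label = label.lower()
        for a, b in zip(label, label[1:]):
            pair_counts[(a, b)] += 1
            start_counts[a] += 1
    return pair_counts, start_counts


def avg_log_prob(label: str, pair_counts: Counter, start_counts: Counter) -> float:
    """Mean log P(next char | current char) with add-one smoothing.

    More negative values mean the label's transitions are rarer in the corpus.
    """
    label = label.lower()
    pairs = list(zip(label, label[1:]))
    if not pairs:
        raise ValueError("label needs at least two characters")
    vocab = len(ALPHABET)
    total = sum(
        math.log((pair_counts[(a, b)] + 1) / (start_counts[a] + vocab))
        for a, b in pairs
    )
    return total / len(pairs)


pairs, starts = train_bigrams(BENIGN)
print(round(avg_log_prob("youtube", pairs, starts), 2))
print(round(avg_log_prob("hcbhjbdjbjhsb", pairs, starts), 2))
```

    On its own this score is only a feature; deciding what counts as "random" would still require a threshold or a classifier tuned on labeled data.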

    One approach is to segment each domain into overlapping substrings of length “n”. Each substring of length n is called an n-gram. A string of length L yields L - n + 1 n-grams, so the larger the value of n, the fewer the substrings, and vice versa, as can be seen in the figure below. According to research, values of n of 3, 4, and 5 give the best accuracy when predicting randomness. For example, a 3-gram approach applied to the word "youarepwned" yields you, oua, uar, are, rep, epw, pwn, wne, and ned.

    We then test these substrings (for example, you, uar, and epw) for randomness, excluding the top 1 million ranked domains, such as the Alexa top million websites. Domains with high randomness are checked against CDN whitelists. If a domain is not on a whitelist, its level of randomness is compared with a threshold value, and if it is higher than the threshold, the domain is deemed DGA-generated. This increases the chance of detection and helps predict the range of random domains that may be generated.
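    The detection steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Exabeam's implementation: the randomness score (the share of a label's 3-, 4-, and 5-grams never seen among top-ranked domains), the 0.8 threshold, the five-domain stand-in for a top-million list, the one-entry CDN whitelist, and the helper names (`build_baseline`, `randomness`, `is_dga`) are all hypothetical. Only the second-level label is scored, and the TLD split is naive (it ignores multi-part suffixes such as co.uk).

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Overlapping substrings of length n (a string of length L yields L - n + 1)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def second_level_label(domain: str) -> str:
    """Label just left of the TLD; naive, so suffixes such as co.uk are not handled."""
    parts = domain.lower().rstrip(".").split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]


def build_baseline(top_domains: set[str], sizes=(3, 4, 5)) -> set[str]:
    """Every n-gram that occurs in the labels of top-ranked (assumed benign) domains."""
    return {g for d in top_domains for n in sizes
            for g in char_ngrams(second_level_label(d), n)}


def randomness(label: str, baseline: set[str], sizes=(3, 4, 5)) -> float:
    """Share of the label's n-grams that never occur in the baseline (0.0 to 1.0)."""
    grams = [g for n in sizes for g in char_ngrams(label, n)]
    if not grams:
        return 0.0  # Too short to contain any n-gram of the requested sizes.
    return sum(g not in baseline for g in grams) / len(grams)


def is_dga(domain: str, top_domains: set[str], baseline: set[str],
           cdn_whitelist: set[str], threshold: float = 0.8) -> bool:
    domain = domain.lower().rstrip(".")
    if domain in top_domains:  # Top-ranked domains are excluded from testing.
        return False
    if randomness(second_level_label(domain), baseline) <= threshold:
        return False  # Not random enough to flag.
    # High randomness: spare hosts under a whitelisted CDN domain.
    if any(domain == c or domain.endswith("." + c) for c in cdn_whitelist):
        return False
    return True


# Hypothetical stand-ins for a top-million list and a CDN whitelist.
TOP = {"google.com", "facebook.com", "youtube.com", "amazon.com", "wikipedia.org"}
CDN = {"cloudfront.net"}
BASELINE = build_baseline(TOP)

for d in ["google.com", "youtubes.com", "hcbhjbdjbjhsb.ru", "d1234.cloudfront.net"]:
    print(d, is_dga(d, TOP, BASELINE, CDN))
```

    With only five baseline domains, even a legitimate label such as cloudfront scores as fully random, which illustrates both why the baseline has to be large (the post uses a top-million list) and why the CDN whitelist check matters.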

    We apply the same method to second- and third-level domains so that the word is analyzed at whichever level it occurs, for example xuxu(dot)youarepwned(dot)net. Our behavior-based analytics approach, combined with deep learning, helps detect DGA activity before it causes damage. In a previous post, our Chief Data Scientist Derek Lin discussed behavioral modeling for DGAs using a random forest model.

    For example, in a ransomware attack, the malware may request encryption keys, encrypt files, and send out sensitive data. Any such malicious request, combined with a random DGA domain, provides substantial evidence of abnormal activity that can be investigated further. Inputs include data sources that contain domain access information, such as DNS NXDOMAIN logs, DNS request/response logs, WHOIS information, passive DNS traffic, proxy traffic, and EDR activity.
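    Applying the n-gram method to second- and third-level domains starts with splitting a hostname into its labels. The sketch below is a hypothetical helper, not Exabeam's code: `domain_levels` and `char_ngrams` are illustrative names, and it assumes for simplicity that the last label is the TLD (a production system would consult a public-suffix list instead). Each returned label can then be segmented and scored for randomness on its own.

```python
def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Overlapping substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def domain_levels(hostname: str) -> dict[str, str]:
    """Return the second- and third-level labels of a hostname, if present.

    Naive assumption: the last label is the TLD, so multi-part suffixes such as
    co.uk are misread; a public-suffix list would handle them correctly.
    """
    labels = hostname.lower().rstrip(".").split(".")
    levels = {}
    if len(labels) >= 2:
        levels["second"] = labels[-2]
    if len(labels) >= 3:
        levels["third"] = labels[-3]
    return levels


for level, label in domain_levels("xuxu.youarepwned.net").items():
    # Each level's label is segmented into trigrams and could be scored separately.
    print(level, label, char_ngrams(label))
```

    For the example hostname, the second level yields the nine trigrams from you through ned, and the third level yields xux and uxu.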

    Conclusion

    Signature-based detection is not considered an effective measure against DGAs because the algorithms change rapidly. In addition to techniques such as entropy analysis, frequency analysis, and Markov chains, Exabeam provides extensive behavior-analytics detection using n-grams and machine learning. As DGA techniques evolve, predicting an adversary’s actions will remain challenging, which underscores the need to better prepare ourselves by sharing intelligence and working collaboratively.
