data science

Is this Chad's Personal E-mail Address? A Data Exfiltration Context

Data exfiltration is a common, multi-faceted security threat that every enterprise faces. It’s defined as the unauthorized transfer of private data or intellectual property from a corporate computer to an external location. One way such illegitimate data transfer occurs is through the e-mail channel. It is all too easy for a disgruntled or departing employee to e-mail confidential data to a personal account. How can this scenario be addressed? Several existing security products attempt to[…]

Topics: data science

Announcing Exabeam Advanced Analytics Version 3.3

We are thrilled to announce the general availability of the latest version of Exabeam Advanced Analytics (AA), our User and Entity Behavior Analytics solution. Advanced Analytics version 3.3 helps our customers: obtain deeper insight into user activity; streamline workflows across multiple Exabeam solutions; and leverage their own data science algorithms for analytics. Exabeam Advanced Analytics Version 3.3 key features: Dynamic Peer Grouping – capability examines a user’s behavior compared to their Active Directory (AD) peers helps[…]

Topics: data science, Product release

Machine Learning SDK for Security Analytics

I was once asked by an aspiring data scientist what the challenges are in getting into the field of user and entity behavior analytics (UEBA). After all, data scientists have been applying their skills successfully across many industries. Yet I believe security analytics poses some challenges that represent high barriers to entry for a data scientist new to the area. First, there is the obvious need to collect and process the 3Vs (volume, variety, and velocity)[…]

Topics: data science

Sharpening First-Time Access Alert for Insider Threat Detection

Residents participating in a neighborhood crime watch look out for signs of suspicious activity.  A new car parked on the street is probably the first thing to register in a resident’s mind.  Other hints like the time of day, what the driver carries, or how he loiters around all add up before one decides to call the police.  A User Behavior Analytics (UBA) system works much the same way, with various statistical indicators jointly working[…]

Topics: data science

Anomalous User Activity Detection in Enterprise Multi-Source Logs

Network users’ activities generate events every day. Logged events collected from multiple sources are valuable for user activity profiling and anomaly detection. A good analytics use case for insider threat detection is to see if a user’s collection of events today is anomalous relative to her historical daily collections of events. In an earlier blog post, I highlighted a method to address this use case that leverages distributed computing built on HDFS and Apache Spark. In this[…]

Topics: data science

Account Resolution via Market Basket Analysis

Machine learning and statistical analysis have many practical applications in the detection of malicious users and entities as part of User & Entity Behavior Analytics (UEBA) solutions. Threat detection typically garners the attention; this is as true on the show floor of security conferences as it is in the text of marketing material. Equally important, although less mentioned, is the application of machine learning for context estimation. Contextual information such as whether the machine is a[…]

Topics: data science

User Behavior Anomaly Detection Meets Distributed Computing

User and Entity Behavior Analytics (UEBA) analyzes log data from different sources in order to find anomalies in users’ or entities’ behaviors. Depending on enterprise size and available log sources, data feeds can range from tens of gigabytes to terabytes a day. Typically, we need 30 days of data, if not more, to build proper behavior profiles. This calls for an analytics platform that is capable of ingesting and processing this volume of data. In this blog, I[…]

Topics: data science

Too Many Alerts… Just Give Me the Interesting Ones!

Security analysts often wrestle with the high volume of alerts generated by security systems, and, much like the cries of the boy in The Boy Who Cried Wolf, many alerts tend to be ignored. Human analysts quickly learn to ignore repeated alerts in order to focus on the interesting ones. Learning to screen out repeated alerts as false positives allows analysts to focus their finite time where it matters most. A natural question, then, is whether we can[…]

Topics: data science, SECURITY

Ransomworm: Don’t Cry – Act.


In July last year, we released our research report on the Anatomy of a Ransomware attack, in which we looked into both the financial model of ransomware and how to detect an attack as it unfolds. Due to the recent WannaCry ransomware craze, we think it’s time to revisit the topic. When we addressed ransomware last year, we made a significant comment about the ever-evolving nature of malicious software. We predicted that in the near future (evidently now) ransomware will move[…]

Topics: data science, ransomware, SECURITY, SIEM, Uncategorized

A Machine Learning Study on Phishing URL Detection

Many network attack vectors start with a link to a phishing URL. A carefully crafted email containing the malicious link is sent to an unsuspecting employee. Once he or she clicks on or responds to the phishing URL, the cycle of information loss and damage begins. It would then seem highly desirable to nip the problem early by identifying and alerting on these malicious links. In this blog, I’ll share some research notes here on[…]

Topics: data science, SECURITY