
Sharpening First-Time Access Alert for Insider Threat Detection

Residents participating in a neighborhood crime watch look out for signs of suspicious activity.  A new car parked on the street is probably the first thing to register in a resident’s mind.  Other hints, like the time of day, what the driver carries, or how the driver loiters, all add up before one decides to call the police.  A User Behavior Analytics (UBA) system works much the same way, with various statistical indicators working together for insider threat detection.  Similar to spotting a new car on the street in a neighborhood watch, a frequently deployed indicator among UBA vendors is whether a user accessed a network entity (“entity”) for the first time, be it a machine, a network zone, or an application.  Indeed, alerts from this indicator correlate well with malicious insider or compromised account activities.  This makes sense, particularly for detecting lateral movement, where an adversary who is new to the network accesses various entities for the first time.

What is the downside of such alerts?  The false positive rate.  User behavior on a network is highly dynamic, so there will always be legitimate user activities triggering this indicator.  A high false positive rate will render the indicator less than useful.  In this blog, I’ll sketch out how we can use machine learning to sharpen the precision of this indicator by reducing its false alert rate.  For more detail, the research behind this work will appear as a technical paper in the Intelligence Security Informatics Workshop at the upcoming International Conference on Data Mining 2017 in November.

The intuition is that a user’s first-time access to an entity can be predicted based on how his peers historically access network entities.  If the first-time access could have been predicted or expected, then we do not need to raise an alert on the user.  This reduces the false positives.  As an analogy in a neighborhood watch, if we could have expected a new car to park on this street at this hour because neighbors have reported similar recent instances, then a newly parked car won’t register much concern now.  Similarly, the problem here is to predict a user’s first access by leveraging information from others’ past access records.  It is not surprising that we frame this as a classical recommender system problem.

A Recommender System Based Approach

If you are not familiar with recommender systems, here is a brief description.  In retail analytics, when making a purchase suggestion, a recommender system uses behavioral and contextual data from other users to predict a user’s preference for an item.  For example, if I and some other users have had similar movie-watching histories (behavioral data), or if I and some group of users are in the same age bracket (contextual data), I am likely to be recommended a movie that these other viewers have watched but I haven’t.  Among the recommender system modeling alternatives, the factorization machine is the preferred choice for its ability to incorporate both behavioral and contextual information together.  A full description of the factorization machine algorithm is beyond the scope of this blog; refer to this paper from Rendle et al. for detail.
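To make the model concrete, below is a minimal numpy sketch of the factorization machine prediction equation: a global bias, linear weights, and pairwise feature interactions captured through latent factors.  The function and variable names are illustrative only and do not come from our implementation.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Score one feature vector x with a factorization machine.

    x  : (n,)   feature vector (e.g., one-hot user/entity plus context)
    w0 : scalar global bias
    w  : (n,)   per-feature linear weights
    V  : (n, k) latent factor vectors, one k-dimensional row per feature

    Uses Rendle's O(k*n) reformulation of the pairwise interaction term.
    """
    linear = w0 + w @ x
    xv = x @ V                      # (k,): sum_i x_i * v_i
    x2v2 = (x ** 2) @ (V ** 2)      # (k,): sum_i x_i^2 * v_i^2
    interactions = 0.5 * np.sum(xv ** 2 - x2v2)
    return linear + interactions
```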

Knowing which learning algorithm to apply is only part of the work.  As in many machine learning tasks, the data must be suitably conditioned before it can be used as input for learning.  Here I highlight two issues to address.

Issues

First, user-to-entity access on a network is very sparse.  Typical users access only a handful of entities.  It is well known that data sparseness poses difficulty for a recommender system.  To reduce the sparseness, users who accessed very few entities, as well as entities which were accessed by very few users, are simply removed from learning.  We do not have enough historical behavioral information to learn from them.  This means that for such users we simply let their first-access alerts stand.  Only users for whom we have enough information are evaluated for potential alert suppression on their activities.
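As a rough illustration, here is how such a filtering step might look if the access history were held in a pandas DataFrame.  The column names and count thresholds are hypothetical, not the ones used in the paper.

```python
import pandas as pd

def filter_sparse(access_log: pd.DataFrame,
                  min_entities_per_user: int = 5,
                  min_users_per_entity: int = 5) -> pd.DataFrame:
    """Drop users who accessed very few entities and entities accessed by
    very few users, so the recommender only learns where enough history
    exists.  Thresholds are illustrative."""
    user_counts = access_log.groupby("user")["entity"].nunique()
    entity_counts = access_log.groupby("entity")["user"].nunique()
    keep_users = user_counts[user_counts >= min_entities_per_user].index
    keep_entities = entity_counts[entity_counts >= min_users_per_entity].index
    return access_log[access_log["user"].isin(keep_users) &
                      access_log["entity"].isin(keep_entities)]
```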

Second, recommender system learning is a process of gaining information from past user-to-entity access and non-access records.  However, the absence of an access record between a user and an entity is not something the recommender system should blindly learn from.  A user and an entity may lack an access connection for one of two reasons:

  • The user has no access privilege due to security policy.
  • The access is allowed but the user has not yet made it (though she may in the future; these are the accesses we want to predict).

Since this is a supervised learning method, each sample for learning, i.e. a user-to-entity tuple, must be labeled as “have access” or “not have access”.  We assign tuples with observed access the “have access” label; however, we can’t blindly assign all other tuples the “not have access” label.  Doing so would leave no room for any future prediction, since every possible user-to-entity access would already be accounted for in the history.  To allow meaningful learning and prediction, we address this issue by carefully sampling some of these tuples and assigning only those the “not have access” label for learning, leaving all the other tuples available for future prediction.  Please refer to our paper for detail.
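The sketch below illustrates the idea, with uniform random sampling standing in for the careful sampling criteria described in the paper; the names and the negative-to-positive ratio are illustrative only.

```python
import random

def build_training_tuples(observed, users, entities, neg_per_pos=2, seed=0):
    """observed: set of (user, entity) pairs with an access in history.

    Label 1 ("have access"): every observed pair.
    Label 0 ("not have access"): only a sampled subset of unobserved
    pairs; the rest stay unlabeled so they remain available for future
    prediction.  Uniform sampling here is a simplification of the careful
    sampling described in the paper.
    """
    random.seed(seed)
    samples = [(u, e, 1) for (u, e) in observed]
    n_neg = neg_per_pos * len(observed)
    while n_neg > 0:
        u, e = random.choice(users), random.choice(entities)
        if (u, e) not in observed:
            samples.append((u, e, 0))
            n_neg -= 1
    return samples
```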

Before I talk about the results, it is interesting to see the effect of using different types of contextual data in the factorization machine-based recommender system model.  Contextual data such as peer groups helps the model predict whether a user’s access to an entity is expected, given that some of her peers have already accessed the entity.  A few experiments were run.  As it turned out, peer data defined as the combination of the user’s Windows security group (e.g., the “memberOf” key in Windows LDAP data) and the user’s department yields the lowest prediction error, compared to the alternatives of using no contextual data or using just the Windows security group.
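For readers curious how such context enters the model, here is a sketch of assembling a sparse factorization machine feature vector from one-hot user and entity blocks plus the peer-group context.  The index mappings, weighting scheme, and names are assumptions for illustration, not our production feature layout.

```python
def encode_features(user, entity, user_groups, user_dept,
                    user_index, entity_index, group_index, dept_index):
    """Build a sparse FM feature vector as {column: value} for one
    (user, entity) pair.  Context blocks follow the user and entity
    one-hot blocks: the user's Windows security groups ("memberOf")
    and department.  The *_index dicts map raw ids to column offsets."""
    n_u, n_e, n_g = len(user_index), len(entity_index), len(group_index)
    x = {user_index[user]: 1.0,                        # user one-hot
         n_u + entity_index[entity]: 1.0}              # entity one-hot
    groups = user_groups.get(user, [])
    for g in groups:                                   # normalized multi-hot
        x[n_u + n_e + group_index[g]] = 1.0 / len(groups)
    dept = user_dept.get(user)
    if dept is not None:
        x[n_u + n_e + n_g + dept_index[dept]] = 1.0    # department one-hot
    return x
```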

Evaluation Data

Now let’s see how well the method suppresses false positives among raised alerts.  We’ll look at first-time access alerts on users accessing four different types of network entities:

  • NKL: machines accessed by users via NTLM or Kerberos authentication protocol
  • RA: machines remotely accessed by users for file share or printer connection
  • NZ: network zones accessed by users
  • EP: endpoint processes run by users

The method makes a classification decision on whether an alert should be kept or suppressed.  To quantitatively evaluate this classifier, we need alerts pre-labeled as legitimate or as malicious.  Yet known and validated malicious insider threat activity on an enterprise is either non-existent or hard to obtain, while nearly all network activity at any given time is legitimate.

Evaluation Setup

To allow an evaluation, we assume all observed alerts from logged data reflect legitimate user behavior; these are the negative-labeled samples.  This assumption is generally valid, as nearly all such alerts from any given data period are typically due to the dynamic nature of user behavior on a network.  Ideally, if the recommender system works as intended, the prediction scores for these samples are close to 1 (a score closer to one means the access is expected, while a score closer to zero means it is not).
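As a tiny illustration of how such a score could drive the keep-or-suppress decision (the threshold below is an arbitrary operating point, not one from the paper):

```python
def triage_alert(expectedness_score: float, threshold: float = 0.5) -> str:
    """Decision rule on a raised first-access alert: if the recommender
    scores the access as expected (score near 1), suppress the alert;
    otherwise keep it for the analyst.  The threshold is illustrative."""
    return "suppress" if expectedness_score >= threshold else "keep"
```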

Positive-labeled samples are synthesized to mimic malicious behavior, reflecting scenarios in which either a masquerading attacker does not know his way around the network or an ill-intentioned insider connects to network entities inconsistent with the victim’s and her peers’ habitual patterns.  The synthesis is performed via the negative sampling technique: a randomly but carefully selected entity is paired with a user as if the user had just made a first-time access.  An example selection criterion is that the selected entity is one that has not been accessed by the user herself or by her peers in the past.  Ideally, if the recommender system works as intended, the prediction scores for these samples are close to 0; the access is not expected.
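A simplified sketch of this synthesis step might look as follows; the data structures and the single selection criterion shown are illustrative and do not reproduce the paper’s exact procedure.

```python
import random

def synthesize_malicious_alerts(users, entities, accessed_by, peers,
                                n_samples, seed=0):
    """Create positive-labeled evaluation alerts by pairing a user with an
    entity that neither she nor her peers accessed in history, mimicking a
    masquerading attacker or a rogue insider.

    accessed_by : dict user -> set of entities the user accessed in history
    peers       : dict user -> set of peer users
    """
    random.seed(seed)
    samples = []
    while len(samples) < n_samples:
        u = random.choice(users)
        e = random.choice(entities)
        seen = set(accessed_by.get(u, set()))
        for p in peers.get(u, set()):
            seen |= accessed_by.get(p, set())
        if e not in seen:
            samples.append((u, e))   # treated as a first-time access alert
    return samples
```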

Results

Figure 1 shows the ROC curves of the classifier performance on alerts raised on users accessing the four entity types.  An ROC curve shows the tradeoff between the detection rate on the positive-labeled alerts and the false positive rate (or the alert suppression rate) on the negative-labeled alerts.  As we can see, for alerts over the RA, NZ, and NKL entity types, we can get nearly a 50% alert suppression rate while maintaining a near-100% detection rate.  This is very good.  The performance for the EP alert type is still good but somewhat inferior to the others, with about a 90% detection rate at the same 50% suppression rate.  This is perhaps explained by the fact that in real-world situations some endpoint processes are run by many users, normal users and attackers alike.  The recommender system errs on the side of expecting such processes to be run by users too frequently, inadvertently suppressing more alerts than desired, including those from synthesized malicious access.

Figure 1. ROCs of four alert types 
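For concreteness, here is a small sketch of how one point on such a trade-off curve could be computed from the recommender’s scores; the helper is illustrative and assumes scores where 1 means the access is expected.

```python
import numpy as np

def operating_point(scores_legit, scores_malicious, threshold):
    """At a given score threshold (suppress when expectedness >= threshold),
    report the suppression rate on legitimate alerts and the detection rate
    (alerts kept) on synthesized malicious alerts.  Sweeping the threshold
    traces the trade-off shown in the ROC curves."""
    scores_legit = np.asarray(scores_legit)
    scores_malicious = np.asarray(scores_malicious)
    suppression_rate = np.mean(scores_legit >= threshold)   # legit alerts suppressed
    detection_rate = np.mean(scores_malicious < threshold)  # malicious alerts kept
    return suppression_rate, detection_rate
```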

To summarize, user-to-entity first-access indicators for alerting are widely deployed in UBA systems for their ease of interpretation and good correlation with malicious activities.  I showed a factorization machine-based recommender system method to reduce false positives from these alerts.  Contextual data from peer groups is incorporated in learning, and I pointed out some issues that must be addressed before applying the learning.  With real-world data, alerts from users accessing four different entity types were evaluated.  The method is shown to be capable of reducing false positive alerts from the first-access indicator while maintaining its detection rate.

Work such as this is important for UBA.  All statistical indicators have varying degrees of false positives.  When an effort is made to improve the precision of each indicator, the overall set of anomaly cases presented to analysts will have better precision, reducing false positive fatigue.  I will share more examples of false positive reduction methods for other classes of UBA indicators in future blogs.

