Is this Chad’s Personal E-mail Address? A Data Exfiltration Context

Published
February 14, 2018

Reading time
4 mins

Data exfiltration is a common, multi-faceted security threat every enterprise faces. It’s defined as the unauthorized transfer of private data or intellectual property from a corporate computer to an external location.

One way such illegitimate data transfer occurs is through the e-mail channel. It is all too easy for a disgruntled or departing employee to e-mail confidential data to a personal account.

How can this scenario be addressed?

Several existing security products attempt to thwart illegitimate data transfers. For example, data loss prevention (DLP) solutions detect the presence of sensitive data in flight or at rest, usually by matching it against predefined signatures. Despite such offerings, detecting unauthorized data transfer remains an enterprise security challenge, particularly because of the volume of false positives these products generate.

Additional contextual information is needed to help calibrate data transfer alerts and reduce false positives. For example, is the external address to which data is being e-mailed a personal account? An e-mail sent from Chad’s corporate address to an address that looks like his personal account should raise an analyst’s eyebrows and serve as evidence of risk.

But how can you match Chad with his external email accounts? Short of reading such data from known HR records (if they exist), one method is to mine it from historical e-mail records—a data science problem.

A data exploration exercise

A typical data science approach has two phases. The first is an exploration phase, where an examiner “feels” the data to gain intuition about it. The second phase is to engineer machine learning algorithms. At Exabeam, such exploration has enabled us to develop classification heuristics to determine if two e-mail addresses belong to the same person.

String-matching method

Most users adopt conventions in naming their personal email accounts. One observation is that a user’s “handle” is often based on their first and/or last name. To leverage this, naming variants can be evaluated for their similarity to an external address. For example, Chad might use chad_doe, chadd, cdoe, or chad.doe.

This approach is similar to the domain name permutation used to identify phishing sites. With an effectiveness rating of about 10%, it catches the low-hanging fruit when linking users with external addresses, and it yields near-zero false positives.
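
The variant-matching idea above can be sketched in Python. This is a minimal illustration under stated assumptions, not Exabeam's implementation: the function names `handle_variants` and `looks_like_personal_address`, the particular set of variants, the stripping of trailing digits, and the exact-match comparison are all hypothetical choices made for the example.

```python
import re


def handle_variants(first: str, last: str) -> set[str]:
    """Return plausible e-mail handles built from a first and last name.

    Assumes both names are non-empty. The variant list is illustrative,
    not exhaustive.
    """
    f, l = first.lower(), last.lower()
    variants = set()
    for sep in ("", ".", "_", "-"):
        variants.add(f"{f}{sep}{l}")     # chaddoe, chad.doe, chad_doe
        variants.add(f"{l}{sep}{f}")     # doechad, doe.chad
        variants.add(f"{f[0]}{sep}{l}")  # cdoe, c.doe
        variants.add(f"{f}{sep}{l[0]}")  # chadd, chad.d
    return variants


def looks_like_personal_address(first: str, last: str, address: str) -> bool:
    """True if the address's local part equals one of the name variants.

    Trailing digits (as in "cdoe77") are stripped before comparing;
    that normalization is an assumption of this sketch.
    """
    local_part = address.split("@", 1)[0].lower()
    local_part = re.sub(r"\d+$", "", local_part)
    return local_part in handle_variants(first, last)
```

Exact matching against a fixed variant set keeps false positives low, at the cost of missing handles that are unrelated to the user's name.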

Behavior-based method

But not all personal e-mail addresses use actual names as their root; many bear no correlation. Here, historical e-mail records can be used to determine whether there are sufficient behavioral fingerprints to link a corporate sender to an external receiver whose address bears no resemblance to the sender’s name. Insights gleaned from data exploration in real environments have permitted Exabeam to develop such associative heuristics.

These are based on a variety of factors, including:

Frequency of communication between corporate and external addresses
Direction of communication between the addresses
Textual content within e-mail Subject: fields

Based on such factors, various metrics are used to classify whether a pair of work and personal addresses belongs to the same person. One observed metric is whether the Subject: line is empty (supposing that one doesn’t bother completing the field when e-mailing oneself). Another is the ratio of messages between sender and receiver that are marked as forwarded versus replied.
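
As a rough sketch of how the factors above might be turned into numbers, the following Python computes a few per-pair metrics from a list of messages. The `Message` record, the `pair_metrics` function, the metric names, and the use of "Fw:"/"Fwd:"/"Re:" subject prefixes to detect forwarded and replied messages are all assumptions made for illustration; the article does not specify how these metrics are actually computed.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    recipient: str
    subject: str


def pair_metrics(messages: list[Message], work: str, external: str) -> dict[str, float]:
    """Compute simple behavioral metrics for one work/external address pair."""
    outbound = [m for m in messages if m.sender == work and m.recipient == external]
    inbound = [m for m in messages if m.sender == external and m.recipient == work]
    pair = outbound + inbound
    total = len(pair)
    if total == 0:
        return {"total": 0, "outbound_share": 0.0,
                "empty_subject_share": 0.0, "forward_share": 0.0}

    # Messages whose Subject: field is blank or whitespace only.
    empty = sum(1 for m in pair if not m.subject.strip())
    # Assumption: forwarded/replied status is inferred from the subject prefix.
    forwards = sum(1 for m in pair if m.subject.lower().startswith(("fw:", "fwd:")))
    replies = sum(1 for m in pair if m.subject.lower().startswith("re:"))
    marked = forwards + replies

    return {
        "total": total,                          # frequency of communication
        "outbound_share": len(outbound) / total, # direction of communication
        "empty_subject_share": empty / total,
        "forward_share": forwards / marked if marked else 0.0,
    }
```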

Such non-trivial data exploration enables Exabeam to develop useful metrics. Firm benchmark numbers are hard to come by, as there is no ground truth. Yet from the metrics we’ve constructed heuristics-based rules.

Consider the following: in an environment where it is not known how many users have e-mailed their own personal accounts from work, this method has matched up to 15% of the user base with personal e-mail addresses, with near-zero false positives. Other enterprises may see a different result depending on the security practices in place. (Note: the heuristic rule threshold can be relaxed to claim more e-mail addresses if a use case can tolerate some false positives.)
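
The trade-off described in the note can be illustrated with a toy rule. Everything here is hypothetical: the `link_score` and `is_same_person` functions, the equal weighting of the three metrics, and the default values for `threshold` and `min_messages` are placeholders chosen for the example, not the rules or thresholds Exabeam uses.

```python
def link_score(metrics: dict[str, float]) -> float:
    """Average three share-style metrics, each assumed to lie in [0, 1]."""
    return (
        metrics["outbound_share"]
        + metrics["empty_subject_share"]
        + metrics["forward_share"]
    ) / 3


def is_same_person(
    metrics: dict[str, float],
    threshold: float = 0.8,   # hypothetical default; lower it to claim more pairs
    min_messages: int = 5,    # hypothetical floor to ignore sparse histories
) -> bool:
    """Decide whether a work/external pair is attributed to one person."""
    if metrics["total"] < min_messages:
        return False
    return link_score(metrics) >= threshold


candidate = {
    "total": 12,
    "outbound_share": 0.9,
    "empty_subject_share": 0.6,
    "forward_share": 0.75,
}
strict = is_same_person(candidate)                  # default threshold
relaxed = is_same_person(candidate, threshold=0.7)  # relaxed threshold
```

With the strict default this pair is rejected, while the relaxed threshold accepts it. That is the mechanism by which loosening the rule claims more addresses and admits more false positives.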

In anomaly detection of malicious activity, context is the key in mitigating false positives. For detecting data exfiltration via the e-mail channel, DLP alerts can be prioritized by knowing whether a personal e-mail is involved. This is one way data analytics for context estimation can be used in Exabeam’s Advanced Analytics security offering.
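
One way to picture this prioritization is a sketch in which alerts whose recipient is a known personal address of the sender are ranked higher. The `DlpAlert` record, the `prioritize` function, the `personal_addresses` mapping, and the multiplicative `boost` are assumptions made for illustration and do not describe how Advanced Analytics scores alerts.

```python
from dataclasses import dataclass


@dataclass
class DlpAlert:
    sender: str
    recipient: str
    base_score: float


def prioritize(
    alerts: list[DlpAlert],
    personal_addresses: dict[str, set[str]],
    boost: float = 2.0,  # hypothetical multiplier for personal-address context
) -> list[DlpAlert]:
    """Sort alerts so that sends to the sender's own personal address rank higher."""

    def score(alert: DlpAlert) -> float:
        linked = alert.recipient in personal_addresses.get(alert.sender, set())
        return alert.base_score * boost if linked else alert.base_score

    return sorted(alerts, key=score, reverse=True)


alerts = [
    DlpAlert("chad@corp.example", "vendor@partner.example", 5.0),
    DlpAlert("chad@corp.example", "skater@mail.example", 3.0),
]
# Mapping produced by an earlier linking step (string matching or behavior based).
known = {"chad@corp.example": {"skater@mail.example"}}
ranked = prioritize(alerts, known)
```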

Tags: Data Science

