How Criminals Can Build a “Web Dossier” from Your Browser - Exabeam

Published
March 05, 2018

All kinds of personal information, including your location, work hours, habits, banks, applications, and even passwords, are there for the taking.

Web browsers store an incredible amount of sensitive information about you. Website developers have a variety of ways of using modern browsers to customize the experience for users. Advertisers also use these features to maximize the impact of ads shown on sites. The result is that a lot of information about you is stored deep in your browser, and it can potentially be exploited by cyber criminals in a number of ways. This blog will describe what we call the “web dossier” that can be created from these artifacts, how this profile can be exploited, and what you can do to protect yourself.

How Browsers Store Information

There are several types of data that your browser stores. We used five data types to construct our web dossier:

  • Visited Sites: information about web pages that a user browsed to, including information like URL, page title, and timestamp.
  • HTTP Cookies: small pieces of data sent from a website and stored on the user’s computer by the user’s web browser while the user is browsing.
  • LocalStorage: introduced with HTML5, this is a cookie-like storage mechanism that allows more data to be stored locally, as the name suggests.
  • Saved login info: modern browsers all have some type of password manager where login information for various sites is stored in a single place. Note that cookies can also be used for authentication, commonly when users tick the “Remember Me” or “Keep Me Logged In” boxes on login pages.
  • Autofill: an option found in web browsers that allows the browser to fill out commonly-entered information in a web form for you.

Web browsers temporarily store parts of web pages in the browser cache. While not part of this study, the cache can contain images, JavaScript code, HTML files, and more. Cached items often expire and are deleted relatively quickly, but are rich sources of information while they last.
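To make the "Visited Sites" artifact concrete, here is a minimal sketch of reading it back out. It assumes a copy of Chrome's History file, which is a SQLite database with a `urls` table, and it assumes you work on a copy because the running browser may hold a lock on the original. The function names and the path argument are ours, chosen for illustration.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Chrome stores timestamps as microseconds since 1601-01-01 UTC.
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def webkit_to_datetime(microseconds: int) -> datetime:
    """Convert a Chrome/WebKit timestamp to a timezone-aware datetime."""
    return WEBKIT_EPOCH + timedelta(microseconds=microseconds)


def read_visited_sites(history_db_path: str, limit: int = 20) -> list[dict]:
    """Return the most recently visited pages from a copy of the History file.

    history_db_path is a caller-supplied path (hypothetical here); the post
    does not say where the file lives on any given system.
    """
    conn = sqlite3.connect(history_db_path)
    try:
        rows = conn.execute(
            "SELECT url, title, visit_count, last_visit_time "
            "FROM urls ORDER BY last_visit_time DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()
    return [
        {
            "url": url,
            "title": title,
            "visits": visits,
            "last_visit": webkit_to_datetime(ts),
        }
        for url, title, visits, ts in rows
    ]
```

Each returned entry carries the URL, page title, and timestamp described in the list above, which is all an analyst needs to start building a timeline.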

How We Did Our Research

To find out what information is stored locally in a browser, we visited the most popular sites on the internet, using the Alexa Top 1000 list as our guide. We used OpenWPM, a privacy measurement framework built on Firefox, with a few modifications. Using the OpenWPM framework, we had a Firefox browser visit each site in the Alexa Top 1000 and navigate to three links on it, using time delays to simulate a user browsing. This initial crawl did not log in to any of the sites, so no information about user accounts was expected. Instead, the analysis of the collected data focused on identifiers of the device’s and user’s physical location. This type of geolocation is used by site owners to customize the experience, load-balance traffic, and serve specific ads to different regions.

In the second phase, we wanted to test what evidence of user accounts and actions on popular web apps could be found in the local browser files. In order to do this, we needed to create accounts on these sites, log in, perform a relevant action (e.g., send an email on a webmail server, view a document on a cloud storage platform, etc.), and see what traces could be found. For this phase, we used Google’s Chrome web browser, since it is the most widely used browser in the world at this time, and manually performed all the actions on the websites, in order to generate artifacts that are the most representative of real world users.

We selected the following subset of domains that we felt were common to many user profiles. Since tax day in the U.S. is coming up soon, we also included irs.gov (Alexa rank 99) in the subset.

  • google.com
    • drive.google.com
    • mail.google.com
  • youtube.com
  • facebook.com
  • reddit.com
  • yahoo.com
  • amazon.com
  • twitter.com
  • live.com
    • outlook.live.com
    • onedrive.live.com
  • instagram.com
  • netflix.com
  • linkedin.com
  • apple.com
  • whatsapp.com
  • paypal.com
  • github.com
  • dropbox.com
  • irs.gov

Findings

In the first phase of our research, we found 56 websites stored some level of geolocation information about the user on their local system, and 57 recorded a user’s IP address. These included popular ecommerce sites like Alibaba and Walmart; payment sites like American Express; news sites like CNN and USA Today; web plugins like the blog comment system Disqus; and others.

HTTP Cookies

Domain IP City Zip State Country
walmart.com X X X
zoom.us X
alibaba.com X
okta.com X X X X
cambridge.org X
disqus.com X

LocalStorage (HTML5 Cookies)

Domain IP City Zip State Country Timestamp
cnn.com X X X
taobao.com X X X
ibm.com X
upwork.com X
box.com X
nba.com X X X
usatoday.com X X X
cdiscount.com X X X
autodesk.com X
theatlantic.com X X
telegraph.co.uk X X X X
nytimes.com X X X X
cbssports.com X X X X

Table 1: Popular websites that track user IP address and/or geolocation using HTTP Cookies or LocalStorage (HTML5 Cookies)
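As an illustration of the kind of search involved, the sketch below scans stored cookie or LocalStorage values for embedded IPv4 addresses. It is not the analysis code used in the study. The record layout of (domain, key, value) tuples and the function names are assumptions made for this example, and the match is deliberately simple, so dotted version strings such as 1.2.3.4 would also be reported.

```python
import ipaddress
import re
from urllib.parse import unquote

# Candidate dotted-quad strings; each one is validated with ipaddress below.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def find_ip_addresses(value: str) -> list[str]:
    """Return syntactically valid IPv4 addresses embedded in a stored value."""
    found = []
    # Stored values are often percent-encoded, so decode before matching.
    for candidate in IPV4_CANDIDATE.findall(unquote(value)):
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # out-of-range octets, e.g. 999.1.1.1
        found.append(candidate)
    return found


def scan_stored_values(records):
    """Map each domain to the (key, ip) pairs found in its stored values.

    records is an iterable of (domain, key, value) tuples, a hypothetical
    layout chosen for this sketch.
    """
    hits = {}
    for domain, key, value in records:
        for ip in find_ip_addresses(value):
            hits.setdefault(domain, []).append((key, ip))
    return hits
```

City, state, and zip code fields need site-specific parsing, since each site names and encodes them differently.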

For the second phase, we were able to extract a number of potentially sensitive items from popular services, including account usernames, associated email addresses, search terms, titles of viewed emails and documents, and downloaded files. Table 2 below shows some of the more notable examples.

Site | Finding Type | Artifact Type
mail.google.com | Email Address | Page Title
mail.google.com | Mail Folder Name | Page URL
mail.google.com | Read Email Subject | Page Title
mail.google.com | Attachment Downloaded | Page URL
mail.google.com | Clicked Link in Email | Page URL
google.com / google.co.uk | Search Query Terms | Page URL
google.com / google.co.uk | Maps Search Terms | Page URL
amazon.com / amazon.co.uk | Account email address | Browser Autofill
amazon.com / amazon.co.uk | Products Viewed | Page URL / Title
amazon.com / amazon.co.uk | Search Queries | Page URL
dropbox.com | Account email address | Browser Autofill
dropbox.com | File Folder Structure | Page URL
dropbox.com | Viewing a File | Page Title
dropbox.com | Downloading a File | Page URL
irs.gov (Direct Pay) | First and Last Name | Browser Autofill
irs.gov (Direct Pay) | Physical Address | Browser Autofill
irs.gov (Direct Pay) | Email Address | Browser Autofill
irs.gov (Direct Pay) | Payment Date | Browser Autofill
irs.gov (Direct Pay) | Payment EFT Number | Browser Autofill

Table 2: Potentially sensitive items extracted from local browser files for these popular websites

A more detailed table showing more sites is available at the end of this post.
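Several of the findings above come straight from page URLs. A short sketch shows why: search terms usually travel in the query string, so anyone holding the history can decode them with standard tooling. The parameter names in `SEARCH_PARAMS` are an illustrative assumption, not a list of what we examined.

```python
from urllib.parse import parse_qs, urlsplit

# Query-string parameters that commonly carry search terms.
# This tuple is an assumption for illustration; sites vary.
SEARCH_PARAMS = ("q", "k", "query", "search_query")


def extract_search_terms(url: str) -> list[str]:
    """Return decoded search terms found in a URL's query string."""
    params = parse_qs(urlsplit(url).query)
    terms = []
    for name in SEARCH_PARAMS:
        terms.extend(params.get(name, []))
    return terms
```

For example, `extract_search_terms("https://www.google.com/search?q=tax+refund+status&hl=en")` yields the decoded phrase rather than the raw encoded string.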

In addition to these site actions, if a user chose to have the browser save their password using the built-in password manager, we were able to extract those saved usernames and passwords for all sites tested. This isn’t a weakness of the website, but rather of the default password managers built into web browsers. For any website you visit, if you store your credentials in the Chrome password manager, your credentials would be available to criminals using the technique described below.

How Attackers Can Gain Access

For our research, we studied the data stored locally in Firefox and Chrome browsers. So, how can attackers access this information? As it turns out, pretty easily.

Creating malware to harvest information stored in a browser is quite straightforward, and variants have been around for years, including the Cerber, Kriptovor, and CryptXXX ransomware families.

The free NirSoft tool WebBrowserPassView dumps saved passwords from Internet Explorer, Mozilla Firefox, Google Chrome, Safari, and Opera. While ostensibly designed to help users recover their own passwords, it can be put to nefarious use. The recent ‘Olympic Destroyer’ malware used to disrupt the Pyeongchang Olympic Games reportedly took advantage of user credentials saved in the browser.

Another concern is anyone working on a shared computer or in a shared workspace. If a machine is left unlocked, browser data could be extracted for analysis in seconds, either by inserting a USB drive running specialized software or by clicking a web link that installs malware.

Many internet users may presume their passwords are stored safely by their browser. While it is true that browsers encrypt saved passwords, they are decrypted whenever the browser uses them, and other processes running under the same user account can request the same decryption. Browsers often use host operating system APIs to protect saved passwords. Access to these APIs is not exclusive to the browser, which is what the NirSoft tool and various malware exploit.

Web Dossier Exploits

There are a number of ways an attacker could exploit information compiled into a web dossier.

Account Discovery

An attacker could compile a list of applications you commonly log into from your URL history, including work applications and personal finance sites. Criminals can learn who in a company has access to the financial or payroll application, for example, and compile a list of usernames to use to break in. Knowing what applications are in use at a company can help an attacker craft more convincing phishing emails to try and trick users into exposing their passwords, which the attacker could then harvest.

It would also be simple for an attacker to learn the name of your bank, online broker, and retirement fund manager. In some instances, we were able to recover bank account numbers used to transfer funds to other banks. On a local tax collection site here in San Mateo County (not in the Alexa Top 1000, but in the county where Exabeam is located) we were able to find parcel numbers from tax filings – an easy way to identify the property owned by the filer.

And, of course, if you save your username and password in webforms or the browser’s password manager, as was shown above, your credentials are available.
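A rough sketch of account discovery from URL history follows. It counts history entries per hostname and flags hosts whose paths look like login pages. The `LOGIN_HINTS` keywords and both function names are assumptions made for this example, and simple substring matching will produce false positives (for instance, "auth" also matches "/author/").

```python
from collections import Counter
from urllib.parse import urlsplit

# Path fragments that often appear on login pages (illustrative assumption).
LOGIN_HINTS = ("login", "signin", "sign-in", "auth", "sso")


def summarize_hosts(urls, top_n=10):
    """Count history entries per hostname, most frequent first."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname
        if host:  # skips entries such as about:blank
            counts[host] += 1
    return counts.most_common(top_n)


def likely_login_hosts(urls):
    """Return sorted hostnames with at least one login-looking path."""
    hosts = set()
    for url in urls:
        parts = urlsplit(url)
        path = parts.path.lower()
        if parts.hostname and any(hint in path for hint in LOGIN_HINTS):
            hosts.add(parts.hostname)
    return sorted(hosts)
```

Run against a history export, the first function shows which applications a user relies on most, and the second narrows that to the ones where the user holds an account.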

Location History

Did you know keeping up with your favorite sports team could give away your location? We were able to extract different levels of geolocation indicators, including IP address, from a wide array of popular websites, including nba.com and cbssports.com. News sites, including cbsnews.com, cnn.com, usatoday.com, foxnews.com, telegraph.co.uk, nypost.com, and nytimes.com, also store information about a user’s location on that user’s local machine. Extracting historical location information from a web browser can paint a picture of a user’s habits and past activities. By extracting similar types of information from a broad range of websites, an investigator or attacker gets multiple data points that can be used to corroborate one another. An attacker can then determine when you are at work and when you are at home, for example, or whether you work at a classified facility, as was recently discovered with Strava’s fitness tracker.
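The work-versus-home inference can be sketched in a few lines. The example below takes timestamped IP observations, such as those recovered from cookies or LocalStorage, and labels each address by when it is mostly seen. The weekday 09:00 to 17:00 window and the "work" and "home" labels are simplifying assumptions, not part of our methodology.

```python
from collections import Counter, defaultdict
from datetime import datetime


def label_locations(observations, work_hours=range(9, 17)):
    """Label each IP as 'work' or 'home' by when it is mostly observed.

    observations is an iterable of (datetime, ip_string) pairs. The weekday
    work_hours window is an assumed heuristic; 'home' here simply means
    'mostly seen outside that window'.
    """
    tallies = defaultdict(Counter)
    for seen_at, ip in observations:
        is_work_time = seen_at.weekday() < 5 and seen_at.hour in work_hours
        tallies[ip]["work" if is_work_time else "home"] += 1
    # Pick the majority label for each address.
    return {ip: counts.most_common(1)[0][0] for ip, counts in tallies.items()}
```

With more observations per address, the same tally also reveals typical arrival and departure times.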

User Interests

Of course, with access to your URL history, an attacker can learn about your personal interests quite easily. There are three ways an attacker could exploit this information. First, it is well known that attackers use hobbies to guess passwords. Second, if your hobbies or interests are controversial, unusual, or even illegal, you may fall victim to online blackmail. Third, with the unfortunate rise of cyberbullying, especially among teens, a web dossier could be used to expose or embarrass the victim.

Device Discovery

Modern browsers offer the option of a consistent experience to users, no matter what device they are using. Because of this, it can be possible to extract information about what other devices a user owns by examining browser history. Some browsers explicitly sync records from multiple devices to each other, and some make use of “casting” or other screen sharing methods to communicate with other devices. By looking at this information, it may be possible to find a device that a user is trying to keep hidden, or to connect a personal machine to a work machine.

Figure 1: The Chrome database can be queried to find all devices connected to a user account.

How to Protect Yourself

Given that the most serious threat comes from criminals accessing your browser data via malware, the most important thing you should do is make sure you have endpoint protection, also known as anti-virus, software on your computer. This should stop most of the malware aimed at harvesting your information to create a profile.

If you are still concerned about someone accessing your machine – either a remote attacker who has gotten through or someone in your workplace or public space – then here are some ways to protect yourself. Note that they all come with some inconvenience, so we’ve noted the pros and cons.

Measure | Artifacts Affected | Pro | Con
Incognito Mode | All | No* local artifacts saved from session for local attackers to exploit. | No browsing history means you won’t have customized sites, no saved logins, and fewer relevant suggestions.
Disable All HTTP Cookies | HTTP Cookies | No saved cookies to exploit. | Many websites will have issues, especially if a user needs to log in.
Disable 3rd Party HTTP Cookies | HTTP Cookies | Disables cookies from some advertising and tracking networks, with less disruption to usability. | Not all trackers are 3rd party, so some will still be available to exploit. Lots of valuable information is stored in 1st party cookies.
Disable Autofill | Autofill | Form history is not saved locally for potential exploitation. | Users need to retype common information on websites.
Disable/Don’t Save Logins in Browser | Saved Logins | Not available for attackers to collect. | Users need to remember logins and may use less secure/common ones. Third party password managers, such as LastPass, may mitigate this.
Regularly Clear All (or select) Browsing Artifacts | All (or select) | Lessens the amount and length of data available for attackers. | Less history is available for the browser to use to help with suggestions, or for the user to search for past things they have looked at.
Set a master password when using the browser’s default password manager (not available in Chrome) | Saved Logins | Attackers cannot view/dump saved passwords without knowing the master password. | Users need to type this password in to “unlock” their saved passwords, but only occasionally. This option is not available in Chrome.
Use a 3rd-party Password Manager (LastPass, KeePass, 1Password, etc.) | Saved Logins | Harder for attackers to access than the built-in password managers. Have more advanced features that encourage better password practices by the user. | Password managers are not perfect and can have vulnerabilities. For cloud-based password managers, users are sending password information off to a 3rd party and trusting that it is secure and confidential.

*Incognito mode eliminates the majority of artifacts, but there still are some. These traces are much fewer in number, are harder to get value from, and would take advanced analysis to recover. In addition, users making customizations (like adding extensions) can allow artifacts to be created in Incognito mode.

Browsers store many artifacts to make browsing and buying on the web easier, but collectively, this information can be mined, aggregated, and used to create a profile that many users may not realize exists. Ensuring endpoint protection and not leaving machines unlocked in public spaces are essential. Users should also consider changing browser settings to further protect their privacy. If users would like to learn more about how they can secure themselves, there is an excellent free guide put together by security experts on how users can browse more safely called “HowTo: Privacy & Security Conscious Browsing”.

Read the detailed site findings here
