The primary objective of an incident response plan is to respond to incidents before they become a major setback. As the frequency and types of data breaches increase, the lack of an incident response plan can lead to longer recovery times and increased cost.
In this post you will learn about:
- What is incident response?
- Why is incident response important?
- The three elements of incident response management
- A six-step incident response plan
- Incident response team
- Incident response tools
What is Incident Response?
Incident response is an approach to handling security breaches. The aim of incident response is to identify an attack, contain the damage, and eradicate the root cause of the incident. An incident can be defined as any breach of law, policy or unacceptable act that concerns information assets, such as networks, computers, or smartphones.
Why is incident response important?
When your organization responds to an incident quickly, it can reduce losses, restore processes and services, and mitigate exploited vulnerabilities. An incident that is not effectively contained can lead to a data breach with catastrophic consequences. Incident response provides a first line of defense against security incidents and, in the long term, helps establish a set of best practices to prevent breaches before they happen.
The Three Elements of Incident Response Management
Optimal management of incident response should include:
1. A comprehensive plan
An incident response plan should prepare your team to deal with threats, indicate how to isolate incidents and identify their severity, how to stop the attack and eradicate the underlying cause, how to recover production systems, and how to conduct a post-mortem analysis to prevent future attacks. Learn more about incident response plans below.
2. The right people in place
Recruit the following roles for your incident response team: incident response manager, security analyst, IT engineer, threat researcher, legal representative, corporate communications, human resources, risk management, C-level executives, and external security forensic experts. Let all employees know what their responsibilities will be in the event of an attack. Learn more about the incident response team below.
3. The right tools
Many vendors offer tools that handle security incidents on a large scale, instead of investigating one issue at a time. These tools analyze, alert about, and can even help remediate security events that could be missed due to insufficient internal resources.
Incident response tools work alongside current security measures. They draw on NetFlow data, system logs, endpoint alerts, and identity systems to assess security-related anomalies in the network. These tools can investigate threats including:
- Malware infections
- Password attacks
- Data leakage
- Abuse of privileges
- Other insider threats
Element #1: Six-Step Incident Response Plan
The Computer Security Incident Response Team (CSIRT) carries out the incident response plan. The incident response team includes IT staff with some security training or full-time security staff. These individuals analyze information about an incident and respond. They respond to two types of incidents: public and organizational.
Public incidents affect an entire community: for example terrorism, natural disasters, large-scale chemical spills, and epidemics. Organizational incidents are confined to a single organization. They may be physical, such as a bomb threat, or computer incidents, such as accidental exposure, theft of sensitive data, or exposure of trade secrets.
The incident response team also communicates with stakeholders within the organization, and external groups such as press, legal counsel, affected customers, and law enforcement. The SANS Institute’s Incident Handlers Handbook defines a six-step process for handling security incidents.
1. Preparation
Here are steps your incident response team should take to prepare for cybersecurity incidents:
- Develop policies to implement in the event of a cyber attack
- Review security policy and conduct a risk assessment
- Prioritize security issues, know your most valuable assets and concentrate on critical security incidents
- Develop a communication plan
- Outline the roles, responsibilities, and procedures of your team
- Establish a corporate security policy
- Recruit and train team members, ensure they have access to relevant systems
- Ensure team members have access to relevant technologies and tools
2. Identification
Decide which criteria call the team into action. Examples of security incidents include detection of malware on corporate systems, a phishing attack, or a denial-of-service attack. A cumulative set of events could also call a plan into action: for example, an unusual upload to a cloud storage site and an abnormal access alert within the same few hours.
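The cumulative trigger described above can be sketched as a simple correlation rule. This is only an illustration under stated assumptions: the `Event` structure, the event type labels (`cloud_upload_unusual`, `abnormal_access`), and the four-hour window are hypothetical and would need to match whatever your monitoring tools actually emit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    user: str
    kind: str        # hypothetical event type label
    time: datetime


# Assumed rule: both event kinds for the same user inside one time window.
TRIGGER_KINDS = {"cloud_upload_unusual", "abnormal_access"}
WINDOW = timedelta(hours=4)


def should_activate_plan(events: list[Event]) -> set[str]:
    """Return the users for whom every trigger kind occurred within WINDOW."""
    by_user: dict[str, list[Event]] = {}
    for e in events:
        if e.kind in TRIGGER_KINDS:
            by_user.setdefault(e.user, []).append(e)

    flagged = set()
    for user, evs in by_user.items():
        evs.sort(key=lambda e: e.time)
        for i, first in enumerate(evs):
            # Kinds seen from this event onward, limited to the window.
            kinds = {e.kind for e in evs[i:] if e.time - first.time <= WINDOW}
            if kinds == TRIGGER_KINDS:
                flagged.add(user)
                break
    return flagged
```

A rule like this keeps single low-severity events from paging the team while still reacting when related signals pile up.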
IT systems gather events from monitoring tools, log files, error messages, firewalls, and intrusion detection systems. This data should be analyzed by automated tools and security analysts to decide if anomalous events represent security incidents.
When an incident is identified, the incident response team should be alerted. Team members then coordinate the appropriate response:
- Identify and assess the incident and gather evidence.
- Decide on the severity and type of the incident and escalate if necessary.
- Document actions taken, addressing “who, what, where, why, and how.” This information may be used later as evidence if the incident reaches a court of law.
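One way to keep the "who, what, where, why, and how" record consistent is a small structured log. The sketch below is an assumption about how such a log might look, not a prescribed format: the `ActionRecord` fields and helper names are hypothetical, and the `when` timestamp is an addition beyond the five questions listed above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ActionRecord:
    who: str    # person who took the action
    what: str   # action taken
    where: str  # affected system or location
    why: str    # reason for the action
    how: str    # method or tool used
    when: str   # UTC timestamp (an added field, assumed useful for evidence)


def record_action(log: list, who: str, what: str, where: str,
                  why: str, how: str) -> ActionRecord:
    """Append a timestamped record of one response action to the log."""
    entry = ActionRecord(who, what, where, why, how,
                         when=datetime.now(timezone.utc).isoformat())
    log.append(entry)
    return entry


def export_log(log: list) -> str:
    """Serialize the log to JSON so it can be archived with other evidence."""
    return json.dumps([asdict(entry) for entry in log], indent=2)
```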
3. Containment
Once your team isolates a security incident, the aim is to stop further damage. This includes:
- Short-term containment: an immediate response, so the threat doesn’t cause further damage. This can include taking down production servers that have been hacked or isolating a network segment that is under attack.
- System backup: before you wipe and reimage affected systems, take a forensic image of each one. A forensic image is a bit-for-bit copy of a hard disk or a specific disk partition. It preserves the state of the disk at a specific point in time, giving you a static snapshot that you can use as evidence of the security incident and to investigate how the system was compromised.
- Long-term containment: temporarily fix affected systems so they can be used in production. While this takes place, rebuild clean systems so you can bring them online in the recovery stage. Take measures to prevent the incident from recurring or escalating: install security patches on affected and associated systems, remove accounts and backdoors created by attackers, alter firewall rules, and null route the attacker’s address.
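Because a forensic image may be used as evidence, it is common to record a cryptographic hash when the image is taken and to re-check it later. The following is a minimal sketch using Python's standard `hashlib`; the function names are hypothetical, and creating the image itself is left to dedicated forensic tooling.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a (possibly very large) image file in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def image_unchanged(path: str, recorded_hash: str) -> bool:
    """Compare the current hash with the one recorded at acquisition time."""
    return sha256_of_file(path) == recorded_hash
```

Reading in chunks keeps memory use flat regardless of image size.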
4. Eradication
Remove the threat and restore affected systems to their initial state, or close to it. The team should isolate the root cause of the attack, remove threats and malware, and identify and mitigate the vulnerabilities that were exploited, to stop future attacks. These steps may change the configuration of the organization. The aim is to make changes while minimizing the effect on the operations of the organization. You can achieve this by stopping the bleeding and limiting the amount of data that is exposed.
This is done as follows:
- Identify and fix all affected hosts, including hosts inside and outside your organization
- Isolate the root of the attack to remove all instances of the malicious software
- Conduct malware analysis to determine the extent of the damage
- See if the attacker has reacted to your actions
- Anticipate a different type of attack and create a response
- Allow time to make sure the network is secure and that there is no further activity from the attacker
Ensure your team has removed malicious content and checked that the affected systems are clean. For example, if the attacker used a vulnerability, it should be patched, or if an attacker exploited a weak authentication mechanism it should be replaced with strong authentication.
5. Recovery
Ensure that affected systems are not in danger and can be restored to working condition. The purpose of this phase is to bring affected systems back into the production environment carefully, so they do not lead to another incident. Reduce the risk of a repeat incident by restoring systems from clean backups, replacing compromised files with clean versions, rebuilding systems from scratch, installing patches, changing passwords, and reinforcing network perimeter security (boundary router access control lists, firewall rulesets, etc.).
Consider how long you need to monitor the network system, and how to verify that the affected systems are functioning normally. Calculate the cost of the breach and associated damages.
6. Lessons Learned
The incident response team and partners should communicate to improve future processes. Complete documentation that couldn’t be prepared during the response process. The team should identify how the incident was managed and eradicated.
See what actions were taken to recover the attacked system, the areas where the response team needs improvement, and the areas where they were effective. Reports on lessons learned provide a clear review of the entire incident and can be used in meetings, as benchmarks for comparison or as training information for new incident response team members.
Element #2: The Incident Response Team
To prepare for and attend to incidents, you should form a centralized incident response team, responsible for identifying security breaches and taking responsive actions. The team should include:
- Incident response manager (team leader): coordinates all team actions and ensures the team focuses on minimizing damage and recovering quickly. Prioritizes actions during the isolation, analysis, and containment of an incident. Oversees all actions and guides the team during high-severity incidents.
- Security analysts: assist the manager and work across departments to isolate and rectify flaws in the organization’s security systems, solutions, and applications. They recommend specific measures to improve the overall security posture.
- Lead investigator: isolates the root cause, analyzes all evidence, manages other security analysts, and conducts rapid system and service recovery.
- Threat researchers: provide the context of an incident and threat intelligence. They use this information and records of previous incidents to create a database of internal intelligence. On many security teams, threat researchers are gradually being replaced by automated threat intelligence tools.
- Communications lead: communicates with all audiences inside and outside the company, including management, internal stakeholders, legal, press, and customers.
- Documentation and timeline lead: documents the team’s investigation, discovery, and recovery efforts, and creates a timeline for each stage of the incident. Next-generation Security Information and Event Management (SIEM) systems are able to generate documentation and incident timelines automatically. For example, see the Exabeam Advanced Analytics module offered by the Exabeam Security Management Platform.
- HR/legal representation: an incident could develop into criminal charges, so you should have HR and legal guidance.
Goals of the Incident Response Team
The goal of the incident response team is to coordinate team members and resources during a cyber incident to minimize impact and quickly restore operations. This includes:
- Analysis—document the extent, priority, and impact of a breach to see which assets were affected and if the incident requires attention.
- Reporting—tell team members of reporting procedures. Gather relevant trending data to show the importance of the incident response team.
- Response—explore root causes, record findings and carry out recovery strategies and communicate the status of your organization to team members.
In modern Security Operations Centers (SOCs), advanced analytics plays an important role in identifying and investigating incidents. User and Entity Behavioral Analytics (UEBA) technology is used by many security teams to establish behavioral baselines of users or IT systems, and automatically identify anomalous behavior. This makes it much easier for security staff to identify events that might constitute a security incident.
Read our in-depth blog post: Why UEBA Should Be an Essential Part of Incident Response.
Which Qualities Should You Look for When Selecting Incident Response Team Members?
When assembling an incident response team consider:
- Availability—an incident response team should be available around the clock all days of the year. You may also need staff members to be physically on site during an incident, so choosing staff that live close to the office is an advantage.
- Virtual or volunteer team members—you may lack the resources to assign full-time responsibilities to all team members. Consider having some members form a ‘virtual’ incident response team. These team members can be called upon when an emergency occurs.
- Effective advocate or executive sponsor—a person at the level of a CISO who can communicate the effect of an incident to other executive members. This individual should also ensure the incident response team receives an appropriate budget and maintains the authority to respond quickly during a crisis.
- Monitor and bolster employee and team morale—your team may find it difficult to be on call all the time. They may lose focus and motivation. You can prevent staff burnout by granting opportunities for growth, learning and team building. You can also outsource some activities, which can reduce the workload and stress of the in-house team.
- Diversity—recruit technically diverse teams. You cannot expect team members to be experts in all areas of incident response. It is important to determine what skill gaps exist and to hire individuals who fill that gap.
Five Tips For Incident Response Team Members
1. Isolate exceptions
Technology alone cannot successfully detect security breaches. You should also rely on human insight. Following are a few conditions to watch for daily:
- Traffic anomalies: sensitive connections and servers used internally will typically have a stable traffic volume. A sudden increase in traffic warrants investigation.
- Accessing accounts without permission: privileged or administrator accounts have access to more information and systems than normal employee accounts. However, employees tend to be the easiest entry point for cybercrime. Closely monitor privileged accounts and watch for privilege escalation on normal user accounts.
- Excessive consumption and suspicious files: if you see an unexplained increase in memory or disk usage on company systems, someone may be accessing them illegally or leaking data.
Modern security tools such as User and Entity Behavioral Analytics (UEBA) automate these processes and can identify anomalies in user behavior or file access automatically. This provides much better coverage of possible security incidents and saves time for security teams. For example, see the Entity Analytics module, a part of Exabeam’s next-generation SIEM platform.
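To make the idea of a traffic baseline concrete, here is a deliberately simple sketch that flags a measurement lying far from the historical mean. It is not how any particular UEBA product works; the z-score approach, the function name, and the threshold of three standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_traffic_anomaly(baseline: list[float], current: float,
                       threshold: float = 3.0) -> bool:
    """Flag `current` if it is more than `threshold` standard deviations
    away from the mean of the baseline samples."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is unusual.
        return current != mu
    return abs(current - mu) / sigma > threshold
```

For example, with hourly request counts of roughly 100 as the baseline, a reading of 150 would be flagged while 101 would not. Real traffic often has daily and weekly cycles, so a practical baseline would usually be computed per time slot.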
2. Use a centralized approach
Gather information from security tools and IT systems, and keep it in a central location, such as a SIEM system. Use this information to create an incident timeline, and conduct an investigation of the incident with all relevant data points in one place.
You can also use a centralized approach to allow for a quick automated response. Use data from security tools, apply advanced analytics and orchestrate automated responses on systems like firewalls and email servers, using technology like Security Orchestration, Automation, and Response (SOAR).
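The incident timeline mentioned above amounts to merging events from several sources into one time-ordered view. The sketch below assumes each event is a dictionary with hypothetical `time`, `source`, and `message` keys; a SIEM does this at far greater scale and with normalization that is omitted here.

```python
from datetime import datetime
from itertools import chain


def build_timeline(*sources: list[dict]) -> list[dict]:
    """Combine events from any number of sources, ordered by timestamp."""
    return sorted(chain.from_iterable(sources), key=lambda event: event["time"])


def events_for(timeline: list[dict], keyword: str) -> list[dict]:
    """Narrow the timeline to events whose message mentions a keyword,
    for example a hostname or username under investigation."""
    return [e for e in timeline if keyword in e["message"]]
```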
3. Assert, don’t assume
Don’t conduct an investigation based on the assumption that an event or incident exists. Instead of making assumptions, make assertions, based on a question that you can evaluate and verify. For example “If I’ve noted alert X on system Y, I should also see event Z occur in close proximity.”
Create your assertions based on your experience administering systems, writing software, configuring networks, building systems, etc., imagining systems and processes from the attacker’s eyes.
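An assertion of the form "if I see alert X on system Y, I should also see event Z nearby" can be written as a check against collected events. This is a sketch only: the event dictionary keys (`system`, `kind`, `time`) and the ten-minute default window are assumptions for the example.

```python
from datetime import datetime, timedelta


def assertion_holds(events: list[dict], alert_time: datetime, system: str,
                    expected_kind: str,
                    window: timedelta = timedelta(minutes=10)) -> bool:
    """Return True if an event of `expected_kind` occurred on `system`
    within `window` before or after the alert."""
    return any(
        e["system"] == system
        and e["kind"] == expected_kind
        and abs(e["time"] - alert_time) <= window
        for e in events
    )
```

If the check fails, either the alert was not what it seemed or the supporting evidence is missing, and both outcomes are worth knowing before committing to a line of investigation.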
4. Eliminate impossible events
You may not know exactly what you are looking for. On these occasions, eliminate occurrences that can be logically explained. You will then be left with the events that have no clear explanation, such as:
- Unexplained inconsistencies or redundancies in your code
- Issues with accessing management functions or administrative logins
- Unexplained changes in volume of traffic (e.g., drastic drop)
- Unexplained changes in the content, layout, or design of your site
- Performance problems affecting the accessibility and availability of your website
5. Take post-incident measures
Continue monitoring your systems for any unusual behavior to ensure the intruder has not returned. Watch for new incidents and conduct a post-incident review to isolate any problems experienced during the execution of the incident response plan.
Incident Response Tools
| Incident response tool type | Why you need it | Tool examples |
| --- | --- | --- |
| SIEM | Gathers and aggregates log data created in the technology infrastructure of the organization, including applications, host systems, network and security devices (e.g., antivirus filters and firewalls). Provides reports on security-related incidents, including malware activity and logins. It also sends alerts if the activity conflicts with existing rule sets, indicating a security issue. | Exabeam Security Management Platform (SMP) (including Data Lake, UEBA, Incident Responder), QRadar, USM, ESM |
| Intrusion Detection Systems (IDS), network and host-based | Use baselines or attack signatures to issue an alert when suspicious behavior or known attacks take place on a host (host-based intrusion detection system, HIDS) or on a network (network-based intrusion detection system, NIDS). | Snort, Suricata, Zeek (formerly Bro IDS), OSSEC |
| NetFlow Analyzers | Look at actual traffic across border gateways and within a network. NetFlow data is used to track a specific thread of activity, to see what protocols are in use on your network, or to see which assets are communicating with each other. | ntop, NfSen, Nfdump |
| Vulnerability Scanners | Isolate potential areas of risk, assess the attack surface of your organization for known weaknesses, and provide instructions for remediation. Vulnerabilities may be caused by misconfiguration, bugs in your own applications, or usage of third-party components that can be exploited by attackers. | OpenVAS |
| Availability Monitoring | The aim of incident response is to limit downtime. A service or application outage can be the initial sign of an incident in progress. Availability monitoring tracks the uptime of infrastructure components, including apps and servers, and alerts administrators to issues before they impact the organization. | Nagios |
| Web Proxies | Control access to websites and log what is being connected to. Many threats operate over HTTP, so the ability to log the remote IP address and the nature of the HTTP connection can be essential for forensics and threat tracking. | Squid Proxy, IPFire |
Improving Incident Response Via Orchestration and Automation
One of the key steps in incident response is automatically eliminating false positives (events that are not really security incidents), and stitching together the event timeline to quickly understand what is happening and how to respond.
Exabeam offers a next-generation Security Information and Event Management (SIEM) that provides Smart Timelines, automatically stitching together both normal and abnormal behaviors. This helps investigators accurately pinpoint a series of anomalous events, along with their associated assets, users, and risk reasons, all attached to a single timeline.
This automatic packaging of events into an incident timeline saves a lot of time for investigators, and helps them mitigate security incidents faster, significantly lowering the mean time to respond (MTTR).
What metrics are needed by SOC Analysts for effective incident response?
- Mean Time to Detect (MTTD): reflects the effectiveness of your detection solution. Is it raising most alerts, or are the majority reported by users and system administrators? If your security operations team and their tools are not the greatest source of security alerts, you have an issue.
- Detection accuracy/false positive rates: the percentage of alerts that, upon investigation, are revealed to not be valid threats. False positives reduce a security team’s confidence in its tools and draw attention away from serious underlying problems. False positive feedback loops should be included in any incident management process, but enterprises must guard against becoming too lenient; the only thing worse than a false positive is a false negative in which a serious threat is overlooked.
- Mean Time to Respond/Repair (MTTR): the time it takes to see a security concern, identify the impact, determine the course of action, and implement it. These numbers can vary widely, but over time trends will appear, providing useful insight about where you need to invest for additional protection, remediation, and automation capabilities.
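These metrics are straightforward to compute once incident timestamps are recorded consistently. The sketch below is one possible interpretation, not a standard definition: it assumes each incident has known `occurred`, `detected`, and `resolved` times, measures MTTD from occurrence to detection, and measures MTTR from detection to resolution. The `Incident` class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    occurred: datetime   # best estimate of when the incident began
    detected: datetime   # when it was first noticed
    resolved: datetime   # when the response was fully implemented


def mean_time_to_detect(incidents: list[Incident]) -> timedelta:
    """Average gap between occurrence and detection."""
    return timedelta(seconds=mean(
        (i.detected - i.occurred).total_seconds() for i in incidents))


def mean_time_to_respond(incidents: list[Incident]) -> timedelta:
    """Average gap between detection and resolution (assumed definition)."""
    return timedelta(seconds=mean(
        (i.resolved - i.detected).total_seconds() for i in incidents))


def false_positive_rate(total_alerts: int, false_positives: int) -> float:
    """Share of investigated alerts that turned out not to be valid threats."""
    if total_alerts <= 0:
        raise ValueError("total_alerts must be positive")
    return false_positives / total_alerts
```

Tracking these values per month or quarter is what reveals the trends the list above refers to; a single snapshot says little.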