When a security incident occurs, it’s vital to contain it quickly, before the damage spreads. During a breach every second matters, yet time is already scarce for overstretched IT and security teams, so it makes sense to automate security response wherever possible. This technical guide shows how self-service resource isolation reduces the manual effort involved in incident response.
Resource isolation is a method of remediating security incidents in the AWS Cloud. Security incidents can occur in the service domain or the infrastructure domain. AWS already provides other services to log and monitor activity and to detect security-related events in the AWS Cloud environment, including AWS CloudTrail, Amazon CloudWatch, Amazon S3 server access logs, VPC Flow Logs, Amazon GuardDuty, Amazon Detective, AWS Security Hub, and Amazon Macie.
The impact of a service domain incident can be severe: interruptions to vital services and major inconvenience to end users can lead to reputational damage. Service domain incidents can affect a customer's AWS account, IAM permissions, resource metadata, and billing. If threat actors gain access to IAM credentials, they can misuse the AWS APIs to disrupt the existing setup.
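One common way to contain a compromised IAM user is to attach an explicit deny-all inline policy (an explicit Deny overrides any Allow) and deactivate the user's access keys. The sketch below expresses those steps as a reviewable plan of boto3 IAM client calls (`put_user_policy`, `update_access_key`). The policy name and the helper functions are hypothetical, not part of the original solution.

```python
import json

# Hypothetical policy name; use whatever name your runbooks standardize on.
QUARANTINE_POLICY_NAME = "SecurityIsolation-DenyAll"

# An explicit Deny overrides any Allow granted to the user,
# so this inline policy blocks every API action for that user.
DENY_ALL_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Deny", "Action": "*", "Resource": "*"}],
}


def iam_isolation_plan(user_name, access_key_ids):
    """Return ordered (operation, parameters) steps that isolate an IAM user.

    Each operation name matches a boto3 IAM client method. Keeping the plan
    as data lets responders review, log, or test it before running it.
    """
    steps = [
        ("put_user_policy", {
            "UserName": user_name,
            "PolicyName": QUARANTINE_POLICY_NAME,
            "PolicyDocument": json.dumps(DENY_ALL_POLICY),
        }),
    ]
    # Deactivate rather than delete keys, so they remain available as evidence.
    for key_id in access_key_ids:
        steps.append(("update_access_key", {
            "UserName": user_name,
            "AccessKeyId": key_id,
            "Status": "Inactive",
        }))
    return steps


def apply_plan(client, steps):
    """Execute each step against a boto3 client (or any object with the same methods)."""
    for operation, params in steps:
        getattr(client, operation)(**params)
```

With boto3 this would run as `apply_plan(boto3.client("iam"), iam_isolation_plan("alice", ["AKIAIOSFODNN7EXAMPLE"]))`, where the user name and key ID are placeholders.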
The potential consequences of an infrastructure domain incident include operational downtime, data theft, and compliance breaches. Depending on the severity of the breach and the industry involved, such incidents may be reportable under the GDPR and lead to substantial fines from the EU’s data protection authorities.
Infrastructure domain incidents involve data- or network-related activity, such as traffic to Amazon EC2 instances within a VPC, processes and data on those instances, and other areas such as containers or future services.
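The split between service and infrastructure domains can drive routing: a finding about leaked credentials points at an IAM user, while a finding about a host points at an EC2 instance. The sketch below assumes GuardDuty findings in the camelCase layout they have inside EventBridge events (pass `event["detail"]`); the function name, the domain labels, and the choice to handle only IAM users are illustrative assumptions.

```python
SERVICE_DOMAIN = "service"
INFRASTRUCTURE_DOMAIN = "infrastructure"


def classify_finding(finding):
    """Map a GuardDuty-style finding to (domain, target), or None if unsupported.

    Assumes the EventBridge camelCase layout, e.g.
    finding["resource"]["resourceType"] == "Instance".
    """
    resource = finding.get("resource", {})
    resource_type = resource.get("resourceType")

    if resource_type == "AccessKey":
        details = resource.get("accessKeyDetails", {})
        # Only IAM users are isolated here; roles or root need other runbooks.
        if details.get("userType") != "IAMUser" or not details.get("userName"):
            return None
        return SERVICE_DOMAIN, {"iam_user": details["userName"]}

    if resource_type == "Instance":
        instance_id = resource.get("instanceDetails", {}).get("instanceId")
        if not instance_id:
            return None
        return INFRASTRUCTURE_DOMAIN, {"instance_id": instance_id}

    # Other resource types (for example S3 buckets) are out of scope.
    return None
```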
During a security investigation, we might need to isolate resources as part of the response to an anomaly. Isolating resources aims to mitigate the potential impact, prevent the compromise from spreading to other resources, limit unintended exposure of data, and prevent further unauthorized access.
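For an EC2 instance, one widely used isolation technique is to replace all of its security groups with a quarantine group that has no inbound or outbound rules. The sketch below uses real boto3 EC2 client calls (`create_security_group`, `revoke_security_group_egress`, `modify_instance_attribute`), but the group naming scheme and helper names are hypothetical, and it assumes an IPv4-only VPC and an instance with a single network interface.

```python
def quarantine_group_request(vpc_id, incident_id):
    """Parameters for ec2.create_security_group; a new group has no inbound rules."""
    return {
        "GroupName": f"quarantine-{incident_id}",  # hypothetical naming scheme
        "Description": f"Isolation group for incident {incident_id}",
        "VpcId": vpc_id,
    }


# New security groups start with an allow-all IPv4 egress rule; revoking it
# leaves the group with no rules. Assumes an IPv4-only VPC: a dual-stack VPC
# may also carry a ::/0 egress rule that must be revoked.
DEFAULT_EGRESS = [{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]


def isolate_instance(ec2, instance_id, vpc_id, incident_id):
    """Swap all of an instance's security groups for an empty quarantine group.

    `ec2` is a boto3 EC2 client. Assumes the instance has a single network
    interface; instances with several interfaces need each one updated.
    Returns the quarantine group ID.
    """
    response = ec2.create_security_group(**quarantine_group_request(vpc_id, incident_id))
    group_id = response["GroupId"]
    ec2.revoke_security_group_egress(GroupId=group_id, IpPermissions=DEFAULT_EGRESS)
    # Groups= replaces the instance's security group list rather than appending to it.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[group_id])
    return group_id
```

Recording the instance's original security group IDs alongside the incident makes it straightforward to restore access once the investigation is complete.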
Developing a security API that codifies the manual steps saves considerable time and effort: incident responders invoke the API instead of performing each step by hand. Over time, we can automate more steps, implement other runbooks, and ultimately handle many common classes of incidents automatically.
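From a responder's side, invoking the API is a single authenticated HTTP request. The sketch below builds such a request with the standard library only; the endpoint URL, the JSON field names, and bearer-token authentication are all hypothetical. A deployment fronted by Amazon API Gateway with IAM authorization would sign requests with SigV4 instead.

```python
import json
import urllib.request

# Hypothetical endpoint; the real URL, path, and auth scheme depend on the deployment.
API_URL = "https://security-api.example.com/v1/isolate"


def build_isolation_request(resource_type, resource_id, incident_id, token):
    """Build (but do not send) a POST request asking the API to isolate a resource."""
    body = json.dumps({
        "resourceType": resource_type,  # e.g. "iam-user" or "ec2-instance" (assumed values)
        "resourceId": resource_id,
        "incidentId": incident_id,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Sending it is then `urllib.request.urlopen(request, timeout=30)`. Separating construction from sending keeps the request easy to log and test.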
The solution focuses on isolating IAM users and EC2 instances in AWS accounts. It exposes secured REST APIs that send standard notifications to operations teams and to account and application owners. The solution follows a standard deployment workflow and manages its infrastructure as code, with automation in mind. The following architecture flow diagram illustrates the solution approach.
On receiving an incident for resource isolation:
The earlier architecture flow diagram shows that several AWS resources are used to set up the API-based solution. These resources are provisioned and managed with Terraform as Infrastructure as Code (IaC). We define a standard CI/CD approach with separate environments for development, testing, and production. Code developed and reviewed in the development environment is merged and deployed to the test environment, where functional test cases validate the solution. After testing succeeds, the code is deployed to the production environment to enable the self-service security incident response API.
By codifying incident response runbooks and giving authorized users and applications access to the API, we bring these benefits to incident response:
Read on for a deeper dive into each of these benefits.
The AWS Cloud Adoption Framework (AWS CAF), the AWS Security Incident Response Guide, and the AWS Well-Architected Framework recommend that customers define known procedures for incident response and test their runbooks before an incident occurs. Testing these processes in advance decreases the time it takes to respond in a production environment.
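Codified runbooks can be tested like any other code, long before a real incident. The sketch below exercises a small key-deactivation step against a fake IAM client that records calls instead of touching AWS. The runbook function, the fake, and the test case are hypothetical; only the method names and response shape mirror the boto3 IAM client (`list_access_keys`, `update_access_key`).

```python
import unittest


def isolate_user(iam, user_name):
    """Runbook step under test: deactivate every access key of an IAM user."""
    # An IAM user can hold at most two access keys, so pagination is ignored here.
    keys = iam.list_access_keys(UserName=user_name)["AccessKeyMetadata"]
    for key in keys:
        iam.update_access_key(
            UserName=user_name, AccessKeyId=key["AccessKeyId"], Status="Inactive"
        )
    return [key["AccessKeyId"] for key in keys]


class FakeIAM:
    """Stand-in for boto3's IAM client that records calls instead of calling AWS."""

    def __init__(self, key_ids):
        self.key_ids = key_ids
        self.updates = []

    def list_access_keys(self, UserName):
        return {"AccessKeyMetadata": [
            {"AccessKeyId": key_id, "UserName": UserName} for key_id in self.key_ids
        ]}

    def update_access_key(self, UserName, AccessKeyId, Status):
        self.updates.append((UserName, AccessKeyId, Status))


class IsolateUserTest(unittest.TestCase):
    def test_all_keys_deactivated(self):
        fake = FakeIAM(["AKIA1", "AKIA2"])
        isolate_user(fake, "alice")
        self.assertEqual(
            fake.updates,
            [("alice", "AKIA1", "Inactive"), ("alice", "AKIA2", "Inactive")],
        )
```

Saved as a module, this runs with `python -m unittest`, so the same check can gate the test stage of the CI/CD pipeline.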
For artifact gathering, codifying the processes as code and infrastructure prepares us for data collection. Codifying standardizes collection into a repeatable, auditable sequence that records what information was collected, when, and how. This reduces the likelihood of missing data in future investigations.
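The "what, when, and how" of collection maps naturally onto a structured audit entry. The sketch below records each artifact with a UTC timestamp and a SHA-256 digest so investigators can later verify it was not altered. The field names and the list used as a log store are assumptions; in practice the log might be an append-only S3 object or a database table.

```python
import hashlib
from datetime import datetime, timezone


def record_artifact(log, incident_id, source, method, content):
    """Append an audit entry describing one collected artifact and return it.

    `log` is any list-like store (a stand-in for durable, append-only storage).
    `content` is the raw artifact as bytes.
    """
    entry = {
        "incidentId": incident_id,
        "source": source,    # what was collected, e.g. "cloudtrail:us-east-1"
        "method": method,    # how it was collected, e.g. the runbook step name
        "collectedAt": datetime.now(timezone.utc).isoformat(),  # when
        "sha256": hashlib.sha256(content).hexdigest(),  # integrity check for later
        "sizeBytes": len(content),
    }
    log.append(entry)
    return entry
```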
We have currently implemented isolation processes for IAM users and EC2 instances. Automation for other runbooks can be implemented and added to the API as further features.
Integrating the features into the same API achieves uniform security configurations and development processes. All features are exposed as REST APIs and can be easily integrated with other applications or systems.
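One way to keep every feature behind the same request and response contract is a runbook registry: adding a runbook means registering one more handler. The sketch below is a minimal, framework-free dispatcher; the registry, decorator, resource-type strings, and placeholder handlers are hypothetical, and a real deployment would call the actual isolation logic from each handler.

```python
import json

# Maps a resource type in the request body to the runbook that handles it.
RUNBOOKS = {}


def runbook(resource_type):
    """Decorator that registers a handler, turning a new runbook into a new API feature."""
    def register(func):
        RUNBOOKS[resource_type] = func
        return func
    return register


@runbook("iam-user")
def isolate_iam_user(resource_id):
    # Placeholder: a real handler would run the IAM isolation steps.
    return f"isolated IAM user {resource_id}"


@runbook("ec2-instance")
def isolate_ec2_instance(resource_id):
    # Placeholder: a real handler would swap in a quarantine security group.
    return f"isolated EC2 instance {resource_id}"


def handle_request(body):
    """Validate a JSON request body and dispatch it to the matching runbook.

    Returns an (HTTP status, response body) pair shared by every feature.
    """
    try:
        request = json.loads(body)
        resource_type = request["resourceType"]
        resource_id = request["resourceId"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "body must be a JSON object with resourceType and resourceId"}
    handler = RUNBOOKS.get(resource_type) if isinstance(resource_type, str) else None
    if handler is None:
        return 404, {"error": f"no runbook for resource type {resource_type!r}"}
    return 200, {"result": handler(resource_id)}
```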