Incident Response and Forensics
Investigation and Recovery from Security Incidents
TL;DR
Incident response saves time, money, and reputation by enabling rapid detection, containment, and recovery. Forensics preserves evidence for investigation and legal proceedings. Key phases: Preparation → Detection/Analysis → Containment → Eradication → Recovery → Post-Incident Review. Automation and pre-planning reduce response time from hours to minutes.
Learning Objectives
- Design incident response teams and processes
- Develop incident classification and severity framework
- Implement detection and alerting for common attacks
- Plan containment and eradication strategies
- Preserve forensic evidence for investigation
- Conduct effective post-incident reviews
Core Concepts
Incident Response Phases
1. Preparation:
- Define roles and responsibilities
- Develop playbooks for common incident types
- Test procedures (tabletop exercises)
- Pre-provision tools and access
2. Detection and Analysis:
- SIEM alerts and monitoring
- Alert triage and severity assessment
- Initial containment (isolate affected system?)
- Notification to incident commander
3. Containment:
- Short-term: Stop spread (isolate affected systems)
- Long-term: Prepare systems for eradication
- Collect forensic evidence
- Maintain access for investigation
4. Eradication:
- Remove attacker access (change passwords, revoke tokens)
- Patch vulnerabilities that enabled attack
- Remove malware/backdoors
- Close exploitation vector
5. Recovery:
- Restore from clean backups
- Rebuild compromised systems
- Verify no backdoors remain
- Gradual reconnection to network
- Monitor for re-compromise
6. Post-Incident Review:
- Timeline reconstruction
- Root cause analysis
- Identify lessons learned
- Update detection rules and playbooks
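The phase progression above can be encoded directly so tooling (ticketing, chatops bots) refuses to skip steps. The state-machine sketch below is purely illustrative: the phase names mirror the list, while the transition map and helper function are assumptions rather than a prescribed implementation.
from enum import Enum, auto

class Phase(Enum):
    PREPARATION = auto()
    DETECTION_ANALYSIS = auto()
    CONTAINMENT = auto()
    ERADICATION = auto()
    RECOVERY = auto()
    POST_INCIDENT_REVIEW = auto()

# Allowed forward transitions; recovery can fall back to containment on re-compromise
ALLOWED = {
    Phase.PREPARATION: {Phase.DETECTION_ANALYSIS},
    Phase.DETECTION_ANALYSIS: {Phase.CONTAINMENT},
    Phase.CONTAINMENT: {Phase.ERADICATION},
    Phase.ERADICATION: {Phase.RECOVERY},
    Phase.RECOVERY: {Phase.POST_INCIDENT_REVIEW, Phase.CONTAINMENT},
    Phase.POST_INCIDENT_REVIEW: set(),
}

def advance(current: Phase, target: Phase) -> Phase:
    """Move an incident to the next phase, rejecting skipped steps."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.name} to {target.name}")
    return target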
Incident Severity Classification
| Severity | Impact | Response Time | Example |
|---|---|---|---|
| Critical | Full service down, data breach imminent | < 15 min | Ransomware on file servers |
| High | Significant service impact, breach confirmed | < 1 hour | Attacker active on critical system |
| Medium | Limited impact, contained | < 4 hours | Phishing with credentials obtained |
| Low | Minimal impact, isolated | < 24 hours | Failed login attempts, misconfig detected |
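To make the severity table actionable, the SLA column can be encoded so alert routing computes paging deadlines automatically. The sketch below assumes only what the table states (severity names and response times); the Incident dataclass and helper function are illustrative.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Response-time SLAs from the table above (minutes to initial response)
SEVERITY_SLA_MINUTES = {
    "critical": 15,
    "high": 60,
    "medium": 240,
    "low": 1440,
}

@dataclass
class Incident:
    title: str
    severity: str
    detected_at: datetime

def response_deadline(incident: Incident) -> datetime:
    """Return the deadline by which the incident commander must be engaged."""
    sla = SEVERITY_SLA_MINUTES[incident.severity.lower()]
    return incident.detected_at + timedelta(minutes=sla)

# Example: a critical ransomware alert must get a response within 15 minutes
inc = Incident("Ransomware on file servers", "critical", datetime.now(timezone.utc))
print(response_deadline(inc))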
Forensics Preservation
Chain of Custody:
- Document what was captured, by whom, when
- Cryptographic hashing (SHA-256) for files
- Maintain secure storage of evidence
- Restrict access (prevents contamination)
Evidence Collection:
- Memory dumps before power off
- Disk images (bit-for-bit copies)
- Log files and application data
- Network traffic captures (PCAP)
- Timing and sequence of events
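The hashing requirement above is typically satisfied by computing a SHA-256 digest at collection time and recording it alongside the collector and timestamp. The helper below is a minimal standard-library sketch; the evidence path and record fields are illustrative, not a mandated format.
import hashlib
from datetime import datetime, timezone

def hash_evidence(path: str, collector: str, reason: str) -> dict:
    """Compute SHA-256 of an evidence file and return a chain-of-custody record."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):  # stream in 1 MiB chunks
            digest.update(chunk)
    return {
        "item": path,
        "sha256": digest.hexdigest(),
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "collected_by": collector,
        "reason": reason,
    }

# Example (hypothetical evidence path):
# hash_evidence("/mnt/usb/memory.dump", "john.analyst@company.com", "Ransomware incident response")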
Practical Example
Incident Response Playbook
IncidentType: Ransomware Detection
Severity: Critical
SLA: 15 minutes to initial containment
Detection:
Triggers:
- Multiple file extensions changing rapidly
- Encryption library activity detected
- Backup deletion attempts
Alert: P1-Ransomware-Detected
InitialResponse:
Incident Commander: On-call CISO
Team:
- Security Engineer (evidence collection)
- Infrastructure Engineer (isolation)
- Forensics Analyst (investigation)
- Communications (notification)
Actions:
1. Verify alert authenticity (false positive check)
2. Isolate affected system from network
3. Preserve memory dump for forensics
4. Create disk snapshot
5. Notify backup team (prevent auto-sync)
6. Document timeline
Containment:
- Isolate the affected subnet/VLAN
- Disable affected user accounts
- Revoke session tokens
- Block outbound traffic to known C2 domains
- Scan for lateral movement
Eradication:
- Identify attacker entry point (RDP, VPN, etc.)
- Patch vulnerability or disable service
- Remove malware/ransomware samples
- Audit admin accounts for backdoors
- Verify with EDR/SIEM
Recovery:
- Restore from clean, pre-infection backup
- Rebuild to baseline configuration
- Monitor for re-infection
- Gradual service restoration
- Verify integrity
PostIncident:
- Full forensic analysis
- Timeline and attack chain
- Root cause (phishing, unpatched server, etc.)
- Update detection rules
- Update playbook
- Communication to stakeholders
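The playbook's first detection trigger (multiple file extensions changing rapidly) can be approximated with a simple rate heuristic over recent file modifications. The sketch below is illustrative only: the extension list, threshold, window, and scanned path are assumptions, and in production this signal would normally come from EDR or file-server telemetry rather than periodic polling.
import os
import time

SUSPICIOUS_EXTENSIONS = {".encrypted", ".locked", ".crypt"}  # illustrative list
WINDOW_SECONDS = 300        # look at the last 5 minutes
ALERT_THRESHOLD = 50        # more than 50 matching files in the window -> alert

def ransomware_indicator(root: str) -> bool:
    """Return True if many recently modified files carry ransomware-style extensions."""
    cutoff = time.time() - WINDOW_SECONDS
    hits = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            _, ext = os.path.splitext(name)
            try:
                recent = os.stat(path).st_mtime >= cutoff
            except OSError:
                continue  # file disappeared mid-scan
            if recent and ext.lower() in SUSPICIOUS_EXTENSIONS:
                hits += 1
                if hits > ALERT_THRESHOLD:
                    return True
    return False

if ransomware_indicator("/srv/fileshare"):  # hypothetical path
    print("P1-Ransomware-Detected: page the incident commander")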
Forensics Procedures
Memory Collection
# Linux: raw acquisition (modern kernels restrict /dev/mem; tools such as LiME or AVML are typically used instead)
dd if=/dev/mem of=/mnt/usb/memory.dump bs=1M
# Analyze the resulting dump offline with a framework such as Volatility
# Windows: acquire with DumpIt (output filename and flags vary by version)
DumpIt.exe
# Note: Collect memory before shutting down (volatile data is lost on power-off)
Disk Imaging
# Create forensic image with hashing
dcfldd if=/dev/sda1 of=/mnt/external/disk.img hash=sha256 hashwindow=256M
Evidence Log
Chain of Custody:
Item: Memory dump from server-prod-01
Collected: 2025-02-14 10:45 UTC
Collected by: John Analyst (ID: john.analyst@company.com)
Reason: Ransomware incident response
Storage: Secure evidence vault (encryption + MFA)
Access: John Analyst, Jane CISO, Legal counsel
Integrity: SHA-256: a3f4e2b1c9d8e7f6a5b4c3d2e1f0a9b8... (truncated example digest)
Hash verified: 2025-02-14 10:46 UTC
Status: Preserved for investigation
When to Use / When Not to Use
Healthy practices:
- Clear incident classification and severity levels
- Written playbooks for common incident types
- Regular tabletop exercises and drills
- Immediate notification to incident commander
- Preservation of forensic evidence from start
- Post-incident review and lessons learned
- Metrics (MTTD, MTTR) tracked
- Communication plan for stakeholders
Anti-patterns:
- No incident classification framework
- Manual, ad-hoc response procedures
- No tabletop exercises or testing
- Delayed incident notification
- Destroying evidence during response
- No post-incident review
- No metrics or improvement tracking
- Silent breach (no external communication)
Design Review Checklist
- Incident response team defined and trained?
- On-call rotation established?
- Playbooks for common incident types?
- Forensic tools and storage ready?
- Alerting configured for critical events?
- Alert response time SLAs defined?
- Incident classification framework documented?
- Escalation procedures clear?
- Forensic evidence collection procedures?
- Chain of custody documented?
- Evidence stored securely?
- Legal/compliance requirements understood?
- Post-incident review process established?
- Lessons learned documented?
- Detection rules updated based on findings?
- Response metrics tracked (MTTD, MTTR)?
Complete Incident Response Examples
Example 1: Data Breach Response (Real Timeline)
2025-02-14 10:45 UTC: Anomaly Detected
SIEM Alert: Unusual database query pattern
- Selecting millions of rows (normally selects 1000s)
- From sensitive tables (customers, transactions)
- Issued by a service account (outside its normal query pattern)
Alert severity: HIGH
Incident commander paged
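The alert that opened this timeline boils down to a baseline comparison: flag any service account whose rows-returned count is far above its historical norm. The function below sketches that heuristic; the field names, the 10x multiplier, and the fallback cap are assumptions, and a real deployment would run inside the SIEM or a database activity monitor.
from statistics import median

def unusual_query_volume(account: str, rows_returned: int,
                         history: dict[str, list[int]],
                         multiplier: int = 10) -> bool:
    """Flag a query whose result size dwarfs the account's historical median."""
    baseline = history.get(account, [])
    if not baseline:
        return rows_returned > 100_000  # no history: fall back to an absolute cap
    return rows_returned > multiplier * median(baseline)

# Example: the Reports API normally returns a few thousand rows per query
history = {"svc-reports-api": [1_200, 3_400, 2_100, 900]}
if unusual_query_volume("svc-reports-api", 5_000_000, history):
    print("HIGH: unusual database query pattern - page the incident commander")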
2025-02-14 10:48 UTC: Initial Investigation (3 min)
Security engineer checks:
- Query source: IP 192.168.1.50 (internal)
- Service: Reports API (should only select aggregated data)
- Query: "SELECT * FROM customers" (no WHERE clause!)
- Duration: Running for 15 minutes already
Verdict: Likely data exfiltration
Action: IMMEDIATELY isolate affected system
2025-02-14 10:50 UTC: Containment (5 min)
Infrastructure engineer:
- Kills database connections from Reports API
- Revokes API credentials
- Isolates Reports API servers (network rules)
- Stops running queries
Security engineer:
- Reviews database activity logs (how much data accessed?)
- Begins forensics data collection (memory dump, disk snapshot)
- Checks for lateral movement (other compromised systems?)
2025-02-14 11:15 UTC: Forensic Analysis (25 min)
Discovery: Reports API container had unpatched OpenSSL bug
- Attacker exploited to gain shell access
- Attacker created backdoor (cron job running malicious script)
- Attacker accessed database with stolen API key
Data breach: Customer names, emails, phone numbers
Estimate: 500,000 records accessed (but not all exfiltrated)
2025-02-14 11:30 UTC: Eradication (45 min)
- Delete backdoor cron job
- Patch OpenSSL vulnerability (rebuild container)
- Scan all containers for similar exploits
- Revoke all API keys (need to rotate)
- Audit file system for suspicious changes
2025-02-14 13:00 UTC: Recovery (2 hours 15 min)
- Rebuild Reports API from clean backup (pre-compromise)
- Restore API keys (new, secure)
- Monitor for signs of re-compromise
- Test functionality (ensure no data loss)
2025-02-14 14:00 UTC: Notification (3 hours)
Legal team determines: Breach notification law triggered
- Notify 500K affected customers (email, breach notification letter)
- Notify regulators (depending on jurisdiction)
- Notify press (if required by law)
Message: "We discovered unauthorized access to customer emails/phones.
Passwords were NOT accessed. We've patched the vulnerability.
Please monitor your email for scams."
2025-02-21 (1 week): Post-Incident Review
- Timeline confirmed
- Root cause: Unpatched OpenSSL + lack of network segmentation
- Lessons learned:
* Implement container image scanning (CVE checking)
* Network segmentation (DB access only from specific IPs)
* Database activity monitoring (unusual queries alerted)
* Credential rotation policies (invalidate old API keys)
- Action items assigned with owners and deadlines
Metrics:
- MTTD (Mean Time To Detect): 15 minutes (alert triggered)
- MTTR (Mean Time To Respond): 5 minutes (system isolated)
- Data Exposed: 500K records (identified within 2 hours)
Example 2: Ransomware Attack Response
Timeline:
T+0 (8 AM Monday):
Alert: High volume of file modifications detected
Files being renamed with .encrypted extension
Verdict: Ransomware confirmed
Immediate Actions:
- Page incident commander (critical)
- Isolate affected systems (network isolation)
- Preserve memory dump (volatile data)
- Create disk snapshot (forensics)
T+5 min:
Investigation: Ransomware type?
- File signatures suggest: ALPHV (known ransomware gang)
- Ransom note appears: "Pay $2M in Bitcoin or data deleted"
Assessment:
- Affected: Finance server, customer database backups
- Not affected: Production servers (different subnet)
- Damage: Online backups encrypted (can't restore from them)
- Options: Pay ransom (NO!), rebuild from different backups, law enforcement
T+30 min:
Forensics reveals entry point: RDP service (port 3389)
- Weak password on finance admin account
- No MFA
- Attacker brute-forced credentials
Attacker path:
RDP login → Admin shell → Disable antivirus
→ Deploy ransomware → Encrypt all accessible files
→ Display ransom note → Exit
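The entry point found here, brute-forced RDP credentials, is straightforward to alert on: count failed logons per source IP within a short window. The sketch below works on a generic event stream; the field names and thresholds are assumptions (on Windows the underlying signal is Security log event ID 4625, a failed logon).
from collections import defaultdict
from datetime import datetime, timedelta

FAILED_LOGON_THRESHOLD = 20      # failures per source IP...
WINDOW = timedelta(minutes=5)    # ...within this sliding window

def brute_force_sources(events: list[dict]) -> set[str]:
    """Return source IPs with an excessive rate of failed logons."""
    buckets: dict[str, list[datetime]] = defaultdict(list)
    flagged: set[str] = set()
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        if ev["outcome"] != "failure":
            continue
        times = buckets[ev["source_ip"]]
        times.append(ev["timestamp"])
        # drop failures that fell out of the sliding window
        while times and ev["timestamp"] - times[0] > WINDOW:
            times.pop(0)
        if len(times) >= FAILED_LOGON_THRESHOLD:
            flagged.add(ev["source_ip"])
    return flagged

# Example event: {"timestamp": datetime(...), "source_ip": "203.0.113.7", "outcome": "failure"}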
T+1 hour:
Eradication:
- Kill all unauthorized processes
- Patch RDP service (disable weak auth)
- Re-enable antivirus (remove disable commands)
- Force password reset (all domain accounts)
- Enable MFA (all admins)
- Enable network segmentation (finance isolated)
T+4 hours:
Recovery:
- Restore from offline backup (that wasn't encrypted)
- Finance data from 24 hours ago (some work lost)
- Monitor for re-compromise (unusual logins)
T+24 hours:
Notification:
- No evidence customer data accessed
- But transparency: explain what happened, what we're doing
- Offer credit monitoring anyway (goodwill)
- Law enforcement notified
T+2 weeks:
Post-incident review:
- Why no MFA? (Should be mandatory for admins)
- Why weak password? (Password policy insufficient)
- Why RDP exposed to internet? (Should VPN-only)
- Why no network segmentation? (Finance should be isolated)
Fixes:
- Enforce MFA for all admins (immediate)
- Password policy: 16 chars, complexity (immediate)
- VPN for RDP access only (immediate)
- Network segmentation (2 weeks)
- Backup testing (monthly)
- Incident response drills (quarterly)
Result: Expensive but survived. Lessons learned. No customer data lost.
Forensics Best Practices
Collecting Evidence Without Contaminating
# ❌ BAD: Investigating directly on the live system alters evidence metadata
ls -la /var/log/*            # Relies on binaries of the (possibly compromised) host
cat /var/log/auth.log        # Reading the file updates its atime
grep "attacker" /var/log/*   # Also updates atime; content is untouched, but timestamps are evidence too
# ✅ GOOD: Preserving evidence chain
# Step 1: Create bit-for-bit copy (before touching original)
dcfldd if=/dev/sda1 of=/mnt/usb/disk.img hash=sha256
# Step 2: Hash original (for integrity verification)
sha256sum /dev/sda1 > /mnt/usb/original.hash
# Step 3: Verify copy matches
sha256sum /mnt/usb/disk.img
# Should match original.hash
# Step 4: Only analyze the copy (not original)
# Mount copy as read-only
mount -o ro,loop /mnt/usb/disk.img /mnt/forensics
# Step 5: Document chain of custody
# Who accessed it, when, what tools, what was found
Evidence Preservation Checklist
import hashlib
from datetime import datetime

class EvidencePreservation:
def collect_evidence(self, incident_id: str):
"""Properly preserve evidence for investigation"""
evidence = {
'volatile_data': [],
'disk_images': [],
'network_captures': [],
'logs': [],
}
# Memory (volatile, lost on reboot)
if should_preserve_memory():
memory = self.memory_dump(process='all')
evidence['volatile_data'].append(self.hash_and_store(memory))
# Network traffic (PCAP)
pcap = self.capture_network_traffic()
evidence['network_captures'].append(self.hash_and_store(pcap))
# Disk (if needed, power off first to prevent changes)
if should_image_disk():
disk_image = self.create_disk_image('/dev/sda1')
evidence['disk_images'].append(self.hash_and_store(disk_image))
# Logs (application, system, audit)
logs = self.collect_logs()
evidence['logs'].append(self.hash_and_store(logs))
# Chain of custody documentation
self.document_chain_of_custody(
incident_id=incident_id,
items=evidence,
collector='Security Team',
timestamp=datetime.now(),
storage_location='Evidence Vault (HSM encrypted)',
access_restrictions='CISO, Legal, Forensics Analyst'
)
return evidence
def hash_and_store(self, evidence_item):
"""Hash evidence, store securely, document"""
sha256_hash = hashlib.sha256(evidence_item).hexdigest()
# Store in secure evidence vault
self.evidence_vault.store(
data=evidence_item,
hash=sha256_hash,
encrypted=True, # Encrypt in vault
access_log=True # Log all access
)
return {
'item': evidence_item,
'hash': sha256_hash,
'stored_at': datetime.now(),
'location': 'Secure Evidence Vault'
}
Metrics for Measuring Response Effectiveness
class IncidentMetrics:
def calculate_response_metrics(self, incident):
"""Measure incident response effectiveness"""
# Time metrics
metrics = {
'MTTD': incident.detected_at - incident.occurred_at,
'MTTR': incident.resolved_at - incident.detected_at,
'MTPT': incident.patched_at - incident.occurred_at, # Patch time
}
# Impact metrics
metrics.update({
'customers_affected': incident.affected_users,
'data_exposed': incident.records_exposed,
'downtime_minutes': (incident.resolved_at - incident.started_at).total_seconds() / 60,
'financial_impact': incident.incident_cost, # Cost of incident + response
})
# Response quality
metrics.update({
'playbook_used': incident.used_predefined_playbook,
'all_steps_followed': incident.followed_procedure,
'escalated_appropriately': incident.escalation_correct,
'communication_timely': incident.notified_within_sla,
})
# Lessons learned
metrics.update({
'root_cause_identified': incident.root_cause is not None,
'preventive_actions_identified': len(incident.preventive_actions) > 0,
'action_items_assigned': all(a.owner for a in incident.action_items),
})
return metrics
# Target metrics (best in class):
# MTTD: < 1 hour (detect within 1 hour)
# MTTR: < 4 hours (resolve within 4 hours)
# MTPT: < 24 hours (patch within 1 day)
# Customer notification: Within 24-72 hours
# Post-incident review: Within 2 weeks
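Those targets can be checked mechanically once the time metrics are computed as timedeltas. The comparison below is a small sketch; the target values are copied from the comments above, and the key names match the IncidentMetrics sketch.
from datetime import timedelta

# Best-in-class targets from the comments above
TARGETS = {
    "MTTD": timedelta(hours=1),
    "MTTR": timedelta(hours=4),
    "MTPT": timedelta(hours=24),
}

def missed_targets(metrics: dict) -> list:
    """Return the names of time metrics that exceeded their targets."""
    return [name for name, target in TARGETS.items()
            if name in metrics and metrics[name] > target]

# Example: detected in 15 min, resolved in 3 h 15 min, patched after 2 days
print(missed_targets({"MTTD": timedelta(minutes=15),
                      "MTTR": timedelta(hours=3, minutes=15),
                      "MTPT": timedelta(days=2)}))   # -> ['MTPT']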
One Takeaway
Incident response preparation saves hours during actual incidents. Playbooks, training, and tools pre-positioned reduce MTTR from days to hours. Post-incident reviews drive continuous improvement. The goal is not zero incidents (impossible) but rapid detection, containment, and recovery.