ADR-007: Kill Switch Design
Status
Accepted
Context
Autonomous security response systems pose risks if they malfunction:
- False positives could disable legitimate users/systems
- Bugs could trigger cascading actions
- Compromised AI could be weaponized
- External events may require immediate halt
We needed an emergency stop mechanism that is:
- Fast to activate (< 1 second)
- Globally effective
- Difficult to accidentally trigger
- Easy to recover from
Decision
We implemented a global kill switch with the following design:
Architecture
                        ┌─────────────┐
                        │ Kill Switch │
                        │    State    │
                        └──────┬──────┘
                               │
            ┌──────────────────┼──────────────────┐
            │                  │                  │
            ▼                  ▼                  ▼
    ┌───────────────┐  ┌───────────────┐  ┌───────────────┐
    │ Orchestrator  │  │ Policy Engine │  │ Action Runner │
    │               │  │               │  │               │
    │ check()       │  │ check()       │  │ check()       │
    │ before        │  │ before        │  │ before        │
    │ processing    │  │ evaluation    │  │ execution     │
    └───────────────┘  └───────────────┘  └───────────────┘
State
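    use chrono::{DateTime, Utc};

    pub struct KillSwitchStatus {
        /// True while automation is halted.
        pub active: bool,
        /// Why the switch was activated, if it is active.
        pub reason: Option<String>,
        /// Who (user or system component) activated the switch.
        pub activated_by: Option<String>,
        /// When the switch was activated.
        pub activated_at: Option<DateTime<Utc>>,
    }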
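As an illustration of how this status might be held in memory, here is a minimal sketch. It is not the project's implementation: the `KillSwitch` wrapper, its method names, the use of `std::time::SystemTime` in place of `DateTime<Utc>`, and the synchronous (non-async) API are all assumptions made to keep the example self-contained. An `AtomicBool` mirrors the `active` flag so that hot-path checks do not take a lock.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::RwLock;
use std::time::SystemTime;

/// Same fields as the ADR's status struct, with SystemTime standing in
/// for DateTime<Utc> (assumption, to avoid an external crate).
#[derive(Debug, Clone, Default)]
pub struct KillSwitchStatus {
    pub active: bool,
    pub reason: Option<String>,
    pub activated_by: Option<String>,
    pub activated_at: Option<SystemTime>,
}

/// Hypothetical in-memory holder for the global switch.
#[derive(Default)]
pub struct KillSwitch {
    // Lock-free flag read by every checkpoint.
    active: AtomicBool,
    // Full status, read by the UI/API and written on (de)activation.
    status: RwLock<KillSwitchStatus>,
}

impl KillSwitch {
    /// Starts inactive, matching the "resets to inactive" behaviour.
    pub fn new() -> Self {
        Self::default()
    }

    pub fn activate(&self, reason: &str, activated_by: &str) {
        let mut status = self.status.write().unwrap();
        *status = KillSwitchStatus {
            active: true,
            reason: Some(reason.to_string()),
            activated_by: Some(activated_by.to_string()),
            activated_at: Some(SystemTime::now()),
        };
        // Flip the flag while still holding the write lock so the two
        // views change together from a writer's perspective.
        self.active.store(true, Ordering::SeqCst);
    }

    pub fn deactivate(&self) {
        let mut status = self.status.write().unwrap();
        *status = KillSwitchStatus::default();
        self.active.store(false, Ordering::SeqCst);
    }

    pub fn is_active(&self) -> bool {
        self.active.load(Ordering::SeqCst)
    }

    pub fn status(&self) -> KillSwitchStatus {
        self.status.read().unwrap().clone()
    }
}
```

Because the state lives only in process memory, a restart constructs a fresh `KillSwitch` and the switch comes back inactive.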
Check Points
The kill switch is checked at multiple points:
- Alert Processing: Before creating incidents from alerts
- Policy Evaluation: Before evaluating approval policies
- Action Execution: Before executing any response action
- Playbook Execution: Before running playbook stages
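The checkpoint pattern above can be sketched as a guard that runs before any side effect. This is a hedged illustration only: `CheckError`, `set_active`, and the simplified `ActionRunner` (which merely records action names) are hypothetical stand-ins, not the project's actual types.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical error returned when a checkpoint refuses to proceed.
#[derive(Debug, PartialEq)]
pub enum CheckError {
    KillSwitchActive,
}

/// Reduced to a single flag for this sketch.
#[derive(Default)]
pub struct KillSwitch {
    active: AtomicBool,
}

impl KillSwitch {
    pub fn set_active(&self, active: bool) {
        self.active.store(active, Ordering::SeqCst);
    }

    /// Called by each component before it does any work.
    pub fn check(&self) -> Result<(), CheckError> {
        if self.active.load(Ordering::SeqCst) {
            Err(CheckError::KillSwitchActive)
        } else {
            Ok(())
        }
    }
}

/// Stand-in for the Action Runner; it only records what it "executed".
pub struct ActionRunner {
    kill_switch: Arc<KillSwitch>,
    pub executed: Vec<String>,
}

impl ActionRunner {
    pub fn new(kill_switch: Arc<KillSwitch>) -> Self {
        Self { kill_switch, executed: Vec::new() }
    }

    pub fn execute(&mut self, action: &str) -> Result<(), CheckError> {
        // Checkpoint: bail out before any side effect happens.
        self.kill_switch.check()?;
        self.executed.push(action.to_string());
        Ok(())
    }
}
```

Sharing one `Arc<KillSwitch>` across the orchestrator, policy engine, and action runner is one way to make a single activation take effect at every checkpoint.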
Activation
    // Via API
    POST /api/kill-switch/activate
    {
      "reason": "Investigating false positive surge",
      "activated_by": "[email protected]"
    }

    // Via CLI
    tw-cli kill-switch activate --reason "Emergency maintenance"

    // Programmatic
    kill_switch.activate("Anomaly detected", "system").await;
Deactivation
    // Via API
    POST /api/kill-switch/deactivate
    {
      "reason": "Issue resolved"
    }
    // Only admins can deactivate
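The admin-only rule for deactivation could be enforced along these lines. This is a sketch under stated assumptions: the `Role` enum, the `DeactivateError` variants, and the in-memory `audit_log` are hypothetical and are not taken from the project's code.

```rust
/// Hypothetical caller roles; only Admin may deactivate.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    Admin,
    Analyst,
}

#[derive(Debug, PartialEq)]
pub enum DeactivateError {
    /// Caller is not an admin.
    Forbidden,
    /// The switch was not active, so there is nothing to deactivate.
    NotActive,
}

pub struct KillSwitch {
    pub active: bool,
    /// Simplified audit trail: one line per state change.
    pub audit_log: Vec<String>,
}

impl KillSwitch {
    pub fn deactivate(&mut self, role: Role, reason: &str) -> Result<(), DeactivateError> {
        // Authorization comes first so that non-admins learn nothing
        // about the current state from the error they receive.
        if role != Role::Admin {
            return Err(DeactivateError::Forbidden);
        }
        if !self.active {
            return Err(DeactivateError::NotActive);
        }
        self.active = false;
        self.audit_log.push(format!("deactivated: {}", reason));
        Ok(())
    }
}
```

Activation is deliberately left unrestricted in this sketch, since the decision only states an admin requirement for deactivation.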
Event Notification
Activation triggers:
- KillSwitchActivated event to all subscribers
- Dashboard alert banner
- Notification to configured channels
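Fanning an activation event out to subscribers might look like the following sketch, which uses standard-library channels. The `EventBus` type and the payload fields on `KillSwitchActivated` are assumptions for illustration; the real system may use a different event mechanism entirely.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Event payload; the fields are assumed to mirror the activation request.
#[derive(Debug, Clone, PartialEq)]
pub enum KillSwitchEvent {
    KillSwitchActivated { reason: String, activated_by: String },
}

/// Hypothetical fan-out bus: each subscriber gets its own channel.
#[derive(Default)]
pub struct EventBus {
    subscribers: Vec<Sender<KillSwitchEvent>>,
}

impl EventBus {
    /// Registers a subscriber (dashboard, notifier, ...) and returns
    /// the receiving end of its channel.
    pub fn subscribe(&mut self) -> Receiver<KillSwitchEvent> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Sends a copy of the event to every live subscriber and drops
    /// subscribers whose receiver has been closed.
    pub fn publish(&mut self, event: KillSwitchEvent) {
        self.subscribers.retain(|tx| tx.send(event.clone()).is_ok());
    }

    pub fn subscriber_count(&self) -> usize {
        self.subscribers.len()
    }
}
```

Pruning closed channels on publish keeps a crashed subscriber from blocking or failing delivery to the others.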
Consequences
Positive
- Immediate halt of all automation
- Clear audit trail of activation/deactivation
- Multiple activation methods (UI, API, CLI)
- Visible status in all interfaces
Negative
- In-memory state (lost on restart, resets to inactive)
- No automatic activation triggers yet
- Single global switch (no per-action granularity)
- Requires admin access to deactivate
Future Enhancements
- Persistent State: Store kill switch state in database
- Auto-Activation: Trigger on anomaly detection
- Scoped Switches: Per-action-type or per-connector switches
- Scheduled Deactivation: Auto-deactivate after timeout
- Two-Person Rule: Require multiple admins for deactivation
Operational Procedures
When the kill switch is activated:
- All pending actions remain pending
- New alerts create incidents but stop at enrichment
- Dashboard shows prominent warning banner
- Existing approved actions are NOT rolled back
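The queue behaviour described above can be sketched as follows. `ActionState`, `Action`, and `process_pending` are hypothetical names, and the real system tracks more states than the two modelled here. The point is that an active switch leaves every action exactly as it was: pending ones stay pending and completed ones are not rolled back.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ActionState {
    Pending,
    Executed,
}

#[derive(Debug)]
pub struct Action {
    pub name: String,
    pub state: ActionState,
}

/// Runs all pending actions unless the kill switch is active.
/// Returns how many actions were executed in this pass.
pub fn process_pending(kill_switch_active: bool, actions: &mut [Action]) -> usize {
    if kill_switch_active {
        // Halt: nothing is executed and nothing is rolled back.
        return 0;
    }
    let mut executed = 0;
    for action in actions.iter_mut().filter(|a| a.state == ActionState::Pending) {
        // A real runner would perform the response action here.
        action.state = ActionState::Executed;
        executed += 1;
    }
    executed
}
```

In this sketch, calling `process_pending(false, ...)` stands for resumed operation; in practice the pending actions would be reviewed manually before that call is allowed to run.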
To recover:
- Investigate root cause
- Fix underlying issue
- Deactivate the kill switch
- Manually review pending actions
- Resume normal operations