Guardrails Reference

The guardrails configuration file (config/guardrails.yaml) defines security boundaries for AI-automated actions. These rules apply regardless of the current autonomy level.

Deny List

Actions and targets that are never allowed automatically.

Denied Actions

deny_list:
  actions:
    - delete_user          # Too destructive
    - wipe_host            # Too destructive
    - delete_all_emails    # Too destructive
    - modify_firewall      # High risk

Add any action name here to prevent the AI from ever executing it. These actions can still be performed manually by an analyst.

Target Patterns

Regex patterns that match protected systems. Any automated action targeting a hostname or identifier that matches these patterns requires human approval.

deny_list:
  target_patterns:
    - ".*-prod-.*"         # Production systems
    - "dc\\d+\\..*"        # Domain controllers
    - ".*-critical-.*"     # Explicitly marked critical
    - ".*\\.corp\\..*"     # Corporate infrastructure
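
The source does not say how these patterns are applied, so the following is only a minimal Python sketch of one plausible check. It assumes the whole hostname or identifier must match (`re.fullmatch`) and that matching is case-sensitive. The function name `requires_approval` is hypothetical.

```python
import re

# Patterns as they read after YAML parsing: "\\d" inside a double-quoted
# YAML string becomes the regex escape \d.
TARGET_PATTERNS = [
    r".*-prod-.*",      # Production systems
    r"dc\d+\..*",       # Domain controllers
    r".*-critical-.*",  # Explicitly marked critical
    r".*\.corp\..*",    # Corporate infrastructure
]

_COMPILED = [re.compile(p) for p in TARGET_PATTERNS]


def requires_approval(target: str) -> bool:
    """Return True if the target matches any protected pattern.

    Assumption: the entire identifier must match (re.fullmatch),
    and matching is case-sensitive.
    """
    return any(rx.fullmatch(target) for rx in _COMPILED)
```

Under these assumptions, `requires_approval("web-prod-01")` and `requires_approval("dc01.example.com")` are true, while `requires_approval("laptop-42")` is false. If the real matcher uses substring search instead of a full match, the leading and trailing `.*` in the patterns become redundant but harmless.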

Protected IPs

Specific IP addresses that must never be targeted by automated actions.

deny_list:
  protected_ips:
    - "10.0.0.1"           # Core router
    - "10.0.0.2"           # DNS server
    - "10.0.0.3"           # DHCP server

Protected Users

User accounts that are protected from automated modifications (disable, password reset, etc.). Supports exact matches and glob patterns.

deny_list:
  protected_users:
    - "admin"
    - "root"
    - "administrator"
    - "service-account-*"
    - "svc-*"
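
As an illustration of how exact names and glob patterns can share one list, here is a small Python sketch built on the standard `fnmatch` module. It assumes case-sensitive comparison, which the source does not specify. `is_protected_user` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

PROTECTED_USERS = [
    "admin",
    "root",
    "administrator",
    "service-account-*",
    "svc-*",
]


def is_protected_user(username: str) -> bool:
    """Return True if the username matches any protected entry.

    Entries without wildcards behave as exact matches; entries with "*"
    behave as globs. Assumption: comparison is case-sensitive.
    """
    return any(fnmatchcase(username, pattern) for pattern in PROTECTED_USERS)
```

With this sketch, `svc-backup` and `service-account-ci` are protected, while `alice` and `admin2` are not, because `admin` contains no wildcard and so only matches itself.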

Rate Limits

Prevent runaway automation by capping how often each action can run: per hour, per day, and concurrently.

rate_limits:
  isolate_host:
    max_per_hour: 5
    max_per_day: 20
    max_concurrent: 2

  disable_user:
    max_per_hour: 10
    max_per_day: 50
    max_concurrent: 5

  block_ip:
    max_per_hour: 20
    max_per_day: 100
    max_concurrent: 10

  quarantine_email:
    max_per_hour: 50
    max_per_day: 500
    max_concurrent: 20

| Field | Description |
| --- | --- |
| max_per_hour | Maximum executions in a rolling 60-minute window |
| max_per_day | Maximum executions in a rolling 24-hour window |
| max_concurrent | Maximum simultaneous in-flight executions |
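
To make the three limits concrete, the following Python sketch tracks start times in a deque and checks the rolling windows before each execution. It is one possible in-memory approach, not the product's actual implementation. The class `ActionLimiter` and its methods are hypothetical.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

HOUR = 3600
DAY = 24 * HOUR


@dataclass
class ActionLimiter:
    """Hypothetical per-action limiter mirroring one rate_limits entry."""
    max_per_hour: int
    max_per_day: int
    max_concurrent: int
    in_flight: int = 0
    # Start timestamps of recent executions, oldest first.
    _starts: deque = field(default_factory=deque, repr=False)

    def try_start(self, now: Optional[float] = None) -> bool:
        """Record and allow an execution if every limit permits it."""
        now = time.time() if now is None else now
        # Drop entries that have left the rolling 24-hour window.
        while self._starts and now - self._starts[0] >= DAY:
            self._starts.popleft()
        last_hour = sum(1 for t in self._starts if now - t < HOUR)
        if (self.in_flight >= self.max_concurrent
                or last_hour >= self.max_per_hour
                or len(self._starts) >= self.max_per_day):
            return False
        self._starts.append(now)
        self.in_flight += 1
        return True

    def finish(self) -> None:
        """Mark one in-flight execution as completed."""
        self.in_flight = max(0, self.in_flight - 1)
```

A caller would create one limiter per action, for example `ActionLimiter(max_per_hour=5, max_per_day=20, max_concurrent=2)` for `isolate_host`, call `try_start()` before executing, and call `finish()` afterwards.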

Approval Policies

Define when human approval is required, and at what level.

approval_policies:
  - name: critical_asset_protection
    description: "Require senior approval for actions on critical assets"
    condition:
      target_criticality:
        - critical
        - high
    requires: senior
    can_override: false

Condition Fields

| Field | Type | Description |
| --- | --- | --- |
| target_criticality | List of strings | Asset criticality levels that trigger this policy |
| action_type | List of strings | Action types that trigger this policy |
| confidence_below | Float (0.0-1.0) | Trigger when AI confidence is below this threshold |

Approval Levels

| Level | Who can approve |
| --- | --- |
| analyst | Any analyst |
| senior | Senior analyst or above |
| manager | SOC manager |

Overridability

When can_override is true, a senior user can bypass the approval requirement. When it is false, approval is mandatory and cannot be skipped.
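
The Python sketch below shows one way a policy list like this could be evaluated. Several details are assumptions the source does not confirm: all fields in a condition must match (AND), the strictest level wins when several policies match, and a missing confidence value is treated as 0.0 so the check fails closed. The keys of the `action` dictionary and both function names are hypothetical.

```python
from typing import Optional

# Assumed ordering, least to most senior.
LEVELS = ["analyst", "senior", "manager"]

POLICIES = [
    {
        "name": "critical_asset_protection",
        "condition": {"target_criticality": ["critical", "high"]},
        "requires": "senior",
        "can_override": False,
    },
]


def policy_matches(condition: dict, action: dict) -> bool:
    """Assumption: every field present in the condition must match."""
    if "target_criticality" in condition:
        if action.get("target_criticality") not in condition["target_criticality"]:
            return False
    if "action_type" in condition:
        if action.get("action_type") not in condition["action_type"]:
            return False
    if "confidence_below" in condition:
        # Missing confidence is treated as 0.0 so the policy still triggers.
        if not action.get("confidence", 0.0) < condition["confidence_below"]:
            return False
    return True


def required_approval(action: dict, policies=POLICIES) -> Optional[str]:
    """Return the strictest level among matching policies, or None."""
    matched = [p["requires"] for p in policies
               if policy_matches(p["condition"], action)]
    return max(matched, key=LEVELS.index) if matched else None
```

For an action on a `critical` asset this returns `"senior"`. For an action on a `low` criticality asset it returns `None`, meaning no approval policy applies.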

Auto-Approve Rules

Actions that can be executed automatically when specific conditions are met, even in supervised mode.

auto_approve_rules:
  - name: ticket_operations
    description: "Auto-approve ticket creation and updates"
    action_types:
      - create_ticket
      - update_ticket
      - add_ticket_comment
    conditions:
      - confidence_above: 0.5

  - name: email_quarantine_high_confidence
    description: "Auto-approve email quarantine for high-confidence phishing"
    action_types:
      - quarantine_email
    conditions:
      - confidence_above: 0.95
      - verdict: true_positive

Condition Fields

| Field | Type | Description |
| --- | --- | --- |
| confidence_above | Float (0.0-1.0) | AI confidence must exceed this value |
| verdict | String | AI verdict must match (e.g., true_positive) |

All conditions in the list must be met (AND logic).
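
The AND logic can be sketched in Python as follows. The rule data mirrors the example configuration. The function names are hypothetical, and treating an unknown condition key as unmet is an assumption rather than documented behaviour.

```python
AUTO_APPROVE_RULES = [
    {
        "name": "ticket_operations",
        "action_types": ["create_ticket", "update_ticket", "add_ticket_comment"],
        "conditions": [{"confidence_above": 0.5}],
    },
    {
        "name": "email_quarantine_high_confidence",
        "action_types": ["quarantine_email"],
        "conditions": [{"confidence_above": 0.95}, {"verdict": "true_positive"}],
    },
]


def condition_met(condition: dict, confidence: float, verdict: str) -> bool:
    """Check a single entry of a rule's conditions list."""
    for key, expected in condition.items():
        if key == "confidence_above":
            if not confidence > expected:  # must strictly exceed the value
                return False
        elif key == "verdict":
            if verdict != expected:
                return False
        else:
            return False  # assumption: unknown keys fail closed
    return True


def auto_approved(action_type: str, confidence: float, verdict: str) -> bool:
    """True if any rule covers the action and all of its conditions hold."""
    return any(
        action_type in rule["action_types"]
        and all(condition_met(c, confidence, verdict) for c in rule["conditions"])
        for rule in AUTO_APPROVE_RULES
    )
```

Here `auto_approved("quarantine_email", 0.97, "true_positive")` is true, but the same call with verdict `"false_positive"` is false because the second condition fails.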

Data Policies

Control how sensitive data is handled in logs and LLM prompts.

data_policies:
  pii_filter: true
  pii_patterns:
    - "\\b\\d{3}-\\d{2}-\\d{4}\\b"      # SSN
    - "\\b\\d{16}\\b"                    # Credit card

  secrets_redaction: true
  secret_patterns:
    - "(?i)api[_-]?key"
    - "(?i)password"
    - "(?i)secret"
    - "(?i)token"
    - "(?i)credential"

  audit_data_access: true

| Field | Description |
| --- | --- |
| pii_filter | Enable PII filtering in logs and LLM prompts |
| pii_patterns | Regex patterns matching PII to redact |
| secrets_redaction | Enable secret detection and redaction |
| secret_patterns | Regex patterns matching secrets to redact |
| audit_data_access | Log all data access operations |
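
The source does not describe the redaction mechanics, so this Python sketch shows only one plausible interpretation. It assumes PII patterns are applied to field values, and that secret patterns are matched against field names, with the whole value replaced on a match. The placeholder strings and the `redact` function are hypothetical.

```python
import re

# Patterns as they read after YAML parsing of the example configuration.
PII_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",  # SSN
    r"\b\d{16}\b",             # Credit card (16 contiguous digits only)
]
SECRET_PATTERNS = [
    r"(?i)api[_-]?key",
    r"(?i)password",
    r"(?i)secret",
    r"(?i)token",
    r"(?i)credential",
]

_PII_RX = [re.compile(p) for p in PII_PATTERNS]
_SECRET_RX = [re.compile(p) for p in SECRET_PATTERNS]


def redact(record: dict) -> dict:
    """Return a copy of the record that is safe to log or send to an LLM.

    Assumption: a field whose name matches a secret pattern has its
    value replaced entirely; other values have PII matches masked.
    """
    clean = {}
    for key, value in record.items():
        if any(rx.search(key) for rx in _SECRET_RX):
            clean[key] = "[REDACTED]"
            continue
        text = str(value)
        for rx in _PII_RX:
            text = rx.sub("[PII]", text)
        clean[key] = text
    return clean
```

Note that the example credit card pattern only matches 16 contiguous digits, so numbers written with spaces or dashes would need an additional pattern.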

Escalation Rules

Define automatic escalation triggers.

escalation_rules:
  - name: repeated_false_positives
    description: "Escalate if same alert type has high FP rate"
    condition:
      false_positive_rate_above: 0.5
      sample_size_min: 10
    action: escalate_to_analyst

  - name: incident_correlation
    description: "Escalate if multiple related incidents detected"
    condition:
      related_incidents_above: 3
      time_window_hours: 1
    action: escalate_to_senior

  - name: critical_severity
    description: "Always escalate critical severity incidents"
    condition:
      severity: critical
    action: escalate_to_manager

Escalation Actions

| Action | Description |
| --- | --- |
| escalate_to_analyst | Route to any available analyst |
| escalate_to_senior | Route to a senior analyst |
| escalate_to_manager | Route to the SOC manager |
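
As a rough Python illustration of how the example escalation rules might be evaluated, the sketch below checks each rule against a context dictionary. The context keys (`false_positive_rate`, `sample_size`, `related_incidents`, `severity`) are hypothetical. It also assumes `related_incidents` has already been counted within the rule's `time_window_hours`, and that all fields in a condition must hold.

```python
from typing import List

ESCALATION_RULES = [
    {"name": "repeated_false_positives",
     "condition": {"false_positive_rate_above": 0.5, "sample_size_min": 10},
     "action": "escalate_to_analyst"},
    {"name": "incident_correlation",
     "condition": {"related_incidents_above": 3, "time_window_hours": 1},
     "action": "escalate_to_senior"},
    {"name": "critical_severity",
     "condition": {"severity": "critical"},
     "action": "escalate_to_manager"},
]


def rule_fires(condition: dict, ctx: dict) -> bool:
    """Assumption: every field present in the condition must hold."""
    if "false_positive_rate_above" in condition:
        # Ignore the rate until enough samples have been collected.
        if ctx.get("sample_size", 0) < condition.get("sample_size_min", 0):
            return False
        if not ctx.get("false_positive_rate", 0.0) > condition["false_positive_rate_above"]:
            return False
    if "related_incidents_above" in condition:
        # The count is assumed to be pre-filtered to time_window_hours.
        if not ctx.get("related_incidents", 0) > condition["related_incidents_above"]:
            return False
    if "severity" in condition:
        if ctx.get("severity") != condition["severity"]:
            return False
    return True


def escalations(ctx: dict) -> List[str]:
    """Return the actions of all rules that fire, in configuration order."""
    return [r["action"] for r in ESCALATION_RULES
            if rule_fires(r["condition"], ctx)]
```

For example, a context with a false positive rate of 0.9 but only 5 samples triggers nothing, because the `sample_size_min` guard prevents escalation on too little data.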