Guardrails Reference

The guardrails configuration file (config/guardrails.yaml) defines security boundaries for AI-automated actions. These rules apply regardless of the current autonomy level.

Deny List

Actions the AI may never execute on its own, and targets it may not act on without human involvement.

Denied Actions

deny_list:
  actions:
    - delete_user          # Too destructive
    - wipe_host            # Too destructive
    - delete_all_emails    # Too destructive
    - modify_firewall      # High risk

Add any action name here to prevent the AI from ever executing it. These actions can still be performed manually by an analyst.

Target Patterns

Regex patterns that match protected systems. Any automated action targeting a hostname or identifier that matches these patterns requires human approval.

deny_list:
  target_patterns:
    - ".*-prod-.*"         # Production systems
    - "dc\\d+\\..*"        # Domain controllers
    - ".*-critical-.*"     # Explicitly marked critical
    - ".*\\.corp\\..*"     # Corporate infrastructure
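As a rough illustration of how these patterns could be checked, the sketch below compiles the four example patterns and tests a target against them. The function name `requires_approval` is hypothetical, and the use of whole-string matching (`re.fullmatch`) is an assumption; this reference does not say whether patterns are anchored. Note that in a double-quoted YAML string, `\\d` is read as the regex `\d`.

```python
import re

# Patterns copied from the deny_list.target_patterns example.
# YAML's "\\d" and "\\." arrive here as \d and \.
TARGET_PATTERNS = [
    r".*-prod-.*",      # Production systems
    r"dc\d+\..*",       # Domain controllers
    r".*-critical-.*",  # Explicitly marked critical
    r".*\.corp\..*",    # Corporate infrastructure
]

_COMPILED = [re.compile(p) for p in TARGET_PATTERNS]


def requires_approval(target: str) -> bool:
    """Return True if the target matches any protected pattern.

    Assumption: the whole hostname or identifier must match
    (re.fullmatch), rather than any substring (re.search).
    """
    return any(p.fullmatch(target) for p in _COMPILED)
```

Under the whole-string assumption, `dc01.example.com` matches the domain controller pattern, while `adc01.example.com` does not, because the pattern has no leading `.*`.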

Protected IPs

Specific IP addresses that must never be targeted by automated actions.

deny_list:
  protected_ips:
    - "10.0.0.1"           # Core router
    - "10.0.0.2"           # DNS server
    - "10.0.0.3"           # DHCP server
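A minimal sketch of a protected-IP check follows. The helper `is_protected_ip` is hypothetical. Parsing with the standard `ipaddress` module, and treating non-IP targets as "not on this list", are both assumptions rather than documented behavior.

```python
import ipaddress

# Addresses copied from the deny_list.protected_ips example.
PROTECTED_IPS = {
    ipaddress.ip_address(ip)
    for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3")
}


def is_protected_ip(target: str) -> bool:
    """Return True if target is an IP address on the protected list."""
    try:
        return ipaddress.ip_address(target) in PROTECTED_IPS
    except ValueError:
        # Assumption: a target that is not an IP literal (for example
        # a hostname) is simply not covered by this list.
        return False
```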

Protected Users

User accounts that are protected from automated modifications (disable, password reset, etc.). Supports exact matches and glob patterns.

deny_list:
  protected_users:
    - "admin"
    - "root"
    - "administrator"
    - "service-account-*"
    - "svc-*"
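The combination of exact matches and glob patterns can be sketched with Python's `fnmatch` module. The function name `is_protected_user` is hypothetical, and case-sensitive matching is an assumption; this reference does not specify case handling.

```python
from fnmatch import fnmatchcase

# Entries copied from the deny_list.protected_users example.
PROTECTED_USERS = [
    "admin",
    "root",
    "administrator",
    "service-account-*",
    "svc-*",
]


def is_protected_user(username: str) -> bool:
    """Return True if username equals an entry or matches a glob entry.

    Assumption: matching is case-sensitive (fnmatchcase). An entry
    without wildcards only matches the identical string.
    """
    return any(fnmatchcase(username, pattern) for pattern in PROTECTED_USERS)
```

With these entries, `svc-backup` and `service-account-ci` are protected, while `admin2` is not, because `admin` contains no wildcard.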

Rate Limits

Rate limits prevent runaway automation by capping how often each action can run per hour, per day, and concurrently.

rate_limits:
  isolate_host:
    max_per_hour: 5
    max_per_day: 20
    max_concurrent: 2

  disable_user:
    max_per_hour: 10
    max_per_day: 50
    max_concurrent: 5

  block_ip:
    max_per_hour: 20
    max_per_day: 100
    max_concurrent: 10

  quarantine_email:
    max_per_hour: 50
    max_per_day: 500
    max_concurrent: 20

Field	Description
`max_per_hour`	Maximum executions in a rolling 60-minute window
`max_per_day`	Maximum executions in a rolling 24-hour window
`max_concurrent`	Maximum simultaneous in-flight executions
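To make the three limits concrete, here is a minimal in-memory sketch of a limiter that enforces rolling hourly and daily windows plus a concurrency cap. The class `ActionLimiter` and its methods are hypothetical, not part of the product. It also assumes that rejected attempts do not count toward the windows.

```python
from collections import deque
from dataclasses import dataclass, field

HOUR = 3600
DAY = 86400


@dataclass
class ActionLimiter:
    max_per_hour: int
    max_per_day: int
    max_concurrent: int
    # Start times (seconds) of accepted executions, oldest first.
    _starts: deque = field(default_factory=deque, init=False)
    _in_flight: int = field(default=0, init=False)

    def try_start(self, now: float) -> bool:
        """Record and allow an execution at time `now`, or refuse it."""
        # Drop executions that have left the rolling 24-hour window.
        while self._starts and now - self._starts[0] >= DAY:
            self._starts.popleft()
        last_hour = sum(1 for t in self._starts if now - t < HOUR)
        if (
            self._in_flight >= self.max_concurrent
            or last_hour >= self.max_per_hour
            or len(self._starts) >= self.max_per_day
        ):
            return False  # Assumption: refusals are not counted.
        self._starts.append(now)
        self._in_flight += 1
        return True

    def finish(self) -> None:
        """Mark one in-flight execution as completed."""
        self._in_flight = max(0, self._in_flight - 1)
```

For example, `ActionLimiter(max_per_hour=5, max_per_day=20, max_concurrent=2)` mirrors the `isolate_host` entry: a third call to `try_start` is refused until `finish` is called for one of the first two.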

Approval Policies

Define when human approval is required, and at what level.

approval_policies:
  - name: critical_asset_protection
    description: "Require senior approval for actions on critical assets"
    condition:
      target_criticality:
        - critical
        - high
    requires: senior
    can_override: false

Condition Fields

Field	Type	Description
`target_criticality`	List of strings	Asset criticality levels that trigger this policy
`action_type`	List of strings	Action types that trigger this policy
`confidence_below`	Float (0.0-1.0)	Trigger when AI confidence is below this threshold

Approval Levels

Level	Who can approve
`analyst`	Any analyst
`senior`	Senior analyst or above
`manager`	SOC manager

Overridability

When `can_override` is `true`, a senior user can bypass the approval requirement. When it is `false`, approval is mandatory and cannot be skipped.
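The interaction between `requires` and `can_override` can be sketched as follows. All function names here are hypothetical. The sketch assumes the three approval levels are strictly ordered (so a manager can satisfy a `senior` requirement) and that "senior user" means `senior` or above. It handles only the `target_criticality` condition.

```python
# Assumption: levels are ordered analyst < senior < manager.
LEVEL_RANK = {"analyst": 0, "senior": 1, "manager": 2}

# The critical_asset_protection policy from the example, as a dict.
POLICY = {
    "name": "critical_asset_protection",
    "condition": {"target_criticality": ["critical", "high"]},
    "requires": "senior",
    "can_override": False,
}


def policy_applies(policy: dict, target_criticality: str) -> bool:
    """True if the asset's criticality is listed in the policy condition."""
    return target_criticality in policy["condition"].get("target_criticality", [])


def can_approve(policy: dict, approver_level: str) -> bool:
    """True if the approver is at or above the required level."""
    return LEVEL_RANK[approver_level] >= LEVEL_RANK[policy["requires"]]


def can_bypass(policy: dict, user_level: str) -> bool:
    """True only if the policy is overridable and the user is senior or above."""
    return bool(policy["can_override"]) and LEVEL_RANK[user_level] >= LEVEL_RANK["senior"]
```

With the example policy, an action on a `high` criticality asset needs a senior analyst or manager to approve it, and nobody can bypass that step because `can_override` is `false`.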

Auto-Approve Rules

Actions that can be executed automatically when specific conditions are met, even in supervised mode.

auto_approve_rules:
  - name: ticket_operations
    description: "Auto-approve ticket creation and updates"
    action_types:
      - create_ticket
      - update_ticket
      - add_ticket_comment
    conditions:
      - confidence_above: 0.5

  - name: email_quarantine_high_confidence
    description: "Auto-approve email quarantine for high-confidence phishing"
    action_types:
      - quarantine_email
    conditions:
      - confidence_above: 0.95
      - verdict: true_positive

Condition Fields

Field	Type	Description
`confidence_above`	Float (0.0-1.0)	AI confidence must exceed this value
`verdict`	String	AI verdict must match (e.g., `true_positive`)

All conditions in the list must be met (AND logic).
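The AND logic can be illustrated with a short sketch that evaluates the two example rules. The function names are hypothetical. Returning the first matching rule, and rejecting unknown condition keys, are assumptions.

```python
from typing import Optional

# The two example rules, as they would look after YAML parsing.
AUTO_APPROVE_RULES = [
    {
        "name": "ticket_operations",
        "action_types": ["create_ticket", "update_ticket", "add_ticket_comment"],
        "conditions": [{"confidence_above": 0.5}],
    },
    {
        "name": "email_quarantine_high_confidence",
        "action_types": ["quarantine_email"],
        "conditions": [{"confidence_above": 0.95}, {"verdict": "true_positive"}],
    },
]


def condition_met(condition: dict, confidence: float, verdict: str) -> bool:
    """Check one entry of a rule's `conditions` list."""
    for key, expected in condition.items():
        if key == "confidence_above":
            if not confidence > expected:  # must strictly exceed
                return False
        elif key == "verdict":
            if verdict != expected:
                return False
        else:
            return False  # Assumption: unknown keys never auto-approve.
    return True


def auto_approved_by(action_type: str, confidence: float, verdict: str) -> Optional[str]:
    """Return the name of the first rule that auto-approves, else None."""
    for rule in AUTO_APPROVE_RULES:
        if action_type in rule["action_types"] and all(
            condition_met(c, confidence, verdict) for c in rule["conditions"]
        ):
            return rule["name"]
    return None
```

A `quarantine_email` action with confidence 0.97 is auto-approved only if the verdict is also `true_positive`. A `create_ticket` action at exactly 0.5 confidence is not, because the value must exceed the threshold.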

Data Policies

Control how sensitive data is handled in logs and LLM prompts.

data_policies:
  pii_filter: true
  pii_patterns:
    - "\\b\\d{3}-\\d{2}-\\d{4}\\b"      # SSN
    - "\\b\\d{16}\\b"                    # Credit card

  secrets_redaction: true
  secret_patterns:
    - "(?i)api[_-]?key"
    - "(?i)password"
    - "(?i)secret"
    - "(?i)token"
    - "(?i)credential"

  audit_data_access: true

Field	Description
`pii_filter`	Enable PII filtering in logs and LLM prompts
`pii_patterns`	Regex patterns matching PII to redact
`secrets_redaction`	Enable secret detection and redaction
`secret_patterns`	Regex patterns matching secrets to redact
`audit_data_access`	Log all data access operations
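As an illustration of what `pii_patterns` match, the sketch below applies the two example patterns to a string. The function `redact_pii` and the `[REDACTED]` placeholder are assumptions; this reference does not describe the actual replacement text.

```python
import re

# Patterns copied from data_policies.pii_patterns
# (YAML "\\b" and "\\d" become \b and \d).
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN
    re.compile(r"\b\d{16}\b"),             # Credit card
]


def redact_pii(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every PII pattern match with the placeholder."""
    for pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

The credit card pattern matches only 16 contiguous digits. A number written with spaces or dashes between groups is not matched by either pattern, so an additional pattern would be needed to cover those formats.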

Escalation Rules

Define automatic escalation triggers.

escalation_rules:
  - name: repeated_false_positives
    description: "Escalate if same alert type has high FP rate"
    condition:
      false_positive_rate_above: 0.5
      sample_size_min: 10
    action: escalate_to_analyst

  - name: incident_correlation
    description: "Escalate if multiple related incidents detected"
    condition:
      related_incidents_above: 3
      time_window_hours: 1
    action: escalate_to_senior

  - name: critical_severity
    description: "Always escalate critical severity incidents"
    condition:
      severity: critical
    action: escalate_to_manager

Escalation Actions

Action	Description
`escalate_to_analyst`	Route to any available analyst
`escalate_to_senior`	Route to a senior analyst
`escalate_to_manager`	Route to the SOC manager
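The three example rules can be read as the following checks. This is a sketch under stated assumptions: the function name is hypothetical, "above" is treated as a strict comparison, `sample_size_min` is treated as inclusive, and the caller is assumed to have already counted related incidents within the one-hour window.

```python
from typing import List


def triggered_escalations(
    severity: str,
    false_positives: int,
    sample_size: int,
    related_incidents: int,
) -> List[str]:
    """Return the escalation actions triggered by the example rules."""
    actions = []

    # repeated_false_positives: FP rate above 0.5 over at least 10 samples.
    if sample_size >= 10 and false_positives / sample_size > 0.5:
        actions.append("escalate_to_analyst")

    # incident_correlation: more than 3 related incidents in the window.
    if related_incidents > 3:
        actions.append("escalate_to_senior")

    # critical_severity: always escalate critical incidents.
    if severity == "critical":
        actions.append("escalate_to_manager")

    return actions
```

The sample-size check runs before the division, so an alert type with no samples yet never triggers the false-positive rule and cannot cause a division by zero.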

Triage Warden