ADR-001: Event Bus Architecture
Status
Accepted
Context
Triage Warden needs to coordinate multiple components (enrichment, analysis, action execution, notifications) in response to security incidents. We needed a way to:
- Decouple components for independent development and testing
- Enable real-time updates to the dashboard
- Support both synchronous and asynchronous processing
- Maintain an audit trail of all system events
Decision
We implemented an in-process event bus using Tokio channels with the following design:
Event Types
All significant system events are captured as TriageEvent variants:
- AlertReceived: New alert from webhook
- IncidentCreated: Incident created from alert
- EnrichmentComplete: Single enrichment finished
- EnrichmentPhaseComplete: All enrichments done
- AnalysisComplete: AI analysis finished
- ActionsProposed: Response actions proposed
- ActionApproved/Denied: Action approval decision
- ActionExecuted: Action completed
- StatusChanged: Incident status transition
- TicketCreated: External ticket created
- IncidentEscalated: Incident escalated
- IncidentResolved: Incident resolved
- KillSwitchActivated: Emergency stop triggered
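As an illustration of how these variants might look in Rust, here is a minimal sketch. Only the variant names come from this ADR. The payload fields, the `IncidentId` alias, the split of ActionApproved/Denied into two variants, and the `incident_id()` helper are assumptions made for the example, not the project's actual definitions.

```rust
/// Hypothetical identifier type; the real project may use something else (e.g. a UUID).
pub type IncidentId = u64;

/// Sketch of the event enum. Variant names follow the ADR; all fields are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum TriageEvent {
    AlertReceived { alert_id: String },
    IncidentCreated { incident_id: IncidentId },
    EnrichmentComplete { incident_id: IncidentId, source: String },
    EnrichmentPhaseComplete { incident_id: IncidentId },
    AnalysisComplete { incident_id: IncidentId },
    ActionsProposed { incident_id: IncidentId, action_count: usize },
    ActionApproved { incident_id: IncidentId },
    ActionDenied { incident_id: IncidentId },
    ActionExecuted { incident_id: IncidentId },
    StatusChanged { incident_id: IncidentId, new_status: String },
    TicketCreated { incident_id: IncidentId, ticket_ref: String },
    IncidentEscalated { incident_id: IncidentId },
    IncidentResolved { incident_id: IncidentId },
    KillSwitchActivated { reason: String },
}

impl TriageEvent {
    /// Returns the incident an event belongs to, or None for events
    /// that are not tied to a single incident.
    pub fn incident_id(&self) -> Option<IncidentId> {
        use TriageEvent::*;
        match self {
            AlertReceived { .. } | KillSwitchActivated { .. } => None,
            IncidentCreated { incident_id }
            | EnrichmentComplete { incident_id, .. }
            | EnrichmentPhaseComplete { incident_id }
            | AnalysisComplete { incident_id }
            | ActionsProposed { incident_id, .. }
            | ActionApproved { incident_id }
            | ActionDenied { incident_id }
            | ActionExecuted { incident_id }
            | StatusChanged { incident_id, .. }
            | TicketCreated { incident_id, .. }
            | IncidentEscalated { incident_id }
            | IncidentResolved { incident_id } => Some(*incident_id),
        }
    }
}
```

Modelling events as one enum means the compiler checks that every consumer's `match` handles each variant, so adding a new event type surfaces every place that needs updating.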
Delivery Mechanisms
- Broadcast Channel: For real-time dashboard updates via SSE
- Named Subscribers: For component-specific processing queues
- Event History: In-memory buffer for recent event retrieval
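The named-subscriber and event-history mechanisms can be sketched roughly as follows. To stay self-contained, this example uses `std::sync::mpsc::sync_channel` in place of Tokio channels and does not model the SSE broadcast side. The `EventBus` type, its method names, and the choice to skip a full queue rather than block are all assumptions for illustration.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Hypothetical bus: one bounded queue per named subscriber plus a
/// fixed-size in-memory history of the most recent events.
pub struct EventBus<E: Clone> {
    subscribers: HashMap<String, SyncSender<E>>,
    history: VecDeque<E>,
    history_capacity: usize,
}

impl<E: Clone> EventBus<E> {
    pub fn new(history_capacity: usize) -> Self {
        EventBus {
            subscribers: HashMap::new(),
            history: VecDeque::with_capacity(history_capacity),
            history_capacity,
        }
    }

    /// Registers a named subscriber and returns the receiving end of its queue.
    /// Re-using a name replaces the previous subscriber.
    pub fn subscribe(&mut self, name: &str, queue_size: usize) -> Receiver<E> {
        let (tx, rx) = sync_channel(queue_size);
        self.subscribers.insert(name.to_string(), tx);
        rx
    }

    /// Records the event in history, then offers it to every subscriber
    /// without blocking. Returns the sorted names of subscribers that
    /// missed the event because their queue was full or disconnected.
    pub fn publish(&mut self, event: E) -> Vec<String> {
        if self.history_capacity > 0 {
            if self.history.len() == self.history_capacity {
                self.history.pop_front(); // evict the oldest entry
            }
            self.history.push_back(event.clone());
        }
        let mut missed = Vec::new();
        for (name, tx) in &self.subscribers {
            if tx.try_send(event.clone()).is_err() {
                missed.push(name.clone());
            }
        }
        missed.sort();
        missed
    }

    /// Returns up to `n` of the most recent events, oldest first.
    pub fn recent(&self, n: usize) -> Vec<E> {
        let skip = self.history.len().saturating_sub(n);
        self.history.iter().skip(skip).cloned().collect()
    }
}
```

Using `try_send` rather than a blocking send keeps a slow subscriber from stalling the publisher, at the cost of that subscriber missing events when its queue is full.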
Error Handling
Publishing is fire-and-forget by default, with fallback logging. Two methods are provided:
- publish(): Returns Result for cases where failure matters
- publish_with_fallback(): Logs errors, never fails (for non-critical events)
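The difference between the two methods might look like the sketch below. The method names come from this ADR. The `Publisher` wrapper, the `PublishError` type, the signatures, and the use of a bounded `std::sync::mpsc` channel with `eprintln!` as the logger are assumptions chosen to keep the example self-contained.

```rust
use std::fmt;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical error type for a failed publish.
#[derive(Debug, PartialEq)]
pub enum PublishError {
    QueueFull,
    NoReceiver,
}

impl fmt::Display for PublishError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PublishError::QueueFull => write!(f, "subscriber queue is full"),
            PublishError::NoReceiver => write!(f, "no receiver is listening"),
        }
    }
}

/// Hypothetical publisher over a single bounded queue.
pub struct Publisher<E> {
    tx: SyncSender<E>,
}

impl<E: fmt::Debug> Publisher<E> {
    pub fn new(queue_size: usize) -> (Self, Receiver<E>) {
        let (tx, rx) = sync_channel(queue_size);
        (Publisher { tx }, rx)
    }

    /// For events where the caller must react to a delivery failure.
    pub fn publish(&self, event: E) -> Result<(), PublishError> {
        self.tx.try_send(event).map_err(|e| match e {
            TrySendError::Full(_) => PublishError::QueueFull,
            TrySendError::Disconnected(_) => PublishError::NoReceiver,
        })
    }

    /// For non-critical events: a failure is logged and then swallowed.
    pub fn publish_with_fallback(&self, event: E) {
        // Format up front because `publish` takes ownership of the event.
        let description = format!("{:?}", event);
        if let Err(err) = self.publish(event) {
            eprintln!("event bus: dropped {}: {}", description, err);
        }
    }
}
```

A caller would use `publish()` where a lost event should abort or retry the operation, and `publish_with_fallback()` where a dropped event is acceptable.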
Consequences
Positive
- Components are loosely coupled and independently testable
- Dashboard receives real-time updates without polling
- Recent event history available for debugging
- Failed subscribers don't block the main processing flow
Negative
- In-process only - no distributed event bus
- Event history is limited and in-memory (lost on restart)
- No guaranteed delivery or replay capability
- Broadcast channel has a bounded buffer (slow receivers may miss events under load)
Future Considerations
For high-availability deployments, consider:
- Redis Pub/Sub for distributed events
- PostgreSQL LISTEN/NOTIFY for persistent events
- External message queue (RabbitMQ, Kafka) for durability