Alert rules
Define, manage, and tune alert rules through chat — or click through the UI. Rounds handles the lifecycle: rule creation, evaluation, notification, history, silencing, and alert-to-investigation handoff.
What you can do
- Create from a prompt — "Alert me when the API error rate exceeds 1% for 5 minutes"
- Modify existing rules — "Change the threshold on the high-latency alert to 500ms"
- Investigate a firing alert — start an evidence-backed investigation directly from the alert context
- Delete safely — confirmation prompt before destructive changes; audit-logged
- List & filter — `alert_rule.list` returns rules by folder, severity, or state
- Inspect history — `alert_rule.history` shows every state transition (firing / pending / resolved) with the values that triggered it
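The sketch below shows how a threshold rule with a for-duration could produce pending, firing, and resolved transitions. It is a minimal Python illustration with these assumptions:

- The condition is "value above threshold".
- Evaluations arrive as `(timestamp_seconds, value)` pairs.
- `normal` is a hypothetical name for the not-breaching state.

Rounds' actual evaluator and state names may differ.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    at_s: int      # evaluation timestamp, in seconds
    state: str     # state entered at this evaluation
    value: float   # value observed when the transition happened


def evaluate(samples, threshold, for_s):
    """Replay evaluations of a 'value > threshold for for_s seconds' rule.

    samples: (timestamp_seconds, value) pairs in evaluation order.
    Returns the state transitions, oldest first.
    """
    state = "normal"          # hypothetical name for 'not breaching'
    pending_since = None      # timestamp when the current breach started
    transitions = []
    for ts, value in samples:
        if value > threshold:
            if state in ("normal", "resolved"):
                # First breaching evaluation: start the for-duration clock.
                state, pending_since = "pending", ts
                transitions.append(Transition(ts, state, value))
            if state == "pending" and ts - pending_since >= for_s:
                # The breach has lasted at least the for-duration: fire.
                state = "firing"
                transitions.append(Transition(ts, state, value))
        else:
            if state == "firing":
                state = "resolved"
                transitions.append(Transition(ts, state, value))
            elif state == "pending":
                # Breach cleared before the for-duration elapsed: never fired.
                state = "normal"
                transitions.append(Transition(ts, state, value))
            pending_since = None
    return transitions


# 1% threshold, 5-minute for-duration, one evaluation per minute.
samples = [(0, 0.005), (60, 0.02), (120, 0.03), (180, 0.02),
           (240, 0.02), (300, 0.02), (360, 0.02), (420, 0.004)]
for t in evaluate(samples, threshold=0.01, for_s=300):
    print(t)
```

Here the breach starts at 60s, so the rule goes pending at 60s. It fires at 360s, once five minutes have elapsed, and resolves at 420s when the value drops back under 1%.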
How to use it
Create a rule
In chat:
Alert when the 5xx rate on the checkout service is above 1% for 5 minutes
The alert agent runs:
1. `metrics.metric_names` / `metrics.labels` to find the right metric and labels
2. `metrics.validate` to confirm the rule expression is well-formed
3. `create_alert_rule` with: name, expression, threshold, evaluation interval, for-duration, severity, notification channels
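The fields passed to `create_alert_rule` can be pictured as a small record. The sketch below is a hypothetical Python model of that payload with a local sanity check. It is illustrative only:

- The field names, the units (seconds), the severity set, and the example PromQL metric `http_requests_total` are assumptions.
- The check does not replace `metrics.validate`, which verifies the expression itself.

```python
from dataclasses import dataclass, field

# Assumed severity levels; the docs themselves mention "warning" and "critical".
SEVERITIES = {"info", "warning", "critical"}


@dataclass
class AlertRuleSpec:
    """Hypothetical model of the fields passed to create_alert_rule."""
    name: str
    expression: str               # query producing the value to compare
    threshold: float
    evaluation_interval_s: int    # how often the rule is evaluated
    for_duration_s: int           # how long the condition must hold before firing
    severity: str
    notification_channels: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Local structural checks only; expression syntax is metrics.validate's job."""
        issues = []
        if not self.name.strip():
            issues.append("name is empty")
        if not self.expression.strip():
            issues.append("expression is empty")
        if self.evaluation_interval_s <= 0:
            issues.append("evaluation interval must be positive")
        if self.for_duration_s < 0:
            issues.append("for-duration must not be negative")
        if self.severity not in SEVERITIES:
            issues.append(f"unknown severity {self.severity!r}")
        return issues


# Illustrative rule for the checkout prompt above (metric and channel names are hypothetical).
checkout_5xx = AlertRuleSpec(
    name="checkout-5xx-rate",
    expression=(
        'sum(rate(http_requests_total{service="checkout", code=~"5.."}[5m]))'
        ' / sum(rate(http_requests_total{service="checkout"}[5m]))'
    ),
    threshold=0.01,              # 1%
    evaluation_interval_s=60,
    for_duration_s=300,          # "for 5 minutes"
    severity="critical",
    notification_channels=["slack-checkout-oncall"],
)
print(checkout_5xx.problems())
```

Keeping the threshold and for-duration as separate fields, rather than baking them into the expression, is what lets a later "bump the threshold" request change one value without rewriting the query.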
Modify a rule
Bump the threshold on `high-checkout-latency` to 800ms
The agent calls `alert_rule.list` to find the rule by name, then calls `modify_alert_rule` with the new threshold. The change takes effect immediately: the next evaluation cycle uses the new value.
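The lookup-then-modify flow can be sketched against a hypothetical in-memory list of rules. The names `Rule` and `set_threshold_by_name` are illustrative. Refusing to act when a name matches zero rules or several rules is an assumption of this sketch, not documented Rounds behavior.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    uid: str
    name: str
    threshold: float  # in the rule's own unit, e.g. milliseconds for latency


def set_threshold_by_name(rules: list[Rule], name: str, new_threshold: float) -> Rule:
    """Find exactly one rule by name (the list step), then update it (the modify step)."""
    matches = [r for r in rules if r.name == name]
    if len(matches) != 1:
        # Missing or ambiguous names are rejected rather than guessed.
        raise LookupError(f"expected one rule named {name!r}, found {len(matches)}")
    matches[0].threshold = new_threshold
    return matches[0]


rules = [Rule("r1", "high-checkout-latency", 500.0), Rule("r2", "high-error-rate", 0.01)]
updated = set_threshold_by_name(rules, "high-checkout-latency", 800.0)
print(updated)
```

In the flow described above, the next evaluation cycle then reads the updated threshold.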
Inspect what fired
Show me the firing history for `high-error-rate` over the last 24h
`alert_rule.history` returns the state transitions, and the chat renders them as a timeline annotated with the values that triggered each one.
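The sketch below shows one way to turn state transitions into a 24-hour timeline. It assumes transitions arrive as `(timezone-aware datetime, state, value)` tuples; the rendering format and the sample data are made up for illustration, and the chat's actual timeline view may look different.

```python
from datetime import datetime, timedelta, timezone


def render_timeline(transitions, now, window=timedelta(hours=24)):
    """Render (time, state, value) transitions inside the window, oldest first.

    Times must be timezone-aware datetimes so they compare safely with `now`.
    """
    cutoff = now - window
    recent = sorted((t for t in transitions if t[0] >= cutoff), key=lambda t: t[0])
    return [
        f"{ts:%Y-%m-%d %H:%M %Z}  {state:<8}  value={value:g}"
        for ts, state, value in recent
    ]


utc = timezone.utc
now = datetime(2024, 5, 2, 12, 0, tzinfo=utc)
history = [  # made-up sample transitions
    (datetime(2024, 5, 1, 9, 0, tzinfo=utc), "firing", 0.03),  # older than 24h, dropped
    (datetime(2024, 5, 2, 8, 15, tzinfo=utc), "pending", 0.012),
    (datetime(2024, 5, 2, 8, 20, tzinfo=utc), "firing", 0.018),
    (datetime(2024, 5, 2, 9, 5, tzinfo=utc), "resolved", 0.004),
]
for line in render_timeline(history, now):
    print(line)
```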
Silence a rule temporarily
Use the UI: Alerts → Silences → New. Silences are not currently exposed as agent tools, so that every silence remains a deliberate, auditable action.
Examples
| Prompt | Resulting rule |
|---|---|
| Alert me when disk usage on any host exceeds 90% | `node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.10`, for 10m, severity=warning |
| Page on-call when the order pipeline error rate > 5% | Custom expression scoped to the order datasource, severity=critical, routes to PagerDuty |
| Warn if Redis memory > 80% for 15 min | `redis_memory_used_bytes / redis_memory_max_bytes > 0.80`, for 15m |
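The ratios in the disk and Redis rows can be checked with plain arithmetic:

- **Disk.** Usage is `1 - avail/size`, so "usage above 90%" is the same condition as "available fraction below 0.10".
- **Redis.** If an instance has no `maxmemory` limit, `redis_memory_max_bytes` is typically 0. In PromQL a positive number divided by 0 is `+Inf`, which would satisfy `> 0.80`. This Python sketch treats a max of 0 as "no limit" instead. That guard is an assumption about how you might want it handled, not part of the rule above.

```python
def disk_rule_breached(avail_bytes: float, size_bytes: float) -> bool:
    """Disk row: avail / size < 0.10 is the same condition as usage > 90%."""
    # usage = 1 - avail/size, so usage > 0.90  <=>  avail/size < 0.10
    return avail_bytes / size_bytes < 0.10


def redis_rule_breached(used_bytes: float, max_bytes: float) -> bool:
    """Redis row: used / max > 0.80."""
    if max_bytes == 0:
        # Assumed: 0 means no maxmemory limit, so there is no meaningful ratio.
        return False
    return used_bytes / max_bytes > 0.80


print(disk_rule_breached(50, 1000))    # 95% used
print(redis_rule_breached(850, 1000))  # 85% of maxmemory
```

At exactly 90% used (10% available) the disk rule does not trigger, because both comparisons are strict.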
Limits
- Alert rules need a metric expression. Log-based alerts (Loki ruler) are planned but not available in the current release.
- Notification channels (Slack, PagerDuty, email, webhook) are configured separately under Admin → Notifications.
- Folder-scoped permissions apply: `alert.rules:write` on `folders:uid:<id>` controls who can create or modify rules in that folder.
- Automatic investigation and automatic remediation requests are planned as the next step after one-click investigations.
- The agent doesn't auto-silence alerts caused by dependent failures; instead, chain alerts with the notification dispatcher's `groupBy` + `inhibit` rules.
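The `groupBy` + `inhibit` approach can be sketched in Python, modeled on Prometheus Alertmanager's `group_by` and `inhibit_rules` semantics. An inhibit rule has source matchers, target matchers, and `equal` labels. While a source alert fires, target alerts that share the `equal` label values are muted. Treating Rounds' dispatcher as working this way is an assumption: its real configuration keys and matching rules may differ. The alert names are hypothetical, and alerts are modeled as plain label dicts.

```python
from collections import defaultdict


def matches(labels: dict, matchers: dict) -> bool:
    """Equality matchers only: every matcher label must have the given value."""
    return all(labels.get(k) == v for k, v in matchers.items())


def is_inhibited(alert: dict, firing: list[dict], rule: dict) -> bool:
    """True if another firing alert matching rule['source'] mutes this target alert."""
    if not matches(alert, rule["target"]):
        return False
    return any(
        other is not alert
        and matches(other, rule["source"])
        and all(other.get(name) == alert.get(name) for name in rule["equal"])
        for other in firing
    )


def dispatch(firing: list[dict], inhibit_rules: list[dict], group_by: list[str]) -> dict:
    """Drop inhibited alerts, then batch the rest into one notification per group."""
    groups = defaultdict(list)
    for alert in firing:
        if any(is_inhibited(alert, firing, r) for r in inhibit_rules):
            continue
        key = tuple(alert.get(name, "") for name in group_by)
        groups[key].append(alert["alertname"])
    return dict(groups)


firing = [
    {"alertname": "DatabaseDown", "severity": "critical", "cluster": "eu-1"},
    {"alertname": "CheckoutHighErrorRate", "severity": "warning", "cluster": "eu-1"},
    {"alertname": "CheckoutHighErrorRate", "severity": "warning", "cluster": "us-1"},
]
inhibit_rules = [
    # While the database is down, mute warnings from the same cluster.
    {
        "source": {"alertname": "DatabaseDown"},
        "target": {"severity": "warning"},
        "equal": ["cluster"],
    },
]
print(dispatch(firing, inhibit_rules, group_by=["cluster"]))
# {('eu-1',): ['DatabaseDown'], ('us-1',): ['CheckoutHighErrorRate']}
```

The checkout warning in `eu-1` is muted because the database alert in the same cluster explains it. The `us-1` warning still notifies.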
Related
- Investigations — investigate why an alert fired
- Datasources — alert rules query the same datasources as dashboards
- Permissions — `basic:editor` includes `alert.rules:write`; narrow it with custom roles
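A minimal Python sketch of the folder-scoped check: a user may create or modify rules in a folder if one of their roles grants `alert.rules:write` on that folder's `folders:uid:<id>` scope. The following are hypothetical illustrations, not the product's actual role definitions:

- the role table
- the `folders:*` wildcard scope assigned to `basic:editor`
- the `custom:checkout-alerting` role

```python
# Hypothetical role -> {(action, scope)} grants, for illustration only.
ROLE_GRANTS = {
    "basic:editor": {("alert.rules:write", "folders:*")},  # assumed: every folder
    "custom:checkout-alerting": {("alert.rules:write", "folders:uid:checkout")},
}


def scope_covers(granted: str, requested: str) -> bool:
    """A trailing '*' in a granted scope matches any suffix; otherwise exact match."""
    if granted.endswith("*"):
        return requested.startswith(granted[:-1])
    return granted == requested


def can_write_rules(roles: list[str], folder_uid: str) -> bool:
    """True if any role grants alert.rules:write on folders:uid:<folder_uid>."""
    requested = f"folders:uid:{folder_uid}"
    return any(
        action == "alert.rules:write" and scope_covers(scope, requested)
        for role in roles
        for action, scope in ROLE_GRANTS.get(role, set())
    )


print(can_write_rules(["custom:checkout-alerting"], "payments"))
```

Narrowing with a custom role amounts to swapping the broad grant for folder-specific scopes, as `custom:checkout-alerting` does here.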