Telemetry Dashboard Queries
Provider-parity query templates and alert mapping for p95/p99 operations.
Purpose
This page provides a practical query pack for runtime telemetry operations.
- Keep panel IDs identical between providers.
- Keep threshold semantics identical between providers.
- Allow syntax differences (PostHog vs SigNoz) without changing SLI meaning.
Panel IDs (canonical)
- api_latency_percentiles_by_route_group
- api_error_rate_by_route_group
- api_slowest_endpoints_p99
- queue_latency_and_failures
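Because panel IDs must stay identical between providers, a small parity check can catch drift before it reaches a dashboard. The following is a minimal sketch, not part of the existing tooling: the function name `checkPanelParity` and the `ParityReport` shape are hypothetical, and it assumes each provider's panel IDs can be exported as a plain list of strings.

```typescript
// Canonical panel IDs listed on this page.
const CANONICAL_PANEL_IDS: readonly string[] = [
  "api_latency_percentiles_by_route_group",
  "api_error_rate_by_route_group",
  "api_slowest_endpoints_p99",
  "queue_latency_and_failures",
];

// Hypothetical report shape, for illustration only.
interface ParityReport {
  missing: string[];    // canonical IDs the provider does not define
  unexpected: string[]; // provider IDs that are not in the canonical list
}

// Compare one provider's panel IDs against the canonical list.
function checkPanelParity(providerPanelIds: readonly string[]): ParityReport {
  const canonical = new Set<string>(CANONICAL_PANEL_IDS);
  const provided = new Set<string>(providerPanelIds);
  return {
    missing: CANONICAL_PANEL_IDS.filter((id) => !provided.has(id)),
    unexpected: providerPanelIds.filter((id) => !canonical.has(id)),
  };
}
```

Running the check once per provider and requiring both `missing` and `unexpected` to be empty would enforce the parity rule.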
SigNoz templates
Use these as PromQL-style templates in SigNoz dashboards. Replace metric names if your collector renames histograms.
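If the collector does rename the histogram, all three latency templates in this section need the same substitution, since they differ only in the quantile argument. One way to keep them in sync is to render them from a single helper. This is a sketch under assumptions: `latencyQuantileQuery` and its parameter names are hypothetical, and the defaults are the metric name, service name, and window used by the templates on this page.

```typescript
// Hypothetical helper: renders the latency template for one quantile.
function latencyQuantileQuery(
  quantile: number,
  metric: string = "http_server_request_duration_milliseconds_bucket",
  service: string = "athas-backend",
  window: string = "5m",
): string {
  // histogram_quantile expects a quantile strictly between 0 and 1 here.
  if (!(quantile > 0 && quantile < 1)) {
    throw new RangeError(`quantile must be between 0 and 1, got ${quantile}`);
  }
  return [
    "histogram_quantile(",
    `  ${quantile.toFixed(2)},`,
    "  sum by (le, route_group) (",
    `    rate(${metric}{service_name="${service}"}[${window}])`,
    "  )",
    ")",
  ].join("\n");
}

// The p50, p95, and p99 panels share everything except the quantile.
const queries: string[] = [0.5, 0.95, 0.99].map((q) => latencyQuantileQuery(q));
```

Passing a different `metric` argument applies the rename to every percentile at once.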
# p50 latency by route_group (5m)
histogram_quantile(
  0.50,
  sum by (le, route_group) (
    rate(http_server_request_duration_milliseconds_bucket{service_name="athas-backend"}[5m])
  )
)

# p95 latency by route_group (5m)
histogram_quantile(
  0.95,
  sum by (le, route_group) (
    rate(http_server_request_duration_milliseconds_bucket{service_name="athas-backend"}[5m])
  )
)

# p99 latency by route_group (5m)
histogram_quantile(
  0.99,
  sum by (le, route_group) (
    rate(http_server_request_duration_milliseconds_bucket{service_name="athas-backend"}[5m])
  )
)

# error rate (%) by route_group (5m)
100 * (
  sum by (route_group) (rate(http_server_request_count{service_name="athas-backend",status_class="5xx"}[5m]))
  /
  sum by (route_group) (rate(http_server_request_count{service_name="athas-backend"}[5m]))
)

PostHog templates
Use these as query templates in PostHog logs/traces views with equivalent field mappings.
-- p50, p95 and p99 latency by route_group over 5m buckets
SELECT
  toStartOfFiveMinute(timestamp) AS ts,
  properties.route_group AS route_group,
  quantile(0.50)(properties.duration_ms) AS p50_ms,
  quantile(0.95)(properties.duration_ms) AS p95_ms,
  quantile(0.99)(properties.duration_ms) AS p99_ms
FROM logs
WHERE properties.service = 'athas-backend'
GROUP BY ts, route_group
ORDER BY ts DESC, route_group;

-- error rate by route_group over 5m buckets
SELECT
  toStartOfFiveMinute(timestamp) AS ts,
  properties.route_group AS route_group,
  100.0 * sum(if(properties.status_class = '5xx', 1, 0)) / count() AS error_rate_pct
FROM logs
WHERE properties.service = 'athas-backend'
GROUP BY ts, route_group
ORDER BY ts DESC, route_group;

Alert mapping
Use the same thresholds from docs/telemetry/alerts-contract.yaml:
- warning: p95 > 800ms for 10m
- critical: p99 > 1500ms for 10m
- critical: error_rate > 2% for 10m
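To make the shared threshold semantics concrete, the sketch below evaluates the three rules against a series of samples. It assumes that "for 10m" means the condition must hold on every evaluation across ten consecutive minutes, as in Prometheus-style alert rules, and that there is one sample per minute. The contract file remains the authority. The `AlertRule` and `Sample` shapes and the `firingRules` function are hypothetical.

```typescript
type Severity = "warning" | "critical";
type Sli = "p95_ms" | "p99_ms" | "error_rate_pct";

// Hypothetical rule shape; the thresholds are the ones listed on this page.
interface AlertRule {
  severity: Severity;
  sli: Sli;
  threshold: number;
  forMinutes: number;
}

const RULES: AlertRule[] = [
  { severity: "warning", sli: "p95_ms", threshold: 800, forMinutes: 10 },
  { severity: "critical", sli: "p99_ms", threshold: 1500, forMinutes: 10 },
  { severity: "critical", sli: "error_rate_pct", threshold: 2, forMinutes: 10 },
];

// One sample per minute, oldest first (an assumption of this sketch).
type Sample = Record<Sli, number>;

// A rule fires only if every sample in its trailing window breaches the threshold.
function firingRules(samples: Sample[], rules: AlertRule[] = RULES): AlertRule[] {
  return rules.filter((rule) => {
    if (samples.length < rule.forMinutes) return false; // not enough history yet
    const recent = samples.slice(-rule.forMinutes);
    return recent.every((s) => s[rule.sli] > rule.threshold);
  });
}
```

Under this reading, a single sample that dips back under the threshold resets the window, so short spikes do not page anyone.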
Validation flow
- Run bun run telemetry:latency:check against the target env.
- Compare synthetic snapshot with dashboard p50/p95/p99 trends.
- Trigger one controlled latency spike in non-prod and confirm alert behavior.
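The comparison step can be automated once both sides are reduced to the same three numbers. Histogram-based quantiles are bucket estimates, so an exact match with a synthetic snapshot is not expected and a tolerance is needed. The sketch below is illustrative only: the `Percentiles` shape, the `compareSnapshot` function, and the 20% default tolerance are assumptions, and nothing here reflects the actual output format of the telemetry:latency:check script.

```typescript
// Hypothetical shape shared by the synthetic snapshot and the dashboard reading.
interface Percentiles {
  p50_ms: number;
  p95_ms: number;
  p99_ms: number;
}

// Returns the percentile keys whose relative difference exceeds the tolerance.
function compareSnapshot(
  snapshot: Percentiles,
  dashboard: Percentiles,
  tolerance: number = 0.2, // assumed default, tune per environment
): string[] {
  const keys: (keyof Percentiles)[] = ["p50_ms", "p95_ms", "p99_ms"];
  return keys.filter((key) => {
    const base = Math.max(snapshot[key], dashboard[key]);
    if (base === 0) return false; // both values are zero, nothing to compare
    return Math.abs(snapshot[key] - dashboard[key]) / base > tolerance;
  });
}
```

An empty result means the snapshot and the dashboard agree within the tolerance; any returned key points at the percentile to investigate.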