[release-12.3.3] [DOC] Add note to contact support for TraceQL alerts (#117258)

[DOC] Add note to contact support for TraceQL alerts (#117256)

* Add note to contact support for TraceQL alerts

(cherry picked from commit addc8dc781)

Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>
grafana-delivery-bot[bot] 2026-02-02 12:20:00 -06:00 committed by GitHub
parent 4de7add2aa
commit 04a4e3f89f
No known key found for this signature in database
GPG key ID: B5690EEEBB952194


@@ -37,7 +37,7 @@ You can create trace-based alerts in Grafana Alerting using two main approaches:
 This guide provides introductory examples and distinct approaches for setting up **trace-based alerts** in Grafana. Tracing data is commonly collected using **OpenTelemetry (OTel)** instrumentation. OTel allows you to integrate trace data from a wide range of applications and environments into Grafana.
-## **Alerting on span metrics**
+## Alerting on span metrics
 OpenTelemetry provides processors that convert tracing data into Prometheus-style metrics.
@@ -101,9 +101,9 @@ histogram_quantile(0.95,
) > 2
```
-Here's the query breakdown
+Here's the query breakdown:
-- `traces_span_metrics_duration_seconds`
+- `traces_span_metrics_duration_seconds`
 It's a native histogram produced from spans using Alloy or the OTEL collector. The metric is filtered by:
 - `service_name="<SERVICE_NAME>"` targets a particular service.
 - `span_kind="SPAN_KIND_SERVER"` selects spans handling inbound requests.
@@ -111,16 +111,16 @@ Here's the query breakdown
 _You should query `traces_spanmetrics_latency` when using other span metric generators._
-- `rate(...[10m])`
-Converts the histogram into a per-second histogram over the last 10 minutes (the distribution of spans per second during that period).
+- `rate(...[10m])`
+Converts the histogram into a per-second histogram over the last 10 minutes (the distribution of spans per second during that period).
 This makes the time window explicit and ensures latencies can be calculated over the last 10 minutes using `histogram_*` functions.
-- `sum by (span_name)( … )`
+- `sum by (span_name)( … )`
 Merges all series that share the same `span_name`. This creates a [multidimensional alert](https://grafana.com/docs/grafana/latest/alerting/best-practices/multi-dimensional-alerts/) that generates one alert instance per span name (operation).
-- `histogram_quantile(0.95, ...)`
-Calculates p95 latency from the histogram after applying the rate.
+- `histogram_quantile(0.95, ...)`
+Calculates p95 latency from the histogram after applying the rate.
 The query runs as an **instant Prometheus query**, returning a single value for the 10-minute window.
-- `> 2`
-Defines the threshold condition. It returns only series whose p95 latency exceeds 2 seconds.
+- `> 2`
+Defines the threshold condition. It returns only series whose p95 latency exceeds 2 seconds.
 Alternatively, you can set this threshold as a Grafana Alerting expression in the UI, as shown in the following screenshot.
 {{< figure src="/media/docs/alerting/trace-based-alertrule-screenshot.png" max-width="750px" caption="Alert rule querying span metrics and using threshold expression" >}}
@@ -200,26 +200,26 @@ The following query calculates the fraction of failed server spans for each serv
 Here's the query breakdown
-- `traces_span_metrics_calls_total`
-A counter metric produced from spans that tracks the number of completed span operations.
+- `traces_span_metrics_calls_total`
+A counter metric produced from spans that tracks the number of completed span operations.
 - `span_kind="SPAN_KIND_SERVER"` selects spans handling inbound requests.
 - `status_code="STATUS_CODE_ERROR"` selects only spans that ended in error.
 - Omitting the `status_code` filter in the denominator includes all spans, returning the total span count.
 _Check whether your metric generator instead creates the `traces_spanmetrics_calls_total` metric, and adjust the metric name._
-- `rate(...[10m])`
-Converts the cumulative histogram into a per-second histogram over the last 10 minutes (the distribution of spans per second during that period).
+- `rate(...[10m])`
+Converts the cumulative histogram into a per-second histogram over the last 10 minutes (the distribution of spans per second during that period).
 This makes the time window explicit and ensures counters can be calculated over the last 10 minutes.
-- `sum by (service, span_name)( … )`
-Aggregates per service and operation, creating one alert instance for each `(service, span_name)` combination.
+- `sum by (service, span_name)( … )`
+Aggregates per service and operation, creating one alert instance for each `(service, span_name)` combination.
 This is a [multidimensional alert](https://grafana.com/docs/grafana/latest/alerting/best-practices/multi-dimensional-alerts/) that applies to all services, helping identify which service and corresponding operation is failing.
-- `sum by () (...) / sum by () (...)`
-Divides failed spans by total spans to calculate the error rate per operation.
-The result is a ratio between `0` and `1,` where `1` means all operations failed.
+- `sum by () (...) / sum by () (...)`
+Divides failed spans by total spans to calculate the error rate per operation.
+The result is a ratio between `0` and `1,` where `1` means all operations failed.
 The query runs as an **instant Prometheus query**, returning a single value for the 10-minute window.
-- `> 0.2`
-Defines the threshold condition. It returns only series whose error rate is higher than 20% of spans.
+- `> 0.2`
+Defines the threshold condition. It returns only series whose error rate is higher than 20% of spans.
 Alternatively, you can set this threshold as a Grafana Alerting expression in the UI.
 ### Enable traffic guardrails
@@ -295,25 +295,25 @@ With **head sampling**, alerting on span metrics should be done with caution, si
 With **tail sampling**, it's important to generate span metrics before a sampling decision is made. [Grafana Cloud Adaptive Traces](https://grafana.com/docs/grafana-cloud/adaptive-telemetry/adaptive-traces/) handle this automatically. With Alloy or the OpenTelemetry Collector, make sure the SpanMetrics connector runs before the filtering or [tail sampling processor](https://grafana.com/docs/alloy/latest/reference/components/otelcol/otelcol.processor.tail_sampling/).
-## **Using TraceQL (experimental)**
+## Using TraceQL
-**TraceQL** is a query language for searching and filtering traces in **Grafana Tempo**, which uses a syntax similar to `PromQL` and `LogQL`.
+TraceQL is a query language for searching and filtering traces in Grafana Tempo, which uses a syntax similar to `PromQL` and `LogQL`.
 With TraceQL, you can skip converting tracing data into span metrics and query raw trace data directly. It provides a more flexible filtering based on the trace structure, attributes, or resource metadata, and can detect issues faster as it does not wait for metric generation.
-However, keep in mind that TraceQL is not suitable for all scenarios. For example:
+TraceQL isn't suitable for all scenarios. For example:
-- **Inadequate for long-term analysis**
+- **Inadequate for long-term analysis**
 Trace data has a significantly shorter retention period than metrics. For historical monitoring, it's recommended to convert key tracing data into metrics to ensure the persistence of important data.
-- **Inadequate for alerting after sampling**
+- **Inadequate for alerting after sampling**
 TraceQL can only query traces that are actually stored in Tempo. If sampling drops a large portion of traces, TraceQL-based alerts may miss real issues. Refer to [consider sampling](#consider-sampling) for guidance on how to generate span metrics before sampling.
 {{< admonition type="caution" >}}
-TraceQL alerting is available in Grafana v12.1 or higher, supported as an [experimental feature](https://grafana.com/docs/release-life-cycle/).
-Engineering and on-call support is not available. Documentation is either limited or not provided outside of code comments. No SLA is provided.
+TraceQL alerting is available in Grafana v12.1 or higher, supported as an [experimental feature](https://grafana.com/docs/release-life-cycle/).
+Engineering and on-call support isn't available. Documentation is either limited or not provided outside of code comments. No SLA is provided.
-While TraceQL can be powerful for exploring and detecting issues directly from trace data, **alerting with TraceQL should not be used in production environments yet**. Use it for testing and experimentation at this moment.
+While TraceQL can be powerful for exploring and detecting issues directly from trace data, **alerting with TraceQL shouldn't be used in production environments yet**. Use it for testing and experimentation at this moment.
 {{< /admonition >}}
@@ -324,6 +324,8 @@ Follow these steps to create the alert:
 1. Enable TraceQL alerting
 To use TraceQL in alerts, you must enable the [**`tempoAlerting`** feature flag in your Grafana configuration](https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#feature_toggles).
+If you use Grafana Cloud, contact Support to enable TraceQL alerting.
 2. Configure the alert query
 In your alert rule, select the **Tempo** data source, then convert the original PromQL query into the equivalent TraceQL query: