docs: Improve semconv translation demo with migration scenario
Enhance the demo to better illustrate the value of __semconv_url__ for maintaining query continuity during OTLP translation strategy migrations:

- Use same instance name (myapp:8080) for both phases to emphasize it's the same producer that migrated, not two different sources
- Add side-by-side query comparison showing [PARTIAL] vs [FULL] time coverage to clearly demonstrate broken vs working queries
- Add sum() aggregation example showing continuous results across the naming boundary
- Add rate() example demonstrating counter rate calculations work seamlessly across the migration point
- Update README with improved narrative focusing on migration scenario
- Update demo.sh output to highlight the problem/solution structure

The demo now runs 6 phases: data population (before/after migration), problem demonstration, solution with __semconv_url__, and aggregation/rate examples.
This commit is contained in:
parent 353cefb5a4
commit 2157f72166
3 changed files with 274 additions and 110 deletions

@@ -1,21 +1,30 @@
# Semantic Conventions Translation Demo

This self-contained demo showcases Prometheus's ability to query metrics using OpenTelemetry semantic conventions, automatically matching metrics that were written with different OTLP translation strategies.
This demo showcases Prometheus's ability to maintain query continuity when migrating between OTLP translation strategies. It simulates a real-world scenario where a producer changes from `UnderscoreEscapingWithSuffixes` to `NoTranslation` (native OTel naming).

## The Problem

When OTLP metrics are written to Prometheus, different translation strategies produce different metric and label names:
When you migrate an OTLP producer to a new translation strategy, the metric and label names change:

| Strategy | Metric Name | Label Name |
|----------|-------------|------------|
| `UnderscoreEscapingWithSuffixes` | `test_bytes_total` | `http_response_status_code` |
| `NoTranslation` | `test` | `http.response.status_code` |
| `UnderscoreEscapingWithSuffixes` (before) | `test_bytes_total` | `http_response_status_code` |
| `NoTranslation` (after) | `test` | `http.response.status_code` |

If your infrastructure changes translation strategies over time (e.g., migrating to UTF-8 support), or you have different producers using different strategies, you end up with the same logical metric stored under different names. Traditional queries would miss data from one strategy or the other.
**This breaks existing queries and dashboards:**
- Queries for `test_bytes_total` only return historical data (before migration)
- Queries for `test` only return new data (after migration)
- Dashboards show gaps at the migration point
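
To make the naming change concrete, here is a small illustrative Go sketch (not part of the demo itself) that builds the two label sets the same logical series ends up with, using the `labels` package the demo already imports:

```go
package main

import (
    "fmt"

    "github.com/prometheus/prometheus/model/labels"
)

func main() {
    // Before migration: UnderscoreEscapingWithSuffixes appends the unit/type
    // suffixes and escapes the dots in attribute names.
    before := labels.FromStrings(
        "__name__", "test_bytes_total",
        "http_response_status_code", "200",
        "instance", "myapp:8080",
    )

    // After migration: NoTranslation keeps the native OTel names.
    after := labels.FromStrings(
        "__name__", "test",
        "http.response.status_code", "200",
        "instance", "myapp:8080",
    )

    // Same producer, same logical series, two different storage identities.
    fmt.Println(before.String())
    fmt.Println(after.String())
}
```

Both label sets describe the same producer; only the storage identity differs, which is exactly what breaks plain queries.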

## The Solution

Using the `__semconv_url__` selector, Prometheus automatically generates query variants for all OTLP translation strategies, merging results into a unified series with consistent naming.
Using the `__semconv_url__` selector, Prometheus automatically generates query variants for all OTLP translation strategies, providing seamless data continuity across the migration:

```promql
test{__semconv_url__="registry/1.1.0"}
```

This single query matches both naming conventions and returns all data with unified OTel semantic convention names.
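
As a rough sketch of how this query is issued programmatically (mirroring the Go demo program further down in this diff; `semconv.AwareStorage` and the `registry/1.1.0` path come from this feature branch, and the helper name and engine options here are illustrative):

```go
package main

import (
    "context"
    "fmt"
    "time"

    "github.com/prometheus/common/promslog"

    "github.com/prometheus/prometheus/promql"
    "github.com/prometheus/prometheus/storage/semconv"
    "github.com/prometheus/prometheus/tsdb"
)

// queryUnified runs the semconv-aware instant query against an already-opened TSDB.
func queryUnified(db *tsdb.DB) error {
    // Wrap the TSDB so the __semconv_url__ matcher is recognized and expanded
    // into one query variant per OTLP translation strategy.
    st := semconv.AwareStorage(db)

    engine := promql.NewEngine(promql.EngineOpts{
        Logger:     promslog.NewNopLogger(),
        MaxSamples: 50_000_000,
        Timeout:    time.Minute,
    })

    ctx := context.Background()
    q, err := engine.NewInstantQuery(ctx, st, nil,
        `test{__semconv_url__="registry/1.1.0"}`, time.Now())
    if err != nil {
        return err
    }
    defer q.Close()

    res := q.Exec(ctx)
    if res.Err != nil {
        return res.Err
    }
    // All matching series come back under the canonical OTel names.
    fmt.Println(res.Value.String())
    return nil
}
```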

## Running the Demo

@@ -25,7 +34,13 @@ Using the `__semconv_url__` selector, Prometheus automatically generates query v
go run ./documentation/examples/semconv-translation
```

This runs three phases automatically and prints results to the terminal.
This runs six phases and demonstrates:
1. **Before migration**: Writing metrics with `UnderscoreEscapingWithSuffixes`
2. **After migration**: Writing metrics with `NoTranslation`
3. **The problem**: Showing how queries only return partial data
4. **The solution**: Using `__semconv_url__` for full data coverage
5. **Aggregation**: `sum()` works across the migration boundary
6. **Rate calculation**: `rate()` works across the migration boundary

### Option 2: Browser Demo with Prometheus UI

@@ -39,10 +54,16 @@ This script:
2. Launches Prometheus serving that data
3. Opens your browser to the Prometheus UI with the demo query

You can then try these queries in the UI:
- `test_bytes_total` - Shows only escaped metrics (old naming, 2h-1h ago)
- `test` - Shows only native metrics (new naming, 1h ago-now)
- `test{__semconv_url__="registry/1.1.0"}` - Shows **BOTH** merged!
Then try these queries in the UI:

**The Problem - Queries break after migration:**
- `test_bytes_total` — Only shows old data (before migration)
- `test` — Only shows new data (after migration)

**The Solution - Seamless query continuity:**
- `test{__semconv_url__="registry/1.1.0"}` — Shows ALL data with unified naming
- `sum(test{__semconv_url__="registry/1.1.0"})` — Aggregation works across boundary
- `rate(test{__semconv_url__="registry/1.1.0"}[5m])` — Rate calculation works too

### Option 3: Manual Browser Demo

@@ -61,34 +82,38 @@ go run ./documentation/examples/semconv-translation \
# Open http://localhost:9090 in your browser
```

## Demo Phases
## Demo Scenario

### Phase 1: Write metrics with UnderscoreEscapingWithSuffixes
The demo simulates a single producer (`myapp:8080`) that changes OTLP translation strategy:

Simulates an OTLP producer using traditional Prometheus naming (2 hours ago to 1 hour ago):
### Phase 1 & 2: Data Population

**Before migration (2 hours ago to 1 hour ago):**
```
test_bytes_total{http_response_status_code="200", instance="producer-escaped", tenant="alice"}
test_bytes_total{http_response_status_code="200", instance="myapp:8080"}
test_bytes_total{http_response_status_code="404", instance="myapp:8080"}
```

### Phase 2: Write metrics with NoTranslation

Simulates an OTLP producer using native OTel UTF-8 naming (1 hour ago to now):
**After migration (1 hour ago to now):**
```
test{http.response.status_code="200", instance="producer-native", tenant="alice"}
test{http.response.status_code="200", instance="myapp:8080"}
test{http.response.status_code="404", instance="myapp:8080"}
```
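
A condensed, hedged sketch of how the demo writes these two phases; the full loop, including error handling and the 404 series, is in the Go program further down, and the 30-second interval and helper name here are assumptions for illustration:

```go
package main

import (
    "context"
    "time"

    "github.com/prometheus/prometheus/model/labels"
    "github.com/prometheus/prometheus/tsdb"
)

// writeMigrationData appends counter samples for both phases: one hour of
// escaped names before the migration, one hour of native OTel names after.
// Errors are ignored here for brevity.
func writeMigrationData(ctx context.Context, db *tsdb.DB) {
    now := time.Now()
    app := db.Appender(ctx)

    value := 0.0
    for t := now.Add(-2 * time.Hour); t.Before(now.Add(-time.Hour)); t = t.Add(30 * time.Second) {
        value += 10 // Counter increases over time.
        app.Append(0, labels.FromStrings(
            "__name__", "test_bytes_total",
            "http_response_status_code", "200",
            "instance", "myapp:8080",
        ), t.UnixMilli(), value)
    }

    // Same producer, same logical series - only the naming changes after migration.
    for t := now.Add(-time.Hour); t.Before(now); t = t.Add(30 * time.Second) {
        value += 10
        app.Append(0, labels.FromStrings(
            "__name__", "test",
            "http.response.status_code", "200",
            "instance", "myapp:8080",
        ), t.UnixMilli(), value)
    }

    app.Commit()
}
```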

### Phase 3: Query with __semconv_url__
### Phase 3: The Problem

Uses the `__semconv_url__` selector to query both naming conventions:
```promql
test{__semconv_url__="registry/1.1.0"}
```
Querying `test_bytes_total` returns `[PARTIAL]` — only pre-migration data.
Querying `test` returns `[PARTIAL]` — only post-migration data.

This query automatically matches both:
- `test_bytes_total{http_response_status_code="200"}` (from Phase 1)
- `test{http.response.status_code="200"}` (from Phase 2)
Neither query alone shows the complete picture!

The results are merged and returned with consistent OTel semantic naming.
### Phase 4: The Solution

Querying `test{__semconv_url__="registry/1.1.0"}` returns `[FULL]` — all data spanning the migration point, with unified OTel naming.

### Phase 5 & 6: Aggregation and Rate

Both `sum()` and `rate()` calculations work seamlessly across the naming boundary, producing continuous results.
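
For example, the rate check boils down to a single range query over the whole window (mirroring the demo's runRangeQuery helper further down; the function name here is illustrative, and `st` is the semconv-aware storage from the earlier sketch):

```go
package main

import (
    "context"
    "fmt"
    "time"

    "github.com/prometheus/prometheus/promql"
    "github.com/prometheus/prometheus/storage"
)

// sumRateAcrossMigration evaluates a rate aggregation over the full 2.5h
// window, so the result spans both naming conventions.
func sumRateAcrossMigration(ctx context.Context, engine *promql.Engine, st storage.Queryable) error {
    now := time.Now()
    q, err := engine.NewRangeQuery(ctx, st, nil,
        `sum(rate(test{__semconv_url__="registry/1.1.0"}[5m]))`,
        now.Add(-150*time.Minute), now, 5*time.Minute)
    if err != nil {
        return err
    }
    defer q.Close()

    res := q.Exec(ctx)
    if res.Err != nil {
        return res.Err
    }
    // One continuous series, with no gap at the migration point.
    fmt.Println(res.Value.String())
    return nil
}
```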

## Command-Line Flags

@@ -97,9 +122,26 @@ The results are merged and returned with consistent OTel semantic naming.
| `--data-dir=PATH` | Save TSDB to specified directory (default: temp, deleted on exit) |
| `--populate-only` | Only populate data, skip query phase (for use with browser demo) |

## OTLP Translation Strategies
## How It Works

Prometheus supports these OTLP translation strategies:
1. The `__semconv_url__` parameter points to a semantic conventions file in the embedded registry
2. Prometheus loads the metric definition (name, unit, type, attributes)
3. For each query, Prometheus generates variants for all OTLP translation strategies:
   - `UnderscoreEscapingWithSuffixes` → `test_bytes_total{http_response_status_code=...}`
   - `UnderscoreEscapingWithoutSuffixes` → `test_bytes{http_response_status_code=...}`
   - `NoUTF8EscapingWithSuffixes` → `test_total{http.response.status_code=...}`
   - `NoTranslation` → `test{http.response.status_code=...}`
4. Results from all variants are merged with canonical OTel naming
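
The demo verifies step 4 by checking the time coverage of the returned matrix (see the runRangeQueryWithDetails helper further down). A trimmed sketch of that check, under the same assumptions as the earlier sketches:

```go
package main

import (
    "context"
    "time"

    "github.com/prometheus/prometheus/promql"
    "github.com/prometheus/prometheus/storage"
)

// coverageSpansMigration reports whether the merged result contains samples on
// both sides of the migration point one hour ago, i.e. [FULL] coverage.
func coverageSpansMigration(ctx context.Context, engine *promql.Engine, st storage.Queryable) (bool, error) {
    now := time.Now()
    q, err := engine.NewRangeQuery(ctx, st, nil,
        `test{__semconv_url__="registry/1.1.0"}`,
        now.Add(-150*time.Minute), now, 5*time.Minute)
    if err != nil {
        return false, err
    }
    defer q.Close()

    res := q.Exec(ctx)
    if res.Err != nil {
        return false, res.Err
    }
    matrix, ok := res.Value.(promql.Matrix)
    if !ok || len(matrix) == 0 {
        return false, nil
    }

    migration := now.Add(-time.Hour).UnixMilli()
    var before, after bool
    for _, series := range matrix {
        for _, p := range series.Floats {
            if p.T < migration {
                before = true
            } else {
                after = true
            }
        }
    }
    return before && after, nil
}
```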

## Key Benefits

- **Dashboard continuity**: Existing dashboards keep working after migration
- **No data gaps**: All historical and new data accessible with one query
- **Aggregation support**: `sum()`, `avg()`, etc. work across naming boundaries
- **Rate continuity**: `rate()` and `increase()` produce continuous results
- **Canonical naming**: Results use standard OTel semantic convention names

## OTLP Translation Strategies

| Strategy | Description | Metric Example | Label Example |
|----------|-------------|----------------|---------------|
@@ -107,14 +149,3 @@ Prometheus supports these OTLP translation strategies:
| `UnderscoreEscapingWithoutSuffixes` | Underscores without unit/type suffixes | `test_bytes` | `http_response_status_code` |
| `NoUTF8EscapingWithSuffixes` | Dots in labels preserved, adds suffixes | `test_total` | `http.response.status_code` |
| `NoTranslation` | Pure OTel naming (requires UTF-8 support) | `test` | `http.response.status_code` |

When querying with `__semconv_url__`, Prometheus generates variants for all strategies to ensure complete data retrieval regardless of how the data was originally written.

## How It Works

1. The demo creates a TSDB instance (temp or specified directory)
2. Writes metrics at different timestamps using different naming conventions
3. When querying with `__semconv_url__`:
   - Prometheus loads semantic conventions from the embedded registry
   - Generates matcher variants for all OTLP translation strategies
   - Merges results with canonical OTel naming

@@ -95,10 +95,14 @@ trap cleanup EXIT
echo -e "${BOLD}${GREEN}Starting Prometheus server...${NC}"
echo -e " Web UI: ${CYAN}http://localhost:${PORT}${NC}\n"

echo -e "${BOLD}Try these queries in the browser:${NC}"
echo -e " 1. ${CYAN}test_bytes_total${NC} - Shows only escaped metrics (old naming)"
echo -e " 2. ${CYAN}test${NC} - Shows only native metrics (new naming)"
echo -e " 3. ${CYAN}test{__semconv_url__=\"registry/1.1.0\"}${NC} - Shows BOTH merged!\n"
echo -e "${BOLD}Try these queries in the browser:${NC}\n"
echo -e " ${YELLOW}The Problem - Queries break after migration:${NC}"
echo -e " ${CYAN}test_bytes_total${NC} - Only old data (before migration)"
echo -e " ${CYAN}test${NC} - Only new data (after migration)\n"
echo -e " ${GREEN}The Solution - Seamless query continuity:${NC}"
echo -e " ${CYAN}test{__semconv_url__=\"registry/1.1.0\"}${NC} - ALL data, unified naming!"
echo -e " ${CYAN}sum(test{__semconv_url__=\"registry/1.1.0\"})${NC} - Aggregation works"
echo -e " ${CYAN}rate(test{__semconv_url__=\"registry/1.1.0\"}[5m])${NC} - Rate works too!\n"

echo -e "${YELLOW}Press Ctrl+C to stop the demo.${NC}\n"

@@ -12,8 +12,13 @@
// limitations under the License.

// This demo showcases OTel semantic conventions translation in Prometheus.
// It writes metrics using different OTLP translation strategies (naming conventions)
// and demonstrates how __semconv_url__ queries can merge series with different names.
// It simulates a migration scenario where a producer changes OTLP translation
// strategy from UnderscoreEscapingWithSuffixes to NoTranslation (native OTel names).
//
// The demo shows:
// - How queries break after migration (old metric name returns no new data)
// - How __semconv_url__ provides query continuity across the migration
// - How aggregations and rate() work seamlessly across naming boundaries
//
// Run with: go run ./documentation/examples/semconv-translation
//

@@ -28,12 +33,14 @@ import (
    "flag"
    "fmt"
    "os"
    "strings"
    "time"

    "github.com/prometheus/common/promslog"

    "github.com/prometheus/prometheus/model/labels"
    "github.com/prometheus/prometheus/promql"
    "github.com/prometheus/prometheus/storage"
    "github.com/prometheus/prometheus/storage/semconv"
    "github.com/prometheus/prometheus/tsdb"
)

@@ -96,15 +103,16 @@ func main() {
    }
    defer db.Close()

    // For browser demo, spread data over time so graphs look interesting.
    // Phase 1 data: 2 hours ago to 1 hour ago (simulating "old" producer).
    // Phase 2 data: 1 hour ago to now (simulating "new" producer after migration).
    // Simulate a migration scenario:
    // - Phase 1 (2h ago to 1h ago): Producer uses UnderscoreEscapingWithSuffixes (old strategy)
    // - Phase 2 (1h ago to now): Same producer migrates to NoTranslation (new strategy)
    // This represents a real-world scenario where you upgrade your OTLP configuration.
    now := time.Now()

    // ===== Phase 1: Write metrics with UnderscoreEscapingWithSuffixes =====
    printPhase(1, "Writing metrics with UnderscoreEscapingWithSuffixes strategy")
    printPhase(1, "Before migration: UnderscoreEscapingWithSuffixes")

    fmt.Printf("This simulates an OTLP producer using traditional Prometheus naming.\n")
    fmt.Printf("Simulating a producer BEFORE migration to native OTel naming.\n")
    fmt.Printf("The metric 'test' (unit: By, type: counter) becomes 'test_bytes_total'.\n")
    fmt.Printf("The attribute 'http.response.status_code' becomes 'http_response_status_code'.\n")
    fmt.Printf("Writing samples from 2 hours ago to 1 hour ago...\n\n")

@@ -120,11 +128,11 @@ func main() {
    for t := startTime; t.Before(endTime); t = t.Add(interval) {
        value += 10 // Counter increases over time.

        // Same producer ("myapp") - this is the key point: same source, different naming over time.
        lbls := labels.FromStrings(
            "__name__", "test_bytes_total",
            "http_response_status_code", "200",
            "tenant", "alice",
            "instance", "producer-escaped",
            "instance", "myapp:8080",
        )
        _, err = app.Append(0, lbls, t.UnixMilli(), value)
        if err != nil {

@@ -136,8 +144,7 @@ func main() {
        lbls404 := labels.FromStrings(
            "__name__", "test_bytes_total",
            "http_response_status_code", "404",
            "tenant", "bob",
            "instance", "producer-escaped",
            "instance", "myapp:8080",
        )
        _, err = app.Append(0, lbls404, t.UnixMilli(), value*0.1)
        if err != nil {

@@ -151,14 +158,14 @@ func main() {
        os.Exit(1)
    }

    fmt.Printf(" %s[Written]%s %d samples for test_bytes_total{http_response_status_code=\"200\"}\n", colorGreen, colorReset, int(time.Hour/interval))
    fmt.Printf(" %s[Written]%s %d samples for test_bytes_total{http_response_status_code=\"404\"}\n\n", colorGreen, colorReset, int(time.Hour/interval))
    fmt.Printf(" %s[Written]%s %d samples for test_bytes_total{http_response_status_code=\"200\", instance=\"myapp:8080\"}\n", colorGreen, colorReset, int(time.Hour/interval))
    fmt.Printf(" %s[Written]%s %d samples for test_bytes_total{http_response_status_code=\"404\", instance=\"myapp:8080\"}\n\n", colorGreen, colorReset, int(time.Hour/interval))

    // ===== Phase 2: Write metrics with NoTranslation =====
    printPhase(2, "Writing metrics with NoTranslation strategy")
    printPhase(2, "After migration: NoTranslation (native OTel naming)")

    fmt.Printf("This simulates an OTLP producer using native OTel UTF-8 naming.\n")
    fmt.Printf("The metric 'test' stays as 'test' (no unit suffix).\n")
    fmt.Printf("Simulating the SAME producer AFTER migration to native OTel naming.\n")
    fmt.Printf("The metric 'test' stays as 'test' (no unit/type suffix).\n")
    fmt.Printf("The attribute 'http.response.status_code' preserves its dots.\n")
    fmt.Printf("Writing samples from 1 hour ago to now...\n\n")

@@ -171,11 +178,11 @@ func main() {
    for t := startTime; t.Before(endTime); t = t.Add(interval) {
        value += 10

        // Same producer ("myapp"), same logical series - just different naming after migration.
        lbls := labels.FromStrings(
            "__name__", "test",
            "http.response.status_code", "200",
            "tenant", "alice",
            "instance", "producer-native",
            "instance", "myapp:8080",
        )
        _, err = app.Append(0, lbls, t.UnixMilli(), value)
        if err != nil {

@@ -183,14 +190,13 @@ func main() {
            os.Exit(1)
        }

        // Also write a 500 series.
        lbls500 := labels.FromStrings(
        // Same 404 series, continuing after migration.
        lbls404 := labels.FromStrings(
            "__name__", "test",
            "http.response.status_code", "500",
            "tenant", "charlie",
            "instance", "producer-native",
            "http.response.status_code", "404",
            "instance", "myapp:8080",
        )
        _, err = app.Append(0, lbls500, t.UnixMilli(), value*0.05)
        _, err = app.Append(0, lbls404, t.UnixMilli(), value*0.1)
        if err != nil {
            fmt.Printf("Failed to append: %v\n", err)
            os.Exit(1)

@@ -202,29 +208,28 @@ func main() {
        os.Exit(1)
    }

    fmt.Printf(" %s[Written]%s %d samples for test{http.response.status_code=\"200\"}\n", colorGreen, colorReset, int(time.Hour/interval))
    fmt.Printf(" %s[Written]%s %d samples for test{http.response.status_code=\"500\"}\n\n", colorGreen, colorReset, int(time.Hour/interval))
    fmt.Printf(" %s[Written]%s %d samples for test{http.response.status_code=\"200\", instance=\"myapp:8080\"}\n", colorGreen, colorReset, int(time.Hour/interval))
    fmt.Printf(" %s[Written]%s %d samples for test{http.response.status_code=\"404\", instance=\"myapp:8080\"}\n\n", colorGreen, colorReset, int(time.Hour/interval))

    // If populate-only mode, exit here.
    if *populateOnly {
        fmt.Printf("%s%s--- Data population complete ---%s\n\n", colorBold, colorGreen, colorReset)
        fmt.Printf("TSDB data written to: %s%s%s\n\n", colorYellow, tsdbDir, colorReset)
        fmt.Printf("To query this data in the browser, run:\n")
        fmt.Printf(" %s./prometheus --storage.tsdb.path=%s --config.file=/dev/null%s\n\n", colorCyan, tsdbDir, colorReset)
        fmt.Printf("Then open http://localhost:9090 and try these queries:\n")
        fmt.Printf(" 1. %stest_bytes_total%s # Shows only escaped metrics\n", colorMagenta, colorReset)
        fmt.Printf(" 2. %stest%s # Shows only native metrics\n", colorMagenta, colorReset)
        fmt.Printf(" 3. %stest{__semconv_url__=\"registry/1.1.0\"}%s # Shows BOTH merged!\n\n", colorMagenta, colorReset)
        fmt.Printf(" %s./prometheus --storage.tsdb.path=%s --config.file=/dev/null --enable-feature=semconv-versioned-read%s\n\n", colorCyan, tsdbDir, colorReset)
        fmt.Printf("Then open http://localhost:9090 and try these queries:\n\n")
        fmt.Printf(" %sThe Problem - Queries break after migration:%s\n", colorYellow, colorReset)
        fmt.Printf(" %stest_bytes_total%s # Only old data (before migration)\n", colorMagenta, colorReset)
        fmt.Printf(" %stest%s # Only new data (after migration)\n\n", colorMagenta, colorReset)
        fmt.Printf(" %sThe Solution - Seamless query continuity:%s\n", colorGreen, colorReset)
        fmt.Printf(" %stest{__semconv_url__=\"registry/1.1.0\"}%s # ALL data, unified naming!\n", colorMagenta, colorReset)
        fmt.Printf(" %ssum(test{__semconv_url__=\"registry/1.1.0\"})%s # Aggregation works across boundary\n", colorMagenta, colorReset)
        fmt.Printf(" %srate(test{__semconv_url__=\"registry/1.1.0\"}[5m])%s # Rate works too!\n\n", colorMagenta, colorReset)
        return
    }

    // ===== Phase 3: Query with __semconv_url__ =====
    printPhase(3, "Querying with __semconv_url__ for unified view")

    fmt.Printf("The __semconv_url__ parameter tells Prometheus to:\n")
    fmt.Printf(" 1. Load the semantic conventions from the embedded registry\n")
    fmt.Printf(" 2. Generate variant matchers for different naming conventions\n")
    fmt.Printf(" 3. Merge results into canonical OTel names\n\n")
    // ===== Phase 3: Demonstrate the problem - broken queries after migration =====
    printPhase(3, "The Problem: Queries break after migration")

    // Create semconv-aware storage wrapper.
    semconvStorage := semconv.AwareStorage(db)

@@ -236,41 +241,165 @@ func main() {
        Timeout: time.Minute,
    }
    engine := promql.NewEngine(opts)

    // Query using __semconv_url__ to merge both naming conventions.
    // The registry/1.1.0 file defines the "test" metric with its attributes.
    query := `test{__semconv_url__="registry/1.1.0"}`
    fmt.Printf("Query: %s%s%s\n\n", colorMagenta, query, colorReset)

    ctx := context.Background()
    q, err := engine.NewInstantQuery(ctx, semconvStorage, nil, query, now)
    if err != nil {
        fmt.Printf("Failed to create query: %v\n", err)
        os.Exit(1)
    }

    result := q.Exec(ctx)
    if result.Err != nil {
        fmt.Printf("Query failed: %v\n", result.Err)
        os.Exit(1)
    }
    fmt.Printf("After migrating to native OTel naming, queries using old names miss new data:\n\n")

    fmt.Printf("%sResults:%s\n", colorBold, colorReset)
    fmt.Printf("%s\n", result.Value.String())
    // Query 1: Old metric name - only returns pre-migration data
    runRangeQueryWithDetails(engine, db, ctx, now, "test_bytes_total",
        "Old metric name - only has data from BEFORE migration")

    // Query 2: New metric name - only returns post-migration data
    runRangeQueryWithDetails(engine, db, ctx, now, "test",
        "New metric name - only has data from AFTER migration")

    fmt.Printf(" %s=> Neither query alone shows the complete picture!%s\n\n", colorYellow, colorReset)

    // ===== Phase 4: The solution - semconv-aware queries =====
    printPhase(4, "The Solution: __semconv_url__ for query continuity")

    fmt.Printf("The __semconv_url__ parameter tells Prometheus to:\n")
    fmt.Printf(" 1. Load semantic conventions from the embedded registry\n")
    fmt.Printf(" 2. Generate query variants for all OTLP naming strategies\n")
    fmt.Printf(" 3. Merge results with canonical OTel names\n\n")

    // Query 3: Semconv-aware query - returns ALL data with unified naming
    runRangeQueryWithDetails(engine, semconvStorage, ctx, now, `test{__semconv_url__="registry/1.1.0"}`,
        "Semconv query - returns ALL data with unified OTel naming!")

    fmt.Printf(" %s=> Complete data coverage across the migration boundary!%s\n\n", colorGreen, colorReset)

    // ===== Phase 5: Aggregation across the migration boundary =====
    printPhase(5, "Aggregation works across the migration boundary")

    fmt.Printf("Aggregations like sum() work seamlessly across different naming conventions:\n\n")

    runRangeQuery(engine, semconvStorage, ctx, now, `sum(test{__semconv_url__="registry/1.1.0"})`,
        "sum() aggregates data from both naming conventions into one continuous series")

    // ===== Phase 6: Rate calculation across the migration boundary =====
    printPhase(6, "Counter rates work across the migration boundary")

    fmt.Printf("Even rate() calculations work across the naming change:\n\n")

    runRangeQuery(engine, semconvStorage, ctx, now, `sum(rate(test{__semconv_url__="registry/1.1.0"}[5m]))`,
        "rate() computes correctly across the migration point")

    // ===== Summary =====
    fmt.Printf("\n%s%s--- Summary ---%s\n\n", colorBold, colorGreen, colorReset)
    fmt.Printf("The demo showed how Prometheus can unify metrics from different OTLP producers:\n\n")
    fmt.Printf(" %s*%s Producer A used UnderscoreEscapingWithSuffixes:\n", colorCyan, colorReset)
    fmt.Printf(" test_bytes_total{http_response_status_code=\"200\"}\n\n")
    fmt.Printf(" %s*%s Producer B used NoTranslation:\n", colorCyan, colorReset)
    fmt.Printf(" test{http.response.status_code=\"200\"}\n\n")
    fmt.Printf(" %s*%s Query with __semconv_url__ merged both into canonical OTel names:\n", colorCyan, colorReset)
    fmt.Printf(" test{http.response.status_code=\"200\"}\n\n")
    fmt.Printf("This enables querying across heterogeneous OTLP producers using\n")
    fmt.Printf("a single semantic conventions-aware query.\n\n")
    fmt.Printf("This demo simulated a migration scenario where a producer (myapp:8080)\n")
    fmt.Printf("changed OTLP translation strategy:\n\n")
    fmt.Printf(" %s*%s Before migration (2h-1h ago): UnderscoreEscapingWithSuffixes\n", colorCyan, colorReset)
    fmt.Printf(" test_bytes_total{http_response_status_code=\"200\", instance=\"myapp:8080\"}\n\n")
    fmt.Printf(" %s*%s After migration (1h ago-now): NoTranslation (native OTel)\n", colorCyan, colorReset)
    fmt.Printf(" test{http.response.status_code=\"200\", instance=\"myapp:8080\"}\n\n")
    fmt.Printf(" %s*%s Without __semconv_url__: Queries break, dashboards show gaps\n", colorYellow, colorReset)
    fmt.Printf(" %s*%s With __semconv_url__: Seamless continuity, all data accessible\n\n", colorGreen, colorReset)
    fmt.Printf("Key benefits:\n")
    fmt.Printf(" - Existing dashboards keep working after migration\n")
    fmt.Printf(" - Aggregations (sum, avg, etc.) work across naming boundaries\n")
    fmt.Printf(" - Rate calculations produce continuous results\n")
    fmt.Printf(" - Results use canonical OTel semantic convention names\n\n")
}

func printPhase(n int, description string) {
    fmt.Printf("%s%s--- Phase %d: %s ---%s\n\n", colorBold, colorYellow, n, description, colorReset)
}

// runRangeQuery executes a range query over the last 2.5 hours and displays results.
func runRangeQuery(engine *promql.Engine, storage storage.Queryable, ctx context.Context, now time.Time, query, description string) {
    fmt.Printf(" Query: %s%s%s\n", colorMagenta, query, colorReset)
    fmt.Printf(" %s\n", description)

    // Query over the full time range (2.5h to capture data from both phases).
    start := now.Add(-150 * time.Minute)
    end := now
    step := 5 * time.Minute

    q, err := engine.NewRangeQuery(ctx, storage, nil, query, start, end, step)
    if err != nil {
        fmt.Printf(" %s[Error]%s Failed to create query: %v\n\n", colorYellow, colorReset, err)
        return
    }

    result := q.Exec(ctx)
    if result.Err != nil {
        fmt.Printf(" %s[Error]%s Query failed: %v\n\n", colorYellow, colorReset, result.Err)
        return
    }

    // Count data points.
    resultStr := result.Value.String()
    if resultStr == "" || resultStr == "{}" {
        fmt.Printf(" %s[Result]%s No data returned\n\n", colorYellow, colorReset)
    } else {
        // Count total data points across all series.
        pointCount := strings.Count(resultStr, "@")
        fmt.Printf(" %s[Result]%s %d data points spanning the full 2.5h range\n", colorGreen, colorReset, pointCount)
        fmt.Printf(" (covering both pre-migration and post-migration data)\n\n")
    }
}

// runRangeQueryWithDetails executes a range query and shows time coverage details.
func runRangeQueryWithDetails(engine *promql.Engine, storage storage.Queryable, ctx context.Context, now time.Time, query, description string) {
    fmt.Printf(" Query: %s%s%s\n", colorMagenta, query, colorReset)
    fmt.Printf(" %s\n", description)

    // Query over the full time range (2.5h to capture data from both phases).
    start := now.Add(-150 * time.Minute)
    end := now
    step := 5 * time.Minute

    q, err := engine.NewRangeQuery(ctx, storage, nil, query, start, end, step)
    if err != nil {
        fmt.Printf(" %s[Error]%s Failed to create query: %v\n\n", colorYellow, colorReset, err)
        return
    }

    result := q.Exec(ctx)
    if result.Err != nil {
        fmt.Printf(" %s[Error]%s Query failed: %v\n\n", colorYellow, colorReset, result.Err)
        return
    }

    // Analyze the matrix result for time coverage.
    matrix, ok := result.Value.(promql.Matrix)
    if !ok || len(matrix) == 0 {
        fmt.Printf(" %s[Result]%s No data returned\n\n", colorYellow, colorReset)
        return
    }

    // Find the time range covered by the data.
    var minTime, maxTime int64
    totalPoints := 0
    for _, series := range matrix {
        for _, point := range series.Floats {
            if minTime == 0 || point.T < minTime {
                minTime = point.T
            }
            if point.T > maxTime {
                maxTime = point.T
            }
            totalPoints++
        }
    }

    // Calculate time coverage.
    migrationPoint := now.Add(-1 * time.Hour).UnixMilli()
    minTimeStr := time.UnixMilli(minTime).Format("15:04")
    maxTimeStr := time.UnixMilli(maxTime).Format("15:04")
    migrationStr := time.UnixMilli(migrationPoint).Format("15:04")

    // Determine coverage description.
    var coverage string
    if minTime < migrationPoint && maxTime > migrationPoint {
        coverage = fmt.Sprintf("%s[FULL]%s %s to %s (spans migration at %s)", colorGreen, colorReset, minTimeStr, maxTimeStr, migrationStr)
    } else if maxTime <= migrationPoint {
        coverage = fmt.Sprintf("%s[PARTIAL]%s %s to %s (only pre-migration, ends before %s)", colorYellow, colorReset, minTimeStr, maxTimeStr, migrationStr)
    } else {
        coverage = fmt.Sprintf("%s[PARTIAL]%s %s to %s (only post-migration, starts after %s)", colorYellow, colorReset, minTimeStr, maxTimeStr, migrationStr)
    }

    fmt.Printf(" [Result] %d series, %d data points\n", len(matrix), totalPoints)
    fmt.Printf(" Time coverage: %s\n\n", coverage)
}