Replace gopkg.in/yaml.v2 and gopkg.in/yaml.v3 imports with
go.yaml.in/yaml/v2 and go.yaml.in/yaml/v3 respectively.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Fix incorrect interpolation when counter resets occur in smoothed range
selector evaluation. Previously, the asymmetric handling of counter
resets (y1=0 on left edge, y2+=y1 on right edge) produced wrong values.
Now uniformly set y1=0 when a counter reset is detected, correctly
modeling the counter as starting from 0 post-reset.
This fixes rate calculations across counter resets. For example,
rate(metric[10s] smoothed) where metric goes from 100 to 10 (a reset)
now correctly computes 0.666... by treating the counter as resetting
to 0 rather than producing inflated values from the old behavior.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
- Add t.Helper() to getCurrentGaugeValuesFor helper function for better
error attribution in test failures
- Add require.NoError checks for os.WriteFile calls in TestRuntimeGOGCConfig
and TestHeadCompactionWhileScraping to catch file write failures
- Strengthen error handling in TestDocumentation to assert command success
rather than silently continuing on failure
- Improve log message clarity in TestAgentSuccessfulStartup to accurately
describe early exit scenario
These changes improve test reliability and follow Go testing best practices.
Signed-off-by: Abu <abdullahfakrudeen2020@gmail.com>
The tests were flaky because they used hard-coded time.After(550ms)
waits, which had only 50ms margin over WaitForPendingReadersInTimeRange's
500ms poll interval. On slow CI runners, this margin wasn't reliable.
Use synctest for deterministic time control:
- Wrap test logic in synctest.Test() to use fake time
- Use synctest.Wait() to let goroutines reach dormant state
- Use time.Sleep() to advance fake time past the poll interval
- No more timing-dependent assertions
This makes the tests both reliable and ~60x faster (0.05s vs 3s).
Fixes both TestWaitForPendingReadersInTimeRange and
TestWaitForPendingReadersInTimeRange_AppenderV2.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Add OpenAPI 3.2 specification generation for Prometheus HTTP API
This commit introduces an OpenAPI specification for the Prometheus API.
After testing multiple code-generation servers with built-in APIs, this
implementation uses an independent spec file outside of the critical path.
This spec file is tested with a framework present in this pull request.
The specification helps clients know which parameters they can use and is
served at /api/v1/openapi.yaml. The spec file will evolve with the
Prometheus API and has the same version number.
Downstream projects can tune the APIs presented in the spec file with
configuration options using the IncludePaths setting for path filtering.
In the future, there is room to generate a server from this spec file
(e.g. with interfaces), but this is out of scope for this pull request.
Architecture:
- Core OpenAPI infrastructure (openapi.go): Dynamic spec building,
caching, and thread-safe spec generation
- Schema definitions (openapi_schemas.go): Complete type definitions
for all API request and response types
- Path specifications (openapi_paths.go): Endpoint definitions with
parameters, request bodies, and response schemas
- Examples (openapi_examples.go): Realistic request/response examples
- Helper functions (openapi_helpers.go): Reusable builders for common
OpenAPI structures
Testing:
- Comprehensive test suite with golden file validation
- Test helpers package for API testing infrastructure
- OpenAPI compliance validation utilities
The golden file captures the complete specification for snapshot testing.
Update with: go test -run TestOpenAPIGolden -update-openapi-spec
REVIEWERS: The most important thing to check would be the OpenAPI golden
file (web/api/v1/testdata/openapi_golden.yaml). Test scenarios are important
as they test the actual OpenAPI spec validity.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* Add OpenAPI 3.1 support with version selection
Add support for both OpenAPI 3.1 and 3.2 specifications with version
selection via openapi_version query parameter. Defaults to 3.1 for
broader compatibility
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* Enhance OpenAPI examples and add helper functions
- Add timestampExamples helper for consistent time formatting
- Add exampleMap helper to simplify example creation
- Improve example summaries with query details
- Add matrix result example for range vector queries
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* web/api: Add AtST method to test helper iterators
Implement the AtST() method required by chunkenc.Iterator interface
for FakeSeriesIterator and FakeHistogramSeriesIterator test helpers.
The method returns 0 as these test helpers don't use start timestamps
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* OpenAPI: Add minimum coverage test
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* OpenAPI: Improve examples handling
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
---------
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
Fix flaky TestStop_DrainingEnabled and TestStop_DrainingDisabled tests.
The tests used real HTTP servers and real time, making them susceptible to
race conditions and timing-dependent failures.
The solution is to convert both tests to use synctest for deterministic fake time.
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Guard the stale series ratio calculation by checking numSeries > 0
before computing the ratio. This prevents division by zero when
the head has no series.
Fixes#17949
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
The createAttributes error was incorrectly returning nil instead of err,
causing errors to be silently discarded. This could lead to silent data
loss for sum metrics during OTLP ingestion.
Fixes#17953
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* simplify readability of timeseries filtering by using the slices package
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* ensure that BenchmarkBuildTimeSeries doesn't account for the building of
the actual proto in the benchmark results, we only care about the
buildTimeSeries call
Signed-off-by: Callum Styan <callumstyan@gmail.com>
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Branch protection means they cannot merge PRs to main/release branches.
Branch protection means they cannot approve things outside their area for
PRs to main/release branches.
Also add sysadmind (Joe) as ower of aws, to make sure he gets notified.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
We discussed IRL.Nico no longer has time to contribute.
This also syncs the file with CODEOWNERS.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* chore(sd-ownership): add default-maintainers as default code owner
In accordance with dev summit decision.
At the same time I've set up auto assignment for code review, meaning
that not everybody will get notified for all PRs. If there's already
a maintainer assigned, you don't get notified. Otherwise the
assignment is round-robin, 1 at a time. Also you can opt out.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Remove code owner without write access
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Add waitForQueryLog helper that polls for query log entries to appear
before asserting, rather than reading the file immediately after making
a query. This fixes a race condition where the query log wasn't flushed
to disk before the test read the file.
The helper uses a 5 second timeout with 100ms polling intervals, which
is generous enough to handle slow CI environments while keeping the test
responsive.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
tsdb: Early compaction of stale series
Closes#13616
Based on https://github.com/prometheus/proposals/pull/55
Stale series tracking was added in #16925. This PR compacts the stale series into its own block before the normal compaction hits. Here is how the settings:
stale_series_compaction_threshold: As soon as the ratio of stale series in the head block crosses StaleSeriesImmediateCompactionThreshold, TSDB performs a stale series compaction and puts all the stale series into a block and removed it from the head, but it does not remove it from the WAL. (technically this condition is checked every minute and not exactly immediate)
Additional details
WAL replay: after a stale series compaction, tombstones are added with (MinInt64, MaxInt64) for all these stale series. During WAL replay we add a special condition where when we find such tombstone, it immediately removes the series from the memory instead of storing the tombstone. This is required so that we don't spike up memory during WAL replay and also don't keep the compacted stale series in the memory.
Head block truncation ignores this block via the added metadata, similar to out-of-order blocks.
* otlptranslator: filter __name__ from OTLP attributes to prevent duplicates
OTLP metrics can have a __name__ attribute which, when combined with the
metric name passed via extras, creates duplicate __name__ labels.
This commit implements filtering out of any __name__ metric attribute from OTLP.
Also rename TestCreateAttributes to TestPrometheusConverter_createAttributes
for consistency, and add test cases for __name__, __type__, and __unit__ OTLP metric attributes.
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Add a test for `LeveledCompactor.Plan()` stopping after a block matches the
`BlockExcludeFilter`, as a sub-test
`TestLeveledCompactor/Plan/BlockExcludeFilter stops iteration`.
Also moving `TestLeveledCompactor_plan` to a sub-test
of `TestLeveledCompactor`, for consistency.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Add bounds check to prevent index out of range panic when
trimStringByBytes receives a string containing only UTF-8 continuation
bytes (0x80-0xBF). Previously, the loop would decrement size below 0
when no valid rune start byte was found, causing a panic.
A malicious query string with only continuation bytes could crash
the Prometheus server via the ActiveQueryTracker before the query
was parsed or validated.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
These bugs were discovered accidentally with code analysis:
- https://app.devin.ai/review/prometheus/prometheus/pull/16355
Upon further inspection and performing more analysis, 3 potential bugs were found:
1. sendloops could continue running if corresponding AM changed position in the config
2. multiple configs with the same hash would share sendloops resulting in sets without sendloops
3. sendloops could continue running if the config hash was changed
- `TestApplyConfigSendLoopsNotStoppedOnKeyChange`: Verifies sendLoops work when keys swap (no fix needed)
- `TestApplyConfigDuplicateHashSharesSendLoops`: Verifies sendLoops are independent with duplicate hashes (bug fixed)
- `TestApplyConfigHashChangeLeaksSendLoops`: Verifies sendLoops are cleaned up when hash changes (bug fixed)
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* fix(teststorage/appender.go): TODO and Sample staleness check
Allow different order of consecutive stale samples between the expected
and actual array for RequireEqual and RequireNotEqual by trying to
swap the expected side until it matches.
Also fix the definition of stale sample in the test, it's not only
float, but defined for native histograms as well.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* add unit tests
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
testutil.T was needed before https://go.dev/doc/go1.13#testingpkgtesting
Now it's inconsistent and confusing, so let's kill it.
Signed-off-by: bwplotka <bwplotka@gmail.com>
Update go.work from go 1.24.9 to go 1.24.0 to match the version
specified in all go.mod files across the project
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
ARG declarations before FROM are only available within the FROM
instruction and go out of scope afterward. Re-declare ARCH and OS
after FROM so they're available for the COPY instructions.
This fixes the build failure where ${OS}-${ARCH} resolved to empty
strings, causing "not found" errors for .build/-/prometheus.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* notifier: unit test for dropping throughput on stuck AM
Ref: https://github.com/prometheus/prometheus/issues/7676
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* chore(notifier): remove year from copyrights
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* feat(notifier): independent alertmanager sendloops
Independent Alertmanager sendloops avoid issues with queue overflowing
when one or more Alertmanager instances are unavailable which could
result in lost alert notifications.
The sendloops are managed per AlertmanagerSet which are dynamically
added/removed with service discovery or configuration reload.
The following metrics now include an extra dimention for alertmanager label:
- prometheus_notifications_dropped_total
- prometheus_notifications_queue_capacity
- prometheus_notifications_queue_length
This change also includes the test from #14099Closes#7676
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
The distroless Dockerfile was using a hardcoded SHA256 digest that
referenced only the amd64 image, preventing builds for other
architectures. This caused the main-distroless tag to only publish
amd64 images while the regular main tag had all 5 architectures.
Changes:
- Use architecture-specific image tags (nonroot-${DISTROLESS_ARCH})
instead of SHA256 digest to enable multi-arch manifest resolution
- Add DISTROLESS_ARCH build arg to handle architecture name mapping
(armv7 -> arm) between Prometheus and distroless conventions
- Move ARG declarations before FROM to support variable substitution
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
In commit 74775d732 "Add major version tag (#8026)" from 2020, the
docker-tag-latest target was updated to create major version tags
(v2, v3, etc.) but these tags were never actually pushed to the
registry. They existed locally only after tagging but were never
published.
This commit fixes the issue by:
- Adding logic to docker-publish to push major version tags when
DOCKER_IMAGE_TAG="latest" (triggered by promci during releases)
- Adding logic to docker-manifest to create major version manifests
when DOCKER_IMAGE_TAG="latest"
Pre-release filtering is handled at the promci level, where the regex
check ^v[0-9]+(\.[0-9]+){2}$ already ensures only stable releases
trigger the "latest" tagging workflow.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
Add automatic cleanup to newTestHeadWithOptions so that heads created
with newTestHead are automatically closed when the test ends. This
simplifies test code by removing the need for manual cleanup in most
cases.
Changes:
- Add t.Cleanup in newTestHeadWithOptions immediately after creating
the head, using _ = h.Close() to handle double-close gracefully
- Remove redundant t.Cleanup, defer, and explicit Close calls from
tests that use newTestHead
- Add cleanup for heads created with NewHead directly in restart
patterns (e.g., restartHeadAndVerifySeriesCounts, startHead)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Otherwise performance is dominated by adding to a slice that gets longer
and longer as the benchmark progresses.
I chose to Rollback rather than Commit because that should do less work.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Fix two issues in fuzzing infrastructure:
- Correct artifact upload path from promql/testdata/fuzz to util/fuzzing/testdata/fuzz to match where Go stores crash artifacts
- Fix GetCorpusForFuzzParseExpr to preserve original parser flag values instead of always resetting them to false, which was disabling experimental features before actual fuzzing ran
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* otlptranslator: add label caching for OTLP-to-Prometheus conversion
Add per-request caching to reduce redundant computation and allocations
during OTLP metric conversion:
1. Per-request label sanitization cache: Cache sanitized label names
within a request to avoid repeated string allocations for commonly
repeated labels like __name__, job, instance.
2. Resource-level label caching: Precompute and cache job, instance,
promoted resource attributes, and external labels once per
ResourceMetrics boundary instead of for each datapoint.
3. Scope-level label caching: Precompute and cache scope metadata labels
(otel_scope_name, otel_scope_version, etc.) once per ScopeMetrics
boundary.
4. LabelNamer instance caching: Reuse the LabelNamer struct across
datapoints within the same resource context.
These optimizations significantly reduce allocations and improve latency
for OTLP ingestion workloads with many datapoints per resource/scope.
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
* UI: Fix broken Y axis after graph page reload
The new `y_axis_min` setting was always encoded into the URL, even if its value
was `null` (in which case it would be set to an empty string parameter). On the
decoding side, this wasn't taken into account correctly, and we tried to parse
the empty string as a float, causing completely broken graphs showing nothing
after reloading the graph page with such URL parameters.
I'm doing two things now:
* For the future, only encode the Y axis min into the URL if it's set at all,
similar as we do for the `end_input` and `moment_input` fields.
* On the decoding side, accommodate people (at least for now) who already saved
some links with the empty `y_axis_min` parameter by treating an empty string
as `null` instead of a number.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
* Add URL state encoding/decoding tests
Signed-off-by: Julius Volz <julius.volz@gmail.com>
---------
Signed-off-by: Julius Volz <julius.volz@gmail.com>
The benchmark was passing appendMetadata=false to NewCombinedAppender,
which caused UpdateMetadata to never be called on the underlying
noOpAppender. This resulted in app.metadata always being 0, failing
the assertion that metadata count should be positive.
Fix by enabling metadata appending in the benchmark.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
find . -name "*.go" -type f -exec sed -E -i \
's/([^[:alpha:]]sample\{)([^,{:]+,[^,]+,[^,]+,[^,]+\})/\10, \2/g' {} +
I've omitted tsdb/ooo_head.go from the commit because I'm also adding todo
there.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
No implementation yet. Just to test the shape of the interface.
AtST is implemented for trivial cases, anything else is hard coded
to return 0.
Ref: https://github.com/prometheus/prometheus/issues/17791
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
This change fixes an issue introduced in #17707. When a regex
with a wildcard, literal, and final wildcard surounded by a
capture group was parsed - the capture group was not removed
first preventing `optimizeConcatRegex` from running.
Found via fuzz testing.
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
- Added comprehensive edge case tests for SanitizeLabelName (10 cases)
- Added comprehensive edge case tests for SanitizeFullLabelName (15 cases)
- Added more test cases for link generation functions (4 additional cases)
- Fixed unicode test case: corrected expected underscores from 7 to 5
- Fixed digits test case: corrected expected output from '_____' to '_2345'
- Converted tests to table-driven format with named subtests
- Achieved 100% code coverage for the package
Signed-off-by: Ritik Shukla <ritikshukla@Ritiks-MacBook-Air.local>
Update check-go-mod-version.sh to use git ls-files instead of find for
better performance and to respect .gitignore. Also include go.work files
in the version check to ensure consistency across workspace files and
modules.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
When filtering by a label that exists on both the input metric and
target_info (e.g., info(metric, {host_name="orbstack"}) where host_name
exists on both), the function incorrectly returned empty results.
The bug was in combineWithInfoVector: when no new labels were added
(because they all overlapped with base metric labels), the code entered
the "no match" filtering block even though an info series WAS matched.
The fix checks len(seenInfoMetrics) == 0 to correctly identify when no
info series matched. If an info series matched (seenInfoMetrics is
non-empty), the series is kept even if no new labels were added.
Fixes#17813
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Make service discoveries removable through build tags
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* Fix cross-platform build issues
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* Change build tags used
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* Remove year from License header
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* Remove plugins automation
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* Update README
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* Update README.md
Co-authored-by: Julien <291750+roidelapluie@users.noreply.github.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
---------
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Co-authored-by: Julien <291750+roidelapluie@users.noreply.github.com>
Moved some Metric.Get() calls in PromQL functions to avoid unnecessary label extraction.
In many cases, this work was done to extract metric name, and was only used if annotations were emitted.
In the same go I also replaced labels.MetricName with model.MetricNameLabel, since the former was deprecated.
Signed-off-by: Vilius Pranckaitis <vpranckaitis@gmail.com>
#14173 introduced an optimisation to better handle regex patterns like .*-.*-.*. It identifies strings the pattern cannot possibly match (because they do not contain all of the literal values) and returns false from MatchString early.
However, if the string does contain all literal values, then the Go regex engine is used to confirm that the string does match the pattern. But this is not necessary in the case where the start and end of the pattern is .* and everything in between is either a literal or .*: if the string contains all of the literals in order, then it matches the pattern, and invoking Go's regex engine to confirm this is unnecessary and quite slow.
* Add some more test cases
* Add benchmark, since existing benchmark doesn't show much impact given most of the random test strings will not match the patterns.
Signed-off-by: Charles Korn <charles.korn@grafana.com>
We have separate pools for Appender and AppenderV2 objects, and must not
put another kind of object into them.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* tsdb: add support for OOO exemplars in CircularExemplarStorage
Doubly linked exemplar storage resize.
Split exemplar buffer resize into shrink and grow functions.
Skip duplicate OOO exemplars, re-initialize emptied index after deleting its last exemplar.
Signed-off-by: Julius Hinze <julius.hinze@grafana.com>
NHCB is native histograms with custom buckets.
prompb is used for both remote write 1.0 and remote read. We do not
support NHCB over remote write 1.0 , however we should absolutely
support it for remote read.
Prometheus remote write 1.0 client already refuses to send NHCB.
Prometheus remote write 1.0 server accepts NHCB, but doesn't store
custom values, corrupting the result. I'm now handling NHCB correctly,
instead of refusing or corrupting.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
When accepting an autocompletion result within an Identifier node (could be a
metric name, function name, keyword, etc.), the inserted completion should
replace the entire Identifier node all the way to its last character, not only
to the current cursor position.
A limitation is that the correct replacement-until-end-of-identifier only works
when e.g. a function name is currently incomplete (which is likely anyway when
trying to replace it with a different one). This is because otherwise the
Identifier node gets replaced with a more specific function node type (like
`Rate`, `SumOverTime`, etc.), and handling all those adds more complexity.
https://github.com/prometheus/prometheus/issues/15839
Signed-off-by: Julius Volz <julius.volz@gmail.com>
The /classic/static/* route was added to serve vendor JavaScript and CSS
files (jQuery, Bootstrap, etc.) for console templates. These vendor assets
were removed in #14807 due to security vulnerabilities, making this route
obsolete as it now serves an empty directory.
The console feature remains functional via --web.console.templates and
--web.console.libraries flags. Users who need JavaScript/CSS libraries in
their custom console templates must provide these assets within the
directory specified by --web.console.libraries.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
When React components mount/unmount repeatedly, each creating a new
PromQLExtension, memory leaks occur due to LRU caches with ttlAutopurge
timers keeping references alive and in-flight HTTP requests holding
closure references. This adds destroy() methods throughout the class
hierarchy to properly release resources on unmount.
Changes:
- Add destroy() to PrometheusClient interface (optional)
- HTTPPrometheusClient: track AbortControllers and abort pending requests
- Cache: clear all LRU caches and reset cached data
- CachedPrometheusClient: delegate to cache and underlying client
- HybridComplete: delegate to prometheusClient
- CompleteStrategy: add optional destroy() method
- PromQLExtension: delegate to complete strategy
Signed-off-by: Ben Blackmore <ben.blackmore@dash0.com>
Previously the AWS SD ECS Role only discovered instances that used
`awsvpc` network mode, which attaches a dedicated Elastic Network
Interface (ENI). This change adds in additional logic so that we
discover instances that are using `host` and `bridge` networking modes,
where the IP address is that of the EC2 instance that is hosting the
container. Also this change exposes a number of additional labels that
relate to the EC2 instance when the launch type is `EC2`.
Signed-off-by: matt-gp <small_minority@hotmail.com>
The isZero() method was missing checks for 9 fields that exist in the
GlobalConfig struct. This caused the method to incorrectly return true
when only these fields had non-zero values, resulting in user
configurations being silently overwritten with defaults during YAML
unmarshaling.
Added checks for: BodySizeLimit, SampleLimit, TargetLimit, LabelLimit,
LabelNameLengthLimit, LabelValueLengthLimit, KeepDroppedTargets,
MetricNameValidationScheme, and MetricNameEscapingScheme.
Consolidated TestEmptyGlobalBlock and new isZero tests under
TestGlobalConfig.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* fix(scrape): use HonorLabels instead of HonorTimestamps in newScrapeLoop
The sampleMutator closure in newScrapeLoop was incorrectly passing
HonorTimestamps to mutateSampleLabels instead of HonorLabels. This
caused honor_labels configuration to be ignored, with the behavior
incorrectly controlled by honor_timestamps instead.
Adding TestNewScrapeLoopHonorLabelsWiring integration test that exercises
the real newScrapeLoop constructor with HonorLabels and HonorTimestamps
set to opposite values to catch this class of wiring bug.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Update scrape/scrape_test.go
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Add honor_labels=false test case
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
- Emit `Vary: Accept-Encoding` in newCompressedResponseWriter so shared caches
key responses by content-coding. This prevents cache poisoning and
undecodable bytes when a compressed variant is cached and later served to a
client that didn't advertise support. (RFC 9110 §12.5.5 "Vary";
RFC 9111 §4.1 cache key & Vary)
- When selecting gzip/deflate, set `Content-Encoding` and delete any existing
`Content-Length` so Go's net/http can frame the message correctly
(chunked for HTTP/1.1; implicit for HTTP/2+). This avoids stale length
mismatches and related proxy/client issues.
Signed-off-by: Joshua Rogers <MegaManSec@users.noreply.github.com>
This commit addresses the PR feedback for issue #17615. The previous
implementation could not distinguish between:
- No counter reset hint specified (meaning "don't care")
- counter_reset_hint:unknown explicitly specified (meaning "verify it's unknown")
Changes:
- Added CounterResetHintSet field to parser.SequenceValue to track
whether counter_reset_hint was explicitly specified in the test file
- Modified buildHistogramFromMap to set this flag when the hint is
present in the descriptor map
- Updated newHistogramSequenceValue helper and histogramsSeries
functions to propagate the flag through histogram series creation
- Updated yacc grammar to use the new helper function
- Modified compareNativeHistogram to accept the flag and only compare
hints when explicitly specified
This allows tests to:
1. Not specify a hint (no comparison, backward compatible)
2. Explicitly specify counter_reset_hint:unknown (verify it's unknown)
3. Explicitly specify counter_reset_hint:gauge/reset/not_reset (verify match)
Fixes#17615
Signed-off-by: aviralgarg05 <gargaviral99@gmail.com>
Update generated files with latest functions from Prometheus, adding
support for first_over_time, info, ts_of_first_over_time,
ts_of_last_over_time, ts_of_max_over_time, and ts_of_min_over_time.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
Add make targets to generate and check PromQL function signatures and
documentation for the Mantine UI. The generate-promql-functions target
runs the Go generators and automatically lints the output files. The
check-generated-promql-functions target verifies that generated files
are up to date, similar to check-generated-parser.
Fix the gen_functions_list generator to output properly formatted
TypeScript code with correct indentation and semicolons.
Add check-generated-promql-functions to the UI tests CI job to ensure
generated files stay in sync with upstream changes.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
This allows nested modules to reference the root prometheus/prometheus
module from the local filesystem instead of downloading versioned
releases. Improves development workflow and ensures CI tests against
current code.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
The test at line 1283 for avg_over_time(nhcb_metric[13m]) incorrectly
expected counter_reset_hint:gauge in the result. However, the actual
avg_over_time implementation does not explicitly set the CounterResetHint
to GaugeType on its output histogram.
With the new counter reset hint comparison logic added to the promqltest
framework (which compares hints when explicitly specified in expected
results), this incorrect expectation was now being caught.
This fix removes the incorrect counter_reset_hint:gauge from the expected
result, allowing the test to correctly verify the avg_over_time behavior
without asserting a specific hint value that the function does not set.
The counter reset hint comparison logic works as designed: if the expected
histogram has UnknownCounterReset (the default when not specified), no
comparison is performed. Only when a hint is explicitly specified in the
test expectation will it be compared against the actual result.
Fixes the test failure introduced by the counter reset hint comparison
feature in promqltest.
Signed-off-by: Aviral Garg <aviralg2106@gmail.com>
Signed-off-by: aviralgarg05 <gargaviral99@gmail.com>
This commit implements counter reset hint comparison in the promqltest
framework to address issue #17615. Previously, while test definitions
could specify a counter_reset_hint in expected native histogram results,
the framework did not actually compare this hint between expected and
actual results.
The implementation adds optional comparison logic to the
compareNativeHistogram function:
- If the expected histogram has UnknownCounterReset (the default),
the hint is not compared (meaning "don't care")
- If the expected histogram explicitly specifies CounterReset,
NotCounterReset, or GaugeType, it is verified against the actual
histogram's hint
This allows tests to verify that PromQL functions correctly set or
preserve counter reset hints while maintaining backward compatibility
with existing tests that don't specify explicit hints.
Fixes#17615
Signed-off-by: aviralgarg05 <gargaviral99@gmail.com>
See https://github.com/prometheus/prometheus/issues/16911
This will create a denser layout by default, enabling people to see more
information on the page without having to discover the global settings menu.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Refer to the existing CHANGELOG for inspiration: https://github.com/prometheus/prometheus/blob/main/CHANGELOG.md
A concrete example may look as follows (be sure to leave out the surrounding quotes): "[FEATURE] API: Add /api/v1/features for clients to understand which features are supported".
If you need help formulating your entries, consult the reviewer(s).
- [BUGFIX] Agent: fix crash shortly after startup from invalid type of object. #17802
- [BUGFIX] Scraping: fix relabel keep/drop not working. #17807
## 3.9.0 / 2026-01-06
- [CHANGE] Native Histograms are no longer experimental! Make the `native-histogram` feature flag a no-op. Use `scrape_native_histograms` config option instead. #17528
@ -49,7 +54,7 @@
* [FEATURE] OAuth2: support jwt-bearer grant-type (RFC7523 3.1). #17592
* [FEATURE] Dockerfile: Add OpenContainers spec labels to Dockerfile. #16483
* [FEATURE] SD: Add unified AWS service discovery for ec2, lightsail and ecs services. #17406
* [FEATURE] Native histograms are now a stable, but optional feature, use the `scrape_native_histogram` config setting. #17232#17315
* [FEATURE] Native histograms are now a stable, but optional feature, use the `scrape_native_histograms` config setting. #17232#17315
* [FEATURE] UI: Support anchored and smoothed keyword in promql editor. #17239
* [FEATURE] UI: Show detailed relabeling steps for each discovered target. #17337
* [FEATURE] Alerting: Add urlQueryEscape to template functions. #17403
@ -274,7 +279,7 @@
## 3.2.1 / 2025-02-25
* [BUGFIX] Don't send Accept` header `escape=allow-utf-8` when `metric_name_validation_scheme: legacy` is configured. #16061
* [BUGFIX] Don't send `Accept` header `escape=allow-utf-8` when `metric_name_validation_scheme: legacy` is configured. #16061
## 3.2.0 / 2025-02-17
@ -285,10 +290,10 @@
* [ENHANCEMENT] scrape: Add metadata for automatic metrics to WAL for `metadata-wal-records` feature. #15837
* [ENHANCEMENT] promtool: Support linting of scrape interval, through lint option `too-long-scrape-interval`. #15719
@ -14,7 +14,7 @@ Prometheus uses GitHub to manage reviews of pull requests.
of inspiration. Also please see our [non-goals issue](https://github.com/prometheus/docs/issues/149) on areas that the Prometheus community doesn't plan to work on.
* Relevant coding style guidelines are the [Go Code Review
@git diff --exit-code -- $(UI_PATH)/mantine-ui/src/promql/functionSignatures.ts $(UI_PATH)/mantine-ui/src/promql/functionDocs.tsx ||(echo"Generated PromQL function files are out of date. Please run 'make generate-promql-functions' and commit the changes."&&false)
@echo ">> updating Go dependencies in ./documentation/examples/remote_storage/"
@cd ./documentation/examples/remote_storage/ &&for m in $$($(GO) list -mod=readonly -m -f '{{ if and (not .Indirect) (not .Main)}}{{.Path}}{{end}}' all);do\
docker tag "$(DOCKER_REPO)/$(DOCKER_IMAGE_NAME)-linux-$*:$(SANITIZED_DOCKER_IMAGE_TAG)""$(DOCKER_REPO)/$(DOCKER_IMAGE_NAME)-linux-$*:latest"
docker tag "$(DOCKER_REPO)/$(DOCKER_IMAGE_NAME)-linux-$*:$(SANITIZED_DOCKER_IMAGE_TAG)""$(DOCKER_REPO)/$(DOCKER_IMAGE_NAME)-linux-$*:v$(DOCKER_MAJOR_VERSION_TAG)"
@for variant in $(DOCKERFILE_VARIANTS_WITH_NAMES);do\
echo"Tagging $$variant_name variant for linux-$* as latest";\
docker tag "$(DOCKER_REPO)/$(DOCKER_IMAGE_NAME)-linux-$*:$(SANITIZED_DOCKER_IMAGE_TAG)-$$variant_name""$(DOCKER_REPO)/$(DOCKER_IMAGE_NAME)-linux-$*:latest-$$variant_name";\
docker tag "$(DOCKER_REPO)/$(DOCKER_IMAGE_NAME)-linux-$*:$(SANITIZED_DOCKER_IMAGE_TAG)-$$variant_name""$(DOCKER_REPO)/$(DOCKER_IMAGE_NAME)-linux-$*:v$(DOCKER_MAJOR_VERSION_TAG)-$$variant_name";\
fi;\
if["$$dockerfile"="Dockerfile"];then\
echo"Tagging default variant ($$variant_name) for linux-$* as latest";\
docker tag "$(DOCKER_REPO)/$(DOCKER_IMAGE_NAME)-linux-$*:$(SANITIZED_DOCKER_IMAGE_TAG)""$(DOCKER_REPO)/$(DOCKER_IMAGE_NAME)-linux-$*:latest";\
docker tag "$(DOCKER_REPO)/$(DOCKER_IMAGE_NAME)-linux-$*:$(SANITIZED_DOCKER_IMAGE_TAG)""$(DOCKER_REPO)/$(DOCKER_IMAGE_NAME)-linux-$*:v$(DOCKER_MAJOR_VERSION_TAG)";\
fi;\
done
.PHONY:common-docker-manifest
common-docker-manifest:
DOCKER_CLI_EXPERIMENTAL=enabled docker manifest create -a "$(DOCKER_REPO)/$(DOCKER_IMAGE_NAME):$(SANITIZED_DOCKER_IMAGE_TAG)"$(foreach ARCH,$(DOCKER_ARCHS),$(DOCKER_REPO)/$(DOCKER_IMAGE_NAME)-linux-$(ARCH):$(SANITIZED_DOCKER_IMAGE_TAG))
*However*, when using `go install` to build Prometheus, Prometheus will expect to be able to
read its web assets from local filesystem directories under `web/ui/static` and
`web/ui/templates`. In order for these assets to be found, you will have to run Prometheus
from the root of the cloned repository. Note also that these directories do not include the
React UI unless it has been built explicitly using `make assets` or `make build`.
read its web assets from local filesystem directories under `web/ui/static`. In order for
these assets to be found, you will have to run Prometheus from the root of the cloned
repository. Note also that this directory does not include the React UI unless it has been
built explicitly using `make assets` or `make build`.
An example of the above configuration file can be found [here.](https://github.com/prometheus/prometheus/blob/main/documentation/examples/prometheus.yml)
@ -113,16 +113,31 @@ The Makefile provides several targets:
### Service discovery plugins
Prometheus is bundled with many service discovery plugins.
When building Prometheus from source, you can edit the [plugins.yml](./plugins.yml)
file to disable some service discoveries. The file is a yaml-formatted list of go
import path that will be built into the Prometheus binary.
Prometheus is bundled with many service discovery plugins. You can customize
which service discoveries are included in your build using Go build tags.
After you have changed the file, you
need to run `make build` again.
To exclude service discoveries when building with `make build`, add the desired
tags to the `.promu.yml` file under `build.tags.all`:
If you are using another method to compile Prometheus, `make plugins` will
generate the plugins file accordingly.
```yaml
build:
tags:
all:
- netgo
- builtinassets
- remove_all_sd # Exclude all optional SDs
- enable_kubernetes_sd # Re-enable only kubernetes
```
Then run `make build` as usual. Alternatively, when using `go build` directly:
```bash
go build -tags "remove_all_sd,enable_kubernetes_sd" ./cmd/prometheus
```
Available build tags:
* `remove_all_sd` - Exclude all optional service discoveries (keeps file_sd, static_sd, and http_sd)
* `enable_<name>_sd` - Re-enable a specific SD when using `remove_all_sd`
If you add out-of-tree plugins, which we do not endorse at the moment,
additional steps might be needed to adjust the `go.mod` and `go.sum` files. As
If you are interested in volunteering please create a pull request against the [prometheus/prometheus](https://github.com/prometheus/prometheus) repository and propose yourself for the release series of your choice.
logger.Warn("This option for --enable-feature is being phased out. It currently changes the default for the extra_scrape_metrics config setting to true, but will become a no-op in a future version. Stop using this option and set extra_scrape_metrics in the config instead.","option",o)
logger.Info("Experimental created timestamp zero ingestion enabled. Changed default scrape_protocols to prefer PrometheusProto format.","global.scrape_protocols",fmt.Sprintf("%v",config.DefaultGlobalConfig.ScrapeProtocols))
logger.Info("Experimental start timestamp zero ingestion enabled. Changed default scrape_protocols to prefer PrometheusProto format.","global.scrape_protocols",fmt.Sprintf("%v",config.DefaultGlobalConfig.ScrapeProtocols))
case"st-storage":
// TODO(bwplotka): Implement ST Storage as per PROM-60 and document this hidden feature flag.
c.tsdb.EnableSTStorage=true
c.agent.EnableSTStorage=true
// Change relevant global variables. Hacky, but it's hard to pass a new option or default to unmarshallers. This is to widen the ST support surface.