* Adding scrape on shutdown
Signed-off-by: avilevy <avilevy@google.com>
* scrape: replace skipOffsetting to make the test offset deterministic instead of skipping it entirely
Signed-off-by: avilevy <avilevy@google.com>
* renamed calculateScrapeOffset to getScrapeOffset
Signed-off-by: avilevy <avilevy@google.com>
* test(scrape): refactor time-based manager tests to use synctest
Addresses PR feedback to remove flaky, time-based sleeping in the scrape manager tests.
Add TestManager_InitialScrapeOffset and TestManager_ScrapeOnShutdown, built on the testing/synctest package, eliminating real time.Sleep delays and making the assertions fully deterministic (see the sketch below).
- Replaced httptest.Server with net.Pipe and a custom startFakeHTTPServer helper to ensure all network I/O remains durably blocked inside the synctest bubble.
- Leveraged the skipOffsetting option to eliminate random scrape jitter, making the time-travel math exact and predictable.
- Using skipOffsetting also safely bypasses the global singleflight DNS lookup in setOffsetSeed, which previously caused cross-bubble panics in synctest.
- Extracted shared boilerplate into a setupSynctestManager helper to keep the test cases highly readable and data-driven.
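For reference, a minimal sketch of the synctest mechanics these tests rely on (assuming Go 1.25's stable `testing/synctest.Test` API; the test body here is illustrative, not one of the actual manager tests):
```
package scrape

import (
	"testing"
	"testing/synctest"
	"time"
)

func TestSynctestSketch(t *testing.T) {
	synctest.Test(t, func(t *testing.T) {
		// Inside the bubble the clock is virtual: Sleep returns as soon as
		// every goroutine in the bubble is durably blocked, and the elapsed
		// virtual time is exact, so offset assertions become deterministic.
		start := time.Now()
		time.Sleep(5 * time.Minute)
		if got := time.Since(start); got != 5*time.Minute {
			t.Fatalf("expected exactly 5m of virtual time, got %v", got)
		}
	})
}
```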
Signed-off-by: avilevy <avilevy@google.com>
* Clarify use cases in InitialScrapeOffset comment
Signed-off-by: avilevy <avilevy@google.com>
* test(scrape): use httptest for mock server to respect context cancellation
- Replaced manual HTTP string formatting over `net.Pipe` with `httptest.NewUnstartedServer`.
- Implemented an in-memory `pipeListener` (sketched below) to allow the server to handle `net.Pipe` connections directly. This preserves `synctest` time isolation without opening real OS ports.
- Added explicit `r.Context().Done()` handling in the mock HTTP handler to properly simulate aborted requests and scrape timeouts.
- Validates that the request context remains active and is not prematurely cancelled during `ScrapeOnShutdown` scenarios.
- Renamed `skipOffsetting` to `skipJitterOffsetting`.
- Addressed other PR comments.
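A rough shape of the in-memory listener idea (hypothetical names; the real helper lives in the test files and may differ in detail):
```
package scrape

import "net"

// pipeListenerSketch satisfies net.Listener without opening an OS socket:
// Dial hands out the client half of a net.Pipe and queues the server half
// for Accept, so httptest.Server can serve over pure in-memory connections.
type pipeListenerSketch struct {
	conns chan net.Conn
	done  chan struct{}
}

func newPipeListenerSketch() *pipeListenerSketch {
	return &pipeListenerSketch{conns: make(chan net.Conn), done: make(chan struct{})}
}

func (l *pipeListenerSketch) Dial() (net.Conn, error) {
	client, server := net.Pipe()
	select {
	case l.conns <- server:
		return client, nil
	case <-l.done:
		return nil, net.ErrClosed
	}
}

func (l *pipeListenerSketch) Accept() (net.Conn, error) {
	select {
	case c := <-l.conns:
		return c, nil
	case <-l.done:
		return nil, net.ErrClosed
	}
}

func (l *pipeListenerSketch) Close() error   { close(l.done); return nil }
func (l *pipeListenerSketch) Addr() net.Addr { return &net.UnixAddr{Name: "pipe", Net: "unix"} }
```
It can then be plugged in roughly via `srv := httptest.NewUnstartedServer(handler); srv.Listener = ln; srv.Start()`, with the test HTTP client dialing through `ln.Dial`.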
Signed-off-by: avilevy <avilevy@google.com>
* tmp
Signed-off-by: bwplotka <bwplotka@gmail.com>
* exp2
Signed-off-by: bwplotka <bwplotka@gmail.com>
* fix
Signed-off-by: bwplotka <bwplotka@gmail.com>
* scrape: fix scrapeOnShutdown context bug and refactor test helpers
The scrapeOnShutdown feature was failing during manager shutdown because
the scrape pool context was being cancelled before the final shutdown
scrapes could execute. Fix this by delaying context cancellation
in scrapePool.stop() until after all scrape loops have stopped.
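In simplified form, the ordering the fix establishes (a sketch with stand-in types, not the actual scrapePool code):
```
package scrape

import "sync"

type loopSketch interface{ stop() }

// poolSketch is a stripped-down stand-in for scrapePool.
type poolSketch struct {
	mtx    sync.Mutex
	loops  map[uint64]loopSketch
	cancel func() // cancels the pool context
}

func (sp *poolSketch) stop() {
	sp.mtx.Lock()
	var wg sync.WaitGroup
	for fp, l := range sp.loops {
		wg.Add(1)
		go func(l loopSketch) {
			defer wg.Done()
			l.stop() // may run one final scrape when scrape-on-shutdown is enabled
		}(l)
		delete(sp.loops, fp)
	}
	sp.mtx.Unlock()
	wg.Wait()

	// Cancel only after every loop has fully stopped; cancelling before
	// wg.Wait() aborts the final shutdown scrapes, which was the bug.
	sp.cancel()
}
```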
In addition:
- Added test cases to verify scrapeOnShutdown works with InitialScrapeOffset.
- Moved the network test helper functions from manager_test.go into
helpers_test.go.
- Addressed other comments.
Signed-off-by: avilevy <avilevy@google.com>
* Update scrape/scrape.go
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: avilevy18 <105948922+avilevy18@users.noreply.github.com>
---------
Signed-off-by: avilevy <avilevy@google.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: avilevy18 <105948922+avilevy18@users.noreply.github.com>
Co-authored-by: bwplotka <bwplotka@gmail.com>
Remove the separate scrapeFailureLoggerMtx and use targetMtx instead
for synchronizing access to scrapeFailureLogger. This fixes a data race
where Sync() would read scrapeFailureLogger while holding targetMtx but
SetScrapeFailureLogger() would write to it while holding a different mutex.
Add regression test to catch concurrent access issues.
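Schematically, both sides now take the same lock (a sketch with simplified types; the real pool guards more state under targetMtx):
```
package scrape

import "sync"

type poolLockSketch struct {
	targetMtx           sync.Mutex
	scrapeFailureLogger any // placeholder for the real logger type
}

// Writer: previously guarded by a dedicated scrapeFailureLoggerMtx.
func (sp *poolLockSketch) SetScrapeFailureLogger(l any) {
	sp.targetMtx.Lock()
	defer sp.targetMtx.Unlock()
	sp.scrapeFailureLogger = l
}

// Reader, as in Sync(): with both paths on the same mutex there is no
// longer a data race on scrapeFailureLogger.
func (sp *poolLockSketch) failureLogger() any {
	sp.targetMtx.Lock()
	defer sp.targetMtx.Unlock()
	return sp.scrapeFailureLogger
}
```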
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
Otherwise performance is dominated by adding to a slice that gets longer
and longer as the benchmark progresses.
I chose to Rollback rather than Commit because that should do less work.
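The shape of the pattern (a hypothetical sketch, not the commit's actual benchmark):
```
package scrape

import (
	"context"
	"testing"

	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/storage"
)

func benchmarkAppendSketch(b *testing.B, st storage.Storage) {
	ctx := context.Background()
	lbls := labels.FromStrings("__name__", "bench_metric")
	for i := 0; b.Loop(); i++ {
		app := st.Appender(ctx)
		_, _ = app.Append(0, lbls, int64(i), float64(i))
		// Rollback rather than Commit: it should do less work, and either
		// way the pending batch no longer grows for the whole benchmark.
		_ = app.Rollback()
	}
}
```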
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* fix(scrape): use HonorLabels instead of HonorTimestamps in newScrapeLoop
The sampleMutator closure in newScrapeLoop was incorrectly passing
HonorTimestamps to mutateSampleLabels instead of HonorLabels. This
caused honor_labels configuration to be ignored, with the behavior
incorrectly controlled by honor_timestamps instead.
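The closure in question, roughly (a fragment abbreviated from newScrapeLoop; the fix swaps which flag is passed):
```
sampleMutator := func(l labels.Labels) labels.Labels {
	// was: mutateSampleLabels(l, target, honorTimestamps, mrc) -- the bug
	return mutateSampleLabels(l, target, honorLabels, mrc)
}
```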
Adding TestNewScrapeLoopHonorLabelsWiring integration test that exercises
the real newScrapeLoop constructor with HonorLabels and HonorTimestamps
set to opposite values to catch this class of wiring bug.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Update scrape/scrape_test.go
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Add honor_labels=false test case
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
This adds the following native histograms (with a few classic buckets for backwards compatibility), while keeping the corresponding summaries (same name, just without `_histogram`); a declaration sketch follows the list:
- `prometheus_sd_refresh_duration_histogram_seconds`
- `prometheus_rule_evaluation_duration_histogram_seconds`
- `prometheus_rule_group_duration_histogram_seconds`
- `prometheus_target_sync_length_histogram_seconds`
- `prometheus_target_interval_length_histogram_seconds`
- `prometheus_engine_query_duration_histogram_seconds`
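For illustration, such a dual classic+native histogram can be declared with client_golang roughly like this (the bucket layout and thresholds here are made up, not the ones actually chosen):
```
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var refreshDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name: "prometheus_sd_refresh_duration_histogram_seconds",
	Help: "Duration of a service discovery refresh.",
	// A few classic buckets for backwards compatibility.
	Buckets: []float64{0.01, 0.1, 1, 10},
	// Native histogram: exponential buckets with ~10% relative growth.
	NativeHistogramBucketFactor:     1.1,
	NativeHistogramMaxBucketNumber:  100,
	NativeHistogramMinResetDuration: time.Hour,
})
```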
Signed-off-by: Harsh <harshmastic@gmail.com>
Signed-off-by: harsh kumar <135993950+hxrshxz@users.noreply.github.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
ReduceResolution is currently called before validation during
ingestion. This will cause a panic if there are not enough buckets in
the histogram. If there are too many buckets, the spurious buckets are
ignored, and therefore the error in the input histogram is masked.
Furthermore, invalid negative offsets might cause problems, too.
Therefore, we need to do some minimal validation in reduceResolution.
Fortunately, it is easy and shouldn't slow things down. Sadly, it
requires returning errors, which triggers a bunch of code changes.
Even this has a bright side: we can get rid of a few panics. (Remember:
Don't panic!)
In different news, we haven't done a full validation of histograms
read via remote-read. This is not so much a security concern (as you
can throw off Prometheus easily by feeding it bogus data via
remote-read) but more that remote-read sources might be makeshift and
could accidentally create invalid histograms. We really don't want to
panic in that case. So this commit does not only add a check of the
spans and buckets as needed for resolution reduction but also a full
validation during remote-read.
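The kind of minimal check involved, as a sketch (the real function lives in model/histogram and differs in detail):
```
package histogram

import "fmt"

// checkSpansSketch verifies the two invariants needed before resolution
// reduction: spans after the first must not have negative offsets, and the
// spans' total length must match the number of buckets provided.
func checkSpansSketch(spans []Span, numBuckets int) error {
	var want int
	for i, s := range spans {
		if i > 0 && s.Offset < 0 {
			return fmt.Errorf("span %d has negative offset %d", i, s.Offset)
		}
		want += int(s.Length)
	}
	if want != numBuckets {
		return fmt.Errorf("spans need %d buckets, have %d", want, numBuckets)
	}
	return nil
}
```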
Signed-off-by: beorn7 <beorn@grafana.com>
Partially fixes https://github.com/prometheus/prometheus/issues/17416 by
renaming all CT* names to ST* in the whole codebase except RW2 (this is
done in a separate
[PR](https://github.com/prometheus/prometheus/pull/17411)) and
PrometheusProto exposition proto.
```
CreatedTimestamp -> StartTimestamp
CreatedTimeStamp -> StartTimestamp
created_timestamp -> start_timestamp
CT -> ST
ct -> st
```
Signed-off-by: bwplotka <bwplotka@gmail.com>
TestScrapeLoop_HistogramBucketLimit tests the bucket limiter but it also sets sample_limit to the same value, which seems incorrect.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Currently histograms bypass sample_limit logic as the limitAppender only implements the Append() method, while histograms are appended using AppendHistogram.
This means that they are effectively ignored during sample_limit checks and a scrape with sample_limit=100 and 500 histograms will accept all samples.
Add AppendHistogram method to the limitAppender so histograms are also counted towards sample_limit.
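Sketch of the added method (simplified; the real limitAppender's Append also special-cases stale markers, and the histogram path may differ):
```
// AppendHistogram makes histogram samples count toward sample_limit,
// mirroring what Append already does for float samples.
func (app *limitAppender) AppendHistogram(ref storage.SeriesRef, lset labels.Labels, t int64,
	h *histogram.Histogram, fh *histogram.FloatHistogram,
) (storage.SeriesRef, error) {
	app.i++
	if app.i > app.limit {
		return 0, errSampleLimit
	}
	return app.Appender.AppendHistogram(ref, lset, t, h, fh)
}
```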
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
This adds:
* A `ScrapePoolConfig()` method to the scrape manager that allows getting
the scrape config for a given pool (sketched below).
* An API endpoint at `/api/v1/targets/relabel_steps` that takes a pool name
and a label set of a target and returns a detailed list of applied
relabeling rules and their output for each step.
* A "show relabeling" link/button for each target on the discovery page
that shows the detailed flow of all relabeling rules (based on the API
response) for that target.
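The accessor could look roughly like this (a sketch; the PR's actual signature and the manager's internal field names may differ):
```
// ScrapePoolConfig returns the scrape config currently in use for the
// given scrape pool, if any.
func (m *Manager) ScrapePoolConfig(pool string) (*config.ScrapeConfig, bool) {
	m.mtxScrape.Lock()
	defer m.mtxScrape.Unlock()
	cfg, ok := m.scrapeConfigs[pool]
	return cfg, ok
}
```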
Note that this changes the JSON encoding of the relabeling rule config
struct to output the original snake_case (instead of camelCase) field names,
and before merging, we need to be sure that's ok :) See my comment about
that at https://github.com/prometheus/prometheus/pull/15383#issuecomment-3405591487
Fixes https://github.com/prometheus/prometheus/issues/17283
Signed-off-by: Julius Volz <julius.volz@gmail.com>
The detailed plan for this is laid out in
https://github.com/prometheus/prometheus/issues/16572 .
This commit adds a global and local scrape config option
`scrape_native_histograms`, which has to be set to true to ingest
native histograms.
To ease the transition, the feature flag is changed to simply set the
default of `scrape_native_histograms` to true.
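A minimal configuration sketch, assuming the option keeps this shape at both the global and per-job level:
```
global:
  scrape_native_histograms: true

scrape_configs:
  - job_name: classic-only
    scrape_native_histograms: false
```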
Further implications:
- The default scrape protocols now depend on the
`scrape_native_histograms` setting.
- Everywhere else, histograms are now "on by default".
Documentation beyond that for the feature flag and the scrape
config is deliberately left out. See
https://github.com/prometheus/prometheus/pull/17232 for that.
Signed-off-by: beorn7 <beorn@grafana.com>
Applied the analyzer "modernize" to the test files.
$ go run golang.org/x/tools/gopls/internal/analysis/modernize/cmd/modernize@latest -fix -test ./...
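Illustrative examples of the kinds of rewrites modernize applies (not taken from this diff):
```
package scrape

import "slices"

func modernizedSketch(xs []int) {
	// was: sort.Slice(xs, func(i, j int) bool { return xs[i] < xs[j] })
	slices.Sort(xs)

	// was: var v interface{}
	var v any
	_ = v

	// was: for i := 0; i < 3; i++ { ... }
	for range 3 {
	}
}
```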
Signed-off-by: beorn7 <beorn@grafana.com>