Add a maximum limit of 10,000 to the TSDB status endpoint to prevent
resource exhaustion from excessively large limit values, as we preallocate
[]Stat for up to the limit: `make([]Stat, 0, length)`.
Note that the endpoint acquires a cardinality mutex during
stats calculation, so this can not be run in parallel.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
This commit adds support for configuring a custom start timestamp
for Prometheus unit tests, allowing tests to use realistic timestamps
instead of starting at Unix epoch 0.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
The return value of functions relating to the current time, e.g. time(),
is set by promtool to start at timestamp 0 at the start of a test's
evaluation.
This has the very nice consequence that tests can run reliably without
depending on when they are run.
It does, however, mean that tests will give out results that can be
unexpected by users.
If this behaviour is documented, then users will be empowered to write
tests for their rules that use time-dependent functions.
(Closes: prometheus/docs#1464)
Signed-off-by: Gabriel Filion <lelutin@torproject.org>
* Delay compactions until Thanos uploads all blocks
Using Thanos sidecar with Prometheus requires us to disable TSDB compactions on Prometheus side by setting --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to the same value. See https://thanos.io/tip/components/sidecar.md. The main problem this avoids is that Prometheus might compact given block before Thanos uploads it, creating a gap in Thanos metrics. Thanos does not upload compacted blocks because that would upload the same sample multiple times. You can tell Thanos to upload compacted blocks but that is aimed at one time migrations. This patch creates a bridge between Thanos and Prometheus by allowing Prometheus to read the shipper file Thanos creates, where it tracks which blocks were already uploaded, and using that data delays compaction of blocks until they are marked as uploaded by Thanos. Thanks to this both services can coordinate with each other (in a way) and we can stop disabling compaction on Prometheus side when Thanos uploads are enabled.
The reason to have this is that disabling compactions have very dramatic performance cost. Since most time series exist for longer than a single block duration (2h by default) large chunks of block index will reference the same series, so 10 * 2h blocks will each have an index that is usually fairly big and is almost the same for all 10 blocks. Compaction de-duplicates the index so merging 10 blocks together would leave us with a single index that is around the same size as each of these 10 2h blocks would have (plus some extra for series that only exists in some blocks, but not all). Every range query that iterates over all 10 blocks would then have to read each index and so we're doing 10x more work then if we had a single compacted block.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
* Rename structs and functions to make this more generic
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
* Address review comments
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
* Cache UploadMeta for 1 minute
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
---------
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
* Add a nav title to fix docs website generator.
* Make it more clear that "Prometheus Agent" is a mode, not a seaparate
service.
* Add to index.
* Cleanup some wording.
* Add a downsides section.
Signed-off-by: SuperQ <superq@gmail.com>
Corrected the structure of the explanation regarding how samples are organized in the chunks directory and the handling of deletion records.
Signed-off-by: Raul Leite <sp4wn.root@gmail.com>
Partially fixes https://github.com/prometheus/prometheus/issues/17416 by
renaming all CT* names to ST* in the whole codebase except RW2 (this is
done in separate
[PR](https://github.com/prometheus/prometheus/pull/17411)) and
PrometheusProto exposition proto.
```
CreatedTimestamp -> StartTimestamp
CreatedTimeStamp -> StartTimestamp
created_timestamp -> start_timestamp
CT -> ST
ct -> st
```
Signed-off-by: bwplotka <bwplotka@gmail.com>
* add feature flag for remote write v2
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* change from number to protobuf_message
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* fix test
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* fix name
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* run make cli-documentation
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* fix help
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* run make cli-documentation
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
---------
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
Fixes: #16572
Mark as stable means that breaking changes are only allowed together with
major version release of Prometheus.
Co-authored-by: Björn Rabenstein <beorn@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
As a follow-up to #17344, add two configuration parameters for controlling label
name translation, both defaulting to on for backwards compatibility (currently
these behaviours are hardcoded as enabled):
* otlp.label_name_underscore_sanitization => Prefix label names starting with a
single underscore with key_ when translating OTel attribute names
* otlp.label_name_preserve_multiple_underscores => Keep multiple consecutive
underscores in label names when translating OTel attribute names
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
This adds:
* A `ScrapePoolConfig()` method to the scrape manager that allows getting
the scrape config for a given pool.
* An API endpoint at `/api/v1/targets/relabel_steps` that takes a pool name
and a label set of a target and returns a detailed list of applied
relabeling rules and their output for each step.
* A "show relabeling" link/button for each target on the discovery page
that shows the detailed flow of all relabeling rules (based on the API
response) for that target.
Note that this changes the JSON encoding of the relabeling rule config
struct to output the original snake_case (instead of camelCase) field names,
and before merging, we need to be sure that's ok :) See my comment about
that at https://github.com/prometheus/prometheus/pull/15383#issuecomment-3405591487
Fixes https://github.com/prometheus/prometheus/issues/17283
Signed-off-by: Julius Volz <julius.volz@gmail.com>
The detailed plan for this is laid out in
https://github.com/prometheus/prometheus/issues/16572 .
This commit adds a global and local scrape config option
`scrape_native_histograms`, which has to be set to true to ingest
native histograms.
To ease the transition, the feature flag is changed to simply set the
default of `scrape_native_histograms` to true.
Further implications:
- The default scrape protocols now depend on the
`scrape_native_histograms` setting.
- Everywhere else, histograms are now "on by default".
Documentation beyond the one for the feature flag and the scrape
config are deliberately left out. See
https://github.com/prometheus/prometheus/pull/17232 for that.
Signed-off-by: beorn7 <beorn@grafana.com>
Adds a `config` label (similar to `prometheus_sd_discovered_targets`) to
refresh metrics to help identify the source of refresh issues or
performance stats. In particular for HTTP SD, it can be common to have
multiple disparate HTTP SD sources that should be identified and not
lumped together. For example if one HTTP SD service has failures, that
should be evident in its own time series seperate from other HTTP SD
sources.
`config` seemed more appropriate than `endpoint` as a general standard
for `prometheus_sd` metrics.
Docs were also updated for HTTP SD to point at the new refresh metrics
rather than the older metrics.
Signed-off-by: Will Bollock <wbollock@linode.com>
* Add anchored and smoothed to vector selectors.
This adds "anchored" and "smoothed" keywords that can be used following a matrix selector.
"Anchored" selects the last point before the range (or the first one after the range) and adds it at the boundary of the matrix selector.
"Smoothed" applies linear interpolation at the edges using the points around the edges. In the absence of a point before or after the edge, the first or the last point is added to the edge, without interpolation.
*Exemple usage*
* `increase(caddy_http_requests_total[5m] anchored)` (equivalent of *caddy_http_requests_total - caddy_http_requests_total offset 5m* but takes counter reset into consideration)
* `rate(caddy_http_requests_total[step()] smoothed)`
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* Update docs/feature_flags.md
Co-authored-by: Charles Korn <charleskorn@users.noreply.github.com>
Signed-off-by: Julien <291750+roidelapluie@users.noreply.github.com>
* Smoothed/Anchored rate: Add more tests
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* Anchored/Smoothed modifier: error out with histograms
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
---------
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
Signed-off-by: Julien <291750+roidelapluie@users.noreply.github.com>
Co-authored-by: Charles Korn <charleskorn@users.noreply.github.com>
Because the label API endpoints read from the TSDB indexes, they can
return information for series which are present in the index but have no
samples in the queried interval.
Add similar note for the series endpoint.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Add a first_over_time function, and corresponding ts_of_first_over_time
function. Both are behind the experimental functions feature flag.
Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
We don't want do backup WAL files, so we should show how to actually
ignore those files.
Also explain what happens every 2 hours a little more clearly.
Move things around so the paragraphs flow more easily.
Followup for #14297.
Signed-off-by: Antoine Beaupré <anarcat@debian.org>
Co-authored-by: Bryan Boreham <bjboreham@gmail.com>
This mostly handles the cases mentioned in #16576. However, there are
some related changes in here, too:
- Some line formatting to avoid lines longer than 80 characters.
- Establish in basics.md that histograms have a counter vs. gauge
"flavor" that is also stored in the sample and not just by
convention as for float samples.
- Add the documentation of the unary minus, which was missing so far.
This require a bit of restructuring.
- Cleaned up a few references to "Prometheus" that should better refer
to "PromQL" (and "Prometheus's query language" → "PromQL" etc.).
I decided to not explain in all detail when and how PromQL detects an
incompatible counter reset. The spec is linked from basics.md, so the
minority that might be interested in this can still look it up.
Signed-off-by: beorn7 <beorn@grafana.com>
* Support including scope metadata as metric labels
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* Ensure Scope Name, Version and Schema URL aren't overriden by attributes
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
---------
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
The last permutation of the translation options does underscore translation but does not add suffixes.
This translation option already exists in Mimir as otel_metric_suffixes_enabled, indicating external demand for this strategy.
There is an accompanying update to prometheus-docs to explain the use of this mode: https://github.com/prometheus/docs/pull/2688
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Presumably, this will help with Loki alerts, but the added functionality is also generally useful.
For one, this enables `parseDuration` to also accept negative duration (as that's something that is also used in PromQL by now).
This also adds a function `now` to return the evaluation time of the template (as seconds since epoch AKA Unix time) and a function `toDuration` (akin to `toTime`), which creates a Go `time.Duration` from a duration in seconds.
---------
Signed-off-by: Dmitry Ponomaryov <me@halje.ru>
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
Reverts #16730 and #16760
This is being done because we've noticed a problem in the spec that could
lead to name collisions if attributes name, version or schema_url are added
to the scope. They would collide with the already reserved labels
otel_scope_name, otel_scope_version and otel_scope_schema_url.
Since this new configuration option never made it into a release, we can
safely remove it from the 3.5 release. We'll sort this out for the 3.6 release
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
step() is a new keyword introduced to represent the query step width in duration expressions.
min(a,b) and max(a,b) return the min and max from two duration expressions.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* fix(promql): histogram_quantile NaN observed in native histogram
Fixes: #16578
See the issue for detailed explanation.
When a histogram had only NaN observations and no normal observations,
we returned 0 from the quantile, which is completely wrong. If there were
normal observations but we went over them, we returned the upper bound of
the existing buckets, however that contradicts expectations on
histogram_fraction. Now we return NaN if the quantile is calculated to be
over all normal observations, falling into NaNs (in a virtual +Inf bucket).
We also return info level annotations if we see any NaN observations.
The annotation calls out if we returned NaN or even if we took the
virtual +Inf bucket into account.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* fix(promql): histogram_fraction NaN observed in native histogram
Fixes: #16580
According to the specification we should not take NaN observations
into account when calculating the fraction. This commit fixes that
and adds an info level annotation to let the user know about this.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
This commit adds the ts_of_(max,min,last)_over_time functions behind the experimental feature flag.
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>