We discussed IRL.Nico no longer has time to contribute.
This also syncs the file with CODEOWNERS.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* chore(sd-ownership): add default-maintainers as default code owner
In accordance with dev summit decision.
At the same time I've set up auto assignment for code review, meaning
that not everybody will get notified for all PRs. If there's already
a maintainer assigned, you don't get notified. Otherwise the
assignment is round-robin, 1 at a time. Also you can opt out.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Remove code owner without write access
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Add waitForQueryLog helper that polls for query log entries to appear
before asserting, rather than reading the file immediately after making
a query. This fixes a race condition where the query log wasn't flushed
to disk before the test read the file.
The helper uses a 5 second timeout with 100ms polling intervals, which
is generous enough to handle slow CI environments while keeping the test
responsive.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
tsdb: Early compaction of stale series
Closes#13616
Based on https://github.com/prometheus/proposals/pull/55
Stale series tracking was added in #16925. This PR compacts the stale series into its own block before the normal compaction hits. Here is how the settings:
stale_series_compaction_threshold: As soon as the ratio of stale series in the head block crosses StaleSeriesImmediateCompactionThreshold, TSDB performs a stale series compaction and puts all the stale series into a block and removed it from the head, but it does not remove it from the WAL. (technically this condition is checked every minute and not exactly immediate)
Additional details
WAL replay: after a stale series compaction, tombstones are added with (MinInt64, MaxInt64) for all these stale series. During WAL replay we add a special condition where when we find such tombstone, it immediately removes the series from the memory instead of storing the tombstone. This is required so that we don't spike up memory during WAL replay and also don't keep the compacted stale series in the memory.
Head block truncation ignores this block via the added metadata, similar to out-of-order blocks.
* otlptranslator: filter __name__ from OTLP attributes to prevent duplicates
OTLP metrics can have a __name__ attribute which, when combined with the
metric name passed via extras, creates duplicate __name__ labels.
This commit implements filtering out of any __name__ metric attribute from OTLP.
Also rename TestCreateAttributes to TestPrometheusConverter_createAttributes
for consistency, and add test cases for __name__, __type__, and __unit__ OTLP metric attributes.
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Add a test for `LeveledCompactor.Plan()` stopping after a block matches the
`BlockExcludeFilter`, as a sub-test
`TestLeveledCompactor/Plan/BlockExcludeFilter stops iteration`.
Also moving `TestLeveledCompactor_plan` to a sub-test
of `TestLeveledCompactor`, for consistency.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Add bounds check to prevent index out of range panic when
trimStringByBytes receives a string containing only UTF-8 continuation
bytes (0x80-0xBF). Previously, the loop would decrement size below 0
when no valid rune start byte was found, causing a panic.
A malicious query string with only continuation bytes could crash
the Prometheus server via the ActiveQueryTracker before the query
was parsed or validated.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
These bugs were discovered accidentally with code analysis:
- https://app.devin.ai/review/prometheus/prometheus/pull/16355
Upon further inspection and performing more analysis, 3 potential bugs were found:
1. sendloops could continue running if corresponding AM changed position in the config
2. multiple configs with the same hash would share sendloops resulting in sets without sendloops
3. sendloops could continue running if the config hash was changed
- `TestApplyConfigSendLoopsNotStoppedOnKeyChange`: Verifies sendLoops work when keys swap (no fix needed)
- `TestApplyConfigDuplicateHashSharesSendLoops`: Verifies sendLoops are independent with duplicate hashes (bug fixed)
- `TestApplyConfigHashChangeLeaksSendLoops`: Verifies sendLoops are cleaned up when hash changes (bug fixed)
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* fix(teststorage/appender.go): TODO and Sample staleness check
Allow different order of consecutive stale samples between the expected
and actual array for RequireEqual and RequireNotEqual by trying to
swap the expected side until it matches.
Also fix the definition of stale sample in the test, it's not only
float, but defined for native histograms as well.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* add unit tests
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
testutil.T was needed before https://go.dev/doc/go1.13#testingpkgtesting
Now it's inconsistent and confusing, so let's kill it.
Signed-off-by: bwplotka <bwplotka@gmail.com>
Update go.work from go 1.24.9 to go 1.24.0 to match the version
specified in all go.mod files across the project
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
ARG declarations before FROM are only available within the FROM
instruction and go out of scope afterward. Re-declare ARCH and OS
after FROM so they're available for the COPY instructions.
This fixes the build failure where ${OS}-${ARCH} resolved to empty
strings, causing "not found" errors for .build/-/prometheus.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* notifier: unit test for dropping throughput on stuck AM
Ref: https://github.com/prometheus/prometheus/issues/7676
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* chore(notifier): remove year from copyrights
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* feat(notifier): independent alertmanager sendloops
Independent Alertmanager sendloops avoid issues with queue overflowing
when one or more Alertmanager instances are unavailable which could
result in lost alert notifications.
The sendloops are managed per AlertmanagerSet which are dynamically
added/removed with service discovery or configuration reload.
The following metrics now include an extra dimention for alertmanager label:
- prometheus_notifications_dropped_total
- prometheus_notifications_queue_capacity
- prometheus_notifications_queue_length
This change also includes the test from #14099Closes#7676
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>