* notifier: unit test for dropping throughput on stuck AM
Ref: https://github.com/prometheus/prometheus/issues/7676
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* chore(notifier): remove year from copyrights
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* feat(notifier): independent alertmanager sendloops
Independent Alertmanager sendloops avoid issues with the queue
overflowing when one or more Alertmanager instances are unavailable,
which could result in lost alert notifications.
The sendloops are managed per AlertmanagerSet and are dynamically
added/removed with service discovery or configuration reload.
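A minimal sketch of the idea, not the code in this change: each
Alertmanager gets its own bounded queue and loop, so a stuck instance
only overflows its own queue. All names here (Alert, sendLoop, enqueue,
nextBatch, run, fanOut) are hypothetical, and dropping the oldest alerts
on overflow is an assumption made for illustration.

```go
package main

import "sync"

// Alert is a hypothetical stand-in for a real alert payload.
type Alert struct{ Name string }

// sendLoop owns a bounded queue for a single Alertmanager.
// Illustrative only; this is not the notifier package's API.
type sendLoop struct {
	mu       sync.Mutex
	queue    []Alert
	capacity int
	dropped  int           // per-Alertmanager count of dropped alerts
	more     chan struct{} // signals that the queue may be non-empty
}

func newSendLoop(capacity int) *sendLoop {
	return &sendLoop{capacity: capacity, more: make(chan struct{}, 1)}
}

// enqueue appends alerts and drops the oldest ones when over capacity
// (assumed policy). It never blocks, so a stuck Alertmanager cannot
// stall the caller or the other loops.
func (s *sendLoop) enqueue(alerts ...Alert) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.queue = append(s.queue, alerts...)
	if over := len(s.queue) - s.capacity; over > 0 {
		s.queue = s.queue[over:]
		s.dropped += over
	}
	select {
	case s.more <- struct{}{}:
	default: // a wake-up is already pending
	}
}

// nextBatch removes and returns up to limit queued alerts.
func (s *sendLoop) nextBatch(limit int) []Alert {
	s.mu.Lock()
	defer s.mu.Unlock()
	n := len(s.queue)
	if n > limit {
		n = limit
	}
	batch := append([]Alert(nil), s.queue[:n]...)
	s.queue = s.queue[n:]
	return batch
}

// run drains the queue until stop is closed, calling send per batch.
// A slow send only delays this loop's own queue.
func (s *sendLoop) run(stop <-chan struct{}, send func([]Alert)) {
	for {
		select {
		case <-stop:
			return
		case <-s.more:
			for b := s.nextBatch(64); len(b) > 0; b = s.nextBatch(64) {
				send(b)
			}
		}
	}
}

// fanOut copies every alert into each Alertmanager's own queue.
func fanOut(loops map[string]*sendLoop, alerts ...Alert) {
	for _, l := range loops {
		l.enqueue(alerts...)
	}
}
```

In this sketch a caller would start one `go loop.run(stop, send)` per
Alertmanager and close `stop` when that instance is removed by service
discovery or a configuration reload.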
The following metrics now include an extra dimension for the alertmanager label:
- prometheus_notifications_dropped_total
- prometheus_notifications_queue_capacity
- prometheus_notifications_queue_length
This change also includes the test from #14099
Closes #7676
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>