Commit graph

73 commits

Author SHA1 Message Date
Alexander A. Klimov
7f20cf2e33 Test icingaredis.CreateEntities() 2024-08-05 12:45:32 +02:00
Eric Lippmann
7c068d4adf Use icinga-go-library 2024-05-24 09:56:28 +02:00
Eric Lippmann
c070615e64 Move Redis related code to redis 2024-05-22 11:51:22 +02:00
Eric Lippmann
aa3c00893f Move contracts#Waiter{,Func} to com#Waiter{,Func} 2024-05-22 11:51:21 +02:00
Eric Lippmann
77ccdfc303 Move type related utility functions from internal to types 2024-05-22 11:51:21 +02:00
Eric Lippmann
2f3bf491d7 Move utils#Name() to types#Name() 2024-05-22 11:51:21 +02:00
Eric Lippmann
e2b4f0297f Introduce strcase for converting string cases 2024-05-22 11:51:21 +02:00
Eric Lippmann
75501e11f8 Move database related contracts to database/contracts 2024-05-22 11:51:21 +02:00
Eric Lippmann
5029e328c8 Unify notation of n * time.Duration 2024-04-11 13:01:31 +02:00
Alvar Penning
779afd1da3 Enhance HA "Taking over", "Handing over" logging
The reason for a switch in the HA roles was not always directly clear.
This change now introduces additional debug logging, indicating the
reasoning for either taking over or handing over the HA responsibility.

First, some logic was moved from the SQL query selecting active Icinga
DB instances to Go code. This allowed distinguishing between no
available responsible instances and responsible instances with an
expired heartbeat.

As the HA's peer timeout is logically bound to the Redis timeout, it
will now reference this timeout with an additional grace timeout. Doing
so eliminates a race between a handing over and a "forceful" take over.

As the old code decided on a takeover based on the fact that no other
instance is active, it will now additionally check whether it is already
the active/responsible node. In this case, the takeover logic - which will
be interrupted at a later point as the node is already responsible - can
be skipped.

Next to the additional logging messages, both the takeover and handover
channel are now transporting a string to communicate the reason instead
of an empty struct{}. By doing so, both the "Taking over" and "Handing
over" log messages are enriched with the reason.

This also required a change in the suppressed logging handling of the
HA.realize method, which got its logging enabled through the shouldLog
parameter. Now, there are both recurring events, which might be
suppressed, as well as state changing events, which should be logged.
Therefore, and because the logTicker's functionality was not clear to me
at first glance, I renamed it to routineLogTicker.

While dealing with the code, some function signature documentation was
added to ease both my understanding and that of future readers.

Additionally, the error handling of the SQL query selecting active
Icinga DB instances was changed slightly to also handle wrapped
sql.ErrNoRows errors.

Closes #688.
2024-04-02 13:23:11 +02:00
Eric Lippmann
e31b101f4f Upgrade go-redis to v9
Co-Authored-By: Alvar Penning <alvar.penning@icinga.com>
2024-03-22 15:32:15 +01:00
Alexander A. Klimov
5a79a72ff5 Heartbeat#sendEvent(m): nil-check m before dereferencing it
as it can be nil.
2023-01-19 16:55:11 +01:00
Alexander A. Klimov
6209b5b376 Save memory during config sync via SyncSubject#FactoryForDelta()
Code comment TL;DR: Allocate the same amount of smaller data structures
2022-09-13 17:57:23 +02:00
Eric Lippmann
cd96f0de6f Block XREADs for a maximum of one second
I observed that blocking XREADs without timeouts (BLOCK 0) exhaust Redis
internal retries across multiple consecutive Redis restarts and I/O
timeouts, which eventually leads to fatal errors. @julianbrost looked
into this for clarification; here is his finding:

go-redis only considers a command successful when it returned something,
so a successfully started blocking XREAD consumes a retry attempt each
time the underlying Redis connection is terminated. If this happens
often before any element appears in the stream, this error is
propagated. (This also means that even with this PR, when restarting
Redis often enough so that a query never reaches the BLOCK 1sec, this
would still happen.)

https://github.com/Icinga/icingadb/pull/504#issuecomment-1164589244
2022-06-28 16:09:29 +02:00
Julian Brost
061660b023 Telemetry: use mutex for synchronizing last database error
The old CompareAndSwap based code tended to end up in an endless loop. Replace
it with simple synchronization mechanisms where this can't happen.
2022-06-28 13:30:00 +02:00
Julian Brost
def7c5f22c Telemetry: change stats names in Redis
The same names are used in perfdata names and config_sync sounds more natural
than sync_config.
2022-06-28 13:30:00 +02:00
Julian Brost
741460c935 Telemetry: rename keys in heartbeat stream
In both C++ and Go, the keys are only used as constant strings, so namespacing
them just adds clutter for the `general:*` keys, therefore remove it.
2022-06-28 13:30:00 +02:00
Julian Brost
36d5f7b33c Telemetry: send Go metrics as performance data string
Rather than using a JSON structure to convey these values, simply use the
existing format to communicate performance data to Icinga 2.

Also removes the reference to Go in the Redis structure, allowing this string
to be extended with more metrics in the future without running into naming
issues.
2022-06-28 13:30:00 +02:00
Alexander A. Klimov
e1ff704aff Write own heartbeat into icingadb:telemetry:heartbeat
including version, current DB error and HA status quo.
2022-06-23 18:31:45 +02:00
Alexander A. Klimov
64d7f1be43 Remove unused StreamLastId() 2022-06-23 18:31:45 +02:00
Alexander A. Klimov
fac9f5e4e5 Write ops/s by op and s to icingadb:telemetry:stats 2022-06-15 09:51:59 +02:00
Eric Lippmann
f21f50e958 Reduce max_hmget_connections to 8 2021-11-12 16:29:59 +01:00
Eric Lippmann
ea74dc172a Rename periodic.Stoper to periodic.Stopper 2021-11-05 17:57:27 +01:00
Eric Lippmann
ccda48234e Use custom logger for accessing the interval for periodic logging 2021-11-05 17:57:22 +01:00
Eric Lippmann
43bcd2bbee Remove syncing $redisKey log message
This info message just pollutes the logs and
for debugging we log the execution anyway.
2021-11-05 17:52:11 +01:00
Eric Lippmann
8ce917d45a Remove waiting for heartbeat message
If a heartbeat is pending,
we log it every 60 seconds anyway.
2021-11-05 17:52:11 +01:00
Eric Lippmann
5f1639aca2 Use pkg periodic for Redis logs 2021-11-05 17:18:05 +01:00
Eric Lippmann
8a03745273 Speak of Icinga heartbeat not Icinga 2 heartbeat 2021-11-05 17:18:03 +01:00
Julian Brost
54dbe0cfbe Merge pull request #391 from Icinga/bugfix/multi-environment
Better handling of multiple environments
2021-11-05 16:55:21 +01:00
Julian Brost
9b02b18f46 Use new environment ID
https://github.com/Icinga/icinga2/pull/9036 introduced a new environment ID for
Icinga DB that's written to the icinga:stats stream as field
"icingadb_environment". This commit updates the code to make use of this ID
instead of the one derived from the Icinga 2 Environment constant.
2021-11-03 15:47:38 +01:00
Eric Lippmann
563aafaf90 Config: Validate xread_count 2021-11-03 15:23:40 +01:00
Eric Lippmann
d8ba0c374a Merge pull request #364 from Icinga/feature/history-sync-foreign-keys
Add foreign key constraints to history tables
2021-10-07 18:38:33 +02:00
Julian Brost
bfcc324535 History sync: rewrite to use a sequential pipeline
This is in preparation for adding foreign key constraints to the history
tables. For this, it is required to insert the rows into the different history
tables in a defined order.
2021-10-05 18:35:02 +02:00
Julian Brost
82530c771d Redis/DB: export options member
This change allows the history sync to use values configured in these options.
2021-10-05 18:34:55 +02:00
Julian Brost
217ab03e59 heartbeat: wrap messages with a timestamp
Track when a heartbeat was received to allow other components to check when it
will expire.
2021-10-04 16:58:35 +02:00
Julian Brost
8b2cb3acb8 heartbeat: use a single channel for all beat/loss events
Using Cond does not allow reliably catching all events, as one will only receive
events that occur after starting to listen. For heartbeat loss events, it's
important to reliably catch them to not remain in an HA active state incorrectly.

fixes #360
2021-10-04 16:36:09 +02:00
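
The difference to a condition variable can be sketched as follows: a send on a channel blocks until the event is received, so a consumer that starts listening late still sees every beat and loss event in order. The event type, its kinds, and the collect helper are hypothetical and only illustrate the idea.

```go
package main

import "fmt"

// eventKind distinguishes heartbeat events from heartbeat loss events.
type eventKind int

const (
	beat eventKind = iota
	loss
)

// event is what travels over the single channel (hypothetical type).
type event struct {
	kind eventKind
}

// collect drains the channel until it is closed and returns the kinds
// of all received events in the order they were sent.
func collect(events <-chan event) []eventKind {
	var kinds []eventKind
	for e := range events {
		kinds = append(kinds, e.kind)
	}
	return kinds
}

func main() {
	events := make(chan event)

	// The producer blocks on each send until the consumer receives it,
	// so no event is dropped even if the consumer starts late.
	go func() {
		defer close(events)
		events <- event{kind: beat}
		events <- event{kind: loss}
	}()

	for _, k := range collect(events) {
		if k == loss {
			fmt.Println("heartbeat lost")
		} else {
			fmt.Println("heartbeat received")
		}
	}
}
```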
Julian Brost
e0c903bfdc Redis HYield: remove duplicates returned by HSCAN
fixes #349
2021-09-23 14:36:51 +02:00
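
Redis documents that HSCAN may return the same element more than once during a full iteration, so the caller has to filter duplicates itself. A minimal sketch of such a filter, assuming a hypothetical pair type and dedupe helper rather than the actual HYield code:

```go
package main

import "fmt"

// pair is one field/value entry as returned by HSCAN (hypothetical type).
type pair struct {
	Field string
	Value string
}

// dedupe keeps only the first occurrence of each field and preserves
// the order in which fields were first seen.
func dedupe(pairs []pair) []pair {
	seen := make(map[string]struct{}, len(pairs))
	out := make([]pair, 0, len(pairs))
	for _, p := range pairs {
		if _, ok := seen[p.Field]; ok {
			continue // HSCAN already returned this field earlier.
		}
		seen[p.Field] = struct{}{}
		out = append(out, p)
	}
	return out
}

func main() {
	scanned := []pair{
		{Field: "a", Value: "1"},
		{Field: "b", Value: "2"},
		{Field: "a", Value: "1"}, // duplicate from a later HSCAN page
	}
	for _, p := range dedupe(scanned) {
		fmt.Println(p.Field, p.Value)
	}
}
```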
Julian Brost
4457f9f440 Merge pull request #365 from Icinga/data-races
Fix data races
2021-09-23 12:32:19 +02:00
Eric Lippmann
454381c820 Use uint64 instead of Counter
Use uint64 as there is no longer any concurrent access.
2021-09-23 12:18:08 +02:00
Eric Lippmann
98202e1257 Use buffered channel
Use a buffered channel so that the next HSCAN call does not have
to wait until the previous result has been processed.
2021-09-23 09:37:31 +02:00
Eric Lippmann
c1e722f5fa Do not close channel too early
This fixes a data race where the pairs channel was closed too early
when the context is canceled and therefore the outer errgroup
returns from Redis operations before Wait() is called on the inner
errgroup. Unfinished Go methods in the inner errgroup would then
try to work on a closed channel.
2021-09-23 09:37:31 +02:00
Julian Brost
17321cdfc3 Fix use of wrong log function on heartbeat loss
Has to use the Warnw function as it passes additional zap attributes.
2021-09-23 09:27:26 +02:00
Julian Brost
be9054628a Ensure extra config options are properly initialized
YAML is decoded by the structure of the YAML source document, not the Go
destination data structure. Therefore, the old code did not always call
UnmarshalYAML() on all sub-structs. Therefore, defaults were not always set but
zero values were used, resulting in all kinds of strange behavior.

This commit changes the code so that it no longer relies on individual
UnmarshalYAML() functions to set the defaults for each sub-struct but instead
just sets all of them when creating the surrounding Config instance. It also
moves the config validation to separate Validate() functions.
2021-09-01 18:49:38 +02:00
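
The pattern described above, setting all defaults when the surrounding Config is created and validating in a separate step, can be sketched with the standard library alone. Everything here is an assumption made for illustration: encoding/json stands in for the YAML decoder, and the struct layout, field names, and default values are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// RedisOptions holds optional settings (hypothetical fields and defaults).
type RedisOptions struct {
	TimeoutSeconds int `json:"timeout"`
	XReadCount     int `json:"xread_count"`
}

// Config is the surrounding configuration structure.
type Config struct {
	Redis struct {
		Host    string       `json:"host"`
		Options RedisOptions `json:"options"`
	} `json:"redis"`
}

// NewConfig sets every default up front, so the defaults do not depend on
// which sections happen to be present in the source document.
func NewConfig() *Config {
	c := &Config{}
	c.Redis.Options = RedisOptions{TimeoutSeconds: 30, XReadCount: 4096}
	return c
}

// Validate checks the decoded values in a separate step.
func (c *Config) Validate() error {
	if c.Redis.Host == "" {
		return errors.New("redis host missing")
	}
	if c.Redis.Options.TimeoutSeconds <= 0 {
		return errors.New("redis timeout must be positive")
	}
	if c.Redis.Options.XReadCount <= 0 {
		return errors.New("redis xread_count must be positive")
	}
	return nil
}

// load decodes data on top of the defaults and validates the result.
func load(data []byte) (*Config, error) {
	c := NewConfig()
	if err := json.Unmarshal(data, c); err != nil {
		return nil, err
	}
	if err := c.Validate(); err != nil {
		return nil, err
	}
	return c, nil
}

func main() {
	// No "options" section at all: the defaults stay in place.
	c, err := load([]byte(`{"redis":{"host":"localhost"}}`))
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(c.Redis.Options.TimeoutSeconds, c.Redis.Options.XReadCount)
}
```

Decoding into an already populated struct only overwrites the keys that are present in the document, which is why missing sections keep their defaults instead of falling back to zero values.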
Eric Lippmann
fbbb9bfacd Don't allow 0 for timeout redis option
0 stands for deactivate, which makes no sense here.
2021-08-10 09:29:27 +02:00
Eric Lippmann
559b27cd8b Don't inline Redis options
There is now the options key to separate required and optional
configuration. Before, both were mixed.
2021-08-09 21:48:27 +02:00
Eric Lippmann
bf415f2e1c Add missing doc in stats_message 2021-08-09 10:30:53 +02:00
Eric Lippmann
ff88cb73f7 Add missing doc in icinga_status 2021-08-09 10:30:53 +02:00
Eric Lippmann
92bc1b26c7 Add missing doc in redis utils 2021-08-09 10:30:53 +02:00
Eric Lippmann
fee30380d5 Add missing doc in client 2021-08-09 10:30:53 +02:00
Eric Lippmann
7bda89e79d Return error instead of panicking 2021-08-09 10:29:47 +02:00