An env ID with the wrong length, either due to a copy-paste error or
human error during testing, results in a SQL CHECK CONSTRAINT violation
that is retried multiple times until it finally fails.
Historical data from an older Icinga 2 installation contained NULL
values for the name column in some rows of the icinga_commenthistory and
icinga_downtimehistory tables.
Normally this field contains something like
${name1}!${name2}!${unique_value} where the $unique_value is based on a
timestamp for older entries and a UUID for newer ones. For a concrete
example, this could be "host.example.com!ping6!123…".
Unfortunately, using an empty string for these NULL values will cause an
error later because the new primary key will be calculated based on it.
Therefore, a new deterministic name is generated based on the primary
keys and the known name1 and name2 values.
Closes#766.
A float isn't necessary as in Icinga 2 Checkable#max_check_attempts and
check_attempt are ints. But uint8 isn't enough for e.g. 1 check/s to get
HARD after 5m (300s > 255).
The reason for a switch in the HA roles was not always directly clear.
This change now introduces additional debug logging, indicating the
reasoning for either taking over or handing over the HA responsibility.
First, some logic was moved from the SQL query selecting active Icinga
DB instances to Go code. This allowed distinguishing between no
available responsible instances and responsible instances with an
expired heartbeat.
As the HA's peer timeout is logically bound to the Redis timeout, it
will now reference this timeout with an additional grace timeout. Doing
so eliminates a race between a handing over and a "forceful" take over.
As the old code indicated a takeover on the fact that no other instance
is active, it will now additionally check if it is already being the
active/responsible node. In this case, the takeover logic - which will
be interrupted at a later point as the node is already responsible - can
be skipped.
Next to the additional logging messages, both the takeover and handover
channel are now transporting a string to communicate the reason instead
of an empty struct{}. By doing so, both the "Taking over" and "Handing
over" log messages are enriched with reason.
This also required a change in the suppressed logging handling of the
HA.realize method, which got its logging enabled through the shouldLog
parameter. Now, there are both recurring events, which might be
suppressed, as well as state changing events, which should be logged.
Therefore, and because the logTicker's functionality was not clear to me
on first glance, I renamed it to routineLogTicker.
While dealing with the code, some function signature documentation were
added, to ease both mine as well as the understanding of future readers.
Additionally, the error handling of the SQL query selecting active
Icinga DB instances was changed slightly to also handle wrapped
sql.ErrNoRows errors.
Closes#688.
To better recognize the Icinga DB version used in case of errors, it is
now logged at startup.
If VCS is available during build, the current commit is included.
- Dirty VCS directory:
> 2024-03-11T14:36:29.317+0100 INFO icingadb Starting Icinga DB daemon (1.1.1-g0e9810c-dirty)
- Clean VCS directory:
> 2024-03-11T14:38:09.664+0100 INFO icingadb Starting Icinga DB daemon (1.1.1-geed8589)
- Build with `-buildvcs=false`:
> 2024-03-11T14:38:56.554+0100 INFO icingadb Starting Icinga DB daemon (1.1.1)
Closes#689.
This commit simplifies the `icingaDbOutputStage` type to contain only one
entity slice to be insert/upsert. This allows to simplify the handling in
`migrateOneType()` by removing nested loops.
Additionally, a bit of code inside that function is outsourced into a new
`utils.ChanFromSlice()` function. This makes the body of the loop over the
insert/upsert operation (the loop using the `op` variable) simple enough so
that it can just be unrolled which saves the inline struct and slice definition
for that loop.