Commit graph

6870 commits

Author SHA1 Message Date
Johannes Schmidt
34ce6495be Track deleted runtime objects to avoid erroneous recreation 2026-05-04 14:54:45 +02:00
Julian Brost
4bd2222412
Merge pull request #9719 from Icinga/execvp
ProcessSpawnImpl(): use POSIX execvp(3), not own copy of GNU/OpenBSD-only execvpe(3)
2026-04-23 14:04:31 +02:00
Johannes Schmidt
03d3558621
Merge pull request #10799 from Icinga/fix-pdwc-tls-host-check
Fix host name verification for `PerfdataWriterConnection`
2026-04-22 11:22:27 +02:00
Johannes Schmidt
b170c3dc75 Silence -Wunnecessary-virtual-specifier warning on clang 2026-04-20 12:46:50 +02:00
Yonas Habteab
928235c838
Merge pull request #10800 from Icinga/fix-otel-stats
OTLPMetricsWriter: don't add queue stats as counter
2026-04-20 12:25:55 +02:00
Yonas Habteab
d7ed56baa8
Merge pull request #10798 from Icinga/fix-misleading-tls-timeout-logging
Fix misleading TLS handshake error logging
2026-04-20 09:54:02 +02:00
Yonas Habteab
bc5f01d0fc OTLPMetricsWriter: don't add queue stats as counter 2026-04-20 09:15:53 +02:00
Julian Brost
61c6c7f110
Merge pull request #10619 from Icinga/efficient-config-and-state-update-queue
IcingaDB: better config and state update queueing
2026-04-17 11:19:50 +02:00
Johannes Schmidt
f5be692d33 Correctly create AsioTlsStream with host argument
This was omitted by accident from the original PR, despite
being done in the original perfdata writer connection code.

Without setting this parameter, host name verification will be
disabled, which poses a security risk.
2026-04-17 10:08:20 +02:00
Julian Brost
1b33451665 Fix misleading TLS handshake error logging
The log message on TLS handshake errors always stated that a client handshake
failed, even if the connection was acting as the server. This commit changes
it so that the actual role is taken into account.
2026-04-16 17:49:48 +02:00
Yonas Habteab
a01870a6aa Fix compiler crash on SLES 15.7 arm64 runner 2026-04-16 14:01:49 +02:00
Yonas Habteab
bfb0e7db12 Inline SendNextUpdate & remove superfluous m_RconWorker checks 2026-04-16 08:53:58 +02:00
Yonas Habteab
e4436cbcf0 IoEngine: introduce & use IsStrandRunningOnThisThread function 2026-04-15 17:51:17 +02:00
Yonas Habteab
99328ec417 Log pending items stats regularly & include them as perfdata in IcingaDB check 2026-04-15 17:33:43 +02:00
Yonas Habteab
25a18a5a7e OTel: downgrade broken_pipe errors to debug log 2026-04-15 17:25:14 +02:00
Yonas Habteab
ecf5632ef8 Timeout: lift VERIFY -> ASSERT to prevent crashes in release builds
`strand.running_in_this_thread()` relies on thread-local storage
internally, and may return false positives if the coroutine is resumed
in a different thread than it was suspended in. In debug builds, this is
not a problem, since there's no TLS optimization done by the compiler, but
in release builds, the compiler might cache the address of the
thread-local variable read before the coroutine suspension, and thus
potentially reuse the same address in a different thread after
resumption, which would cause `running_in_this_thread()` to return false
or even crash (but we didn't see any crashes related to this). So,
perform the assertion only in debug builds to prevent potential wrong
usages of the `Timeout` class. For more details, see [^1][^2][^3].

[^1]: https://github.com/chriskohlhoff/asio/issues/1366
[^2]: https://bugs.llvm.org/show_bug.cgi?id=19177
[^3]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461
2026-04-15 17:25:14 +02:00
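The difference between a check that always runs and one that exists only in debug builds, as discussed in commit ecf5632ef8, can be sketched as follows. This is a minimal illustration, not Icinga 2's actual macro definitions: `DEMO_VERIFY`, `DEMO_ASSERT`, `DemoAssertFail`, and the `DEMO_DEBUG` switch are hypothetical names chosen for this example.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical failure handler: report the failed expression and abort.
[[noreturn]] inline void DemoAssertFail(const char* expr, const char* file, int line)
{
    std::fprintf(stderr, "%s:%d: assertion failed: %s\n", file, line, expr);
    std::abort();
}

// Always evaluates the expression, in debug and release builds alike.
#define DEMO_VERIFY(expr) ((expr) ? (void)0 : DemoAssertFail(#expr, __FILE__, __LINE__))

// Evaluates the expression only when DEMO_DEBUG is defined (assumed debug switch).
// Otherwise it expands to nothing, so the expression is never executed.
#ifdef DEMO_DEBUG
#  define DEMO_ASSERT(expr) DEMO_VERIFY(expr)
#else
#  define DEMO_ASSERT(expr) ((void)0)
#endif
```

With a debug-only check, an expression whose result may be unreliable in optimized builds is simply not evaluated there, while debug builds still catch wrong usage.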
Julian Brost
267675e80b RedisConnection: simplify GetOldestPendingQueryTs function 2026-04-02 16:37:57 +02:00
Yonas Habteab
7d7159c033 Reduce min queue item age from 1000ms to 300ms 2026-04-02 16:37:57 +02:00
Yonas Habteab
390ee8c02f Inline DequeueAndProcessOne & don't process items out of order
Now, the individual `ProcessQueueItem` functions decide whether to
acquire an `olock` or not instead of probing this from within the
worker loop. This is way easier than having to deal with the potential
out of order processing of items in the queue in both ways, i.e., we
don't want to send delete events for objects while their created events
haven't been processed yet and vice versa.
2026-04-02 16:37:57 +02:00
Julian Brost
855f6c7c0c IcingaDB: use key extractor for worker queue
This commit restructures the queue items so that each one now has a method
`GetQueueLookupKey()` that is used to derive which elements of the queue are
considered to be equal. For this, there is a key extractor for the
`multi_index_container` that takes the `variant` from the queue item, calls
that method on it, and puts the result in a second variant type. The types in
that variant type are automatically deduced from the return types of the
individual methods.
2026-04-02 16:37:57 +02:00
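The key-extractor idea from commit 855f6c7c0c can be sketched as follows, under stated assumptions. The item types `ConfigUpdate` and `StateUpdate`, the helper `LookupKeyOf`, and the functor `QueueKeyExtractor` are hypothetical; only the method name `GetQueueLookupKey()` comes from the commit message. The sketch uses `std::variant` and leaves out the `multi_index_container` itself to stay self-contained.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical queue item types; each one exposes GetQueueLookupKey().
struct ConfigUpdate {
    std::string ObjectName;
    std::string GetQueueLookupKey() const { return ObjectName; }
};

struct StateUpdate {
    std::uint64_t ObjectId;
    std::uint64_t GetQueueLookupKey() const { return ObjectId; }
};

using QueueItem = std::variant<ConfigUpdate, StateUpdate>;

// Deduce the key variant from the return types of the individual methods.
// Assumption of this sketch: every alternative returns a distinct type.
template<class Variant>
struct LookupKeyOf;

template<class... Ts>
struct LookupKeyOf<std::variant<Ts...>> {
    using type = std::variant<decltype(std::declval<const Ts&>().GetQueueLookupKey())...>;
};

using LookupKey = LookupKeyOf<QueueItem>::type;

static_assert(std::is_same_v<LookupKey, std::variant<std::string, std::uint64_t>>);

// Functor in the shape a key extractor usually has: a result_type plus operator().
struct QueueKeyExtractor {
    using result_type = LookupKey;

    result_type operator()(const QueueItem& item) const
    {
        // Call GetQueueLookupKey() on whichever alternative is active
        // and wrap the result in the key variant.
        return std::visit(
            [](const auto& v) -> LookupKey { return v.GetQueueLookupKey(); },
            item);
    }
};
```

Two items compare equal under this extractor only if they hold the same kind of key and the same value, so a config update and a state update never collide in this sketch.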
Yonas Habteab
2048450159 IcingaDB: put all queue related stuff into icingadb:task_queue namespace 2026-04-02 16:37:57 +02:00
Julian Brost
8375934d19 Simplify IcingaDB::PendingItemsThreadProc() event loop 2026-04-02 16:37:57 +02:00
Yonas Habteab
b633d6b0d0 IcingaDB: remove unused UpdateObjectAttrs method 2026-04-02 16:37:57 +02:00
Yonas Habteab
89d8c326e6 Fix missing olock for dependency child registration
Previously, the checkable was locked while processing all the dependency
registration stuff, so the worker thread should also do the same to
avoid any potential race conditions.
2026-04-02 16:37:57 +02:00
Julian Brost
9d5883df78 IcingaDB: use polymorphism for queue entries 2026-04-02 16:37:57 +02:00
Yonas Habteab
e88366ddae IcingaDB: subscribe to OnNextCheckChanged signal
We can't drop the `OnNextCheckUpdated` signal entirely yet, as IDO still
relies on it.
2026-04-02 16:37:57 +02:00
Yonas Habteab
485227390f Revert "CheckerComponent#CheckThreadProc(): also propagate next check update to Icinga DB"
This reverts commit e9b8c67975.
2026-04-02 16:37:57 +02:00
Yonas Habteab
bbb7d0249e RedisConnection: enhance WriteQueueItem & related usages 2026-04-02 16:37:57 +02:00
Yonas Habteab
cbb4147055 RedisConnection: simplify query prioritization logic
As opposed to the previous version which used a complex data structure
to correctly manage the query priorities, this version uses two separate
queues for the high and normal priority writes. All high priority writes
are processed in FIFO order but overtake all queries from the normal
priority queue. The latter queue is only processed when the high priority
queue is empty.
2026-04-02 16:37:57 +02:00
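The two-queue prioritization described in commit cbb4147055 can be sketched as follows. This is a simplified illustration and not the `RedisConnection` code: the class `TwoLevelWriteQueue`, the enum `QueryPriority`, and the use of plain strings as queries are assumptions made for this example, as is FIFO order within the normal queue.

```cpp
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Hypothetical priority levels for this sketch.
enum class QueryPriority { High, Normal };

class TwoLevelWriteQueue {
public:
    // Append the query to the queue that matches its priority.
    void Push(std::string query, QueryPriority prio)
    {
        auto& queue = (prio == QueryPriority::High) ? m_High : m_Normal;
        queue.push_back(std::move(query));
    }

    // Return the oldest high priority query if there is one, otherwise the
    // oldest normal priority query, or nothing if both queues are empty.
    std::optional<std::string> Pop()
    {
        auto& queue = !m_High.empty() ? m_High : m_Normal;
        if (queue.empty()) {
            return std::nullopt;
        }

        std::string next = std::move(queue.front());
        queue.pop_front();
        return next;
    }

private:
    std::deque<std::string> m_High;
    std::deque<std::string> m_Normal;
};
```

Pushing `n1`, `h1`, `n2`, `h2` (with `h*` as high priority) yields `h1`, `h2`, `n1`, `n2` on successive `Pop()` calls: each queue keeps its own order, and the normal queue is served only once the high priority queue is empty.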
Yonas Habteab
adbefa5540 Revert "IcingaDB: suppress state sync until config sync finished"
This reverts commit f6f7d9b635 and all
its other new users.
2026-04-02 14:06:28 +02:00
Yonas Habteab
d364ad981e IcingaDB: enqueue config runtime updates to the worker queue 2026-04-02 14:06:28 +02:00
Yonas Habteab
4dbf782e4e OTel: raise runtime error when failing to fully serialize Protobuf request 2026-04-02 10:51:35 +02:00
Yonas Habteab
465650262a OTel: add connect & handshake timeout 2026-04-02 10:51:35 +02:00
Julian Brost
1139ba9b0d OTel: replace AsioDualEvent usage with AsioConditionVariable 2026-04-02 10:51:35 +02:00
Yonas Habteab
044f85ee76 OTel: do not perform graceful disconnect on I/O timeout 2026-04-01 12:18:22 +02:00
Yonas Habteab
96c3364ab0 OTel: fix race condition triggered on Icinga 2 reload/shutdown
Co-Authored-By: Julian Brost <julian.brost@icinga.com>
2026-04-01 12:18:22 +02:00
Yonas Habteab
715aacc19c Don't manually include custom Protobuf dir via compiler flag
Co-Authored-By: Johannes Schmidt <johannes.schmidt@icinga.com>
2026-04-01 12:18:21 +02:00
Yonas Habteab
e6c420e106 OTLP: Set enable_ha to true by default 2026-04-01 12:18:21 +02:00
Yonas Habteab
3f68eea1fd Reduce default flush_threshold to 16MiB
So that it doesn't cause `request body too large` errors when used with
the default OpenTelemetry Collector config that has `max_request_body_size`
set to `20MiB`.
2026-04-01 12:18:21 +02:00
Julian Brost
8f36bdcddc Replace for with a simpler while loop & fix a typo 2026-04-01 12:18:21 +02:00
Yonas Habteab
8bdfba8772 Allow users to provide additional resource attributes 2026-04-01 12:18:21 +02:00
Yonas Habteab
60fe45cd6e Add OTLPMetricsWriter 2026-04-01 12:18:21 +02:00
Yonas Habteab
415140bc36 Add common OTel type/lib 2026-04-01 12:18:21 +02:00
Yonas Habteab
374cc6e282 Cache Icinga DB env_id in Application class as well
So that other components can use it without having to import any Icinga
DB related header files, but only the base library.
2026-04-01 12:15:58 +02:00
Julian Brost
4f13651cb0
Merge pull request #10727 from Icinga/icingadb-missing-exception-messages
Redis exceptions: add proper what() messages
2026-03-31 10:27:33 +02:00
Julian Brost
9d361e1fb3
Merge pull request #10734 from Icinga/deprecate-everything-we-dont-like
Schedule deprecated features for removal in v2.18
2026-03-31 10:25:44 +02:00
Julian Brost
207764584a Redis exceptions: remove inline specifiers
Remove them as they are redundant, as requested in the PR review.
2026-03-27 17:02:05 +01:00
Julian Brost
221382486e Redis exceptions: use BOOST_THROW_EXCEPTION
Use it for consistency, as requested in the PR review.
2026-03-27 17:01:51 +01:00
Julian Brost
862f012381 Redis exceptions: add proper what() messages
RedisDisconnected::what() and RedisProtocolError::what() always returned an
empty string. Similarly, BadRedisType::what() and BadRedisInt::what() only
returned the value that couldn't be parsed without any information about the
exception type. If only what() is used when printing the exception, as is
typical, this results in unhelpful log messages like the following when simply
stopping the Redis server:

    [2026-02-23 14:33:33 +0100] critical/IcingaDB: Error during receiving the response to a query which has been fired and forgotten: Connection reset by peer [system:104 at /usr/include/boost/asio/detail/reactive_socket_recv_op.hpp:134 in function 'do_complete']
    [2026-02-23 14:33:33 +0100] critical/IcingaDB: Error during receiving the response to a query which has been fired and forgotten:
    [2026-02-23 14:33:33 +0100] critical/IcingaDB: Error during receiving the response to a query which has been fired and forgotten:
    [2026-02-23 14:33:33 +0100] critical/IcingaDB: Cannot connect to redis-1:6379: Connection refused [system:111 at /usr/include/boost/asio/detail/reactive_socket_connect_op.hpp:98 in function 'do_complete']

This commit changes these messages so that something like "Redis disconnected",
"Redis protocol error: bad int: foo", or "Redis protocol error: bad type: ?" is
returned. In doing so, it also removes a member of type std::vector<char> in
BadRedisInt as this is unsafe to use in exceptions (it violates the requirement
that copy constructor and assignment must be nothrow, see
https://en.cppreference.com/w/cpp/error/exception.html#Standard_exception_requirements).

With this commit, the log messages are now a bit more helpful:

    [2026-02-23 15:08:23 +0100] critical/IcingaDB: Error during receiving the response to a query which has been fired and forgotten: Connection reset by peer [system:104 at /usr/include/boost/asio/detail/reactive_socket_recv_op.hpp:134 in function 'do_complete']
    [2026-02-23 15:08:23 +0100] critical/IcingaDB: Error during receiving the response to a query which has been fired and forgotten: Redis disconnected
    [2026-02-23 15:08:23 +0100] critical/IcingaDB: Error during receiving the response to a query which has been fired and forgotten: Redis disconnected
    [2026-02-23 15:08:23 +0100] critical/IcingaDB: Cannot connect to redis-1:6379: Connection refused [system:111 at /usr/include/boost/asio/detail/reactive_socket_connect_op.hpp:98 in function 'do_complete']
2026-03-27 17:01:49 +01:00
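The approach from commit 862f012381 can be illustrated with a small sketch. The classes below are hypothetical stand-ins (`Demo` prefix), not the actual Icinga DB exception types, and the class hierarchy is an assumption. Only the message texts are taken from the commit message. Storing the text in `std::runtime_error` instead of a `std::vector<char>` member keeps copying the exception nothrow.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in: what() now names the error instead of being empty.
class DemoRedisDisconnected : public std::runtime_error {
public:
    DemoRedisDisconnected() : std::runtime_error("Redis disconnected") {}
};

// Hypothetical base class that prefixes every message with the error category.
class DemoRedisProtocolError : public std::runtime_error {
public:
    explicit DemoRedisProtocolError(const std::string& detail)
        : std::runtime_error("Redis protocol error: " + detail) {}
};

// The unparsable value becomes part of the message text, so no extra
// container member is needed.
class DemoBadRedisInt : public DemoRedisProtocolError {
public:
    explicit DemoBadRedisInt(const std::string& raw)
        : DemoRedisProtocolError("bad int: " + raw) {}
};

class DemoBadRedisType : public DemoRedisProtocolError {
public:
    explicit DemoBadRedisType(char type)
        : DemoRedisProtocolError(std::string("bad type: ") + type) {}
};
```

A handler that only logs `ex.what()` for a caught `const std::exception&` then prints a message that identifies the exception type, for example "Redis protocol error: bad int: foo".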
Johannes Schmidt
2108300cf8 Add warnings to deprecated features indicating removal in v2.18 2026-03-27 14:20:55 +01:00