The issue is that std::promise internally also used thread local
storage, in a call to `std::call_once` in `std::promise::set_value()`.
The theory is that since all paths in `Send()` run this `std::call_once` routine, and from then on the coroutine function looks like a normal function to the compiler, the compiler inlined `set_value()` and hoisted the parts common to all paths to a single location before the suspension point in `WriteMessage(yc)`.
When the coroutine is finally resumed, that likely happens on a different thread, which still has the `__once_callable` thread-local used by `std::call_once` set to `nullptr`, leading to the segmentation fault.
The fix is to not use std::promise across coroutine suspension points and instead reimplement the functionality we required from it in a small helper class `SyncResult` that does not require any thread-local storage.
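The following is a minimal sketch of what such a helper could look like, assuming the required functionality is just handing a value or an exception from one side to the other; the actual `SyncResult` class may be implemented differently. The point is that the result is guarded by a plain mutex and condition variable, with no hidden thread-local state like `std::call_once`'s `__once_callable`.

```cpp
#include <condition_variable>
#include <exception>
#include <mutex>
#include <optional>

// Sketch only: a promise-like holder without any thread-local storage.
template<typename T>
class SyncResult
{
public:
    void SetValue(T value)
    {
        std::unique_lock<std::mutex> lock(m_Mutex);
        m_Value = std::move(value);
        m_CondVar.notify_all();
    }

    void SetException(std::exception_ptr eptr)
    {
        std::unique_lock<std::mutex> lock(m_Mutex);
        m_Exception = eptr;
        m_CondVar.notify_all();
    }

    T GetValue()
    {
        std::unique_lock<std::mutex> lock(m_Mutex);
        m_CondVar.wait(lock, [this] { return m_Value.has_value() || m_Exception; });

        if (m_Exception)
            std::rethrow_exception(m_Exception);

        return std::move(*m_Value);
    }

private:
    std::mutex m_Mutex;
    std::condition_variable m_CondVar;
    std::optional<T> m_Value;
    std::exception_ptr m_Exception;
};
```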
On slow systems, like our ARM64 Container Image build, async_handshake() cannot be cancelled in one go but needs two cancels with a chance to run the completion handler in between. The exact reason is unknown; this was found through trial and error. The alternative would have been to close() the socket in case the connection is not fully established.
The issue occurs when ::Connect in `EnsureConnected()` returns after `Disconnect()` has already set `m_Stopped` to true. By adding a check and throwing an exception before entering `async_handshake()`, the behavior should now always be consistent.
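A hedged sketch of the idea using plain Boost.Asio names; the function signature, the client role, and the exception type are assumptions here, only the ordering (check the `m_Stopped`-like state after the connect and before `async_handshake()`) reflects the actual change:

```cpp
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
#include <boost/asio/ssl.hpp>
#include <atomic>
#include <stdexcept>

void ConnectAndHandshake(boost::asio::ssl::stream<boost::asio::ip::tcp::socket>& stream,
                         const boost::asio::ip::tcp::endpoint& endpoint,
                         const std::atomic<bool>& stopped,
                         boost::asio::yield_context yc)
{
    stream.lowest_layer().async_connect(endpoint, yc);

    // Disconnect() may already have requested a shutdown while the connect was
    // pending. Throwing here keeps a half-established connection from ever
    // reaching async_handshake().
    if (stopped)
        throw std::runtime_error("connection was stopped before the TLS handshake");

    stream.async_handshake(boost::asio::ssl::stream_base::client, yc);
}
```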
This was omitted by accident from the original PR, despite
being done in the original perfdata writer connection code.
Without setting this parameter, host name verification will be
disabled, which poses a security risk.
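Assuming the parameter in question is the expected peer host name handed to the TLS stream, the effect corresponds roughly to the following plain Boost.Asio sketch using `boost::asio::ssl::host_name_verification` (available in recent Boost versions); the writer's actual helper API differs:

```cpp
#include <boost/asio.hpp>
#include <boost/asio/ssl.hpp>
#include <string>

void EnableHostnameVerification(boost::asio::ssl::stream<boost::asio::ip::tcp::socket>& stream,
                                const std::string& hostname)
{
    stream.set_verify_mode(boost::asio::ssl::verify_peer);

    // Without the host name, only the certificate chain is validated and a
    // certificate issued for any other host would still be accepted.
    stream.set_verify_callback(boost::asio::ssl::host_name_verification(hostname));
}
```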
The log message on TLS handshake errors always stated that a client handshake failed, even if the connection was acting as the server. The commit changes it so that the actual role is taken into account.
`strand.running_in_this_thread()` relies on thread-local storage
internally, and may return false positives if the coroutine is resumed
in a different thread than it was suspended in. In debug builds, this is
not a problem, since there's no TLS optimization done by the compiler, but
in release builds, the compiler might cache the address of the
thread-local variable read before the coroutine suspension, and thus
potentially reuse the same address in a different thread after
resumption, which would cause `running_in_this_thread()` to return false
or even crash (but we didn't see any crashes related to this). So,
perform the assertion only in debug builds to prevent potential wrong
usages of the `Timeout` class. For more details, see [^1][^2][^3].
[^1]: https://github.com/chriskohlhoff/asio/issues/1366
[^2]: https://bugs.llvm.org/show_bug.cgi?id=19177
[^3]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461
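A sketch of the debug-only check described above; the free function and its name are illustrative, the actual assertion guards usage of the `Timeout` class:

```cpp
#include <boost/asio.hpp>
#include <cassert>

// Sketch: only assert on the strand in debug builds.
void AssertRunningInStrand(boost::asio::io_context::strand& strand)
{
#ifndef NDEBUG
    // Debug builds: the compiler does no TLS caching across the suspension
    // point, so running_in_this_thread() is reliable here.
    assert(strand.running_in_this_thread());
#else
    // Release builds: skip the check entirely; a cached TLS address could make
    // running_in_this_thread() misbehave after the coroutine resumes on a
    // different thread.
    (void)strand;
#endif
}
```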
Now, the individual `ProcessQueueItem` functions decide whether to acquire an `olock` or not, instead of probing this from within the worker loop. This is much easier than having to deal with potential out-of-order processing of queue items in either direction, i.e. we don't want to send delete events for objects whose created events haven't been processed yet, and vice versa.
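A minimal sketch of the pattern; `ObjectLock`, the event types, and the dispatcher are illustrative placeholders, not the actual classes:

```cpp
#include <mutex>
#include <variant>

struct Object { std::mutex Mutex; };

class ObjectLock
{
public:
    explicit ObjectLock(Object& object) : m_Lock(object.Mutex) {}

private:
    std::lock_guard<std::mutex> m_Lock;
};

struct CreatedEvent { Object* Target; };
struct DeletedEvent { Object* Target; };

using QueueItem = std::variant<CreatedEvent, DeletedEvent>;

// Each ProcessQueueItem overload decides for itself whether it needs the lock ...
void ProcessQueueItem(const CreatedEvent& ev)
{
    ObjectLock olock(*ev.Target); // this handler wants the object locked
    // ... emit the "created" event ...
}

void ProcessQueueItem(const DeletedEvent&)
{
    // ... while this one can run without holding the object lock.
}

// The worker loop only dispatches in FIFO order and no longer probes which
// items require locking.
void DispatchQueueItem(const QueueItem& item)
{
    std::visit([](const auto& ev) { ProcessQueueItem(ev); }, item);
}
```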
This commit restructures the queue items so that each one now has a method
`GetQueueLookupKey()` that is used to derive which elements of the queue are
considered to be equal. For this, there is a key extractor for the
`multi_index_container` that takes the `variant` from the queue item, calls
that method on it, and puts the result in a second variant type. The types in
that variant type are automatically deduced from the return types of the
individual methods.
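A hedged sketch of that structure; the concrete item and key types are made up here, and the automatic deduction of the key-variant alternatives from the methods' return types is spelled out by hand instead of being derived via metaprogramming:

```cpp
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/sequenced_index.hpp>
#include <string>
#include <variant>

struct ConfigUpdate { std::string Name; std::string GetQueueLookupKey() const { return Name; } };
struct StateChange  { int Id;           int         GetQueueLookupKey() const { return Id; } };

using QueueItem = std::variant<ConfigUpdate, StateChange>;

// Key extractor: visit the variant held by the queue item, call
// GetQueueLookupKey() on the concrete type, and wrap the result in a second
// variant covering the return types of the individual methods.
struct QueueLookupKeyExtractor
{
    using result_type = std::variant<std::string, int>;

    result_type operator()(const QueueItem& item) const
    {
        return std::visit(
            [](const auto& concrete) -> result_type { return concrete.GetQueueLookupKey(); },
            item);
    }
};

using Queue = boost::multi_index_container<
    QueueItem,
    boost::multi_index::indexed_by<
        boost::multi_index::sequenced<>,                             // processing (FIFO) order
        boost::multi_index::ordered_unique<QueueLookupKeyExtractor>  // items with equal lookup keys count as equal
    >
>;
```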
Previously, the checkable was locked while all the dependency registration logic was processed, so the worker thread should do the same to avoid any potential race conditions.
As opposed to the previous version, which used a complex data structure to correctly manage the query priorities, this version uses two separate queues for high- and normal-priority writes. All high-priority writes are processed in FIFO order but overtake all queries in the normal-priority queue. The latter queue is only processed when the high-priority queue is empty.
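A minimal sketch of the two-queue scheme, under the assumption that a query can be represented as a string; the class and method names are placeholders:

```cpp
#include <deque>
#include <optional>
#include <string>

class QueryQueue
{
public:
    void PushHighPrio(std::string query)   { m_HighPrio.push_back(std::move(query)); }
    void PushNormalPrio(std::string query) { m_NormalPrio.push_back(std::move(query)); }

    // High-priority queries are drained in FIFO order first; normal-priority
    // queries are only considered once the high-priority queue is empty.
    std::optional<std::string> PopNext()
    {
        if (!m_HighPrio.empty()) {
            auto query = std::move(m_HighPrio.front());
            m_HighPrio.pop_front();
            return query;
        }

        if (!m_NormalPrio.empty()) {
            auto query = std::move(m_NormalPrio.front());
            m_NormalPrio.pop_front();
            return query;
        }

        return std::nullopt;
    }

private:
    std::deque<std::string> m_HighPrio;
    std::deque<std::string> m_NormalPrio;
};
```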
So that it doesn't cause `request body too large` errors when used with
the default OpenTelemetry Collector config that has `max_request_body_size`
set to `20MiB`.