Mirror of https://github.com/opnsense/src.git, synced 2026-03-21 02:10:09 -04:00
The acquisition and release of an uncontended default/normal pthread mutex on FreeBSD is surprisingly slow: pthread wrlocks and binary semaphores both exhibit roughly 33% lower latency, while default/normal mutexes on Linux exhibit roughly 67% lower latency than on FreeBSD. This is likely explained by the fact that, AFAICT, in the best case acquiring an uncontended mutex on Linux touches only 1 page and reads and modifies only 1 cacheline, whereas on FreeBSD we need to touch at least 4 pages, read 6 cachelines, and modify at least 4 cachelines.

This patch does not address the pthread mutex architecture. Instead, it improves performance by adding the __always_inline attribute to mutex_lock_common() and mutex_unlock_common() to encourage constant folding and propagation. The resulting code path is shorter, with fewer compares, jumps, and mispredicts, which lowers the latency to acquire and release a mutex.

With this patch on a stock build I see a reduction in latency of roughly 7% for default/normal mutexes and 17% for robust mutexes. When built without PTHREADS_ASSERTIONS enabled I see a reduction in latency of roughly 15% and 26%, respectively. Surprisingly, I see similar reductions in latency for heavily contended mutexes.

By default, this patch increases the size of libthr.so.3 by 2448 bytes, but when built without PTHREADS_ASSERTIONS enabled it increases by only 448 bytes.

Reviewed by: jhb (previous version), kib
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40912

| | | |
|---|---|---|
| .. | ||
| arch | ||
| sys | ||
| tests | ||
| thread | ||
| libthr.3 | ||
| Makefile | ||
| Makefile.depend | ||
| plockstat.d | ||
| pthread.map | ||
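
The inlining technique described in the commit message can be illustrated with a minimal sketch. This is not libthr's code: `struct toy_mutex`, the `toy_*` functions, the `TOY_*` return codes, and the `TOY_ALWAYS_INLINE` macro (a stand-in for FreeBSD's `__always_inline`, assuming a GCC- or Clang-compatible compiler) are all hypothetical names invented for illustration. The idea is that when a shared "common" routine is forcibly inlined into wrappers that pass compile-time constants, the compiler can fold those constants and drop the branches that cannot be taken in each wrapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for FreeBSD's __always_inline (GCC/Clang syntax). */
#define TOY_ALWAYS_INLINE inline __attribute__((__always_inline__))

/* Hypothetical return codes; not the errno values libthr returns. */
enum { TOY_OK = 0, TOY_EBUSY = 1, TOY_EPERM = 2, TOY_WOULDBLOCK = 3 };

/* Toy single-threaded mutex; not libthr's struct pthread_mutex. */
struct toy_mutex {
	int owner;	/* 0 when unlocked, otherwise the owner's id */
};

/*
 * Shared lock path. Because it is always inlined, each wrapper below
 * gets its own copy in which try_only is a known constant, so the
 * compiler can remove the branch on it.
 */
static TOY_ALWAYS_INLINE int
toy_lock_common(struct toy_mutex *m, int self, bool try_only)
{
	assert(m != NULL && self != 0);
	if (m->owner == 0) {
		m->owner = self;	/* uncontended fast path */
		return (TOY_OK);
	}
	if (try_only)
		return (TOY_EBUSY);
	/* A real implementation would block here; the sketch just reports it. */
	return (TOY_WOULDBLOCK);
}

/* Shared unlock path, inlined for the same reason. */
static TOY_ALWAYS_INLINE int
toy_unlock_common(struct toy_mutex *m, int self)
{
	assert(m != NULL);
	if (m->owner != self)
		return (TOY_EPERM);
	m->owner = 0;
	return (TOY_OK);
}

int
toy_lock(struct toy_mutex *m, int self)
{
	return (toy_lock_common(m, self, false));	/* try_only folds to false */
}

int
toy_trylock(struct toy_mutex *m, int self)
{
	return (toy_lock_common(m, self, true));	/* try_only folds to true */
}

int
toy_unlock(struct toy_mutex *m, int self)
{
	return (toy_unlock_common(m, self));
}
```

Each wrapper carries its own copy of the common body, which is the same trade the commit message reports: a shorter code path per call in exchange for a somewhat larger library. Comparing the disassembly of `toy_lock()` and `toy_trylock()` with and without the attribute is one way to check what the compiler actually folded.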