is also returned by pthread_detach() if a thread was already
detached, the error code was already documented:
> [EINVAL] The implementation has detected that the value speci-
> fied by thread does not refer to a joinable thread.
* Correct a signed/unsigned problem that broke handling of files >2G.
* Implement "skip" support for much faster "tar -t".
Thanks to: Robert Sciuk for sending me a DVD that illustrated the first problem
* If write block size is zero, don't block at all.
This supports the unusual requirement of applications
that need "no-delay" writes.
* Expose _write_finish_entry() to give such applications more
control over write boundaries. (Normal applications do not
need this, as entries are completed automatically.)
* Correct the type of write callbacks; this is a minor API
change that does not affect the ABI.
* Correct the error handling in _write_next_header() around
completing the previous entry.
* Correct the documentation for block-size markers: Remove
docs for the long-defunct _read_set_block_size(); document
all of the write block size manipulators.
MFC after: 14 days
traditional shortcut of defining on-disk layouts using structures of
character arrays. Unfortunately, as recently discussed on cvs-all@,
this usage is not actually sanctioned by the standards and
specifically fails on GCC/arm (unless your data structures happen to
be "naturally aligned").
The new code defines offsets/sizes for data fields and accesses
them using explicit pointer arithmetic, instead of casting to
a structure and accessing structure fields. In particular,
the new code is now clean with WARNS=6 on arm.
MFC after: 14 days
and correct the use of unary minus with an unsigned value. (The unary
minus here is actually being used as a bitwise operation, which is
unusual enough to deserve a clarifying cast.)
archive_{read,write}_open_filename():
* Update Makefile to build the files using the new name.
* Update docs to document the new names, mentioning the
old ones as "deprecated synonyms."
* The old filenames will be reconnected to the build soon;
I'll soon recyce those files for a slightly different purpose.
internal format-specific functions return the same as the public
function, so that the public API layer doesn't have to guess the
correct return value. This addresses an obscure problem that occurs
when someone tries to write more data than the size of the entry (as
indicated in the entry header). In this case, the return value from
archive_write_data() was incorrect, reflecting the requested write
rather than the amount actually written.
MFC after: 15 days
- make document title match filename;
- remove hard sentence breaks, whitespace at EOL, and double whitespace;
- sort SEE ALSO xrefs, adding missing section numbers;
- fix a misspelled macro name.
* Use public API, don't access struct archive directly. (People should be able to copy these into their applications as a template for custom I/O callbacks.)
* Set "skip" only for regular files. ("skip" allows the low-level library to catch attempts to add an archive to itself or extract over itself.)
* Simplify the write_open functions by just calling stat() at the beginning. Somehow, these functions had acquired some complex logic that tried to avoid the stat() call but never succeeded.
MFC after: 10 days
file. This doesn't happen in normal use, because the file I/O and
decompression layers only pass through smaller blocks. It can happen
with custom read functions that block I/O in larger blocks.
* Actually use the HAVE_<header>_H macros to conditionally include
system headers. They've been defined for a long time, but only
used in a few places. Now they're used pretty consistently
throughout.
* Fill in a lot of missing casts for conversions from void*.
Although Standard C doesn't require this, some people have been
trying to use C++ compilers with this code, and they do require it.
Bit-for-bit, the compiled object files are identical, except for
one assert() whose line number changed, so I'm pretty confident I
didn't break anything. ;-)
h_errno is not an extern int, just a macro providing an integer lvalue.
PR: doc/50573
Submitted by: Ronald F.Guilmette <rfg@monkeys.com>
Reviewed by: trhodes
MFC after: 3 days
following fix:
Retransmission timeouts should be based on which attempt
it is to the nameserver and not the nameserver itself.
Obtained from: ISC
MFC after: 3 days
Remove the const qualifier from ap argument for __v2printf, that induced
that breakage, and seems to be the real reason for bad code. ap is modified
inside the __v2printf body by va_arg macro.
Pointy hat to: kib
Approved by: pjd (mentor)
i386 with default optimization level (-O2), va_list pointer ap in the
__v2printf function is advanced before the use. That cause argument
shift and garbage instead last argument in printf-family when xprintf is
activated.
The nsswitch is easy victim of the bug.
Reviewed by: kan
Approved by: kan (mentor)
MFC after: 1 week
Issue __sflush() before possible setting O_APPEND mode or ftruncate(),
write to wrong place may occurse oserwise.
Use simplified _sseek() to the start, if no O_APPEND is set, instead
of _fseeko() (_sseek() to the end, if O_APPEND, occurse later, as for
file != NULL).
Don't check seek error return, as original fopen() and freopen() never
does.
file != NULL:
Add missing _sseek() to the end.
thread signal mask has been updated to avoid stack overflow during signal
bursts.
Don't block signal forever if no threads can currently handle signal.
Check for pending signal after direct invocation of signal handler.
UNIX signalling semantics require that processes in the same
session always be able to deliver SIGCONT to one another,
overriding the remaining protections.
Fix SIGCONT special case description similar to rev. 1.22 kill.2.
PR: docs/58710
Submitted by: Ryan Younce
MFC after: 2 weeks
restore it directly and skip chmod() during the post-extract fixup.
In particular, bsdtar -xm now completely skips the post-extract fixup
for directories, which produces a noticable speedup in that case.
With the first part of my previous Summer of Code work, we get:
-made libalias modular:
-support for 'particular' protocols (like ftp/irc/etcetc) is no more
hardcoded inside libalias, but it's available through external
modules loadable at runtime
-modules are available both in kernel (/boot/kernel/alias_*.ko) and
user land (/lib/libalias_*)
-protocols/applications modularized are: cuseeme, ftp, irc, nbt, pptp,
skinny and smedia
-added logging support for kernel side
-cleanup
After a buildworld, do a 'mergemaster -i' to install the file libalias.conf
in /etc or manually copy it.
During startup (and after every HUP signal) user land applications running
the new libalias will try to read a file in /etc called libalias.conf:
that file contains the list of modules to load.
User land applications affected by this commit are ppp and natd:
if libalias.conf is present in /etc you won't notice any difference.
The only kernel land bit affected by this commit is ng_nat:
if you are using ng_nat, and it doesn't correctly handle
ftp/irc/etcetc sessions anymore, remember to kldload
the correspondent module (i.e. kldload alias_ftp).
General information and details about the inner working are available
in the libalias man page under the section 'MODULAR ARCHITECTURE
(AND ipfw(4) SUPPORT)'.
NOTA BENE: this commit affects _ONLY_ libalias, ipfw in-kernel nat
support will be part of the next libalias-related commit.
Approved by: glebius
Reviewed by: glebius, ru
don't be greedy on the GNU "::" extension when arg separated by whitespace
and POSIX_CORRECTLY is set. From POSIX point of view this is unclear
situation, so minimal assumption looks right.
we can find another way to issue an #error, but using a preprocessed
assembler for that purpose and clobbering libc.a with an empty .o
just for the sake of #error reporting is way too much of a burden.
* Expose functions for setting the "skip file" dev/ino information
* Expose functions for setting/querying the block size on reads
* Correctly propagate errors out of archive_read_close/archive_write_close
* Update manpage with information about new functions
o avoid using a global register variable.
o redefine struct ia64_tp as a union. We don't have to get to the
fields themselves. We just need it to be of the right size with
the right alignment.
16-byte aligned. Consequently, struct tcb is a multiple of 16
bytes in size. We need to make sure there's no padding after
struct ppc32_tp. We do this by explicitly adding the necessary
padding in front of it.
include path will already point to the populated include tree. This
is left over from boot-strapping the build and install of libbsm
during the initial import and merge.
Obtained from: TrustedBSD Project
Pointed out by: ru
o The TLS pointer (r2) points 0x7000 after the *end* of the TCB.
o _rtld_allocate_tls() gets a pointer to the current TCB, not the
current TLS pointer.
o _rtld_free_tls() gets the size of the TCB structure.
into pthread structure to keep track of locked PTHREAD_PRIO_PROTECT mutex,
no real mutex code is changed, the mutex locking and unlocking code should
has same performance as before.
(size_t)(num * size) == 0
but both num and size are nonzero.
Reported by: Ilja van Sprundel
Approved by: jasone
Security: Integer overflow; calloc was allocating 1 byte in
response to a request for a multiple of 2^32 (or 2^64)
bytes instead of returning NULL.
The symptom is that syslog() fails to log anything but the "ident"
string if LOG_PERROR is specified to openlog(3) and the extensible
printf is in action.
For unclear, likely quaint historical reasons, syslog uses fwopen()
on a stack buffer, rather than using the more straightforward
and faster snprintf().
Along the way, fflush(3) is called, and since the callback writer
function returns zero instead of the length "written", __SERR
naturally gets set on the filedescriptor.
The extensible printf, in difference from the normal printf refuses
to output anything to an __SERR marked filedescriptor, and thus
the actual syslog message is supressed.
MFC: after 2 weeks
old resolver opened just one socket, BIND9's resolver may
open more than one sockets. And, BIND9's resolver doesn't
close the socket on timeout. So, we need this check.
Reported by: freebsd-cvs-src__at__oldach.net (Helge Oldach), bz
Hinted by: rwatson
integer. Presently, our implementation employs an approach that
converts the value to int64_t, then back to int, unfortunately,
this approach can be problematic when the the difference between
the two time_low is larger than 0x7fffffff, as the value is then
truncated to int.
To quote the test case from the original PR, the following is
true with the current implementation:
865e1a56-b9d9-11d9-ba27-0003476f2e88 < 062ac45c-b9d9-11d9-ba27-0003476f2e88
However, according to the DCE specification, the expected result
should be:
865e1a56-b9d9-11d9-ba27-0003476f2e88 > 062ac45c-b9d9-11d9-ba27-0003476f2e88
This commit adds a new intermediate variable which uses int64_t
to store the result of subtraction between the two time_low values,
which would not introduce different semantic of the MSB found in
time_low value.
PR: 83107
Submitted by: Steve Sears <sjs at acm dot org>
MFC After: 1 month
increases performance when extracting a single entry from a large
uncompressed archive, especially on slow devices such as USB hard
drives.
Requires a number of changes:
* New archive_read_open2() supports a 'skip' client function
* Old archive_read_open() is implemented as a wrapper now, to
continue supporting the old API/ABI.
* _read_open_fd and _read_open_file sprout new 'skip' functions.
* compression layer gets a new 'skip' operation.
* compression_none passes skip requests through to client.
* compression_{gzip,bzip2,compress} simply ignore skip requests.
Thanks to: Benjamin Lutz, who designed and implemented the whole thing.
I'm just committing it. ;-)
TODO: Need to update the documentation a little bit.
pax wasn't introduced until the 1993 (?) revision.
(I need to double-check when pax was introduced and
clarify some of the history here. In particular,
I should explain that the 'pax' standard now owns the
'ustar' format spec.)
runtime using BN_CTX_new(). This is done since in OpenSSL 0.9.7e we
can only allocate BN_CTX on the stack by including an internal OpenSSL
header file, and in OpenSSL 0.9.8 BN_CTX is entirely opaque, so having
it on the stack is not possible at all.
This is done as preparation for OpenSSL 0.9.8b import.
Tested on: amd64 i386 ia64
Tested with: src/tools/regression/lib/libmp
wait(), waitpid() and usleep(), they are internal versions and
should not be cancellation points.
2. Make wait3() as a cancellation point.
3. Move raise() and pause() into file thr_sig.c.
4. Add functions _sigsuspend, _sigwait, _sigtimedwait and _sigwaitinfo,
remove SIGCANCEL bit in wait-set for those functions, the signal is
used internally to implement thread cancellation.
in rev. 1.34. Mainly I missed the fact that the buffer is used for two
purposes:
1) storing a group line from the group file;
2) __gr_parse_entry() parses the buffer and tries to put the group
members to the remaining part of the buffer and can fail if there
is no enough room for them.
Re-arrange the buffer size checks to account the latter case.
Submitted by: Kirk R Webb
MFC after: 2 weeks
In e_log.c, there was just a off-by-1 (1 ulp) error in the comment
about the threshold. The precision of the threshold is unimportant,
but the magic numbers in the code are easier to understand when the
threshold is described precisely.
In e_logf.c, mistranslation of the magic numbers gave an off-by-1
(1 * 16 ulps) error in the intended negative bound for the threshold
and an off-by-7 (7 * 16 ulps) error in the intended positive bound for
the threshold, and the intended bounds were not translated from the
double precision bounds so they were unnecessarily small by a factor
of about 2048.
The optimization of using the simple Taylor approximation for args
near a power of 2 is dubious since it only applies to a relatively
small proportion of args, but if it is done then doing it 2048 times
as often _may_ be more efficient. (My benchmarks show unexplained
dependencies on the data that increase with further optimizations
in this area.)
2**-28 as a side effect, by merging with the float precision version
of tanh() and the double precision version of sinh().
For tiny x, tanh(x) ~= x, and we used the expression x*(one+x) to
return this value (x) and set the inexact flag iff x != 0. This
doesn't work on ia64 since gcc -O does the dubious optimization
x*(one+x) = x+x*x so as to use fma, so the sign of -0.0 was lost.
Instead, handle tiny x in the same as sinh(), although this is imperfect:
- return x directly and set the inexact flag in a less efficient way.
- increased the threshold for non-tinyness from 2**-55 to 2**-28 so that
many more cases are optimized than are pessimized.
Updated some comments and fixed bugs in others (ranges for half-open
intervals mostly had the open end backwards, and there were nearby style
bugs).
functions are only for compatibility with obsolete standards. They
shouldn't be used, so they shouldn't be optimized. Use the generic
versions instead.
This fixes scalbf() as a side effect. The optimized asm version left
garbage on the FP stack. I fixed the corresponding bug in the optimized
asm scalb() and scalbn() in 1996. NetBSD fixed it in scalb(), scalbn()
and scalbnf() in 1999 but missed fixing it in scalbf(). Then in 2005
the bug was reimplemented in FreeBSD by importing NetBSD's scalbf().
The generic versions have slightly different error handling:
- the asm versions blindly round the second parameter to a (floating
point) integer and proceed, while the generic versions return NaN
if this rounding changes the value. POSIX permits both behaviours
(these functions are XSI extensions and the behaviour for a bogus
non-integral second parameter is unspecified). Apart from this
and the bug in scalbf(), the behaviour of the generic versions seems
to be identical. (I only exhusatively tested
generic_scalbf(1.0F, anyfloat) == asm_scalb(1.0F, anyfloat). This
covers many representative corner cases involving NaNs and Infs but
doesn't test exception flags. The brokenness of scalbf() showed up
as weird behaviour after testing just 7 integer cases sequentially.)
to scalbf(), but ldexpf() cannot be implemented in that way since the
types of the second parameter differ. ldexpf() can be implemented as
a weak or strong reference to scalbnf() (*) but that was already done
long before rev.1.10 was committed. The old implementation uses a
reference, so rev.1.10 had no effect on applications. The C files for
the scalb() family are not used for amd64 or i386, so rev.1.10 had even
less effect for these arches.
(*) scalbnf() raises the radix to the given exponent, while ldexpf()
raises 2 to the given exponent. Thus the functions are equivalent
except possibly for their error handling iff the radix is 2. Standards
more or less require identical error handling. Under FreeBSD, the
functions are equivalent except for more details being missing in
scalbnf()'s man page.
well as avoiding a switch statement. This change has no significant impact
to performance when branch prediction is successful at predicting the sizes
of objects passed to free(), but in the case that the object sizes are
semi-random, this change has the potential to prevent many branch prediction
misses, thus improving performance substantially.
Take advantage of alignment guarantees in ipalloc(), and pad object sizes to
something less than a power of two when possible. This has the potential
to substantially reduce internal fragmentation for objects allocated via
posix_memalign().
Avoid an unnecessary pow2_ceil() call in arena_ralloc().
Submitted by: djam8193ah@hotmail.com
and instead creating a small allocation for each malloc(0) call. The
optional SysV compatibility behavior remains unchanged.
Add a couple of assertions.
Fix a couple of typos in error message strings.
The text is correct in the "DESCRIPTION" section, so fix "SYNOPSIS"
to use the correct name.
PR: docs/90498
Submitted by: Vasil Dimov
MFC after: 3 days
do its work for SIGINFO. Always install libpthread signal handler
wrapper for SIGINFO even if user SIG_IGN's or SIG_DFL's it.
SIGINFO has a special meaning for libpthread: when LIBPTHREAD_DEBUG
enviroment variable defined it is used for dumping an information
about threads to /tmp/.
Reported by: mi
Reviewed by: deischen
MFC after: 2 weeks
If the initial buffer size (1KB) for the given group line is not big
enough, reset the offset. It helps to do not miss this line when
getrg() reallocates the larger buffer and tries to parse the line again.
PR: bin/52433, kern/55031, bin/83696, misc/97640, misc/98111
Submitted by: bsw71@mail.ru, Philip M. Gollucci, Justin Erenkrantz
Glanced at: nectar
MFC after: 1 month
objects with SF_IMMUTABLE, SF_APPEND, or SF_NOUNLINK.
* Document that non-superusers cannot set or clear any SF_* flag
(setting fails with EPERM, clearing is silently ignored).
* Document that superusers cannot change any flag if one of
SF_IMMUTABLE, SF_APPEND, SF_NOUNLINK is set and securelevel is
greater than 0.
* Document SF_SNAPSHOT and note that it is maintained by the
system and is, for this reason, impossible to set to clear by
any user.
PR: docs/33877
Submitted by: harti
Help by: George Marsellis <gam9478@njit.edu>
MFC after: 1 week
4kB pages), in order to avoid dangerous rounding error when calculating
fullness limits during run promotion/demotion.
Convert a structure bitfield to a normal field in areana_run_t. This should
have been changed along with the other fields in revision 1.120.
in part by OpenBSD's not-quite-standard-compliant
standard libraries. (No loss of functionality,
just minor recoding to not rely on certain "standard"
facilities that weren't actually needed.)
bounds. [1]
Modify logic for utilizing the data segment, such that it is possible to
create huge allocations there.
Shrink the data segment when deallocating a chunk, if it is at the end of
the data segment.
Rename chunk_size to csize in huge_malloc(), in order to avoid masking a
static variable of the same name. [1]
Reported by: Paul Allen <nospam@ugcs.caltech.edu>
subject: ranges of uid, ranges of gid, jail id
objects: ranges of uid, ranges of gid, filesystem,
object is suid, object is sgid, object matches subject uid/gid
object type
We can also negate individual conditions. The ruleset language is
a superset of the previous language, so old rules should continue
to work.
These changes require a change to the API between libugidfw and the
mac_bsdextended module. Add a version number, so we can tell if
we're running mismatched versions.
Update man pages to reflect changes, add extra test cases to
test_ugidfw.c and add a shell script that checks that the the
module seems to do what we expect.
Suggestions from: rwatson, trhodes
Reviewed by: trhodes
MFC after: 2 months
far more convenient for libkvm to work with because of the page table
block at the beginning. As a result, the MD code is smaller.
libkvm will automatically detect old vs mini dumps on i386 and amd64.
libkvm will handle i386 PAE and non-PAE modes. There is a PAE flag in
the i386 minidump header to signal the width of the entries in the
page table block.
Other convenient values are also present, such as kernbase and the direct
map addresses on amd64.
to pidfile_write happen, the pidfile will have nul characters prepended
due to the cached file descriptor offset...
Reviewed by: scottl
MFC after: 3 days
as well as add __sparc_utrap_install to FBSD_1.0; these are required by
the SCD libc 64 psABI and thus meant to be officially exported symbols.
- Remove the __fpu_* entries as well as the __sigtramp entry altogether as
these are internal to the libc FPU emulation and the signal trampoline
initialization in sigaction(2) respectively and thus don't need to be
externally visible.
- Add __sparc_utrap_setup to the list of FBSDprivate symbols as it's used
in src/lib/csu/sparc64/crt1.c to initialize the libc FPU emulation (I
think alternatively src/lib/csu/sparc64/crt1.c could be changed to use
__sparc_utrap_install instead, at the expense of increasing the size of
executables a bit).
- Add an entry for the vfork symbol to the FBSD_1 list and entries for it's
associated symbols generated by the RSYSCALL() macro to the FBSDprivate
list. There's some magic in place that automatically generates code for
vfork() if there's no explicit MD code for it so it might make sense to
move these symbols from the MD symbol map files to a MI one.
The last two changes make the libc symbol versioning useable on sparc64.
Ok'ed by: deischen
races. This isn't currently necessary for libpthread or libthr, but
without it external threads libraries like the linuxthreads port are
not safe to use.
Reported by: ganbold@micom.mng.net
have to be calculated once per allocator operation.
Make nil const.
Update various comments.
Remove/avoid division where possible.
For the one division operation that remains in the critical path, add a
switch statement that has a case for each small size class, and do division
with a constant divisor in each case. This allows the compiler to generate
optimized code that does not use hardware division [1].
Obtained from: peter [1]
this is used by some 3rd party applications when {e,f,g}cvt() are
not found. POSIX defines the xcvt() funtions but says they are
deprecated in favor or sprintf(). We'll import these functions
from OpenBSD and remove __gdtoa() from the exported interfaces
when libc version is bumped.
* Avoid choosing an arena until it's certain that an arena is needed
for allocation.
* Convert division/multiplication to bitshifting where possible.
* Avoid accessing TLS variables in single-threaded code.
* Reduce the amount of pointer dereferencing.
* Move lock acquisition in critical paths to only protect the the code
that requires synchronization, and completely remove locking where
possible.
uses them.
Now, we have res_nupdate and res_nmkupdate as well, but they are
still based on our old resolver for binary backward compatibility.
So, they don't provide new features such as TSIG support.
Reported by: pointyhat via kris
FBSDprivate locale symbols. These functions are needed by
libcompat.
Add _cleanup to the list of stdio FBSDprivate symbols. Some
third party applications use this. This will be removed and
replaced by fcloseall() once libc version is bumped.
Add _res to the list of resolv symbols.
Found by: portbuilder runs (thanks Kris!)
to make it work, turnstile like mechanism to support priority
propagating and other realtime scheduling options in kernel
should be available to userland mutex, for the moment, I just
want to make libthr be simple and efficient thread library.
Discussed with: deischen, julian
Kernel changes:
Inform hwpmc of executable objects brought into the system by
kldload() and mmap(), and of their removal by kldunload() and
munmap(). A helper function linker_hwpmc_list_objects() has been
added to "sys/kern/kern_linker.c" and is used by hwpmc to retrieve
the list of currently loaded kernel modules.
The unused `MAPPINGCHANGE' event has been deprecated in favour
of separate `MAP_IN' and `MAP_OUT' events; this change reduces
space wastage in the log.
Bump the hwpmc's ABI version to "2.0.00". Teach hwpmc(4) to
handle the map change callbacks.
Change the default per-cpu sample buffer size to hold
32 samples (up from 16).
Increment __FreeBSD_version.
libpmc(3) changes:
Update libpmc(3) to deal with the new events in the log file; bring
the pmclog(3) manual page in sync with the code.
pmcstat(8) changes:
Introduce new options to pmcstat(8): "-r" (root fs path), "-M"
(mapfile name), "-q"/"-v" (verbosity control). Option "-k" now
takes a kernel directory as its argument but will also work with
the older invocation syntax.
Rework string handling in pmcstat(8) to use an opaque type for
interned strings. Clean up ELF parsing code and add support for
tracking dynamic object mappings reported by a v2.0.00 hwpmc(4).
Report statistics at the end of a log conversion run depending
on the requested verbosity level.
Reviewed by: jhb, dds (kernel parts of an earlier patch)
Tested by: gallatin (earlier patch)
determine its value at run time according to other relevant values. This
avoids the creation of runs that are incompletely utilized, as long as
pagesize isn't too large (>32kB, given the current RUN_MIN_REGS_2POW
setting).
Increase the size of several structure bitfields in arena_run_t in order
to avoid integer overflow in the case that a run's header does not overlap
with the space that is usable as application allocation regions. Given
the tiny_min_2pow change, this fix has no additional impact unless
pagesize is >32kB.
Reported by: kris
internally used chunk to start at the beginning of the heap, rather
than at a chunk-aligned address. This reduces mapped memory somewhat
for 32-bit architectures.
Add the arena_run_link_t type and use it wherever a run object is only
used as a ring 'header'. This saves approximately 40 kB of memory per
arena.
Remove an obsolete (no longer used) code path from base_alloc(), which
supported the internal allocation of objects larger than the chunk
size.
Enhance chunk_dealloc() to cache chunk addresses for all deallocated
chunks. This has no impact for most programs, but has the potential
to reduce VM map fragmentation for programs that use huge
allocations.
documentation bug. We switched to page indexes some time around
FreeBSD 2.2. The actual 'len' limit is the maximum file size or what
will fit in your address space, whichever comes first. It should be
possible to make 1TB files on 32 bit systems, but of course address space
runs out long before then.
it's only a failure if there were actually attributes to be restored.
In particular, this fixes the problem where tar -xp always returned
a failure code on FreeBSD (which doesn't yet have all of the extended
attribute support).
Thanks to: Diego "Flameeyes" Petteno
This commit implements storing/reading POSIX.1e-style extended
attribute information in "pax" format archives. An outline of the
storage format is in the tar.5 manpage. The archive_read_extract()
function has code to restore those archives to disk for Linux; FreeBSD
implementation is forthcoming.
Many thanks to Jaakko Heinonen for finding flaws in earlier
proposals and doing the bulk of the coding in this work.
Since, res_sendsigned(3) and the friends use MD5 functions, it is
hard to include them without having MD5 functions in libc. So,
res_sendsigned(3) is not merged into libc.
Since, res_update(3) in BIND9 is not binary compatible with our
res_update(3), res_update(3) is leaved as is, except some
necessary modifications.
The res_update(3) and the friends are not essential part of the
resolver. They are not defined in resolv.h but defined in
res_update.h separately in BIND9. Further, they are not called from
our tree. So, I hide them from our resolv.h, but leave them only
for binary backward compatibility (perhaps, no one calls them).
Since, struct __res_state_ext is not exposed in BIND9, I hide it
from our resolv.h. And, global variable _res_ext is removed. It
breaks binary backward compatibility. But, since it is not used from
outside of our libc, I think it is safe.
Reviewed by: arch@ (no objection)
- <netipx> headers [1]
- IPX library (libipx)
- IPX support in ifconfig(8)
- IPXrouted(8)
- new MK_NCP option
New MK_NCP build option controls:
- <netncp> and <fs/nwfs> headers
- NCP library (libncp)
- ncplist(1) and ncplogin(1)
- mount_nwfs(8)
- ncp and nwfs kernel modules
User knobs: WITHOUT_IPX, WITHOUT_IPX_SUPPORT, WITHOUT_NCP.
[1] <netsmb/netbios.h> unconditionally uses <netipx> headers
so they are still installed. This needs to be dealt with.
that no linear searching is necessary if we resort to allocating from a
run that is known to be mostly full. There are pathological edge cases
that could have caused severely degraded performance, and this change
fixes that.
close enough to each other that reallocation would allocate a new region
of the same size. This improves the performance of repeated incremental
reallocations by up to three orders of magnitude. [1]
Fix arena_new() to properly constrain run size if a small chunk size was
specified during runtime configuration.
Suggested by: se [1]
allocation patterns that involve a relatively even mixture of many
different size classes.
Reduce the chunk size from 16 MB to 2 MB. Since chunks are now carved up
using an address-ordered first best fit policy, VM map fragmentation is
much less likely, which makes smaller chunks not as much of a risk. This
reduces the virtual memory size of most applications.
Remove redzones, since program buffer overruns are no longer as likely to
corrupt malloc data structures.
Remove the C MALLOC_OPTIONS flag, and add H and S.
used LIBTHREAD_1_0 as its version definition, but now needs
to define its symbols in the same namespace used by libc.
The compatibility hooks allows you to use libraries and
binaries built and linked to libpthread before libc was
built with symbol versioning. The shims can be removed if
libpthread is given a version bump.
Reviewed by: davidxu
disabled by default; add SYMVER_ENABLED=true to /etc/make.conf
to enable it. libc should get a version bump before this is
enabled by default.
Reviewed by: davidxu
is default and caller does not require dedicated thread. timer needs
a dedicated thread to maintain overrun count correctly in notification
context. mqueue and aio can use thread pool to do notification
concurrently, the thread pool has lifecycle control, some threads will
exit if they have idled for a while.
Staticize two tables thare are not visible in <resolv.h>
and which are also local in Solaris' libresolv.
Remove two functions that are not referenced in libc nor
anywhere else I can find, not visible in <resolv.h> and
which are also local in Solaris libresolv.
but then sizes the containing data structure at run-time to make room
for per-cpu cache data. Modify libmemstat to separately allocate a
buffer to hold per-cpu cache data, sized based on the run-time mp_maxid
variable when using libkvm to access UMA data. This avoids reading
invalid cache data from beyond the end of the uma_zone data structure
on the stack, which can result in invalid statistics and/or reads from
invalid kernel addresses.
Foot target practice by: ps
MFC after: 3 days
return a KVM error rather than an out of memory error, so that the caller
reports the KVM error state. This replaces a misleading error message
with a more accurate although equally confusing one.
MFC after: 3 days
cpu mask before looking at the cache entries for the CPU. For systems
with sparse CPU id arrays, this skips otherwise uninitialized cache
structures.
MFC after: 3 days
Remove the block of code that tries to use delayed regions in LIFO order,
since from a policy perspective, it conflicts with LRU caching of newly
coalesced regions in arena_undelay(). There are numerous policy
alternatives, and it isn't readily obvious which (if any) is superior;
this change at least has the virtue of being consistent with policy.
Add %M{essage} extension which prints an errno value as the
corresponding string if possible or numerically otherwise.
It is not currently possible to do the syslog(3) like %m extension
because errno would need to get capatured on entry to the first
function in the printf family, so %M requires you to supply errno
as an argument.
Add %Q{uote} extension which will print a string in double quotes with
appropriate back-slash escapes (only) if necessary.
fit regions are available, use the delayed regions in LIFO order, in order
to increase locality of reference. We might expect this to cause delayed
regions to be removed from the delay ring buffer more often (since we're
now re-using more recently buffered regions), but numerous tests indicate
that the overall impact on memory usage tends to be good (reduced
fragmentation).
Re-work arena_frag_reg_alloc() so that when large free regions are
exhausted, it uses small regions in a way that favors contiguous allocation
of sequentially allocated small regions. Use arena_frag_reg_alloc() in
this capacity, rather than directly attempting over-fitting of small
requests when no large regions are available.
Remove the bin overfit statistic, since it is no longer relevant due to
the arena_frag_reg_alloc() changes.
Do not specify arena_frag_reg_alloc() as an inline function. It is too
large to benefit much from being inlined, and it is also called in two
places, only one of which is in the critical path (the other call bloated
arena_reg_alloc()).
Call arena_coalesce() for a region before caching it with
arena_mru_cache().
Add assertions that detect the attempted caching of adjacent free regions,
so that we notice this problem when it is first created, rather than in
arena_coalesce(), when it's too late to know how the problem arose.
Reported by: Hans Blancke
behaviour of returning EINVAL when ".." is passed as either argument
has been restored.
rmdir("..") now returns EINVAL instead of EPERM. Document the
previously undocumented behaviour of rmdir(".") returning EINVAL
as required by POSIX and SUSv3. Bump the man page change date.
undelete("..") now returns EINVAL instead of EPERM. Bump the man
page change date.
MFC after: 3 days
problems in cases where regions are faked up for the purposes of red-black
tree searches, since those faked region headers reside on the stack, rather
than in a malloc chunk.
allowing the error to be fatal.
Move a label in order to make sure to properly handle errors in malloc(0).
Reported by: Alastair D'Silva, Saneto Takanori
routine fails or the first read fails), invoke the client close
routine immediately so the client can clean up. Also, don't store the
client pointers in this case, so that the client close routine can't
accidentally get called more than once.
A minor style fix to archive_read_open_fd.c while I'm here.
PR: 86453
Thanks to: Andrew Turner for reporting this and suggesting a fix.
archiver for Fourth Edition through Sixth Edition Unix; it was
replaced by tar in Seventh Edition. (First Edition through
Third Edition used "tap.")
Unfortunately, tp was not so very standard; there were a
few different variants. The code here attempts to support
what I believe were the most common variants.
tp support is not yet enabled by archive_read_support_format_all(),
as I'm not yet entirely comfortable with the detection
heuristics. People interested in experimenting can
add archive_read_support_format_tp() just after any calls
to archive_read_support_format_all() in bsdtar to see how
well this works.
TODO: tp format is roughly similar in structure to dump/restore
archive formats used by many systems. It should be possible
to generalize this code to handle many dump/restore variants.
Format detection heuristics are going to be rough, though.
Thanks to: Warren Toomey, whose very basic tp extraction programs
and documentation made this possible.
there is never any need to recursively call the main allocation functions.
Remove recursive spinlock support, since it is no longer needed.
Allow chunks to be as small as the page size.
Correctly propagate OOM errors from arena_new().
from /etc/login.conf, or an unterminated string buffer could result.
Probably, login_times.c should reject excessively long time strings as
unparseable, rather than truncating, which might render an invalid
string valid.
Found with: Coverity Prevent (tm)
Reviewed by: csjp
MFC after: 3 days
list, which could cause problems for multi-threaded applications
using libmemstat to monitor UMA in more than one thread
simultaneously.
MFC after: 3 days
broken for non-threaded shared processes in that __tls_get_addr()
assumes the thread pointer is always initialized. This is not the
case. When arenas_map is referenced in choose_arena() and it is
defined as a thread-local variable, it will result in a SIGSEGV.
PR: ia64/91846 (describes the TLS/ia64 bug).
had been replied, the reply was always delivered to the originator
synchronously.
With introduction of netgraph item callbacks and a wait channel with
mutex in ng_socket(4), we have fixed the problem with ngctl(8) returning
earlier than the command has been proceeded by target node. But still
ngctl(8) can return prior to the reply has arrived to its node.
To fix this:
- Introduce a new flag for netgraph(4) messages - NGM_HASREPLY.
This flag is or'ed with message like NGM_READONLY.
- In netgraph userland library if we have sent a message with
NGM_HASREPLY flag, then select(2) until reply comes.
- Mark appropriate generic commands with NGM_HASREPLY flag,
gathering them into one enum {}. Bump generic cookie.
* Add posix_memalign().
* Move calloc() from calloc.c to malloc.c. Add a calloc() implementation in
rtld-elf in order to make the loader happy (even though calloc() isn't
used in rtld-elf).
* Add _malloc_prefork() and _malloc_postfork(), and use them instead of
directly manipulating __malloc_lock.
Approved by: phk, markm (mentor)
While we don't use the NC_BROADCAST value of nc_flag anywhere in the
RPC code, it is parseable by getnetconfigent(3) from /etc/netconfig.
o Clean up some "see below"'s that were cut and pasted from netconfig.h.
operation, the caller is blocked util target threads are really
suspended, also avoid suspending a thread when it is holding a
critical lock.
Fix a bug in _thr_ref_delete which tests a never set flag.
commit broke the 2**24 cases where |x| > DBL_MAX/2. There are exponent
range problems not just for denormals (underflow) but for large values
(overflow). Doubles have more than enough exponent range to avoid the
problems, but I forgot to convert enough terms to double, so there was
an x+x term which was sometimes evaluated in float precision.
Unfortunately, this is a pessimization with some combinations of systems
and compilers (it makes no difference on Athlon XP's, but on Athlon64's
it gives a 5% pessimization with gcc-3.4 but not with gcc-3.3).
Exlain the problem better in comments.
algorithm for the second step significantly to also get a perfectly
rounded result in round-to-nearest mode. The resulting optimization
is about 25% on Athlon64's and 30% on Athlon XP's (about 25 cycles
out of 100 on the former).
Using extra precision, we don't need to do anything special to avoid
large rounding errors in the third step (Newton's method), so we can
regroup terms to avoid a division, increase clarity, and increase
opportunities for parallelism. Rearrangement for parallelism loses
the increase in clarity. We end up with the same number of operations
but with a division reduced to a multiplication.
Using specifically double precision, there is enough extra precision
for the third step to give enough precision for perfect rounding to
float precision provided the previous steps are accurate to 16 bits.
(They were accurate to 12 bits, which was almost minimal for imperfect
rounding in the old version but would be more than enough for imperfect
rounding in this version (9 bits would be enough now).) I couldn't
find any significant time optimizations from optimizing the previous
steps, so I decided to optimize for accuracy instead. The second step
needed a division although a previous commit optimized it to use a
polynomial approximation for its main detail, and this division dominated
the time for the second step. Use the same Newton's method for the
second step as for the third step since this is insignificantly slower
than the division plus the polynomial (now that Newton's method only
needs 1 division), significantly more accurate, and simpler. Single
precision would be precise enough for the second step, but doesn't
have enough exponent range to handle denormals without the special
grouping of terms (as in previous versions) that requires another
division, so we use double precision for both the second and third
steps.
functions in the child after a fork() from a threaded process,
use __sys_setprocmask() rather than setprocmask() to keep our
signal handling sane. Without this fix, signals are essentially
ignored in said child and things such as protection violations
result in an endless busy loop.
Reviewed by: deischen
similar the the Solaris implementation. Repackage the krb5 GSS mechanism
as a plugin library for the new implementation. This also includes a
comprehensive set of manpages for the GSS-API functions with text mostly
taken from the RFC.
Reviewed by: Love Hörnquist Åstrand <lha@it.su.se>, ru (build system), des (openssh parts)
between a 32-bit integer and a radix-64 ASCII string. The l64a_r() function
is a NetBSD addition.
PR: 51209 (based on submission, but very different)
Reviewed by: bde, ru
distributed non-large args, this saves about 14 of 134 cycles for
Athlon64s and about 5 of 199 cycles for AthlonXPs.
Moved the check for x == 0 inside the check for subnormals. With
gcc-3.4 on uniformly distributed non-large args, this saves another
5 cycles on Athlon64s and loses 1 cycle on AthlonXPs.
Use INSERT_WORDS() and not SET_HIGH_WORD() when converting the first
approximation from bits to double. With gcc-3.4 on uniformly distributed
non-large args, this saves another 4 cycles on both Athlon64s and and
AthlonXPs.
Accessing doubles as 2 words may be an optimization on old CPUs, but on
current CPUs it tends to cause extra operations and pipeline stalls,
especially for writes, even when only 1 of the words needs to be accessed.
Removed an unused variable.
function approximation for the second step. The polynomial has degree
2 for cbrtf() and 4 for cbrt(). These degrees are minimal for the final
accuracy to be essentially the same as before (slightly smaller).
Adjust the rounding between steps 2 and 3 to match. Unfortunately,
for cbrt(), this breaks the claimed accuracy slightly although incorrect
rounding doesn't. Claim less accuracy since its not worth pessimizing
the polynomial or relying on exhaustive testing to get insignificantly
more accuracy.
This saves about 30 cycles on Athlons (mainly by avoiding 2 divisions)
so it gives an overall optimization in the 10-25% range (a larger
percentage for float precision, especially in 32-bit mode, since other
overheads are more dominant for double precision, surprisingly more
in 32-bit mode).
- in preparing for the third approximation, actually make t larger in
magnitude than cbrt(x). After chopping, t must be incremented by 2
ulps to make it larger, not 1 ulp since chopping can reduce it by
almost 1 ulp and it might already be up to half a different-sized-ulp
smaller than cbrt(x). I have not found any cases where this is
essential, but the think-time error bound depends on it. The relative
smallness of the different-sized-ulp limited the bug. If there are
cases where this is essential, then the final error bound would be
5/6+epsilon instead of of 4/6+epsilon ulps (still < 1).
- in preparing for the third approximation, round more carefully (but
still sloppily to avoid branches) so that the claimed error bound of
0.667 ulps is satisfied in all cases tested for cbrt() and remains
satisfied in all cases for cbrtf(). There isn't enough spare precision
for very sloppy rounding to work:
- in cbrt(), even with the inadequate increment, the actual error was
0.6685 in some cases, and correcting the increment increased this
a little. The fix uses sloppy rounding to 25 bits instead of very
sloppy rounding to 21 bits, and starts using uint64_t instead of 2
words for bit manipulation so that rounding more bits is not much
costly.
- in cbrtf(), the 0.667 bound was already satisfied even with the
inadequate increment, but change the code to almost match cbrt()
anyway. There is not enough spare precision in the Newton
approximation to double the inadequate increment without exceeding
the 0.667 bound, and no spare precision to avoid this problem as
in cbrt(). The fix is to round using an increment of 2 smaller-ulps
before chopping so that an increment of 1 ulp is enough. In cbrt(),
we essentially do the same, but move the chop point so that the
increment of 1 is not needed.
Fixed comments to match code:
- in cbrt(), the second approximation is good to 25 bits, not quite 26 bits.
- in cbrt(), don't claim that the second approximation may be implemented
in single precision. Single precision cannot handle the full exponent
range without minor but pessimal changes to renormalize, and although
single precision is enough, 25 bit precision is now claimed and used.
Added comments about some of the magic for the error bound 4/6+epsilon.
I still don't understand why it is 4/6+ and not 6/6+ ulps.
Indent comments at the right of code more consistently.
to be compatible with symbol versioning support as implemented by
GNU libc and documented by http://people.redhat.com/~drepper/symbol-versioning
and LSB 3.0.
Implement dlvsym() function to allow lookups for a specific version of
a given symbol.
means:
o Remove Elf64_Quarter,
o Redefine Elf64_Half to be 16-bit,
o Redefine Elf64_Word to be 32-bit,
o Add Elf64_Xword and Elf64_Sxword for 64-bit entities,
o Use Elf_Size in MI code to abstract the difference between
Elf32_Word and Elf64_Word.
o Add Elf_Ssize as the signed counterpart of Elf_Size.
MFC after: 2 weeks
on probationary terms: it may go away again if it transpires it is
a bad idea.
This extensible printf version will only be used if either
environment variable USE_XPRINTF is defined
or
one of the extension functions are called.
or
the global variable __use_xprintf is set greater than zero.
In all other cases our traditional printf implementation will
be used.
The extensible version is slower than the default printf, mostly
because less opportunity for combining I/O operation exists when
faced with extensions. The default printf on the other hand
is a bad case of spaghetti code.
The extension API has a GLIBC compatible part and a FreeBSD version
of same. The FreeBSD version exists because the GLIBC version may
run afoul of our FILE * locking in multithreaded programs and it
even further eliminate the opportunities for combining I/O operations.
Include three demo extensions which can be enabled if desired: time
(%T), hexdump (%H) and strvis (%V).
%T can format time_t (%T), struct timeval (%lT) and struct timespec (%llT)
in one of two human readable duration formats:
"%.3llT" -> "20349.245"
"%#.3llT" -> "5h39m9.245"
%H will hexdump a sequence of bytes and takes a pointer and a length
argument. The width specifies number of bytes per line.
"%4H" -> "65 72 20 65"
"%+4H" -> "0000 65 72 20 65"
"%#4H" -> "65 72 20 65 |er e|"
"%+#4H" -> "0000 65 72 20 65 |er e|"
%V will dump a string in strvis format.
"%V" -> "Hello\tWor\377ld" (C-style)
"%0V" -> "Hello\011Wor\377ld" (octal)
"%+V" -> "Hello%09Wor%FFld" (http-style)
Tests, comments, bugreports etc are most welcome.
allocate a memory block. sscanf calls __svfscanf which in turn calls
fread, fread triggers mutex initialization but the mutex is not
destroyed in sscanf, this leads to memory leak. To avoid the memory
leak and performance issue, we create a none MT-safe version of fread:
__fread, and instead let __svfscanf call __fread.
PR: threads/90392
Patch submitted by: dhartmei
MFC after: 7 days
the second step of approximating cbrt(x). It turns out to be neither
very magic not nor very good. It is just the (2,2) Pade approximation
to 1/cbrt(r) at r = 1, arranged in a strange way to use fewer operations
at a cost of replacing 4 multiplications by 1 division, which is an
especially bad tradeoff on machines where some of the multiplications
can be done in parallel. A Remez rational approximation would give
at least 2 more bits of accuracy, but the (2,2) Pade approximation
already gives 6 more bits than needed. (Changed the comment which
essentially says that it gives 3 more bits.)
Lower order Pade approximations are not quite accurate enough for
double precision but are plenty for float precision. A lower order
Remez rational approximation might be enough for double precision too.
However, rational approximations inherently require an extra division,
and polynomial approximations work well for 1/cbrt(r) at r = 1, so I
plan to switch to using the latter. There are some technical
complications that tend to cost a division in another way.
This gives an optimization of between 9 and 22% on Athlons (largest
for cbrt() on amd64 -- from 205 to 159 cycles).
We extracted the sign bit and worked with |x|, and restored the sign
bit as the last step. We avoided branches to a fault by using accesses
to FP values as bits to clear and restore the sign bit. Avoiding
branches is usually good, but the bit access macros are not so good
(especially for setting FP values), and here they always caused pipeline
stalls on Athlons. Even using branches would be faster except on args
that give perfect branch misprediction, since only mispredicted branches
cause stalls, but it possible to avoid touching the sign bit in FP
values at all (except to preserve it in conversions from bits to FP
not related to the sign bit). Do this. The results are identical
except in 2 of the 3 unsupported rounding modes, since all the
approximations use odd rational functions so they work right on strictly
negative values, and the special case of -0 doesn't use an approximation.
For some denormalized long double values, a bug in __hldtoa() (called
from *printf()'s %A format) results in a base 16 digit being rounded
up from 0xf to 0x10.
When this digit is subsequently converted to string format, an index
of 10 reaches past the end of the uppper-case hex/char array, picking
up whatever the code segment happen to contain at that address.
This mostly seem to be some character from the upper half of the
byte range.
When using the %a format instead of %A, the first character past
the end of the lowercase hex/char table happens to be index 0 in
the uppercase hex/char table hextable and therefore the string
representation features a '0', which is supposedly correct.
This leads me to belive that the proper fix _may_ be as simple as
masking all but the lower four bits off after incrementing a hex-digit
in libc/gdtoa/_hdtoa.c:roundup(). I worry however that the upper
bit in 0x10 indicates a carry not carried.
Until das@ or bde@ finds time to visit this issue, extend the
hexdigit arrays with a 17th index containing '?' so that we get a
invalid but consistent and printable output in both %a and %A formats
whenever this bug strikes.
This unmasks the bug in the %a format therefore solving the real
issue may both become easier and more urgent.
Possibly related to: PR 85080
With help by: bde@
<cbrt(x) in bits> ~= <x in bits>/3 + BIAS.
Keep the large comments only in the double version as usual.
Fixed some style bugs (mainly grammar and spelling errors in comments).
It was because I forgot to translate the part of the double precision
algorithm that chops t so that t*t is exact. Now the maximum error
is the same as for double precision (almost exactly 2.0/3 ulps).
The maximum error was 3.56 ulps.
The bug was another translation error. The double precision version
has a comment saying "new cbrt to 23 bits, may be implemented in
precision". This means exactly what it says -- that the 23 bit second
approximation for the double precision cbrt() may be implemented in
single (i.e., float) precision. It doesn't mean what the translation
assumed -- that this approximation, when implemented in float precision,
is good enough for the the final approximation in float precision.
First, float precision needs a 24 bit approximation. The "23 bit"
approximation is actually good to 24 bits on float precision args, but
only if it is evaluated in double precision. Second, the algorithm
requires a cleanup step to ensure its error bound.
In float precision, any reasonable algorithm works for the cleanup
step. Use the same algorithm as for double precision, although this
is much more than enough and is a significant pessimization, and don't
optimize or simplify anything using double precision to implement the
float case, so that the whole double precision algorithm can be verified
in float precision. A maximum error of 0.667 ulps is claimed for cbrt()
and the max for cbrtf() using the same algorithm shouldn't be different,
but the actual max for cbrtf() on amd64 is now 0.9834 ulps. (On i386
-O1 the max is 0.5006 (down from < 0.7) due to extra precision.)
The threshold for not being tiny was too small. Use the usual 2**-12
threshold. As for sinhf, use a different method (now the same as for
sinhf) to set the inexact flag for tiny nonzero x so that the larger
threshold works, although this method is imperfect. As for sinhf,
this change is not just an optimization, since the general code that
we fell into has accuracy problems even for tiny x. On amd64, avoiding
it fixes tanhf on 2*13495596 args with errors of between 1 and 1.3
ulps and thus reduces the total number of args with errors of >= 1 ulp
from 37533748 to 5271278; the maximum error is unchanged at 2.2 ulps.
The magic number 22 is log(DBL_MAX)/2 plus slop. This is bogus for
float precision. Use 9 (log(FLT_MAX)/2 plus less slop than for
double precision). Unlike for coshf and tanhf, this is just an
optimization, and MAX isn't misspelled EPSILON in the commit log.
I started testing with nonstandard rounding modes, and verified that
the chosen thresholds work for all modes modulo problems not related
to thresholds. The best thresholds are not very dependent on the mode,
at least for tanhf.
shares its low half with pio2_hi. pio2_hi is rounded down although
rounding to nearest would be a tiny bit better, so pio4_hi must be
rounded down too. It was rounded to nearest, which happens to be
different in float precision but the same in double precision.
This fixes about 13.5 million errors of more than 1 ulp in asinf().
The largest error was 2.81 ulps on amd64 and 2.57 ulps on i386 -O1.
Now the largest error is 0.93 ulps on amd65 and 0.67 ulps on i386 -O1.
sqrt(2)/2-1. For log1p(), fixed the approximation to sqrt(2)/2-1.
The end result is to fix an error of 1.293 ulps in
log1pf(0.41421395540 (hex 0x3ed413da))
and an error of 1.783 ulps in
log1p(-0.292893409729003961761) (hex 0x12bec4 00000001)).
The former was the only error of > 1 ulp for log1pf() and the latter
is the only such error that I know of for log1p().
The approximations don't need to be very accurate, but the last 2 need
to be related to the first and be rounded up a little (even more than
1 ulp for sqrt(2)/2-1) for the following implementation-detail reason:
when the arg (x) is not between (the approximations to) sqrt(2)/2-1
and sqrt(2)-1, we commit to using a correction term, but we only
actually use it if 1+x is between sqrt(2)/2 and sqrt(2) according to
the first approximation. Thus we must ensure that
!(sqrt(2)/2-1 < x < sqrt(2)-1) implies !(sqrt(2)/2 < x+1 < sqrt(2)),
where all the sqrt(2)'s are really slightly different approximations
to sqrt(2) and some of the "<"'s are really "<="'s. This was not done.
In log1pf(), the last 2 approximations were rounded up by about 6 ulps
more than needed relative to a good approximation to sqrt(2), but the
actual approximation to sqrt(2) was off by 3 ulps. The approximation
to sqrt(2)-1 ended up being 4 ulps too small, so the algoritm was
broken in 4 cases. The result happened to be broken in 1 case. This
is fixed by using a natural approximation to sqrt(2) and derived
approximations for the others.
In logf(), all the approximations made sense, but the approximation
to sqrt(2)/2-1 was 2 ulps too small (a tiny amount, since we compare
with a granularity of 2**32 ulps), so the algorithm was broken in 2
cases. The result was broken in 1 case. This is fixed by rounding
up the approximation to sqrt(2)/2-1 by 2**32 ulps, so 2**32 cases are
now handled a little differently (still correctly according to my
assertion that the approximations don't need to be very accurate, but
this has not been checked).
through the history in sh.
| Refresh bug reported by Julien Torres:
|
| going from:
| activate -verbose
| to:
| reset -activation
| results in:
| reset -activationverbose"
| instead of:
| reset -activation
|
| This is because we choose to insert "reset -" before the current line,
| and the delete "e -" and insert "ion" in the appropriate place. The
| cleareol code did not handle this case properly; we now cleareol to
| the maximum number of characters of the first difference, the second
| difference and the difference in line length.
on assignment.
Extra precision on i386's broke hi+lo decomposition in the usual way.
It caused all except 1 of the 62343 errors of more than 1 ulp for
log1pf() on i386's with gcc -O [-fno-float-store].
according to the highest nonzero bit in a denormal was missing.
fdlibm ilogbf() and ilogb() have always had the adjustment, but only
use a small part of their method for handling denormals; use the
normalization method in log[f]() for the main part.
It was lost in rev.1.9. The log message for rev.1.9 says that the
special case of +-0 is handled twice, but it was only handled once,
so it became unhandled, and this happened to break half of the cases
that return +-0:
- round-towards-minus-infinity: 0 < x < 1: result was -0 not 0
- round-to-nearest: -0.5 <= x < 0: result was 0 not -0
- round-towards-plus-infinity: -1 < x < 0: result was 0 not -0
- round-towards-zero: -1 < x < 0: result was 0 not -0
TWO52[sx] to trick gcc into correctly converting TWO52[sx]+x to double
on assignment to "double w", force a correct assignment by assigning
to *(double *)&w. This is cleaner and avoids the double rounding
problem on machines that evaluate double expressions in double
precision. It is not necessary to convert w-TWO52[sx] to double
precision on return as implied in the comment in rev.1.3, since
the difference is exact.
(1) In round-to-nearest mode, on all machines, fdlibm rint() never
worked for |x| = n+0.75 where n is an even integer between 262144
and 524286 inclusive (2*131072 cases). To avoid double rounding
on some machines, we begin by adjusting x to a value with the 0.25
bit not set, essentially by moving the 0.25 bit to a lower bit
where it works well enough as a guard, but we botched the adjustment
when log2(|x|) == 18 (2*2**52 cases) and ended up just clearing
the 0.25 bit then. Most subcases still worked accidentally since
another lower bit serves as a guard. The case of odd n worked
accidentally because the rounding goes the right way then. However,
for even n, after mangling n+0.75 to 0.5, rounding gives n but the
correct result is n+1.
(2) In round-towards-minus-infinity mode, on all machines, fdlibm rint()
never for x = n+0.25 where n is any integer between -524287 and
-262144 inclusive (262144 cases). In these cases, after mangling
n+0.25 to n, rounding gives n but the correct result is n-1.
(3) In round-towards-plus-infinity mode, on all machines, fdlibm rint()
never for x = n+0.25 where n is any integer between 262144 and
524287 inclusive (262144 cases). In these cases, after mangling
n+0.25 to n, rounding gives n but the correct result is n+1.
A variant of this bug was fixed for the float case in rev.1.9 of s_rintf.c,
but the analysis there is incomplete (it only mentions (1)) and the fix
is buggy.
Example of the problem with double rounding: rint(1.375) on a machine
which evaluates double expressions with just 1 bit of extra precision
and is in round-to-nearest mode. We evaluate the result using
(double)(2**52 + 1.375) - 2**52. Evaluating 2**52 + 1.375 in (53+1) bit
prcision gives 2**52 + 1.5 (first rounding). (Second) rounding of this
to double gives 2**52 + 2.0. Subtracting 2**52 from this gives 2.0 but
we want 1.0. Evaluating 2**52 + 1.375 in double precision would have
given the desired intermediate result of 2**52 + 1.0.
The double rounding problem is relatively rare, so the botched adjustment
can be fixed for most machines by removing the entire adjustment. This
would be a wrong fix (using it is 1 of the bugs in rev.1.9 of s_rintf.c)
since fdlibm is supposed to be generic, but it works in the following cases:
- on all machines that evaluate double expressions in double precision,
provided either long double has the same precision as double (alpha,
and i386's with precision forced to double) or my earlier fix to use
a long double 2**52 is modified to avoid using long double precision.
- on all machines that evaluate double expressions in many more than 11
bits of extra precision. The 1 bit of extra precision in the example
is the worst case. With N bits of extra precision, it sufices to
adjust the bit N bits below the 0.5 bit. For N >= about 52 there is
no such bit so the adjustment is both impossible and unnecessary. The
fix in rev.1.9 of s_rintf.c apparently depends on corresponding magic
in float precision: on all supported machines N is either 0 or >= 24,
so double rounding doesn't occur in practice.
- on all machines that don't use fdlibm rint*() (i386's).
So under FreeBSD, the double rounding problem only affects amd64 now, but
should only affect i386 in future (when double expressions are evaluated
in long double precision).
Switch strncpy to strlcpy suggested by gad and issue found by pjd.
Add to uname(3) man page describing:
UNAME_s
UNAME_r
UNAME_v
UNAME_m
Add to getosreldate(3) man page describing:
OSVERSION
Submitted by: ru, pjd/gad
Reviewed by: ru (man pages)
- in round-towards-minus-infinity mode, on all machines, roundf(x) never
worked for 0 < |x| < 0.5 (2*0x3effffff cases in all, or almost half of
float space). It was -0 for 0 < x < 0.5 and 0 for -0.5 < x < 0, but
should be 0 and -0, respectively. This is because t = ceilf(|x|) = 1
for these args, and when we adjust t from 1 to 0 by subtracting 1, we
get -0 in this rounding mode, but we want and expected to get 0.
- in round-towards-minus-infinity, round towards zero and round-to-nearest
modes, on machines that evaluate float expressions in float precision
(most machines except i386's), roundf(x) never worked for |x| =
<float value immediately below 0.5> (2 cases in all). It was +-1 but
should have been +-0. This is because t = ceilf(|x|) = 1 for these
args, and when we try to classify |x| by subtracting it from 1 we
get an unexpected rounding error -- the result is 0.5 after rounding
to float in all 3 rounding modes, so we we have forgotten the
difference between |x| and 0.5 and end up returning the same value
as for +-0.5.
The fix is to use floorf() instead of ceilf() and to add 1 instead of
-1 in the adjustment. With floorf() all the expressions used are
always evaluated exactly so there are no rounding problems, and with
adjustments of +1 we don't go near -0 when adjusting.
Attempted to fix round() and roundl() by cloning the fix for roundf().
This has only been tested for round(), only on args representable as
floats. Double expressions are evaluated in double precision even on
i386's, so round(0.5-epsilon) was broken even on i386's. roundl()
must be completely broken on i386's since long double precision is not
really supported. There seem to be no other dependencies on the
precision.
FreeBSD machine. To do this add the man 1 uname changes to __xuname.c
so we can override the settings it reports. Add OSVERSION override
to getosreldate. Finally which Makefile.inc1 to use uname -m instead
of sysctl -n hw.machine_arch to get the arch. type.
With these change you can put a complete FreeBSD OS image into a
chroot set:
UNAME_s=FreeBSD
UNAME_r=4.7-RELEASE
UNAME_v="FreeBSD $UNAME_r #1: Fri Jul 22 20:32:52 PDT 2005 fake@fake:/usr/obj/usr/src/sys/FAKE"
UNAME_m=i386
UNAME_p=i386
OSVERSION=470000
on an amd64 or i386 and it just work including building ports and using
pkg_add -r etc. The caveat for this example is that these patches
have to be applied to FreeBSD 4.7 and the uname(1) changes need to
be merged. This also addresses issue with libtool.
This is usefull for when a build machine has been trashed for an
old release and we want to do a build on a new machine that FreeBSD
4.7 won't run on ...
other systems it prevents a tty from becoming a controlling tty on the
open. O_SYNC is the POSIX name for O_FSYNC.
The Markup Police may need to tweak my references to standards.
k_tanf.c but with different details.
The polynomial is odd with degree 13 for tanf() and odd with degree
9 for sinf(), so the details are not very different for sinf() -- the
term with the x**11 and x**13 coefficients goes awaym and (mysteriously)
it helps to do the evaluation of w = z*z early although moving it later
was a key optimization for tanf(). The details are different but simpler
for cosf() because the polynomial is even and of lower degree.
On Athlons, for uniformly distributed args in [-2pi, 2pi], this gives
an optimization of about 4 cycles (10%) in most cases (13% for sinf()
on AXP, but 0% for cosf() with gcc-3.3 -O1 on AXP). The best case
(sinf() with gcc-3.4 -O1 -fcaller-saves on A64) now takes 33-39 cycles
(was 37-45 cycles). Hardware sinf takes 74-129 cycles. Despite
being fine tuned for Athlons, the optimization is even larger on
some other arches (about 15% on ia64 (pluto2) and 20% on alpha (beast)
with gcc -O2 -fomit-frame-pointer).
cosf(x) is supposed to return something like x when x is a NaN, and
we actually fairly consistently return x-x which is normally very like
x (on i386 and and it is x if x is a quiet NaN and x with the quiet bit
set if x is a signaling NaN. Rev.1.10 broke this by normalising x to
fabsf(x). It's not clear if fabsf(x) is should preserve x if x is a NaN,
but it actually clears the sign bit, and other parts of the code depended
on this.
The bugs can be fixed by saving x before normalizing it, and using the
saved x only for NaNs, and using uint32_t instead of int32_t for ix
so that negative NaNs are not misclassified even if fabsf() doesn't
clear their sign bit, but gcc pessimizes the saving very well, especially
on Athlon XPs (it generates extra loads and stores, and mixes use of
the SSE and i387, and this somehow messes up pipelines). Normalizing
x is not a very good optimization anyway, so stop doing it. (It adds
latency to the FPU pipelines, but in previous versions it helped except
for |x| <= 3pi/4 by simplifying the integer pipelines.) Use the same
organization as in s_sinf.c and s_tanf.c with some branches reordered.
These changes combined recover most of the performance of the unfixed
version on A64 but still lose 10% on AXP with gcc-3.4 -O1 but not with
gcc-3.3 -O1.
that was used doesn't work normally here, since we want to be able to
multiply `hi' by the exponent of x _exactly_, and the exponent of x has
more than 7 significant bits for most denormal x's, so the multiplication
was not always exact despite a cloned comment claiming that it was. (The
comment is correct in the double precision case -- with the normal 33+53
bit decomposition the exponent can have 20 significant bits and the extra
bit for denormals is only the 11th.)
Fixing this had little or no effect for denormals (I think because
more precision is inherently lost for denormals than is lost by roundoff
errors in the multiplication).
The fix is to reduce the precision of the decomposition to 16+24 bits.
Due to 2 bugs in the old deomposition and numerical accidents, reducing
the precision actually increased the precision of hi+lo. The old hi+lo
had about 39 bits instead of at least 41 like it should have had.
There were off-by-1-bit errors in each of hi and lo, apparently due
to mistranslation from the double precision hi and lo. The correct
16 bit hi happens to give about 19 bits of precision, so the correct
hi+lo gives about 43 bits instead of at least 40. The end result is
that expf() is now perfectly rounded (to nearest) except in 52561 cases
instead of except in 67027 cases, and the maximum error is 0.5013 ulps
instead of 0.5023 ulps.
rather than forcing the state to LOOK. If we are in the middle of parsing
a line when we have to do a FILL we would have lost any token we were in
the middle of parsing and would have treated the next character as being
at the start of a new line instead.
PR: kern/89181
Submitted by: Antony Mawer gnats at mawer dot org
MFC after: 1 week
Instead of echoing the code in a comment, try to describe why we split
up the evaluation in a special way.
The new optimization is mostly to move the evaluation of w = z*z later
so that everything else (except z = x*x) doesn't have to wait for w.
On Athlons, FP multiplication has a latency of 4 cycles so this
optimization saves 4 cycles per call provided no new dependencies are
introduced. Tweaking the other terms in to reduce dependencies saves
a couple more cycles in some cases (more on AXP than on A64; up to 8
cycles out of 56 altogether in some cases). The previous version had
a similar optimization for s = z*x. Special optimizations like these
probably have a larger effect than the simple 2-way vectorization
permitted (but not activated by gcc) in the old version, since 2-way
vectorization is not enough and the polynomial's degree is so small
in the float case that non-vectorizable dependencies dominate.
On an AXP, tanf() on uniformly distributed args in [-2pi, 2pi] now
takes 34-55 cycles (was 39-59 cycles).
of between 1.0 and 1.8509 ulps for lgammaf(x) with x between -2**-21 and
-2**-70.
As usual, the cutoff for tiny args was not correctly translated to
float precision. It was 2**-70 but 2**-21 works. Not as usual, having
a too-small threshold was worse than a pessimization. It was just a
pessimization for (positive) args between 2**-70 and 2**-21, but for
the first ~50 million (negative) args below -2**-70, the general code
overflowed and gave a result of infinity instead of correct (finite)
results near 70*log(2). For the remaining ~361 million negative args
above -2**21, the general code gave almost-acceptable errors (lgamma[f]()
is not very accurate in general) but the pessimization was larger than
for misclassified tiny positive args.
Now the max error for lgammaf(x) with |x| < 2**-21 is 0.7885 ulps, and
speed and accuracy are almost the same for positive and negative args
in this range. The maximum error overall is still infinity ulps.
A cutoff of 2**-70 is probably wastefully small for the double precision
case. Smaller cutoffs can be used to reduce the max error to nearly
0.5 ulps for tiny args, but this is useless since the general algrorithm
for nearly-tiny args is not nearly that accurate -- it has a max error of
about 1 ulp.
gives a tiny but hopefully always free optimization in the 2 quadrants
to which it applies. On Athlons, it reduces maximum latency by 4 cycles
in these quadrants but has usually has a smaller effect on total time
(typically ~2 cycles (~5%), but sometimes 8 cycles when the compiler
generates poor code).
of the function name.
Added my (non-)copyright.
In k_tanf.c, added the first set of redundant parentheses to control
grouping which was claimed to be added in the previous commit.
returning float). The functions are renamed from __kernel_{cos,sin}f()
to __kernel_{cos,sin}df() so that misuses of them will cause link errors
and not crashes.
This version is an almost-routine translation with no special optimizations
for accuracy or efficiency. The not-quite-routine part is that in
__kernel_cosf(), regenerating the minimax polynomial with double
precision coefficients gives a coefficient for the x**2 term that is
not quite -0.5, so the literal 0.5 in the code and the related `hz'
variable need to be modified; also, the special code for reducing the
error in 1.0-x**2*0.5 is no longer needed, so it is convenient to
adjust all the logic for the x**2 term a little. Note that without
extra precision, it would be very bad to use a coefficient of other
than -0.5 for the x**2 term -- the old version depends on multiplication
by -0.5 being infinitely precise so as not to need even more special
code for reducing the error in 1-x**2*0.5.
This gives an unimportant increase in accuracy, from ~0.8 to ~0.501
ulps. Almost all of the error is from the final rounding step, since
the choice of the minimax polynomials so that their contribution to the
error is a bit less than 0.5 ulps just happens to give contributions that
are significantly less (~.001 ulps).
An Athlons, for uniformly distributed args in [-2pi, 2pi], this gives
overall speed increases in the 10-20% range, despite giving a speed
decrease of typically 19% (from 31 cycles up to 37) for sinf() on args
in [-pi/4, pi/4].
libarchive doesn't make malloc(0) requests, so the autoconf
checks aren't needed and the autoconf workarounds for
broken malloc(0) just create problems.
Thanks to: Dan Nelson, who reports that this fixes libarchive on AIX 5.2
- Remove dead code that I forgot to remove in the previous commit.
- Calculate the sum of the lower terms of the polynomial (divided by
x**5) in a single expression (sum of odd terms) + (sum of even terms)
with parentheses to control grouping. This is clearer and happens to
give better instruction scheduling for a tiny optimization (an
average of about ~0.5 cycles/call on Athlons).
- Calculate the final sum in a single expression with parentheses to
control grouping too. Change the grouping from
first_term + (second_term + sum_of_lower_terms) to
(first_term + second_term) + sum_of_lower_terms. Normally the first
grouping must be used for accuracy, but extra precision makes any
grouping give a correct result so we can group for efficiency. This
is a larger optimization (average 3-4 cycles/call or 5%).
- Use parentheses to indicate that the C order of left to right evaluation
is what is wanted (for efficiency) in a multiplication too.
The old fdlibm code has several optimizations related to these. 2
involve doing an extra operation that can be done almost in parallel
on some superscalar machines but are pessimizations on sequential
machines. Others involve statement ordering or expression grouping.
All of these except the ordering for the combining the sums of the odd
and even terms seem to be ideal for Athlons, but parallelism is still
limited so all of these optimizations combined together with the ones
in this commit save only ~6-8 cycles (~10%).
On an AXP, tanf() on uniformly distributed args in [-2pi, 2pi] now
takes 39-59 cycles. I don't know of any more optimizations for tanf()
short of writing it all in asm with very MD instruction scheduling.
Hardware fsin takes 122-138 cycles. Most of the optimizations for
tanf() don't work very well for tan[l](). fdlibm tan() now takes
145-365 cycles.
A single polynomial approximation for tan(x) works in infinite precision
up to |x| < pi/2, but in finite precision, to restrict the accumulated
roundoff error to < 1 ulp, |x| must be restricted to less than about
sqrt(0.5/((1.5+1.5)/3)) ~= 0.707. We restricted it a bit more to
give a safety margin including some slop for optimizations. Now that
we use double precision for the calculations, the accumulated roundoff
error is in double-precision ulps so it can easily be made almost 2**29
times smaller than a single-precision ulp. Near x = pi/4 its maximum
is about 0.5+(1.5+1.5)*x**2/3 ~= 1.117 double-precision ulps.
The minimax polynomial needs to be different to work for the larger
interval. I didn't increase its degree the old degree is just large
enough to keep the final error less than 1 ulp and increasing the
degree would be a pessimization. The maximum error is now ~0.80
ulps instead of ~0.53 ulps.
The speedup from this optimization for uniformly distributed args in
[-2pi, 2pi] is 28-43% on athlons, depending on how badly gcc selected
and scheduled the instructions in the old version. The old version
has some int-to-float conversions that are apparently difficult to schedule
well, but gcc-3.3 somehow did everything ~10 cycles or ~10% faster than
gcc-3.4, with the difference especially large on AXPs. On A64s, the
problem seems to be related to documented penalties for moving single
precision data to undead xmm registers. With this version, the speed
is cycles is almost independent of the athlon and gcc version despite
the large differences in instruction selection to use the FPU on AXPs
and SSE on A64s.
This is a minor interface change. The function is renamed from
__kernel_tanf() to __kernel_tandf() so that misues of it will cause
link errors and not crashes.
This version is a routine translation with no special optimizations
for accuracy or efficiency. It gives an unimportant increase in
accuracy, from ~0.9 ulps to 0.5285 ulps. Almost all of the error is
from the minimax polynomial (~0.03 ulps and the final rounding step
(< 0.5 ulps). It gives strange differences in efficiency in the -5
to +10% range, with -O1 fairly consistently becoming faster and -O2
slower on AXP and A64 with gcc-3.3 and gcc-3.4.
arg to __kernel_rem_pio2() gives 53-bit (double) precision, not single
precision and/or the array dimension like I thought. prec == 2 is
used in e_rem_pio2.c for double precision although it is documented
to be for 64-bit (extended) precision, and I just reduced it by 1
thinking that this would give the value suitable for 24-bit (float)
precision. Reducing it 1 more to the documented value for float
precision doesn't actually work (it gives errors of ~0.75 ulps in the
reduced arg, but errors of much less than 0.5 ulps are needed; the bug
seems to be in kernel_rem_pio2.c). Keep using a value 1 larger than
the documented value but supply an array large enough hold the extra
unused result from this.
The bug can also be fixed quickly by increasing init_jk[0] in
k_rem_pio2.c from 2 to 3. This gives behaviour identical to using
prec == 1 except it doesn't create the extra result. It isn't clear
how the precision bug affects higher precisions. 113-bit (quad) is
the largest precision, so there is no way to use a large precision
to fix it.
they can be #included in other .c files to give inline functions, and
use them to inline the functions in most callers (not in e_lgammaf_r.c).
__kernel_tanf() is too large and complicated for gcc to inline very well.
An athlons, this gives a speed increase under favourable pipeline
conditions of about 10% overall (larger for AXP, smaller for A64).
E.g., on AXP, sinf() on uniformly distributed args in [-2Pi, 2Pi]
now takes 30-56 cycles; it used to take 45-61 cycles; hardware fsin
takes 65-129.
On athlons, this gives a speedup of 10-20% for tanf() on uniformly
distributed args in [-2Pi, 2Pi]. (It only directly applies for 43%
of the args and gives a 16-20% speedup for these (more for AXP than
A64) and this gives an overall speedup of 10-12% which is all that it
should; however, it gives an overall speedup of 17-20% with gcc-3.3
on AXP-A64 by mysteriously effected cases where it isn't executed.)
I originally intended to use double precision for all internals of
float trig functions and will probably still do this, but benchmarking
showed that converting to double precision and back is a pessimization
in cases where a simple float precision calculation works, so it may
be optimal to switch precisions only when using extra precision is
much simpler.
__ieee754_rem_pio2f() to its 3 callers and manually inline them.
On Athlons, with favourable compiler flags and optimizations and
favourable pipeline conditions, this gives a speedup of 30-40 cycles
for cosf(), sinf() and tanf() on the range pi/4 < |x| <= 9pi/4, so
thes functions are now signifcantly faster than the hardware trig
functions in many cases. E.g., in a benchmark with uniformly distributed
x in [-2pi, 2pi], A64 hardware fcos took 72-129 cycles and cosf() took
37-55 cycles. Out-of-order execution is needed to get both of these
times. The optimizations in this commit apparently work more by
removing 1 serialization point than by reducing latency.
s_cosf.c and s_sinf.c:
Use a non-bogus magic constant for the threshold of pi/4. It was 2 ulps
smaller than pi/4 rounded down, but its value is not critical so it should
be the result of natural rounding.
s_cosf.c and s_tanf.c:
Use a literal 0.0 instead of an unnecessary variable initialized to
[(float)]0.0. Let the function prototype convert to 0.0F.
Improved wording in some comments.
Attempted to improve indentation of comments.
number of branches.
Use a non-bogus magic constant for the threshold of pi/4. It was 2 ulps
smaller than pi/4 rounded down, but its value is not critical so it should
be the result of natural rounding. Use "<=" comparisons with rounded-
down thresholds for all small multiples of pi/4.
Cleaned up previous commit:
- use static const variables instead of expressions for multiples of pi/2
to ensure that they are evaluated at compile time. gcc currently
evaluates them at compile time but C99 compilers are not required
to do so. We want compile time evaluation for optimization and don't
care about side effects.
- use M_PI_2 instead of a magic constant for pi/2. We need magic constants
related to pi/2 elsewhere but not here since we just want pi/2 rounded
to double and even prefer it to be rounded in the default rounding mode.
We can depend on the cmpiler being C99ish enough to round M_PI_2 correctly
just as much as we depended on it handling hex constants correctly. This
also fixes a harmless rounding error in the hex constant.
- keep using expressions n*<value for pi/2> in the initializers for the
static const variables. 2*M_PI_2 and 4*M_PI_2 are obviously rounded in
the same way as the corresponding infinite precision expressions for
multiples of pi/2, and 3*M_PI_2 happens to be rounded like this, so we
don't need magic constants for the multiples.
- fixed and/or updated some comments.
define also, but res_config.h was not included into libc/net/name6.c.
So getipnodebyaddr() ignored the multiple PTRs.
PR: kern/88241
Submitted by: Dan Lukes <dan__at__obluda.cz>
MFC after: 3 days
The threshold for not being tiny was too small. Use the usual 2**-12
threshold. This change is not just an optimization, since the general
code that we fell into has accuracy problems even for tiny x. Avoiding
it fixes 2*1366 args with errors of more than 1 ulp, with a maximum
error of 1.167 ulps.
The magic number 22 is log(DBL_EPSILON)/2 plus slop. This is bogus
for float precision. Use 9 (~log(FLT_EPSILON)/2 plus less slop than
for double precision). The code for handling the interval
[2**-28, 9_was_22] has accuracy problems even for [9, 22], so this
change happens to fix errors of more than 1 ulp in about 2*17000
cases. It leaves such errors in about 2*1074000 cases, with a max
error of 1.242 ulps.
The threshold for switching from returning exp(x)/2 to returning
exp(x/2)^2/2 was a little smaller than necessary. As for coshf(),
This was not quite harmless since the exp(x/2)^2/2 case is inaccurate,
and fixing it avoids accuracy problems in 2*6 cases, leaving problems
in 2*19997 cases.
Fixed naming errors in pseudo-code in comments.
The threshold for not being tiny was confusing and too small. Use the
usual 2**-12 threshold and simplify the algorithm slightly so that
this threshold works (now use the threshold for sinhf() instead of one
for 1+expm1()). This is just a small optimization.
The magic number 22 is log(DBL_EPSILON)/2 plus slop. This is bogus
for float precision. Use 9 (~log(FLT_EPSILON)/2 plus less slop than
for double precision).
The threshold for switching from returning exp(x)/2 to returning
exp(x/2)^2/2 was a little smaller than necessary. This was not quite
harmless since the exp(x/2)^2/2 case is inaccurate. Fixing it happens
to avoid accuracy problems for 2*6 of the 2*151 args that were handled
by the exp(x)/2 case. This leaves accuracy problems for about 2*19997
args near the overflow threshold (~89); the maximum error there is
2.5029 ulps.
There are also accuracy probles for args in +-[0.5*ln2, 9] -- 2*188885
args with errors of more than 1 ulp, with a maximum error of 1.384 ulps.
Fixed a syntax error and naming errors in pseudo-code in comments.
specialized for float precision. The new polynomial has degree 8
instead of 14, and a maximum error of 2**-34.34 (absolute) instead of
2**-30.66. This doesn't affect the final error significantly; the
maximum error was and is about 0.8879 ulps on amd64 -01.
The fdlibm expf() is not used on i386's (the "optimized" asm version
is used), but probably should be since it was already significantly
faster than the asm version on athlons. The asm version has the
advantage of being more accurate, so keep using it for now.
polynomial for __kernel_tanf(). The old one was the double-precision
polynomial with coefficients truncated to float. Truncation is not
a good way to convert minimax polynomials to lower precision. Optimize
for efficiency and use the lowest-degree polynomial that gives a
relative error of less than 1 ulp. It has degree 13 instead of 27,
and happens to be 2.5 times more accurate (in infinite precision) than
the old polynomial (the maximum error is 0.017 ulps instead of 0.041
ulps).
Unlike for cosf and sinf, the old accuracy was close to being inadequate
-- the polynomial for double precision has a max error of 0.014 ulps
and nearly this small an error is needed. The new accuracy is also a
bit small, but exhaustive checking shows that even the old accuracy
was enough. The increased accuracy reduces the maximum relative error
in the final result on amd64 -O1 from 0.9588 ulps to 0.9044 ulps.
special case pi/4 <= |x| < 3*pi/4. This gives a tiny optimization (it
saves 2 subtractions, which are scheduled well so they take a whole 1
cycle extra on an AthlonXP), and simplifies the code so that the
following optimization is not so ugly.
Optimize for the range 3*pi/4 < |x| < 9*Pi/2 in the same way. On
Athlon{XP,64} systems, this gives a 25-40% optimization (depending a
lot on CFLAGS) for the cosf() and sinf() consumers on this range.
Relative to i387 hardware fcos and fsin, it makes the software versions
faster in most cases instead of slower in most cases. The relative
optimization is smaller for tanf() the inefficient part is elsewhere.
The 53-bit approximation to pi/2 is good enough for pi/4 <= |x| <
3*pi/4 because after losing up to 24 bits to subtraction, we still
have 29 bits of precision and only need 25 bits. Even with only 5
extra bits, it is possible to get perfectly rounded results starting
with the reduced x, since if x is nearly a multiple of pi/2 then x is
not near a half-way case and if x is not nearly a multiple of pi/2
then we don't lose many bits. With our intentionally imperfect rounding
we get the same results for cosf(), sinf() and tanf() as without this
optimization.
standard in C99 and POSIX.1-2001+. They are also not deprecated, since
apart from being standard they can handle special args slightly better
than the ilogb() functions.
Move their documentation to ilogb.3. Try to use consistent and improved
wording for both sets of functions. All of ieee854, C99 and POSIX
have better wording and more details for special args.
Add history for the logb() functions and ilogbl(). Fix history for
ilogb().
so that it can be faster for tiny x and avoided for reduced x.
This improves things a little differently than for cosine and sine.
We still need to reclassify x in the "kernel" functions, but we get
an extra optimization for tiny x, and an overall optimization since
tiny reduced x rarely happens. We also get optimizations for space
and style. A large block of poorly duplicated code to fix a special
case is no longer needed. This supersedes the fixes in k_sin.c revs
1.9 and 1.11 and k_sinf.c 1.8 and 1.10.
Fixed wrong constant for the cutoff for "tiny" in tanf(). It was
2**-28, but should be almost the same as the cutoff in sinf() (2**-12).
The incorrect cutoff protected us from the bugs fixed in k_sinf.c 1.8
and 1.10, except 4 cases of reduced args passed the cutoff and needed
special handling in theory although not in practice. Now we essentially
use a cutoff of 0 for the case of reduced args, so we now have 0 special
args instead of 4.
This change makes no difference to the results for sinf() (since it
only changes the algorithm for the 4 special args and the results for
those happen not to change), but it changes lots of results for sin().
Exhaustive testing is impossible for sin(), but exhaustive testing
for sinf() (relative to a version with the old algorithm and a fixed
cutoff) shows that the changes in the error are either reductions or
from 0.5-epsilon ulps to 0.5+epsilon ulps. The new method just uses
some extra terms in approximations so it tends to give more accurate
results, and there are apparently no problems from having extra
accuracy. On amd64 with -O1, on all float args the error range in ulps
is reduced from (0.500, 0.665] to [0.335, 0.500) in 24168 cases and
increased from 0.500-epsilon to 0.500+epsilon in 24 cases. Non-
exhaustive testing by ucbtest shows no differences.
commit moved it). This includes a comment that the "kernel" sine no
longer works on arg -0, so callers must now handle this case. The kernel
sine still works on all other tiny args; without the optimization it is
just a little slower on these args. I intended it to keep working on
all tiny args, but that seems to be impossible without losing efficiency
or accuracy. (sin(x) ~ x * (1 + S1*x**2 + ...) would preserve -0, but
the approximation must be written as x + S1*x**3 + ... for accuracy.)
case never occurs since pi/2 is irrational so no multiple of it can
be represented as a float and we have precise arg reduction so we never
end up with a remainder of 0 in the "kernel" function unless the
original arg is 0.
If this case occurs, then we would now fall through to general code
that returns +-Inf (depending on the sign of the reduced arg) instead
of forcing +Inf. The correct handling would be to return NaN since
we would have lost so much precision that the correct result can be
anything _except_ +-Inf.
Don't reindent the else clause left over from this, although it was already
bogusly indented ("if (foo) return; else ..." just marches the indentation
to the right), since it will be removed too.
Index: k_tan.c
===================================================================
RCS file: /home/ncvs/src/lib/msun/src/k_tan.c,v
retrieving revision 1.10
diff -r1.10 k_tan.c
88,90c88
< if (((ix | low) | (iy + 1)) == 0)
< return one / fabs(x);
< else {
---
> {
a declaration was not translated to "float" although bit fiddling on
double variables was translated. This resulted in garbage being put
into the low word of one of the doubles instead of non-garbage being
put into the only word of the intended float. This had no effect on
any result because:
- with doubles, the algorithm for calculating -1/(x+y) is unnecessarily
complicated. Just returning -1/((double)x+y) would work, and the
misdeclaration gave something like that except for messing up some
low bits with the bit fiddling.
- doubles have plenty of bits to spare so messing up some of the low
bits is unlikely to matter.
- due to other bugs, the buggy code is reached for a whole 4 args out
of all 2**32 float args. The bug fixed by 1.8 only affects a small
percentage of cases and a small percentage of 4 is 0. The 4 args
happen to cause no problems without 1.8, so they are even less likely
to be affected by the bug in 1.8 than average args; in fact, neither
1.8 nor this commit makes any difference to the result for these 4
args (and thus for all args).
Corrections to the log message in 1.8: the bug only applies to tan()
and not tanf(), not because the float type can't represent numbers
large enough to trigger the problem (e.g., the example in the fdlibm-5.3
readme which is > 1.0e269), but because:
- the float type can't represent small enough numbers. For there to be
a possible problem, the original arg for tanf() must lie very near an
odd multiple of pi/2. Doubles can get nearer in absolute units. In
ulps there should be little difference, but ...
- ... the cutoff for "small" numbers is bogus in k_tanf.c. It is still
the double value (2**-28). Since this is 32 times smaller than
FLT_EPSILON and large float values are not very uniformly distributed,
only 6 args other than ones that are initially below the cutoff give
a reduced arg that passes the cutoff (the 4 problem cases mentioned
above and 2 non-problem cases).
Fixing the cutoff makes the bug affect tanf() and much easier to detect
than for tan(). With a cutoff of 2**-12 on amd64 with -O1, 670102
args pass the cutoff; of these, there are 337604 cases where there
might be an error of >= 1 ulp and 5826 cases where there is such an
error; the maximum error is 1.5382 ulps.
The fix in 1.8 works with the reduced cutoff in all cases despite the
bug in it. It changes the result in 84492 cases altogether to fix the
5826 broken cases. Fixing the fix by translating "double" to "float"
changes the result in 42 cases relative to 1.8. In 24 cases the
(absolute) error is increased and in 18 cases it is reduced, but it
remains less than 1 ulp in all cases.
rewritten, now timers created with same sigev_notify_attributes will
run in same thread, this allows user to organize which timers can
run in same thread to save some thread resource.
The log message for 1.5 said that some small (one or two ulp) inaccuracies
were fixed, and a comment implied that the critical change is to switch
the rounding mode to to-nearest, with a switch of the precision to
extended at no extra cost. Actually, the errors are very large (ucbtest
finds ones of several hundred ulps), and it is the switch of the
precision that is critical.
Another comment was wrong about NaNs being handled sloppily.
and medium size args too: instead of conditionally subtracting a float
17+24, 17+17+24 or 17+17+17+24 bit approximation to pi/2, always
subtract a double 33+53 bit one. The float version is now closer to
the double version than to old versions of itself -- it uses the same
33+53 bit approximation as the simplest cases in the double version,
and where the float version had to switch to the slow general case at
|x| == 2^7*pi/2, it now switches at |x| == 2^19*pi/2 the same as the
double version.
This speeds up arg reduction by a factor of 2 for |x| between 3*pi/4 and
2^7*pi/4, and by a factor of 7 for |x| between 2^7*pi/4 and 2^19*pi/4.
using under FreeBSD. Before this commit, all float precision functions
except exp2f() were implemented using only float precision, apparently
because Cygnus needed this in 1993 for embedded systems with slow or
inefficient double precision. For FreeBSD, except possibly on systems
that do floating point entirely in software (very old i386 and now
arm), this just gives a more complicated implementation, many bugs,
and usually worse performance for float precision than for double
precision. The bugs and worse performance were particulary large in
arg reduction for trig functions. We want to divide by an approximation
to pi/2 which has as many as 1584 bits, so we should use the widest
type that is efficient and/or easy to use, i.e., double. Use fdlibm's
__kernel_rem_pio2() to do this as Sun apparently intended. Cygnus's
k_rem_pio2f.c is now unused. e_rem_pio2f.c still needs to be separate
from e_rem_pio2.c so that it can be optimized for float args. Similarly
for long double precision.
This speeds up cosf(x) on large args by a factor of about 2. Correct
arg reduction on large args is still inherently very slow, so hopefully
these args rarely occur in practice. There is much more efficiency
to be gained by using double precision to speed up arg reduction on
medium and small float args.
__kernel_sinf(). The old ones were the double-precision polynomials
with coefficients truncated to float. Truncation is not a good way
to convert minimax polynomials to lower precision. Optimize for
efficiency and use the lowest-degree polynomials that give a relative
error of less than 1 ulp -- degree 8 instead of 14 for cosf and degree
9 instead of 13 for sinf. For sinf, the degree 8 polynomial happens
to be 6 times more accurate than the old degree 14 one, but this only
gives a tiny amount of extra accuracy in results -- we just need to
use a a degree high enough to give a polynomial whose relative accuracy
in infinite precision (but with float coefficients) is a small fraction
of a float ulp (fdlibm generally uses 1/32 for the small fraction, and
the fraction for our degree 8 polynomial is about 1/600).
The maximum relative errors for cosf() and sinf() are now 0.7719 ulps
and 0.7969 ulps, respectively.
This supersedes the fix for the old algorithm in rev.1.8 of k_cosf.c.
I want this change mainly because it is an optimization. It helps
make software cos[f](x) and sin[f](x) faster than the i387 hardware
versions for small x. It is also a simplification, and reduces the
maximum relative error for cosf() and sinf() on machines like amd64
from about 0.87 ulps to about 0.80 ulps. It was validated for cosf()
and sinf() by exhaustive testing. Exhaustive testing is not possible
for cos() and sin(), but ucbtest reports a similar reduction for the
worst case found by non-exhaustive testing. ucbtest's non-exhaustive
testing seems to be good enough to find problems in algorithms but not
maximum relative errors when there are spikes. E.g., short runs of
it find only 3 ulp error where the i387 hardware cos() has an error
of about 2**40 ulps near pi/2.
to floats (mainly i386's). All errors of more than 1 ulp for float
precision trig functions were supposed to have been fixed; however,
compiling with gcc -O2 uncovered 18250 more such errors for cosf(),
with a maximum error of 1.409 ulps.
Use essentially the same fix as in rev.1.8 of k_rem_pio2f.c (access a
non-volatile variable as a volatile). Here the -O1 case apparently
worked because the variable is in a 2-element array and it takes -O2
to mess up such a variable by putting it in a register.
The maximum error for cosf() on i386 with gcc -O2 is now 0.5467 (it
is still 0.5650 with gcc -O1). This shows that -O2 still causes some
extra precision, but the extra precision is now good.
Extra precision is harmful mainly for implementing extra precision in
software. We want to represent x+y as w+r where both "+" operations
are in infinite precision and r is tiny compared with w. There is a
standard algorithm for this (Knuth (1981) 4.2.2 Theorem C), and fdlibm
uses this routinely, but the algorithm requires w and r to have the
same precision as x and y. w is just x+y (calculated in the same
finite precision as x and y), and r is a tiny correction term. The
i386 gcc bugs tend to give extra precision in w, and then using this
extra precision in the calculation of r results in the correction
mostly staying in w and being missing from r. There still tends to
be no problem if the result is a simple expression involving w and r
-- modulo spills, w keeps its extra precision and r remains the right
correction for this wrong w. However, here we want to pass w and r
to extern functions. Extra precision is not retained in function args,
so w gets fixed up, but the change to the tiny r is tinier, so r almost
remains as a wrong correction for the right w.
{cos_sin}[f](x) so that x doesn't need to be reclassified in the
"kernel" functions to determine if it is tiny (it still needs to be
reclassified in the cosine case for other reasons that will go away).
This optimization is quite large for exponentially distributed x, since
x is tiny for almost half of the domain, but it is a pessimization for
uniformally distributed x since it takes a little time for all cases
but rarely applies. Arg reduction on exponentially distributed x
rarely gives a tiny x unless the reduction is null, so it is best to
only do the optimization if the initial x is tiny, which is what this
commit arranges. The imediate result is an average optimization of
1.4% relative to the previous version in a case that doesn't favour
the optimization (double cos(x) on all float x) and a large
pessimization for the relatively unimportant cases of lgamma[f][_r](x)
on tiny, negative, exponentially distributed x. The optimization should
be recovered for lgamma*() as part of fixing lgamma*()'s low-quality
arg reduction.
Fixed various wrong constants for the cutoff for "tiny". For cosine,
the cutoff is when x**2/2! == {FLT or DBL}_EPSILON/2. We round down
to an integral power of 2 (and for cos() reduce the power by another
1) because the exact cutoff doesn't matter and would take more work
to determine. For sine, the exact cutoff is larger due to the ration
of terms being x**2/3! instead of x**2/2!, but we use the same cutoff
as for cosine. We now use a cutoff of 2**-27 for double precision and
2**-12 for single precision. 2**-27 was used in all cases but was
misspelled 2**27 in comments. Wrong and sloppy cutoffs just cause
missed optimizations (provided the rounding mode is to nearest --
other modes just aren't supported).
readable on certain random memory configurations. If the libkvm consumer
tried to read something that was in the very last pdpe, pde or pte slot,
it would bogusly fail.
This is broken in RELENG_6 too.
expr and printf are not available during installworld, so
use /bin/sh arithmetic expansion instead of expr and simply
give up on vanity formatting. ;-)
systems (or on FreeBSD systems when using ports).
2) Overhaul the versioning logic. In particular,
SHLIB_MAJOR number is now computed as "major+minor",
which ensures library versions are the same for
the FreeBSD build system and the portable
libtool/autoconf/automake build system.
link names, usernames, or group names that contain
non-ASCII characters.
In particular, this corrects an inconsistency reported
by Ed Maste when archiving symlinks with odd characters:
long symlinks would get preserved, short ones would
be changed.
broken assignment to floats (e.g., i386 with gcc -O, but not amd64 or
ia64; i386 with gcc -O0 worked accidentally).
Use an unnamed volatile temporary variable to trick gcc -O into clipping
extra precision on assignment. It's surprising that only 1 place needed
to be changed.
For tanf() on i386 with gcc -O, the bug caused errors > 1 ulp with a
density of 2.3% for args larger in magnitude than 128*pi/2, with a
maximum error of 1.624 ulps.
After this fix, exhaustive testing shows that range reduction for
floats works as intended assuming that it is in within a factor of
about 2^16 of working as intended for doubles. It provides >= 8
extra bits of precision for all ranges. On i386:
range max error in double/single ulps extra precision
----- ------------------------------- ---------------
0 to 3*pi/4 0x000d3132 / 0.0016 9+ bits
3*pi/4 to 128*pi/2 0x00160445 / 0.0027 8+
128*pi/2 to +Inf 0x00000030 / 0.00000009 23+
128*pi/2 up, -O0 before fix 0x00000030 / 0.00000009 23+
128*pi/2 up, -O1 before fix 0x10000000 / 0.5 1
The 23+ bits of extra precision for large multiples corresponds to almost
perfect reduction to a pair of floats (24 extra would be perfect).
After this fix, the maximum relative error (relative to the corresponding
fdlibm double precision function) is < 1 ulp for all basic trig functions
on all 2^32 float args on all machines tested:
amd64 ia64 i386-O0 i386-O1
------ ------ ------ ------
cosf: 0.8681 0.8681 0.7927 0.5650
sinf: 0.8733 0.8610 0.7849 0.5651
tanf: 0.9708 0.9329 0.9329 0.7035
of pi/2 (1 line) and expand a comment about related magic (many lines).
The bug was essentially the same as for the +-pi/2 case (a mistranslated
mask), but was smaller so it only significantly affected multiples
starting near +-13*pi/2. At least on amd64, for cosf() on all 2^32
float args, the bug caused 128 errors of >= 1 ulp, with a maximum error
of 1.2393 ulps.
and add a comment about related magic (many lines)).
__kernel_cos[f]() needs a trick to reduce the error to below 1 ulp
when |x| >= 0.3 for the range-reduced x. Modulo other bugs, naive
code that doesn't use the trick would have an error of >= 1 ulp
in about 0.00006% of cases when |x| >= 0.3 for the unreduced x,
with a maximum relative error of about 1.03 ulps. Mistransation
of the trick from the double precision case resulted in errors in
about 0.2% of cases, with a maximum relative error of about 1.3 ulps.
The mistranslation involved not doing implicit masking of the 32-bit
float word corresponding to to implicit masking of the lower 32-bit
double word by clearing it.
sinf() uses __kernel_cosf() for half of all cases so its errors from
this bug are similar. tanf() is not affected.
The error bounds in the above and in my other recent commit messages
are for amd64. Extra precision for floats on i386's accidentally masks
this bug, but only if k_cosf.c is compiled with -O. Although the extra
precision helps here, this is accidental and depends on longstanding
gcc precision bugs (not clipping extra precision on assignment...),
and the gcc bugs are mostly avoided by compiling without -O. I now
develop libm mainly on amd64 systems to simplify error detection and
debugging.
17+17+24 bit pi/2 must only be used when subtraction of the first 2
terms in it from the arg is exact. This happens iff the the arg in
bits is one of the 2**17[-1] values on each side of (float)(pi/2).
Revert to the algorithm in rev.1.7 and only fix its threshold for using
the 3-term pi/2. Use the threshold that maximizes the number of values
for which the 3-term pi/2 is used, subject to not changing the algorithm
for comparing with the threshold. The 3-term pi/2 ends up being used
for about half of its usable range (about 64K values on each side).
a maximum error of 2.905 ulps for cosf(), but the algorithm for cosf()
is good for < 1 ulps and happens to give perfect rounding (< 0.5 ulps)
near +-pi/2 except for the bug. The extra relative errors for tanf()
were similar (slightly larger). The bug didn't affect sinf() since
sinf'(+-pi/2) is 0.
For range reduction in ~[-3pi/4, -pi/4] and ~[pi/4, 3pi/4] we must
subtract +-pi/2 and the only complication is that this must be done
in extra precision. We have handy 17+24-bit and 17+17+24-bit
approximations to pi/2. If we always used the former then we would
lose up to 24 bits of accuracy due to cancelation of leading bits, but
we need to keep at least 24 bits plus a guard digit or 2, and should
keep as many guard bits as efficiency permits. So we used the
less-precise pi/2 not very near +-pi/2 and switched to using the
more-precise pi/2 very near +-pi/2. However, we got the threshold for
the switch wrong by allowing 19 bits to cancel, so we ended up with
only 21 or 22 bits of accuracy in some cases, which is even worse than
naively subtracting pi/2 would have done.
Exhaustive checking shows that allowing only 17 bits to cancel (min.
accuracy ~24 bits) is sufficient to reduce the maximum error for cosf()
near +-pi/2 to 0.726 ulps, but allowing only 6 bits to cancel (min.
accuracy ~35-bits) happens to give perfect rounding for cosf() at
little extra cost so we prefer that.
We actually (in effect) allow 0 bits to cancel and always use the
17+17+24-bit pi/2 (min. accuracy ~41 bits). This is simpler and
probably always more efficient too. Classifying args to avoid using
this pi/2 when it is not needed takes several extra integer operations
and a branch, but just using it takes only 1 FP operation.
The patch also fixes misspelling of 17 as 24 in many comments.
For the double-precision version, the magic numbers include 33+53 bits
for the less-precise pi/2 and (53-32-1 = 20) bits being allowed to
cancel, so there are ~33-20 = 13 guard bits. This is sufficient except
probably for perfect rounding. The more-precise pi/2 has 33+33+53
bits and we still waste time classifying args to avoid using it.
The bug is apparently from mistranslation of the magic 32 in 53-32-1.
The number of bits allowed to cancel is not critical and we use 32 for
double precision because it allows efficient classification using a
32-bit comparison. For float precision, we must use an explicit mask,
and there are fewer bits so there is less margin for error in their
allocation. The 32 got reduced to 4 but should have been reduced
almost in proportion to the reduction of mantissa bits.
in 1993 in rev.1.5 of the i386 a.out version (csu/i386/crt0.c).
Profiling uses a magic label "eprol" to delimit the start of the part
of the text section covered by profiling. This label must be placed
before the call to main() to get main() properly profiled. It was
placed there in rev.1.1 of crt0.c. Rev.1.5 imported the initial
implementation of shared libraries in FreeBSD and misplaced the label.
Fortunately, the misplaced label was misspelled and the old label
wasn't removed, so the new label had no effect. Unfortunately, when
profiling was implemented for the ELF in 1998 in rev.1.2 of
csu/i386-elf/crt1.c, only the incorrectly placed label was copied
(after fixing its name). The bug was then copied to all other arches.
The label seems to be still misplaced in NetBSD for most arches. It
is in common.c for most arches so it is even further from being inside
the function that calls main().
I think "eprol" is short for "end of prologue", but it must be placed
before the end of the prologue so that it covers main(). crt0.c has
it before the calls atexit(_mcleanup) and monstartup(...), but it
cannot affect these calls so I moved it after the call to monstartup().
It now also covers the call to _init() but not the newer call to
_init_tls(). Profiling of _init() seems to be harmless, and the call
to _init_tls() seems to be misplaced.
Reviewed by: jdp (long ago, for a slightly different i386 version)
host name. This is matches the documented behaviro. The previous
behavior would remove the domain name even if the result retained a dot.
This fixes rsh connections from a.example.com to example.com.
Reviewed by: ceri (at least the concept)
o Don't reinitialise the atfork() handler list in the child. We
are meant to call the child handler, and on subsequent fork()s
should call all three functions as normal.
o Don't reinitialise the thread specific keyed data in the
child after a fork. Applications may require this for context.
o Reinitialise curthread->tlflags after removing ourselves from
(and reinitialising) the various internal thread lists.
o Reinitialise __malloc_lock in the child after fork() (to balance
our explicitly taking the lock prior to the fork()).
With these changes, it is possible to enable the NOTYET code in
thr_kern.c to allow the use of non-async-safe functions after
fork()ing from a threaded program.
Reviewed by: Daniel Eischen <deischen@freebsd.org>
[_malloc_lock reinitialisation has since been moved to avoid polluting the
!NOTYET code]
"HEADER" unless the open is successful. Instead, leave the state as
"NEW." In particular, if archive_read_open() fails, a subsequent call
to archive_read_next_header() will now cause an explicit assertion
failure instead of a silent segmentation fault.
This may need a little more work to fully realize the intention: If
archive_read_open() fails, you should be able to call it again on the
same archive handle to open a different archive (or the same archive
using a different mechanism).
sizeof(*list), not sizeof(**list). (i.e., sizeof(pointer) rather than
sizeof(char)).
It is possible that this buffer overflow is exploitable, but it was
added after RELENG_5 forked and hasn't been MFCed, so this will not
receive an advisory.
Submitted by: Vitezslav Novy
MFC after: 1 day
to doubles as bits. fdlibm-1.1 had similar aliasing bugs, but these
were fixed by NetBSD or Cygnus before a modified version of fdlibm was
imported in 1994. TRUNC() is only used by tgamma() and some
implementation-detail functions. The aliasing bugs were detected by
compiling with gcc -O2 but don't seem to have broken tgamma() on i386's
or amd64's. They broke my modified version of tgamma().
Moved the definition of TRUNC() to mathimpl.h so that it can be fixed
in one place, although the general version is even slower than necessary
because it has to operate on pointers to volatiles to handle its arg
sometimes being volatile. Inefficiency of the fdlibm macros slows
down libm generally, and tgamma() is a relatively unimportant part of
libm. The macros act as if on 32-bit words in memory, so they are
hard to optimize to direct actions on 64-bit double registers for
(non-i386) machines where this is possible. The optimization is too
hard for gcc on amd64's, and declaring variables as volatile makes it
impossible.
Change first MAXPATHLEN to more standard PATH_MAX
Change second MAXPATHLEN to 1024 (it is temp buffer not related)
Change comment to reflect that.
Suggested by: bde
just use MAXPATHLEN. It prevents potential buffer overflow with other
malloc implementations.
(this change based on submitted patch)
PR: 86135
Submitted by: Trevor Blackwell <tlb@tlb.org>
(wchar_t is defined in stddef.h, and only two files need more than that.)
Portability: Since the wchar requirements are really quite modest,
it's easy to define basic replacements for wcslen, wcscmp, wcscpy,
etc, for use on systems that lack <wchar.h>. In particular, this allows
libarchive to be used on older OpenBSD systems.
than the value of backlog argument.
- Document the fact that a subsequent listen(2) calls on the listening
socket change the backlog argument.
- Note that current listen queue lengths can be queried using netstat(1).
Submitted by: Igor Sysoev <is rambler-co.ru>
Wording by: gnn
It is the binary equivalent to strstr(3).
void *memmem(const void *big, size_t big_len,
const void *little, size_t little_len);
Submitted by: Pascal Gloor <pascal.gloor at spale.com>
MFC after: 3 days
correctly in the case of FTP_PROXY, because an empty FTP_PROXY has a
specific meaning ("don't use any proxy at all for ftp, even if HTTP_PROXY
is defined"), while an empty HTTP_PROXY has no meaning at all.
PR: bin/85185
Submitted by: Conall O'Brien <conallob=freebsd@maths.tcd.ie>
MFC after: 2 weeks
example in libmemstat.3 was not updated to take this rename into account.
Update the example.
PR: 84946
Submitted by: Wojciech A. Koszek <dunstan at freebsd dot czest dot pl>
MFC after: 1 day
inadvertently match a negative char in the RE being compiled.
This fixes compilation of "\376" (as an ERE) and "\376\376" (as a BRE).
PR: 84740
MFC after: 1 week
tokenizer.c:1.3). Contrary to the commit log there were no memory leaks,
but the change introduced a bug because the free'd pointer was not zeroed
and calling the appropriate _end() function would call free() a second time.
so that libmemstat can be used to view full memory statistics from
kernel core dumps and /dev/mem. This is provided via a new query
function, memstat_kvm_malloc(), which is also automatically invoked
by memstat_kvm_all(). A kvm handle must be passed in.
This will allow malloc(9)-specific code to be removed from vmstat(8).
opt_vmpage.h.
Remove definition of _KERNEL, it is no longer required in order to
include uma_int.h, as the sensitive parts of uma_int.h (a number of
inlines depending on kernel-only constants) are now protected by
_KERNEL.
extracted from tar archives. Otherwise, converting tar archives to
cpio format (with "bsdtar -cf out.cpio @in.tar") convert every entry
into a hard link to a single file. This simple logic breaks hard
links, but that's better than the alternative.
MFC after: 7 days
header of the pax extension entry, clip them to ustar limits. In particular,
this prevents an internal panic for very old files.
Thanks to: Chris Spiegel
MFC after: 7 days
GNU tar sparse files, people have extended cpio) and clarify an
important detail about pax format (that ustar-compliant archivers
can mostly read pax archives correctly).
MFC after: 7 days
that knows how to extract UMA(9) allocator statistics from a core dump or
live memory image using kvm(3). The caller is expected to provide the
necessary kvm_t handle, which is then used by libmemstat(3).
With these changes, it is trivially straight forward to re-introduce
vmstat -z support on core dumps, which was lost when UMA was introduced.
In the short term, this requires including vm/ include files that are not
intended for extra-kernel use, requiring in turn some ugliness.
- Move memory_type_list flushing logic from memstat_mtl_free() to
_memstat_mtl_empty(), a libmemstat-internal function that can
be called from other parts of the library. Invoke
_memstat_mtl_empty() from memstat_mtl_free(), which also frees
the containing list structure.
Invoke _memstat_mtl_empty() instead of memstat_mtl_free() in
various error cases in memstat_malloc.c and memstat_uma.c, which
previously resulted in the list being freed prematurely.
- Reverse the order of updating the mt_kegfree and mt_free fields
of the memory_type in memstat_uma.c, otherwise keg free items
won't be counted properly for non-secondary zones.
MFC after: 3 days
conversion routine, now change my mind and add one, memstat_strerror(3),
which returns a const char * pointer to a string describing the error,
to be used on the results of memstat_mtl_geterror().
While here, also correct a minor typo in the HISTORY man page.
Pointers on improving ease of internationalization would be
appreciated.
MFC after: 1 day
- Define a set of libmemstat(3) error constants, which are used by all
libmemstat(3) methods except for memstat_mtl_alloc(), which allocates
a memory type list and may return ENOMEM via errno.
- Define a per-memory_type_list current error value, which is set when a
call associated with a memory list fails. This requires wrapping a
structure around the queue(9) list head data structure, but this change
is not visible to libmemstat(3) consumers due to using access methods.
- Add a new accessor method, memstat_mtl_geterror() to retrieve the error
number.
- Consistently set the error number in a number of failure modes where
previously some combination of setting errno and printf'ing error
descriptions was used. libmemstat(3) will now no longer print to stdio
under any circumstances. Returns of NULL/-1 for errors remain the
same.
This avoids use of stdio, misuse of error numbers, and should make it
easier to program a libmemstat(3) consumer able to print useful error
messages. Currently, no error-to-string function is provided, as I'm
unsure how to address internationalization concerns.
MFC after: 1 day
try and discourage use outside the library.
Remove duplicate declaration of memstat_mtl_free() from memstat_internal.h,
as it's not internal, and the memstat.h definition suffices.
on top of a primary zone, sharing the same allocation "keg". When
reporting statistics for zones, do not report the free items in the
keg as part of the free items in the zone, or those free items will
be reported more than once: for the primary zone, and then any
secondary zones off the primary zone. Separately record and maintain
a kegfree statistic, and export via memstat_get_kegfree(), which is
available for use if needed. Since items free'd back to the keg are
not fully initialized, and hence may not actually be available (since
secondary zone ctor-time initialization can fail), this makes some
amount of sense.
This change corrects a bug made visible in the libmemstat(3)
modifications to netstat: mbufs freed back to the keg from the
packet zone would be counted twice, resulting in negative values
being printed in the mbuf free count.
Some further refinement of reporting relating to secondary zones may
still be required.
Reported by: ssouhlal
MFC after: 3 days
MEMSTAT_MAXCALLER (8), and expose MEMSTAT_MAXCALLER via memstat.h so
that applications can check their assumptions about how many slots
are available.
Remove 'spare' memory storage in struct malloc_type, since we now
don't expose the data structure internals to applications and rely
on accessor methods, this approach to ABI stability isn't required.
MFC after: 7 days
applications in tracking kernel memory statistics. It provides an
abstracted interface to uma(9) and malloc(9) statistics, wrapped
around the recently added binary stream sysctls for the allocators.
Using this interface, it is easy to build monitoring tools, query
specific memory types for usage information, etc. Facilities are
provided for binding caller-provided data to memory types,
incremental updates of memory types, and queries that span multiple
allocators.
Support for additional allocators is (relatively) easy to add.
The API for libmemstat(3) will probably change some over time as
consumers are written, and requirements evolve. It is written to
avoid encoding ABIs for data structure layout into consuming
applications for this reason.
MFC after: 1 week
- It is acceptable to call free(3) when the given pointer itself
is NULL, so we do not need to determine NULL before passing
a pointer to free(3)
- Handle failure of malloc(3)
MT6/5 Candidate
Submitted by: Dan Lukes <dan at obluda cz>
PR: bin/83352
if _FREEFALL_CONFIG is set gcc bails since pam_sm_setcred() in pam_krb5.c
no longer uses any of its parameters.
Pointy hat: kensmith
Approved by: re (scottl)
and writev() except that they take an additional offset argument and do
not change the current file position. In SAT speak:
preadv:readv::pread:read and pwritev:writev::pwrite:write.
- Try to reduce code duplication some by merging most of the old
kern_foov() and dofilefoo() functions into new dofilefoo() functions
that are called by kern_foov() and kern_pfoov(). The non-v functions
now all generate a simple uio on the stack from the passed in arguments
and then call kern_foov(). For example, read() now just builds a uio and
calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev().
PR: kern/80362
Submitted by: Marc Olzheim marcolz at stack dot nl (1)
Approved by: re (scottl)
MFC after: 1 week
a tty device instead of the legacy minor number approach. This is known to
fix gnome-vfs' sftp module as well as kio_sftp and kdesu on -CURRENT.
Thanks to scottl for the snprintf() approach idea.
Reviewed by: phk
Tested by: pav
mich
Approved by: re (scottl)
branches but missed HEAD. This patch extends his a little bit,
setting it up via the Makefiles so that adding _FREEFALL_CONFIG
to /etc/make.conf is the only thing needed to cluster-ize things
(current setup also requires overriding CFLAGS).
From Peter's commit to the RELENG_* branches:
> Add the freebsd.org custer's source modifications under #ifdefs to aid
> keeping things in sync. For ksu:
> * install suid-root by default
> * don't fall back to asking for a unix password (ie: be pure kerberos)
> * allow custom user instances for things like www and not just root
The Makefile tweaks will be MFC-ed, the rest is already done.
MFC after: 3 days
Approved by: re (dwhite)
- Allow libpmc(3) to support P4/EMT64 PMCs on the amd64 architecture
and AMD K8 PMCs on the i386. [2]
Submitted by: ps [1]
Pointy hat: myself [2]
Approved by: re (scottl)
- pmcstat(8) gprof output mode fixes:
lib/libpmc/pmclog.{c,h}, sys/sys/pmclog.h:
+ Add a 'is_usermode' field to the PMCLOG_PCSAMPLE event
+ Add an 'entryaddr' field to the PMCLOG_PROCEXEC event,
so that pmcstat(8) can determine where the runtime loader
/libexec/ld-elf.so.1 is getting loaded.
sys/kern/kern_exec.c:
+ Use a local struct to group the entry address of the image being
exec()'ed and the process credential changed flag to the exec
handling hook inside hwpmc(4).
usr.sbin/pmcstat/*:
+ Support "-k kernelpath", "-D sampledir".
+ Implement the ELF bits of 'gmon.out' profile generation in a new
file "pmcstat_log.c". Move all log related functions to this
file.
+ Move local definitions and prototypes to "pmcstat.h"
- Other bug fixes:
+ lib/libpmc/pmclog.c: correctly handle EOF in pmclog_read().
+ sys/dev/hwpmc_mod.c: unconditionally log a PROCEXIT event to all
attached PMCs when a process exits.
+ sys/sys/pmc.h: correct a function prototype.
+ Improve usage checks in pmcstat(8).
Approved by: re (blanket hwpmc)
Like on libthr, there is an i386_set_gsbase() stub implementation here
to avoid libc.so.5 issues. This should likely be a weak symbol and I
expect this will be fixed soon.
Approved by: re
returned an lseek offset in a "u_long *" value, which can't express >4GB
offsets on 32 bit machines (eg: PAE). Change to "off_t *" for all.
Support ELF crashdumps on i386 and amd64.
Support PAE crashdumps on i386. This is done by auto-detecting the
presence of the IdlePDPT which means that PAE is active.
I used Marcel's _kvm_pa2off strategy and ELF header reader for ELF support
on amd64. Paul Saab ported the amd64 changes to i386 and we implemented
the PAE support from there.
Note that gdb6 in the src tree uses whatever libkvm supports. If you want
to debug an old crash dump, you might want to keep an old libkvm.so handy
and use LD_PRELOAD or the like. This does not detect the old raw dump
format.
Approved by: re
cached state. Otherwise, a subsequent call to devinfo_init() would succeed
without reading the device tree from the kernel thinking that the cached
state was up to date since the generation count was the same. However,
since the cached state was actually free'd, attempts to examine the tree
after the second devinfo_init() would fail.
Reported by: Juho Vuori juho dot vuori at kepa dot fi
Submitted by: Stefan Farfeleder stefan at fafoe dot narf dot at
Approved by: re (dwhite)
MFC after: 1 week
Compliance Definition. On sparc64, GCC emits _Qp_cmp() calls for its
__builtin_isfoo() functions which are used for C99's isfoo() macros.
Approved by: re(dwhite)
PR: 73782
to point at libmap.conf(5). This will help answer questions about what
and why it is, although not in great detail.
Approved by: re (scottl)
MFC after: 1 week
MFC note: When MFC'd, don't MFC mention of work not yet MFC'd.
method of executing commands remotely. There are no rexec clients in
the FreeBSD tree, and the client function rexec(3) is present only in
libcompat. It has been documented as "obsolete" since 4.3BSD, and its
use has been discouraged in the man page for over 10 years.
reflects the actual behavior of the API
for listing extended attributes.
PR: docs/79261
Submitted by: rodrigc
Reviewed by: rwatson, kan
Approved by: das (mentor)
- Implement sampling modes and logging support in hwpmc(4).
- Separate MI and MD parts of hwpmc(4) and allow sharing of
PMC implementations across different architectures.
Add support for P4 (EMT64) style PMCs to the amd64 code.
- New pmcstat(8) options: -E (exit time counts) -W (counts
every context switch), -R (print log file).
- pmc(3) API changes, improve our ability to keep ABI compatibility
in the future. Add more 'alias' names for commonly used events.
- bug fixes & documentation.
aio_write(2) completion through kevent(2). This method does not work on
64-bit architectures. It was deprecated in FreeBSD 4.4. See revisions
1.87 and 1.70.2.7.
Change aio_physwakeup() to call psignal(9) directly rather than indirectly
through a timeout(9). Discussed with: bde
Correct a bug introduced in revision 1.65 that could result in premature
delivery of a signal if an lio_listio(2) consisted of a mixture of
direct/raw and queued I/O operations. Observed by: tegge
Eliminate a field from struct kaioinfo that is now unused.
Reviewed by: tegge
netent.
- Change 1st argument of getnetbyaddr() to an uint32_t on 64 bit
arch as well to confirm to POSIX-2001.
These changes break ABI compatibility on 64 bit arch.
There is similar padding issue for ai_addrlen of struct addrinfo.
However, it is leaved as is for now.
Discussed on: arch@, standards@ and current@
X-MFC after: never
compiling on IRIX and Solaris. Remove the "archive_check_magic" macro
that existed only to provide __func__ to the underlying __archive_check_magic
function.
Thanks to: Darin Broady
MFC after: 14 days
from mode before using mode for extended attributes entry, copy
mtime/atime/ctime to extended attributes entry so it's a little more
clear that it corresponds to the like-named regular entry.
MFC after: 14 days
BZ_NO_COMPRESS support to the bzip2 sources directly (yes, this takes file
off the vendor branch, but looks like bzip2 maintainer doesn't care), so that
it will not be removed when the next upgrade is performed. Also, add a short
note on how to test bzip2 support.
Pointy hat to: obrien
Correct comment (libz -> libbz2) and remove useless full path to zutil.h
while I am here.
long (and unsigned long long) to long double conversions.
- Add a parameter that specifies the position of the sign bit to the _QP_TTOQ
macro, previously it always looked at bit 31. Pass a negative number to
disable sign inspection for unsigned types. This fixes _Qp_xtoq(),
_Qp_uitoq() and _Qp_uxtoq().
- In the functions __fpu_itof() and __fpu_xtof(), look at the sign bit to
decide whether we're doing a conversion from an unsigned type. If so, don't
negate the mantissa if the integer exceeds the biggest signed number.
PR: 55773
Patch by: Stephen Paskaluk (based upon)
MFC after: 2 weeks
and restoring the metadata. In particular, the metadata-restore
functions now all accept a file descriptor and a pathname. If the
file descriptor is set and the platform supports the appropriate
syscall, restore the metadata through the file descriptor. Otherwise,
restore it through the pathname. This is complicated by varying
syscall support (FreeBSD has an fchmod(2) but no fchflags(2), for
example) and because non-file entries don't have an fd to use in
restoring attributes (for example, mknod(2) doesn't return a file
handle).
MFC after: 14 days
bzip2 support provided, and amd64 depended on. Amd64 has a custom
${.OBJDIR}/machine symlink in it and the -I. picked this up. Without
it, the libstand code was being compiled in 32 bit mode, but with 64 bit
machine headers.
that use SSE. The compiler does attempt to do this in main() but not very
successfully - it still manages to use unaligned offsets from %ebp in some
cases. Also we need to have an aligned stack in case something uses SSE
via _init().
MFC After: 1 week
RFC 2553. In XNS5.2, and subsequently in POSIX-2001 and RFC
3493, it was changed to a socklen_t. And, the n_net of a
struct netent used to be an unsigned long integer. In XNS5,
and subsequently in POSIX-2001, it was changed to an uint32_t.
To accomodate for this while preserving ABI compatibility with
the old interface, we need to prepend or append 32 bits of
padding, depending on the (LP64) architecture's endianness.
- Correct 1st argument of getnetbyaddr() to uint32_t on 32
bit arch. Stay as is on 64 bit arch for ABI backward
compatibility for now.
Reviewed by: das, peter
MFC after: 2 weeks
Reviewed by: rwatson at freebsd dot org
Approved by: rwatson at freebsd dot org
MFC after: 1 week
Fix the matchlen() function so that it handles the IPv4 (AF_INET)
case correctly. Until now it has been treating IPv4 addresses
as if they were IPv6 which could lead to corruption errors.
return the buffer immediately. This will permit ssh and/or PAM logins
broken by previous commit.
The (potential) underlying problem is still under investigation.
Point hat to: me
different from what has been offered in libc_r (the one spotted in the
original PR which is found in libthr has already been removed by David's
commit, which is rev. 1.44 of lib/libthr/thread/thr_private.h):
- Use POSIX standard prototype for ttyname_r, which is,
int ttyname_r(int, char *, size_t);
Instead of:
char *ttyname_r(int, char *, size_t);
This is to conform IEEE Std 1003.1, 2004 Edition [1].
- Since we need to use standard errno for return code, include
errno.h in ttyname.c
- Update ttyname(3) implementation according to reflect the API
change.
- Document new ttyname_r(3) behavior
- Since we already make use of a thread local storage for
ttyname(3), remove the BUGS section.
- Remove conflicting ttyname_r related declarations found in libc_r.
Hopefully this change should not have changed the API/ABI, as the ttyname_r
symbol was never introduced before the last unistd.h change which happens a
couple of days before.
[1] http://www.opengroup.org/onlinepubs/009695399/functions/ttyname.html
Requested by: Tom McLaughlin <tmclaugh sdf lonestar org>
Through PR: threads/76938
Patched by: Craig Rodrigues <rodrigc crodrigues org> (with minor changes)
Prompted by: mezz@
primarily of pointless $FreeBSD$ tags), sync most files in HEAD with
those in the ZLIB branch. This minimizes the differences between
HEAD and ZLIB and should simplify future imports.
After this, there are only three files with local modifications
(gzio.c, minigzip.c, and zconf.h) and two non-vendor files
(Makefile, zopen.c). The rest exactly match the vendor distribution.
PR: i386/76294
MFC after: 2 weeks
(symlink or hardlink) is already set. Instead, it was always setting
the hardlink field. In particular, this caused GNU tar format long
symlinks to be interpreted as hardlinks.
Thanks to: Brooks Davis
MFC after: 7 days
negative (in addition to returning EINVAL when called on a descriptor
that is not a socket).
Submitted by: Arne H Juul <arnej@europe.yahoo-inc.com>
PR: docs/80587
(((truncate to zero length) or (create)) (text file)) (for writing)
and not
((truncate file to zero length) or (create text file)) (for writing)
MFC after: 1 week
- Use /*- instead of /* for copyright section
- Include unistd.h for prototype of it
- Sort and separate includes as described in style(9)
- ANSIfy the function defination
- Use const for the traversing iterator
Have pmcstat(8) and pmccontrol(8) use these APIs.
Return PMC class-related constants (PMC widths and capabilities)
with the OP GETCPUINFO call leaving OP PMCINFO to return only the
dynamic information associated with a PMC (i.e., whether enabled,
owner pid, reload count etc.).
Allow pmc_read() (i.e., OPS PMCRW) on active self-attached PMCs to
get upto-date values from hardware since we can guarantee that the
hardware is running the correct PMC at the time of the call.
Bug fixes:
- (x86 class processors) Fix a bug that prevented an RDPMC
instruction from being recognized as permitted till after the
attached process had context switched out and back in again after
a pmc_start() call.
Tighten the rules for using RDPMC class instructions: a GETMSR
OP is now allowed only after an OP ATTACH has been done by the
PMC's owner to itself. OP GETMSR is not allowed for PMCs that
track descendants, for PMCs attached to processes other than
their owner processes.
- (P4/HTT processors only) Fix a bug that caused the MI and MD
layers to get out of sync. Add a new MD operation 'get_config()'
as part of this fix.
- Allow multiple system-mode PMCs at the same row-index but on
different CPUs to be allocated.
- Reject allocation of an administratively disabled PMC.
Misc. code cleanups and refactoring. Improve a few comments.
any query.
- don't query against IPv6 link-local address.
- use IN6_IS_ADDR_V4{MAPPED,COMPAT} macros.
- use memcpy() instead of bcopy().
Inspired by: NetBSD
These are two of the three files that have non-trivial differences from
the vendor branch. minigzip.c is the third, but there were no changes
from ZLib 1.2.1 to ZLib 1.2.2 in that file.
The rest of the files I intend to get reverted back to the vendor
branch (with cooperation of cvsadmin@).
PR: i386/76294
internal error if pax extended attributes were being generated. Being
< 255 characters, the first-pass path editing (to generate a
ustar-compatible name for the main entry) wouldn't occur, and the
second-pass path editing (to generate a ustar name for the pax
attributes entry) assumed the input was already < 245 chars.
The core problem here was using an abbreviated algorithm for the
second pass that relied on the first pass having already run. The
rewritten code is much simpler: It just uses the full path-shortening
algorithm for building both ustar pathnames. This way, the second
ustar pathname will always be short enough.
Thanks to: Mark Cammidge
Related to: bin/74385
us when <sys/pmc.h> is included.
o Replace "#if __i386__" and "#if __amd64__" with the equivalent of
"#ifdef __i386__" and "#ifdef __amd64__" (resp.) These tokens are
not defined on all platforms.
o Conditionally compile pmc_parse_mask() on i386 and amd64 only. It's
only referenced there. This will change when support for other
platforms is added, of course.
Ok'd by: jkoshy@
getnameinfo(3). POSIX standard does not require a sa_len field
in sockaddr struct, hence such requirement will cause problem
for portability.
PR: standards/80008
Requested by: Xin Liu <lx@knight.6test.edu.cn>
Reviewed by: freebsd-standards (das)
MFC After: 2 weeks
check the password or group database before attempting to parse as an
integer, as is done for the first {uid,gid} in an identity phrase.
Obtained from: TrustedBSD Project
Sponsored by: SPAWAR, SPARTA
a return instruction. (The latter is discouraged by the Opteron
optimization manual because it disables branch prediction for the return
instruction.)
Reviewed by: bde
* Handles entries with compressed size >2GB (signed/unsigned cleanup)
* Handles entries with compressed size >4GB ("ZIP64" extension)
* Handles Unix extensions (ctime, atime, mtime, mode, uid, etc)
* Format-specific "skip data" override allows ZIP reader to skip
entries without decompressing them, which makes "tar -t"
a lot faster.
* Handles "length-at-end" entries generated by, e.g., "zip -r - foo"
Many thanks to: Dan Nelson, who contributed the code and test files for
the first three items above and suggested the fourth.
libalias.
In /usr/src/lib/libalias/alias.c, the functions LibAliasIn and
LibAliasOutTry call the legacy PacketAliasIn/PacketAliasOut instead
of LibAliasIn/LibAliasOut when the PKT_ALIAS_REVERSE option is set.
In this case, the context variable "la" gets lost because the legacy
compatibility routines expect "la" to be global. This was obviously
an oversight when rewriting the PacketAlias* functions to the
LibAlias* functions.
The fix (as shown in the patch below) is to remove the legacy
subroutine calls and replace with the new ones using the "la" struct
as the first arg.
Submitted by: Gil Kloepfer <fgil@kloepfer.org>
Confirmed by: <nicolai@catpipe.net>
PR: 76839
MFC after: 3 days
implementations inspired by the ones in DragonFly. Unlike the
DragonFly versions, these have a small data cache footprint, and my
tests show that they're never slower than the old code except when the
charset or the span is 0 or 1 characters. This implementation is
generally faster than DragonFly until either the charset or the span
gets in the ballpark of 32 to 64 characters.
these at the moment, but applications that test for them will now
have a better chance of compiling.
I have intentionally omitted errnos that are only good for STREAMS,
since apps that use STREAMS won't compile anyway. The exception is
EPROTO, which was apparently intended for STREAMS, but worth having
anyway because Linux (mis)uses it for other things.
1. fast simple type mutex.
2. __thread tls works.
3. asynchronous cancellation works ( using signal ).
4. thread synchronization is fully based on umtx, mainly, condition
variable and other synchronization objects were rewritten by using
umtx directly. those objects can be shared between processes via
shared memory, it has to change ABI which does not happen yet.
5. default stack size is increased to 1M on 32 bits platform, 2M for
64 bits platform.
As the result, some mysql super-smack benchmarks show performance is
improved massivly.
Okayed by: jeff, mtm, rwatson, scottl
floating-point arithmetic on i386. Now I'm going to make excuses
for why this code is kinda scary:
- To avoid breaking the ABI with 5.3-RELEASE, we can't change
sizeof(fenv_t). I stuck the saved mxcsr in some discontiguous
reserved bits in the existing structure.
- Attempting to access the mxcsr on older processors results
in an illegal instruction exception, so support for SSE must
be detected at runtime. (The extra baggage is optimized away
if either the application or libm is compiled with -msse{,2}.)
I didn't run tests to ensure that this doesn't SIGILL on older 486's
lacking the cpuid instruction or on other processors lacking SSE.
Results from running the fenv regression test on these processors
would be appreciated. (You'll need to compile the test with
-DNO_STRICT_DFL_ENV.) If you have an 80386, or if your processor
supports SSE but the kernel didn't enable it, then you're probably out
of luck.
Also, I un-inlined some of the functions that grew larger as a result
of this change, moving them from fenv.h to fenv.c.
fedisableexcept(), and fegetexcept(). These two sets of routines
provide the same functionality. I implemented the former as an
undocumented internal interface to make the regression test easier to
write. However, fe(enable|disable|get)except() is already part of
glibc, and I would like to avoid gratuitous differences. The only
major flaw in the glibc API is that there's no good way to report
errors on processors that don't support all the unmasked exceptions.
to mistakes from day 1, it has always had semantics inconsistent with
SVR4 and its successors. In particular, given argument M:
- On Solaris and FreeBSD/{alpha,sparc64}, it clobbers the old flags
and *sets* the new flag word to M. (NetBSD, too?)
- On FreeBSD/{amd64,i386}, it *clears* the flags that are specified in M
and leaves the remaining flags unchanged (modulo a small bug on amd64.)
- On FreeBSD/ia64, it is not implemented.
There is no way to fix fpsetsticky() to DTRT for both old FreeBSD apps
and apps ported from other operating systems, so the best approach
seems to be to kill the function and fix any apps that break. I
couldn't find any ports that use it, and any such ports would already
be broken on FreeBSD/ia64 and Linux anyway.
By the way, the routine has always been undocumented in FreeBSD,
except for an MLINK to a manpage that doesn't describe it. This
manpage has stated since 5.3-RELEASE that the functions it describes
are deprecated, so that must mean that functions that it is *supposed*
to describe but doesn't are even *more* deprecated. ;-)
Note that fpresetsticky() has been retained on FreeBSD/i386. As far
as I can tell, no other operating systems or ports of FreeBSD
implement it, so there's nothing for it to be inconsistent with.
PR: 75862
Suggested by: bde
particularly good reason to do this, except that __strong_reference
does type checking, whereas __weak_reference does not.
On Alpha, the compiler won't accept a 'long double' parameter in
place of a 'double' parameter even thought the two types are
identical.
an invalid exception and return an NaN.
- If a long double has 113 bits of precision, implement fma in terms
of simple long double arithmetic instead of complicated double arithmetic.
- If a long double is the same as a double, alias fma as fmal.
identical to scalbnf, which is now aliased as ldexpf. Note that the
old implementations made the mistake of setting errno and were the
only libm routines to do so.
- Add nexttoward{,f,l} and nextafterl. On all platforms,
nexttowardl is an alias for nextafterl.
- Add fmal.
- Add man pages for new routines: fmal, nextafterl,
nexttoward{,f,l}, scalb{,l}nl.
Note that on platforms where long double is the same as double, we
generally just alias the double versions of the routines, since doing
so avoids extra work on the source code level and redundant code in
the binary. In particular:
ldbl53 ldbl64/113
fmal s_fma.c s_fmal.c
ldexpl s_scalbn.c s_scalbnl.c
nextafterl s_nextafter.c s_nextafterl.c
nexttoward s_nextafter.c s_nexttoward.c
nexttowardf s_nexttowardf.c s_nexttowardf.c
nexttowardl s_nextafter.c s_nextafterl.c
scalbnl s_scalbn.c s_scalbnl.c
sparc64's 128-bit long doubles.
- Define FP_FAST_FMAL for ia64.
- Prototypes for fmal, frexpl, ldexpl, nextafterl, nexttoward{,f,l},
scalblnl, and scalbnl.
- In scalbln and scalblnf, check the bounds of the second argument.
This is probably unnecessary, but strictly speaking, we should
report an error if someone tries to compute scalbln(x, INT_MAX + 1ll).
nexttowardl. These are not needed on machines where long doubles
look like IEEE-754 doubles, so the implementation only supports
the usual long double formats with 15-bit exponents.
Anything bizarre, such as machines where floating-point and integer
data have different endianness, will cause problems. This is the case
with big endian ia64 according to libc/ia64/_fpmath.h. Please contact
me if you managed to get a machine running this way.
that are intended to raise underflow and inexact exceptions.
- On systems where long double is the same as double, nextafter
should be aliased as nexttoward, nexttowardl, and nextafterl.
bit in a long double. For architectures that don't have such a bit,
LDBL_NBIT is 0. This makes it possible to say `mantissa & ~LDBL_NBIT'
in places that previously used an #ifdef to select the right expression.
The optimizer should dispense with the extra arithmetic when LDBL_NBIT
is 0 anyway.
- Add an XXX comment for the big endian case.
bit in a long double. For architectures that don't have such a bit,
LDBL_NBIT is 0. This makes it possible to say `mantissa & ~LDBL_NBIT'
in places that previously used an #ifdef to select the right expression.
The optimizer should dispense with the extra arithmetic when LDBL_NBIT
is 0.
Symptoms of the problem included assembler warnings and
nondeterministic runtime behavior when a fe*() call that affects the
fpsr is closely followed by a float point op.
The bug (at least, I think it's a bug) is that gcc does not insert a
break between a volatile asm and a dependent instruction if the
volatile asm came from an inlined function. Volatile asms seem to be
fine in other circumstances, even without -mvolatile-asm-stop, so
perhaps the compiler adds the stop bits before inlining takes place.
The problem does not occur at -O0 because inlining is disabled, and it
doesn't happen at -O2 because -fschedule-insns2 knows better.
a libalias application (e.g. natd, ppp, etc.) to crash. Note: Skinny support
is not enabled in natd or ppp by default.
Approved by: secteam (nectar)
MFC after: 1 day
Secuiryt: This fixes a remote DoS exploit
any pending HTTP request rather than calling shutdown(2) with SHUT_WR.
This makes libfetch (and thus fetch(1)) work again with Squid proxies
configured to not allow half-closed connections.
Reported by: Pawel Worach (pawel.worach AT telia DOT com)
the lock is held by other thread, but not when nobody owns it. According
to deischen@, this part of code will never be hit in our threads
library, since it does not use locks without wait/wakeup functions.
Spotted by: mingyanguo via ChinaUnix.net forum
Reviewed by: deischen
surrounding the undef'ing it. It does not seem necessary to
undef some symbol that is not exist, and gcc does not complain
about whether a symbol is exist before #undef'ing it out.
Spotted by: mingyanguo via ChinaUnix.net forum
Reviewed by: phk
it type and endian clean and removing of stdio dependency from NLS
functions (catalog files now are processed via mmap())
Also following changes were done (against NetBSD version):
. If mmap() failed, set errno to EINVAL and do not try to munmap() file
Obtained from: NetBSD
. Replace inclusion of sys/param.h to sys/cdefs.h and sys/types.h where
appropriate.
. move _*_init() prototypes to mblocal.h, and remove these prototypes
from .c files
. use _none_init() in __setrunelocale() instead of duplicating code
. move __mb* variables from table.c to none.c allowing us to not to
export _none_*() externs, and appropriately remove them from mblocal.h
Ok'ed by: tjr
introducing the disk formats for _RuneLocale and friends.
The disk formats do not have (useless) pointers and have 32-bit
quantities instead of rune_t and long. (htonl(3) only works
with 32-bit quantities, so there's no loss).
Bootstrap mklocale(1) when necessary. (Bootstrapping from 4.x
would be trivial (verified), but we no longer provide pre-5.3
source upgrades and this is the first commit to actually break
it.)
inputs. The trouble with replacing two floats with a double is that
the latter has 6 extra bits of precision, which actually hurts
accuracy in many cases. All of the constants are optimal when float
arithmetic is used, and would need to be recomputed to do this right.
Noticed by: bde (ucbtest)
return a generic text message instead.
(Someday, I'll track down all the places that
are generating errors but not recording messages. ;-/
Thanks to: Jaakko Heinonen
results in a performance gain on the order of 10% for amd64 (sledge),
ia64 (pluto1), i386+SSE (Pentium 4), and sparc64 (panther), and a
negligible improvement for i386 without SSE. (The i386 port still
uses the hardware instruction, though.)
changed to use the statclock. Make sure we calculate the value
of a tick correctly in userland.
Noticed by: Kazuaki Oda <kaakun at highway dot ne dot jp>
occurred with large read-ahead requests. This only affected
formats that incorrectly make large requests (ZIP did this until
recently) or with block sizes over 32k.
When reading the bodies of Zip archive entries, request a minimum of 1
byte, rather than a minimum of the full entry size. This is faster
(since it does not force the decompression layer to combine reads) and
works around a bug in the "none" decompression handler (which I'm
testing a separate fix for now). I've also renamed "bytes_read" to
"bytes_avail" in several places to more accurately reflect that the
value returned from (a->compression_read_ahead) is the number of bytes
available, not necessarily the number of bytes requested.
(padding) entries, extract inode value from PX entry, recognize SP and
ST (start/end of SUSP extensions).
I don't enforce SP yet, as I've seen CDROMs which use Rockridge
extensions but don't have the SP record (which is officially
required).
The ISO9660 support is now mature enough to extract FreeBSD
distribution CDROMs created with mkisofs.
- Remove all:. It's redundant, and ${LIB} in it is just a bug.
- Remove .ORDER:. *.mgc files can safely be built in parallel.
- Remove PITA. The mkmagic tool is smart to put the binary file
into the current directory (${.OBJDIR}) even if the source file
lives somewhere else, which is just what we need.
manpages. They are not very related, so separating them makes it
easier to add meaningful cross-references and extend some of the
descriptions.
- Move the part of math(3) that discusses IEEE 754 to the ieee(3)
manpage.
Only supports "deflate" and "none" compression for now.
Also, add a few clarifications to the archive_read.3 manpage as
requested by William Dean DeVries.
copy the acquired TGT from the in-memory cache to the on-disk cache
at login. This was documented but un-implemented behavior.
MFC after: 1 week
PR: bin/64464
Reported and tested by: Eric van Gyzen <vangyzen at stat dot duke dot edu>
FreeBSD currently implements the most up to date IPv6 APIs for
option and route header parsing. This checkin marks the older APIs
as deprecated and points the reader to the newer pages.
Reviewed by: Jun-ichiro Itojun
Approved by: rwatson (mentor)
- Rearrange the list of functions into categories.
- Remove the ulps column. It was appropriate for only some
of the functions in the list, and correct for even fewer
of them.
- Add some new paragraphs, and remove some old ones about
NaNs that may do more harm than good.
- Document precisions other than double-precision.
- Although ldexp() is in libc for backwards compatibility, ldexpf() is
in its proper place in libm. Document both as being in libm.
- The ldexp() and ldexpf() functions conform to C99.
- Neither frexp() nor frexpf() set errno.
- Although frexp() is in libc for backwards compatibility, frexpf() is
in its proper place in libm. Document both as being in libm.
- The frexp() and frexpf() functions conform to C99.
Reviewed by: Kame Project (including Itojun-san, Jinmei-san and Suzuki-san)
Approved by: Robert Watson (robert at freebsd dot org)
Obtained from: Kame Project and OpenBSD
Replace manual pages that may have violated the IETF's Copyright.
All come from the Kame tree.
Several were from OpenBSD except for ip6.4, and the inet6* pages which were
rewritten by me.
All of the text is new and drawn from reading the code and
documentation.
Approved by: Robert Watson (robert at freebsd dot org)
Remove files in preparation for replacement with totally new versions
of the manual pages.
Update the Makefile to handle the new file to be added.
scalbn() implementation from libm. (The two functions are defined to
be identical, but ldexp() lives in libc for backwards compatibility.)
The old ldexp() implementation...
- was more complicated than this one
- set errno instead of raising FP exceptions
- got some corner cases wrong
(e.g. ldexp(1.0, 2000) in round-to-zero mode)
The new implementation lives in libc/gen instead of
libc/$MACHINE_ARCH/gen, since we don't need N copies of a
machine-independent file. The amd64 and i386 platforms
retain their fast and correct MD implementations and
override this one.
really so.
"If the value of base is 16, the characters 0x or 0X may optionally
precede the sequence of letters and digits, following the sign if
present."
Found by: joerg
instead use the FPU to convert subnormals to normals. (NB: Further
simplification is possible, such as using the FPU for the rounding
step.)
This fixes a bug reported by stefanf where long double subnormals in
the Intel 80-bit format would be output with one fewer digit than
necessary when the default precision was used.
for libarchive error messages. Mostly, this
avoids a portability headache related to
copying va_list arguments (some FreeBSD 5
platforms require va_copy; FreeBSD 4 doesn't
support va_copy at all). It also dramatically reduces the
size of libarchive for embedded applications:
a minimal "untar" program using libarchive can now be
under 64k statically linked (as opposed to ~100k
using library *printf() functions).
MFC after: 14 days
Eliminate gdtoa.mk and move its contents to ${MACHINE_ARCH}/Makefile.inc.
The purpose of having a separate file involved an abandoned scheme that
would have kept contrib/gdtoa out of the include path for the rest of libc.
flags, so they are not pure. Remove the __pure2 annotation from them.
I believe that the following routines and their float and long double
counterparts are the only ones here that can be __pure2:
copysign is* fabs finite fmax fmin fpclassify ilogb nan signbit
When gcc supports FENV_ACCESS, perhaps there will be a new annotation
that allows the other functions to be considered pure when FENV_ACCESS
is off.
Discussed with: bde
basically support this, subject to gcc's lack of FENV_ACCESS support.
In any case, the previous setting of math_errhandling to 0 is not
allowed by POSIX.
registers as volatile. Instructions that *wrote* to FP state were
already marked volatile, but apparently gcc has license to move
non-volatile asms past volatile asms. This broke amd64's feupdateenv
at -O2 due to a WAR conflict between fnstsw and fldenv there.
GNU) for determining whether a string is an affirmative or negative
response to a question according to the current locale. This is done
by matching the response against nl_langinfo(3) items YESEXPR and NOEXPR.
make utilities like du(1) 64bit-clean.
When this field is used, one cannot use 'fts_number' and 'fts_pointer'
fields.
This commit doesn't break API nor ABI.
This work is part of the BigDisk project:
http://www.FreeBSD.org/projects/bigdisk/
Discussed on: arch@
MFC after: 5 days
which doesn't end in \n, since it may be very confusing. Also this should
increase consistency, since most other config files work just fine regardless
of the presence of traling \n in the last line.
MFC After: 2 weeks
* Reference-count the directory data so that
we don't leak memory.
* Correctly step through the directory records
(skipping unrecognized extensions)
* Use better defaults for file modes
* Sort directory entries by offset of the end of the file
rather than the beginning of the file. This fixes a
lot of "out-of-order" problems with zero-length files,
in particular.
* Style fixes, remove some debug code, add some error messages.
This seems to be able to extract a TOC and extract files from
the couple of ISO images I've tested it with.
Treat this as experimental proof-of-concept code for the
moment. There are still a bunch of debug messages (there
are a few oddities in ISO9660 that I haven't yet figured
out how to handle), a lot of bugs to be addressed (this
code leaks memory very badly), and a lot of missing features (no
Rockridge support, in particular). I'd appreciate
feedback from anyone who understands ISO9660 format
better than I do. ;-)
Suggested by: Robert Watson
the regular ustar entry. The old code sometimes created
a too-long name that overflowed the ustar fields and triggered
an internal assertion failure. This version should be more
robust.
Thanks to: Michal Listos
Fixes: bin/74385
MFC after: 15 days
a "null pointer".''
Making good use of the excellent explanations sent to me by Ruslan
Ermilov, Garrett Wollman and Bruce Evans, correct the descriptions of
null pointers. They are just "null pointers", not nil, not NULL or
".Dv NULL".
Suggested by: ru, wollman, bde
Reviewed by: ru, wollman
Pointy hat: keramida
ustar fields. Later, we're going to permit numeric extensions
for these fields, so we can support large values here. In particular,
this allows GNU tar to correctly extract such entries even
though it doesn't support the pax extended attributes.
Note: r1.18 and r1.17.2.1 of this file allowed similar treatment
of the uid/gid fields.
Thanks to: Ben Mesander
improves the recognition of hardlink entries
with/without bodies (which is implemented through
a look-ahead that uses the bid function).
MFC after: 7 days
signals instead of having more intricate knowledge of thread state
within signal handling.
Simplify signal code because of above (by David Xu).
Use macros for libpthread usage of pthread_cleanup_push() and
pthread_cleanup_pop(). This removes some instances of malloc()
and free() from the semaphore and pthread_once() implementations.
When single threaded and forking(), make sure that the current
thread's signal mask is inherited by the forked thread.
Use private mutexes for libc and libpthread. Signals are
deferred while threads hold private mutexes. This fix also
breaks www/linuxpluginwrapper; a patch that fixes it is at
http://people.freebsd.org/~deischen/kse/linuxpluginwrapper.diff
Fix race condition in condition variables where handling a
signal (pthread_kill() or kill()) may not see a wakeup
(pthread_cond_signal() or pthread_cond_broadcast()).
In collaboration with: davidxu
sigdelset(3) and sigismember(3) were killed about five years ago.
o The functions (specifically sigismember(3)) could return -1 and
set errno.
PR: bin/75156
Obtained from: NetBSD
MFC after: 2 weeks
o Bump the date of the document.
device numbers. In particular, this should fix
a bug where archiving a device node with a very
large minor number would sometimes overflow and
corrupt the major number.
Thanks to: Ben Mesander
MFC after: 7 days
http://www.opengroup.org/onlinepubs/009695399/functions/swab.html
the prototype for swab() should be in <unistd.h> and not in <string.h>.
Move it, and update to match SUS. Leave the prototype in string.h for
now, for backwards compat.
PR: 74751
Submitted by: Craig Rodrigues <rodrigc@crodrigues.org>
Discussed with: das
ID ranges that consist of exactly one attribute ID. libsdp(3) will check start
and end of the attribute ID range and if they are the same the range will be
collapsed to one atribute ID.
The problem was observed on Audiovox SMT5600 and Palm Treo 650.
MFC after: 3 days
regular 'ustar' entry, use narrow-character version,
not wide-character version, as the ustar entry always
uses the narrow-character filename.
Thanks to: Michal Listos
Inspired by, but doesn't fix: bin/74385
operation (by subtracting the absolute result from 0), don't test
for overflow.
This avoids an arithmetic exception when dividing LONG_MIN by 1:
This is the only case that causes overflow, and the resulting value
is correct under 2's compliment arithmetic.
PR: 72024
Approved by: dwmalone@
Obtained from: NetBSD
MFC after: 4 days
reading past 'stop' in various places when converting multibyte characters.
Reading too far caused truncation to not be detected when it should have
been, eventually causing regexec() to loop infinitely in with certain
combinations of patterns and strings in multibyte locales.
PR: 74020
MFC after: 4 weeks
is supported.
-Document the new more preferred syntax
-Add examples for the new syntax
-Add a note that the old syntax will be deprecated in the future.
Reviewed by: rwatson
the the pax attributes, I shouldn't try using the public
API for finishing out the attribute entry, either.
This also removes some old dubious state manipulations.
because the code was using the external API
(archive_write_data) and assuming internal
error-return conventions. Use the internal
API for writing data.
Thanks to: Joe Marcus Clarke
If turned on no NIS support and related programs will be built.
Lost parts rediscovered by: Danny Braniss <danny at cs.huji.ac.il>
PR: bin/68303
No objections: des, gshapiro, nectar
Reviewed by: ru
Approved by: rwatson (mentor)
MFC after: 2 weeks
seed, the random number generator rand(3) still sucks and is unlikely
sufficient for crypto use. Correct what appears to be a cut and paste
error from the srandomdev() man page.
Submitted by: Ben Mesander
run as a 32 bit support library for an amd64 kernel. 32 bit consumers of
libthr have zero chance of running on an amd64 kernel since we don't
implement the i386_set_ldt() family of functions. Note that this commit
doesn't make it actually work, it just removes one more obstacle.
can't use the i386_set_ldt() family of routines, because they are not
implemented. Instead, use the recently exposed direct access sysarch
routines for setting what %fs and %gs point to.
Use this for the i386 TLS _set_tp() routine, but only when compiling to
run as a 32 bit support binary for amd64 kernels.
syslog(3) if we are a priveleged program (sshd, su, etc.).
- Make syslogd open an additional socket /var/run/logpriv, with 0600
permissions.
- In libc, try to use this socket.
- Do not loop forever if we are using this socket (partial backout of 1.31)
Reviewed by: dwmalone, Andrea Campi <andrea webcom it>
Approved by: julian (mentor)
MFC after: 1 month
_(use space as padding), and 0(zero padding).
These GNU extensions are widely used ones that is worthy for us to
have.
Discussed with: stefanf, roam, -current
Approved by: murray
Prodded by: ports/72722, ports/72723
MFC After: 1 month
packages expect and seems to be most correct according to the slightly-
ambiguous standards.
MFC after: 1 month
Corroborated by: POSIX <http://tinyurl.com/4uvub>
Reviewed by: silence on threads@
the GPT partition on i386 and adm64 as type=gpt, subtype=0 and with the
sname set to the UUID. This prevents sysinstall from bombing out. This
also makes sure the GPT partition shows up in sysinstall so as to avoid
accidental "clobberage".
PR: bin/72896
file. In particular, this allows bsdtar to append (-r) to
an empty file.
Thanks to: Ryan Sommers
While I'm here, straighten out a misleading comment about GNU-compatible
sparse file handling.
put DEAD thread on GC list, this closes a race between pthread_join
and thr_cleanup.
2. Introduce a mutex to protect tcb initialization, tls allocation and
deallocation code in rtld seems no lock protection or it is broken,
under stress testing, memory is corrupted.
Reviewed by: deischen
patch partly provided by: deischen
probably means it also requires a .so version
bump. Defer it until I finish some related
work on cleaning up error returns throughout
the library.
Thanks to: Conrad J. Sabatier
no userland locks are heald, the dead thread lock can no longer protect
access to it. Therefore, instead of using an if (!dead)...else clause
after walking the active threads list test the thread pointer before
deciding not to walk the dead threads list. If the thread pointer is null
it means it was not found in the active threads list and the dead threads
list should be checked.
2. Do not free the stack of a thread that is not marked dead. This is the
2nd and final part of eliminating the race to free a thread's stack.
MFC after: 3 days
Extract the struct cdev pointer and the tty device from inside rather than
incorrectly casting the 'struct cdev *' pointer to a 'dev_t' int. Not
that this was particularly important since it was only used for reading
vmcore files.
- Make some minor rearrangements in the introduction.
- Mention the problem with argument reduction on i386.
- Add recently-implemented functions to the table.
- Un-document the error bounds that only apply to the old 4BSD math
library, and fill in the correct values where I know them. No
attempt has been made to document bounds lower than 1 ulp, although
smaller bounds are usually achievable in round-to-nearest mode.
Requested by: bde
o Remove unneeded sys/types.h and netinet/in.h from the synopsis and
the example.
o We do have struct in_addr in arpa/inet.h, so no need for netinet/in.h.
o Mention where AF_* constants defined are.
Educated by: bde
- Add a comment noting that the ru_[us]times values being read aren't
actually valid and need to be computed from the raw values.
Submitted by: many (1)
After some discussion the best option seems to be to signal the thread's
death from within the kernel. This requires that thr_exit() take an
argument.
Discussed with: davidxu, deischen, marcel
MFC after: 3 days
/lib/{libm,libreadline}
/usr/lib/{libhistory,libopie,libpcap}
in preparation for doing the same thing to RELENG_5. HUGE amounts of
help for determining what to bump provided by kris.
Discussed on: freebsd-current
Approved by: re (not required for commit but something like this should be)
the signal mask and pending signals of the calling thread. These
are stored in userland in libpthread.
There is a small race condition in this patch which could cause
problems if a signal arrives after setting the (kernel) signal
mask and before exec'ing. The thread's set of pending signals
also are not yet installed in the exec'd process. Both of these
will be corrected with the addition of a special syscall.
Reported & Tested by: Joost Bekkers <joost at jodocus dot org>
Reviewed by: julian, davidxu
1. Install man files and links for the lwres library.
2. Fix the path in various files to say /etc/namedb/ instead of just /etc.
3. Correctly install the conf file man pages for named and rndc.
to match how similar syntax is used in the ports system. Thanks to kris
for pointing out my mistake here.
Install the lwres library unless the user defines NO_BIND, or the new
knob, NO_BIND_LIBS_LWRES. There is at least one potential customer
for this library in the wings. Thanks to nectar for the reminder.
but have a knob (WANT_BIND_LIBS) to build and install them in /usr/lib
and /usr/include. Rumors are that this may be useful at a later point,
let's see.
What this really means is that all BIND libraries are now internal to
buildworld (by default, unless WANT_BIND_LIBS is defined), and linked
statically into various BIND executables.
While here, removed redundant -I's from CFLAGS in lib/bind makefiles.
Sponsored by: des
OK'ed by: dougb
__isnan() and __isnanf() must remain in libc for hysterical raisins.
On the other hand, __isnanl() must live in libm because libm uses it
internally and can't depend on older versions of libc to provide it.
Fortunately, we don't need __isnanl() in both libraries.
Prodded by: ale
PR: 71698
MT5 candidate
specified mutex is invalid. In spec parlance 'MAY FAIL' means it's
up to the implementor. So, remove the check for NULL pointers for two
reasons:
1. A mutex may be invalid without necessarily being NULL.
2. If the pointer to the mutex is NULL core-dumping in the
vicinity of the problem is much much much better than failing
in some other part of the code (especially when the application
doesn't check the return value of the function that you oh so
helpfully set to EINVAL).
full KSE support still have -lpthread as an alias for -lc_r. The only
thing that's different is the name of the knob that turns it off.
Pointed out by: ru@
POSIX threads libraries are not available. Add crypto support if
the crypto libraries are available. Build dnssec-{keygen,signzone}
if crypto is available.
Submitted by: (in part) dougb@
1. The correct cutoff for large uid/gid handling is 1<<18, not 1<<20.
2. Limit the uid/gid in the 'x' extension header (where numeric extensions
are not permitted) to 1<<18, but use the correct value in the regular
header (where numeric extensions are permitted).
Thanks to: Dan Nelson
MFC after: 3 days
caused by refering broken (uninitialized?) pointer which is retrieved
from __bt_new() (and from mpool_new()).
I don't know why this linp[0] is read before stored because this
should be controlled by .lower and .upper member of PAGE structure
which are correctly initialized.
But this workaround fixes the problem on my environment and this
module has #ifdef PURIFY option which initializes new and reused
memory from mpool by memset(p, 0xff, size) like as I did.
Please feel free to fix the real bug instead of my workaround.
transaction id from the request, this is useful for debugging.
Fix the autoh_freeall(3) function to properly free the array of
auto handles. Before it was freeing individual members of the list
OK, however it was then advancing the pointer and freeing the wrong
data for the whole list.
multibyte character support:
- In CHadd(), avoid writing past the end of the character set bitmap when
the opposite-case counterpart of wide characters with values less than
NC have values greater than or equal to NC.
- In CHaddtype(), fix a braino that caused alphabetic characters to be
added to all character classes! (but only with REG_ICASE)
PR: 71367
but with slightly cleaned up interfaces.
The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.
The KSE (or td_sched) structure is now allocated per thread and has no
allocation code of its own.
Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.
Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.
The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.
A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.
Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.
Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by: scottl, peter
MFC after: 1 week
denote a directory. Unfortunately, in the presence of GNU or POSIX
extensions, this code was checking the truncated filename stored in the
regular header rather than the full filename stored in the extended
attribute. As a result, long filenames with '/' in just the right
position would trigger this check and be erroneously marked as
directories. Move the check so it only considers the full filename.
Note: the check can't simply be disabled for archives that contain
these extensions because there are some very broken archivers out
there.
Thanks to: Will Froning
MFC after: 3 days
since otherwise the initial seek offset will contain the directory
offset of the filesystem block that contained its directory entry.
This bug was mostly harmless because typically the directory is
less than one filesystem block in size so the offset would be zero.
It did however generally break loading a kernel from the (large)
kernel compile directory.
Also reset the seek pointer when a new inode is opened in read_inode(),
though this is not actually necessary now because all callers set
it afterwards.
By using r8 instead of r14 to do the swap, we put the dst argument
in the return register. Since bcopy() doesn't clobber r8, we don't
have to do anything else. This fixes ports/textproc/aspell.
documenting the obsoleteness of the msync(2) syscall and its single
remaining purpose.
PR: 70916
Submitted by: Radim Kolar <hsn@netmag.cz>
MFC after: 3 days
.h files. This simplifies the Makefile here a bit and makes it behave
better in a couple of situations. While I'm here, clean up some comments
and try to improve the organization a bit.
Thanks to: Ruslan Ermilov (The Marvelous Makefile Guru)
mpool_open(3) - it is *not* really used for synchronization; in fact,
it is not used at all.
PR: 70929
Submitted by: Martin Kammerhofer <dada@sbox.tugraz.at>
MFC after: 3 days
This should provide a big performance boost for folks using NIS or LDAP.
MFC after: 3 days
Thanks to: Jun Kuriyama (for reminding me that this was still on my TODO list)
This closes a security hole. Otherwise, libarchive will happily
extract into directories to which it lacks write permissions by
resetting the permissions during the extract.
Thanks to: Kris Kennaway
_mcount() stub when profiling is enabled. Emit this code sequence
for assembly routines as welli (MCOUNT definition in <machine/asm.h>.
We do not pass the GOT entry however as the 4th argument, because it's
not used. The _mcount() stub calls __mcount(), which does the actual
work. Define _MCOUNT_DECL to define __mcount. We do not have an
implementation of mcount(), so we define MCOUNT as empty, but have a
weak alias to _mcount() in _mcount.S.
Note that the _mcount() stub in the kernel is slightly different from
the stub in userland. This is because we do not have to worry about
nested routines in the kernel.
64 bit systems, years roughly -2^31 through 2^31 can be represented in
time_t without any trouble. 32 bit time_t systems only range from
roughly 1902 through 2038. As a consequence, none of the date munging
code for all the various calendar tweaks before then is present. There
are other problems including the fact that there was no 'year zero' and
so on. So rather than get excited about trying to figure out when the
calendar jumped by two weeks etc, simply disallow negative (ie: prior to
1900) years.
This happens to have an important side effect. If you bzero a 'struct
tm', it corresponds to 'Jan 0, 1900, 00:00 GMT'. This happens to be
representable (after canonification) in 64 bit time_t space. Zero tm
structs are generally an error and mktime normally returns -1 for them.
Interestingly, it tries to canonify the 'jan 0' to 'dec 31, 1899', ie:
year -1. This conveniently trips the negative year test above, which
means we can trivially detect the null 'tm' struct.
This actually tripped up code at work. :-/ (Don't ask)
GP register, because it's clobbered for calls across load modules. The
previous commit inserted the call to _init_tls() between the call to
atexit() and the restoration of the GP register clobbered by it. Fix:
restore GP before we call _init_tls().
Pointy hat: dfr@
19 column positions wide in the first line and 20 in the rest of the lines.
This fixes the example to provide the correct output.
PR: 53454
Noticed by: Kuang-che Wu <kcwu@kcwu.homeip.net>
Submitted by: Marc Silver <marcs@draenor.org>
Approved by: re (scottl)
simple errx() function.
Improve behavior when bzlib/zlib are missing by detecting and
issuing an error message on attempts to read gzip/bzip2 compressed
archives.
a knob to force process scope threads. If the environment variable
LIBPTHREAD_PROCESS_SCOPE is set, force all threads to be process
scope threads regardless of how the application creates them. If
LIBPTHREAD_SYSTEM_SCOPE is set (forcing system scope threads), it
overrides LIBPTHREAD_PROCESS_SCOPE.
$ # To force system scope threads
$ LIBPTHREAD_SYSTEM_SCOPE=anything threaded_app
$ # To force process scope threads
$ LIBPTHREAD_PROCESS_SCOPE=anything threaded_app
if it is true.
2.Add thread_db api td_thr_tls_get_addr to get tls address, the real code
is commented out util tls patch is committed.
Reviewed by: deischen
and close it) and "finish" (destroy the object) functions. For backwards
compat and simplicity, have "finish" invoke "close" transparently if needed.
This allows clients to close the archive and check end-of-operation
statistics before destroying the object.
LIBPTHREAD_SYSTEM_SCOPE in the environment.
You can still force libpthread to be built in strictly 1:1 by
adding -DSYSTEM_SCOPE_ONLY to CFLAGS. This is kept for archs
that don't yet support M:N mode.
Requested by: rwatson
Reviewed by: davidxu
is present for FreeBSD. If you "make distfile" on FreeBSD, you will
soon have a tar.gz file suitable for deploying to other systems
(complete with the expected "configure" script, etc). This latter
relies (at least for now) on the GNU auto??? tools. (I like autoconf
okay, but someday I hope to write a custom Makefile.in and dispense
with automake, which is somewhat odious.)
As part of this, I've cleaned up some of the conditional
compilation options, added make-foo to construct archive.h dynamically
(it now contains some version constants), and added some useful
informational files.
mode bits when setting permissions from ACL data.
Thanks to: David Gilbert for first reporting this and
Jimmy Olgeni for noticing that it only occurred on
ACL-enabled filesystems.
of releases. The -DNOCRYPT build option still exists for anyone who
really wants to build non-cryptographic binaries, but the "crypto"
release distribution is now part of "base", and anyone installing from a
release will get cryptographic binaries.
Approved by: re (scottl), markm
Discussed on: freebsd-current, in late April 2004
the chunk will never be added to the list in that case. Use type mbr
for GPT nested MBRs and use type part for any partition we don't know
or care about. Since the subtype is 0, this should not cause confusion.
libc. The externally-visible effect of this is to add __isnanl() to
libm, which means that libm.so.2 can once again link against libc.so.4
when LD_BIND_NOW is set. This was broken by the addition of fdiml(),
which calls __isnanl().
functions. Basically, the ip_next() function was used to get the PPTP and
Skinny headers when tcp_next() should have been used instead. Symptoms of
this included a segfault in natd when trying to process a PPTP or Skinny
packet.
Approved by: des
example. The externs haven't been needed in about 10 years, so
there's no reason to have them other than for hysterical raisins. And
the California Rasins haven't been around for a long time...
o In the rwlock code: move a duplicated check inside an if..else to after
the if...else clause.
o When initializing a static rwlock move the initialization check
inside the lock.
o In thr_setschedparam.c: When breaking out of the trylock...retry if busy
loop make sure to reset the mtx pointer to null if the mutex is nolonger
in a queue.
code on the existence of the relevant libraries (actually,
the existence of the include files).
This will allow the library to be easily ported to systems
that don't have these libraries. (Of course, it also means
that clients using the library on such systems will not be
able to take advantage of the automatic compression format
detection.)
to describe the 4.4BSD extension of accepting arguments outside the range
of unsigned char. This gives us freedom to remove this extension when we
remove the <rune.h> interface in FreeBSD 6.
These convert plain ASCII characters in-line, making them only slightly
slower than the single-byte ("NONE" encoding) version when processing
ASCII strings.
in the regular ustar header that are overridden by the pax
extended attributes. As a result, it makes perfect sense to
use numeric extensions in the regular ustar header so that readers
that don't understand pax extensions but do understand some other
extensions can still get useful information out of it.
This is especially important for filesizes, as the failure to
read a file size correctly can get the reader out of sync.
This commit introduces a "non-strict" option into the internal
function to format a ustar header. In non-strict mode, the formatter
will use longer octal values (overwriting terminators) or binary
("base-256") values as needed to ensure that large file sizes,
negative mtimes, etc, have the correct values stored in the regular
ustar header.
archive_version: Returns a text string, e.g., "libarchive 1.00.000"
archive_api_version: Returns the SHLIB major version
archive_api_feature: Returns a feature number useful for answering
questions such as "Is this recent enough to do XXX?"
The last two also have macros defined in archive.h, so you can compare
the compile-time and run-time environments. (In particular, you can
compare ARCHIVE_API_VERSION to archive_api_version() to detect library
version mismatches.)
With these in hand, it will soon be time to turn on the
shared-library version of libarchive... stay tuned.
Thanks to: David O'Brien for pointing this out.
Also, add in a few additional portability tweaks and make a few
more things conditional on features (HAVE_XXXX macros) rather
than platform.
In particular, this means we can now correctly read gtar archives that
contain timestamps prior to the start of the Epoch.
Also, make the code in this area more portable. ANSI C99 headers are
not yet ubiquitous (for example, FreeBSD 4 still lacks them), so be
prepared for systems that don't have the INT64_MAX, INT64_MIN, and
UINT64_MAX macros. This version still requires int64_t and uint64_t be
defined (which can be done in archive_platform.h if necessary), but
doesn't require them to be exactly 64 bits.
convenient when the source string isn't null-terminated.
Implement the other conversion functions (mbstowcs(), mbsrtowcs(), wcstombs(),
wcsrtombs()) in terms of these new functions.
for statfs(2). This is false, if the pathname specified
is a regular file, then the information for the file
system that the file lives on will be returned.
Approved by: bmilekic (mentor)
gcc is using. This fixes devstat consumers (like vmstat, iostat,
systat) so they don't print crazy zillion digit numbers for
disk transfers and bandwidth.
According to gcc, long doubles are 64-bits, rather than 128 bits
like the SVR4 ABI spec wants them to be.. Note that MacOSX also treats
long doubles as 64-bits, and not 128 bits, so we are in good company.
Reviewed by: das
Approved by: grehan
- It was added to libc instead of libm. Hopefully no programs rely
on this mistake.
- It didn't work properly on large long doubles because its argument
was converted to type double, resulting in undefined behavior.
notably, this restores some of the contents in thread_db.h as well
as David Xu's copyright notice. This also fixes the includes in
the MD libpthread files which Scott tried to provide a quick fix
for.
Pointy hat: marcel