For copies shorter than 512 bytes, the data is copied using plain
ld/std instructions.
For 512 bytes or more, the copy is done in 3 phases:
  Phase 1: copy from the src buffer until it's aligned at a 16-byte boundary
  Phase 2: copy as many aligned 64-byte blocks from the src buffer as possible
  Phase 3: copy the remaining data, if any
In phase 2, this code uses VSX instructions when available. Otherwise,
it uses ldx/stdx.
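The phase structure described above can be sketched in portable C. This is
only an illustration, not the committed assembly: the name phased_copy and
the helper copy_bytes are hypothetical, and plain byte loops stand in for
the ld/std, ldx/stdx and VSX instruction sequences.

```c
#include <stddef.h>
#include <stdint.h>

#define SMALL_COPY_LIMIT 512	/* below this, take the simple path */
#define SRC_ALIGN        16	/* phase 1 aligns src to this boundary */
#define BLOCK_SIZE       64	/* phase 2 copies blocks of this size */

/* Hypothetical helper: byte loop standing in for the real load/store code. */
static void
copy_bytes(unsigned char *d, const unsigned char *s, size_t n)
{
	while (n-- > 0)
		*d++ = *s++;
}

/* Hypothetical name; shows the control flow only, not the real routine. */
void *
phased_copy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t head;

	if (len < SMALL_COPY_LIMIT) {
		/* Short copy: a single simple loop. */
		copy_bytes(d, s, len);
		return (dst);
	}

	/* Phase 1: copy until src sits on a 16-byte boundary (0..15 bytes). */
	head = (SRC_ALIGN - ((uintptr_t)s & (SRC_ALIGN - 1))) & (SRC_ALIGN - 1);
	copy_bytes(d, s, head);
	d += head;
	s += head;
	len -= head;

	/* Phase 2: copy as many 64-byte blocks as possible from aligned src. */
	while (len >= BLOCK_SIZE) {
		copy_bytes(d, s, BLOCK_SIZE);
		d += BLOCK_SIZE;
		s += BLOCK_SIZE;
		len -= BLOCK_SIZE;
	}

	/* Phase 3: copy whatever is left (fewer than 64 bytes). */
	copy_bytes(d, s, len);
	return (dst);
}
```

Because len is at least 512 on the phased path, the head of at most 15
bytes never exceeds the remaining length. Only the source is aligned here,
matching the description; the destination may stay unaligned.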
Submitted by: Luis Pires <lffpires_ruabrasil.org> (original version)
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D15118
Assembly optimization of strncpy for PowerPC64, using double words
instead of bytes to copy strings.
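A rough C sketch of a length-bounded double-word copy follows. It is not
the committed assembly: dword_strncpy and has_zero_byte are hypothetical
names, and the subtract-and-mask zero test is a well-known portable
stand-in, since the message does not say how the assembly detects the
terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES	UINT64_C(0x0101010101010101)
#define HIGHS	UINT64_C(0x8080808080808080)

/* Nonzero if any of the 8 bytes in w is zero (portable stand-in test). */
static int
has_zero_byte(uint64_t w)
{
	return (((w - ONES) & ~w & HIGHS) != 0);
}

/* Hypothetical name; same contract as strncpy, including zero padding. */
char *
dword_strncpy(char *dst, const char *src, size_t n)
{
	char *d = dst;

	/* Copy bytes until src is 8-byte aligned, the string ends, or n runs out. */
	while (n > 0 && ((uintptr_t)src & 7) != 0 && *src != '\0') {
		*d++ = *src++;
		n--;
	}

	/*
	 * Copy whole double words while src is aligned, at least 8 bytes
	 * remain, and the word holds no terminator. The aligned load may
	 * read past the terminator inside that word; assembly can rely on
	 * this, but it is not strictly conforming C.
	 */
	while (n >= 8 && ((uintptr_t)src & 7) == 0) {
		uint64_t w;

		memcpy(&w, src, sizeof(w));
		if (has_zero_byte(w))
			break;
		memcpy(d, &w, sizeof(w));
		d += 8;
		src += 8;
		n -= 8;
	}

	/* Copy the remaining string bytes, then pad the rest with zeros. */
	while (n > 0 && *src != '\0') {
		*d++ = *src++;
		n--;
	}
	memset(d, 0, n);
	return (dst);
}
```

As with strncpy, the result is not NUL-terminated when the source is at
least n bytes long.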
Submitted by: Leonardo Bianconi <leonardo.bianconi_eldorado.org.br> (original version)
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D15369
Assembly optimization of strcpy for PowerPC64, using double words
instead of bytes to copy strings.
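The idea of copying by double words can be illustrated in C as below. This
is a sketch under stated assumptions rather than the committed code:
dword_strcpy and word_has_nul are hypothetical names, and the zero-byte
test is a common portable trick, not necessarily what the assembly uses.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the 8 bytes in w is zero (portable stand-in test). */
static int
word_has_nul(uint64_t w)
{
	return (((w - UINT64_C(0x0101010101010101)) & ~w &
	    UINT64_C(0x8080808080808080)) != 0);
}

/* Hypothetical name; same contract as strcpy. */
char *
dword_strcpy(char *dst, const char *src)
{
	char *d = dst;

	/* Copy single bytes until src is 8-byte aligned or the string ends. */
	while (((uintptr_t)src & 7) != 0) {
		if ((*d++ = *src++) == '\0')
			return (dst);
	}

	/*
	 * Copy one double word per iteration while it holds no terminator.
	 * The aligned load may read past the terminator inside that word;
	 * assembly can rely on this, but it is not strictly conforming C.
	 */
	for (;;) {
		uint64_t w;

		memcpy(&w, src, sizeof(w));
		if (word_has_nul(w))
			break;
		memcpy(d, &w, sizeof(w));
		d += 8;
		src += 8;
	}

	/* Copy the last partial word byte by byte, terminator included. */
	while ((*d++ = *src++) != '\0')
		;
	return (dst);
}
```

Each loop iteration moves 8 bytes with one load and one store, which is
where the gain over a byte-at-a-time loop would come from on long strings.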
Submitted by: Leonardo Bianconi <leonardo.bianconi_eldorado.org.br> (original version)
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D15368
The rewrite of strcmp in assembly uses an instruction added in PowerISA
2.05, so it raises SIGILL on CPUs older than the POWER6, such as the PPC970
in the PowerMac G5. Revert it until we either move to clang+lld or retire
the in-tree binutils in favor of a newer binutils with IFUNC support,
whichever comes first.
Summary:
Optimize strcmp for powerpc64.
Data is loaded by double words and the cmpb instruction is used to find '\0'.
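The approach can be sketched in C by emulating cmpb, which sets each result
byte to 0xff where the corresponding bytes of its two operands are equal
and to 0x00 otherwise. The names cmpb_emul and dword_strcmp are
hypothetical, and the alignment handling is a simplification assumed for
this sketch: the double-word path runs only when both strings are 8-byte
aligned.

```c
#include <stdint.h>
#include <string.h>

/* C emulation of cmpb: 0xff in every byte position where a and b match. */
static uint64_t
cmpb_emul(uint64_t a, uint64_t b)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		uint64_t mask = UINT64_C(0xff) << (8 * i);

		if ((a & mask) == (b & mask))
			r |= mask;
	}
	return (r);
}

/* Hypothetical name; same contract as strcmp. */
int
dword_strcmp(const char *s1, const char *s2)
{
	/*
	 * Double-word path, taken only while both pointers are 8-byte
	 * aligned (a simplification made for this sketch). The loads may
	 * read past the terminator inside an aligned word, which is not
	 * strictly conforming C.
	 */
	while ((((uintptr_t)s1 | (uintptr_t)s2) & 7) == 0) {
		uint64_t a, b;

		memcpy(&a, s1, sizeof(a));
		memcpy(&b, s2, sizeof(b));
		/* Stop on a mismatch or when cmpb against 0 finds a '\0'. */
		if (a != b || cmpb_emul(a, 0) != 0)
			break;
		s1 += 8;
		s2 += 8;
	}

	/* Resolve the final (or unaligned) part byte by byte. */
	while (*s1 != '\0' && *s1 == *s2) {
		s1++;
		s2++;
	}
	return ((unsigned char)*s1 - (unsigned char)*s2);
}
```

Falling back to a byte loop for the last word keeps the result independent
of byte order, at the cost of rescanning up to 8 bytes.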
Performance gain of the optimized version over the current one:
String size (bytes)    Gain rate
<=8                    0.59%
<=16                   1.92%
32                     3.02%
64                     5.60%
128                    10.16%
256                    18.05%
512                    30.18%
1024                   42.82%
Submitted by: alexandre.yamashita_eldorado.org.br,
leonardo.bianconi_eldorado.org.br
Differential Revision: https://reviews.freebsd.org/D15220