opnsense-src/sys/amd64/include
Bruce Evans 1a5735873e On amd64, declare sse2_pagezero() and start using it again, but only
for zeroing pages in idle where nontemporal writes are clearly best.
This is almost a no-op since zeroing in idle works does nothing good
and is off by default.  Fix END() statement forgotten in previous
commit.

Align the loop in sse2_pagezero().  Since it writes to main memory,
the loop doesn't have to be very carefully written to keep up.
Unrolling it was considered useless or harmful and was not done on
i386, but that was too careless.

Timing for i386: the loop was not unrolled at all, and moved only 4
bytes/iteration.  So on a 2GHz CPU, it needed to run at 2 cycles/
iteration to keep up with a memory speed of just 4GB/sec.  But when
it crossed a 16-byte boundary, on old CPUs it ran at 3 cycles/
iteration so it gave a maximum speed of 2.67GB/sec and couldn't even
keep up with PC3200 memory.  Fix the alignment so that it keep up with
4GB/sec memory, and unroll once to get nearer to 8GB/sec.  Further
unrolling might be useless or harmful since it would prevent the loop
fitting in 16-bytes.  My test system with an old CPU and old DDR1 only
needed 5+ GB/sec.  My test system with a new CPU and DDR3 doesn't need
any changes to keep up ~16GB/sec.

Timing for amd64: with 8-byte accesses and newer faster CPUs it is
easy to reach 16GB/sec but not so easy to go much faster.  The
alignment doesn't matter much if the CPU is not very old.  The loop
was already unrolled 4 times, but needs 32 bytes and uses a fancy
method that doesn't work for 2-way unrolling in 16 bytes.  Just
align it to 32-bytes.
2016-08-29 13:07:21 +00:00
..
pc Add more UEFI/e820 memory types from latest specifications. 2016-07-24 09:15:11 +00:00
xen x86/xen: Consolidate xen-os.h in a single place 2015-10-21 10:04:35 +00:00
_align.h Merge amd64/i386 _align.h by aligning on the size of register_t (copied 2010-11-26 10:59:20 +00:00
_bus.h
_inttypes.h Copy powerpc/include/_inttypes.h to x86 and replace i386/amd64/pc98 2011-01-08 18:09:48 +00:00
_limits.h Copy amd64 _limits.h to x86 and merge with i386 _limits.h. Replace 2012-02-28 18:24:28 +00:00
_stdint.h Copy amd64 _stdint.h to x86 and merge with i386 _stdint.h. Replace 2012-02-28 18:38:33 +00:00
_types.h Copy amd64 _types.h to x86 and merge with i386 _types.h. Replace existing 2012-02-28 18:15:28 +00:00
acpica_machdep.h Merge acpica_machdep.h for amd64 and i386 and move to x86. In fact, these 2013-08-13 22:05:10 +00:00
apm_bios.h Move identical copies of apm_bios.h to sys/x86/include, replace them with 2010-11-11 19:36:21 +00:00
asm.h Revert r274772: it is not valid on MIPS 2014-11-25 03:50:31 +00:00
asmacros.h Extend earlier addition of stack frames to most of support.S. This makes 2014-11-13 22:11:44 +00:00
atomic.h atomic: Add testandclear on i386/amd64 2016-05-16 07:19:33 +00:00
bus.h bhyve import part 2 of 2, guest kernel changes. 2011-05-14 18:37:24 +00:00
bus_dma.h
clock.h xen: implement an early timer for Xen PVH 2014-03-11 10:20:42 +00:00
counter.h Replace a number of conflations of mp_ncpus and mp_maxid with either 2016-07-06 14:09:49 +00:00
cpu.h amd64/i386: introduce APIC hooks for different APIC implementations. 2014-06-16 08:43:03 +00:00
cpufunc.h For amd64 non-PCID machines, and for i386 machines with support for 2015-12-03 11:14:14 +00:00
cputypes.h Move shared variables from {amd64,i386}/initcpu.c to x86/identcpu.c. 2015-12-23 21:41:42 +00:00
db_machdep.h
dump.h Factor out duplicated code from dumpsys() on each architecture into generic 2015-01-07 01:01:39 +00:00
elf.h Convert machine/elf.h, machine/frame.h, machine/sigframe.h, 2013-02-20 17:39:52 +00:00
endian.h Copy amd64 endian.h to x86 and merge with i386 endian.h. Replace 2012-02-28 19:39:54 +00:00
exec.h
fdt.h Add basic support for FDT to i386 & amd64. This change includes: 2013-05-21 03:05:49 +00:00
float.h Copy amd64 float.h to x86 and merge with i386 float.h. Replace 2012-03-04 14:00:32 +00:00
floatingpoint.h
fpu.h Create a separate structure for per-CPU state saved across suspend and 2014-09-06 15:23:28 +00:00
frame.h Convert machine/elf.h, machine/frame.h, machine/sigframe.h, 2013-02-20 17:39:52 +00:00
gdb_machdep.h
ieeefp.h People porting FreeBSD to new architectures ought not have to 2011-10-21 06:41:46 +00:00
in_cksum.h Rationalize BSD license on sys/*/include/in_cksum.h 2015-08-05 19:05:12 +00:00
intr_machdep.h Fix build for !SMP kernels after the Xen MSIX workaround. 2016-08-22 21:23:17 +00:00
iodev.h MFC r207329, r208716: 2010-06-01 21:19:58 +00:00
kdb.h
limits.h
md_var.h On amd64, declare sse2_pagezero() and start using it again, but only 2016-08-29 13:07:21 +00:00
memdev.h Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set(). 2011-01-17 22:58:28 +00:00
metadata.h Move amd64 metadata.h to x86 and share with i386 2016-01-07 19:47:26 +00:00
minidump.h amd64: introduce minidump version 2 2010-11-11 18:35:28 +00:00
mp_watchdog.h
nexusvar.h
npx.h Move userland bits of i386 npx.h and amd64 fpu.h to x86 fpu.h. 2012-03-16 20:24:30 +00:00
ofw_machdep.h Add basic support for FDT to i386 & amd64. This change includes: 2013-05-21 03:05:49 +00:00
param.h Enable DEVICE_NUMA with up to 8 domains by default on amd64. 2016-04-12 21:23:44 +00:00
pcb.h Export various helper variables describing the layout and size of 2015-11-12 22:00:59 +00:00
pci_cfgreg.h Move {amd64,i386}/pci/pci_bus.c and {amd64,i386}/include/pci_cfgreg.h to 2011-06-22 21:04:13 +00:00
pcpu.h Rewrite amd64 PCID implementation to follow an algorithm described in 2015-05-09 19:11:01 +00:00
pmap.h Add locking annotations to amd64 struct md_page members. 2016-05-10 09:58:51 +00:00
pmc_mdep.h Use single instance of the identical INKERNEL() and PMC_IN_KERNEL() 2015-07-02 14:37:21 +00:00
ppireg.h
proc.h Eliminate pvh_global_lock from the amd64 pmap. 2016-05-14 23:35:11 +00:00
profile.h Use intr_disable() and intr_restore() instead of frobbing the flags register 2010-10-25 15:28:03 +00:00
psl.h Copy i386 psl.h to x86 and replace amd64/i386/pc98 psl.h with stubs. 2012-03-19 21:29:57 +00:00
ptrace.h Copy amd64 ptrace.h to x86 and merge with i386 ptrace.h. Replace 2012-03-04 20:24:28 +00:00
pvclock.h Generalized parts of the XEN timer code into a generic pvclock 2015-02-04 08:26:43 +00:00
reg.h Copy i386 reg.h to x86 and merge with amd64 reg.h. Replace i386/amd64/pc98 2012-03-18 19:06:38 +00:00
reloc.h
resource.h Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge 2014-02-12 04:30:37 +00:00
runq.h
segments.h Hide struct pcb definition by #ifdef __amd64__ braces. If cc -m32 2013-11-26 19:38:42 +00:00
setjmp.h Copy amd64 setjmp.h to x86 and replace amd64/i386/pc98 setjmp.h with stubs. 2012-02-28 22:17:52 +00:00
sf_buf.h Merge all MD sf_buf allocators into one MI, residing in kern/subr_sfbuf.c 2014-08-05 09:44:10 +00:00
sigframe.h Convert machine/elf.h, machine/frame.h, machine/sigframe.h, 2013-02-20 17:39:52 +00:00
signal.h Convert machine/elf.h, machine/frame.h, machine/sigframe.h, 2013-02-20 17:39:52 +00:00
smp.h Merge common parts of i386 and amd64 md_var.h and smp.h into 2015-12-07 17:41:20 +00:00
specialreg.h Copy i386 specialreg.h to x86 and merge with amd64 specialreg.h. Replace 2012-03-19 21:34:11 +00:00
stack.h Merge stack(9) implementations for i386 and amd64 under x86/. 2015-09-11 03:24:07 +00:00
stdarg.h Copy amd64 stdarg.h to x86 and replace amd64/i386/pc98 stdarg.h with stubs. 2012-02-28 22:30:58 +00:00
sysarch.h Copy amd64 sysarch.h to x86 and merge with i386 sysarch.h. Replace 2012-03-19 21:57:31 +00:00
timerreg.h
trap.h Copy amd64 trap.h to x86 and replace amd64/i386/pc98 trap.h with stubs. 2012-03-04 14:12:57 +00:00
tss.h
ucontext.h Convert machine/elf.h, machine/frame.h, machine/sigframe.h, 2013-02-20 17:39:52 +00:00
varargs.h
vdso.h Implement mechanism to export some kernel timekeeping data to 2012-06-22 07:06:40 +00:00
vm.h Reassign copyright statements on several files from Advanced 2015-04-23 14:22:20 +00:00
vmm.h sys/amd64: Small spelling fixes. 2016-05-03 22:13:04 +00:00
vmm_dev.h Restructure memory allocation in bhyve to support "devmem". 2015-06-18 06:00:17 +00:00
vmm_instruction_emul.h Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup(). 2015-05-06 16:25:20 +00:00
vmparam.h Retire VM_FREEPOOL_CACHE as the next step in eliminating PG_CACHE pages. 2015-06-08 04:59:32 +00:00