Commit graph

176 commits

Author SHA1 Message Date
Konstantin Belousov
661722e76f Move sysinit and sysuninit linker sets in the data (writeable) section.
Both sets are sorted in place, and with the introduction of read-only
permissions on the amd64 kernel text, the sorting override depended on
CR0.WP turned off.  Make it correct by moving the sets into writeable
part of the KVA, also fixing boot on machines where hand-off from BIOS
to OS occurs with CR0.WP set.

Based on submission by:	Peter Lei <peter.lei@ieee.org>
MFC after:	1 week
2018-03-21 10:26:39 +00:00
Colin Percival
95e678f96e Use the TSLOG framework to record SYSINIT entry/exit timestamps. 2017-12-31 09:23:02 +00:00
Pedro F. Giffuni
df57947f08 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
Conrad Meyer
fe0593d72a Add SI_SUB_TASKQ after SI_SUB_INTR and move taskqueue initialization there for EARLY_AP_STARTUP
This fixes a regression accidentally introduced in r322588, due to an
interaction with EARLY_AP_STARTUP.

Reviewed by:	bdrewery@, jhb@
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12053
2017-08-16 21:42:27 +00:00
Ian Lepore
2db14f97de Add config_intrhook_oneshot(): schedule an intrhook function and unregister
it automatically after it runs.

The config_intrhook mechanism allows a driver to stall the boot process
until device(s) required for booting are available, by not allowing system
inits to proceed until all intrhook functions have been unregistered.
Virtually all existing code simply unregisters from within the hook function
when it gets called.

This new function makes that common usage more convenient. Instead of
allocating and filling in a struct, passing it to a function that might (in
theory) fail, and checking the return code, now a driver can simply call
this cannot-fail routine, passing just the intrhook function and its arg.

Differential Revision:	https://reviews.freebsd.org/D11963
2017-08-13 18:10:24 +00:00
Konstantin Belousov
97222e17ab Fix TUNABLE_UINT64() on 32bit architectures.
The macro is not used in the tree.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-27 06:37:03 +00:00
Bjoern A. Zeeb
3f58662dd9 The pr_destroy field does not allow us to run the teardown code in a
specific order.  VNET_SYSUNINITs however are doing exactly that.
Thus remove the VIMAGE conditional field from the domain(9) protosw
structure and replace it with VNET_SYSUNINITs.
This also allows us to change some order and to make the teardown functions
file local static.
Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use
internally.

Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g.,
for pfil consumers (firewalls), partially for this commit and for others
to come.

Reviewed by:		gnn, tuexen (sctp), jhb (kernel.h)
Obtained from:		projects/vnet
MFC after:		2 weeks
X-MFC:			do not remove pr_destroy
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6652
2016-06-01 10:14:04 +00:00
John Baldwin
fdce57a042 Add an EARLY_AP_STARTUP option to start APs earlier during boot.
Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.

This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed).  This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP.  It also permits all CPUs to be available for
handling interrupts before any devices are probed.

This last feature fixes a problem on with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot.  Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.

However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system.  In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU.  Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.

Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code.  This removes the need to treat the single-CPU boot environment
as a special case.

As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP).  This will allow the option to be turned off
if need be during initial testing.  I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0.  Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.

These changes have only been tested on x86.  Other platform maintainers
are encouraged to port their architectures over as well.  The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).

PR:		kern/199321
Reviewed by:	markj, gnn, kib
Sponsored by:	Netflix
2016-05-14 18:22:52 +00:00
Warner Losh
15be49f5a1 Create wrappers for uint64_t and int64_t for the tunables. While not
strictly necessary, it is more convenient.
2016-04-15 03:09:55 +00:00
Edward Tomasz Napierala
55811a0385 Remove comment obsoleted by r289056.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-10-08 21:52:20 +00:00
Edward Tomasz Napierala
0ad53012ed Remove unused SI_SUB_* #defines.
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3708
2015-10-08 21:28:06 +00:00
Mark Murray
d1b06863fb Huge cleanup of random(4) code.
* GENERAL
- Update copyright.
- Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set
  neither to ON, which means we want Fortuna
- If there is no 'device random' in the kernel, there will be NO
  random(4) device in the kernel, and the KERN_ARND sysctl will
  return nothing. With RANDOM_DUMMY there will be a random(4) that
  always blocks.
- Repair kern.arandom (KERN_ARND sysctl). The old version went
  through arc4random(9) and was a bit weird.
- Adjust arc4random stirring a bit - the existing code looks a little
  suspect.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Redo read_random(9) so as to duplicate random(4)'s read internals.
  This makes it a first-class citizen rather than a hack.
- Move stuff out of locked regions when it does not need to be
  there.
- Trim RANDOM_DEBUG printfs. Some are excess to requirement, some
  behind boot verbose.
- Use SYSINIT to sequence the startup.
- Fix init/deinit sysctl stuff.
- Make relevant sysctls also tunables.
- Add different harvesting "styles" to allow for different requirements
  (direct, queue, fast).
- Add harvesting of FFS atime events. This needs to be checked for
  weighing down the FS code.
- Add harvesting of slab allocator events. This needs to be checked for
  weighing down the allocator code.
- Fix the random(9) manpage.
- Loadable modules are not present for now. These will be re-engineered
  when the dust settles.
- Use macros for locks.
- Fix comments.

* src/share/man/...
- Update the man pages.

* src/etc/...
- The startup/shutdown work is done in D2924.

* src/UPDATING
- Add UPDATING announcement.

* src/sys/dev/random/build.sh
- Add copyright.
- Add libz for unit tests.

* src/sys/dev/random/dummy.c
- Remove; no longer needed. Functionality incorporated into randomdev.*.

* live_entropy_sources.c live_entropy_sources.h
- Remove; content moved.
- move content to randomdev.[ch] and optimise.

* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
- Remove; plugability is no longer used. Compile-time algorithm
  selection is the way to go.

* src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
- Add early (re)boot-time randomness caching.

* src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
- Remove; no longer needed.

* src/sys/dev/random/uint128.h
- Provide a fake uint128_t; if a real one ever arrived, we can use
  that instead. All that is needed here is N=0, N++, N==0, and some
  localised trickery is used to manufacture a 128-bit 0ULLL.

* src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
- Improve unit tests; previously the testing human needed clairvoyance;
  now the test will do a basic check of compressibility. Clairvoyant
  talent is still a good idea.
- This is still a long way off a proper unit test.

* src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
- Improve messy union to just uint128_t.
- Remove unneeded 'static struct fortuna_start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])

* src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
- Improve messy union to just uint128_t.
- Remove unneeded 'staic struct start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
- Fix some magic numbers elsewhere used as FAST and SLOW.

Differential Revision: https://reviews.freebsd.org/D2025
Reviewed by: vsevolod,delphij,rwatson,trasz,jmg
Approved by: so (delphij)
2015-06-30 17:00:45 +00:00
Robert Watson
089f02b006 Spell raccdt in a more conventional way in a comment.
MFC after:	3 days
2014-05-06 10:53:51 +00:00
Robert Watson
0e452337c5 Garbage collect two more unused sysinit subsystems: SI_SUB_KVM_RSRC and
SI_SUB_CLISTS.

MFC after:	3 days
2014-05-05 21:46:10 +00:00
Robert Watson
a2496f6e01 Garbage collect mtxpool_lockbuilder, the mutex pool historically used
for lockmgr and sx interlocks, but unused since optimised versions of
those sleep locks were introduced.  This will save a (quite) small
amount of memory in all kernel configurations.  The sleep mutex pool is
retained as it is used for 'struct bio' and several other consumers.

Discussed with:	jhb
MFC after:	3 days
2014-05-02 07:57:40 +00:00
Justin T. Gibbs
76acc41fb7 Implement vector callback for PVHVM and unify event channel implementations
Re-structure Xen HVM support so that:
	- Xen is detected and hypercalls can be performed very
	  early in system startup.
	- Xen interrupt services are implemented using FreeBSD's native
	  interrupt delivery infrastructure.
	- the Xen interrupt service implementation is shared between PV
	  and HVM guests.
	- Xen interrupt handlers can optionally use a filter handler
	  in order to avoid the overhead of dispatch to an interrupt
	  thread.
	- interrupt load can be distributed among all available CPUs.
	- the overhead of accessing the emulated local and I/O apics
	  on HVM is removed for event channel port events.
	- a similar optimization can eventually, and fairly easily,
	  be used to optimize MSI.

Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure,
and misc Xen cleanups:

Sponsored by: Spectra Logic Corporation

Unification of PV & HVM interrupt infrastructure, bug fixes,
and misc Xen cleanups:

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D

sys/x86/x86/local_apic.c:
sys/amd64/include/apicvar.h:
sys/i386/include/apicvar.h:
sys/amd64/amd64/apic_vector.S:
sys/i386/i386/apic_vector.s:
sys/amd64/amd64/machdep.c:
sys/i386/i386/machdep.c:
sys/i386/xen/exception.s:
sys/x86/include/segments.h:
	Reserve IDT vector 0x93 for the Xen event channel upcall
	interrupt handler.  On Hypervisors that support the direct
	vector callback feature, we can request that this vector be
	called directly by an injected HVM interrupt event, instead
	of a simulated PCI interrupt on the Xen platform PCI device.
	This avoids all of the overhead of dealing with the emulated
	I/O APIC and local APIC.  It also means that the Hypervisor
	can inject these events on any CPU, allowing upcalls for
	different ports to be handled in parallel.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
	Map Xen per-vcpu area during AP startup.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
	Increase the FreeBSD IRQ vector table to include space
	for event channel interrupt sources.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
	Remove Xen HVM per-cpu variable data.  These fields are now
	allocated via the dynamic per-cpu scheme.  See xen_intr.c
	for details.

sys/amd64/include/xen/hypercall.h:
sys/dev/xen/blkback/blkback.c:
sys/i386/include/xen/xenvar.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/xen/gnttab.c:
	Prefer FreeBSD primatives to Linux ones in Xen support code.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
sys/dev/xen/balloon/balloon.c:
sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/console/xencons_ring.c:
sys/dev/xen/control/control.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/xenpci/xenpci.c:
sys/i386/i386/machdep.c:
sys/i386/include/pmap.h:
sys/i386/include/xen/xenfunc.h:
sys/i386/isa/npx.c:
sys/i386/xen/clock.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/mptable.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/xen_rtc.c:
sys/xen/evtchn/evtchn_dev.c:
sys/xen/features.c:
sys/xen/gnttab.c:
sys/xen/gnttab.h:
sys/xen/hvm.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstore.c:
sys/xen/xenstore/xenstore_dev.c:
sys/xen/xenstore/xenstorevar.h:
	Pull common Xen OS support functions/settings into xen/xen-os.h.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
	Remove constants, macros, and functions unused in FreeBSD's Xen
	support.

sys/xen/xen-os.h:
sys/i386/xen/xen_machdep.c:
sys/x86/xen/hvm.c:
	Introduce new functions xen_domain(), xen_pv_domain(), and
	xen_hvm_domain().  These are used in favor of #ifdefs so that
	FreeBSD can dynamically detect and adapt to the presence of
	a hypervisor.  The goal is to have an HVM optimized GENERIC,
	but more is necessary before this is possible.

sys/amd64/amd64/machdep.c:
sys/dev/xen/xenpci/xenpcivar.h:
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/sys/kernel.h:
	Refactor magic ioport, Hypercall table and Hypervisor shared
	information page setup, and move it to a dedicated HVM support
	module.

	HVM mode initialization is now triggered during the
	SI_SUB_HYPERVISOR phase of system startup.  This currently
	occurs just after the kernel VM is fully setup which is
	just enough infrastructure to allow the hypercall table
	and shared info page to be properly mapped.

sys/xen/hvm.h:
sys/x86/xen/hvm.c:
	Add definitions and a method for configuring Hypervisor event
	delievery via a direct vector callback.

sys/amd64/include/xen/xen-os.h:
sys/x86/xen/hvm.c:

sys/conf/files:
sys/conf/files.amd64:
sys/conf/files.i386:
	Adjust kernel build to reflect the refactoring of early
	Xen startup code and Xen interrupt services.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
sys/dev/xen/control/control.c:
sys/dev/xen/evtchn/evtchn_dev.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/xen/xenstore/xenstore.c:
sys/xen/evtchn/evtchn_dev.c:
sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c
	Adjust drivers to use new xen_intr_*() API.

sys/dev/xen/blkback/blkback.c:
	Since blkback defers all event handling to a taskqueue,
	convert this task queue to a "fast" taskqueue, and schedule
	it via an interrupt filter.  This avoids an unnecessary
	ithread context switch.

sys/xen/xenstore/xenstore.c:
	The xenstore driver is MPSAFE.  Indicate as much when
	registering its interrupt handler.

sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusvar.h:
	Remove unused event channel APIs.

sys/xen/evtchn.h:
	Remove all kernel Xen interrupt service API definitions
	from this file.  It is now only used for structure and
	ioctl definitions related to the event channel userland
	device driver.

	Update the definitions in this file to match those from
	NetBSD.  Implementing this interface will be necessary for
	Dom0 support.

sys/xen/evtchn/evtchnvar.h:
	Add a header file for implemenation internal APIs related
	to managing event channels event delivery.  This is used
	to allow, for example, the event channel userland device
	driver to access low-level routines that typical kernel
	consumers of event channel services should never access.

sys/xen/interface/event_channel.h:
sys/xen/xen_intr.h:
	Standardize on the evtchn_port_t type for referring to
	an event channel port id.  In order to prevent low-level
	event channel APIs from leaking to kernel consumers who
	should not have access to this data, the type is defined
	twice: Once in the Xen provided event_channel.h, and again
	in xen/xen_intr.h.  The double declaration is protected by
	__XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared
	twice within a given compilation unit.

sys/xen/xen_intr.h:
sys/xen/evtchn/evtchn.c:
sys/x86/xen/xen_intr.c:
sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/xenpci/xenpcivar.h:
	New implementation of Xen interrupt services.  This is
	similar in many respects to the i386 PV implementation with
	the exception that events for bound to event channel ports
	(i.e. not IPI, virtual IRQ, or physical IRQ) are further
	optimized to avoid mask/unmask operations that aren't
	necessary for these edge triggered events.

	Stubs exist for supporting physical IRQ binding, but will
	need additional work before this implementation can be
	fully shared between PV and HVM.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
sys/i386/xen/mp_machdep.c
sys/x86/xen/hvm.c:
	Add support for placing vcpu_info into an arbritary memory
	page instead of using HYPERVISOR_shared_info->vcpu_info.
	This allows the creation of domains with more than 32 vcpus.

sys/i386/i386/machdep.c:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/exception.s:
	Add support for new event channle implementation.
2013-08-29 19:52:18 +00:00
Andriy Gapon
785797c341 rename scheduler->swapper and SI_SUB_RUN_SCHEDULER->SI_SUB_LAST
Also directly call swapper() at the end of mi_startup instead of
relying on swapper being the last thing in sysinits order.

Rationale:

- "RUN_SCHEDULER" was misleading, scheduling already takes place at that stage
- "scheduler" was misleading, the function swaps in the swapped out processes
- another SYSINIT(SI_SUB_RUN_SCHEDULER, SI_ORDER_ANY) could never be
  invoked depending on its relative order with scheduler; this was not obvious
  and the bug actually used to exist

Reviewed by:	kib (ealier version)
MFC after:	14 days
2013-07-24 09:45:31 +00:00
John Baldwin
a8df530ddc Mark 'ticks', 'time_second', and 'time_uptime' as volatile to prevent the
compiler from caching their values in tight loops.

Reviewed by:	bde
MFC after:	1 week
2013-01-28 19:38:13 +00:00
Andre Oppermann
a8c9f6fd75 Remove splimp() comment from sysinit table and attribute SI_SUB_PROTO_BEGIN
and SI_SUB_PROTO_END to VNET related initializations.

MFC after:	3 days
2012-10-19 10:04:43 +00:00
John Baldwin
56d20d01b5 Replace a reference to the non-existent SI_ORDER_LAST in a comment with
SI_ORDER_ANY.

Submitted by:	Brandon Gooch  brandongooch yahoo com
2012-06-12 18:19:46 +00:00
Edward Tomasz Napierala
097055e26d Add racct. It's an API to keep per-process, per-jail, per-loginclass
and per-loginclass resource accounting information, to be used by the new
resource limits code.  It's connected to the build, but the code that
actually calls the new functions will come later.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-03-29 17:47:25 +00:00
Matthew D Fleming
c77715ef6c Mostly revert r219468, as I had misremembered the C standard regarding
the size of an extern array.

Keep one change from strncpy to strlcpy.
2011-03-11 18:56:55 +00:00
Matthew D Fleming
cd67ac41ae Use MAXPATHLEN rather than the size of an extern array when copying the
kernel name.  Also consistenly use strlcpy().

Suggested by:	Warner Losh
2011-03-10 22:56:00 +00:00
Alexander Motin
5b24354c79 Remove unexisted since r212541 timer1hz/timer2hz variables. 2010-11-10 16:42:36 +00:00
Alexander Motin
dbd55f3ff0 - Implement MI helper functions, dividing one or two timer interrupts with
arbitrary frequencies into hardclock(), statclock() and profclock() calls.
Same code with minor variations duplicated several times over the tree for
different timer drivers and architectures.
- Switch all x86 archs to new functions, simplifying the code and removing
extra logic from timer drivers. Other archs are also welcome.
2010-05-24 11:40:49 +00:00
Ruslan Ermilov
e64585bdc2 Random number generator initialization cleanup:
- Introduce new SI_SUB_RANDOM point in boot sequence to make it
clear from where one may start using random(9).  It should be as
early as possible, so place it just after SI_SUB_CPU where we
have some randomness on most platforms via get_cyclecount().

- Move stack protector initialization to be after SI_SUB_RANDOM
as before this point we have no randomness at all.  This fixes
stack protector to actually protect stack with some random guard
value instead of a well-known one.

Note that this patch doesn't try to address arc4random(9) issues.
With current code, it will be implicitly seeded by stack protector
and hence will get the same entropy as random(9).  It will be
securely reseeded once /dev/random is feeded by some entropy from
userland.

Submitted by:	Maxim Dounin <mdounin@mdounin.ru>
MFC after:	3 days
2009-10-20 16:36:51 +00:00
Robert Watson
d0728d7174 Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:

- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
  occur each time a network stack is instantiated and destroyed.  In the
  !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
  For the VIMAGE case, we instead use SYSINIT's to track their order and
  properties on registration, using them for each vnet when created/
  destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
  previously, as well as its dependency scheme: we now just use the
  SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
  they want init functions to be called for each virtual network stack
  rather than just once at boot, compiling down to DOMAIN_SET() in the
  non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
  of modinfo or DOMAIN_SET() for init/uninit events.  In some cases,
  convert modular components from using modevent to using sysinit (where
  appropriate).  In some cases, do minor rejuggling of SYSINIT ordering
  to make room for or better manage events.

Portions submitted by:	jhb (VNET_SYSINIT), bz (cleanup)
Discussed with:		jhb, bz, julian, zec
Reviewed by:		bz
Approved by:		re (VIMAGE blanket)
2009-07-23 20:46:49 +00:00
Robert Watson
5ee847d3ac Reimplement and/or implement vnet list locking by replacing a mostly
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use).  Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.

Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions.  Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.

Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.

Update various consumers of these KPIs based on whether they may sleep
or not.

Reviewed by:	bz
Approved by:	re (kib)
2009-07-19 14:20:53 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Marko Zec
29b02909eb Introduce a new virtualization container, provisionally named vprocg, to hold
virtualized instances of hostname and domainname, as well as a new top-level
virtualization struct vimage, which holds pointers to struct vnet and struct
vprocg.  Struct vprocg is likely to become replaced in the near future with
a new jail management API import.

As a consequence of this change, change struct ucred to point to a struct
vimage, instead of directly pointing to a vnet.

Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage
branch.

Permit kldload / kldunload operations to be executed only from the default
vimage context.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-05-08 14:11:06 +00:00
Marko Zec
bfe1aba468 Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized.  In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw).  Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module.  The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on.  Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first.  The framework will panic if it detects any unresolved
dependencies before completing system initialization.  Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run.  In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container.  IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-04-11 05:58:58 +00:00
Marko Zec
385195c062 Conditionally compile out V_ globals while instantiating the appropriate
container structures, depending on VIMAGE_GLOBALS compile time option.

Make VIMAGE_GLOBALS a new compile-time option, which by default will not
be defined, resulting in instatiations of global variables selected for
V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be
effectively compiled out.  Instantiate new global container structures
to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0,
vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.

Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_
macros resolve either to the original globals, or to fields inside
container structures, i.e. effectively

#ifdef VIMAGE_GLOBALS
#define V_rt_tables rt_tables
#else
#define V_rt_tables vnet_net_0._rt_tables
#endif

Update SYSCTL_V_*() macros to operate either on globals or on fields
inside container structs.

Extend the internal kldsym() lookups with the ability to resolve
selected fields inside the virtualization container structs.  This
applies only to the fields which are explicitly registered for kldsym()
visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently
this is done only in sys/net/if.c.

Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code,
and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in
turn result in proper code being generated depending on VIMAGE_GLOBALS.

De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c
which were prematurely V_irtualized by automated V_ prepending scripts
during earlier merging steps.  PF virtualization will be done
separately, most probably after next PF import.

Convert a few variable initializations at instantiation to
initialization in init functions, most notably in ipfw.  Also convert
TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in
initializer functions.

Discussed at:	devsummit Strassburg
Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-12-10 23:12:39 +00:00
Ed Schouten
040b1db930 Remove the now unused `lbolt' variable from the kernel.
We used to have a single wait channel inside the kernel which could be
used by threads that just wanted to sleep for some time (the next
second). The old TTY layer was the only piece of code that still used
lbolt, because I already removed the use of lbolt from the NFS clients
and the VFS syncer.

Approved by:	philip
2008-08-20 12:20:22 +00:00
Pawel Jakub Dawidek
7f41115ef6 Implement the following macros for completeness:
SYSCTL_QUAD()
	SYSCTL_ADD_QUAD()
	TUNABLE_QUAD()
	TUNABLE_QUAD_FETCH()

Now we can use 64bit tunables on 32bit systems.
2008-07-21 15:05:25 +00:00
Robert Watson
4f7d1876d5 Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables.  Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout().  A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after:	3 weeks
2008-07-05 13:10:10 +00:00
John Birrell
69b2c659c1 Add sysinit levels for DTrace. 2008-05-18 22:10:10 +00:00
Sam Leffler
00c71fb7c3 o add a mountroot event handler that fires when / is mounted; this information
was lost when root started being mounted by init
o remove SI_SUB_MOUNT_ROOT since it's no longer meaningful

MFC after:	2 weeks
2008-04-08 17:53:33 +00:00
Robert Watson
dd3af71f17 Remove trailing ';' from C_SYSINIT() macro definition, in keeping
with style(9) recommendation that macros not contain the
terminating ';', leaving that to the invoker.  All SYSINIT()
consumers must now provide a trailing ';'.

Unlike the change to remove the ';'s from callers, this change
shouldn't be MFC'd unless we don't mind requiring source changes
to third party modules that might still depend on SYSINIT()
providing its own ';'.
2008-03-16 11:01:32 +00:00
Robert Watson
55c3064e78 Add a new kernel startup event for DDB services, which will include DDB
output capture, scripting, and textdumps.
2007-12-25 18:36:43 +00:00
John Birrell
e3709a563c Remove _SOLARIS_C_SOURCE compatibility definitions. Unfortunately the
ZFS porting style didn't extend this, instead using a heap of additional
header files that don't get installed.

My intention had been to allow OpenSolaris external code to build on
FreeBSD out of the box (i.e. without a src tree).
2007-11-28 21:54:46 +00:00
Robert Watson
33d2bb9ca3 First in a series of changes to remove the now-unused Giant compatibility
framework for non-MPSAFE network protocols:

- Remove debug_mpsafenet variable, sysctl, and tunable.
- Remove NET_NEEDS_GIANT() and associate SYSINITSs used by it to force
  debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel.
- Remove logic to automatically flag interrupt handlers as non-MPSAFE if
  debug.mpsafenet is set for an INTR_TYPE_NET handler.
- Remove logic to automatically flag netisr handlers as non-MPSAFE if
  debug.mpsafenet is set.
- Remove references in a few subsystems, including NFS and Cronyx drivers,
  which keyed off debug_mpsafenet to determine various aspects of their own
  locking behavior.
- Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into
  no-op's, as their entire behavior was determined by the value in
  debug_mpsafenet.
- Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE.

Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still
present in subsystems, and will be removed in followup commits.

Reviewed by:	bz, jhb
Approved by:	re (kensmith)
2007-07-27 11:59:57 +00:00
Pawel Jakub Dawidek
6db107202a Fix build breakage. 2007-04-09 22:29:13 +00:00
Pawel Jakub Dawidek
82068fe7a9 Add kern.hostuuid sysctl, which will be used to keep host's UUID.
Reviewed by:	mlaier, rink, brooks, rwatson
2007-04-09 19:18:09 +00:00
Pawel Jakub Dawidek
24c3c19e73 Hide lbolt under _SOLARIS_C_SOURCE in preparation for ZFS import.
I really couldn't avoid this with preprocessor magic.
2007-04-05 20:40:47 +00:00
Poul-Henning Kamp
f645b0b51c First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.
2006-10-02 12:59:59 +00:00
John Baldwin
03e161fdb1 Make system call modules a bit more robust:
- If we fail to register the system call during MOD_LOAD, then note that
  so that we don't try to deregister it or invoke the chained event handler
  during the subsequent MOD_UNLOAD event.  Doing the deregister when the
  register failed could result in trashing system call entries.
- Add a SI_SUB_SYSCALLS just before starting up init and use that to
  register syscall modules instead of SI_SUB_DRIVERS.  Registering system
  calls as late as possible increases the chances that any other module
  event handlers or SYSINITs in a module are executed to initialize the
  data in a kld before a syscall dependent on that data is able to be
  invoked.

MFC after:	3 days
2006-08-01 16:32:20 +00:00
Poul-Henning Kamp
9806934e47 Be less harsh on brueffers eyes :-) 2006-05-26 10:23:05 +00:00
Poul-Henning Kamp
0aa6cf76bf Remove SI_SUB_CONSOLE, porting from 4.4-Lite is no longer an issue. 2006-05-26 10:00:58 +00:00
Ruslan Ermilov
8df65d80e2 GC long unused hostnamelen and domainnamelen.
Submitted by:	Alex Lyashkov <shadow@psoft.net>
2006-05-24 07:54:42 +00:00
Christian S.J. Peron
d1dfd92177 Convert the primary ACL allocator from malloc(9) to using a UMA zone instead.
Also introduce an aclinit function which will be used to create the UMA zone
for use by file systems at system start up.

MFC after:	1 month
Discussed with:	rwatson
2005-09-06 00:06:30 +00:00