kernel build. This makes it possible for me not to get pissed off that
random.ko crashes the system trying to rdtsc() when the i386/cpu.h
support code decides it's okay to call that op when neither I386_CPU or
I486_CPU is defined. I guess it also makes WITNESS/INVARIANTS defines
get picked up by the modules.
vnode of the parent. However, this check should not be performed if
the lookup failed. This change should fix "union_lookup returning
. not same as startdir" panics people were seeing. The bug was
introduced by an incomplete import of a NetBSD delta in rev 1.38.
- Move the aforementioned check out from DIAGNOSTIC. Performance
is the least of our unionfs worries.
- Minor reorganization.
PR: 53004
MFC after: 1 week
- Return EBUSY if the region was wired by mlock(2) and MS_INVALIDATE
is specified to msync(2). This is required by the Open Group Base
Specifications Issue 6.
- vm_map_sync() doesn't return KERN_FAILURE. Thus, msync(2) can't
possibly return EIO.
- The second major loop in vm_map_sync() handles sub maps. Thus,
failing on sub maps in the first major loop isn't necessary.
any functions that call them. Calling proc_rwmem() with the proc lock
held is not safe. Currently, we're protected from any races by Giant.
Eventually proc_rwmem() should require the proc lock and not Giant.
parts of ptrace using proc_rwmem(). proc_rwmem() requires giant, and
giant must be acquired prior to the proc lock, so ptrace must require giant
still.
multicast hash are written. There are still two distinct algorithms used,
and there actually isn't any reason each driver should have its own copy
of this function as they could all share one copy of it (if it grew an
additional argument).
Give the HZ/overflow check a 10% margin.
Eliminate bogus newline.
If timecounters have equal quality, prefer higher frequency.
Some inspiration from: bde
5212-based devices because PHY errors are used to collect data
on environmental noise that and doesn't truly reflect the state
of the communications media. The result is confused users.
Folks that want to watch PHY errors can still get the statistics
through the device ioctl (used by athstats).
o reject scan requests for a device that isn't marked up
This fixes a problem where requesting a scan before marking the device
up would cause a panic because the current channel was set to "any" (0xffff).
o correct a read-lock assert in in_pcblookup_local that should be
a write-lock assert (since time wait close cleanups may alter state)
Supported by: FreeBSD Foundation
preemption two CPUs can be in the same function at the same time
and clobber each others variables. Remove register declaration
from local variables.
Reviewed by: sam (mentor)
and empty its turnstile while the blocking threads still pointed to the
turnstile. If the thread on the first CPU blocked on a lock owned by
one of the threads blocked on the turnstile just woken up, then the
first CPU could try to manipulate a bogus thread queue in the turnstile
during priority propagation.
- Update locking notes for ts_owner and always clear ts_owner, not just
under INVARIANTS.
Tested by: sam (1)
deleted in 1.81. Increase the initial timeout limit to 2ms to
eliminate spurious messages of excessive timeouts in the NFS
client code.
Requested by: Poul-Henning Kamp <phk@phk.freebsd.dk>
Requested by: Mike Silbersack <silby@silby.com>
Requested by: Sam Leffler <sam@errno.com>
Giant and is also MPSAFE.
Push Giant further down into __mac_get_fd() and __mac_set_fd(),
grabbing it only for constrained regions dealing with VFS, and
dropping it entirely for operations related to labeling of pipes.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Radeon IGP support (still lacking PCI IDs), and DRM interface 1.2 updates which
include finally tying the DRM instances to specific devices rather than relying
on the X Server.
This switch toggles between strict multicast delivery, and traditional
multicast delivery.
The traditional (default) behaviour is to deliver multicast datagrams to all
sockets which are members of that group, regardless of the network interface
where the datagrams were received.
The strict behaviour is to deliver multicast datagrams received on a
particular interface only to sockets whose membership is bound to that
interface.
Note that as a matter of course, multicast consumers specifying INADDR_ANY
for their interface get joined on the interface where the default route
happens to be bound. This switch has no effect if the interface which the
consumer specifies for IP_ADD_MEMBERSHIP is not UP and RUNNING.
The original patch has been cleaned up somewhat from that submitted. It has
been tested on a multihomed machine with multiple QuickTime RTP streams
running over the local switch, which doesn't do IGMP snooping.
PR: kern/58359
Submitted by: William A. Carrel
Reviewed by: rwatson
MFC after: 1 week
vector stubs and into the C functions they call.
- Move disabling and EOIing of interrupt sources out of PIC driver entry
points and into intr_execute_handlers(). Intr_execute_handlers() only
disables a source for an interrupt if it is a stray interrupt or has
threaded handlers. Sources with fast handlers no longer disable (mask)
the source while executing the handlers.
- Move the setting of clkintr_pending into intr_execute_handlers() and set
the variable for any interrupt source with a vector of 0. (Should only
be true for IRQ 0.) This fixes clkintr_pending in the NO_MIXED_MODE
case.
- Implement lapic_eoi() and use it to implement ioapic_eoi_source().
- Rename atpic_sched_ithd() to atpic_handle_intr() since it is used to
handle all atpic interrupts and not just threaded ones.
Inspired by: peter's changes to amd64 in p4 (1)
Requested by: bde (2)
extra argument to the devfs MAC policy entry points was accidentally
merged from the MAC branch during my earlier commit to these policies,
and is not scheduled to be merged just yet.
defines for these constants that include the trailing NUL byte. These
new constants have SIZ in their name instead of LEN. As soon as all
consumers in the tree are converted to use the new defines the old
defines will be put under BURN_BRIDGES.
Reviewed by: archie, julian, ru
Approved by: re (in principle)
after the additions made for the new statfs structure (version
1.157). These must be updated in a separate checkin after
syscalls.master has been checked in so that they reflect its
new CVS identity. As these are purely derived files, it is not
clear to me why they are under CVS at all. I presume that it has
something to do with having `make world' operate properly.
accurate reporting of multi-terabyte filesystem sizes.
You should build and boot a new kernel BEFORE doing a `make world'
as the new kernel will know about binaries using the old statfs
structure, but an old kernel will not know about the new system
calls that support the new statfs structure. Running an old kernel
after a `make world' will cause programs such as `df' that do a
statfs system call to fail with a bad system call.
Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Tim Robbins <tjr@freebsd.org>
Reviewed by: Julian Elischer <julian@elischer.org>
Reviewed by: the hoards of <arch@freebsd.org>
Sponsored by: DARPA & NAI Labs.
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c). This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE. This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time. This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.
This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.
While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) do to memory
conservation and reduced cost of zeroing memory.
NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change. Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.
Suggestions from: bmilekic
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
vfs_mount_alloc/vfs_mount_destroy functions and take care to completely
destroy the mount point along with its locks. Mount struct has grown in
coplexity recently and depending on each failure path to destroy it
completely isn't working anymore.
2. Eliminate largely identical vfs_mount and vfs_unmount question by
moving the code to handle both cases into a newly introduced vfs_domount
function.
3. Simplify nfs_mount_diskless to always expect an allocated mount
struct and never attempt an allocation/destruction itself. The
vfs_allocroot allocation was there to support 'magic' swap space
configuration for diskless clients that was already removed by PHK some
time ago.
4. Include a vfs_buildopts cleanups by Peter Edwards to validate the
sanity of nmount parameters passed from userland.
Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com>
Reviewed by: rwatson
important change is in cpu_switch() where we disable the high FP
registers for the thread that we switch-out if the CPU currently
has its high FP registers. This avoids that the high FP registers
remain enabled for the thread even when the CPU has unloaded them
or the thread migrated to another processor.
Likewise, when we switch-in a thread of that has its high FP
registers on the CPU, we enable them. This avoids an otherwise
harmless, but unnecessary trap to have them enabled.
The code that handles the disabled high FP trap (in trap()) has
been turned into a critical section for the most part to avoid
being preempted. If there's a race, we bail out and have the
processor trap again if necessary.
Avoid using the generic ia64_highfp_save() function when the
context is predictable. The function adds unnecessary overhead.
Don't use ia64_highfp_load() for the same reason. The function
is now unused and can be removed.
These changes make the lazy context switching of the high FP
registers in an UP kernel functional.
turnstiles to implement blocking isntead of implementing a thread queue
directly. These turnstiles are somewhat similar to those used in Solaris 7
as described in Solaris Internals but are also different.
Turnstiles do not come out of a fixed-sized pool. Rather, each thread is
assigned a turnstile when it is created that it frees when it is destroyed.
When a thread blocks on a lock, it donates its turnstile to that lock to
serve as queue of blocked threads. The queue associated with a given lock
is found by a lookup in a simple hash table. The turnstile itself is
protected by a lock associated with its entry in the hash table. This
means that sched_lock is no longer needed to contest on a mutex. Instead,
sched_lock is only used when manipulating run queues or thread priorities.
Turnstiles also implement priority propagation inherently.
Currently turnstiles only support mutexes. Eventually, however, turnstiles
may grow two queue's to support a non-sleepable reader/writer lock
implementation. For more details, see the comments in sys/turnstile.h and
kern/subr_turnstile.c.
The two primary advantages from the turnstile code include: 1) the size
of struct mutex shrinks by four pointers as it no longer stores the
thread queue linkages directly, and 2) less contention on sched_lock in
SMP systems including the ability for multiple CPUs to contend on different
locks simultaneously (not that this last detail is necessarily that much of
a big win). Note that 1) means that this commit is a kernel ABI breaker,
so don't mix old modules with a new kernel and vice versa.
Tested on: i386 SMP, sparc64 SMP, alpha SMP
- Fail in agp_alloc_gatt if the aperture size is 0 instead of panicing in
contigmalloc.
Reported by: Bjoern Fischer <bfischer@Techfak.Uni-Bielefeld.DE>
Reviewed by: jhb
MFC after: 1 week
interrupt such as IRQ 22 or 19. However, the ACPI BIOS still routes
interrupts from some PCI devices to the same intpin calling the pin
IRQ 22. Thus, ACPI expects to address a single interrupt source via two
different names. To work around this, if the SCI is remapped to a non-ISA
interrupt (i.e., greater than 15), then we use
acpi_OverrideInterruptLevel() function to tell ACPI to use IRQ 22 or 19
rather than IRQ 9 for the SCI.
Previously we would change IRQ 22 or 19's name to IRQ 9 when we encountered
such an Interrupt Source Override entry in the MADT which routed the SCI
properly but left PCI devices mapped to IRQ 22 or 19 w/o a routable
interrupt.
Tested by: sos
This ensures that uart gets a higher console priority than syscons when
a serial console is being used. Testing against the "console" environment
variable doesn't make sense since we only have one loader console driver.
should now only have HTT CPUs if they have explicitly asked for them
either by enabling HyperThreading in the BIOS or by using the
MPTABLE_FORCE_HTT kernel option.
should only be used if they are enabled in the BIOS. Now that we support
enumerating CPUs using the ACPI MADT, any HTT machine using ACPI should
respect the BIOS setting. For HTT machines with ACPI disabled in the
kernel, the MPTABLE_FORCE_HTT kernel option can be used to try to probe HTT
CPUs like have done in the past for the MP Table case. This option should
only be enabled if HTT is enabled in the BIOS.
that we currently do not keep track of whether the thread has
actually used the high FP registers before. If not, we should
not save them in the context which automaticly means that we
also would not restore them from the context. For now, do it
unconditionally so that we can reach functional completeness.
functions switched to using {g|s}et_mcontext(). The problem is that
sigreturn(), being a syscall, can be given an async. context (i.e.
one corresponding to an interrupt or trap). When this happens, we
try to return to user mode via epc_syscall_return with a trapframe
that can only be used to return to user mode via exception_restore.
To fix this, we check the frame's flags immediately prior to
epc_syscall_return and branch to exception_restore for non-syscall
frames. Modify the assertion in set_mcontext() to check that if
there's a mismatch, it's because of sigreturn().
ktr_resize_pool(); this eliminates a potential livelock.
Return ENOSPC only if we encountered an out-of-memory condition when
trying to increase the pool size.
Reviewed by: jhb, bde (style)
cache after a data access error we must discard all cache lines. When
disabled existing cache lines are not invalidated by stores to memory, so
we risk reading stale data that was cached before the data access error if
we don't flush them. This is especially fatal when the memory involved
is the active part of the kernel or user stack. For good measure we also
flush the instruction cache.
This fixes random crashes when the X server probes the PCI bus through
/dev/pci.
commit broke the world because it depended on namespace pollution that
was only in my version of <machine/bootinfo.h>. The include was removed
in rev.1.63 after the last reference to it went away in rev.1.61.
dsp_open: rearrange to only hold one lock at a time
dsp_close: ditto
mixer_hwvol_init: delete locking, the only consumer seems to
be the ess driver and it only call it a creation time, I
think the device will be stable across the sleepable malloc.
cmi interrupt routine: Release locks while caller chn_intr,
either this or do what emu10k1 does which is have no locks
at in the interrupt handler.
Submitted by: mat@cnd.mcgill.ca
consistency initialized. Consequently, a number of conditionals that
checked the validity of b_object before passing it to VM_OBJECT_LOCK()
and VM_OBJECT_UNLOCK() are no longer needed.
The reason this was done was to avoid a race to the root when an
NFS server went down. However a semi-recent change to the way that
the kernel's lookup() routine traverses mount points prevents this.
Rev 1.39 of vfs_lookup.c changed the ordering of locks such that we
aquire a shared lock on the mount point being accessed and then drop
the directory vnode lock before requesting the target lock.
With that in place we no longer need shared locks for NFS to prevent
race to the root lockups.
it is marked as RTF_UP. This appears to fix a crash that was sometimes
triggered when dhclient(8) tried to send a packet after an interface
had been detatched.
Reviewed by: sam
a resource leak. Move the resource deallocation code from fifo_close()
to a new function, fifo_cleanup(), and call fifo_cleanup() from
fifo_close() and the appropriate places in fifo_open().
Tested by: Lukas Ertl
Pointy hat to: truckman
because RFNOWAIT was being passed to kproc_create.
The result was that shutdown took quite a bit longer because this
errant "child" would not respond to termination signals from init
at system shutdown.
RFNOWAIT dissassociates itself from the caller by attaching to init
as a parent proc. We could have had the taskqueue proc listen for
SIGKILL, but being able to SIGKILL a potentially critical system
process doesn't seem like a good idea.
comment about this flag in rev.1.61. It is not historical like the
comment said; it is the flag that says that most of what is laboriously
put in the bootinfo struct is actually there. Newer kernels were
bootable by even the broken boot2 without losing anything except the
symbol table, but older kernels need at least the memory sizes.
Restoring the "|" with RB_BOOTINFO that was lost in rev.1.43 costs 5
bytes. The fix can be done in only 4 bytes by fixing some code that
was removed in rev.1.61 (put RB_BOOTINFO back in in the initial value
of "opts" and fix RBX_MASK to not clobber it.)
crashes that had sn0 as the irq that's being serviced, when there was
no sn0 in the system. This seems to prevent them. Also, we want to
wait until after we've registered with the network layer before we
turn on the interrupt spigot to avoid races.
- redo updating.
rijndael-api-fst.[ch]:
- switch to use new low level rijndael api.
- stop using u8, u16 and u32.
- space cleanup.
Tested by: gbde(8) and phk's test program
free one sem_undo with un_cnt == 0 instead of all of them. This is a
temporary workaround until the SLIST_FOREACH_PREVPTR loop gets fixed so
that it doesn't cause cycles in semu_list when removing multiple adjacent
items. It might be easier to just use (doubly-linked) LISTs here instead
of complicated SLIST code to achieve O(1) removals.
This bug manifested itself as a complete lockup under heavy semaphore use
by multiple processes with the SEM_UNDO flag set.
PR: 58984
Only update them in the newly created context to reflect the state
after copying the dirty registers onto the user stack. If we were to
update the trapframe, we lose the state at entry into the kernel. We
may need that after we create the context, such as for KSE upcalls.
We have to update the trapframe after writing the dirty registers to
the user stack for signal delivery to work. But this is best done in
sendsig() itself where it applies, not in get_mcontext() where it's
done unconditionally.
with debugger, so testing P_TRACED in SIGPENDING is useless. This test also
is the culprit which causes lots of 'failed to set signal flags properly for
ast()' to be printed on console which is just a false complaint.
must return EINVAL if size is zero. Submitted by: tegge
- In order to avoid a race condition in multithreaded applications, the
check and removal operations by munmap(2) must be in the same critical
section. To accomodate this, vm_map_check_protection() is modified to
require its caller to obtain at least a read lock on the map.
revision 1.142
date: 2003/10/11 03:04:26; author: toshii
Fix a done list handling bug which exhibits under high shared
interrupt rate and bus traffic. As the interrupt register is
read after checking hcca_done_head, there was a small chance
of dropping a done list. Ignore OHCI_WDH interrupt bit if
hcca_done_head is zero so that OHCI_WDH is processed later.
revision 1.141
date: 2003/09/10 20:08:29; author: mycroft;
Update actlen even in the case where a TD returns an error --
this is critical for the umass bulk-only STALL case.
revision 1.176
date: 2003/11/04 19:11:21; author: mycroft;
Ignore a CRCTO error on a SETUP transaction in combination with
STALLED or NAK. This fixes problems with the GL641.
- remove the unnecessary elm arg from SIMPLEQ_REMOVE_HEAD().
this mirrors the functionality of SLIST_REMOVE_HEAD() (the other
singly-linked list type) and FreeBSD's STAILQ_REMOVE_HEAD()
use set_mcontext() to restore the context in sigreturn(). Since we
put the syscall number and the syscall arguments in the trapframe
(we don't save the scratch registers for syscalls, which allows us
to reuse the space to our advantage), create a MD specific flag so
that we save the scratch registers even for syscalls. We would not
be able to restart a syscall otherwise.
The signal trampoline does not need to flush the regiters anymore,
because get_mcontext() already handles that. In fact, if we set up
the context correctly, we do not need to have a trampoline at all.
This change however only minimally changes the trampoline code. In
follow-up commits this can be further optimized.
Note that normally we preserve cfm and iip in the trapframe created
by the EPC syscall path when we restore a context in set_mcontext()
because those fields are not normally set for a synchronuous context.
The kernel puts the return address and frame info of the syscall
stub in there. By preserving these fields we hide this detail from
userland which allows us to use setcontext(2) for user created
contexts. However, sigreturn() is commonly called from the trampoline,
which means that if we preserve cfm and iip in all cases, we would
return to the trampoline after the sigreturn(), which means we hit
the safety net: we call exit(2). So, we do not preserve cfm and iip
when we have a synchronous context that also has scratch registers
(the uncommon context created by sendsig() only), under the assumption
that if such a context is created in userland, something special is
going on and the use of cfm and iip is then just another quirk. All
this is invisible in the common case.
if we drop into the pmap or vnode layers.
- Migrate the handling of zero-length msync(2)s into vm_map_sync() so that
multithread applications can't change the map between implementing the
zero-length hack in msync(2) and reacquiring the map lock in
vm_map_sync().
Reviewed by: tegge
Since all callers either passed 0 or 1 for clear_ret, define bit 0 in
the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI
code for possible (but unlikely) future use. The remaining bits are for
use by MD code.
This change is triggered by a need on ia64 to have another knob for
get_mcontext().
o When we're resetting the board, make sure that we error out the pending
CCBs first. Otherwise the aha_cmd won't accept further commands, such
as those that are used to reset the card (AOP_INITIALIZE_MBOX). This
appears to cause a cascade failure where no more commands are possible
to the card.
o Reduce from 10s down to 1s the amount of time we're willing to tolerate
the card being awol. This helps the above case.
o Add some error checking to two commands issued in the probe.
I have a dim memory of gibbs@ trying to tell me about this problem a
few years ago, so pointy hat to imp@ for sitting on it so long.
in the log message for kern_sched.c 1.83 (which should have been
repo-copied to preserve history for this file), the (4BSD) scheduler
algorithm only works right if stathz is nearly 128 Hz. The old
commit lock said 64 Hz; the scheduler actually wants nearly 16 Hz
but there was a scale factor of 4 to give the requirement of 64 Hz,
and rev.1.83 changed the scale factor so that the requirement became
128 Hz. The change of the scale factor was incomplete in the SMP
case. Then scheduling ticks are provided by smp_ncpu CPUs, and the
scheduler cannot tell the difference between this and 1 CPU providing
scheduling ticks smp_ncpu times faster, so we need another scale
factor of smp_ncp or an algorithm change.
This quick fix uses the scale factor without even trying to optimize
the runtime divisions required for this as is done for the other
scale factor.
The main algorithmic problem is the clamp on the scheduling tick counts.
This was 295; it is now approximately 295 * smp_ncpu. When the limit
is reached, threads get free timeslices and scheduling becomes very
unfair to the threads that don't hit the limit. The limit can be
reached and maintained in the worst case if the load average is larger
than (limit / effective_stathz - 1) / 2 = 0.65 now (was just 0.08 with
2 CPUs before this change), so there are algorithmic problems even for
a load average of 1. Fortunately, the worst case isn't common enough
for the problem to be very noticeable (it is mainly for niced CPU hogs
competing with less nice CPU hogs).
thread being waken up. The thread waken up can run at a priority as
high as after tsleep().
- Replace selwakeup()s with selwakeuppri()s and pass appropriate
priorities.
- Add cv_broadcastpri() which raises the priority of the broadcast
threads. Used by selwakeuppri() if collision occurs.
Not objected in: -arch, -current
that msync(2) is its only caller.
- Migrate the parts of the old vm_map_clean() that examined the internals
of a vm object to a new function vm_object_sync() that is implemented in
vm_object.c. At the same, introduce the necessary vm object locking so
that vm_map_sync() and vm_object_sync() can be called without Giant.
Reviewed by: tegge
are zx1 based machines and they don't particularly like it when we
poke at them with PC legacy code. The atkbd and psm devices were
disabled in the hints file so that one could enable them on machines
that support legacy devices, but that's not really something you can
expect from a first-time installer. This still leaves syscons (sc)
and the vga device, which were enabled by default and wrecking havoc
anyway. We could disable them by default like the atkbd and psm
devices, but there's really no point in pretending we're in a better
shape that way.
o pickup Giant in divert_packet to protect sbappendaddr since it
can be entered through MPSAFE callouts or through ip_input when
mpsafenet is 1
o add missing locking on output
o add locking to abort and shutdown
o add a ctlinput handler to invalidate held routing table references
on an ICMP redirect (may not be needed)
Supported by: FreeBSD Foundation
o add assertions in tcp_respond to validate inpcb locking assumptions
o use local variable instead of chasing pointers in tcp_respond
Supported by: FreeBSD Foundation
whether or not the isr needs to hold Giant when running; Giant-less
operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive
Supported by: FreeBSD Foundation
PALM_4 initialisation hack. I've not confirmed it myself, but
seeing as we already don't use it for the Sony Clie_41, let's drop
it from the Clie_40 also and see what happens.
(Question: What about the Clie_S360 and Clie_NX60 devices? Do we
need to drop Palm4 from those as well? Possibly, but I've not had
any reports about those so I don't know.)
PR: kern/56575
MFC after: 3 days
is highly MD in an emulation environment since it operates on the host
environment. Although the setregs functions are really for exec support
rather than signals, they deal with the same sorts of context and include
files. So I put it there rather than create yet another file.
since there is no direct association between M:N thread and kse,
sometimes, a thread does not have a kse, in that case, return a pctcpu
from its last kse, it is not perfect, but gives a good number to be
displayed.
pmap_pte() and pmap_pte_quick(). The distinction being based upon the
locks that are held by the caller. When the given pmap is not the
current pmap, pmap_pte() should be used when Giant is held and
pmap_pte_quick() should be used when the vm page queues lock is held.
- When assigning to PMAP1 or PMAP2, include PG_A anf PG_M.
- Reenable the inlining of pmap_is_current().
In collaboration with: tegge
has a LOR between IPFW inpcb locks but I'm committing it now as the
lesser of two evils (the other being unlocked use of in_pcblookup).
Supported by: FreeBSD Foundation
successfully initialized in the label as a socket peer label, not a
socket label. For current policy modules, this didn't make a
difference, but if a policy module had label data in the peer label
that was to be GC'd in a different way than the normal socket label,
it might have been a problem.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
that it was obsoleted. it is better to fail than just hiding use
of ipsec_gethist() at build.
Sugessted by: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
Removed banal comments about ELAN*. Complain about ELAN* being misnamed
instead (so that these options are not obviously related to a CPU and
don't sort with CPU_ELAN).
Complain about CPU_DISABLE_CMPXCHG being in the wrong namespace.
to a routing table entry w/o bumping the reference count or locking
against the entry being free'd. This caused major havoc (for some
reason it appeared most frequently for folks running natd). Fix
is to bump the reference count whenever we copy the route cache
contents into a private copy so the entry cannot be reclaimed out
from under us. This is a short term fix as the forthcoming routing
table changes will eliminate this cache entirely.
Supported by: FreeBSD Foundation
Instead, let the vm objects be lazily instantiated at fault time. This
results in the allocation of fewer vm objects and vm map entries due to
aggregation in the vm system.
sure we handle stacked registers properly by taking into account
that:
1. bspstore points after the frame (due to cover),
2. we need to adjust for intermediate NaT collections.
not in scheduler specific data because eventually it will be required by
all schedulers.
- Implement sched_pin and unpin as an inline for now. If a scheduler needs
to do something more complicated than adjusting the pinned count we can
move this into a function later in an api compatible way.
o move it from subr_bus.c to netisr.c where it more properly belongs
o add NET_PICKUP_GIANT and NET_DROP_GIANT macros that will be used to
grab Giant as needed when MPSAFE operation is enabled
Supported by: FreeBSD Foundation
pin that is used by the default identity mapping if it still maps to the
old vector. The ACPI case might need some tweaking for the SCI interrupt
case since ACPI likes to address the intpin using both the IRQ remapped to
it as well as the previous existing PCI IRQ mapped to it.
Reported by: kan
on ia64. The bug is present in i386 as well but didn't show up due to
more relaxed page protections. This fix has been submitted to the vendor.
Submitted by: marcel
rather than signed. This fixes some cosmetics such as verbose printf's
for IRQs greater than 127.
- The calculation for next_ioapic_base was also adjusted so that it will
only complain once for each hole in the IRQs provided by ACPI for IO
APICs.
Reported by: Michal Mertl <mime@traveller.cz>
Don't put the name of the file in a comment. $FreeBSD$ gives more than
enough about the file's pathname.
Fixed misdescription of the file. It isn't the whole unified Makefile...
Moved the settings of WERROR and of the standard extra CFLAGS
-finline-limit and -fno-strict-aliasing to a less wrong place. They
were in the section for profiling.
of ffs_reload()'s mountp parameter to mp in rev.1.28 of ffs_vnops.c
had not been merged here.
ext2fs_reload() is still missing locking from not merging other changes
to ffs_reload(), but none of these is related to recent locking changes.
but can be enabled by setting hw.atm.hatmN.mpsafe in the kernel
environment to a non-zero value before loading the driver. When
the problems with network MPSAFEty have been sorted out this will
be removed and the driver will default to MPSAFE.
We put them directly onto the free list instead of calling the
external mbuf free routine (that routine would have cleaned the flag).
This fixes a bug which manifests itself in falsely reporting a lot of used
buffers when configuring the interface down.
conservative lock. The problem with the lock-less algorithm is that
it suffers from the ABA problem. Running an application with funnels
a couple of 100kpkts/s through the netgraph system on a dual CPU system
with MPSAFE drivers will panic almost immediatly with the old algorithm.
It may be possible to eliminate the contention between threads that insert
free items into the list and those that get free items by using the
Michael/Scott queue algorithm that has two locks.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.
Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.
Discussed with: jeff
1.36 +73 -60 src/sys/compat/linux/linux_ipc.c
1.83 +102 -48 src/sys/kern/sysv_shm.c
1.8 +4 -0 src/sys/sys/syscallsubr.h
That change was intended to support vmware3, but
wantrem parameter is useless because vmware3 uses SYSV shared memory
to talk with X server and X server is native application.
The patch worked because check for wantrem was not valid
(wantrem and SHMSEG_REMOVED was never checked for SHMSEG_ALLOCATED segments).
Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which when set
to 1 allows to return removed segments in
shm_find_segment_by_shmid() and shm_find_segment_by_shmidx().
MFC after: 1 week
the amd64 implementation of the pcpu macros is even more verbose than on
i386 and that causes gcc to way overestimate the complexity of this
2-instruction macro. The other platforms can probably lower their
default values.
from the xe driver. Should probably be removed when current probe/attach
problems with the driver are fixed, but is useful now when requesting
diagnostic information from users.
Reviewed by: imp (mentor)
isa_device pointer as its argument and uses that to call the driver's
interrupt handler passing the unit number as its argument. This should
fix COMPAT_OLDISA devices with a unit number of 0.
Reviewed by: peter
Reported by: bde
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all. secpolicy no longer contain
spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy. assign ID field to
all SPD entries. make it possible for racoon to grab SPD entry on
pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header. a mode is always needed
to compare them.
- fixed that the incorrect time was set to
sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- an user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
because key_spdacquire() had been implemented imcopletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
XXX in theory refcnt should do the right thing, however, we have
"spdflush" which would touch all SPs. another solution would be to
de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion. ipsec_*_policy ->
ipsec_*_pcbpolicy.
- avoid variable name confusion.
(struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
"src" of the spidx specifies ICMP type, and the port field in "dst"
of the spidx specifies ICMP code.
- avoid from applying IPsec transport mode to the packets when the
kernel forwards the packets.
Tested by: nork
Obtained from: KAME
- Add the following functions to the api: sched_bind(), sched_unbind(),
sched_pin(), and sched_unpin(). Bind/unbind are used for traditional
cpu binding. Pin and unpin are meant to allow the kernel to hold a thread
on a particular cpu so that it may cache per-cpu data without fear of
being migrated.
no matter where in the directory structure it may be. Use this and the "-k"
flag in the generated gdbinit files so that the "getsyms" function in gdb
requires no user intervention to run and will find every module if they're
in the kernel build's module directory. This is still quite useful for
cases where gdb knows that the path for some modules is /boot/kernel and
others are in the object directory for /usr/src/sys/$ARCH/compile/kernel.
Approved by: grog
waitrunningbufspace() calls so that they are always able to
proceed and clean up buffer space.
Submitted by: Brian Fundakowski Feldman <green@freebsd.org>
o Fix minor type problems
o Fix minor problem with a couple debug printfs.
o Default to a sane media type when none is reported.
o Minor style changes
The PR complains this will fix the IBM 300GL cards.
Submitted by: Max Gotlib
PR: 11462
Use zpfind() to see if the process became a zombie if pfind() doesn't find it
and if the caller wants to know about process death, so that the caller knows
the process died even if it happened before the kevent was actually registered.
MFC after: 1 week
This fixes a race condition (specifically with signal events) that could
lead to the kn being re-inserted into the list after it has been destroyed,
which is not something we want to happen.
PR: kern/58258
operating mode to HostAP, the card will lock up indefinitely (but
the wi(4) driver can recover if you eject the card). The problem is
that the card needs to be "reset" in a way before you even change the
media to hostap. In practice this isn't as noticeable because you
probably do some operation beforehand which prevents the lock-up
before you enable hostap mode.
e.g.:
"ifconfig wi0 up media autoselect mediaopt hostap" will lock up
(if you just inserted the card).
"ifconfig wi0 up ssid foo media autoselect mediaopt hostap" won't lock up.
- Compile 'device acpi' into GENERIC by default as well. Note that
the beastie loader menu item to disable ACPI still works if ACPI is
compiled into the kernel.
let the MD code choose whether or not to implement such a policy. The new
i386 interrupt code allows multiple FAST handlers for a given source for
example. However, the code does not allow FAST and non-FAST handlers to be
mixed.
we would manage this better by having the interrupt code add each
interrupt vector to the resource map when each source is registered.
- Use the new interrupt code API for registering and tearing down interrupt
handlers.
- The MP code no longer knows anything specific about an MP Table.
Instead, the local APIC code adds CPUs via the cpu_add() function when
a local APIC is enumerated by an APIC enumerator.
- Don't divide the argument to mp_bootaddress() by 1024 just so that we
can turn around and mulitply it by 1024 again.
- We no longer panic if SMP is enabled but we are booted on a UP machine.
- init_secondary(), the asm code between init_secondary() and ap_init()
in mpboot.s and ap_init() have all been merged together in C into
init_secondary().
- We now use the cpuid feature bits to determine if we should enable
PSE, PGE, or VME on each AP.
- Due to the change in the implementation of critical sections, acquire
the SMP TLB mutex around a slightly larger chunk of code for TLB
shootdowns.
- Remove some of the debug code from the original SMP implementation
that is no longer used or no longer applies to the new APIC code.
- Use a temporary hack to disable the ACPI module until the SMP code has
been further reorganized to allow ACPI to work as a module again.
- Add a DDB command to dump the interesting contents of the IDT.
devices claiming resources that they don't actually use. The PIC drivers
only register valid interrupt sources, so we don't need to rely on these
drivers to claim invalid IRQs to prevent their use by other drivers.
slave pin on the master PIC in the !APIC_IO case. The PIC drivers now
manage these details internally.
- Remove an spl0() that hasn't done anything since SMPng was first
committed.
- Update some comments that have rotted since SMPng.
- Use intr_suspend/resume() callouts to the interrupt code layer which
suspends and resumes all the known interrupt sources instead of calling
icu_reinit() directly.
APIC Descriptor Table to enumerate both I/O APICs and local APICs. ACPI
does not embed PCI interrupt routing information in the MADT like the MP
Table does. Instead, ACPI stores the PCI interrupt routing information
in the _PRT object under each PCI bus device. The MADT table simply
provides hints about which interrupt vectors map to which I/O APICs. Thus
when using ACPI, the existing ACPI PCI bridge drivers are sufficient to
route PCI interrupts.
- The apic interrupt entry points have been rewritten so that each entry
point can serve 32 different vectors. When the entry is executed, it
uses one of the 32-bit ISR registers to determine which vector in its
assigned range was triggered. Thus, the apic code can support 159
different interrupt vectors with only 5 entry points.
- We now always to disable the local APIC to work around an errata in
certain PPros and then re-enable it again if we decide to use the APICs
to route interrupts.
- We no longer map IO APICs or local APICs using special page table
entries. Instead, we just use pmap_mapdev(). We also no longer
export the virtual address of the local APIC as a global symbol to
the rest of the system, but only in local_apic.c. To aid this, the
APIC ID of each CPU is exported as a per-CPU variable.
- Interrupt sources are provided for each intpin on each IO APIC.
Currently, each source is given a unique interrupt vector meaning that
PCI interrupts are not shared on most machines with an I/O APIC.
That mapping for interrupt sources to interrupt vectors is up to the
APIC enumerator driver however.
- We no longer probe to see if we need to use mixed mode to route IRQ 0,
instead we always use mixed mode to route IRQ 0 for now. This can be
disabled via the 'NO_MIXED_MODE' kernel option.
- The npx(4) driver now always probes to see if a built-in FPU is present
since this test can now be performed with the new APIC code. However,
an SMP kernel will panic if there is more than one CPU and a built-in
FPU is not found.
- PCI interrupts are now properly routed when using APICs to route
interrupts, so remove the hack to psuedo-route interrupts when the
intpin register was read.
- The apic.h header was moved to apicreg.h and a new apicvar.h header
that declares the APIs used by the new APIC code was added.
default we provide 16 interrupt sources for IRQs 0 through 15. However,
if the I/O APIC driver has already registered sources for any of those IRQs
then we will silently fail to register our own source for that IRQ.
Note that i386/isa/icu.h is now specific to the 8259A and no longer
contains any info relevant to APICs. Also note that fast interrupts no
longer use a separate entry point. Instead, both fast and threaded
interrupts share the same entry point which merely looks up the appropriate
source and passes control to intr_execute_handlers().
that provides methods via a PIC driver to do things like mask a source,
unmask a source, enable it when the first interrupt handler is added, etc.
The interrupt code provides a table of interrupt sources indexed by IRQ
numbers, or vectors. These vectors are what new-bus uses for its IRQ
resources and for bus_setup_intr()/bus_teardown_intr(). The interrupt
code then maps that vector a given interrupt source object. When an
interrupt comes in, the low-level interrupt code looks up the interrupt
source for the source that triggered the interrupt and hands it off to
this code to execute the appropriate handlers.
By having an interrupt source abstraction, this allows us to have different
types of interrupt source providers within the shared IRQ address space.
For example, IRQ 0 may map to pin 0 of the master 8259A PIC, IRQs 1
through 60 may map to pins on various I/O APICs, and IRQs 120 through
128 may map to MSI interrupts for various PCI devices.
the root path. This is reported to make non-PXE netbooting, such as
is used on sparc64 systems, work correctly when the TFTP server is
not the same as the root server.
PR: kern/57328
Submitted by: Per Kristian Hove <Per.Hove@math.ntnu.no>
header copy made on input path: this is now handled differently.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
is really EtherExpress or EEPro or what, but it does appear in a
couple of ethernet cards that have appeared recently on ebay. Silicom
appears to make these cards, and they have the 82595TX chipset in
them, and sometimes uarts. The ex driver needs some work to support
these cards, but I thought I'd get the device into pccarddevs.
The hardware driver decides the name under /dev/led and provides
the function to turn the lamp on/off.
All leds are serviced by a single timeout which runs at a basic rate
of hz/10.
The LED is controlled by ascii strings as follows.
0 Turn off.
1 Turn on.
f Flash: _-
f2 Flash: __--
f3 Flash: ___---
f4...f9 etc.
d%d Digits. "d12": -__________-_-______________________________
s%s String, roll your own:
'a-j' gives on for (1...10)/10 sec.
'A-J' gives on for (1...10)/10 sec.
'sAaAbBa': _-_--__-
m%s Morse
'.' dot
'-' dash
' ' letter space
'\n' word space
My mdoc skills do not reach to express that.
Add a sysctl declaration for hw.ata.atapi_dma, which had gone MIA (though
setting it in loader.conf still worked, it was not visible at runtime)
Approved by: sos
to the pci attachment. Cardbus is a derived class of pci so all pci
drivers are automatically available for matching against cardbus devices.
Reviewed by: imp
message encoding and decoding stuff into the base module. All of this
is accessed by several of the NgATM modules and putting this into
atmbase reduceds the memory footprint.
cr.isr sanity check. We actually encounter insanities, which very
likely means that the insanity check itself is insane. Remove an empty
comment while I'm at it.
directly on the radix tree and does not hold any routing table refernces.
This fixes the reference counting problems that manifested itself as a
panic during unmount of filesystems that were mounted by NFS over an
interface that had been removed.
Supported by: FreeBSD Foundation
idle. They figure out that we're idle fast enough that the cache pollution
introduces by scanning their run queue is more expensive than waiting
a little longer.
- Add kseq_setidle() to mark us as being idle. Use this in place of
kseq_find().
- Remove kseq_load_highest(), kseq_find() was the only consumer of this
interface. kseq_balance() has it's own customized version that finds the
lowest and highest loads simultaneously.
Continuously told that this would be faster by: terry
GEOM was not designed to handle media that does not have
a size. Blank CD's are of that type, so cheat and set the
media size to -1. This allows burning to work, but makes
GEOM issue outofrange reads that makes the ATAPI subsystem
spew out a few warnings. GEOM should be tought about this.
GEOM was not designed to handle changing the sectorsize
between opens. Writing multitack CD's with both audio and
data tracks needs to change sector size on the fly. We
cheat here and stuff the current sectorsize into GEOM
private internals. GEOM should grow some clean way for this.
o Fix MFC cards. We were bogusly setting CCR_IOBASE[01] and CCR_IOLIMIT.
now when we activate the resource, we adjust these for MFC cards, per the
spec.
o Change type of pf_mfc_* to be bus_addr_t, which is more correct than
long.
This makes my 3C362D/3C363D and 3CXEM556 cards work! Woo Hoo!
o Remove redundant $FreeBSD$
o Better comments about ep_get_macaddr.
o remove one tab in a switch statement (style only)
o Recognize ID 0x0035 as the device ID for the 3CXEM556 that I have. This
makes the 3CXEM556 work for me. Not 100% sure this is the assigned ID,
as I don't have the datasheets for this part, but it does work and get
the correct ethrnet address.
o Comment about the whole fake IRQ 3 thing. some need it, some don't, all
work with it.
the total load, the timeshare load, and the number of threads that can
be migrated to another cpu. Account for these seperately.
- Introduce a KSE_CAN_MIGRATE() macro which determines whether or not a KSE
can be migrated to another CPU. Currently, this only checks to see if
we're an interrupt handler. Eventually this will also be used to support
CPU binding.
An example of useless is bios.h. An example of wrong is msdos.h (due
to the use of long for 32-bit fields).
display.h cannot be removed because it's used by syscons. That header
however has no platform dependency and shouldn't really be here.
Removal if these headers may cause build failures in the ports tree.
It's the ports that need fixing in that case.
Tested with: buildworld, LINT
previously only considered the send sequence space. Unfortunately,
some OSes (windows) still use a random positive increments scheme for
their syn-ack ISNs, so I must consider receive sequence space as well.
The value of 250000 bytes / second for Microsoft's ISN rate of increase
was determined by testing with an XP machine.
wasn't curthread, i.e. when we receive a thread pointer to use
as a function argument. Use VOP_UNLOCK/vrele in these cases.
The only case there td != curthread known at the moment is
boot() calling sync with thread0 pointer.
This fixes the panic on shutdown people have reported.
slice assignment. Add a comment describing what it does.
- Remove a stale XXX comment, the nice should not impact the interactivity,
nice adjustments only effect non-interactive tasks in ULE.
- Don't allow nice -20 tasks to totally starve nice 0 tasks. Give them at
least SCHED_SLICE_MIN ticks. We still allow nice 0 tasks to starve nice
+20 tasks as intended.
- SCHED_PRI_NRESV does not have the off by one error in PRIO_TOTAL so we
do not have to account for it in the few places that we use it.
Requested by: bde
0 and SCHED_SLP_RUN_MAX * 2. This allows us to simplify the algorithm
quite a bit. Before, it dealt with arbitrary values which required us
to do nasty integer division tricks that didn't quite work out correctly.
- Chnage sched_wakeup() to detect conditions where the slp+runtime could
exceed SCHED_SLP_RUN_MAX * 2. This can happen if we go to sleep for
longer than 6 seconds. In this case, we'll just clear the runtime and
set the sleep time to the max.
- Define a new function, sched_interact_fork() which updates the slp+runtime
of a newly forked thread. We want to limit the amount of history retained
from the parent so that we learn the child's behavior quickly. We don't,
however want to decay it to nothing. Previously, we would simply divide
each parameter by 100 whenever we forked. After a few forks the values
would reach 0 and tasks would not be considered interactive.
- Add another KTR entry, cleanup some existing entries.
- Remove a useless sched_interact_update() from sched_priority(). This is
already done by the callers that require it.
destination objects are locked on entry and exit. Add comments to
the callers noting that the locks can be released by swap_pager_copy().
- Remove several instances of GIANT_REQUIRED.
we will generate for a given ip/port tuple has advanced far enough
for the time_wait socket in question to be safely recycled.
- Have in_pcblookup_local use tcp_twrecycleable to determine if
time_Wait sockets which are hogging local ports can be safely
freed.
This change preserves proper TIME_WAIT behavior under normal
circumstances while allowing for safe and fast recycling whenever
ephemeral port space is scarce.
o change os glue API to be compatible with Linux so hal.o's can
be used on any system
o add ABI version to catch driver-HAL mismatches
o move hal version information from ah_osdep.c to binary component
o remove ath_hal_wait os glue component
o assign constant values to all enums to avoid potential compiler
incompatibilities
o add support for 3Com badged cards (PCI vendor ID)
o add support for IBM mini-pci cards (PCI device ID)
o expose MAC, PHY, and radio hardware revisions
o support for big-endian platforms
o new method to set slot time in us
o bug fix for 5211: beacon timers not setup correctly
o bug fix for 5212: don't crash when handed a 5112 radio
Requested by: jhb
Initialize the real mode stack. This is needed at least for the return
address from the lcall.
Requested by: takawata
Fix style bugs in acpi_wakecode.S
Requested by: bde
Remove the kernel option now that we have the tunable.
to use the direct mapped KVA at KERNBASE to service the request. This also
allows pmap_mapdev() to be used for such addresses very early during the
boot process and might provide some small savings on KVA.
Reviewed by: peter
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.
Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)
mbuf cluster, copy the data to a separate mbuf that do not use a
cluster. this change will reduce the possiblity of packet loss
in the socket layer.
Obtained from: KAME
256 raw receive buffers (96 byte each) fit into one page. This breaks the
limit imposed by the usage of an uint8_t for the buffer number. Restrict
the allocation size for buffers to a maximum of 8192.
- Add an IPI based mechanism for migrating kses. This mechanism is
broken down into several components. This is intended to reduce cache
thrashing by eliminating most cases where one cpu touches another's
run queues.
- kseq_notify() appends a kse to a lockless singly linked list and
conditionally sends an IPI to the target processor. Right now this is
protected by sched_lock but at some point I'd like to get rid of the
global lock. This is why I used something more complicated than a
standard queue.
- kseq_assign() processes our list of kses that have been assigned to us
by other processors. This simply calls sched_add() for each item on the
list after clearing the new KEF_ASSIGNED flag. This flag is used to
indicate that we have been appeneded to the assigned queue but not
added to the run queue yet.
- In sched_add(), instead of adding a KSE to another processor's queue we
use kse_notify() so that we don't touch their queue. Also in sched_add(),
if KEF_ASSIGNED is already set return immediately. This can happen if
a thread is removed and readded so that the priority is recorded properly.
- In sched_rem() return immediately if KEF_ASSIGNED is set. All callers
immediately readd simply to adjust priorites etc.
- In sched_choose(), if we're running an IDLE task or the per cpu idle thread
set our cpumask bit in 'kseq_idle' so that other processors may know that
we are idle. Before this, make a single pass through the run queues of
other processors so that we may find work more immediately if it is
available.
- In sched_runnable(), don't scan each processor's run queue, they will IPI
us if they have work for us to do.
- In sched_add(), if we're adding a thread that can be migrated and we have
plenty of work to do, try to migrate the thread to an idle kseq.
- Simplify the logic in sched_prio() and take the KEF_ASSIGNED flag into
consideration.
- No longer use kseq_choose() to steal threads, it can lose it's last
argument.
- Create a new function runq_steal() which operates like runq_choose() but
skips threads based on some criteria. Currently it will not steal
PRI_ITHD threads. In the future this will be used for CPU binding.
- Create a kseq_steal() that checks each run queue with runq_steal(), use
kseq_steal() in the places where we used kseq_choose() to steal with
before.
the rstack functionality:
1. Fix a KASSERT that tests for the address to be above the upward
growable stack. Typically for rstack, the faulting address can be
identical to the record end of the upward growable entry, and
very likely is on ia64. The KASSERT tested for greater than, not
greater equal, so whenever the register stack had to be grown
the assertion fired.
2. When we grow the upward growable stack entry and adjust the
unlying object, don't forget to adjust the size of the VM map.
Not doing so would trigger an assert in vm_mapzdtor().
Pointy hat: marcel (for not testing with INVARIANTS).
those cylinder groups that have at least 75% of the average free
space per cylinder group for that file system are considered as
candidates for the creation of a new directory. The previous formula
for minbfree would set it to zero if the file system was more than
75% full, which allowed cylinder groups with no free space at all
to be chosen as candidates for directory creation, which resulted
in an expensive search for free blocks for each file that was
subsequently created in that directory.
Modify the calculation of minifree in the same way.
Decrease maxcontigdirs as the file system fills to decrease the
likelyhood that a cluster of directories will overflow the available
space in a cylinder group.
Reviewed by: mckusick
Tested by: kmarx@vicor.com
MFC after: 2 weeks
that libtool-using packages seem to love using this flag.
/usr/include/sys/cdefs.h:184:5: warning: "__STDC_VERSION__" is not defined
/usr/include/sys/cdefs.h:372:5: warning: "_POSIX_C_SOURCE" is not defined
/usr/include/sys/cdefs.h:378:5: warning: "_POSIX_C_SOURCE" is not defined
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE). This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).
Supported by: FreeBSD Foundation
are now in the header of the external buffer itself which allows us
to manipulate them in the free routine without having to lock the softc
structure or the free list. To get space for these flags the chunk number
is reduced to 8 bit which amounts to a maximum of 256 chunks per allocated
page. This restriction is now enforced by a CTASSERT.
structures come out the right size.
Fix the ones that broke. stat32 had some missing fields from the end
and statfs32 was broken due to the strange definition of MNAMELEN
(which is dependent on sizeof(long))
I'm not sure if this fixes any actual problems or not.
the module. Previously we grabbed the mutex used by the callouts,
then stopped the callout with callout_stop, but if the callout
was already active and blocked by the mutex then it would continue
later and reference the mutex after it was destroyed. Instead
stop the callout first then lock.
Supported by: FreeBSD Foundation
o add some more debugging help for figuring out why folks are
getting complaints about releasing routing table entries with
a zero refcnt
o fix comment that talked about spl's
o remove duplicate define of DUMMYNET_DEBUG
Supported by: FreeBSD Foundation
direct dispatch) to avoid extensive kernel stack usage and to
avoid directly re-entering the network stack. The latter causes
locking problems when, for example, a complete TCP handshake`
happens w/o a context switch.
clobbers this variable. Long ago, when the idle loop wasn't in a
process, it set switchtime.tv_sec to zero to indicate that the time
needs to be read after the idle loop finishes. The special case for
this isn't needed now that there is an idle process (for each CPU).
The time is read in the normal way when the idle process is switched
away from. The seconds component of the time is only zero for the
first second after the uptime is set, and the mostly-dead code was only
executed during this time. (This was slightly broken by using uptimes
instead of times relative to the Epoch -- in the original version the
seconds component of the time was only 0 for the first second after
the Epoch.)
In mi_switch(), moved the setting of switchticks to just after the
first (and now only) setting of switchtime. This setting used to be
delayed since a late setting was needed for the idle case and an early
setting was not needed. Now the early setting is needed so that
fork_exit() doesn't need to set either switchtime or switchticks.
Removed now-completely-rotted comment attached to this. Most of the
code described by the comment had already moved to sched_switch().
very first cell in the mbuf should have a cell header word (of which
everything except the payload type and the CLP bit is ignored). All
other cells should be 48 byte and get the same header as the first cell.
This fixes a problem with sending more than 120000 raw cells/sec through
an HE155. The card seems to need 2 cell times to DMA the transmit buffer
ready queue entry and the transmit buffer descriptor so at 1/3 the
link rate the transmit buffer ready queue starts to fill up. Even with this
patch it's obviously impossible to send raw cells at link rate.
- implement the tunnel egress rule in ip_ecn_egress() in ip_ecn.c.
make ip{,6}_ecn_egress() return integer to tell the caller that
this packet should be dropped.
- handle ECN at fragment reassembly in ip_input.c and frag6.c.
Obtained from: KAME
begin with sched_lock held but not recursed, so this variable was
always 0.
Removed fixup of sched_lock.mtx_recurse after context switches in
sched_switch(). Context switches always end with this variable in the
same state that it began in, so there is no need to fix it up. Only
sched_lock.mtx_lock really needs a fixup.
Replaced fixup of sched_lock.mtx_recurse in fork_exit() by an assertion
that sched_lock is owned and not recursed after it is fixed up. This
assertion much match the one in mi_switch(), and if sched_lock were
recursed then a non-null fixup of sched_lock.mtx_recurse would probably
be needed again, unlike in sched_switch(), since fork_exit() doesn't
return to its caller in the normal way.
Correct a bug when the number of pages for external mbufs was
very large. In this case the page number could overflow into the large
buffer flag. Make this more unlikley by move that flag further away.
is returned from the card to the driver. Add a counter that shows
how many times this allocation has failed. Note, that we could even
further delay the allocation of the mbuf until we know, that we need it
(there are no receive errors and the connection is open). This will be done
in a later commit.
Print the new statistics field in atmconfig.
atomic instructions instead. Remove the stuff used to track
whether an external mbuf travels through the system. This is
temporary only and will come back soon.
code has the typical branch prediction detour, which creates cross-
section branches. A LINT kernel is apparently large enough nowadays
that the .text and .text2 sections cannot always be layed-out so that
branches between them reach.
The fix is to stop using the alpha-specific bitops and instead use
the portable implementation used by all platforms other than alpha
and i386.
with an mbuf until it is reclaimed. This is in contrast to tags that
vanish when an mbuf chain passes through an interface. Persistent tags
are used, for example, by MAC labels.
Add an m_tag_delete_nonpersistent function to strip non-persistent tags
from mbufs and use it to strip such tags from packets as they pass through
the loopback interface and when turned around by icmp. This fixes problems
with "tag leakage".
Pointed out by: Jonathan Stone
Reviewed by: Robert Watson
reset in a child process after a fork(). Currently, however, only the
real timer is cleared while the virtual and profiling timers are inherited.
The realtimer is cleared because it lives directly in struct proc in
p_realtimer. It is in the zero'd section of struct proc. The other timers
live in the p_timer[] array in struct pstats. These timers are copied on
fork() rather than zero'd. The fix is to move p_timer[] to the zero'd
part of struct pstats so that they are zero'd instead of copied on fork().
Note: Since at least FreeBSD 2.0 (and possibly earlier) we've had storage
for two real interval timers. Now that the uarea is less important,
perhaps we could move all of p_timer[] over to struct proc and drop the
p_realtimer special case to fix that.
PR: kern/58647
Reported by: Dan Nelson <dnelson@allantgroup.com>
MFC after: 1 week
the RNAT bit index constant. The net effect of this is that there's
no discontinuity WRT NaT collections which greatly simplifies certain
operations. The cost of this is that there can be up to 504 bytes of
unused stack between the true base of the kernel stack and the start
of the RSE backing store. The cost of adjusting the backing store
pointer to keep the RNAT bit index constant, for each kernel entry,
is negligible.
The primary reasons for this change are:
1. Asynchronuous contexts in KSE processes have the disadvantage of
having to copy the dirty registers from the kernel stack onto the
user stack. The implementation we had so far copied the registers
one at a time without calculating NaT collection values. A process
that used speculation would not work. Now that the RNAT bit index
is constant, we can block-copy the registers from the kernel stack
to the user stack without having to worry about NaT collections.
They will be in the right place on the user stack.
2. The ndirty field in the trapframe is now also usable in userland.
This was previously not the case because ndirty also includes the
space occupied by NaT collections. The value could be off by 8,
depending on the discontinuity. Now that the RNAT bit index is
contants, we have exactly the same number of NaT collection points
on the kernel stack as we would have had on the user stack if we
didn't switch backing stores.
3. Debuggers and other applications that use ptrace(2) can now copy
the dirty registers from the kernel stack (using ptrace(2)) and
copy them whereever they want them (onto the user stack of the
inferior as might be the case for gdb) without having to worry
about NaT collections in the same way the kernel doesn't have to
worry about them.
There's a second order effect caused by the randomization of the
base of the backing store, for it depends on the number of dirty
registers the processor happened to have at the time of entry into
the kernel. The second order effect is that the RSE will have a
better cache utilization as compared to having the backing store
always aligned at page boundaries. This has not been measured and
may be in practice only minimally beneficial, if at all measurable.
license. Only clause 3 has been revoked. Restore the fourth clause
as clause 3.
Pointed out by: das@
Remove my name as a copyright holder since I don't use a BSD license
compatible or comparable to the UCB license. I choose not to add a
complete second license for my work for aesthetic reasons, nor to
replace the UCB license on grounds of rewriting more than 90% of the
source files. The rewrite can also be seen as an enhancement and since
the files were practically empty, it's rather trivial to have changed
90% of the files.
the maximum number of pages for buffers) return -1 instead of 0.
This fixes a panic under conditions when many mbufs are needed.
Update the head pointer of the receive buffer pool queue even when
we could not supply a buffer to the chip. Otherwise the chip will
not re-interrupt us for another try. A better strategy would probably
be to remember this condition and to supply buffers without an interrupt
as soon as buffers get available.
Contributed by: Thomaswuerfl@gmx.de
- In sched_prio(), adjust the run queue for threads which may need to move
to the current queue due to priority propagation .
- In sched_switch(), fix style bug introduced when the KSE support went in.
Columns are 80 chars wide, not 90.
- In sched_switch(), Fix the comparison in the idle case and explicitly
re-initialize the runq in the not propagated case.
- Remove dead code in sched_clock().
- In sched_clock(), If we're an IDLE class td set NEEDRESCHED so that threads
that have become runnable will get a chance to.
- In sched_runnable(), if we're not the IDLETD, we should not consider
curthread when examining the load. This mimics the 4BSD behavior of
returning 0 when the only runnable thread is running.
- In sched_userret(), remove the code for setting NEEDRESCHED entirely.
This is not necessary and is not implemented in 4BSD.
- Use the correct comparison in sched_add() when checking to see if an idle
prio task has had it's priority temporarily elevated.
instead of retrying them blindly.
This should fix some of the problems people have been having with cdrom
drives taking a long time to probe. This should also eliminate the need
for the initial TUR in cdsize().
cam_periph.c: Don't keep retrying if the error we get back is a fatal
error. This should help us detect the transition from
"Logical unit not ready, cause not reportable" to "Medium
not present" in the "TUR many" handler. (The TUR many
handler gets triggered for Logical unit not ready, cause
not reportable errors.)
scsi_cd.c: Remove the initial test unit ready in cdsize(). Hopefully
it isn't necessary after the above change.
Submitted by: gibbs (mostly)
Tested by: peter
MFC After: 2 weeks
added for XFree86. There are 2 reasons for doing this with sysarch():
1. The memory mapped I/O space is not at a fixed physical address. An
application has to use some interface to get the base address. It
gets worse if the machine has multiple memory mapped I/O spaces.
2. Access to the memory mapped I/O space needs to happen through a
translation that is flagged as uncachable. There's no interface
that allows a process to do uncached memory I/O, other than though
/dev/mem (possibly).
So, until we either disallow direct access to I/O or bus space from
userland or have a better way of doing this, sysarch() has the least
negative impact on existing interfaces.
of lock acquires and releases performed.
- Move an assertion from vm_object_collapse() to vm_object_zdtor()
because it applies to all cases of object destruction.
Make the pccard attachment work with NEWCARD
Start locking of the driver, but only the macros are defined right now
Tested on: Megahertz CC10BT/2
# (These cards are very popular on ebay these days, and run < $10 including
# shipping from some sellers).
routines. Otherwise we run into trouble with speculative tlb preloads
on SMP systems. This effectively defeats Jeff's revision 1.438
optimization (for his pentium4-M laptop) in the SMP case. It breaks
other systems, particularly athlon-MP's.
if we do acquire an advisory lock, great! We'll release it later.
However, if we fail to acquire a lock, we perform the coredump
anyway. This problem became particularly visible with NFS after
the introduction of rpc.lockd: if the lock manager isn't running,
then locking calls will fail, aborting the core dump (resulting in
a zero-byte dump file).
Reported by: Yogeshwar Shenoy <ynshenoy@alumni.cs.ucsb.edu>
type, rather than "object_label" as the first argument. This reduces
complexity a little for the consumer, and also makes it easier for
use to rename the underlying entry points in struct mac_policy_obj.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
in bufs_info sysctl handler. dev->dma and dev->dma_lock existence are
protected by DRM_LOCK(). Fixes panic on sysctl hw.dri when the device is
uninitialied (when you aren't in X).
- Add a DDB function to dump the contents of an ithread and optionally
details about each handler in that ithread. This function can be used
by MD code to implement DDB commands that display information about
interrupt sources and their registered handlers.
the ACPI timer and we shouldn't do that if ACPI is already around to do
that for us.
- Set a description and tweak the order of checks in the probe function
to more closely match other PCI drivers.
This should probably be moved to sys/dev/piix/piix.c at some point and
turned on for all i386 kernels rather than just SMP ones.
(aka RFC2292bis). Though I believe this commit doesn't break
backward compatibility againt existing binaries, it breaks
backward compatibility of API.
Now, the applications which use Advanced Sockets API such as
telnet, ping6, mld6query and traceroute6 use RFC3542 API.
Obtained from: KAME
dcons(4): very simple console and gdb port driver
dcons_crom(4): FireWire attachment
dconschat(8): User interface to dcons
Tested with: i386, i386-PAE, and sparc64.
giant, which implies that we need to take out giant it we're NOT
MPSAFE.
# I can't believe the number of people that looked at this failed to
# detect this.
vm_pageout_page_stats() from Giant.
- Modify vm_pager_put_pages() and vm_pager_page_unswapped() to expect the
vm object to be locked on entry. (All of the pager routines now expect
this.)
that at most 20% of sockets can be in time_wait at one time, ensuring
that time_wait sockets do not starve real connections from inpcb
structures.
No implementation change is needed, jlemon already implemented a nice
LRU-ish algorithm for tcp_tw structure recycling.
This should reduce the need for sysadmins to lower the default msl on
busy servers.
overlapping TR/TC entries (which results in a machine check). Note
that we don't look at the size of the memory descriptor, because
it doesn't guarantee non-overlap.
With this change, a UP kernel could boot on a Intel Tiger4 machine
with the following options:
options LOG2_ID_PAGE_SIZE=26 # 64M
options LOG2_PAGE_SIZE=14 # 16K
Approved by: marcel
setting. Make va_copy be an alias if __ISO_C_VISIBLE >= 1999.
Why? more than a few ports have an autoconf that looks for __va_copy
because it is available on glibc. It is critical that we use it if
at all possible on amd64. It generally isn't a problem for i386 and its
ilk because autoconf driven code tends to fall back to an assignment.
locking, and the apparently unnecessary locking for -stable has been removed.
This may fix issues with missed interrupts since April, which manifested
themselves as slowdowns or hangs in radeon, in particular. Many cleanups also
took place. In the shared code, there are improvements to r128 driver
stability.
when loaded as a module
o cleanup data structures on module unload when no application has
been started (i.e. kldload, kldunload w/o mrtd)
o remove extraneous unlocks immediately prior to destroying them
Supported by: FreeBSD Foundation
part of the support is that it still assumes one master and one target
where as AGP 3.0 actually supports multiple devices on the bus.
Submitted by: Keith Whitwell <keith@tungstengraphics.com>
Sponsored by: The Weather Channel
a minor conflict):
o Use ETHER_ADDR_LEN in preference to '6'.
o Remove two unnecessary (caddr_t) casts. One of them causes problems in
my tree where etherbroadcastaddr is const, and (caddr_t) casts the const
away.
we had were bogus.
While here, reassign the copyright to the Project. There's nothing
in this files that originates from NetBSD, especially now that the
FreeBSD/alpha bits have been removed, but even then the amount of
inherited code that we actually used was nil.
mcontext_t for the register values. Currently only ld8 and ldfd
instructions are handled as those are the ones we need now (a
misaligned ld8 occurs 4 times in ntpd(8) and a misaligned ldfd
occurs once in mozilla 1.4 and 1.5). Other instructions are added
when needed.
at the first address and spills it to the second address. This
allows unaligned_fixup() to update the context of the process in
a way that assures proper rounding.
Similar functions for single-and extended-precision are added when
needed.
in that it provides an abstract (intermediate) representation for
instructions. This significantly improves working with instructions
such as emulation of instructions that are not implemented by the
hardware (e.g. long branch) or enhancing implemented instructions
(e.g. handling of misaligned memory accesses). Not to mention that
it's much easier to print instructions.
Functions are included that provide a textual representation for
opcodes, completers and operands.
The disassembler supports all ia64 instructions defined by revision
2.1 of the SDM (Oct 2002).
implement i386 compat numbers where it makes sense. This would save a
syscall translation layer. Yes, this breaks the abi slightly again, but
fortunately its just a recompile rather than tweaking the source. I will
be fixing the libc stubs while I'm here.
to a multiple of the access byte width. This overcomes errors in the
AML often found in Toshiba laptops. These errors were allowed by
the Microsoft ASL compiler and interpreter. This will NOT be imported
by ACPI-CA so make the change on our local branch. File was already off
the vendor branch.
Submitted by: blaz
Original idea: Rick Richardson for Linux
enable strict checks of the AML. Our default behavior will be to relax
checks to work on as many platforms as possible. Also clean up and document
other ACPI options while I'm here.
Include src/sys/security/mac/mac_internal.h in kern_mac.c.
Remove redundant defines from the include: SYSCTL_DECL(), debug macros,
composition macros.
Unstaticize various bits now exposed to the remainder of the kernel:
mac_init_label(), mac_destroy_label().
Remove all the functions now implemented in mac_process/mac_vfs/mac_net/
mac_pipe. Also remove debug counters, sysctls exporting debug
counters, enforcement flags, sysctls exporting enforcement flags.
Leave module declaration, sysctl nodes, mactemp malloc type, system
calls.
This should conclude MAC/LINT/NOTES breakage from the break-out process,
but I'm running builds now to make sure I caught everything.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Unstaticize mac_late.
Remove ea_warn_once, now in mac_vfs.c.
Unstaticisize mac_policy_list, mac_static_policy_list, use
struct mac_policy_list_head instead of LIST_HEAD() directly.
Unstaticize and un-inline MAC policy locking functions so they can
be referenced from mac_*.c.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
security/mac/mac_net.c
security/mac/mac_pipe.c
security/mac/mac_process.c
security/mac/mac_system.c
security/mac/mac_vfs.c
Note: Here begins a period of NOTES/LINT build breakage due to duplicate
symbols that will shortly be removed from kern_mac.c.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Extended attribute transaction warning flag if transactions aren't
supported on the EA implementation being used.
Debug fallback flag to permit a less conservative fallback if reading
an on-disk label fails.
Enforce_fs toggle to enforce file systme access control.
Debugging counters for file system objects: mounts, vnodes, devfs_dirents.
Object initialization, destruction, copying, internalization,
externalization, relabeling for file system objects.
Life cycle operations for devfs entries.
Generic extended attribute label implementation for use by UFS, UFS2 in
multilabel mode.
Generic single-level label implementation for use by all file systems
when in singlelabel mode.
Exec-time transition based on file label entry points.
Vnode operation access control checks (many).
Mount operation access control checks (few).
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Pipe enforcement flag.
Pipe object debugging counters.
MALLOC type for MAC label storage.
Pipe MAC label management routines, externalize/internalization/change
routines.
Pipe MAC access control checks.
Un-staticize functions called from mac_set_fd() when operating on a
pipe. Abstraction improvements in this space seem likely.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Network and socket enforcement toggles.
Counters for network objects (mbufs, ifnets, bpfdecs, sockets, and ipqs).
Label management routines for network objects.
Life cycle events for network objects.
Label internalization/externalization/relabel for ifnets, sockets,
including ioctl implementations for sockets, ifnets.
Access control checks relating to network obejcts.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
in mac_internal.h:
Sysctl tree declarations.
Policy list structure definition.
Policy list variables (static, dynamic).
mac_late flag.
Enforcement flags for process, vm, which have checks in multiple files.
mac_labelmbufs variable to drive conditional mbuf labeling.
M_MACTEMP malloc type.
Debugging counter macros.
MAC Framework infrastructure primitives, including policy locking
primitives, kernel label initialization/destruction, userland
label consistency checks, policy slot allocation.
Per-object interfaces for objects that are internalized and externalized
using system calls that will remain centrally defined: credentials,
pipes, vnodes.
MAC policy composition macros: MAC_CHECK, MAC_BOOLEAN, MAC_EXTERNALIZE,
MAC_INTERNALIZE, MAC_PERFORM.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
vm_pageout_scan(). Rationale: I don't like leaving a busy page in the
cache queue with neither the vm object nor the vm page queues lock held.
- Assert that the page is active in vm_pageout_page_stats().
Until we can have perfect knowledge that all callers above us think it's okay
for us to sleep, releasing *our* locks of course, we don't dare try and sleep.
in connection with Marvell based SATA->PATA dongles.
The problem was caused by a combination of things working
together to make it hard to spot...
The ATA driver has always started the ATA command, then build
the SG list for DMA and then finally started the DMA engine.
While this is according to specs, it poses a potential
problem as some controllers apparently do not allow for unlimitted
time between starting the ATA command and starting the DMA engine.
At about the same time as ATAng was committed there were lots
of other changes applied, some of which was locking in parts
that causes the busdma load functions to take significantly
longer to load the SG list.
This pushed the time spent between starting the ATA command and
starting the DMA engine over the hill for some controllers
(especially the Silicon Image DS3112a) and caused what looked
like lost interrupts.
The solution is to get all the SG list work or rather all
busdma related stuff done before we even try to start anything.
This has the nice side effect of seperating busdma out the
way it should be, so the working of the ATA machinery is not
cluttered up with busdma droppings, making the code easier
to read and understand.
sysctl that a given variable is tunable.
Also added is CTLFLAG_RDTUN, which is CTLFLAG_RD|CTLFLAG_TUN; TUN does
not always imply read-only, so RDTUN should be used where RD was used
before.
for dev_strategy() use.
Retire bio_driver[12] (aliases for b_io.bio_driver[12]) these fields are
reserved for device driver use and can as such never have any interest
in the buf end of things.
- Return NULL instead of returning memory outside of the stackgap
in stackgap_alloc() (FreeBSD-SA-00:42.linux)
- Check for stackgap_alloc() returning NULL in svr4_emul_find(),
and clean_pipe().
- Avoid integer overflow on large nfds argument in svr4_sys_poll()
- Reject negative nbytes argument in svr4_sys_getdents()
- Don't copy out past the end of the struct componentname
pathname buffer in svr4_sys_resolvepath()
- Reject out-of-range signal numbers in svr4_sys_sigaction(),
svr4_sys_signal(), and svr4_sys_kill().
- Don't malloc() user-specified lengths in show_ioc() and
show_strbuf(), place arbitrary limits instead.
- Range-check lengths in si_listen(), ti_getinfo(), ti_bind(),
svr4_do_putmsg(), svr4_do_getmsg(), svr4_stream_ti_ioctl().
Some fixes obtain from OpenBSD.
pick up the DEVFS inode number from the dev_t and find our directory
entry from that, we don't need to scan the directory to find it.
This also solves an issue with on-demand devices in subdirectories.
Submitted by: cognet
by libguile that needs to know the base of the RSE backing store. We
currently do not export the fixed address to userland by means of a
sysctl so user code needs to hardcode it for now. This will be revisited
later.
The RSE backing store is now at the bottom of region 4. The memory stack
is at the top of region 4. This means that the whole region is usable
for the stacks, giving a 61-bit stack space.
Port: lang/guile (depended of x11/gnome2)
chain passed into dc_encap, which dc_start was unaware of. This caused
the old (now invalid) mbuf to be passed to BPF_MTAP.
Spotted by: Kenjiro Cho <kjc@csl.sony.co.jp>
is non-free. (More checks can/should be added in the future.)
Use M_ASSERTVALID in BPF_MTAP so that we catch when freed mbufs are
passed in, even if no bpf listeners are active.
Inspired by a bug in if_dc caught by Kenjiro Cho.
rijndael_blockDecrypt() as both input and output.
This property is important because inside rijndael we can get away
with allocating just a 16 byte "work" buffer on the stack (which
is very cheap), whereas the calling code would need to allocate the
full sized buffer, and in all likelyhood would have to do so with
an expensive malloc(9).
table, acquiring the necessary locks as it works. It usually returns
two references to the new descriptor: one in the descriptor table
and one via a pointer argument.
As falloc releases the FILEDESC lock before returning, there is a
potential for a process to close the reference in the file descriptor
table before falloc's caller gets to use the file. I don't think this
can happen in practice at the moment, because Giant indirectly protects
closes.
To stop the file being completly closed in this situation, this change
makes falloc set the refcount to two when both references are returned.
This makes life easier for several of falloc's callers, because the
first thing they previously did was grab an extra reference on the
file.
Reviewed by: iedowse
Idea run past: jhb
to the object's type field and the call to vm_pageout_flush() are
synchronized.
- The above change allows for the eliminaton of the last parameter
to vm_pageout_flush().
- Synchronize access to the page's valid field in vm_pageout_flush()
using the containing object's lock.
- Specifying VM_MAP_WIRE_HOLESOK should not assume that the start
address is the beginning of the map. Instead, move to the first
entry after the start address.
- The implementation of VM_MAP_WIRE_HOLESOK was incomplete. This
caused the failure of mlockall(2) in some circumstances.
Use EP_{READ,WRITE}{,_MULTI}_{1,2,4} instead. I've had several people
submit patches like this over the years of varying qualities, markm
being the last. The names were chosen in consulation with mdodd on
irc.
I've tested this with only PCMCIA cards: 3CCE589EC and 3CCSH572BT.
I've not tried with my more extensive ISA, EISA and cbus collection.
Reviewed by: mdodd
the point where it being a macro is no longer sensible, and it will
only be more so in days to come.
BIO_STRATEGY() is now only used from DEV_STRATEGY() and should not
be used directly anymore.
Put the contents of both in the new function dev_strategy() and
make DEV_STRATEGY() call that function.
In addition, this allows us to make the rather magic bufdonebio()
helper function static.
This alse saves hunderedandsome bytes of code in a typical kernel.
Though this is still incomplete and has some missing features such as
exclusive login and event notification, it may be enough for someone
who wants to play with it.
This driver is supposed to work with firewire(4), targ(4) of CAM(4)
and scsi_target(8) which can be found in /usr/share/example/scsi_target.
This driver doesn't require sbp(4) which implements initiator mode.
Sample configuration:
Kernel: (you can use modules as well)
device firewire
device scbus
device targ
device sbp_targ
After reboot:
# mdconfig -a -t malloc -s 10m
md0
# scsi_target 0:0:0 /dev/md0
(Assuming sbp_targ0 on scbus0)
You should find the 10MB HDD on FreeBSD/MacOS X/WinXP or whatever connected
to the target using FireWire.
Manpage is not finished yet.
- Change type of target->luns to allocate an array of LUNs dynamically.
This allows targets to change their number of LUNs after each bus reset.
- Serialize ORB POINTER command for each LUN.
- Improve debug messages.
definition structure. Define one flag, CN_FLAG_NODEBUG, which
indicates the console driver cannot be used in the context of the
debugger. This may be used, for example, if the console device
interacts with kernel services that cannot be used from the
debugger context, such as the network stack. These drivers are
skipped over for calls to cn_checkc() and cn_putc(), and the
calling function simply moves on to the next available console.
- Correct the logic for the AIF array index pointers so that correct slot is
always looked at.
- Copy the full FIB payload size when copying AIF's, not just the first 64
bytes.
Thanks to Mirapoint, Inc, for pointing these problems out and offering a
solution.
a fair bit of difference to the power consumption and lets my cpu cool
down enough for the temperature sensitive fan controller to completely
stop the cpu fan at times.
halt state that minimizes power consumption while still preserving
cache and TLB coherency. Halting the processor is not conditional at
this time. Tested with UP and SMP kernels.
address has been changed when PFIL_HOOKS is enabled and, if it has,
arrange for the proper action by ip*_forward.
Submitted by: Pyun YongHyeon
Supported by: FreeBSD Foundation
address has been changed when PFIL_HOOKS is enabled and, if it has,
arrange for the proper action by ip*_forward.
Supported by: FreeBSD Foundation
Submitted by: Pyun YongHyeon
Xcpustop(). %es is used in at least the call to savectx() when savectx()
calls bcopy(), so not loading it was fatal if a stop IPI interrupts
user mode.
This reduces bugs starting and stopping CPUs for debuggers. CPUs are
stopped mainly in kdb_trap() and cpu_reset(). At reset time there is
a good chance that all the CPUs are in the kernel, so the bug was
probably harmless then.
classes and if a method is not found in a given class, its base classes
are searched (in the order they were declared). This search is recursive,
i.e. a method may be define in a base class of a base class.
* Change the kobj method lookup algorithm to one which is SMP-safe. This
relies only on the constraint that an observer of a sequence of writes
of pointer-sized values will see exactly one of those values, not a
mixture of two or more values. This assumption holds for all processors
which FreeBSD supports.
* Add locking to kobj class initialisation.
* Add a simpler form of 'inheritance' for devclasses. Each devclass can
have a parent devclass. Searches for drivers continue up the chain of
devclasses until either a matching driver is found or a devclass is
reached which has no parent. This can allow, for instance, pci drivers
to match cardbus devices (assuming that cardbus declares pci as its
parent devclass).
* Increment __FreeBSD_version.
This preserves the driver API entirely except for one minor feature used
by the ISA compatibility shims. A workaround for ISA compatibility will
be committed separately. The kobj and newbus ABI has changed - all modules
must be recompiled.
rounding errors. This was the source of the majority of the
interactivity problems. Reintroduce the old algorithm and its XXX.
- Up the interactivity threshold to 30. It really could stand to be even
a tiny bit higher.
- Let the sleep and run time accumulate up to 5 seconds of history rather
than two. This helps stop XFree86 from becoming non-interactive during
bursts of activity.
trashed after being freed. This has caused several panics including
kern/42277 related to soft updates. Jim Kuhn tracked the problem
down to ipfw limit rule processing. In the expiry of dynamic rules,
it is possible for an O_LIMIT_PARENT rule to be removed when it still
has live children. When the children eventually do expire, a pointer
to the (long gone) parent is dereferenced and a count decremented.
Since this memory can, and is, allocated for other purposes (in the
case of kern/42277 an inodedep structure), chaos ensues. The offset
in question in inodedep is the offset of the 16 bit count field in
the ipfw2 ipfw_dyn_rule.
Submitted by: Jim Kuhn <jkuhn@sandvine.com>
Reviewed by: "Evgueni V. Gavrilov" <aquatique@rusunix.org>
Reviewed by: Ben Pfountz <netprince@vt.edu>
MFC after: 1 week
passes the fdidx from VOP_OPEN down.
This is for all I know the final API for this functionality, but
the locking semantics for messing with the filedescriptor from
the device driver are not settled at this time.
Discussed in from [FreeBSD-tech-jp 3396] to [FreeBSD-tech-jp 3407]
at FreeBSD-tech-jp@jp.freebsd.org.
NOTE: We must put ed_probe_SIC() function into if_ed_isa.c because
this is a bus dependent code. But the ed driver code is not
separated explicitly whether it is bus dependent or independent
now.
Refer to: http://plaza17.mbn.or.jp/~chi/myprog/FreeBSD/sicat.html
Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata)
every page. If the source entry was read-only, one or more wired pages
could be in backing objects.
- vm_fault_copy_entry() should not set the PG_WRITEABLE flag on the page
unless the destination entry is, in fact, writeable.
elevated either due to priority propagation or because we're in the
kernel in either case, put us on the current queue so that we dont
stop others from using important resources. At some point the priority
elevations from sleeping in the kernel should go away.
- Remove an optimization in sched_userret(). Before we would only set
NEEDRESCHED if there was something of a higher priority available. This
is a trivial optimization and it breaks priority propagation because it
doesn't take threads which we may be blocking into account. Notice that
the thread which is blocking others gets up to one tick of cpu time before
we honor this NEEDRESCHED in sched_clock().
lock around a call to the original function. Make the timeout
function in callout_reset() use the wrapped function to avoid a
lock assertion panic.
Reviewed by: sam
Reported by: cgiordano@ids.net
sigreturn() ABI and the signal context on the stack.
Make the trapframe (and its shadows in the ucontext and sigframe etc)
8 bytes larger in order to preserve 16 byte stack alignment for the
following C code calls. I could have done some padding after the
trapframe was saved, but some of the C code still expects an argument of
'struct trapframe'. Anyway, this gives me a spare field that can be used
to store things like 'partial trapframe' status or something else in
the future.
The runtime impact is fairly small, *except* for threaded apps and things
that decode contexts and the signal stack (eg: cvsup binary). Signal
delivery isn't too badly affected because the kernel generates the
sigframe that sigreturn uses after the handler has been called.
The size of mcontext_t and struct sigframe hasn't changed. Only
the last few fields (sc_eip etc) got moved a little and I eliminated
a spare field. mc_len/sc_len did change location though so the
sanity checks there will still trap it.
- Make multicast work
- Fix (some of) the watchdog timeouts after card reset
- Add support for CE2, CEM28 and CEM33 cards
- General code cleanup
Any card that worked previously should still work, as well as a lot that
didn't.
The driver is not yet style(9) compliant; those changes are forthcoming,
once the functional changes are done.
PR: kern/50644
Reviewed by: imp
Approved by: imp
I changed. That is never a good sign.
1) only map 1 page at address zero, not 4096 pages
2) page 1 starts at address 4096 (PAGE_SIZE) not 4095 (PAGE_MASK). I
don't even want to think what the pte's looked like.
3) subtract the r/o page group start address from the end before
converting it to a count. Otherwise an extra page is mapped.
If you were affected by this, the symptoms of this was a hang at boot
after the spinner. Sorry folks. :-(
"You broke my laptop!" by: sam
accesses softc after it is freed. Use a different malloc type for
softc than the rest of the bus code to make it more clear when these
things happen that it is the driver that's at fault, not the bus code.
Suggested by: sam and/or phk (I think)
timeout would continue to happen: boom! Fix this[*] by timing out earlier.
[*] almost fixes the race on unload: wi_inquire could be running when
untimeout is called, and there's no way to know when it has actually
returned. This race is very rare and hard to lose.
Submitted by: scottl
seeded with arc4random rather than calling arc4random for each
packet. Note this is the same algorithm used to select the IV when
doing WEP on the host.
o don't grab the mutex at the top of ath_detach; it does nothing
useful
o deal with entry to ath_ioctl during detach to disable promiscuous
mode as a result of calling bpfdetach2: cannot call ath_init when
the device is marked invalid as the code isn't prepared to deal
with it (in particular by that time the hal reference may have
been yanked)
change ath_rate_ctl_reset to handle transition from station
mode to adhoc mode; was not resetting the initial xmit rate
causing outbound frames to be dicarded
use because a kernel thread is borrowing it. The borrowed page table
can change spontaneously, making any dependence on its continued use
subject to a race condition.
- _pmap_unwire_pte_hold() cannot use pmap_is_current(): If a change is
made to a page table page mapping for a borrowed page table, the TLB
must be updated.
In collaboration with: tegge
you on the current queue. In the future, it would be nice if priority
propagation could deterministicly pluck a thread off of the next queue
and put it on the current queue. Until then this hack stops us from
holding up our entire current queue, including interrupt handlers, while
a thread on the next queue is blocked while holding Giant.
- Inherit our pctcpu information from our parent.
- correct signedness mixups.
- log fix.
- preparation for 64bit sequence number.
introduce SA id (unique ID for SA - SPI is useless as duplicated
SPI is allowed)
- no need to malloc/free cksum buffer.
Obtained from: KAME
kqueue write events on a socket and you regularly create tons of pipes
which overwrites the structure causing a panic when removing the knote
from the list. If the peer has gone away (and it's a write knote), then
don't bother trying to remove the knote from the list.
Submitted by: Brian Buchanan and myself
Obtained from: nCircle
- Return NULL instead of returning memory outside of the stackgap
in stackgap_alloc() (FreeBSD-SA-00:42.linux)
- Check for stackgap_alloc() returning NULL in ibcs2_emul_find();
other calls to stackgap_alloc() have not been changed since they
are small fixed-size allocations.
- Replace use of strcpy() with strlcpy() in exec_coff_imgact()
to avoid buffer overflow
- Use strlcat() instead of strcat() to avoid a one byte buffer
overflow in ibcs2_setipdomainname()
- Use copyinstr() instead of copyin() in ibcs2_setipdomainname()
to ensure that the string is null-terminated
- Avoid integer overflow in ibcs2_setgroups() and ibcs2_setgroups()
by checking that gidsetsize argument is non-negative and
no larger than NGROUPS_MAX.
- Range-check signal numbers in ibcs2_wait(), ibcs2_sigaction(),
ibcs2_sigsys() and ibcs2_kill() to avoid accessing array past
the end (or before the start)
parameter in the read and write case dereferenced an unitialized
pointer and can't possibly ever have catched an actual invalid
argument.
This was apparently true for the read/write and getconf cases. The
latter does not even receive the paramter that is to be verified.
I'm surprised that this did not cause kernel panics, but it seems
that the uninitialized local variable happens to contain data that
may be used as a pointer to memory that satisfies the test condition.
Make the code work as intended by moving the test inside the switch
case where the pointer has been properly initialized.
Since the read and write case shared just about all code (except
for the single call to PCIB_READ_CONFIG resp. PCIB_WRITE_CONFIG) I
have merged both cases.
Noticed by: trhodes@FreeBSD.org (Tom Rhodes)
- Allocate storage for uap->msg always because it is copyin()'ed in
native sendmsg().
- Convert sockopt level from Linux to FreeBSD after native recvmsg() calling.
- Some cleanups.
Tested with: Oracle 9i shared server connection mode.
MFC after: 1 week
o correct recursive locking when polling and in em_82547_move_tail
o destroy mutex on detach
o add EM_LOCK_ASSERT and similar macros for creating+deleteing the mtx
Submitted by: Daniel Eischen <eischen@vigrid.com>
beasts which are reported to exist in both Atmel and Prism2 flavours. In
particular, Itronix branded laptops have the Atmel part with an Intersil
radio.
Obtained from: NetBSD
from UWX_REG_MUMBLE to UWX_REG_AR_MUMBLE. Compatibility defines are
present in libuwx. Change the names here so that we don't depend on
compatibility defines.
Note that there's now an UWX_REG_PFS and an UWX_REG_AR_PFS and the
former is not a compatibility define for the latter AFAICT. Change
to UWX_REG_AR_PFS as that seems to be the one we need to handle.
all the fixes locally applied and submitted to the author. Not
included in BETA 5, but part of this import are:
o FreeBSD specific ifdefs to make this compile within a kernel.
These are limited to include directives and defines.
o Removal of unused variables, proper casts and initializations
to allow building with -Werror. This happens in code so has a
higher chance of causing future import conflicts but not enough
to worry about it.
I'm especially thankful that the author accepted the change to
replace DISABLE_TRACE with UWX_TRACE_ENABLE so that we can use it
in kernel config files without nasty mappings or indirections as
that would make the integration less perfect. Thanks Cary!
an uninitialized sysctl_ctx, using flag DA_FLAG_SCTX_INIT. This
prevents a panic encoutered with some umass units that probe correctly
but fail to attach. Same problem, and same fix, as scsi_cd.c rev. 1.86.
Reviewed by: njl, ken
pmap_copy_page() et al. to accept a vm_page_t rather than a physical
address. Also, this change will facilitate locking access to the vm page's
valid field.
has been initialized.
(cdsysctlinit): Set flag CD_FLAG_SCTX_INIT after sysctl_ctx has been
initialized.
This resolves a panic encountered when a cd drive is sucessfully probed
but fails to attach.
Reviewed by: ken
o minor optimization of cardbus_cis processing. Remove a bunch of generic
entries that are handled by generic.
o no longer need the card_get_type stuff.
This MIB specifies how many bus resets should be observed before the
lost device entry is removed. The default value is 3.
You can set this value to 0 if you want a SBP device to be detached from CAM
layer as soon as the device is physically detached like USB.
routine of its own, and allows us to move the indentation back two
layers making the code more readable.
delete a prototype that should have been killed years ago in pccardvar.h.
# adding quirks here is way harder than it needs to be. :-(
In unodered excution case, we cannot detect link-chain end only
by prev == NULL if lastest ORB is executed earlyer than the former
ORBs. Use ORB_LINK_DEAD flag for this case.
- Don't reset agent for management ORB.
- Improve debug messages.
Spotted by: sbp target mode
a long-time bug: vm_pager_get_pages() assumes that m[reqpage] contains a
valid page upon return from pgo_getpages(). In the case of the device
pager this page has been freed and replaced by a fake page. The fake page
is properly inserted into the vm object but m[reqpage] is left pointing
to a freed page. For now, update m[reqpage] to point to the fake page.
Submitted by: tegge
caused snapshot related problems.
- The vp can not be NULL here or we would panic in vfs_bio_awrite(). Stop
confusing the logic by checking for it in several places.
Submitted by: kirk and then rototilled by me to remove vp == NULL checks.
for 21143 based cards which use SIA mode.
This fixes 10mbit mode for ZNYX ZX346Q cards and other
21143 based cards.
PR: 32118
Submitted by: Rene de Vries <rene@tunix.nl>
Geert Jan de Groot <GeertJan.deGroot@tunix.nl>
Obtained from: BSDI
MFC after: 2 weeks
VOP_INACTIVE routines need not worry about their vnode getting
recycled if they block. Remove the code from nfs_inactive() that
used vget() to get an extra vnode reference that was held during
the nfs_vinvalbuf() call.
so make the code slightly more uniform. The vnode lock is acquired in
all cases and now the only difference between VCHR and other is we
call UFS_UPDATE instead of VOP_FSYNC().
Use pre-emption detection to avoid the need for wiring a userland buffer
when copying opaque data structures.
sysctl_wire_old_buffer() is now a no-op. Other consumers of this
API should use pre-emption detection to notice update collisions.
vslock() and vsunlock() should no longer be called by any code
and should be retired in subsequent commits.
Discussed with: pete, phk
MFC after: 1 week
go away in due course. Involuntary pre-emption means that we can't count
on wiring of pages alone for consistency when performing a SYSCTL_OUT()
bigger than PAGE_SIZE.
Discussed with: pete, phk
- Slightly rewrite the fsync loop to be more lock friendly. We must
acquire the vnode interlock before dropping the mnt lock. We must
also check XLOCK to prevent vclean() races.
- Use LK_INTERLOCK in the vget() in ffs_sync to further prevent vclean()
races.
- Use a local variable to store the results of the nvp == TAILQ_NEXT
test so that we do not access the vp after we've vrele()d it.
- Add an XXX comment about UFS_UPDATE() not being protected by any lock
here. I suspect that it should need the VOP lock.
LK_RETRY either, we don't want this vnode if it turns into another.
- Remove the code that checks the mount point after acquiring the lock
we are guaranteed to either fail or get the vnode that we wanted.
- In vtryrecycle() try to vgonel the vnode if all of the previous checks
passed. We won't vgonel if someone has either acquired a hold or usecount
or started the vgone process elsewhere. This is because we may have been
removed from the free list while we were inspecting the vnode for
recycling.
- The VI_TRYLOCK stops two threads from entering getnewvnode() and recycling
the same vnode. To further reduce the likelyhood of this event, requeue
the vnode on the tail of the list prior to calling vtryrecycle(). We can
not actually remove the vnode from the list until we know that it's
going to be recycled because other interlock holders may see the VI_FREE
flag and try to remove it from the free list.
- Kill a bogus XXX comment. If XLOCK is set we shouldn't wait for it
regardless of MNT_WAIT because the vnode does not actually belong to
this filesystem.
purge, the purge in vclean, and the filesystems purge, we had 3 purges
per vnode.
- Move the insmntque(vp, 0) to vclean() so that we may remove it from the
two vgone() functions and reduce the number of lock operations required.
whether or not the sync failed. This could potentially get set between
the time that we VOP_UNLOCK and VI_LOCK() but the race would harmelssly
lead to the sync being delayed by an extra 30 seconds. If we do not move
the vnode it could cause an endless loop if it continues to fail to sync.
- Use vhold and vdrop to stop the vnode from changing identities while we
have it unlocked. Other internal vfs lists are likely to follow this
scheme.
- Create a new function, vgonechrl(), which performs vgone for an in-use
character device. Move the code from vflush() that did this into
vgonechrl().
- Hold the xlock across the entirety of vgonel() and vgonechrl() so that
at no point will an invalid vnode exist on any list without XLOCK set.
- Move the xlock code out of vclean() now that it is in the vgone*()
functions.
work in, but we had it mapped read-only. While this has always been the
case, the PG_PS enable hack hid it and the apm bios code ended up taking
advantage of it.
This is so that we may grab the interlock while still holding the
sync_mtx. We have to VI_TRYLOCK() because in all other cases the lock
order runs the other way.
- If we don't meet any of the preconditions, reinsert the vp into the
list for the next second.
- We don't need to panic if we fail to sync here because each FSYNC
function handles this case. Removing this redundant code also
simplifies locking.
will not actually be set even though we're calling sosetopt. sosetopt
calls down to a single ctloutput function if the name or level is
implemented by a specific protocol.
Submitted by: pete@isilon.com
fail. Remove the panic from that case and document why it might fail.
- Document the reason for calling cache_purge() on a newly created vnode.
- In insmntque() order the operations so that we can call mtx_unlock()
one fewer times. This makes the code somewhat clearer as well.
- Add XXX comments in sched_sync() and vflush().
- In vget(), do not sleep while waiting for XLOCK to clear if LK_NOWAIT is
set.
- In vclean() we don't need to acquire a lock around a single TAILQ_FIRST
call. It's ok if we race here, the vinvalbuf will just do nothing.
- Increase the scope of the lock in vgonel() to reduce the number of lock
operations that are performed.
we release the mntvnode_mtx.
- Call vgonel() directly instead of going through vrecycle() since we own
the interlock now.
- Remove a few cases where we locked the interlock just so that we could
call VOP_UNLOCK with interlock held.
mntvnode_mtx.
- Use a local variable to store the results of the test to see if the
next vnode on the mount list has changed. This is so that we no longer
acess the vnode after we vput() it.
stack trace supplied by phk, I now understand what's going on here. The
check for VI_XLOCK stops us from calling vinvalbuf once the vnode has been
partially torn down in vclean(). It is not clear that this would cause
a problem. Document this in nfs_bio.c, which is where the other two
filesystems copied this code from.
I do not yet understand why, but apm *depended* on the fact that the old
PSE code caused the first 1MB of ram to be mapped read/write because it
was in the same 4MB page as the kernel text+data+bss blob.
If anybody ever tried DISABLE_PSE before, apm would not work.
If your cpu did not have PSE, apm would not work there either (eg: 486).
This bug has been around for a Very Long Time.
The Pentium-4-fix commits did not emulate this unintended side effect of
the PSE post-early-boot fixup, and thus apm blew up. I've added a hack to
emulate the bug until either apm is fixed or we set fire to our bridges.
This is bad though because it gives kernel mode code the opportunity
to accidently write to the first few megs of the general page pool
which is remapped at KERNBASE. It needs to be fixed properly.
that covers updates to the contents. Note this is separate from holding
a reference and/or locking the routing table itself.
Other/related changes:
o rtredirect loses the final parameter by which an rtentry reference
may be returned; this was never used and added unwarranted complexity
for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
we assume the parent will remain as long as the clone; doing this avoids
a circularity in locking during delete
o convert some timeouts to MPSAFE callouts
Notes:
1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
applications cannot/do-no know about mutex's. Doing this requires
that the mutex be the last element in the structure. A better solution
is to introduce an externalized version of struct rtentry but this is
a major task because of the intertwining of rtentry and other data
structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
work to eliminate many held references. If not these will be resolved
prior to release.
3. ATM changes are untested.
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS (partly)
A small helper function pmap_is_prefaultable() is added. This function
encapsulate the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.
been widely deploy and that's causing us a lot of pain. Back out the
last commit for a few weeks so that we can lessen the support load in
current@ asking why they can't build kernels anymore. Instructions in
UPDATING have been updated, but this should be more effective.
Revert the reverting: November 1st, 2003
quantities on every other architecture.) This change is required in order
to move pmap_prefault() out of the pmap and into the machine-independent
layer.
any queued packets for the isr, process those packets before the newly
submitted packet, maintaining ordering of all packets being delivered
to the netisr. Remove the bypass counter since we don't bypass anymore.
Leave the comment about possible problems and options since later
performance optimization may change the strategy for addressing ordering
problems here.
Specifically, this maintains the strong isr ordering guarantee; additional
parallelism and lower latency may be possible by moving to weaker
guarantees (per-interface, for example). We will probably at some point
also want to remove the one instance netisr dispatch limit currently
enforced by a mutex, but it's not clear that's 100% safe yet, even in
the netperf branch.
Reviewed by: sam, others
o move route_cb to be private to rtsock.c
o replace global static route_proto by locals
o eliminate global #define shorthands for info references
o remove some register decls
o ansi-fy function decls
o move items to be close in scope to their usage
o add rt_dispatch function for dispatching the actual message
o cleanup tangled logic for doing all-but-me msg send
Support by: FreeBSD Foundation
RTF_STATIC routes. Do not check for RTF_HOST so as to avoid being DoSed
when an RTF_GENMASK route exists in the table.
Add a more verbose comment about exactly what this code does.
Submitted by: ru
frame marker) and the syscall stub frame info in the trap frame.
Previously we stored the stub frame info in (rp,pfs) and the
caller frame info in (iip,cfm). This ends up being suboptimal
for the following reasons:
1. When we create a new context, such as for an execve(2), we had
to set the (rp,pfs) pair for the entry point when using the
syscall path out of the kernel but we need to set the (iip,cfm)
pair when we take the interrupt way out. This is mostly just
an inconsistency from the kernel's point of view, but an ugly
irregularity from gdb(1)'s point of view.
2. The getcontext(2) and setcontext(2) syscalls had to swap the
(rp,pfs) and (iip,cfm) pairs to make the context compatible
with one created purely in userland.
Swapping the (rp,pfs) and (iip,cfm) pairs is visible to signal
handlers that actually peek at the mcontext_t and to gdb(1).
Since this change is made for gdb(1) and we don't care about
signal handlers that peek at the mcontext_t because we're still
a tier 2 platform, this ABI breakage is academic at this moment
in time.
Note that there was no real reason to save the caller frame info
in (iip,cfm) and the stub frame info in (rp,pfs).
validating the offset within a given memory buffer before handing the
real work off to uiomove(9).
Use uiomove_frombuf in procfs to correct several issues with
integer arithmetic that could result in underflows/overflows. As a
side-effect, the code is significantly simplified.
Add additional sanity checks when computing a memory allocation size
in pfs_read.
Submitted by: rwatson (original uiomove_frombuf -- bugs are mine :-)
Reported by: Joost Pol <joost@pine.nl> (integer underflows/overflows)
And many changes.
* all
- Major change of struct fw_xfer.
o {send,recv}.buf is splitted into hdr and payload.
o Remove unnecessary fields.
o spd is moved under send and recv.
- Remove unnecessary 'volatile' keyword.
- Add definition of rtcode and extcode.
* firewire.c
- Ignore FWDEVINVAL devices in fw_noderesolve_nodeid().
- Check the existance of the bind before call STAILQ_REMOVE().
- Fix bug in the fw_bindadd().
- Change element of struct fw_bind for simplicity.
- Check rtcode of response packet.
- Reduce split transaction timeout to 200 msec.
(100msec is the default value in the spec.)
- Set watchdog timer cycle to 10 Hz.
- Set xfer->tv just before calling fw_get_tlabel().
* fwohci.c
- Simplifies fwohci_get_plen().
* sbp.c
- Fix byte order of multibyte scsi_status informations.
- Split sbp.c and sbp.h.
- Unit number is not necessary for FIFO¤ address.
- Reduce LOGIN_DELAY and SCAN_DELAY to 1 sec.
- Add some constants defineded in SBP-2 spec.
* fwmem.c
- Introduce fwmem_strategy() and reduce memory copy.
fd_cmask field in the file descriptor structure for the first process
indirectly from CMASK, and when an fd structure is initialized before
being filled in, and instead just use CMASK. This appears to be an
artifact left over from the initial integration of quotas into BSD.
Suggested by: peter
avoid problems with some Pentium 4 cpus and some older PPro/Pentium2
cpus. There are several problems, some documented in Intel errata.
This patch:
1) moves the kernel to the second page in the PSE case. There is an
errata that says that you Must Not point a 4MB page at physical
address zero on older cpus. We avoided bugs here due to sheer luck.
2) sets up PSE page tables right from the start in locore, rather than
trying to switch from 4K to 4M (or 2M) pages part way through the boot
sequence at the same time that we're messing with PG_G.
For some reason, the pmap work over the last 18 months seems to tickle
the problems, and the PAE infrastructure changes disturb the cpu
bugs even more.
A couple of people have reported a problem with APM bios calls during
boot. I'll work with people to get this resolved.
Obtained from: bmilekic
(direct dispatch) in interrupt threads when the netisr in question
isn't already active. If a netisr is already active, or direct
dispatch is already in progress, we queue the packet for later
delivery. Previously, this option was disabled by default. I have
measured 20%+ performance improvements in IP packet forwarding with
this enabled.
Please report any problems ASAP, especially relating to stack depth or
out-of-order packet processing.
Discussed with: jlemon, peter
Sponsored by: DARPA, Network Associates Laboratories
was that accessing the status reg could occour too fast, confusing
the logic in the flash part. Could not have been located without:
HW donated by: Jonas Bülow <jonas@servicefactory.se>
prior to invalidating the TLB to be certain that the processor doesn't
keep a cached copy.
Discussed with: pete
Paniced: tegge
Pointy Hat: The usual spot
evaluating them at compile time rather than at run time. As for x86
and amd64, this requires GCC and it's enabled only if __OPTIMIZE__ is
defined (ie, if at least -O is used).
Reviewed by: jake
Their purpose is to give explicit hints to the compiler to judge
the likelyhood of a test to succeed or fail. Not all architectures
have support for such optimizations, but for those who do, it can
give a nice performance improvement in hot loops.
Obviously, this should be used very rarely in very specific code.
Reviewed by: peter
Obtained from: OpenBSD
AcpiEnterSleepState() calling a long AcpiOsStall() with interrupts
disabled. This fix will instead be added to ACPI-CA.
PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
the TLB and ~1600 if it is not. Therefore, it is more effecient to
invalidate the TLB after operations that use CMAP rather than before.
- So that the tlb is invalidated prior to switching off of a processor, we
must change the switchin functions to switchout functions.
- Remove td_switchout from the thread and move it to the x86 pcb.
- Move the code that calls switchout into swtch.s. These changes make this
optimization truely x86 specific.
then the mbuf has been consumed by a hook; otherwise beware of a null
mbuf return (gack). In particular the bridge was doing the wrong thing.
While in the ipv6 code make it's handling of pfil_run_hooks identical
to netbsd.
Pointed out by: Pyun YongHyeon <yongari@kt-is.co.kr>
change 38496
o add ipsec_osdep.h that holds os-specific definitions for portability
o s/KASSERT/IPSEC_ASSERT/ for portability
o s/SPLASSERT/IPSEC_SPLASSERT/ for portability
o remove function names from ASSERT strings since line#+file pinpints
the location
o use __func__ uniformly to reduce string storage
o convert some random #ifdef DIAGNOSTIC code to assertions
o remove some debuggging assertions no longer needed
change 38498
o replace numerous bogus panic's with equally bogus assertions
that at least go away on a production system
change 38502 + 38530
o change explicit mtx operations to #defines to simplify
future changes to a different lock type
change 38531
o hookup ipv4 ctlinput paths to a noop routine; we should be
handling path mtu changes at least
o correct potential null pointer deref in ipsec4_common_input_cb
chnage 38685
o fix locking for bundled SA's and for when key exchange is required
change 38770
o eliminate recursion on the SAHTREE lock
change 38804
o cleanup some types: long -> time_t
o remove refrence to dead #define
change 38805
o correct some types: long -> time_t
o add scan generation # to secpolicy to deal with locking issues
change 38806
o use LIST_FOREACH_SAFE instead of handrolled code
o change key_flush_spd to drop the sptree lock before purging
an entry to avoid lock recursion and to avoid holding the lock
over a long-running operation
o misc cleanups of tangled and twisty code
There is still much to do here but for now things look to be
working again.
Supported by: FreeBSD Foundation
file for vnode mappings. Note that this uses vn_fullpath() and may
be somewhat unreliable, although not too unreliable for shared
libraries. For non-vnode mappings, just print "-" for the field.
Obtained from: TrustedBSD Projects
Sponsored by: DARPA, AFRL, Network Associates Laboratories
make sure we return any allocated space to the drive. This should get
rid of a number of inconsistencies (hopefully all) that have been seen
after configuration errors.
even could call VOP_REVOKE() on vnodes associated with its dev_t's
has originated, but it stops right here.
If there are things people belive destroy_dev() needs to learn how to
do, please tell me about it, preferably with a reproducible test case.
Include <sys/uio.h> in bluetooth code rather than rely on <sys/vnode.h>
to do so.
The fact that some of the USB code needs to include <sys/vnode.h>
still disturbs me greatly, but I do not have time to chase that.
from fiddling with CS_TTGO since fiddling with CS_TTGO was removed in
rev.1.218 of the i386/isa version (which was merged with loss of history
in rev.1.223 of this version).
some symbols in X_db_search_symbol(). Reject the same symbols that
rev.1.13 did (all except STT_OBJECT and STT_FUNC), except don't reject
typeless symbols. This keeps the typeless symbols in non-verbosely
written assembler code visible, but makes file symbols invisible. ELF
file symbols have type STT_FILE and value 0, so this stops small values
and offsets sometimes being displayed in terms of the first file symbol
in the kernel (usually device_if.c). I think it rejects some other
unwanted symbols (small absolute symbols for things like struct offsets).
It may reject some wanted symbols (large absolute symbols for addresses
like PTmap).
about because we're still tier 2 and our current compiler, as well
as future compilers will not support varargs. This is mostly a
no-op in practice, because <sys/varargs.h> should already cause
compile failures.
use the ability on ia64 to map the register stack. The orientation of
the stack (i.e. its grow direction) is passed to vm_map_stack() in the
overloaded cow argument. Since the grow direction is represented by
bits, it is possible and allowed to create bi-directional stacks.
This is not an advertised feature, more of a side-effect.
Fix a bug in vm_map_growstack() that's specific to rstacks and which
we could only find by having the ability to create rstacks: when
the mapped stack ends at the faulting address, we have not actually
mapped the faulting address. we need to include or cover the faulting
address.
Note that at this time mmap(2) has not been extended to allow the
creation of rstacks by processes. If such a need arises, this can
be done.
Tested on: alpha, i386, ia64, sparc64
do exactly the same as vop_nopoll() for consistency and put a
comment in the two pointing at each other.
Retire seltrue() in favour of no_poll().
Create private default functions in kern_conf.c instead of public
ones.
Change default strategy to return the bio with ENODEV instead of
doing nothing which would lead the bio stranded.
Retire public nullopen() and nullclose() as well as the entire band
of public no{read,write,ioctl,mmap,kqfilter,strategy,poll,dump}
funtions, they are the default actions now.
Move the final two trivial functions from subr_xxx.c to kern_conf.c
and retire the now empty subr_xxx.c
This is just a cleanup here (modulo rev.1.108 of kern/tty.c), since the
input speed can be different from to output speed and extra code to
handle both speeds naturally handled all cases.
provide no methods does not make any sense, and is not used by any
driver.
It is a pretty hard to come up with even a theoretical concept of
a device driver which would always fail open and close with ENODEV.
Change the defaults to be nullopen() and nullclose() which simply
does nothing.
Remove explicit initializations to these from the drivers which
already used them.
- Removed conversion of a zero input speed to the output speed. This
has been done better in ttioctl() since rev.1.108 of kern/tty.c
almost 5 years ago. comparam() did the conversion incompletely for
the case where the output speed is also zero. It had complications
to avoid using zero speeds, but would still have used a zero input
speed for setting watermarks if kern/tty.c had passed one.
- Never permit the input speed to be different from the output speed.
There was no validity check on the input speed for the case of a zero
output speed. Then we didn't change the physical speeds, but we used
the unvalidated input speed for setting watermarks and didn't return
an error, so ttioctl() stored the unvalidated input speed in the tty
struct where it could cause problems later.
- Removed complications that were to avoid using a divisor of 0. The
divisor is now always valid if the speed is accepted.
cd_setreg() were still using !(read_eflags() & PSL_I) as the condition
for the lock hidden by COM_LOCK() (if any) being held. This worked
when spin mutexes and/or critical_enter() used hard interrupt disablement,
but it has caused recursion on the non-recursive mutex com_mtx since
all relevant interrupt disablement became soft. The recursion is
harmless unless there are other bugs, but it breaks an invariant so
it is fatal if spinlocks are witnessed.
struct msdosfsmount so that this file has the same prerequisites as
it used to. The new prerequistite was a meta-style bug. It required
many style bugs (unsorted includes ...) elsewhere.
Formatted prototypes in KNF. Resisted urge to sort all the prototypes,
to minimise differences with NetBSD. (NetBSD has reformatted the
prototypes but has not sorted them and still uses __P(()).)
node lock while sending a management frame as this will potentially
result in a LOR with a driver lock. This doesn't happen for the
Atheros driver but does for the wi driver. Use a generation number
to help process each node once when scanning the node table and
drop the node lock if we need to timeout a node and send a frame.
AP has basic rates that we do not support then ignore them instead
of marking the rate set in error.
This fixes an 11b station associating with an 11g/b AP.
are allowed by Windows (ref: MS KB article 120138).
XXX From my reading of the CIFS specification, it's not clear that
clients need to validate filenames at all.
PR: 57123
Submitted by: Paul Coucher
MFC after: 1 month
in the loopback test in the probe. The delay was too short for consoles
at speeds lower than about 3200 bps. This shouldn't have caused many
problems, since such low speeds are rare and the probe is forced to
succeed for consoles.
consdev structure.
If the consdev name is not set and we have a cn_dev, set the name
from there. Try to issue a printf about this, even though it may
not have a place to go.
Modify the sysctl related code to pick up the name from the consdev
instead.
Most of the actual use of the cn_dev field is merely to get the name,
and most of the actual initializations are bogusly using makedev()
because the probe/attach has not been completed.
Instead we will migrate console drivers to fill in the name and if
the driver needs it: the unit number, thereby avoiding the bogus
calls to makedev().
and the Z8530 drivers used the I/O address as a quick and dirty way to
determine which channel they operated on, but formalizing this by
introducing iobase is not a solution. How for example would a driver
know which channel it controls for a multi-channel UART that only has a
single I/O range?
Instead, add an explicit field, called chan, to struct uart_bas that
holds the channel within a device, or 0 otherwise. The chan field is
initialized both by the system device probing (i.e. a system console)
or it is passed down to uart_bus_probe() by any of the bus front-ends.
As such, it impacts all platforms and bus drivers and makes it a rather
large commit.
Remove the use of iobase in uart_cpu_eqres() for pc98. It is expected
that platforms have the capability to compare tag and handle pairs for
equality; as to determine whether two pairs access the same device or
not. The use of iobase for pc98 makes it impossible to formalize this
and turn it into a real newbus function later. This commit reverts
uart_cpu_eqres() for pc98 to an unimplemented function. It has to be
reimplemented using only the tag and handle fields in struct uart_bas.
Rewrite the SAB82532 and Z8530 drivers to use the chan field in struct
uart_bas. Remove the IS_CHANNEL_A and IS_CHANNEL_B macros. We don't
need to abstract anything anymore.
Discussed with: nyan
Tested on: i386, ia64, sparc64
aic7xxx_pci.c:
When performing our register test, be careful
to avoid resetting the chip when pausing the
controller. The test reads the HCNTRL register
and then writes it back with the PAUSE bit
explicitly set. If the last write to the controller
before our probe is to reset it, the CHIPRST
bit will still be set, so we must mask it off
before the PAUSE operation. On some chip versions,
we cannot access registers for a few 100us after
a reset, so this inadvertant reset was causing PCI
errors to occur on the read to check for paused
status.
Submitted by: gibbs
Reimplement pmap_release() such that it uses the page table rather than
the pte object to locate the page table directory pages. (Temporarily,
retain an assertion on the emptiness of the pte object.)
systems where the data/stack/etc limits are too big for a 32 bit process.
Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.
Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 heirarchy.
Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.
Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.
Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.
Instead, use EXCA_MEMREG_WIN_SHIFT which is the amount we shift the
bus address by to write into upper memory (eg above 24MB). Use the
latter in this case.
be gone in FreeBSD 6, so put BURN_BRIDGES around it. The TRB also
felt that if something better comes along sooner, it can be used to
replace this code.
Delayed by: BSDcon and subsequent disk crash.
o revamp IPv4+IPv6+bridge usage to match API changes
o remove pfil_head instances from protosw entries (no longer used)
o add locking
o bump FreeBSD version for 3rd party modules
Heavy lifting by: "Max Laier" <max@love2party.net>
Supported by: FreeBSD Foundation
Obtained from: NetBSD (bits of pfil.h and pfil.c)
attached network could exhaust kernel memory, and cause a system
panic, by sending a flood of spoofed ARP requests.
Approved by: jake (mentor)
Reported by: Apple Product Security <product-security@apple.com>
Skinny is the protocol used by Cisco IP phones to talk to Cisco Call
Managers. With this code, one can use a Cisco IP phone behind a FreeBSD
NAT gateway.
Currently, having the Call Manager behind the NAT gateway is not supported.
More information on enabling Skinny support in libalias, natd, and ppp
can be found in those applications' manpages.
PR: 55843
Reviewed by: ru
Approved by: ru
MFC after: 30 days
order to use "unmanaged" pages in the kmem object, vm_map_delete() must
unconditionally perform pmap_remove(). Otherwise, sparc64 has problems.
Tested by: jake
32 bit binary stuff. 32 bit binaries do not like it much when the kernel
tries hard to put things above the 8GB mark.
I have a work-in-progress to fix this properly, but I didn't want to burn
anybody with this yet.
initialize a TSC timecounter until we know if it is broke or not.
XXX I think there is a bug in the i386 code here. init_TSC_tc() comes
after:
if (statclock_disable)
return;
ie: if you turn off the statclock interrupt, you dont get the TSC either.
for breakpoint and trace traps from usermode. Although all the setidt
entries are interrupt gates on amd64, all but the trace and bpt trap
entry handlers reenable interrupts after the swapgs instruction in order
to simulate the trap/interrupt gate distinction. In other words, the
amd64 code behaves the same way that i386 does here.
for ddb input in some atkbd-based console drivers. ddb must not use any
normal locks but DELAY() normally calls getit() which needs clock_lock.
This also removes the need for recursion on clock_lock.
known constants at compile time rather than at run time. We have a number
of nasty hacks around the place to cache ntohl() of constants (eg: nfs).
This change allows the compiler to compile-time evaluate ntohl(1) as
0x01000000 rather than having to emit assembler code to do it. This
has other smaller flow-on effects because the compiler can see that
ntohl(constant) itself has a constant value now and can propagate the
compile time evaluation.
Obtained from: Ideas from NetBSD and Linux, and some code from NetBSD
1.186: onoe; Sony's PEGA-WL110 CF WLAN (which strangely has fujitsu's
vendor id)
1.185: ichiro; Quatech Inc, PCMCIA Enhanced Parallel Port Card
Also:
o update $NetBSD$
o minor tweaks to FUJITSU. We've tried to keep the CIS only entries seprate
from vendor id/product id.
freed belong to the kernel object.)
- Increase the granularity of the vm object locking in vm_hold_load_pages()
in order to reduce the number of times that we acquire and release the
same lock.
Fix to the messages output under CAM_DEBUG_CCB: the summary sense
information (error bits and sense key) is in the error field, not
in the result field, of struct ata_request. No other functional change.
completion of recovery is indicated by positioning the CAM_AUTOSNS_VALID
bit in the status field of the CCB, not in the flags field.
This fixes an endless loop of sense recovery actions.
Reviewed by: ken
function, startup_alloc(), that is used for single page allocations prior
to the VM starting up. If it is used after the VM startups up, it
replaces the zone's allocf pointer with either page_alloc() or
uma_small_alloc() where appropriate.
Pointy hat to: me
Tested by: phk/amd64, me/x86
Temporarily disable the UMA_MD_SMALL_ALLOC stuff since recent commits
break sparc64, amd64, ia64 and alpha. It appears only i386 and maybe
powerpc were not broken.
device to access 64-bit addresses from a 32-bit PCI bus. While the
RealTek manual says you can set this bit and the chip will perform
DAC only if you give it a DMA address with any of the upper 32
bits set, this appears not to be the case. If I turn on the DAC
bit, the chip sets the 'system error' bit in the status register
when I to do a DMA on my Athlon test box with 32-bit PCI bus (VIA
chipset) even though I only have 128MB of physical memory, and thus
can never give the chip a 64-bit address.
Obviously, I can't just set it and forget it, so until I figure
out the right rule for when it's safe/necessary to enable it, keep
it turned off.
not guaranteed that the RSE writes the NaT collection immediately,
sort of atomically, to the backing store when it writes the register
immediately prior to the NaT collection point. This means that we
cannot assume that the low 9 bits of the backingstore pointer do not
point to the NaT collection. This is rather a surprise and I don't
know at this time if it's a bug in the Merced or that it's actually
a valid condition of the architecture. A quick scan over the sources
does not indicate that we depend on the false assumption elsewhere,
but it's something to keep in mind.
The fix is to write the saved contents of the ar.rnat register to
the backingstore prior to entering the loop that copies the dirty
registers from the kernel stack to the user stack.
functions reference UMA internals from <vm/uma_int.h>, which makes
them highly unwanted in non-UMA specific files.
While here, prune the includes in pmap.c and use __FBSDID(). Move
the includes above the descriptive comment.
The copyright of uma_machdep.c is assigned to the project and can
be reassigned to the foundation if and when when such is preferrable.
Tested at 100Mbit only, using Asus P4P800 onboard 3C940.
The -stable version of this patch I have in use for ~2 weeks now, and works
just fine for me.
Based on: Nathan L. Binkert's patch for OpenBSD
Patch submitted by and thanks to: Jung-uk Kim <jkim@niksun.com>
MFC after: 2 weeks
Doing so creates a race where the buf is on neither list.
- Only vfree() in an error case in vclean() if VSHOULDFREE() thinks we
should.
- Convert the error case in vclean() to INVARIANTS from DIAGNOSTIC as this
really should not happen and is fast to check.
sufficient to guarantee that this race is not hit. The XLOCK will likely
have to be redesigned due to the way reference counting and mutexes work
in FreeBSD. We currently can not be guaranteed that xlock was not set
and cleared while we were blocked on the interlock while waiting to check
for XLOCK. This would lead us to reference a vnode which was not the
vnode we requested.
- Add a backtrace() call inside of INVARIANTS in the hopes of finding out if
this condition is ever hit. It should not, since we should be retaining
a reference to the vnode in these cases. The reference would be sufficient
to block recycling.
working set cache. This has several advantages. Firstly, we never touch
the per cpu queues now in the timeout handler. This removes one more
reason for having per cpu locks. Secondly, it reduces the size of the zone
by 8 bytes, bringing it under 200 bytes for a single proc x86 box. This
tidies up other logic as well.
- The 'destroy' flag no longer needs to be passed to zone_drain() since it
always frees everything in the zone's slabs.
- cache_drain() is now only called from zone_dtor() and so it destroys by
default. It also does not need the destroy parameter now.
broken consumers of the malloc interface who assume that the allocated
address will be an even multiple of the size.
- Remove disabled time delay code on uma_reclaim(). The comment there said
it all. It was not an effective strategy and it should not be left in
#if 0'd for all eternity.
restart instruction bits in the PSR. As such, we were returning
from interrupt to the instruction in the bundle that caused us
to enter the kernel, only now we're returning to a completely
different bundle.
While close here: add two KASSERTs to make sure that we restore
sync contexts only when entered the kernel through a syscall and
restore an async context only when entered the kernel through an
interrupt, trap or fault.
While not exactly here, but close enough: use suword64() when we
copy the dirty registers from the kernel stack to the user stack.
The code was intended to be be replaced shortly after being added,
but that was a couple of weeks ago. I might as well avoid that it
is a source for panics until it's replaced.
can get (or not) and what we do with them. This fixes the behaviour
for NaT consumption and speculation faults in that we now don't panic
for user faults.
Remove the dopanic label and move the code to a function. This makes
it easier in the simulator to set a breakpoint.
While here, remove the special handling of the old break-based syscall
path and move it to where we handle the break vector. While here,
reserve a new break immediate for KSE. We currently use the old break-
based syscall to deal with restoring async contexts. However, it has
the side-effect of also setting the signal mask and callong ast() on
the way out. The new break immediate simply restores the context and
returns without calling ast().
of "dumb" PCI-based serial/parallel boards get a hint how to enable
them.
I wasn't sure about the ia64, pc98, powerpc, and sparc64 archs whether
they'd support puc(4) or not.
extended irq lists. If the resource has a trailing byte but not the full
resource string, do not attempt to parse the resource string. This fixes
panics on transition to battery and shutdown for Larry. Patch has been
submitted to vendor and they will incorporate in next release.
Tested by: Larry Rosenman <ler@lerctr.org>
PR: kern/56254
page_alloc() function from the slab_zalloc() function. This allows us
to unconditionally call uz_allocf().
- In page_alloc() cleanup the boot_pages logic some. Previously memory from
this cache that was not used by the time the system started was left in
the cache and never used. Typically this wasn't more than a few pages,
but now we will use this cache so long as memory is available.
by accepting the user supplied flags directly. Previously this was not
done so that flags for the same field would not be defined in two
different files. Add comments in each header instructing future
developers on how now to shoot their feet.
- Fix a test for !OFFPAGE which should have been a test for HASH. This would
have caused a panic if we had ever destructed a malloc zone. This also
opens up the possibility that other zones could use the vsetobj() method
rather than a hash.
but for CPL != 0. For some reason yet unknown it is possible for the
CPL to be 2. This would previously be counted as kernel mode, which
resulted in nasty panics. By changing the test it is now treated as
user mode, which is more correct. We still need to figure out how it
is possible that the privilege level can be 2 (or 1 for that matter),
because it's not used by us. We only use 3 (user mode) and 0 (kernel
mode).
don't cache as many items.
- Introduce the bucket_alloc(), bucket_free() functions to wrap bucket
allocation. These functions select the appropriate bucket zone to
allocate from or free to.
- Rename ub_ptr to ub_cnt to reflect a change in its use. ub_cnt now reflects
the count of free items in the bucket. This gets rid of many unnatural
subtractions by 1 throughout the code.
- Add ub_entries which reflects the number of entries possibly held in a
bucket.
IF_HANDOFF() does it for us behind the scenes. Remove the extra call
to re_start() otherwise we try to transmit twice.
In re_encap(), fix the code that guards against consuming too many
descriptors in the TX ring so that it actually works. With the
new 8169S chip, I was able to hit a corner case that drained the
free descriptor count all the way to 0. This is not supposed to
be possible.
reserved bits in the port that must be zero are 24:30, not 20:30. Bits
16:23 are used to set the bus number. This meant that when we tested for
config mechanism #1, if the previous PCI configuration transaction sent
used a bus number greater than 15, one of the bits in 20:23 would be
non-zero and we would fail to use config mechanism #1 and thus fail to see
that PCI existed on the machine at all.
Obtained from: Shanley's PCI System Architecture book
Tested by: des
Proxied through: njl
user mode. This goes with rev.1.468 of machdep.c which changed the gates
for these traps to interrupt gates. Having the interrupts disabled for
these traps from user mode is just an unwanted side effect.
This fixes at least 1 case of "panic: absolutely cannot call
smp_ipi_shootdown with interrupts already disabled". Too much code was
run with interrupts disabled, and it sometimes hit a sanity check.
Fix verified by: deischen
sending an ICMP packet doesn't cause a panic. A better solution is needed;
possibly defering the transmit to a dedicated thread.
Observed by: "Aaron Wohl" <freebsd@soith.com>
COM_NOFIFO() and COM_ESP cases are supposed to be a subsets of the
plain 16550A case, but 16650-related changes made the former fall into
the latter and then both fall into general code for printing the tx
fifo size. This mainly caused hard to parse boot messages like:
"sio0: type 16550A fifo disabled lookalike with 1 bytes FIFO".
COM_NOFIFO() on an ESP port gave a larger mess whose extent is not
clear.
Fixed some nearby style bugs.
defined values instead of hard-coded values. Don't repeat the register
access part of the code 4 times times or triple-space statements. This
fixes half of the style bugs in rev.1.172.
Hardware flow control of 16650As is still officially unsupported. I
was mistaken about it being broken. It is broken in 16650s but is
fixed in 16650As except for the maximum trigger level (which is no
longer used). Testing of the 16650's broken hardware flow control
watermarks by programming them on 16950s showed that their effects are
not too bad if the fifo size and trigger level are reasonably large
(16 is much better than 8).
an UART interface could get stuck when a new interrupt condition
arose while servicing a previous interrupt. Since an interrupt was
already pending, no new interrupt would be triggered.
Avoid infinite recursion by flushing the Rx FIFO and marking an
overrun condition when we could not move the data from the Rx
FIFO to the receive buffer in toto. Failure to flush the Rx FIFO
would leave the Rx ready condition pending.
Note that the SAB 82532 already did this due to the nature of the
chip.
from the sparc64 subtree, which breaks building non-sparc64 platforms
in the event the sparc64 subtree does not exist.
The problem is specific to the module, because non-module builds are
affected by the presence or absence of "device ebus" in the kernel
configuration.
PR: kern/56869
precisely where locking would be needed before adding it, but it
seems uart(4) draws slightly too much attention to have it without
locking for too long.
The lock added is a spinlock that protects access to the underlying
hardware. As a first and obvious stab at this, each method of the
hardware interface grabs the lock. Roughly speaking this serializes
the methods. Exceptions are the probe, attach and detach methods.
o change timeout to MPSAFE callout
o restructure rule deletion to deal with locking requirements
o replace static buffer used for ipfw control operations with malloc'd storage
Sponsored by: FreeBSD Foundation
o change time to MPSAFE callout
o make debug printfs conditional on DUMMYNET_DEBUG and runtime controllable
by net.inet.ip.dummynet.debug
o make boot-time printf dependent on bootverbose
Sponsored by: FreeBSD Foundation
o replace magic constants with #defines (e.g. ETHER_ADDR_LEN)
o move mib variables to net.link.ether.bridge with backwards compatible
entries for well-known items maintained under BURN_BRIDGES
o revamp debugging support so it is conditioanlly compiled with BRIDGE_DEBUG
(on currently) and runtime controlled by net.link.ether.bridge.debug
o change timeout to MPSAFE callout
o optimize lookup for common case of two interfaces
o optimize forwarding path to take IFNET lock only when needed
o make boot-time printf dependent on bootverbose
o sundry style changes (ANSI decls, extraneous spaces, etc.)
Sponsored by: FreeBSD Foundation
a problem for command responses since we rarely ever filled the queue.
However, adapter-initiated commands have a much smaller queue and could
tickle this bug. It's possible that this might fix the recently reported
problems with the aac-2120s, though I haven't been able to reproduce the
problem locally.
MFC-After: 1 day
and some of their bits (i.e., fifo trigger levels, frequency multipliers
and divisors, and bits to select the registers for these). This
attempts to completely describe the 16950's complicated register selects
for 16950-specific registers only.
the description of the data latch registers (they were described as
readonly).
Added some better and worse aliases for standard registers, mostly taken
from the 16950 data sheet. Define deprecated aliases in terms of the
preferred one.
Don't define com_efr in terms of com_fifo. It is unrelated (in a
different bank).
Merged comments to match (put them at the right of the #defines instead
of duplicating them).
Sorted the resulting sections on UART type and register bank. Added a
comment for each bank.
to ns16550.h. The organization of these files was sort of backwards.
The bits in the registers have no driver or bus dependencies but they
but the offsets of the registers in bus space are very bus-dependent.
However, it does no harm to keep the definitions of the register offsets
in ns16550.h provided they are thought of as internal ns*50 offsets.
that has been recorded earlier and overwrite it again later by
reading it directly from the EEPROM again.
Read the MAC address from the PAR0/PAR1 registers instead, which
are autoloaded on reboot.
Tested on AN985, AN983B. According to the datasheets, it should
also work for the AL981 (I don't have such a chip on a card at home)
PR: 52988
Submitted by: Andrew Gordon <arg-bsd@arg.me.uk>
MFC after: 2 weeks
calculate smoothed signal quality data for each node.
o add a 16-deep history buffer to each driver-private node storage that
holds rssi and antenna info for received frames
o override the default per-node "get rssi" method to return an average
rssi value based on samples collected over the last second
o enable beacon reception so even idle systems maintain a running history
of signal quality
This data may also be useful for improving the rate control algorithm.
Based on work by Tom Marshall <tommy@home.tig-grr.com> for MADWIFI.
things than record a single value.
o add a per-node method for returning the "current RSSI" for a node
o create a default method that returns ni_rssi which is the rssi for
the last received frame
o use the per-node "get rssi" method to return data for the RID's
submitted by wicontrol, et. al.
Loosely based on work by Tom Marshall <tommy@home.tig-grr.com> for MADWIFI.
to the 802.11 layer if they are at least IEEE80211_MIN_LEN
o mask off interrupt status bits that we don't care about so we don't do
the wrong thing; this fixes a problem where the beacon miss interrupt status
bit is delivered together with other status bits when operating in monitor
mode (we would post a beacon miss swi and then do the wrong thing)
In particular, let drivers send up control frames so we can dispatch
them to bpf in monitor mode.
This is the first (small) step to adding more functionality such as
power save mode.
was added to the fast path to support the COM_IIR_RXRDYBUG() case even
when that case is not configured. This increased the relative overhead
of sio input by almost 25% in the worst case and by 2-3% in the usual
case (usually only about 0.2% absolute per port at 115200 bps). The
quick fix is to significantly pessimize only the COM_IIR_RXRDYBUG()
case.
safe since the 802.11 layer does the right thing for 11a operation)
o select short preamble operation based on the negotiated capabilities; not
just the local state/capability
o fillin the duration field in the 802.11 header as appropriate
o remove detection of 11g support; no longer needed
Obtained from: MADWIFI (with modifications)
capabilities for outbound management frames. But beware of sending
this when operating on 5GHz channels; some 11a AP's reject association
requests if this bit is set in the capabilities listed.
Obtained from: MADWIFI (with modifications)
special signal-delivery protections for setugid processes. In the
event that a system is relying on "unusual" signal delivery to
processes that change their credentials, this can be used to work
around application problems.
Also, add SIGALRM to the set of signals permitted to be delivered to
setugid processes by unprivileged subjects.
Reported by: Joe Greco <jgreco@ns.sol.net>
we're on a 32-bit/64-bit bus or not. Use this to decide if we should
set the PCI dual-address cycle enable bit in the C+ command register.
(Enabling DAC on a 32-bit bus seems to do bad things.)
Also, initialize the C+ command register early in the re_init() routine.
The documentation says this register should be configured first.
in6_pcbpurgeif0(LIST_FIRST(udbinfo.listhead), ifp);
in6_pcbpurgeif0(LIST_FIRST(ripcbinfo.listhead), ifp);
The problem here is that udbinfo.listhead and ripcbinfo.listhead are
not initialized during the device probe/attach phase of the kernel
boot process. So if, for example, a network driver calls ether_ifattach()
in its foo_attach() routine and then decides that something is wrong
and calls ether_ifdetach() to reverse the process, we will panic trying
to dereference the uninitialized list head pointers. (Though the
same sequence of events performed after the kernel has come up works
file, i.e. doing kldload if_foo from multiuser.)
Change this to:
if (udbinfo.listhead != NULL)
in6_pcbpurgeif0(LIST_FIRST(udbinfo.listhead), ifp);
if (ripcbinfo.listhead != NULL)
in6_pcbpurgeif0(LIST_FIRST(ripcbinfo.listhead), ifp);
to avoid the NULL pointer dereferences.
pmap_remove_pte(), passed NULL instead of the required page table
page to pmap_unuse_pt(). Compute the necessary page table page
in pmap_remove_pte(). Also, remove some unreachable code from
pmap_remove_pte().
count in _vm_object_allocate(). (Access to the generation count is
governed by the vm object's lock.) Note: the introduction of the
atomic increment in revision 1.238 appears to be an accident. The
purpose of that commit was to fix an Alpha-specific bug in UMA's
debugging code.
We simply use the detected FIFO size to determine whether we have
a post 16550 UART or not. The support lacks proper serialization of
hardware access for now.
pmap_extract_and_hold(). Note, however, that GIANT_REQUIRED should not be
removed until all platforms fully implement the "prot" parameter to
pmap_extract_and_hold().
Reviewed by: tegge
fixes a longstanding issue WRT resetting the chip after startup- it
would fail if we were connected as an F-port to a switch. If we
were connected as an F-port, we got assigned a hard loop ID of 255,
which is really a bogus loop id. Then when we turned around to
reset ourselves, the firmware would reject the ICB_INIT request
because the loop id was bogus. *sputter*
Minor fixlet from somebody in NetBSD with too much time on their
hands (dma -> DMA).
the "compatible" property too in the ns8250 case. This gets the serial
console to work on Blade 100s, where the device name is just "serial".
Reviewed by: marcel
Second (PPS) timing interface. The support is non-optional and by
default uses the DCD line signal as the pulse input. A compile-time
option (UART_PPS_ON_CTS) can be used to have uart(4) use the CTS line
signal.
Include <sys/timepps.h> in uart_bus.h to avoid having to add the
inclusion of that header in all source files.
Reviewed by: phk
This commit puts the relevant code snippets under #ifdef GONE_IN_5
(rather than #ifndef BURN_BRIDGES) thereby disabling the code now.
The code wil be entirely removed before 5.2 unless we find reasons
why this would be a bad idea.
Approach suggested by: imp
seems to be necessary for the 8139C+ under certain circumstances, and
doesn't appear to hurt the other chips. (In the failure case, the
packet would be sent through the TX DMA ring but not get echoed
back. I suspect this has something to do with the link state changing
unexpectedly.)
autoload and then copying the contends of the station address
registers. For some reason, reading the EEPROM on the 8169S doesn't
work right. This gets around the problem, and allows us to read
the station address correctly on the 8169S.
- Insert a delay after initiating packet transmition in re_diag() to
allow lots of time for the frame to echo back to the host, and wait
for both the 'RX complete' and 'timeout expired' bits in the ISR
register to be set.
- Deal more intelligently with the fact that the frame length
field in the RX descriptor is a different width on the 8139C+
than it is on the 8169/8169S/8110S
- For the 8169, you have to set bit 17 in the TX config register
to enter digital loopback mode, but for the 8139C+, you have to
set both bits 17 and 18. Take this into account so that re_diag()
works properly for both types of chips.
ethernet chips. This driver is pretty simple, however it contains
special DSP initialization code which is needed in order to get
the chip to negotiate a gigE link. (This special initialization
may not be needed in subsequent chip revs.) Also:
- Fix typo in if_rlreg.h (RL_GMEDIASTAT_1000MPS -> RL_GMEDIASTAT_1000MBPS)
- Deal with shared interrupts in re_intr(): if interface isn't up,
return.
- Fix another bug in re_gmii_writereg() (properly apply data field mask)
- Allow PHY driver to read the RL_GMEDIASTAT register via the
re_gmii_readreg() register (this is register needed to determine
real time link/media status).
we think is the correct trigger mode and polarity. This allows us to
implement BUS_CONFIG_INTR() as an update of the RTE in question.
Consequently, we can trust the RTE when we enable an interrupt and
avoids that we need to know about the trigger mode and polarity at
that time.
method. This is necessary on ia64 where it's known that serial interfaces
described in the ACPI namespace may not have the well-known IRQs assigned
to them. This confuses us in thinking they are PCI based interrupts and
wrongly program the APIC.
about interrupt trigger mode and interrupt polarity. This allows ACPI
for example to pass interrupt resource information up the hierarchy.
The default implementation of the method therefore is to pass the
request to the parent.
Reviewed by: jhb, njl
for the 8169S, according to my sample board. The RealTek Linux driver
mentions 0x00800000. I'm assigning this to the 8110S until I get
more info on it. (The (preliminary) RealTek docs only say that 8169S/8110S
chips will have some combination of those two bits set, but doesn't say
exactly what bit combination goes with which chip variant.)
intpin register is expressed in hardware where 0 means none, 1 means INTA,
2 INTB, etc. The other way is commonly used in loops where 0 means INTA,
1 means INTB, etc. The matchpin argument to pci_cfgintr_search() is
supposed to be the first form, but we passsed in a loop index of the
second. This fix adds one to the loop index to convert to the first form.
Reported by: Pavlin Radoslavov <pavlin@icir.org>
is not a size of 1. Since we already know there is a FIFO, we can
safely assume that it is at least 16 bytes. Note that all this is
mostly academic anyway. We don't use the size of the Rx FIFO
currently. If we add support for hardware flow control, we only
care about Rx FIFO sizes larger than 16.
on data structures on the kernel stack which are guaranteed to be 16 byte
aligned by gcc, the amd64 ABI and __aligned(16).
Ensire the tss_rsp0 initial stack pointer is 16 byte aligned in case
sizeof(pcb) becomes odd at some point. This is convenient for the
interrupt handler case because the ring crossing pushes cause the
required odd alignment before the call to the C code.
Have fast_syscall add an additional 8 bytes to ensure that the trapframe
has the correct odd alignment for the call to C code. Note that there are
no checks to make sure that the trapframe size is appropriate for this.
This makes get/setfpcontext work properly (finally). You get a GPF in
kernel mode if any of this is botched without the alignment fixup code
that is apparently needed on i386.
written by Stuart Walsh and Duncan Barclay (with some kibbitzing by
me). I'm checking it in on Stuart's behalf.
The BCM4401 is built into several x86 laptop and desktop systems. For the
moment, I have only enabled it in the x86 kernel config because although
it's a PCI device, I haven't heard of any standalone NICs that use it. If
somebody knows of one, we can easily add it to the other arches.
This driver uses register/structure data gleaned from the Linux
driver released by Broadcom, but does not contain any of the code
from the Linux driver itself. It uses busdma.
latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn
determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE.
The constants are used instead of the literal hardcoding (in its
various forms) of the size of the direct mappings created in region
6 and 7. The default and probably only workable size is still 256M,
but for kicks we use 128M for LINT.
specified directory is not found in the mount list. Before the
MNT_BYFSID changes, unmount(2) used to return ENOENT for a nonexistent
path and EINVAL for a non-mountpoint, but we can no longer distinguish
between these cases. Of the two error codes, EINVAL was more likely
to occur in practice, and it was the only one of the two that was
documented.
Update the manual page to match the current behaviour.
Suggested by: tjr
Reviewed by: tjr
atomically extracts and holds the physical page that is associated with the
given pmap and virtual address. Such a function is needed to make the
memory mapping optimizations used by, for example, pipes and raw disk I/O
MP-safe.
Reviewed by: tegge
rl(4) driver and put it in a new re(4) driver. The re(4) driver shares
the if_rlreg.h file with rl(4) but is a separate module. (Ultimately
I may change this. For now, it's convenient.)
rl(4) has been modified so that it will never attach to an 8139C+
chip, leaving it to re(4) instead. Only re(4) has the PCI IDs to
match the 8169/8169S/8110S gigE chips. if_re.c contains the same
basic code that was originally bolted onto if_rl.c, with the
following updates:
- Added support for jumbo frames. Currently, there seems to be
a limit of approximately 6200 bytes for jumbo frames on transmit.
(This was determined via experimentation.) The 8169S/8110S chips
apparently are limited to 7.5K frames on transmit. This may require
some more work, though the framework to handle jumbo frames on RX
is in place: the re_rxeof() routine will gather up frames than span
multiple 2K clusters into a single mbuf list.
- Fixed bug in re_txeof(): if we reap some of the TX buffers,
but there are still some pending, re-arm the timer before exiting
re_txeof() so that another timeout interrupt will be generated, just
in case re_start() doesn't do it for us.
- Handle the 'link state changed' interrupt
- Fix a detach bug. If re(4) is loaded as a module, and you do
tcpdump -i re0, then you do 'kldunload if_re,' the system will
panic after a few seconds. This happens because ether_ifdetach()
ends up calling the BPF detach code, which notices the interface
is in promiscuous mode and tries to switch promisc mode off while
detaching the BPF listner. This ultimately results in a call
to re_ioctl() (due to SIOCSIFFLAGS), which in turn calls re_init()
to handle the IFF_PROMISC flag change. Unfortunately, calling re_init()
here turns the chip back on and restarts the 1-second timeout loop
that drives re_tick(). By the time the timeout fires, if_re.ko
has been unloaded, which results in a call to invalid code and
blows up the system.
To fix this, I cleared the IFF_UP flag before calling ether_ifdetach(),
which stops the ioctl routine from trying to reset the chip.
- Modified comments in re_rxeof() relating to the difference in
RX descriptor status bit layout between the 8139C+ and the gigE
chips. The layout is different because the frame length field
was expanded from 12 bits to 13, and they got rid of one of the
status bits to make room.
- Add diagnostic code (re_diag()) to test for the case where a user
has installed a broken 32-bit 8169 PCI NIC in a 64-bit slot. Some
NICs have the REQ64# and ACK64# lines connected even though the
board is 32-bit only (in this case, they should be pulled high).
This fools the chip into doing 64-bit DMA transfers even though
there is no 64-bit data path. To detect this, re_diag() puts the
chip into digital loopback mode and sets the receiver to promiscuous
mode, then initiates a single 64-byte packet transmission. The
frame is echoed back to the host, and if the frame contents are
intact, we know DMA is working correctly, otherwise we complain
loudly on the console and abort the device attach. (At the moment,
I don't know of any way to work around the problem other than
physically modifying the board, so until/unless I can think of a
software workaround, this will have do to.)
- Created re(4) man page
- Modified rlphy.c to allow re(4) to attach as well as rl(4).
Note that this code works for the sample 8169/Marvell 88E1000 NIC
that I have, but probably won't work for the 8169S/8110S chips.
RealTek has sent me some sample NICs, but they haven't arrived yet.
I will probably need to add an rlgphy driver to handle the on-board
PHY in the 8169S/8110S (it needs special DSP initialization).
ia64_count_cpus() and ia64_probe_sapics() called a single function
to do the the actual work. The difference in behaviour was handled
in that function and was further complicated by adding bootverbose
related code. As such, even the simplest of changes was hard to
comprehend.
Untangling has been done by increasing code duplication and using
a more naive style of coding. FWIW, the object file is slightly
smaller than before, so things aren't as bad as it may seem.
Triggered by: a simple fix on the P4 branch that never got merged.
from the SAB82532 and the Z8530 hardware drivers by introducing
uart_cpu_busaddr(). The assumption is not true on pc98 where
bus_space_handle_t is a pointer to a structure.
The uart_cpu_busaddr() function will return the bus address
corresponding the tag and handle given to it by the BAS.
WARNING: the intend of the function is STRICTLY to allow hardware
drivers to determine which logical channel they control and is NOT
to be used for actual I/O. It is therefore EXPLICITLY allowed that
uart_cpu_busaddr() returns only the lower 8 bits of the address
and garbage in all other bits. No mistakes...
Quick fix for calling DELAY() for ddb input in some (atkbd-based)
console drivers. ddb must not use any normal locks, but DELAY()
normally calls getit() which needs clock_lock. One problem with using
normal locks in ddb is that deadlock is possible, but deadlock on
clock_lock is unlikely becaluse clock_lock is bogusly recursive,
apparently just to hide the problem of ddb using it. The i8254 clock
hardware has mostly write-only registers so it is important for it to
use a lock that gives exclusive access. (atkbd hardware is also
unfriendly to reentrant software but that problem is more local and
already solved.) I mostly saw the symptoms of the bug caused by
unlocking in getit() running cpu_unpend(). cpu_unpend() should not
be called while in ddb and Debugger() calls for failing assertions
about this caused a breakpoint within ddb.
ddb must also not call getit() because ddb may be being used to step
through clock initialization code that has stopped or otherwise mangled
the clock. If the clock is stopped, then getit() always returns the
same value and DELAY() takes forever if it trusts getit().
The quick fix is implement DELAY(n) as (n * timer_freq / 1000000)
inb(0x84)'s if ddb is active.
machdep.c:
Don't permit recursion on clock_lock.
kdb_trap(). Stopping the other CPUs acts like locking them out, but
it wasn't done early enough or held long enough to prevent concurrent
accesses to shared data. In particular, the saved regs could be
clobbered.
with 64-bit longs again. This was fixed in rev.1.42 but the fix
rotted non-fatally in rev.1.105 and fatally in rev.1.137.
Many more non-egregrious casts are strictly required for conversions
from semi-opaque types to pointers, but we avoid most of them by using
types that are almost certain to be compatible with uintptr_t for
representing pointers (e.g., vm_offset_t). Here we don't really want
the u_longs, but we have them because a.out.h and its support code
doesn't use typedefs (it uses unsigned in V7 and unsigned long in
FreeBSD) and is too obsolete to fix now.
FIDs to be 128-bits wide and adds support for realms.
Add a new CODA_COMPAT_5 option, which requests support for the old
Coda 5.x interface instead of the new one.
Create a new coda5.ko module that supports the 5.x interface, and make
the existing coda.ko module use the new 6.x interface. These modules
cannot both be loaded at the same time.
Obtained from: Jan Harkes & the coda-6.0.2 distribution,
NetBSD (drochner) (CODA_COMPAT_5 option).
building a module. Inclusion of option files (opt_ddb.h in this
case) is not possible for modules. The inclusion of opt_ddb.h
in this header is questionable.
(ns8250 copied and s/ns8250/i8251/g), but there for linkage purposes.
Real code to follow, once I get past some boot issues on my pc98 boxes
with recent current.
of what uart(4) is and/or is not see the initial commit log of one
of the files in sys/dev/uart (or see share/man/man4/uart.4).
Note that currently pc98 shares the MD file with i386. This needs
to change when pc98 support is fleshed-out to properly support the
various UARTs. A good example is sparc64 in this respect.
We build uart(4) as a module on all platforms. This may break
the ppc port. That depends on whether they do actually build
modules.
To use uart(4) on alpha, one must use the NO_SIO option.
It improves on sio(4) in the following areas:
o Fully newbusified to allow for memory mapped I/O. This is a must
for ia64 and sparc64,
o Machine dependent code to take full advantage of machine and firm-
ware specific ways to define serial consoles and/or debug ports.
o Hardware abstraction layer to allow the driver to be used with
various UARTs, such as the well-known ns8250 family of UARTs, the
Siemens sab82532 or the Zilog Z8530. This is especially important
for pc98 and sparc64 where it's common to have different UARTs,
o The notion of system devices to unkludge low-level consoles and
remote gdb ports and provides the mechanics necessary to support
the keyboard on sparc64 (which is UART based).
o The notion of a kernel interface so that a UART can be tied to
something other than the well-known TTY interface. This is needed
on sparc64 to present the user with a device and ioctl handling
suitable for a keyboard, but also allows us to cleanly hide an
UART when used as a debug port.
Following is a list of features and bugs/flaws specific to the ns8250
family of UARTs as compared to their support in sio(4):
o The uart(4) driver determines the FIFO size and automaticly takes
advantages of larger FIFOs and/or additional features. Note that
since I don't have sufficient access to 16[679]5x UARTs, hardware
flow control has not been enabled. This is almost trivial to do,
provided one can test. The downside of this is that broken UARTs
are more likely to not work correctly with uart(4). The need for
tunables or knobs may be large enough to warrant their creation.
o The uart(4) driver does not share the same bumpy history as sio(4)
and will therefore not provide the necessary hooks, tweaks, quirks
or work-arounds to deal with once common hardware. To that extend,
uart(4) supports a subset of the UARTs that sio(4) supports. The
question before us is whether the subset is sufficient for current
hardware.
o There is no support for multiport UARTs in uart(4). The decision
behind this is that uart(4) deals with one EIA RS232-C interface.
Packaging of multiple interfaces in a single chip or on a single
expansion board is beyond the scope of uart(4) and is now mostly
left for puc(4) to deal with. Lack of hardware made it impossible
to actually implement such a dependency other than is present for
the dual channel SAB82532 and Z8350 SCCs.
The current list of missing features is:
o No configuration capabilities. A set of tunables and sysctls is
being worked out. There are likely not going to be any or much
compile-time knobs. Such configuration does not fit well with
current hardware.
o No support for the PPS API. This is partly dependent on the
ability to configure uart(4) and partly dependent on having
sufficient information to implement it properly.
As usual, the manpage is present but lacks the attention the
software has gotten.
o Introduce PUC_PORT_TYPE_UART so that we can attach to uart(4),
o Introduce port sub-types (eg PUC_PORT_UART_NS8250, PUC_PORT_UART_Z8530)
to handle different hardware and determine resource sizes.
o Introduce two new IVARs: PUC_IVAR_SUBTYPE and PUC_IVAR_REGSHFT. Both
are used by uart(4) to get sufficient information to talk to the HW.
o Introduce PUC_FLAGS_ALTRES to tell puc(4) to try memory mapped I/O
if I/O port space cannot be allocated, or vice versa.
o Have ports of type PUC_PORT_TYPE_COM attach to uart(1) if attaching
to sio(4) fails (due to not having the sio driver).
o Put struct puc_device_description in struct puc_softc instead of
having a pointer to a device description in the softc. This allows
us to create device descriptions on the fly without having to use
malloc() or otherwise have them staticly defined.
o Move puc_find_description() from puc.c to puc_pci.c as it's specific
to PCI.
o Add EBUS and SBUS frontends for use on sparc64. Note that the P in
puc stands for PCI, so we kinda mess things up here. It's too soon
to worry about it though. We'll know what to do about it in time.
NOTE: This commit changes the behaviour of puc(4) to not quieten the
device probe and attach for child devices. The uart(4) driver provides
additional device description that is valuable to have.
we actually use. Originally, the code reserved 0x8000 to 0x80ff inclusive
which on my hardware conflicts with the acpi timer. This broke the amdpm
driver since it was actually given ports 0x800c to 0x810b (which should
not have happened, IMHO).
This also allows us to considerably simplify the handling of the nForce
smb driver, removing the need for a separate nfpm driver. With this, SMB
accesses appear to work on my Tyan Tiger MP board. Your mileage may vary.
In particular, the nForce changes have not been tested.
we can switch to 64M-sized identity mappings and not having to map the
first 64M. This is especially important because the first 1M contains
the VGA frame buffer and is otherwise a legacy memory range. Best to
make as little assumptions about it as possible. Switching to 64M-sized
mappings is important to avoid creating overlapping translations, which
have the side-effect of triggering machine checks. This is currently
what's preventing us to boot on an Intel Tiger 4.
Note that since we currently use 256M-sized identity mappings, we
would reduce the size of the mappings and consequently increase the
TLB pressure. The performance implications of this are minimal if
measurable at all because identify mappings are not our primary
means for memory management.
Also note that there's no guarantee that physical memory exists at
64M. Then again, we didn't had the guarantee when we were loading at
5M. We'll deal with this when it's a problem.
Discussed with: arun@
Special thanks to Pavlin Radoslavov <pavlin@icir.org> for testing and
fixing numerous problems.
Sponsored by: FreeBSD Foundation
Reviewed by: Pavlin Radoslavov <pavlin@icir.org>
and/or INTR_FAST. This belongs elsehwere and perhaps under bootverbose;
I'm committing it for now as it's uesful to know which drivers have
been converted and which have not.
we return to kernel or userland. This triggered a panic in a KSE
application when TDF_USTATCLOCK was set in the case userland was
interrupted, but we never called ast() on our way out. As such,
we called ast() at some other time. Unfortunately, TDF_USTATCLOCK
handling assumes running in the interrupt thread. This was not
the case anymore.
To avoid making the same mistake later, interrupt() now returns
to its caller whether we interrupted userland or not. This avoids
that we have to duplicate the check in assembly, where it's bound
to fall off the scope. Now we simply check the return value and
call ast() if appropriate.
Run into this: davidxu
to protect the vlan state in each ifnet (e.g. vlan count). The latter is
probably better handled through an ifnet-centric means but since changes
are infrequent shouldn't matter for now.
Sponsored by: FreeBSD Foundation
For the floppy driver, use fdcontrol to manipulate density selection.
For the CD drivers, the 'a' and 'c' suffix is without actual effect and
any applications insisting on it can be satisfied with a symlink:
ln -s /dev/cd0 /dev/cd0a
Ongoing discussion may result in these pieces of code being removed before
the 5-stable branch as opposed to after.
such a card is ejected, we'd panic. Instead, just ignore it.
I should also add a sanity check in the FUNCID code as well, but this
isn't wrong since the check is cheap and happens infrequently.
into targreadfilt(). Unlock around calls to notify_user(). If an application
is sending CCBs while the endpoint is shutting down, this may result in
incomplete disable. A more complete solution will come with a "dying" flag.
Submitted by: simokawa
a correctable DMA error. Failing to do so can cause the error interrupt
to be triggered over and over again.
- Clean up the comments for UEAFSR_* constants, fix a typo (UEAFSR_BLK is
(1 << 23), not (1 << 22)), and add two more. Also, add similar constants
for the CE AFSR bits.
We can't update the device description in attach (why not ?), so
we device_print() what we find.
Conditionalize the short cable fix on this being older than rev 16A.
Call device_printf() when we apply short cable fix.
Include interrupt hold-off setting for rev 16+ under "#ifdef notyet"
The device_printf()'s will go under bootverbose once the various
issues have settled a bit.
out of cdregister() and daregister(), which are run from interrupt context.
The sysctl code does blocking mallocs (M_WAITOK), which causes problems
if malloc(9) actually needs to sleep.
The eventual fix for this issue will involve moving the CAM probe process
inside a kernel thread. For now, though, I have fixed the issue by moving
dynamic sysctl variable creation for these two drivers to a task queue
running in a kernel thread.
The existing task queues (taskqueue_swi and taskqueue_swi_giant) run in
software interrupt handlers, which wouldn't fix the problem at hand. So I
have created a new task queue, taskqueue_thread, that runs inside a kernel
thread. (It also runs outside of Giant -- clients must explicitly acquire
and release Giant in their taskqueue functions.)
scsi_cd.c: Remove sysctl variable creation code from cdregister(), and
move it to a new function, cdsysctlinit(). Queue
cdsysctlinit() to the taskqueue_thread taskqueue once we
have fully registered the cd(4) driver instance.
scsi_da.c: Remove sysctl variable creation code from daregister(), and
move it to move it to a new function, dasysctlinit().
Queue dasysctlinit() to the taskqueue_thread taskqueue once
we have fully registered the da(4) instance.
taskqueue.h: Declare the new taskqueue_thread taskqueue, update some
comments.
subr_taskqueue.c:
Create the new kernel thread taskqueue. This taskqueue
runs outside of Giant, so any functions queued to it would
need to explicitly acquire/release Giant if they need it.
cd.4: Update the cd(4) man page to talk about the minimum command
size sysctl/loader tunable. Also note that the changer
variables are available as loader tunables as well.
da.4: Update the da(4) man page to cover the retry_count,
default_timeout and minimum_cmd_size sysctl variables/loader
tunables. Remove references to /dev/r???, they aren't used
any longer.
cd.9: Update the cd(9) man page to describe the CD_Q_10_BYTE_ONLY
quirk.
taskqueue.9: Update the taskqueue(9) man page to describe the new thread
task queue, and the taskqueue_swi_giant queue.
MFC after: 3 days
in a list head instead of a pointer to the first element at the time of
the first call. These lists are subject to change, and getdirtybuf()
would refetch from the wrong list in some cases.
Spottedy by: tegge
Pointy hat to: me
address of the device identified by its phandle_t by traversing OFW's
device tree. The space and address returned by this function can
subsequently be passed to sparc64_fake_bustag() to construct a valid
tag and handle for use by the newbus I/O functions.
Use of this function is expected to be limited to pre-newbus access to
devices, such as consoles and keyboards.
Partially obtained from: tmm
Reviewed by: jake, jmg, tmm
SBus testing made possible by: jake
Tested with: LINT
and replace it with the more intuitive name PCIR_BARS.
- Add a PCIR_BAR(x) macro that returns the config space register offset of
the 32-bit BAR x.
MFC after: 3 days
This replaces the current ioctl processing with a direct call path
from geom_dev() where the ioctl arrives (from SPECFS) to any directly
connected GEOM class.
The inverse of the above is no longer supported. This is the
situation were you have one or more intervening GEOM classes, for
instance a BSDlabel on top of a MBR or PC98. If you want to issue
MBR or PC98 specific ioctls, you will need to issue them on a MBR
or PC98 providers.
This paves the way for inviting CD's, FD's and other special cases
inside GEOM.
it in the last chunk (phys_avail block). The last chunk very often is
not larger than one or two pages, resulting in a msgbuf that's too
small to hold a complete verbose boot.
Note that pmap_steal_memory() will bzero the memory it "allocates".
Consequently, ia64 will never preserve previous msgbufs. This is not
a noticable difference in practice. If the msgbuf could be reused,
it was invariably too small to have anything preserved anyway.
Changes from the original implementation:
- Fragmentation is handled by the function m_fragment, which can
be called from whereever fragmentation is needed. Note that this
function is wrapped in #ifdef MBUF_STRESS_TEST to discourage non-testing
use.
- m_fragment works slightly differently from the old fragmentation
code in that it allocates a seperate mbuf cluster for each fragment.
This defeats dma_map_load_mbuf/buffer's feature of coalescing adjacent
fragments. While that is a nice feature in practice, it nerfed the
usefulness of mbuf_stress_test.
- Add two modes of random fragmentation. Chains with fragments all of
the same random length and chains with fragments that are each uniquely
random in length may now be requested.
o add locking
o strip irrelevant spl's
o split malloc types to better account for memory use
o remove unused IPSEC_NONBLOCK_ACQUIRE code
o remove dead code
Sponsored by: FreeBSD Foundation
NB: There is a known LOR on the forwarding path; this needs to be resolved
together with a similar issue in the bridge. For the moment it is
believed to be benign.
Sponsored by: FreeBSD Fondation
o remove irrlevant spl
Notes:
1. We don't lock domain list traversals as this is safe until we start
removing domains.
2. The calculation of max_datalen in net_init_domain appears safe as
noone depends on max_hdr and max_datalen having consistent values.
3. Giant is still held for fast and slow timeouts; this must stay until
each timeout routine is properly locked (coming soon).
Sponsored by: FreeBSD Fondation
bail out if the buffer is not already present.
- The buffer returned by incore() is not locked and should not be sent to
brelse(). Use getblk() with the new GB_NOCREAT flag to preserve the
desired semantics.
caller to acquire it. This permits drain_output() to be done atomically
with other operations as well as reducing the number of lock operations.
- Assert that the proper locks are held in drain_output().
- Change getdirtybuf() to accept a mutex as an argument. This mutex is used
to protect the vnode's buf list and the BKGRDWAIT flag. This lock is
dropped when we successfully acquire a buffer and held on return
otherwise. These semantics reduce the number of cumbersome cases in
calling code.
- Pass the mtx from getdirtybuf() into interlocked_sleep() and allow this
mutex to be used as the interlock argument to BUF_LOCK() in the LOCKBUF
case of interlocked_sleep().
- Change the return value of getdirtybuf() to be the resulting locked buffer
or NULL otherwise. This is for callers who pass in a list head that
requires a lock. It is necessary since the lock that protects the list
head must be dropped in getdirtybuf() so that we don't have a lock order
reversal with the buf queues lock in bremfree().
- Adjust all callers of getdirtybuf() to match the new semantics.
- Add a comment in indir_trunc() that points at unlocked access to a buf.
This may also be one of the last instances of incore() in the tree.
growable (stack) entries that not only grow down, but also grow up.
Have vm_map_growstack() take these flags into account when growing
an entry.
This is the first step in adding support for upward growable stacks.
It is a required feature on ia64 to support the register stack (or
rstack as I like to call it -- it also means reverse stack). We do
not currently create rstacks, so the upward growing is not exercised
and the change should be a functional no-op.
Reviewed by: alc
Remove the vnode and dev_t fields and replace them with a void *.
Introduce separate strategy functions for devices and regular (NFS)
vnodes.
For devices we don't need the vnode v_numoutput stuff.
Add a generic swaponsomething() function to add a swapdevice and
split the remainder of swaponvp() into swaponvp() and swapondev()
which calls this backend.
sockets into machine-dependent files. The rationale for this
migration is illustrated by the modified amd64 allocator. It uses the
amd64's direct map to avoid emphemeral mappings in the kernel's
address space. On an SMP, the emphemeral mappings result in an IPI
for TLB shootdown for each transmitted page. Yuck.
Maintainers of other 64-bit platforms with direct maps should be able
to use the amd64 allocator as a reference implementation.
switched from PCCARD_MEM_FOO to PCCARD_A_MEM_FOO, yet we didn't change
exca in all the right places. Do so now. Also use PCCARD_WIDTH_AUTO
rather than the magic cookie 0.
change also disables interrupts around non-S4 suspends whereas before we
did not do this. Our version of AcpiEnterSleepStateS4bios was almost
identical to the ACPICA version.
line up the function names in an earlier generation of the API when
some of the functions returned structure pointers.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
apply PHYS_TO_VM_PAGE() to the physical address obtained from the page
table.
(This is based upon similar changes made to the amd64 and i386 pmaps and
a part of a long-term campaign to eliminate pte objects.)
Tested by: wilko
- Add a new PCIM_HDRTYPE constant for the field in PCIR_HDRTYPE that holds
the header type.
- Replace several magic numbers with appropriate constants for the header
type register and a couple of PCI_FUNCMAX.
- Merge to amd64 the fix to the i386 bridge code to skip devices with
unknown header types.
Requested by: imp (1, 2)
- Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode
interlock.
- Don't use the B_LOCKED flag and QUEUE_LOCKED for background write
buffers. Check for the BKGRDINPROG flag before recycling or throwing
away a buffer. We do this instead because it is not safe for us to move
the original buffer to a new queue from the callback on the background
write buffer.
- Remove the B_LOCKED flag and the locked buffer queue. They are no longer
used.
- The vnode interlock is used around checks for BKGRDINPROG where it may
not be strictly necessary. If we hold the buf lock the a back-ground
write will not be started without our knowledge, one may only be
completed while we're not looking. Rather than remove the code, Document
two of the places where this extra locking is done. A pass should be
done to verify and minimize the locking later.
reading the CIS on some cards. However, not all just yet. This makes
at least some of the xircom cards that weren't working to work. It
doesn't make my home and away card work, however.
o Don't get the card offset wrong. This is the biggest hassle for
reading the CIS. The old code was just so wrong I can't believe that
it worked at all.
o Don't set the bit that allows/forces 16-bit memory access to the
memory. It is hard coded with 0x80.
o Don't need to slow down memory access with wait-states. OLDCARD didn't
need them and it doesn't hurt anything.
o remove bogus grousying in comment.
32K pages are selected. In spec_getpages() change the printf format
specifier and add an explicit cast so that we always print the field
as a long type.
- In ULCK_BUF we no longer need to acquire the lock, just write the buf out.
- The combination of these changes eliminates one more use of B_LOCKED which
is in the way of making the buffer cache SMP safe. In the long term
ext2fs should probably not try to optimize the use of their metadata bufs
with a private cache. This will starve the rest of the system for buffers
in the extreme case.
Discussed with: bde (A long time ago..)
Tested on: md disk/x86
Bug Fixes:
- Allow users to use LAA
- Remember promiscuous mode settings while bridging
- Allow gratuitous arp's to be sent
PR: 52966/54488
MFC after: 1 week
METEORSSIGNAL ioctl. Applications use this ioctl with the value
METEOR_SIG_MODE_MASK (0xFFFF0000, -65536) to reset signal delivery,
but revision 1.126 caused the driver to return EINVAL in this case.
Interestingly, the same METEORSSIGNAL ioctl in the meteor driver uses
0 to reset signal delivery.
This commit allows METEOR_SIG_MODE_MASK as a synonym for 0 in the
bktr driver, and restructures the code a bit so that it is otherwise
identical between the bktr and meteor drivers.
returned mbuf can be NULL. Check for NULL in rip_output() when
prepending an IP header. This prevents mbuf exhaustion from
causing a local kernel panic when sending raw IP packets.
PR: kern/55886
Reported by: Pawel Malachowski <pawmal-posting@freebsd.lublin.pl>
MFC after: 3 days
compatibility routine, go ahead and accept that as 'success'. A
properly written compatible driver should return < 0 for both the
compat match and compat probe routines, so this will wind up doing the
right thing.
This will get rid of the warnings issued at shutdown (that seems to
worry alot of users), but will also no flush cache on lots of
devices that can, but doesn't set the right support bits...
Without this option it is not possible to omit the driver from the
configuration file and successfully build a kernel.
This option is specific to alpha.
implementations. Use those on platforms that don't have MD
headers. Remove the ia64 MD header. We're going to use the C
implementation there.
Suggested by: bde
Restructure the way ATA/ATAPI commands are processed, use a common
ata_request structure for both. This centralises the way requests
are handled so locking is much easier to handle.
The driver is now layered much more cleanly to seperate the lowlevel
HW access so it can be tailored to specific controllers without touching
the upper layers. This is needed to support some of the newer
semi-intelligent ATA controllers showing up.
The top level drivers (disk, ATAPI devices) are more or less still
the same with just corrections to use the new interface.
Pull ATA out from under Gaint now that locking can be done in a sane way.
Add support for a the National Geode SC1100. Thanks to Soekris engineering
for sponsoring a Soekris 4801 to make this support.
Fixed alot of small bugs in the chipset code for various chips now
we are around in that corner anyways.
bus tag is to allow bus space accesses prior to having newbus
fully initialized, such as would be the case for console drivers.
Since barriers are a fundamental part of bus space accesses, not
allowing them on fake tags would defeat the purpose of these tags.
We use the barrier function normally associated with nexus. This
is the barrier used when subordinates haven't defined a barrier
themselves.
support stripped out and minimally renamed to owi. This driver
attaches to lucent cards only. This is designed to aid in the testing
of fixes to the wi driver for lucent cards. It is supported only as a
module (you cannot compile it into your kernel). You cannot have the
wi driver in your kernel (or loaded as a moudle) to use the owi
module.
I've not connected it to build, as this module is currently for
debugging purposes. This is for developers only at the present time.
If we can't get lucent support fixed by 5.2 code freeze, then we'll
re-evaulate this support level. Please use this to fix the lucent
support in dev/wi. This will be removed from the system when lucent
support has been fixed in dev/wi.
Note to developers: Do not connect this to the build, make it possible
to build into the kernel or otherwise 'integrate' this into system
without checking with me first. This is for debugging purposes only.
If this doesn't work for you, I don't want to hear about it unless you
are fixing the wi driver :-)
gfb_draw if 'flip' is specified. This causes the mouse cut region
to be displayed in reverse color so it is visbile.
- Use the "other" implementation of gfb_cursor for the creator driver,
which doesn't assume there is a hardware cursor. It seems that the
hardware cursor that creator provides doesn't display the character
under the cursor in reverse colors, so the driver does this manually
and uses the hardware cursor for the mouse pointer (which it also works
much better for). This is wedged here because it required less hoops
than accessing the syscons vtb from inside the video driver, which is
needed to read the character and color attributes under the new cursor
position.
These are fixed resolution and operate only in pixel mode so they present
a challenge to syscons (square peg, round hole, etc, etc). The driver
provides a video driver interface for syscons and a separate character
device for X to mmap. Wherever possible the creator's accelarated graphics
functions are used so text mode is very fast.
Based roughly on the openbsd driver.
goto and abstracted by the itry, ithrow and icatch macros (among
others). The problem with this code is that it doesn't compile on
ia64. The compiler is sufficiently confused that it inserts a call
to __ia64_save_stack_nonlock(). This is a magic function that saves
enough of the stack to allow for non-local gotos, such as would be
the case for nested functions. Since it's not a compiler defined
function, it needs a runtime implementation. This we have not in a
standalone compilation as is the kernel.
There's no indication that the compiler is not confused on other
platforms. It's likely that saving the stack in those cases is
trivial enough that the compiler doesn't need to off-load the
complexity to a runtime function.
The code is believed to be correctly translated, but has not been
tested. The overall structure remained the same, except that it's
made explicit. The macros that implement the try/catch construct
have been removed to avoid reintroduction of their use. It's not
a good idea.
In general the rewritten code is slightly more optimal in that it
doesn't need as much stack space and generally is smaller in size.
Found by: LINT
round the result up to a multiple of 4 bytes so that it will always
be a multiple of the sample size. Also use the actual buffer size
from sc->bufsz instead of the default DS1_BUFFSIZE.
This fixes panics and bad distortion I have seen on Yamaha DS-1
hardware, mainly when playing certain Real Audio media.
Reviewed by: orion (an earlier version of the patch)
first sample in the buffer to be ignored. The bug caused a repetitive
glitch in one of the stereo channels when playing mono sound on
configurations that use the monotostereo16 feeder.
Reviewed by: orion
to intptr_t. This fixes a compiler warning (integer from pointer
without cast) in scvgarndr.c when SC_PIXEL_MODE is defined.
o Define readb() and writeb(). Both are used in scvgarndr.c when,
guess what, SC_PIXEL_MODE is defined.
Both changes are ia64 specific.
Found by: LINT
reacquire the "first" object's lock while a backing object's lock is held.
Since this is a lock-order reversal, vm_fault() uses trylock to acquire
the first object's lock, skipping the sequential access optimization in
the unlikely event that the trylock fails.
in struct vm_page are defined as u_int for 16K pages and u_long
for 32K pages, with the implied assumption that long will at least
be 64 bits wide on platforms where we support 32K pages.
from the ia32 specific stuff. Some of this still needs to move to the MI
freebsd32 area, and some needs to move to the MD area. This is still
work-in-progress.
commands. Add a quirk for the Creative Nomad MuVo USB device that uses
it as well as NO_SYNCHRONIZE_CACHE.
PR: kern/53094
Submitted by: Richard Nyberg <rnyberg@it.su.se>
MFC after: 3 days
device should handle them.
This prevents for instance GEOM::ioctl requests from reaching a
lower BSDlabel node, which ps@ found would confuse newfs(8).
CB710, CB720, CB1211, CB1225, CB1410 and CB1420
These are likely licensed designed from TI, and the Linux PCMCIA code
treats them as TI chips.
Add comment, but no ID for the 711E1 from O2Micro.
boot time. Instead, read it a sector at a time. While this sounds
like a significant slowdown, I've not been able to measure any
signficant difference.
Submitted by: luigi
Reviewed by: jhb, sam (both a while ago)
MFC After: 3 days
mac_reflect_mbuf_icmp()
mac_reflect_mbuf_tcp()
These entry points permit MAC policies to do "update in place"
changes to the labels on ICMP and TCP mbuf headers when an ICMP or
TCP response is generated to a packet outside of the context of
an existing socket. For example, in respond to a ping or a RST
packet to a SYN on a closed port.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
mac_reflect_mbuf_icmp()
mac_reflect_mbuf_tcp()
These entry points permit MAC policies to do "update in place"
changes to the labels on ICMP and TCP mbuf headers when an ICMP or
TCP response is generated to a packet outside of the context of
an existing socket. For example, in respond to a ping or a RST
packet to a SYN on a closed port.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
change in mac_lomac: if both flags are set on the new label, we
may not need to always fill out the label (only if one flag is
set, not both). Avoid stomping on a section of the label if we
are in fact modifying both elements.
Because we know that both flags will be set, we don't need to
test whether the range or single are set in later consistency
checks of the range and single -- just test them.
By checking the range of the new vs. the range of the old label
before testing the single against the new range, we implicitly
test that the new single is in the old range. Document this
with a comment.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
vendors that list the vendor ID in the proper byte order. The second
section is for vendors that get it backwards. The third is for what
appear to be 'random' ones (although 0xcxxx appears to be coherent
enough that maybe somebody else is assigning those numbers).
Framework labels:
- Re-work the label state assertions to use a set of central
ASSERT_type_LABEL() assertions.
- Test to make sure labels passed to externalize/internalize calls haven't
been destroyed.
- For access control checks, assert the condition of all labels passed in.
- For life cycle events, assert the condition of all labels passed in.
- Add new entry point implementations for new MAC Framework entry points:
mac_test_reflect_mbuf_icmp(), mac_test_reflect_mbuf_tcp(),
mac_test_check_vnode_deleteextattr(), mac_test_check_vnode_listextattr().
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
mac_stub policy and no longer mac_none (as found in the repocopy).
Add comment to this effect.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
o AD1980 hook.
o ac97_fix_auxout.
and:
o Associate AC97_MIX_AUXOUT with SOUND_MIXER_OGAIN rather than
SOUND_MIXER_MONITOR.
o Add ac97_fix_tone to remove tone controls from mixer if invalid.
explicit access control checks to delete and list extended attributes
on a vnode, rather than implicitly combining with the setextattr and
getextattr checks. This reflects EA API changes in the kernel made
recently, including the move to explicit VOP's for both of these
operations.
Obtained from: TrustedBSD PRoject
Sponsored by: DARPA, Network Associates Laboratories
MAC_DEBUG_COUNTER_INC() and MAC_DEBUG_COUNTER_DEC() to maintain
debugging counter values rather than #ifdef'ing the atomic
operations to MAC_DEBUG.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
represents the pruely stylistic changes and should have no net impact
on the rest of the code.
bde's more substantive changes will follow in a separate commit once
we've come to closure on them.
Submitted by: bde
UMA_ZFLAG_INTERNAL zones at all. Apparently, Wilko's alpha
was crashing while entering multi-user because, I think, we
were calculating the garbage cachefree for pcpu caches that
essentially don't exist for at least the 'zones' zone and it so
happened that we were reading from an unmapped location.
Confirmed to fix crash: wilko
Helped debug: wilko, gallatin
from queue(3).
Improve vertical compactness by using a IGMP_PRINTF() macro rather
than #ifdefing IGMP_DEBUG a large number of debugging printfs.
Reviewed by: mdodd (SLIST changes)
specific interfaces. This is required by aodvd, and may in future help us
in getting rid of the requirement for BPF from our import of isc-dhcp.
Suggested by: fenestro
Obtained from: BSD/OS
Reviewed by: mini, sam
Approved by: jake (mentor)
are all bogus, and the cards that don't decode things quite right
often have hundreds of them. This will fix starvation of small dmesg
buffers and allow better debugging to happen. I thought about adding
an override, but there is such a thing as too many knobs. :-)
ntp_update_second twice when we have a large step in case that step
goes across a scheduled leap second. The only way this could happen
would be if we didn't call tc_windup over the end of day on the day of
a leap second, which would only happen if timeouts were delayed for
seconds. While it is an edge case, it is an important one to get
right for my employer.
Sponsored by: Timing Solutions Corporation
ultimate trigger for the follow-up fixes in revisions 1.78, 1.80,
1.81 and 1.82 of trap.c. I was simply too pre-occupied with the
gateway page and how it blurs kernel space with user space and
vice versa that I couldn't see that it was all a load of bollocks.
It's not the IP address that matters, it's the privilege level that
counts. We never run in user space with lifted permissions and we
sure can not run in kernel space without it. Sure, the gateway page
is the exception, but not if you look at the privilege level. It's
user space if you run with user permissions and kernel space otherwise.
So, we're back to looking at the privilege level like it should be.
There's no other way.
Pointy hat: marcel
- Fix up TX speed changes.
- Make mpi-350 cards sort-of work with new firmware. It RXs okay but TXs
only work for about 14 packets then fails to get an interrupt. The
TX watchdog fires. It has been reported that my hack for now doesn't
break cards with the older firmware. It appears my card has lost
the ability to RX or TX at all but other peoples cards work. I assume
it got damaged in tansport.
MFC: 1 week.
count handling of station entries in hostap mode:
Input path:
o driver is now expected to find the node associated with the
sender of a received frame; use ic_bss if none is located
o driver passes the (referenced) node into ieee80211_input for
use within the wlan module and is responsible for cleaning up
on return
o the antenna state is no longer passed up with each frame; this
is now considered driver-private state and drivers are responsible
for keeping it in the driver-private part of a node
Output path:
Revamp output path for management frames to eliminate redundant
locking that causes problems and to correct reference counting
bogosity that occurs when stations are timed out due to inactivity
(in AP mode). On output the refcnt'd node is stashed in the pkthdr's
recvif field (yech) and retrieved by the driver. This eliminates
an unref/ref scenario and related node table unlock/lock due to the
driver looking up the node. This is particularly important when
stations are timed out as this causes a lock order reversal that
can result in a deadlock. As a byproduct we also reduce the overhead
for sending management frames (minimal). Additional fallout from
this is a change to ieee80211_encap to return a refcn't node for
tieing to the outbound frame. Node refcnts are not reclaimed until
after a frame is completely processed (e.g. in the tx interrupt
handler). This is especially important for timed out stations as
this deref will be the final one causing the node entry to be
reclaimed.
Additional semi-related changes:
o replace m_copym use with m_copypacket (optimization)
o add assert to verify ic_bss is never free'd during normal operation
o add comments explaining calling conventions by drivers for frames
going in each direction
o remove extraneous code that "cannot be executed" (e.g. because
pointers may never be null)
that caused a 3-4 times slow down in performance.
(the primary Sparc64 developers are all using OFW_NEWPCI already, so it is
the best code path for users)
quietly discard them; this just permits them to be collected with bpf)
o add a counter for the number of rate control frames discarded when not in
monitor mode
o move the rx "too short" statistic in the stat structure so non-error rx stats
are together (NB: ABI change to apps that collect stats via driver ioctl)
mistakes (this mistake was not an issue because the length is only used to
decide whether or not to allocate a cluster)
o while here, move a beacon length comment to the "right place"
_pmap_allocpte(): Guarantee that the page table page is zero filled before
adding it to the directory. Otherwise, a 2nd, 3rd, etc. thread could
access a nearby virtual address and use garbage for the address
translation.
Discussed with: peter, tegge
of bw_meter entries were processed up to one second ahead.
After an unappropriate rescheduling of some of the bw_meter
entries, the upcalls weren't delivered.
* pim_register_prepare() uses the appropriate sw_csum flag to
call ip_fragment() so the IP checksum is computed properly.
* Modify pim_register_prepare() to take care of IP packets that
don't need fragmentation.
* Add-back in_delayed_cksum() to encap_send(), because it seems it
should be there.
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
also fixes pfs_access() since it relies on VOP_GETATTR() which will call
pfs_getattr(). This prevents jailed processes from discovering the
existence, start time and ownership of processes outside the jail.
PR: kern/48156
present, and non-zero when it is (or may be) absent. The test
cbb_child_present was backwards. However, typical usage in the tree
would cause it to do the right thing because the card really wasn't
there the OK flag would be turned on.
Also, assume that if any of these bits are turned on we don't have a
card, rather than requiring both of them in the suspend/resume
routines.
Noticed by: cognet
directories. Previously, pfs_iterate() would return -1 when it
reached the end of the process list while processing a process
directory node, even if the parent directory contained further nodes
(which is the case for the linprocfs root directory, where the process
directory node is actually first in the list). With this patch,
pfs_iterate() will continue to traverse the parent directory's node
list after exhausting the process list (as was the intention all
along). The code should hopefully be easier to read as well.
While I'm here, have pfs_iterate() assert that the allproc lock is
held.
to what is in NetBSD. I have a few cards that tickles this bug, and
this just keeps us from panicing. It doesn't actually fix the problem
(that will happen once I figure out why some cards hate the address
their CIS is mapped to high memory).
binaries in /bin and /sbin installed in /lib. Only the versioned files
reside in /lib, the .so symlink continues to live /usr/lib so the
toolchain doesn't need to be modified.
like we have on other platforms. Move savectx() to <machine/pcb.h>.
A lot of files got these MD prototypes through the indirect inclusion
of <machine/cpu.h> and now need to include <machine/md_var.h>. The
number of which is unexpectedly large...
osf1_misc.c especially is tricky because szsigcode is redefined in
one of the osf1 header files. Reordering of the include files was
needed.
linprocfs.c now needs an explicit extern declaration.
Tested with: LINT
prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to
cpu.h. This affects db_command.c and kern_shutdown.c.
ia64: move all MD prototypes from cpu.h to md_var.h. This affects
madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory().
It's not used (vm_machdep.c).
alpha: the MD prototypes have been left in cpu.h with a comment
that they should be there. Moving them is left for later. It was
expected that the impact would be significant enough to be done in
a seperate commit.
powerpc: MD prototypes left in cpu.h. Comment added.
Suggested by: bde
Tested with: make universe (pc98 incomplete)
A timecounter will be selected when registered if its quality is
not negative and no less than the current timecounters.
Add a sysctl to report all available timecounters and their qualities.
Give the dummy timecounter a solid negative quality of minus a million.
Give the i8254 zero and the ACPI 1000.
The TSC gets 800, unless APM or SMP forces it negative.
Other timecounters default to zero quality and thereby retain current
selection behaviour.
Sign extension happens after the shift, not before so that boundary
cases like 0x40000000 will not be caught properly.
Instead, right shift ndirty. It is guaranteed to be a multiple of 8.
While here, do some manual code motion and code commoning.
Range check bug pointed out by: iedowse
case, a "Vortex86" mini PC), the PCI device ID value in the EEPROM (0x8100)
does not agree with the PCI device ID returned by pci_get_device() (0x8139).
This means that while rl_probe() matches the device, rl_attach() doesn't.
Work around this by adding an entry to the rl_devs table for the 8100 with
a device ID of 0x8100.
Also, get rid of extra instance of __FBSDID(). One is enough.
- Update some stale comments.
- Sort a couple of includes.
- Only set 'newcpu' in updatepri() if we use it.
- No functional changes.
Obtained from: bde (via an old diff I got a long time ago)
get the same value from ip->i_ump->um_devvp.
This saves a pointer in the memory copies of inodes, which can
easily run into several hundred kilobytes.
The extra indirection is unmeasurable in benchmarks.
Approved by: mckusick
- Add a macro for the logical shift needed to extract an APIC ID from
either from the local APIC ICR Hi register or the APIC ID registers of
the local and IO APICs.
This code dates back to the very first diskless support on FreeBSD,
back when swapon(8) couldn't simply be run on a NFS backed file.
Suggested replacement command sequence on the client:
dd if=/dev/zero of=/swapfile bs=1k count=1 oseek=100000
swapon /swapfile
rm -f /swapfile
For whatever value of 100000 you want.
Minor code reorganization was required, but the only functional
change was that the first 1024 bytes of output are thrown out
after each reseed, rather than just the initial seed.
was masked. However KIMURA Yasuhiro-san noticed my mistake and was
kind enough to provide a better patch in PR 55581. I've merged that
into the routine. Hopefully I've not overlooked anything this time.
MFC After: 5 days
that were on the kernel stack into account. For now we write them
out to the register stack of the process before creating the dump.
This however is not the final solution. The problem is that we may
invalidate the coredump by overwriting vital information due to an
invalid backing store pointer. Instead we need to write the dirty
registers to an unused region of VM which will result in a seperate
segment in the coredump. For now we can at least get to all the
registers from a coredump.
and the move to control register to avoid dependency violations when
these functions are used. Note that explicit data and instruction
serialization also need to be in a subsequent instruction group.
This too requires that we have an igrp break here.
PT_SETKSTACK. These requests allow the tracing process to access the
dirty registers of the traced process that are on the kernel stack.
Note that there's currently no way to access the rnat register for
those dirty registers that are not (yet) covered by a nat collection
point. The interface for this is still being slept on.
Also note that implied by these requests is the division of work:
The tracing process has to keep track of where registers are spilled
and is responsible to figure out where the NaT bit of the stacked
registers are at any time during the execution of the traced process.
The kernel provides the interfaces but will not abstract the fact
that the register stack can be split. This model does not follow
the approach taken in Linux where PT_PEEK and PT_POKE deals with
this automagically.
check for permissions, do it for all requests, not the known requests.
Later when we actually service the request we deal with the invalid
requests we previously caught earlier.
This commit changes the behaviour of the ptrace(2) interface for
boundary cases such as an unknown request without proper permissions.
Previously we would return EINVAL. Now we return EBUSY or EPERM.
Platforms need to define __HAVE_PTRACE_MACHDEP when they have MD
requests. This makes the prototype of cpu_ptrace() visible and
introduces a call to this function for all requests greater or
equal to PT_FIRSTMACH.
Silence on: audit
Correctly handle additional disks without BIOS partition tables.
Previously, vinum_scandisk stopped scanning additional disks for
native partitions after any good partition was found. This applies
to all platforms, but was a particular problem on systems without
BIOS partition tables.
Submitted by: harti
the hardware mutex if it is held. Re-add calls to Enable/Clear fixed events.
This is not known to have caused problems. Bug symptoms might have included
instability after an aborted suspend attempt or power/sleep buttons not
being enabled.
kobj global method table; also kassert that the table has not overflowed
when defining a new method.
there are indications that the table is being overflowed in certain
situations as we gain more kobj consumers- this will allow us to check
whether kobj is at fault. symptoms would be incorrect methods being called.
CP-168U board. It initializes and attaches in the same way as the
older (but higher performance) C168H. The only difference is the
board ID, which is 0x1681.
PR: kern/53548
Submitted by: regnauld@catpipe.net
MFC after: 1 week
the Palm device and the USB host controller deadlock. The USB host
controller is expecting an early-end-of-transmission packet with 0
data, and the Palm doesn't send one because it's already communicated
the amount of data it's going to send in a header (which ucom/uvisor
are oblivious to). This is the problem that has been known on the
pilot-link lists as the "[Free]BSD USB problem", but not understood.
Submitted by: Nathan J. Williams <nathanw@MIT.EDU>
multi-fragment transmission. I'm not sure if this is a bug or a requirement
that I overlooked with going through the documentation, but the sample
8169 NIC that I have seems to require it at least some of the time or
else it botches TCP checksums on segments that span multiple descriptors.
to override the method pointers for manipulating nodes; this fixes
a problem where the ic_bss node was not being created properly
for the ath driver causing the driver to scribble on random memory.
Noticed by: David Young <dyoung@pobox.com>
rate set element id from an AP. This allows stations to associate with
AP's that violate the 802.11 spec by sending >8 rates. This corrects a
recent regression; older code did likewise.
for partly-aligned operations through /dev/crypto (unlikely)
o add missing case in iov code that never showed up because of the above bug
Submitted by: "Jason L. Wright" <jason@thought.net>
MFC after: 3 days
to walk the list and remove the current item and destroy/free it.
Alexander Kabaev will likely do the equivalent for the other list
types, but I just happened to have this one sitting in a local
non-FreeBSD tree already.
segments are lost for the application. This broke, for example,
ports/benchmarks/dbs which needs the SYN segment to filter the
contents of the trace buffer for the connection it is interested in.
This patch makes the SYN segments available again. Unfortunately they
are now associated with the listening socket instead of the new one, so
a change to applications is required, but without this patch it wouldn't
work altogether.
PR: kern/45966
code has rotten a bit so that the header length is not correct at
the point when tcp_trace is called. Temporarily compute the correct
value before the call and restore the old value after. This makes
ports/benchmarks/dbs to almost work.
This is a NOP unless you compile with TCPDEBUG.
in tcp_input that leave the function before hitting the tcp_trace
function call for the TCPDEBUG option. This has made TCPDEBUG mostly
useless (and tools like ports/benchmarks/dbs not working). Add
tcp_trace calls to the return paths that could be identified in this
maze.
This is a NOP unless you compile with TCPDEBUG.
in user space or kernel space. VM_MIN_KERNEL_ADDRESS starts after the
gateway page, which means that improper memory accesses to the gateway
page while in user mode would panic the kernel. Use VM_MAX_ADDRESS
instead. It ends before the gateway page. The difference between
VM_MIN_KERNEL_ADDRESS and VM_MAX_ADDRESS is exactly the gateway page.
move to ar.rsc. The RSE must be in enforced lazy mode when writing
to RSE modifyable registers. In this case we restore the RSE NaT
collection register ar.rnat. I have seen 2 general exception faults
on pluto1 now that indicate that the move to ar.rsc has already
happened prior to the move to ar.rnat, meaning that the RSE is not
in enforced lazy mode anymore. The ia64 dependency and instruction
ordering rules seem to allow having both registers written to in
the same instruction group, provided ar.rsc is written to later than
ar.rnat (based on the ordering semantics). It appears that we may
be pushing our luck. For now, put them in seperate cycles (by means
of the instruction group break). If we ever get a general exception
fault on the move to ar.rnat again, we have definite proof that
something else is fishy.
masks for files and directories. This should make some
of the Midnight Commander users happy.
Remove an extra ')' in the manual page.
PR: 35699
Submitted by: Eugene Grosbein <eugen@grosbein.pp.ru> (original version)
Tested by: simon
cpu_switch() where both the old and new threads are passed in as
arguments. Only powerpc uses the old conventions now.
- Update comments in the Alpha swtch.s to reflect KSE changes.
Tested by: obrien, marcel
new ATMIOCOPENVCC/CLOSEVCC. This allows us to not only use UBR channels
for IP over ATM, but also CBR, VBR and ABR. Change the format of the
link layer address to specify the channel characteristics. The old
format is still supported and opens UBR channels.
integer value and then to construct the integer from it. This buffer
was sizeof(int) bytes long, which was fine until the (undocumented) 'g'
modifier for 8-byte integers was introduced. Change this to sizeof(uint64_t).
found only many tv-cards.
We currently use more ore less evil hacks (slow_msp_audio sysctl) to
configure the various variants of these chips in order to have
stereo autodetection work. Nevertheless, this doesn't always work
even though it _should_, according to the specs.
This is, for example, the case for some popular Hauppauge models sold
sold in Germany.
However, the Linux driver always worked for me and others. Looking at
the sourcecode you will find that the linux-driver uses a very much
enhanced approach to program the various msp34xx chipset variants,
which is also found in the specs for these chips.
This is a port of the Linux MSP34xx code, written by Gerd Knorr
<kraxel@bytesex.org>, who agreed to re-release his code under a
BSD license for this port.
A new config option "BKTR_NEW_MSP34XX_DRIVER" is added, which is required
to enable the new driver. Otherwise the old code is used.
The msp34xx.c file is diff-reduced to the linux-driver to make later
modifications easier, thus it doesn't follow style(9) in most cases.
Approved by: roger (committing this, no time to test/review),
keichii (code review)
o Differentiate between CPU family and CPU model. There are multiple
Itanium 2 models and it's nice to differentiate between them.
o Seperately export the CPU family and CPU model with sysctl.
o Merced is the only model in the Itanium family.
o Add Madison to the Itanium 2 family. We already knew about McKinley.
o Print the CPU family between parenthesis, like we do with the i386
CPU class.
My prototype now identifies itself as:
CPU: Merced (800.03-Mhz Itanium)
pluto1 and pluto2 will eventually identify themselves as:
CPU: McKinley (900.00-Mhz Itanium 2)
These are 10/100 only NICs found on the IBM Thinkpad R40E and
G40. These seem to be based on the BCM5705 MAC but with a PHY
that doesn't support 1000Mbps modes.
Submitted by: Igor Sviridov <sia@nest.org>
compare the zone element size (+1 for the byte of linkage) against
UMA_SLAB_SIZE - sizeof(struct uma_slab), and not just UMA_SLAB_SIZE.
Add a KASSERT in zone_small_init to make sure that the computed
ipers (items per slab) for the zone is not zero, despite the addition
of the check, just to be sure (this part submitted by: silby)
- UMA_ZONE_VM used to imply BUCKETCACHE. Now it implies
CACHEONLY instead. CACHEONLY is like BUCKETCACHE in the
case of bucket allocations, but in addition to that also ensures that
we don't setup the zone with OFFPAGE slab headers allocated from the
slabzone. This means that we're not allowed to have a UMA_ZONE_VM
zone initialized for large items (zone_large_init) because it would
require the slab headers to be allocated from slabzone, and hence
kmem_map. Some of the zones init'd with UMA_ZONE_VM are so init'd
before kmem_map is suballoc'd from kernel_map, which is why this
change is necessary.
- All those diffs to syscalls.master for each architecture *are*
necessary. This needed clarification; the stub code generation for
mlockall() was disabled, which would prevent applications from
linking to this API (suggested by mux)
- Giant has been quoshed. It is no longer held by the code, as
the required locking has been pushed down within vm_map.c.
- Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES
to express their intention explicitly.
- Inspected at the vmstat, top and vm pager sysctl stats level.
Paging-in activity is occurring correctly, using a test harness.
- The RES size for a process may appear to be greater than its SIZE.
This is believed to be due to mappings of the same shared library
page being wired twice. Further exploration is needed.
- Believed to back out of allocations and locks correctly
(tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC).
PR: kern/43426, standards/54223
Reviewed by: jake, alc
Approved by: jake (mentor)
MFC after: 2 weeks
From alc:
Move pageable pipe memory to a seperate kernel submap to avoid awkward
vm map interlocking issues. (Bad explanation provided by me.)
From me:
Rework pipespace accounting code to handle this new layout, and adjust
our default values to account for the fact that we now have a solid
limit on allocations.
Also, remove the "maxpipes" limit, as it no longer has a purpose.
(The limit on kva usage solves the problem of having two many pipes.)
application could cause a wired page to be freed. In general,
vm_page_hold() should be preferred for ephemeral kernel mappings of pages
borrowed from a user-level address space. (vm_page_wire() should really be
reserved for indefinite duration pinning by the "owner" of the page.)
Discussed with: silby
Submitted by: tegge
psignal()/tdsignal(). The test was historically in psignal(). It was
changed into a KASSERT, and then later moved to tdsignal() when the
latter was introduced.
Reviewed by: iedowse, jhb
ioctls.
In the particular case of ptrace(), this commit more-or-less reverts
revision 1.53 of sys_process.c, which appears to have been erroneous.
Reviewed by: iedowse, jhb
parameter. The new name better reflects what the function does and
how it is used. The last parameter was always FALSE.
Note: In theory, gcc would perform constant propagation and dead code
elimination to achieve the same effect as removing the last parameter,
which is always FALSE. In practice, recent versions do not. So, there
is little point in letting unused code pessimize execution.
to configure this correctly yields many watchdog timeouts even on lightly
loaded machines. This is a common complaint from users with Dell 1750
servers with built-in dual 5704 NICs.
magic from exec_setregs(). In set_mcontext() we now also don't have
to worry that we entered the kernel with more that 512 bytes of
dirty registers on the kernel stack. Note that we cannot make any
assumptions anymore WRT to NaT collection points in exec_setregs(),
so we have to deal with them now.
word between the 8139C+ and the 8169. The 8139C+ has a 'frame alignment
error bit' (bit 27) but the 8169 does not. Rather than simply mark this
bit as reserved, RealTek removed it completely and shifted the remaining
status bits one space to the left. This was causing rl_rxeofcplus()
to misparse the error and checksum bits.
To workaround this, rl_rxeofcplus() now shifts the rxstat word one
bit to the right before testing any of the status bits (but after
the frame length has been extracted).
the time the card is inserted and the time that the card is
configured. This can lead to interrupt storms. The O2Micro suggested
workaround is to route the card function interrupt to IRQ1. It
appears from my testing that this is an acceptable workaround for most
chipsets (there's still some issue with the ricoh chipset).
Also, only look at the NOT_A_CARD bit when the bridge tells us there's
a card present. At least one test caused this to be true after the
card was removed, but the author couldn't recreate it with the
workaround in place. The change is more conservative than the
previous code, but still has the work around that wasn't present in
the older code.
note the existence of the 8169S and 8110S components. (The 8169
is just a MAC, the 8169S and 8110S contain both a MAC and PHY.)
- Properly handle list and buffer addresses as 64-bit. The RX and
TX DMA list addresses should be bus_addr_t's. Added RL_ADDR_HI()
and RL_ADDR_LO() macros to obtain values for writing into chip
registers.
- Set a slightly different TIMERINT value for 8169 NICs for improved
performance.
- Change left out of previous commit log: added some additional
hardware rev codes for other 10/100 chips and for the 8169S/8110S
'rev C' gigE MACs.
BGE_MACSTAT_MI_COMPLETE bit in the MAC status register as a link
change indicator. We turn this bit on now because some of the newer
chips need it, but it usually just means that reading/writing
an MII/GMII register has completed, not that a link change has
occured.
the standard.
1) When the bridge tells us that we have a card that isn't recognized, we
use the force register to force the CV_TEST to run. This test causes the
bridge to re-evaluate the card. Once this re-evaluation process happens,
we get a new interrupt that may say it is ready to process. We try this up
to 20 times. Tests have shown that this appears to correctly reset the
'Unknown card type' problem that I saw on my Sony PCG-505TS.
2) Take a page from OLDCARD and always read the CSC register in the ISR.
Some TI (and it seems maybe Ricoh) chipsets require this to behave
properly. This work around appears to work due to some power management
protocols that were improperly implemented. Maybe it can be removed when
this driver supports the full PME# protocol described in the standards.
3) Minor additional debug printf when debugging is enabled.
4) Minor additional commentary for things that are obvious only after study.
# I'm committing this from my Sony PCG-505TS using shared PCI interrupts
# and NEWCARD, but there are some issues with the Ricoh bridge still, but
# at least now I can boot with the card inserted and have it work.
a reference to the containing object. The purpose of the reference
being to prevent the destruction of the object and an attempt to free
the wired page. (Wired pages can't be freed.) Unfortunately, this
approach does not work. Some operations, like fork(2) that call
vm_object_split(), can move the wired page to a difference object,
thereby making the reference pointless and opening the possibility
of the wired page being freed.
A solution is to use vm_page_hold() in place of vm_page_wire(). Held
pages can be freed. They are moved to a special hold queue until the
hold is released.
Submitted by: tegge
PCG-505BX (for example) has one of those:
wi0: <Intersil Prism3> mem 0xf8000000-0xf8000fff at device 2.0 on pci2
wi0: 802.11 address: 00:02:8a:94:d8:73
wi0: using RF:PRISM3(Mini-PCI)
wi0: Intersil Firmware: Primary (1.1.1), Station (1.5.6)
wi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
1) avoid immediately calling bzero() after malloc() by passing M_ZERO
2) do not initialize individual members of the global context to zero
3) remove an unused assignment of ifctx in bootpc_init()
Reviewed by: tegge
has the same product id, but different vendor id. It also appears
that the MELCO's id should be 0x18a instead of 0x8a01. Fix this.
Submitted by: Shizuka Kudo-san
Disabled by default. To enable it, the new "options PIM" must be
added to the kernel configuration file (in addition to MROUTING):
options MROUTING # Multicast routing
options PIM # Protocol Independent Multicast
2. Add support for advanced multicast API setup/configuration and
extensibility.
3. Add support for kernel-level PIM Register encapsulation.
Disabled by default. Can be enabled by the advanced multicast API.
4. Implement a mechanism for "multicast bandwidth monitoring and upcalls".
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
semaphore and doing so can lead to a possible reversal. WITNESS would have
caught this if semaphores were used more often in the kernel.
Submitted by: Ted Unangst <tedu@stanford.edu>, Dawson Engler
and up commands. When configuring the interface down only the
connections that are currently closing are deleted from the connection
table. When the interface is configured up, all connections that
are in the table are re-opened.
connections that have been open (and were not closing) when
the interface was stopped. This makes the behaviour of fatm(4) more like
the behaviour of en(4).
when we create contexts. The meaning of the flags are documented in
<machine/ucontext.h>. I only list them here to help browsing the
commit logs:
_MC_FLAGS_ASYNC_CONTEXT
_MC_FLAGS_HIGHFP_VALID
_MC_FLAGS_KSE_SET_MBOX
_MC_FLAGS_RETURN_VALID
_MC_FLAGS_SCRATCH_VALID
Yes, _MC_FLAGS_KSE_SET_MBOX is a hack and I'm proud of it :-)
o For trap-based upcalls the argument (the kse_mailbox) to
the UTS must be written onto the kernel stack, not the
user stack. While here, deal with the fact that we may
be at a NaT collection point.
- Fix a bug in rl_dma_map_desc(): set the 'end of ring' bit in the
right descriptor (DESC_CNT - 1, not DESC_CNT). The 8139C+ is limited
to 64 descriptors and automatically wraps at 64 descriptors even
if the EOR bit isn't set, but the 8169 NIC can have up to 1024
descriptors per ring, so we must set the wrap point in the right
place.
- RealTek moved the RL_TIMERINT register from offset 0x54 to 0x58 in
the 8169 -- account for this.
- Added rl_gmii_readreg() and rl_gmii_writereg() routines.
- Fix rl_probe() to deal with the case where the base type is
not RL_8139.
The next step is to add jumbo buffer support.
Tested with the Xterasys XN-152 NIC (hard to beat $29 for a gigE NIC).
path into the kernel. Normally it's due to a syscall, but one can
also be created as the result of a clock interrupt (for example).
This now even more looks like exec_setregs().
While here, add an assert that we don't expect more than 8KB of
dirty registers on the kernel stack.
unconditionally restore ar.k7 (kernel memory stack) and ar.k6
(kernel register stack). I don't know what I was smoking then,
but if you unconditionally restore ar.k6, you also want to
compute its value unconditionally. By having the computation
predicated and dependent on whether we return to user mode, we
would end up writing junk (= invalid value for ar.bspstore) if
we would return to kernel mode. But the whole point of the
unconditional restoration was that there is a grey area where
we still need to have ar.k6 restored. If we restore with a junk
value, we would end up wedging the machine on the next interrupt.
So, unconditionally calculate the value we unconditionally write
to ar.k6.
o The previous braino was found while making the following change:
We used to clear the lower 9 bits of the value we write to ar.k6.
The meaning being that we know that the kernel register stack is
at least 512 byte aligned and simply clearing the lower 9 bits
allows us to return to a context of which we don't have dirty
registers on the kernel stack, even though the context that
entered the kernel does have dirty registers on the kernel stack.
By masking-off the lower bits, we correctly obtain the base of
the register stack without having to worry that we didn't actually
reached the base while unwinding it.
The change is to mask off the lower 13 bits, knowing that the
kernel register stack is always 8KB aligned. The advantage is that
we don't have to worry anymore if there's more than 512 bytes of
dirty registers on the kernel stack. A situation that frequently
occurs. In exec_setregs() in machdep.c:1.147 or older, we had to
deal with that situation by copying the active portion of the
register stack down in multiples of 512 bytes. Now that we mask off
the lower 13 bits we don't have to do that at all. Contemporary
IPF processors have a register file that can hold up to 96 stacked
registers (=784 bytes [incl. 2 NaT collections]). With no indication
that register files grow beyond a couple of hundred registers, we
should not have to worry about it anymore... and yes, 640KB is
enough for everybody :-)
This change helps setcontext(2) and cpu_set_upcall_kse() in that
they can return to completely different contexts without having to
mess with the kernel stack. Of course exec_setregs() doesn't need
to do that anymore as well.
queues lock such that it isn't held around the call to get_pv_entry(),
which calls uma_zalloc(). At the point of the call to get_pv_entry(), the
lock isn't necessary and holding it could lead to recursive acquisition,
which isn't allowed.
that the page's busy flag could be relied upon to synchronize access to the
pv list. I don't any longer. See, for example, the call to
pmap_insert_entry() from pmap_copy().)
(short) types for the port arg of inb() (rev.1.56). The warning started
working for u_short types with gcc-3.3. The pessimizations exposed
by this been fixed except for the cx and oltr drivers where the breakage
of the warning has been pushed to the drivers.
completenss. The pessimization is tiny compared with i/o port slowness
except on very old machines, but code that used signed short types for
i/o ports was unpessimized long ago, and the macro that detected it
recently started working for u_short types too. Use of bus space
should have made this moot long ago.
Not tested at runtime by: bde
completenss. The pessimization is tiny compared with i/o port slowness
except on very old machines, but code that used signed short types for
i/o ports was unpessimized long ago, and the macro that detected it
recently started working for u_short types too. Use of bus space
should have made this moot long ago.
Not tested at runtime by: bde
This change allows one to specify almost the complete traffic parameters
for IPoverATM channels through the routing table. Up to now we used
4 byte DL addresses (flag, vpi, vciH, vciL). This format is still allowed.
If the address is longer, however, the 5th byte is interpreted as the
traffic class (UBR, CBR, VBR or ABR) and the remaining bytes are the
parameters for this traffic class:
UBR: 0 byte or 3 byte PCR
CBR: 3 byte PCR
VBR: 3 byte PCR, 3 byte SCR, 3 byte MBS
ABR: 3 byte PCR, 3 byte MCR, 3 byte ICR, 3 byte TBE, 1 byte NRM,
1 byte TRM, 2 bytes ADTF, 1 byte RIF, 1 byte RDF and 1 byte CDF
A script to generate the corresponding 'route add' arguments will follow soon.
connection is to be established asynchronously, behave as in the
case of non-blocking mode:
- keep the SS_ISCONNECTING bit set thus indicating that
the connection establishment is in progress, which is the case
(clearing the bit in this case was just a bug);
- return EALREADY, instead of the confusing and unreasonable
EADDRINUSE, upon further connect(2) attempts on this socket
until the connection is established (this also brings our
connect(2) into accord with IEEE Std 1003.1.)
function prototypes. Use LIST_FOREACH instead of explicit loops.
The indentation of functions indendet by 4 space have been left alone.
2-space indented functions have been re-indented.
Eliminate a lot of checkes to make sure requests are not cross-device
which is unnecessary with the new layout. We know a sequential request
cannot possibly be cross-device because there is a reserved page between
the devices.
Remove a couple of comments which no longer are relevant.
i/o ports by calling the implementation-detail level below inb() and
outb() instead of inb() and outb(). Unpessimizing the types is too
hard since they are mainly used in microcode.
completenss. The pessimization is tiny compared with i/o port slowness
except on very old machines, but code that used signed short types for
i/o ports was unpessimized long ago, and the macro that detected it
recently started working for u_short types too. Use of bus space
should have made this moot long ago.
Not tested at runtime by: bde
- Build SGL's for ATA_PASSTHROUGH commands
- Fallback to using the sgl_offset when the opcode is unknown for building
SGL's/
- Add ioctl calls for adding and removing units.
- Define previously undefined AEN's
- Allocate memory for the ioctl payload in multiples of 512bytes.
MFC after: 1 week
need this for swapcontext(), KSE upcalls initiated from ast()
also need to save them so that we properly return the syscall
results after having had a context switch. Note that we don't
use r11 in the kernel. However, the runtime specification has
defined r8-r11 as return registers, so we put r11 in the context
as well. I think deischen@ was trying to tell me that we should
save the return registers before. I just wasn't ready for it :-)
o The EPC syscall code has 2 return registers and 2 frame markers
to save. The first (rp/pfs) belongs to the syscall stub itself.
The second (iip/cfm) belongs to the caller of the syscall stub.
We want to put the second in the context (note that iip and cfm
relate to interrupts. They are only being misused by the syscall
code, but are not part of a regular context).
This way, when the context is switched to again, we return to
the caller of setcontext(2) as one would expect.
o Deal with dirty registers on the kernel stack. The getcontext()
syscall will flush the RSE, so we don't expect any dirty registers
in that case. However, in thread_userret() we also need to save
the context in certain cases. When that happens, we are sure that
there are dirty registers on the kernel stack.
This implementation simply copies the registers, one at a time,
from the kernel stack to the user stack. NAT collections are not
dealt with. Hence we don't preserve NaT bits. A better solution
needs to be found at some later time.
We also don't deal with this in all cases in set_mcontext. No
temporay solution is implemented because it's not a showstopper.
The problem is that we need to ignore the dirty registers and we
automaticly do that for at most 62 registers. When there are more
than 62 dirty registers we have a memory "leak".
This commit is fundamental for KSE support.
sysctl:
- sysctlbyname("net.inet.ip.mfctable", ...)
- sysctlbyname("net.inet.ip.viftable", ...)
This change is needed so netstat can use sysctlbyname() to read
the data from those tables.
Otherwise, in some cases "netstat -g" may fail to report the
multicast forwarding information (e.g., if we run a multicast
router on PicoBSD).
* Bug fix: when sending IGMPMSG_WRONGVIF upcall to the multicast
routing daemon, set properly "im->im_vif" to the receiving
incoming interface of the packet that triggered that upcall
rather than to the expected incoming interface of that packet.
* Bug fix: add missing increment of counter "mrtstat.mrts_upcalls"
* Few formatting nits (e.g., replace extra spaces with TABs)
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
code used to call rtrequest(RTM_DELETE, ...). This is a problem, because
the function that just has called us (route_output)
is not really happy with the route it just is creating beeing ripped out
from under it. Unfortunately we also cannot return an error from
ifa_rtrequest. Therefore mark the route just as RTF_REJECT.
control whether to accept RAs per-interface basis.
the new stuff ensures the backward compatibility;
- the kernel does not accept RAs on any interfaces by default.
- since the default value of the flag bit is on, the kernel accepts RAs
on all interfaces when net.inet6.ip6.accept_rtadv is 1.
Obtained from: KAME
MFC after: 1 week
preparation for supporting the OPENVCC and CLOSEVCC ioctls which
are needed for ng_atm. This required some re-organisation of the code
(mostly converting array indexes to pointers). This also gives us
an array of open vccs that will help in using the generic GETVCCS handler.
than i386 or AMD64, TP register points to thread mailbox, and they can not
atomically clear km_curthread in kse mailbox, in this case, thread retrieves
its thread pointer from TP register and sets flag TMF_NOUPCALL in its thread
mailbox to indicate a critical region.
the macro definition, and cause the generation of syntactically
incorrect code that gcc happens to accept.
Reviewed by: schweikh (mentor)
MFC after: 4 weeks
use vrele() instead of vput() on the parent directory vnode returned
by namei() in the case where it is equal to the target vnode. This
handles namei()'s somewhat strange (but documented) behaviour of
not locking either vnode when the two vnodes are equal and LOCKPARENT
but not LOCKLEAF is specified.
Note that since a vnode double-unlock is not currently fatal, these
coding errors were effectively harmless.
Spotted by: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
Reviewed by: mckusick
they haven't been counted before. This test was ommitted when bus_dmamap_load()
was merged into this function, and results in the pagesneeded field growing
without bounds when multiple deferrals happen.
Thanks to Paul Saab for beating his head against this for a few hours =-)
user space region. Hence, we need to test if 5 is greater than the
region; not greater equal.
This bug caused us to call ast() while interrupting kernel mode.
malloc and mbuf allocation all not requiring Giant.
1) ostat, fstat and nfstat don't need Giant until they call fo_stat.
2) accept can copyin the address length without grabbing Giant.
3) sendit doesn't need Giant, so don't bother grabbing it until kern_sendit.
4) move Giant grabbing from each indivitual recv* syscall to recvit.
- Move the enabling of interrupts out of assembly and into C a few
instructions later at cpu_critical_fork_exit(). This puts more of the
MD critical section implementation under the MD critical section API
making it easier to test and develop alternative implementations.
set in cpu_critical_fork_exit() anymore.
- As far as I can tell, cpu_thread_link() has never been used, not even
when it was originally added, so remove it.
Also change "Auto mode" to use a "special" value
instead of 0, and define and document it.
I had thought libpthread had already been switched to use auto mode but
it appears that patch hasn't been committed yet.
Discussed with: Davidxu
to not get any cross-device I/O requests. (The unallocated first page
protecting BSD labels already gave us this, but that hack may go away
at some point in time).
Remove the check for cross-device I/O requests in swap_pager_strategy.
Move the repeated statistics updating into flushchainbuf().
larger than normal frames, to account for the case where a bge(4) NIC
is used with VLANs. Since we set the IFCAP_VLAN_MTU flag, we must allow
reception of frames up to 1522 bytes in size rather than 1518.
Note that it is possible to work around this bug by doing:
# ifconfig bge0 mtu 1504
prior to configuring any VLAN interfaces.
o Remove alpha specific timer code (mc146818A) and compiled-out
calibration of said timer.
o Remove i386 inherited timer code (i8253) and related acquire and
release functions.
o Move sysbeep() from clock.c to machdep.c and have it return
ENODEV. Console beeps should be implemented using ACPI or if no
such device is described, using the sound driver.
o Move the sysctls related to adjkerntz, disable_rtc_set and
wall_cmos_clock from machdep.c to clock.c, where the variables
are.
o Don't hardcode a hz value of 1024 in cpu_initclocks() and don't
bother faking a stathz that's 1/8 of that. Keep it simple: hz
defaults to HZ and stathz equals hz. This is also how it's done
for sparc64.
o Keep a per-CPU ITC counter (pc_clock) and adjustment (pc_clockadj)
to calculate ITC skew and corrections. On average, we adjust the
ITC match register once every ~1500 interrupts for a duration of
2 consequtive interruprs. This is to correct the non-deterministic
behaviour of the ITC interrupt (there's a delay between the match
and the raising of the interrupt).
o Add 4 debugging sysctls to monitor clock behaviour. Those are
debug.clock_adjust_edges, debug.clock_adjust_excess,
debug.clock_adjust_lost and debug.clock_adjust_ticks. The first
counts the individual adjustment cycles (when the skew first
crosses the threshold), the second counts the number of times the
adjustment was excessive (any non-zero value is to be considered
a bug), the third counts lost clock interrupts and the last counts
the number of interrupts for which we applied an adjustment
(debug.clock_adjust_ticks / debug.clock_adjust_edges gives the
avarage duration of an individual adjustment -- should be ~2).
While here, remove some nearby (trivial) left-overs from alpha and
other cleanups.
swapbkva. Swapbkva mappings are explicitly managed using pmap_qenter(),
not on-demand by vm_fault(), making kmem_alloc_nofault() more appropriate.
Submitted by: tegge
generate the inode mode from a default ACL and creation mask,
implement ufs_sync_inode_from_acl() using acl_posix1e_newfilemode().
Since ACL_OVERRIDE_MASK/ACL_PRESERVE_MASK are defined, we no
longer need to explicitly pass in a "preserve_mask" field: this
is implicit in the use of POSIX.1e semantics.
Note: this change contains a semantic bugfix for new file creation:
we now intersect the ACL-generated mode and the cmode requested by
the user process. This means permissions on newly created file
objects will now be more conservative. In the future, we may want
to provide alternative semantics (similar to Solaris and Linux) in
which the ACL mask overrides the umask, permitting ACLs to broaden
the rights beyond the requested umask.
PR: 50148
Reported by: Ritz, Bruno <bruno_ritz@gmx.ch>
Obtained from: TrustedBSD Project
support routines in kern_acl.c:
- Define ACL_OVERRIDE_MASK and ACL_PRESERVE_MASK centrally in acl.h: the
mode bits that are (and aren't) stored in the ACL.
- Add acl_posix1e_acl_to_mode(): given a POSIX.1e extended ACL, generate
a compatibility mode (only the bits supported by the POSIX.1e ACL).
- acl_posix1e_newfilemode(): Given a requested creation mode and default
ACL, calculate the mode for the new file system object (only the bits
supported by the POSIX.1e ACL).
PR: 50148
Reported by: Ritz, Bruno <bruno_ritz@gmx.ch>
Obtained from: TrustedBSD Project
cases:
- Setting sticky bit on non-directory
- Setting setgid on a file with a group that isn't in the effective
or extended groups of the authorizing credential
I.e., test the requirement first, then do the privilege test,
rather than doing the privilege test regardless of the need for
privilege.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
interrupting user mode. The net effect of this bug is that a clock
interrupt does not cause rescheduling and processes are not
preempted. It only takes a "while (1);" to render the machine
useless.
This bug was introduced by the context changes and EPC syscall code.
Handling of ASTs was moved to C for clarity and ease of maintenance,
but was not added for the external interrupt case.
This needs to be revisited. We now have calls to do_ast() in trap(),
break_syscall() and ivt_External_Interrupt(). A single call in
exception_restore covers these 3 places without duplication. This
is where we handled ASTs prior to the overhaul, except that the
meat has been moved to do_ast(), a C function. This was the goal
to begin with.
Pointy hat: marcel
Use ->bio_children to count child buffers, rather than abuse the
bio_caller1 pointer.
Expand the relevant bits of waitchainbuf() inline, this clarifies
the code a little bit.
striping to a per device round-robin algorithm.
Because of the policy of not attempting to retain previous swap
allocation on page-out, this means that a newly added swap device
almost instantly takes its 1/N share of the I/O load but it takes
somewhat longer for it to assume it's 1/N share of the pages if there
is plenty of space on the other devices.
Change the 8G total swapspace limitation to 8G per device instead
by using a per device blist rather than one global blist. This
reduces the memory footprint by 75% (typically a couple hundred
kilobytes) for the common case with one swapdevice but NSWAPDEV=4.
Remove the compile time constant limit of number of swap devices,
there is no limit now. Instead of a fixed size array, store the
per swapdev structure in a TAILQ.
Total swap space is still addressed by a 32 bit page number and
therefore the upper limit is now 2^42 bytes = 16TB (for i386).
We still do not allocate the first page of each device in order to
give some amount of protection to any bsdlabel at the start of the
device.
A new device is appended after the existing devices in the swap space,
no attempt is made to fill in holes left behind by swapoff (this can
trivially be changed should it ever become a problem).
The sysctl vm.nswapdev now reflects the number of currently configured
swap devices.
Rename vm_swap_size to swap_pager_avail for consistency with other
exported names.
Change argument type for vm_proc_swapin_all() and swap_pager_isswapped()
to be a struct swdevt pointer rather than an index.
Not changed: we are still using blists to manage the free space,
but since the swapspace is no longer fragmented by the striping
different resource managers might fare better.
concurrent invocations from acquiring the same address(es). Also, in case
of an incomplete allocation, free any allocated pages.
In collaboration with: tegge
sure that uma_dbg_free() is called if we're about to call
uma_zfree_internal() but we're asking it to skip the dtor and
uma_dbg_free() call itself. So, if we're about to call
uma_zfree_internal() from uma_zfree_arg() and skip == 1, call
uma_dbg_free() ourselves.
EFI file system. When booting from a CD and there's already an EFI
system partition on the disk, setting the current device to unit 0
will select the harddisk. This invariably breaks installing FreeBSD
when other operating systems have been installed before.
We obviously want to do the same when we're booting over the network.
Maybe later.
Based on a patch (from memory) from: arun
there is code that blindly allocates LDTEs starting at slot 6
and I quess it doesn't really matter to us if they overwrite the BSDI
syscall slot, since it isn't a BSDI binary. Also add some code to help track
down other such users (commented out for now).
Reviewed by: deischen@
o correct BSSID setup in ah_writeAssocid for 5211 and 5212 (fixes
reception of broadcast frames after association)
o correct transmit retry counts returned by 5211 in ah_procTxDesc
o add missing regulatory domain support that caused use of 11b channels to be
disallowed with some cards (e.g. mini-pci cards in certain IBM laptops)
o miscellaneous fixes to regulatory domain support
o increase size of 5212 ANI table to avoid overflow
o add monitor mode
o remove OS_QSORT support
o fix handling of HAL_RXDESC_INTREQ in ah_setupRxDesc
o rewrite 5212 descriptor handling for portability
o FreeBSD: track alq_open API change
type. We know about header types 0, 1 and 2. Ignore the rest in the
MD i386 code when we're looking for bridges. You cannot look at the
vendor tag. And if you don't you certainly can't look at function > 0
if the device isn't there.
The new soekris boards' GEODE cpu has issues with the old way. This
is reported to have fixed it.
MFC After: 2 days
considered to be good to try when it otherwise has no clue about which
interrupts to try. This is a band-aide and we really should try to
balance the IRQs that we arbitrarily pick, but it should help some
people that would otherwise get bad IRQs.
recompiling the driver. See the comments near the top of "if_em.h"
for descriptions of these delays. Four new loader tunables control
the system-wide default values:
hw.em.tx_int_delay
hw.em.rx_int_delay
hw.em.tx_abs_int_delay
hw.em.rx_abs_int_delay
The tunables are specified in microseconds. The valid range is
0-67108 usec., and 0 means that the timer is disabled.
There are also four new sysctls (actually, a set of four for each
"em" device in the system) to query and change the interrupt delays
after the system is up:
hw.em0.tx_int_delay
hw.em0.rx_int_delay
hw.em0.tx_abs_int_delay (not present for 82542/3/4 adapters)
hw.em0.rx_abs_int_delay (not present for 82542/3/4 adapters)
It seems to be OK to change these values even while the adapter is
passing traffic.
Approved by: Prafulla Deuskar <pdeuskar@FreeBSD.ORG>
MFC after: 4 weeks