Clean up (or if antipodic: down) some of the msgbuf stuff.
Use an inline function rather than a macro for timecounter delta.
Maintain process "on-cpu" time as 64 bits of microseconds to avoid
needless second rollover overhead.
Avoid calling microuptime the second time in mi_switch() if we do
not pass through _idle in cpu_switch()
This should reduce our context-switch overhead a bit, in particular
on pre-P5 and SMP systems.
WARNING: Programs which muck about with struct proc in userland
will have to be fixed.
Reviewed, but found imperfect by: bde
These are probably generated by other PCI devices sharing the TLAN's
interrupt. The programmer's guide says to simply re-enable interrupts
and return if one of these is detected.
Prompted by bug report from: Bill Fenner
net.inet.ip.icmp.bmcastecho = 0 by removing the extra check for the
address being a multicast address. The test now relies on the link
layer flags that indicate it was received via multicast. The previous
logic was broken and replied to ICMP echo/timestamp broadcasts even
when the sysctl option disallowed them.
Reviewed by: wollman
i contains the contents of the EP_W0_CONFIG_CTRL register.
i was being used as the array index into an array on the stack.
j is initialized to 0 as it should be.
PR: kern/6757
Reviewed by: jmb
Submitted by: Stephane E. Potvin <sepotvin@videotron.ca>
Prior to this change, Accidental recursion protection was done by
the diverted daemon feeding back the divert port number it got
the packet on, as the port number on a sendto(). IPFW knew not to
redivert a packet to this port (again). Processing of the ruleset
started at the beginning again, skipping that divert port.
The new semantic (which is how we should have done it the first time)
is that the port number in the sendto() is the rule number AFTER which
processing should restart, and on a recvfrom(), the port number is the
rule number which caused the diversion. This is much more flexible,
and also more intuitive. If the user uses the same sockaddr received
when resending, processing resumes at the rule number following that
that caused the diversion. The user can however select to resume rule
processing at any rule. (0 is restart at the beginning)
To enable the new code use
option IPFW_DIVERT_RESTART
This should become the default as soon as people have looked at it a bit
passed to the user process for incoming packets. When the sockaddr_in
is passed back to the divert socket later, use thi sas the primary
interface lookup and only revert to the IP address when the name fails.
This solves a long standing bug with divert sockets:
When two interfaces had the same address (P2P for example) the interface
"assigned" to the reinjected packet was sometimes incorect.
Probably we should define a "sockaddr_div" to officially hold this
extended information in teh same manner as sockaddr_dl.
of the TCP payload. See RFC1122 section 4.2.2.6 . This allows
Path MTU discovery to be used along with IP options.
PR: problem discovered by Kevin Lahey <kml@nas.nasa.gov>
so it must be adjusted (minus 1) before using it to do the length check.
I'm not sure who to give the credit to, but the bug was reported by
Jennifer Dawn Myers <jdm@enteract.com>, who also supplied a patch. It
was also fixed in OpenBSD previously by andreas.gunnarsson@emw.ericsson.se,
and of course I did the homework to verify that the fix was correct per
the specification.
PR: 6738
for better packing. This means that we can choose better values for the
various hash entries without having to try and get it all to fit within
an artificial power of two limit for malloc's sake.
in -current is over, I'll put a 2.2.x specific version in the RELENG_2_2
branch. If somebody wants a 2.2 version of this driver now, they can check
out the previous version from CVS or ask me via e-mail.
Gee people, I didn't mean to stir up such a controversy. I just wanted
to make sure I could get this thing to work with both kernel versions
and didn't want to have to maintain two separate copies. All ya hadda
do was ask. :)
drivers here do and it also blows up in building GENERIC during
a release build if you try and include <osreldate.h> (which shot
my SNAP dead - argh!). Use __FreeBSD__ instead.
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)
This driver supports the following cards/integrated ethernet controllers:
Compaq Netelligent 10, Compaq Netelligent 10/100, Compaq Netelligent 10/100,
Compaq Netelligent 10/100 Proliant, Compaq Netelligent 10/100 Dual Port,
Compaq NetFlex-3/P Integrated, Compaq NetFlex-3/P Integrated,
Compaq NetFlex 3/P w/ BNC, Compaq Deskpro 4000 5233MMX.
It should also support Texas Instruments NICs that use the ThunderLAN
chip, though I don't have any to test. If you've got a card that uses
the ThunderLAN chip but isn't listed in the PCI vendor/product list in
if_tl.c, try adding it and see what happens.
The driver supports any MII compliant PHY at 10 or 100Mbps speeds in
full or half duplex. (Those I've personally tested are the National
Semiconductor DP83840A (Prosignia server), the Level 1 LXT970 (Deskpro
desktop), and the ThunderLAN's internal 10baseT PHY.) Autonegotiation,
hardware multicast filtering, BPF and ifmedia support are included.
This chip is pretty fast; Prosignia servers with NCR SCSI, ThunderLAN
ethernet and FreeBSD make for a nice combination.
cases we ignore it (eg: read/write) to maintain chmod-after-open semantics
but in other cases we do care, eg: creating files, access() etc. Never
ignore errors from VOP_ACCESS() on immutable files.
This apparently comes from BSDI (from Keith Bostic) via NetBSD.
PR: 5148
Submitted by: Yoshiro MIHIRA <sanpei@yy.cs.keio.ac.jp>
end up being the same, but it doesn't look like you're comparing
apples and oranges.
2. Use need_resched instead of reset_priority. This isn't right
either, since for example you'll round-robin against equal priority FIFO
processes when lowering the priority of another process,
but this works better and a real fix needs to be in kern_synch and
not out here.
3. This is not a device driver: copyin/copyout the structure.
Althought the comments say the datasheet doesn't list the device ID
registers on the M2/MX, they seem to be there and quite alive.
(It's interesting to note that the M2/MX calls itself a 686 class cpu but
is missing a heck of a lot of features, including VME, PGE, PSE, etc)
NetBSD, ported to FreeBSD by Pierre Beyssac <pb@fasterix.freenix.org> and
minorly tweaked by me.
This is a standard part of FreeBSD, but must be enabled with:
"sysctl -w net.inet.ip.fastforwarding=1" ...and of course forwarding must
also be enabled. This should probably be modified to use the zone
allocator for speed and space efficiency. The current algorithm also
appears to lose if the number of active paths exceeds IPFLOW_MAX (256),
in which case it wastes lots of time trying to figure out which cache
entry to drop.
We had run out of bits in the nfs mount flags, I have moved the internal
state flags into a seperate variable. These are no longer visible via
statfs(), but I don't know of anything that looks at them.
Submitted by: Roger Hardiman <roger@cs.strath.ac.uk>
options BROOKTREE_SYSTEM_DEFAULT=BROOKTREE_PAL
in the kernel config file makes the driver's video_open() function
select PAL rather than NTSC. This fixed all the hangs on my
Dual Crystal card when using a PAL video signal.
As a result, you can loose the tsleep (of 2 seconds - now 0.25!!)
which I previously added. (Unless someone else wanted the 0.25
second tsleep).
available. The per-cpu variable ss_tpr has been replaced by ss_eflags.
This reduced the number of interrupts sent to the wrong CPU, due to
the cpu having the global lock being inside a critical region.
Remove some unneeded manipulation of tpr register in mplock.s.
Adjust code in mplock.s to be aware of variables on the stack being
destroyed by MPgetlock if GRAB_LOPRIO is defined.
update of cpu usage as shown by top when one process is cpu bound
(no system calls) while the system is otherwise idle (except for top).
Don't attempt to switch to the BSP in boot(). If the system was idle when
an interrupt caused a panic, this won't work. Instead, switch to the BSP
in cpu_reset.
Remove some spurious forward_statclock/forward_hardclock warnings.
Don't forget to clear the inode hash lock before returning from ext2_vget()
after getnewvnode() fails. Obtained from: rev.1.24 of ffs_vfsops.c (the
original patch for the getnewvnode() race). Forgotten in: rev.1.4 here.
Removed a duplicate comment. Duplicated in: rev.1.4 here.
Fixed the MALLOC() vs getnewvnode() race in ext2_vget(). Obtained from:
rev.1.39 of ffs_vfsops.c.
Windows 95 after rebooting FreeBSD without power off. In PC-98
system, reboot mode is set via I/O port 0x37 in cpu_reset(), and
accessing of this port is the reason of the problem. To avnoid the
fault, current status of reboot mode should be checked before
accessing the I/O port.
that local APIC should be disabled in UP system. However, some of old
BIOS does not disable local APIC, and virtual wire mode through local
APIC may cause int 15.
submitted ioctl to clear the video buffer
prior to starting video capture
Amancio : clean up yuv12 so that it does not
affect rgb capture. Basically, fxtv after
capturing in yuv12 mode , switching to rgb
would cause the video capture to be too bright.
1.32 disable inverse gamma function for rgb and yuv
capture. fixed meteor brightness ioctl it now
converts the brightness value from unsigned to
signed.
1.33 added sysctl: hw.bt848.tuner, hw.bt848.reverse_mute,
hw.bt848.card
card takes a value from 0 to bt848_max_card
tuner takes a value from 0 to bt848_max_tuner
reverse_mute : 0 no effect, 1 reverse tuner
mute function some tuners are wired reversed :(
Define a parameter which indicates the maximum number of sockets in a
system, and use this to size the zone allocators used for sockets and
for certain PCBs.
Convert PF_LOCAL PCB structures to be type-stable and add a version number.
Define an external format for infomation about socket structures and use
it in several places.
Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through
sysctl(3) without blocking network interrupts for an unreasonable
length of time. This probably still has some bugs and/or race
conditions, but it seems to work well enough on my machines.
It is now possible for `netstat' to get almost all of its information
via the sysctl(3) interface rather than reading kmem (changes to follow).
signanosleep() did not deal with signal masks properly. This change was
based on a discussion with bde some time ago (at least 6 months or more).
signanosleep() should probably go away since it was never really used for
more than a few weeks and doesn't appear in released code. It should
probably be killed before somebody uses it and it becomes a gratuitous
nonstandard feature.
but doesn't do much of anything with it. I added it to siopnp_ids[]
and it was found and recognized as a serial port.
PR: 6605
Reviewed by: phk
Submitted by: Dave Marquardt <marquard@zilker.net>
these two files that are almost-but-not-quite the same leads to false grep
hits, confusion etc.
Only installing one copy with a symlink would be nice but that doesn't
work with SHARED=symlinks (it changes the source tree).
in nfs_vinvalbuf() or the nfs_removeit(), we can have the nfsnode reallocated
from underneath us (eg: replaced by a ufs 'struct inode') which can cause
disk corruption ('freeing free block' when di_db[5] gets trashed).
This is not a cheap fix, but it'll do until the nfsnodes get reference
counting and/or locking.
Apparently NetBSD have a similar fix (apparently from BSDI).
I wish all PR's had this much useful detail. :-)
PR: 6611
Submitted by: Stephen Clawson <sclawson@marker.cs.utah.edu>
This is a result of discussions on the mailing lists. Kudos to those who
have found the issue and created work-arounds. I have chosen Tor's fix
for now, before we can all work the issue more completely.
Submitted by: Tor Egge
possibly non-open devices, and we don't want to restrict dumping
to swap devices anwyay. It is especially invalid to call d_ioctl()
in non-process context for panics. d_psize() can be called on
non-open devices, at least on non-SLICED ones that support d_dump(),
and setdumpdev() has depended on this for a long time although it
is probably wrong, but even d_psize() can't be called in non-process
context - that's why dumpsys() depends on previously computed values
although these values may be stale. The historical restriction to
devices with dkpart(dev) == SWAP_PART should go away.
the only common usage of utrace (the possible problem with this
commit) is with malloc, so this should be a real problem. Add
the various NetBSD syscalls that allow full emulation of their
development environment.
language, ANSI C, enum constants must be representable as ints.
We assumed at-least-33-bit ints. This worked on some 32-bit
systems because we don't mix negative sysinit enum constants with
too-large sysinit enum constants, and the compiler used an unsigned
32-bit type for sysinit enum variables, so sysinit enum variables
were sorted correctly. The fix lops off 4 hopefully-unused bits
so that we now only assume at-least-29-bit ints.
as variables declared in the main block in the function, so shadowing
of parameters by variables declared in the main block is not just an
obfuscation).
Found by: lint
Apparently I didn't make my plans to make dev_t and devsw[] go away
under DEVFS quite clear enough to Peter Dufault as he stitched the SCSI
system together using them when he redid the configuration side of things.
This made is rather an effort to remove all vestiges of dev_t and
devsw[] entries from sd.c in DEVFS/SLICE mode.
---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it. This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.
Quality testing was done with testvn, and lat_fs from the lmbench suite.
Some NFS client testing courtesy of Patrik Kudo.
vop_mknod and vop_symlink still release the returned vpp. vop_rename
still releases 4 vnode arguments before it returns. These remaining cases
will be corrected in the next set of patches.
---------
Submitted by: Michael Hancock <michaelh@cet.co.jp>
is broken. It omits the SCSI_DATA_IN flag in the SCSI READ ELEMENT
STATUS command, which makes the 'chio status' command fail.
PR: 6528
Reviewed by: phk
Submitted by: Hans Huebner <hans@artcom.de>
Reverse the VFS_VRELE patch. Reference counting of vnodes does not need
to be done per-fs. I noticed this while fixing vfs layering violations.
Doing reference counting in generic code is also the preference cited by
John Heidemann in recent discussions with him.
The implementation of alternative vnode management per-fs is still a valid
requirement for some filesystems but will be revisited sometime later,
most likely using a different framework.
Submitted by: Michael Hancock <michaelh@cet.co.jp>
Support >8G drives in CHS mode. This is done by guesstimating the
cylinder count from the LBA size reported. It works on my shiny
new Maxtor 11.5G drive, YMMV.
Reports from users of other big drives (read Quantum bigfoot's)
are welcome...
Technologies' Socket 7 chipsets. This covers all of the Apollo chipsets
except the Master (82C570) and the MVP3, and it also covers the cheap
VXPro and VXTWO knockoffs of the VP1 and VPX.
PR: 6481
Reviewed by: phk
Submitted by: Lee Cremeans <lcremean@tidalwave.net>
the page offset. If a large file offset was passed in, a large negative
array index could be generated which could cause page faults etc at worst
and file corruption at the least. (Pages are allocated within file
space on page alignment boundaries, so a file offset being passed in here
is harmless to DTRT. The case where this was happening has already been
fixed though, this is in case it happens again).
Reviewed by: dyson
[__]inline is only used to bloat the code here. It gives a separate copy
of all the strings for each time this header is included...
Fixed misuse of __P(()).
called from vfs_bio_awrite() without going through cluster_write()
or ufs_bmaparray(), in particular for all writes to block disk devices.
Only ufs_bmaparray() sets vp->v_maxio in a correct way, and it doesn't
seem to be called early enough even for regular files.
expecting a sub-page offset. We were passing the file position,
and vm_page_bits() could do some interesting things when base was
larger PAGE_SIZE.
if (size > PAGE_SIZE - base)
size = PAGE_SIZE - base;
is interesting when (PAGE_SIZE - base) is negative. I could imagine that
this could have interesting consequences for memory page -> device block
bit validation.
necessary to compile with _THREAD_SAFE defined. This means that people
will get thread-aware code whether they like it or not. This change
is required to allow a process to link against libpthread and libc
to use kernel threads (and prevent each thread from clobbering another
thread's errno just be doing a syscall).
This is bound to break some ports, but it is strictly allowed by ANSI C,
so anything that breaks as a result was already broken anyway 8-).
"Sorry".
In msdosfs_sync: spelling fix, formatting changes; fix MNT_LAZY (sync
modified denodes, don't sync device)
Mostly submitted by (and with hints from): bde
Increase limit for maximum disk size: as far as I can see previous limit was
gratuitously too low.
deallocation cycles. This should provide a measurable improvement
on swap and memory allocation on loaded systems. It is unlikely a
complete solution. Also, provide more map info with procfs.
Chuck Cranor spurred on this improvement.
Linux emulation. This make Allegro Common Lisp 4.3 work under
FreeBSD!
Submitted by: Fred Gilham <gilham@csl.sri.com>
Commented on by: bde, dg, msmith, tg
Hoping he got everything right: eivind
is believed to have been broken with the Brakmo/Peterson srtt
calculation changes. The result of this bug is that TCP connections
could time out extremely quickly (in 12 seconds).
Also backed out jdp's partial fix for this problem in rev 1.17 of
tcp_timer.c as it is obsoleted by this commit.
Bug was pointed out by Kevin Lehey <kml@roller.nas.nasa.gov>.
PR: 6068
- pessimized i/o port types.
- other pessimized types.
- Don't use DEBUG (causes LINT warnings). Use DGB_DEBUG instead.
- commented out code.
- cloned code that doesn't apply ("Smarts" is for the cy driver only).
Submitted by: bde
dereference uninitialized pointers.
o Fix DEVFS permissions
o Fix DEVFS minor numbers
o Add initial & lock devices for cua device.
o Fix permissions in line with sio.
was really removed, or simply 'faked' by a suspend/resume. Keep track
of both current and previous state, and send that information to the
userland programs.
[
XXX - This breaks binary compatability with older pccardd programs, but
they don't work reliably. :(
]
This code will be turned on with the TWO options
DEVFS and SLICE. (see LINT)
Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will deliniate these changes.
/dev will be automatically mounted by init (thanks phk)
on bootup. See /sys/dev/slice/slice.4 for more info.
All code should act the same without these options enabled.
Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others
This code does not support the following:
bad144 handling.
Persistance. (My head is still hurting from the last time we discussed this)
ATAPI flopies are not handled by the SLICE code yet.
When this code is running, all major numbers are arbitrary and COULD
be dynamically assigned. (this is not done, for POLA only)
Minor numbers for disk slices ARE arbitray and dynamically assigned.
ftp://ftp.isi.edu/in-notes/iana/assignments/port-numbers
port numbers are divided into three ranges:
0 - 1023 Well Known Ports
1024 - 49151 Registered Ports
49152 - 65535 Dynamic and/or Private Ports
This patch changes the "local port range" from 40000-44999
to the range shown above (plus fix the comment in in_pcb.c).
WARNING: This may have an impact on firewall configurations!
PR: 5402
Reviewed by: phk
Submitted by: Stephen J. Roznowski <sjr@home.net>
is that the previous commit spammed a hacked 2.2-stable onto -current,
deleting the DMA support etc. (I guess that's one way of minimizing diffs
between -current and -stable.. :-] )
(ie: it has a vm_object attached and is marked as OBJ_MIGHTBEDIRTY) before
attempting to lock it. This should reduce the cpu hit that is incurred
when doing a sync(2) and when the syncer process is doing the 30-second
writeback of dirty mmap() data to disk. Skip this speedup if we are
doing an unmount() to be sure to get everything - we can afford to
occasionally miss a msync while the system is running, but not at unmount.
I'm not sure about the VXLOCK and MNT_WAIT case, it seems a bit odd to skip
doing a page_clean at unmount time just because a vnode is VXLOCKed, but
that's what was being done before...
Submitted by: Roger Hardiman <roger@cs.strath.ac.uk>
Roger Hardiman <roger@cs.strath.ac.uk> :
Revised autodetection code to correctly handle both
old and new VideoLogic Captivator PCI cards.
Added tsleep of 2 seconds to initialistion code for PAL users.
Corrected clock selection code on format change.
--- Amancio
data targets. At least st0 works for me again....
Also, scsi_scsi_cmd() looks like it's been exiting without a biodone() on
an attached buffer in a number of error cases, leading to locked buffers.
update got lost. This is responsible for ensuring that dirty mmap() pages
get periodically written to disk. Without it, long time mmap's might not
have their dirty pages written out at all of the system crashes or isn't
cleanly shut down. This could be nasty if you've got a long-running
writing via mmap(), dirty pages used to get written to disk within 30
seconds or so.
device. But with devfs, currently, /dev/psm0 is the blocking device
and /dev/npsm0 is the non-blocking one.
DEVFS must stay consistent with the older behaviour.
PR: 6260
Reviewed by: phk
Submitted by: Kapil Chowksey <kchowksey@hss.hns.com>
- restored async mount support. The first entry in a block is still
always written synchronously, although it probably shouldn't be in
the async case.
- restored use of BWRITE() instead of bowrite() for the DOWHITEOUT
case, although bowrite() is probably better.
Broken by: merge of softdep changes (rev.1.22).
Found by: lmbench2 delete-file benchmarks.
data is greater than MLEN, setsockopt is unable to pass it onto
the protocol handler. Allocate a cluster in such case.
PR: 2575
Reviewed by: phk
Submitted by: Julian Assange proff@iq.org
is a tremendous perf decrease due to the disabling of advanced
features such as DMA, Ultra DMA, and 32bit mode. This patch
might have been reported by someone else (I seem to remember
it.)
kernal page table may need to be extended. But while growing the
kernel page table (pmap_growkernel()), newly allocated kernel page
table pages are entered into every process' page directory. For
proc0, the page directory is not allocated yet, and results in a
page fault. Eventually, the machine panics with "lockmgr: not
holding exclusive lock".
PR: 5458
Reviewed by: phk
Submitted by: Luoqi Chen <luoqi@luoqi.watermarkgroup.com>
Use config flags 0x1000 to enable LBA mode. It should be enabled in
the BIOS too to avoid geometry confusion.
One catch though, I'm not sure all BIOS's uses the 64head/63secs
translation, all mine does but....
an error if it gets a link (like it does if it gets a socket). The
implications of letting users try and do file operations on symlinks
themselves were too worrying.
(because we can :-). This means you can open a link file (or pseudo-file
in the case of short links where the data is stored in the inode rather
than disk blocks) and read the contents.
However, trap any writes from the user as it's difficult to do the right
thing in all cases. A link may be short and the user may be trying to
extend it beyond the limit and so on. Although.. being able to re-target
a symlink without deleting it first might have been nice.
This stuff is a bit perverse since symlink() and readlink() calls can
end up actually being implemented as read/write vnode ops.
Reviewed by: phk
to not follow symlinks, but to open a handle on the link itself(!).
As strange as this might sound, it has several useful applications
safe race-free ways of opening files in hostile areas (eg: /tmp, a mode
1777 /var/mail, etc). It also would allow things like fchown() to work
on the link rather than having to implement a new syscall specifically for
that task.
Reviewed by: phk
ints. Remove some no longer needed casts. Initialize the per-cpu
global data area using the structs rather than knowing too much about
layout, alignment, etc.
- Set the correct value scp->font_size in init_scp().
- Set scp->font_size to FONT_NONE for VGA_MODEX.
Interim fix for a font problem:
- A kludge to display the correct font on some video cards.
We should be able to load multiple fonts to the VGA plane #2 and switch
between fonts by setting the font select register in the VGA sequencer.
It appears that the current code isn't functioning as expected on
some VGA cards (I have reports on Millenium and Mach64 cards). This is
either a bug in syscons or a hardware compatibility problem ;-<
This kludge will always load only one font set at a time and always use
the font page #0 on the plane #2. It is an interim kludge until
we find the exact cause and solution.
Small adjustment for mouse cursor handling:
- Turn off the mouse cursor early when changing video modes.
Video mode switch fixes:
- Stop the screen saver when changing video modes.
- Enclose the critical section with a pair of spltty()/splx().
- A kludge to prevent scrn_update() from accessing video memory in less-
critical sections in video mode change; artificially turn on the
UNKNOWN_MODE flag.
PR: bin/5899, bin/5907
Tested by: ache and a couple of users
OKed by: sos
* Figure out UTC relative to boottime. Four new functions provide
time relative to boottime.
* move "runtime" into struct proc. This helps fix the calcru()
problem in SMP.
* kill mono_time.
* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)
* nanosleep, select & poll takes long sleeps one day at a time
Reviewed by: bde
Tested by: ache and others
- Attempt to handle PCI devices where the interrupt is
an ISA/EISA interrupt according to the mp table.
- Attempt to handle multiple IO APIC pins connected to
the same PCI or ISA/EISA interrupt source. Print a
warning if this happens, since performance is suboptimal.
This workaround is only used for PCI devices.
With these two workarounds, the -SMP kernel is capable of running on
my Asus P/I-P65UP5 motherboard when version 1.4 of the MP table is disabled.
Defer the WRITE SESSION command until the first write command, so that
it works like the prepare track command, allowing the device to be
closed after the command.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.
Most uses of time.tv_sec now uses the new variable time_second instead.
gettime() changed to getmicrotime(0.
Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).
A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.
Add a new nfs_curusec() function.
Mark a couple of bogosities involving the now disappeard time variable.
Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.
Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.
Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.
Reviewed by: bde
This only affects the KERNEL case.
Don't include <sys/radix.h> twice for the KERNEL case. This fixes
a mismerge from Lite2.
Don't include <sys/radix.h> at all for the !KERNEL case. This fixes
a wrong cleanup in Lite2.
_KPOSIX_PRIORITY_SCHEDULING options to work. Changes:
Change all "posix4" to "p1003_1b". Misnamed files are left
as "posix4" until I'm told if I can simply delete them and add
new ones;
Add _POSIX_PRIORITY_SCHEDULING system calls for FreeBSD and Linux;
Add man pages for _POSIX_PRIORITY_SCHEDULING system calls;
Add options to LINT;
Minor fixes to P1003_1B code during testing.
all the LKM load/unload junk, and don't forget to register the SYSINIT
so that the cdevsw entry is attached.
BTW: I think the way it builds it's /dev nodes on the fly as an LKM with
vnode ops is kinda cute - I guess that'd be one way to solve the devfs
persistance problems.. :-) (ie: have the drivers make the nodes in /dev
on disk directly if they are missing, but leave them alone if present).
softdep mode could only be activated on the initial mount of a filesystem
and then only if it was a read-write mount. A 'mount -r' (as done in the
rootfs mount) followed by a 'mount -u' to convert to read-write didn't
start softdep mode.
They are atomic, but return in essence what is in the "time" variable.
gettime() is now a macro front for getmicrotime().
Various patches to use the two new functions instead of the various
hacks used in their absence.
Some puntuation and grammer patches from Bruce.
A couple of XXX comments.
its own zone; this is used particularly by TCP which allocates both inpcb and
tcpcb in a single allocation. (Some hackery ensures that the tcpcb is
reasonably aligned.) Also keep track of the number of pcbs of each type
allocated, and keep a generation count (instance version number) for future
use.
everything is contained inside #ifdef VM86, so this option must be
present in the config file to use this functionality.
Thanks to Tor Egge, these changes should work on SMP machines. However,
it may not be throughly SMP-safe.
Currently, the only BIOS calls made are memory-sizing routines at bootup,
these replace reading the RTC values.
- Implement proper EISA probing.
- Better support for the new transputer based host cards.
- use standard termios settings, one can use the intial/lock devices.
- use a simple bcopy since some cards/systems apparently don't support
32 bit accesses.
- hard reset and halt host card CPU prior to download in case of a soft
restart.
- recognize new remote module types (ASIC vs. CD1400 based)
- a number of cosmetic changes (my fault, not Nick's)
Submitted by: Nick Sayer <nsayer@quack.kfu.com>
24 (which is magnalink!) rather than the correct 26.
Initial attempt at a compatability kludge that will negotiate for either
but will prefer to use the correct deflate compression type.
to #include <sys/time.h> first. I've lost count of the number of times
I've had to patch this in porting code. The problem is the
"struct timeval ifi_lastchange" in the mib stats. (most other systems don't
have this, until 4.4bsd anyway).
connect. This check was added as part of the defense against the "land"
attack, to prevent attacks which guess the ISS from going into ESTABLISHED.
The "src == dst" check will still prevent the single-homed case of the
"land" attack, and guessing ISS's should be hard anyway.
Submitted by: David Borman <dab@bsdi.com>
In vfs_bio.c, remove b_generation count usage,
remove redundant reassignbuf,
remove redundant spl(s),
manage page PG_ZERO flags more correctly,
utilize in invalid value for b_offset until it
is properly initialized. Add asserts
for #ifdef DIAGNOSTIC, when b_offset is
improperly used.
when a process is not performing I/O, and just waiting
on a buffer generally, make the sleep priority
low.
only check page validity in getblk for B_VMIO buffers.
In vfs_cluster, add b_offset asserts, correct pointer calculation
for clustered reads. Improve readability of certain parts of
the code. Remove redundant spl(s).
In vfs_subr, correct usage of vfs_bio_awrite (From Andrew Gallatin
<gallatin@cs.duke.edu>). More vtruncbuf problems fixed.
work reliably yet (I've had panics), but it does seem to occasionally
be able to transmit and receive syntactically-correct packets.
Also fixes one of if_ethersubr.c's legion style bugs, and removes
the hostcache code from standard kernels---the code that depends on it
is not going to happen any time soon, I'm afraid.
Don't try to flush buffers if the drive says it has none.
More error checking and reporting.
Hack: if drive hangs, it can be reset by issuing a mt -f device offline.
I've been able to make several 4G backups. However there is still problems
with some configurations. It is not clear if it is hardware or driver
problems yet.
me any problems until after the previous commit. This problem then
caused a severe case of creeping crud on my diskdrive, and hosed
my system so bad, that I needed to do a complete reinstall. Sorry!!!
I assume that others have manifest this bug.
problems. Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!
pmap.c:
1) Create an object for kernel page table allocations. This
fixes a bogus allocation method previously used for such, by
grabbing pages from the kernel object, using bogus pindexes.
(This was a code cleanup, and perhaps a minor system stability
issue.)
pmap.c:
2) Pre-set the modify and accessed bits when prudent. This will
decrease bus traffic under certain circumstances.
vfs_bio.c, vfs_cluster.c:
3) Rather than calculating the beginning virtual byte offset
multiple times, stick the offset into the buffer header, so
that the calculated offset can be reused. (Long long multiplies
are often expensive, and this is a probably unmeasurable performance
improvement, and code cleanup.)
vfs_bio.c:
4) Handle write recursion more intelligently (but not perfectly) so
that it is less likely to cause a system panic, and is also
much more robust.
vfs_bio.c:
5) getblk incorrectly wrote out blocks that are incorrectly sized.
The problem is fixed, and writes blocks out ONLY when B_DELWRI
is true.
vfs_bio.c:
6) Check that already constituted buffers have fully valid pages. If
not, then make sure that the B_CACHE bit is not set. (This was
a major source of Sig-11 type problems.)
vfs_bio.c:
7) Fix a potential system deadlock due to an incorrectly specified
sleep priority while waiting for a buffer write operation. The
change that I made opens the system up to serious problems, and
we need to examine the issue of process sleep priorities.
vfs_cluster.c, vfs_bio.c:
8) Make clustered reads work more correctly (and more completely)
when buffers are already constituted, but not fully valid.
(This was another system reliability issue.)
vfs_subr.c, ffs_inode.c:
9) Create a vtruncbuf function, which is used by filesystems that
can truncate files. The vinvalbuf forced a file sync type operation,
while vtruncbuf only invalidates the buffers past the new end of file,
and also invalidates the appropriate pages. (This was a system reliabiliy
and performance issue.)
10) Modify FFS to use vtruncbuf.
vm_object.c:
11) Make the object rundown mechanism for OBJT_VNODE type objects work
more correctly. Included in that fix, create pager entries for
the OBJT_DEAD pager type, so that paging requests that might slip
in during race conditions are properly handled. (This was a system
reliability issue.)
vm_page.c:
12) Make some of the page validation routines be a little less picky
about arguments passed to them. Also, support page invalidation
change the object generation count so that we handle generation
counts a little more robustly.
vm_pageout.c:
13) Further reduce pageout daemon activity when the system doesn't
need help from it. There should be no additional performance
decrease even when the pageout daemon is running. (This was
a significant performance issue.)
vnode_pager.c:
14) Teach the vnode pager to handle race conditions during vnode
deallocations.
on the IOAPIC being connected to the 8254 timer interrupt.
Verify that timer interrupts are delivered. If they aren't, attempt
a fallback to mixed mode (i.e. routing the timer interrupt via the 8259 PIC).
Gee, I wish there was a better way to run makesyscalls.sh so that
a make world finds missing things like this. Running it manually
sucks.
Pointed out by: Peter Dufault
unsigned integral type. Changing it doesn't seem to cause any
sign extension bugs in /usr/src. In the kernel, this is partly
because `struct speedtab' and its lookup function are too bogus
to use speed_t's for speeds - they use ints.
Reminded by: PR 5786
previous commit. Opportunities to clean pages were often missed,
and leaving of the idle state was sometimes delayed until the next
interrupt (after any that occurred while cleaning).
Fixed an unstaticization, a syntax error and a style bug in the
previous commit.
of a disk, because that slice does not exist, try again mounting from the
compatability slice.
This handles the case where a disk has been initialised by 'disklabel
auto', which places a bogus and invalid slice entry on the disk.
The bootstrap is not smart enough to reject this slice, and pretends to
boot from it. Believing the the bootstrap at this point is unwise.
Booting from non-'wd' disks thus prepared is still broken, as
'disklabel -rwB xdN auto' does not initialise the disk type field, and
the bootstrap mistakenly claims that the disk is handled by 'wd'.
Behaviour is now consistent with DEVFS expected characteristics.
(and even run). These files don't necessarily make sense for a
FreeBSD/Alpha kernel build. That will come later and these files
will be changed accordingly.
the other syscall files are generated into. This new file is to be
included by src/lib/libc/sys/Makefile.inc to automatically pick up
syscall names.
The other file, netbsd_syscall.mk, is the hand-generated NetBSD
equivalent to be included by src/lib/libc/sys/Makefile.inc when
_NETBSD_SYSCALLS is defined during the build of libc/libc_r.
during the libc/libc_r to automatically pick up syscall names on
the assumption that default asm code needs to generated for them.
In the up-coming changes to the libc makefiles, there is the option
to provide a machine dependent asm source file which will turn off
the automatic generation of the default. There is also an option
to just stop code being generated for a syscall. In most cases,
though, the default asm code is all that is required, so this
change makes that the most convenient was to do business.
Idea suggested by: bde
Changes to support building with _POSIX_SOURCE set to 199309L:
1. Add sys/_posix.h to handle those preprocessor defs that POSIX
says have effects when defined before including any header files;
2. Change POSIX4_VISIBLE back to _POSIX4_VISIBLE
3. Add _POSIX4_VISIBLE_HISTORICALLY for pre-existing BSD features now
defined in POSIX. These show up when:
_POSIX_SOURCE and _POSIX_C_SOURCE are not set or
_POSIX_C_SOURCE is set >= 199309L
and vanish when:
_POSIX_SOURCE is set or _POSIX_C_SOURCE is < 199309L.
4. Explain these in man 9 posix4;
5. Include _posix.h and conditionalize on new feature test.
slice number passed in by the bootblocks. This means the kernel will
not use the compatability slice to obtain the root filesystem when
booting from a sliced disk.
Use the extraction macros from reboot.h rather than stating them in full
again.
1) When freeing pages, it is a good idea to protect them off.
(This is probably gratuitious, but good form.)
2) Allow collapsing pages in the backing object that are
PQ_CACHE. This will improve memory utilization.
3) Correct the collapse code so that pages that were on the
cache queue are moved to the inactive queue. This is
done when pages are marked dirty (so that those pages
will be properly paged out instead of freed), so that
cached pages will not be paradoxically marked dirty.
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code. This code might have been committed seperately, but
almost everything is interrelated.
1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitious buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups assocated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistancy problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify it's status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and it's descendents to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.
put alot of it's context into a data structure. This allows
significant shortening of its codepath, and will significantly
decrease it's cache footprint.
Also, add some stats to vmmeter. Note that you'll have to
rebuild/recompile vmstat, systat, etc... Otherwise, you'll
get "very interesting" paging stats.
f00f_hack has run.
Use the global r_idt descriptor in f00f_hack when in SMP mode,
so the APs find the relocated interrupt descriptor table.
Submitted by: Partially from David A Adkins <adkin003@tc.umn.edu>
dynamically depending on the line speed(s). This should give the old
sizes and watermarks until drivers are changed.
Display the input watermarks in pstat and sicontrol.
Fix for RTPRIO scheduler to eliminate invalid context switches.
POSIX.4 headers and sysctl variables. Nothing should change
unless POSIX4 is defined or _POSIX_VERSION is set to 199309.
interrupts are masked, and EOI is sent iff the corresponding ISR bit
is set in the local apic. If the CPU cannot obtain the interrupt
service lock (currently the global kernel lock) the interrupt is
forwarded to the CPU holding that lock.
Clock interrupts now have higher priority than other slow interrupts.
the signal handling latency for cpu-bound processes that performs very
few system calls.
The IPI for forcing an additional software trap is no longer dependent upon
BETTER_CLOCK being defined.
than rolling it's own. This means that it now uses the "safe"
exec_map_first_page() to get the ld.so headers rather than risking a panic
on a page fault failure (eg: NFS server goes down).
Since all the ELF tools go to a lot of trouble to make sure everything
lives in the first page for executables, this is a win. I have not seen
any ELF executable on any system where all the headers didn't fit in the
first page with lots of room to spare.
I have been running variations of this code for some time on my pure ELF
systems.
a complement to all ops that return a vpp, VFS_VRELE. This is
initially only for file systems that implement the following ops
that do a WILLRELE:
vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link,
vop_rename, vop_mkdir, vop_rmdir, vop_symlink
This is initial DNA that doesn't do anything yet. VFS_VRELE is
implemented but not called.
A default vfs_vrele was created for fs implementations that use the
standard vnode management routines.
VFS_VRELE implementations were made for the following file systems:
Standard (vfs_vrele)
ffs mfs nfs msdosfs devfs ext2fs
Custom
union umapfs
Just EOPNOTSUPP
fdesc procfs kernfs portal cd9660
These implementations may change as VOP changes are implemented.
In the next phase, in the vop implementations calls to vrele and the vrele
part of vput will be moved to the top layer vfs_vnops and made visible
to all layers. vput will be replaced by unlock in these cases. Unlocking
will still be done in the per fs layer but the refcount decrement will be
triggered at the top because it doesn't hurt to hold a vnode reference a
little longer. This will have minimal impact on the structure of the
existing code.
This will only be done for vnode arguments that are released by the various
fs vop implementations.
Wider use of VFS_VRELE will likely require restructuring of the code.
Reviewed by: phk, dyson, terry et. al.
Submitted by: Michael Hancock <michaelh@cet.co.jp>
|In the process of evaluating the getpages/putpages issues I discovered
|that mmap on MSDOSFS does not work. This is because I blindly merged
|NetBSD changes in msdosfs_bmap and msdosfs_strategy. Apparently, their
|blocksize is always DEV_BSIZE (even in files), while in FreeBSD
|blocksize in files is v_mount->mnt_stat.f_iosize (i.e. clustersize in
|MSDOSFS case). The patch is below.
Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>
It does endeed work, but there is still some problems to solve.
I get a "nonrecovered data error" from time to time, but besides
this it has backed up several Gigs allready.
Please report any success/failure directly to me.
Thanks to Warner Losh for providing a drive to use in writing
this driver!
2) Do not unnecessarily force page blocking when paging
pages out.
3) Further improve swap pager performance and correctness,
including fixing the paging in progress deadlock (except
in severe I/O error conditions.)
4) Enable vfs_ioopt=1 as a default.
5) Fix and enable the page prezeroing in SMP mode.
All in all, SMP systems especially should show a significant
improvement in "snappyness."
it runs at a constant frequency. This was less of an issue before,
because the TSC only interpolated in the HZ intervals, but now where
the timecounter is used all the way, this becomes much more visible.
Nit: Fix a printf which triggered the bde-filter.
since there might be permanent entries still left after
calls to DeleteLink (it will be nullified by DeleteLink
if all entries are deleted, won't it ?)
2) in PacketAliasSetAddress, set the aliasing address
even when PKT_ALIAS_RESET_ON_ADDR_CHANGE is in effect.
Just don't clean up links in this case.
Submitted by: Ari Suutari <ari@suutari.iki.fi>
via: Charles Mott <cmott@srv.net>
PR: 5041
- 'mv longnamedfile1 longnamedfile2' would cause longnamedfile2 to lose its
long name.
- Long names have trailing spaces/dots stripped for lookup as well as
assignment.
- A lockup when the mdsosfs was accessed from within the Linux emulator is fixed.
- A bug whereby long filenames were recognised by Microsoft operating systems but
not FreeBSD is fixed.
Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>
These diffs implement the first stage of a VOP_{GET|PUT}PAGES pushdown
for local media FS's.
See ffs_putpages in /sys/ufs/ufs/ufs_readwrite.c for implementation
details for generic *_{get|put}pages for local media FS's. Support
is trivial to add for any FS that formerly relied on the default
behaviour of the vnode_pager in in EOPNOTSUPP cases (just copy the
ffs_getpages() code for the FS in question's *_{get|put}pages).
Obviously, it would be better if each local media FS implemented a
more optimal method, instead of calling an exported interface from
the /sys/vm/vnode_pager.c, but this is a necessary first step in
getting the FS's to a point where they can be supplied with better
implementations on a case-by-case basis.
Obviously, the cd9660_putpages() can be rather trivial (since it
is a read-only FS type 8-)).
A slight (temporary) modification is made to print a diagnostic message
in the case where the underlying filesystem attempts to engage in the
previous behaviour. Failure is likely to be ungraceful.
Submitted by: terry@freebsd.org (Terry Lambert)
Fixed pedantic semantics errors (in ANSI C, static arrays must have
a size, and static objects should be consistently declared as static
unless you know more than anyone should have to know about the
linkage rules).
There is now less need for the vfs.usermount sysctl. msdosfs already
has this change, modulo a missing LK_RETRY, via NetBSD. At least
ext2fs is missing this and many other changes from Lite2.
Obtained from: Lite2
times consistently wrong (up to 1 tick too late), but recent changes
fixed the setting of the main clock, making other times inconsistent.
The inconsistencies tended to show up as a negative resource usage
for the process that set the time.
Fixed the check for setting the clock backwards. A stale timestamp
(`time') was checked, so it was possible to set the clock backwards
by up to almost 1 tick. Until recently, this bug was compensated
for by setting the clock consistently wrong.
Merged the comment about setting the clock backwards from Lite2.
Removed latency micro-optimizations/speed pessimizations in settime().
microtime() and set_timecounter() are relatively expensive, and
they must be called together with clock updates blocked to get a
consistent `delta', so significant latency optimizations are not
possible.
Removed some stale comments.
sources can't include it. However, until recently it was included
by <sys/stat.h>, so it should have been (almost) entirely inside
_POSIX_SOURCE ifdefs for <sys/stat.h> to be (almost) POSIX.1 conformant,
but it was only about half inside _POSIX_SOURCE ifdefs.
longstanding namespace pollution. The need for the pollution
(unconditional use of timespecs) went away in the first round of
Lite2 merges (rev.1.7 for stat.h), but <sys.time.h> was still
unconditionally included, and a stale comment about the need for
the pollution was not removed.
improve tuning on larger systems. (A couple of the VM tuning params for
small systems were so badly chosen that the system could hang under load.)
The broken tuning was originaly my fault.
the main #include of it.
Fixed disordering of _BSD_CLOCKID_T.
Removed forward declarations of "common" structs. The declarations are
now made closer to where they are used.
have declined due to code-rot over time. The swap pager rundown code
has been clean-up, and unneeded wakeups removed. Lots of splbio's
are changed to splvm's. Also, set the dynamic tunables for the
pageout daemon to be more sane for larger systems (thereby decreasing
the daemon overheadla.)
in a way identically as before.) I had problems with the system properly
handling the number of vnodes when there is alot of system memory, and the
default VM_KMEM_SIZE. Two new options "VM_KMEM_SIZE_SCALE" and
"VM_KMEM_SIZE_MAX" have been added to support better auto-sizing for systems
with greater than 128MB.
Add some accouting for vm_zone memory allocations, and provide properly
for vm_zone allocations out of the kmem_map. Also move the vm_zone
allocation stats to the VM OID tree from the KERN OID tree.
in a way identically as before.) I had problems with the system properly
handling the number of vnodes when there is alot of system memory, and the
default VM_KMEM_SIZE. Two new options "VM_KMEM_SIZE_SCALE" and
"VM_KMEM_SIZE_MAX" have been added to support better auto-sizing for systems
with greater than 128MB.
vnodes, therefore vget doesn't need to do so anymore. Other minor
improvements include the temp free vnode queue obeying the VAGE
flag and a printf that warns of to-be-removed code being executed.