Commit graph

6578 commits

Author SHA1 Message Date
Alexander Motin
8b30323843 Add two disk ioctls, giving user-level tools information about disk/array
stripe (optimal access block) size and offset.
2009-12-24 11:05:23 +00:00
Edward Tomasz Napierala
5a8eb3a9d1 Cosmetic fixes. 2009-12-22 09:03:59 +00:00
Edward Tomasz Napierala
9340fc72e6 Implement NFSv4 ACL support for UFS.
Reviewed by:	rwatson
2009-12-21 19:39:10 +00:00
Konstantin Belousov
49e3050e6c VI_OBJDIRTY vnode flag mirrors the state of OBJ_MIGHTBEDIRTY vm object
flag. Besides providing redundant information, the need to update both
vnode and object flags causes more acquisitions of the vnode interlock.
OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects.

Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for
vnode-backed vm objects.

Suggested and reviewed by:	alc
Tested by:	pho
MFC after:	3 weeks
2009-12-21 12:29:38 +00:00
Ed Schouten
8dc9b4cf04 Let access overriding to TTYs depend on the cdev_priv, not the vnode.
This commit changes two things, which improve access to TTYs
in exceptional conditions. The problem was that when you ran
jexec(8) to attach to a jail, you couldn't use /dev/tty (well, also the
node of the actual TTY, e.g. /dev/pts/X). This is very inconvenient if
you want to attach to screens quickly, use ssh(1), etc.

The fixes:

- Cache the cdev_priv of the controlling TTY in struct session. Change
  devfs_access() to compare against the cdev_priv instead of the vnode.
  This allows you to bypass UNIX permissions, even across different
  mounts of devfs.

- Extend devfs_prison_check() to unconditionally expose the device node
  of the controlling TTY, even if normal prison nesting rules normally
  don't allow this. This actually allows you to interact with this
  device node.

To be honest, I'm not really happy with this solution. We now have to
store three pointers to a controlling TTY (s_ttyp, s_ttyvp, s_ttydp).
In an ideal world, we should just get rid of the latter two and only use
s_ttyp, but this makes certain pieces of code very impractical (e.g.
devfs, kern_exit.c).

Reported by:	Many people
2009-12-19 18:42:12 +00:00
Warner Losh
8ffab8645b Revert 200606. 2009-12-16 21:53:56 +00:00
Warner Losh
42b3331b8a Fix compiling FREEBSD_COMPAT[4,5,6] without FREEBSD_COMPAT7.
Note: Not sure this is the right way to do compat, but it makes the
headers consistent with the implementations.
2009-12-16 17:17:40 +00:00
Rui Paulo
37c90cfec7 Add apple-boot and apple-ufs.
Submitted by:	nwhitehorn
2009-12-14 22:47:09 +00:00
Rui Paulo
ee085c333e Add more Apple partition types. 2009-12-14 20:04:28 +00:00
Bjoern A. Zeeb
de0bd6f76b Throughout the network stack we have a few places of
if (jailed(cred))
left.  If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.

Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.

We cannot change the classic jailed() call to do that,  as it is used
outside the network stack as well.

Discussed with:	julian, zec, jamie, rwatson (back in Sept)
MFC after:	5 days
2009-12-13 13:57:32 +00:00
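The distinction this commit draws can be sketched as a small userland model. It is only an illustration under stated assumptions: `struct demo_cred`, its fields, and the `demo_` functions are hypothetical stand-ins, not the kernel's credential or jail structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a credential (the kernel type is struct ucred). */
struct demo_cred {
	bool in_jail;		/* credential belongs to a jail */
	bool jail_owns_vnet;	/* that jail has its own virtual network stack */
};

/* Models the classic jailed() check: true for any jailed credential. */
static bool
demo_jailed(const struct demo_cred *cred)
{
	assert(cred != NULL);
	return (cred->in_jail);
}

/*
 * Models the idea behind jailed_without_vnet(): report "jailed" only when
 * the jail has no virtual network stack of its own, so that callers in the
 * network stack permit operations for vnet jails.
 */
static bool
demo_jailed_without_vnet(const struct demo_cred *cred)
{
	if (!demo_jailed(cred))
		return (false);
	if (cred->jail_owns_vnet)
		return (false);
	return (true);
}
```

In this model a vnet jail still counts as jailed for `demo_jailed()`, which matches the message's point that the classic check cannot change because it is used outside the network stack.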
Attilio Rao
2028867def In current code, threads performing an interruptible sleep (on both
sxlock, via the sx_{s, x}lock_sig() interface, or plain lockmgr), will
leave the waiters flag on, forcing the owner to do a wakeup even if
the waiter queue is empty.
That operation may lead to a deadlock in the case of doing a fake wakeup
on the "preferred" (based on the wakeup algorithm) queue while the other
queue has real waiters on it, because nobody is going to wake up the 2nd
queue waiters and they will sleep indefinitely.

A similar bug is present for lockmgr in the case the waiters are
sleeping with LK_SLEEPFAIL on.  In this case, even if the waiters queue
is not empty, the waiters won't progress after being awakened but they will
just fail, still not taking care of the 2nd queue waiters (as the
lock owner doing the wakeup would instead expect).

In order to fix this bug in a cheap way (without adding too much locking
and complicating the semantics too much), add a sleepqueue interface which
reports the actual number of waiters on a specified queue of a
waitchannel (sleepq_sleepcnt()) and use it to determine whether the
exclusive waiters (or shared waiters) are actually present on the lockmgr
(or sx) before giving them precedence in the wakeup algorithm.
This fix alone, however, doesn't solve the LK_SLEEPFAIL bug. In order to
cope with it, add tracking of how many exclusive LK_SLEEPFAIL waiters
a lockmgr has and, if all the waiters on the exclusive waiters queue are
LK_SLEEPFAIL, just wake both queues.

The sleepq_sleepcnt() introduction and ABI breakage require
__FreeBSD_version bumping.

Reported by:	avg, kib, pho
Reviewed by:	kib
Tested by:	pho
2009-12-12 21:31:07 +00:00
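The wakeup decision described in this message can be modelled roughly as follows. This is a sketch, not the lockmgr code: the function name, the flag values, and the assumption that the exclusive queue is the "preferred" one are all illustrative, and the counts stand in for what sleepq_sleepcnt() and the LK_SLEEPFAIL tracking would report.

```c
#include <assert.h>

#define DEMO_WAKE_EXCL		0x1	/* hypothetical: wake the exclusive queue */
#define DEMO_WAKE_SHARED	0x2	/* hypothetical: wake the shared queue */

/*
 * Decide which queues to wake from the real number of sleepers on each
 * queue, instead of trusting a possibly stale "waiters" flag.
 */
static int
demo_pick_queues(unsigned excl_sleepers, unsigned excl_sleepfail,
    unsigned shared_sleepers)
{
	/* SLEEPFAIL waiters are a subset of the exclusive sleepers. */
	assert(excl_sleepfail <= excl_sleepers);

	if (excl_sleepers > 0) {
		/*
		 * If every exclusive waiter will simply fail after waking,
		 * waking only that queue would strand the shared waiters.
		 */
		if (excl_sleepfail == excl_sleepers)
			return (DEMO_WAKE_EXCL | DEMO_WAKE_SHARED);
		return (DEMO_WAKE_EXCL);
	}
	/* No real exclusive sleepers: do not do a fake wakeup there. */
	if (shared_sleepers > 0)
		return (DEMO_WAKE_SHARED);
	return (0);
}
```

The second branch is the case the message calls a fake wakeup: with a stale flag but zero exclusive sleepers, the model falls through to the shared queue rather than leaving it asleep.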
Luigi Rizzo
37e20d0a37 only export bio_cmd and flags to userland (bio_cmd are
used by ggatectl, flags are potentially useful).
Other parts are internal kernel data structures and should
not be visible to userland.

No API change involved.

MFC after:	3 days
2009-12-11 10:35:58 +00:00
John Baldwin
42a346fa63 For some buses, devices may have active resources assigned even though they
are not allocated by the device driver.  These resources should still appear
allocated from the system's perspective so that their assigned ranges are
not reused by other resource requests.  The PCI bus driver has used a hack
to effect this for a while now where it uses rman_set_device() to assign
devices to the PCI bus when they are first encountered and later assigns
them to the actual device when a driver allocates a BAR.  The downsides of
this approach are that it results in somewhat confusing devinfo -r output
and is not easily portable to other bus drivers.

This commit adds generic support for "reserved" resources to the resource
list API used by many bus drivers to manage the resources of child devices.
A resource may be reserved via resource_list_reserve().  This will allocate
the resource from the bus' parent without activating it.
resource_list_alloc() recognizes an attempt to allocate a reserved resource.
When this happens it activates the resource (if requested) and then returns
the reserved resource.  Similarly, when a reserved resource is released via
resource_list_release(), it is deactivated (if it is active) and the
resource is then marked reserved again, but is left allocated from the
bus' parent.  To completely remove a reserved resource, a bus driver may
use resource_list_unreserve().  A bus driver may use resource_list_busy()
to determine if a reserved resource is allocated by a child device or if
it can be unreserved.

The PCI bus driver has been changed to use this framework instead of
abusing rman_set_device() to keep track of reserved vs allocated resources.

Submitted by:	imp (an older version many moons ago)
MFC after:	1 month
2009-12-09 21:52:53 +00:00
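The reserve/alloc/release/unreserve life cycle described above can be modelled as a small state machine. This is a userland sketch only: `struct demo_rle`, the `DEMO_` flags, and the `demo_` functions are hypothetical and merely mirror the roles of resource_list_reserve(), resource_list_alloc(), resource_list_release(), resource_list_busy(), and resource_list_unreserve() as the message describes them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_RESERVED	0x1	/* held on behalf of the child by the bus */
#define DEMO_BUSY	0x2	/* currently allocated by a child driver */
#define DEMO_ACTIVE	0x4	/* resource has been activated */

struct demo_rle {
	bool from_parent;	/* allocated from the bus' parent */
	unsigned flags;
};

/* Reserve: allocate from the parent without activating. */
static int
demo_reserve(struct demo_rle *e)
{
	assert(e != NULL);
	if (e->from_parent)
		return (-1);
	e->from_parent = true;
	e->flags = DEMO_RESERVED;
	return (0);
}

/* Busy: is a reserved resource allocated by a child device? */
static bool
demo_busy(const struct demo_rle *e)
{
	return ((e->flags & DEMO_BUSY) != 0);
}

/* Alloc: hand out the reserved resource, activating it if requested. */
static int
demo_alloc(struct demo_rle *e, bool activate)
{
	if ((e->flags & DEMO_RESERVED) == 0 || demo_busy(e))
		return (-1);	/* the non-reserved path is not modelled */
	e->flags |= DEMO_BUSY;
	if (activate)
		e->flags |= DEMO_ACTIVE;
	return (0);
}

/* Release: deactivate, but stay reserved and allocated from the parent. */
static int
demo_release(struct demo_rle *e)
{
	if (!demo_busy(e))
		return (-1);
	e->flags &= ~(unsigned)(DEMO_BUSY | DEMO_ACTIVE);
	return (0);
}

/* Unreserve: completely give the resource back, unless a child holds it. */
static int
demo_unreserve(struct demo_rle *e)
{
	if ((e->flags & DEMO_RESERVED) == 0 || demo_busy(e))
		return (-1);
	e->from_parent = false;
	e->flags = 0;
	return (0);
}
```

The point of the model is the asymmetry: releasing returns the entry to the reserved state rather than freeing its range, so other requests cannot reuse it.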
Ed Schouten
04b0c5bbfa Add a libutempter compatibility interface to libulog.
The ulog_login_pseudo(3) and ulog_logout_pseudo(3) interfaces provide
functionality identical to what libutempter has to offer. Just transform
libutempter's calls into the aforementioned functions.

libutempter doesn't work with utmpx, so instead of fixing it I thought the
easiest way would be to integrate this functionality. libutempter is
used by applications like xterm and the KDE libraries, so if I ever
change the underlying file format, these applications will keep working
automatically.

Also increase __FreeBSD_version to indicate the addition (as well as the
import of libulog).
2009-12-06 20:30:21 +00:00
Konstantin Belousov
4cbf3715bc Bump __FreeBSD_version for sigpause(3) addition [1] and
PIE support in csu.

Requested by:	fluffy [1]
2009-12-02 16:40:23 +00:00
Alexander Motin
f3631e8d74 Add CAM_ATAIO_DMA ATA command flag to mark DMA protocol commands.
It is not needed for SATA controllers, but required for PATA.
2009-12-01 23:01:29 +00:00
Ed Schouten
f14ad5fa40 Decompose <sys/termios.h>.
The <sys/termios.h> header file is hardlinked to <termios.h>. It
contains both the structures and the flag definitions, but also the C
library interface that's implemented by the C library.

This header file has the typical problem of including too many random
things and being badly ordered. Instead of trying to fix this, decompose
it into two header files:

- <sys/_termios.h>, which contains struct termios and the flags.
- <termios.h>, which includes <sys/_termios.h> and contains the C
  library interface.

This means userspace has to include <termios.h> for struct termios,
while kernelspace code has to include <sys/tty.h>. Also add a
<sys/termios.h>, which prints a warning message before including
<termios.h>. I am aware that there are some applications that use this
header file as well.
2009-11-28 23:50:48 +00:00
Bjoern A. Zeeb
e7ad2d3410 Add SDT_PROBE[1-5] in the same way we have SDT_PROBE_DEFINE[1-5] to
avoid having to add all the unused trailing arguments as zeros.

MFC after:	6 days
2009-11-28 16:47:42 +00:00
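The convenience macros can be illustrated with a simplified stand-in. The `DEMO_PROBE*` macros and `demo_probe_fire()` below are hypothetical and omit the provider, module, function, and name arguments of the real SDT macros; they only show how fixed-arity wrappers pad the unused trailing arguments with zeros.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NARGS 5

/* Records the arguments of the most recent probe firing. */
static uintptr_t demo_last_args[DEMO_NARGS];

static void
demo_probe_fire(uintptr_t a0, uintptr_t a1, uintptr_t a2, uintptr_t a3,
    uintptr_t a4)
{
	demo_last_args[0] = a0;
	demo_last_args[1] = a1;
	demo_last_args[2] = a2;
	demo_last_args[3] = a3;
	demo_last_args[4] = a4;
}

/* Returns argument i of the most recent firing. */
static uintptr_t
demo_probe_arg(int i)
{
	assert(i >= 0 && i < DEMO_NARGS);
	return (demo_last_args[i]);
}

/* Full form: the caller must always spell out all five arguments. */
#define DEMO_PROBE(a0, a1, a2, a3, a4)					\
	demo_probe_fire((uintptr_t)(a0), (uintptr_t)(a1),		\
	    (uintptr_t)(a2), (uintptr_t)(a3), (uintptr_t)(a4))

/* Fixed-arity wrappers: unused trailing arguments become zeros. */
#define DEMO_PROBE1(a0)			DEMO_PROBE(a0, 0, 0, 0, 0)
#define DEMO_PROBE2(a0, a1)		DEMO_PROBE(a0, a1, 0, 0, 0)
#define DEMO_PROBE3(a0, a1, a2)		DEMO_PROBE(a0, a1, a2, 0, 0)
#define DEMO_PROBE4(a0, a1, a2, a3)	DEMO_PROBE(a0, a1, a2, a3, 0)
#define DEMO_PROBE5(a0, a1, a2, a3, a4)	DEMO_PROBE(a0, a1, a2, a3, a4)
```

A call site that has two values to report can then write `DEMO_PROBE2(x, y)` instead of `DEMO_PROBE(x, y, 0, 0, 0)`.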
Konstantin Belousov
0d3bc8a930 Implement rtld part of the support for -z nodlopen (see ld(1)).
Reviewed by:	kan
MFC after:	3 weeks
2009-11-26 13:57:20 +00:00
Konstantin Belousov
9a6ceacede Implement sighold, sigignore, sigpause, sigrelse, sigset functions
from SUSv4 XSI. Note that the functions are obsoleted, and only
provided to ease porting from System V-like systems. Since sigpause
already exists in compat with different interface, XSI sigpause is
named xsi_sigpause.

Reviewed by:	davidxu
MFC after:	3 weeks
2009-11-26 13:49:37 +00:00
Alexander Motin
bcbe578a6a Drop USB mass storage devices support from ata(4). It has been out of the build for as
long as I remember, and is completely superseded by the better maintained umass(4).
Its main idea was to optionally avoid CAM dependency for such devices, but
with the move of ATA to CAM, it is no longer relevant.

No objections:	hselasky@, thompsa@, arch@
2009-11-26 12:41:43 +00:00
Marcel Moolenaar
a78fb6f93d Don't make MJUMPAGESIZE equal to PAGE_SIZE unconditionally.
When PAGE_SIZE is 16K, MJUMPAGESIZE equals MJUM16BYTES and
causes build breakages.
For PAGE_SIZE < 2K, define MJUMPAGESIZE as MCLBYTES.
For PAGE_SIZE > 8K, define MJUMPAGESIZE as 8K.
Everywhere in between, define MJUMPAGESIZE as PAGE_SIZE.

Thus MCLBYTES <= MJUMPAGESIZE <= 8KB.
2009-11-23 23:23:05 +00:00
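The clamping rule can be written as a plain function for illustration. The header itself would express this with preprocessor conditionals; `demo_mjumpagesize()` and `DEMO_MCLBYTES` are hypothetical names, and the 2048-byte cluster size is an assumption taken from the "< 2K" wording above.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MCLBYTES	2048u	/* assumed cluster size (MCLBYTES) */
#define DEMO_MAX_JUMBO	8192u	/* upper bound named in the message */

/* Clamp the page-sized jumbo cluster into [MCLBYTES, 8K]. */
static size_t
demo_mjumpagesize(size_t page_size)
{
	size_t sz;

	if (page_size < DEMO_MCLBYTES)
		sz = DEMO_MCLBYTES;	/* tiny pages: use a regular cluster */
	else if (page_size > DEMO_MAX_JUMBO)
		sz = DEMO_MAX_JUMBO;	/* e.g. 16K pages: cap at 8K */
	else
		sz = page_size;		/* everywhere in between */

	/* Invariant stated in the message. */
	assert(sz >= DEMO_MCLBYTES && sz <= DEMO_MAX_JUMBO);
	return (sz);
}
```

With a 16K page size the result is 8K, which keeps the page-sized jumbo cluster distinct from the 16K jumbo size that caused the build breakage.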
Konstantin Belousov
a3de221dbe Among signal generation syscalls, only sigqueue(2) is allowed by POSIX
to fail due to lack of resources to queue siginfo. Add KSI_SIGQ flag
that allows sigqueue_add() to fail while trying to allocate memory for
new siginfo. When the flag is not set, behaviour is the same as for
KSI_TRAP: if memory cannot be allocated, set bit in sq_kill. KSI_TRAP is
kept to preserve KBI.

Add SI_KERNEL si_code, to be used in siginfo.si_code when signal is
generated by kernel. Deliver siginfo when signal is generated by kill(2)
family of syscalls (SI_USER with properly filled si_uid and si_pid), or
by kernel (SI_KERNEL, mostly job control or SIGIO). Since the KSI_SIGQ flag
is not set for the ksi, a low memory condition causes the old behaviour.

Keep psignal(9) KBI intact, but modify it to generate SI_KERNEL
si_code. Pgsignal(9) and gsignal(9) now take ksi explicitly. Add
pksignal(9) that behaves like psignal but takes ksi, and ddb kill
command implemented as pksignal(..., ksi = NULL) to not do allocation
while in debugger.

While there, remove some register specifiers and use ANSI C prototypes.

Reviewed by:	davidxu
MFC after:	1 month
2009-11-17 11:39:15 +00:00
Xin LI
1a9d4dda9b Revert revision 199201 for now as it has introduced a kernel vulnerability
and requires more polishing.
2009-11-12 19:02:10 +00:00
Xin LI
41c8c6e876 Add interface description capability as inspired by OpenBSD.
MFC after:	3 months
2009-11-11 21:30:58 +00:00
Konstantin Belousov
75c586a4c8 In r198506, kern_sigsuspend() started doing cursig/postsig loop to make
sure that a signal was delivered to the thread before returning from
syscall. Signal delivery puts new return frame on the user stack, and
modifies trap frame to enter signal handler. As a consequence, syscall
return code sets EINTR as error return for signal frame, instead of the
syscall return.

Also, for ia64, due to the different register layout of those two kinds of
frames, usermode segfaulted when returning from the signal handler.

Use newly-introduced cpu_set_syscall_retval(9) to set syscall result,
and return EJUSTRETURN from kern_sigsuspend() to prevent syscall return
code from modifying this frame [1].

Another issue is that pending SIGCONT might be cancelled by SIGSTOP,
causing postsig() not to deliver any caught signal [2]. Modify
postsig() to return 1 if signal was posted, and 0 otherwise, and use
this in the kern_sigsuspend loop.

Proposed by:	marcel [1]
Noted by:	davidxu [2]
Reviewed by:	marcel, davidxu
MFC after:	1 month
2009-11-10 11:46:53 +00:00
Konstantin Belousov
a7b890448c Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by:	marcel
Reviewed by:	marcel, davidxu
PowerPC, ARM, ia64 changes:	marcel
Sparc64 tested and reviewed by:	marius, also sunv reviewed
MIPS tested by:	gonzo
MFC after:	1 month
2009-11-10 11:43:07 +00:00
Ed Schouten
54a1c2b5aa Add MAP_ANONYMOUS.
Many operating systems also provide MAP_ANONYMOUS. It's not hard to
support this ourselves, we'd better add it to make it more likely for
applications to work out of the box.

Reviewed by:	alc (mman.h)
2009-11-06 07:17:31 +00:00
Alexander Motin
c1bd46c2d3 MFp4:
- Add support for sector size > 512 bytes and physical sector of several
logical sectors, introduced by ATA-7 specification.
- Remove some obsoleted code.
2009-11-04 15:24:32 +00:00
Alexander Motin
1f45f0733b Fix constants. 2009-11-03 23:26:58 +00:00
Ed Schouten
ca1d2f657a Make /dev/klog and kern.msgbuf* MPSAFE.
Normally msgbufp is locked using Giant. Switch it to use the
msgbuf_lock. Instead of changing the tsleep() calls to msleep(), just
convert it to condvar(9).

In my opinion the locking around msgbuf_peekbytes() still remains
questionable. It looks like locks are dropped while performing copies of
multiple blocks to userspace, which may cause the msgbuf to be reset in
the mean time. At least getting it underneath from Giant should make it
a little easier for us to figure out how to solve that.

Reminded by:	rdivacky
2009-11-03 21:06:19 +00:00
Jung-uk Kim
761eeb5fff Fix VESA color palette corruption:
- VBE 3.0 says palette format resets to 6-bit mode when video mode changes.
We simply set 8-bit mode when we switch modes if the adapter supports it.
- VBE 3.0 also says if the mode is not VGA compatible, we must use VBE
function to save/restore palette.  Otherwise, VGA function may be used.
Thus, reinstate the save/load palette functions only for non-VGA compatible
modes regardless of its palette format.
- Let vesa(4) set VESA modes even if vga(4) claims to support it.
- Reset default palette if VESA pixel mode is set initially.
- Fix more style nits.
2009-11-03 20:22:09 +00:00
Attilio Rao
1b9d701fee Split P_NOLOAD into a per-thread flag (TDF_NOLOAD).
This improvement aims to avoid further cache misses in scheduler-specific
functions which need to keep track of average thread running time, and
further locking in the places that set this flag.

Reported by:	jeff (originally), kris (currently)
Reviewed by:	jhb
Tested by:	Giuseppe Cocomazzi <sbudella at email dot it>
2009-11-03 16:46:52 +00:00
Ed Schouten
2d2a89dd2d Turn unused structure fields of cdevsw into spares.
d_uid, d_gid and d_mode are unused, because permissions are stored in
cdevpriv nowadays. d_kind doesn't seem to be used at all. We no longer
keep a list of cdevsw's, so d_list is also unused.

uid_t and gid_t are 32 bits, but mode_t is 16 bits. Because of alignment
constraints of d_kind, we can safely turn them into three 32-bit integers.
d_kind and d_list are together equal in size to three pointers.

Discussed with:	kib
2009-10-31 10:35:41 +00:00
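The size argument can be checked with a reduced model of the affected fields. Both structs below are hypothetical: they contain only the fields named in the message, use fixed-width types in place of uid_t, gid_t, and mode_t, model d_list as a two-pointer list entry, and assume the field order shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two pointers, standing in for a list entry. */
struct demo_list_entry {
	void *le_next;
	void **le_prev;
};

/* Old layout: the unused fields as they were declared. */
struct demo_cdevsw_old {
	uint32_t d_uid;			/* 32-bit uid */
	uint32_t d_gid;			/* 32-bit gid */
	uint16_t d_mode;		/* 16-bit mode, padded before d_kind */
	const char *d_kind;
	struct demo_list_entry d_list;
};

/* New layout: the same space expressed as spare fields. */
struct demo_cdevsw_new {
	int32_t d_spare_ints[3];	/* replaces d_uid, d_gid, d_mode */
	void *d_spare_ptrs[3];		/* replaces d_kind and d_list */
};

/* The replacement must not change the structure size. */
static_assert(sizeof(struct demo_cdevsw_old) ==
    sizeof(struct demo_cdevsw_new), "spares must preserve the size");

/* Returns 1 if the pointer-sized fields also start at the same offset. */
static int
demo_layout_preserved(void)
{
	return (offsetof(struct demo_cdevsw_old, d_kind) ==
	    offsetof(struct demo_cdevsw_new, d_spare_ptrs));
}
```

The 16-bit field does not shrink the layout because the pointer that follows it forces padding up to at least a 32-bit boundary, which is the alignment argument made in the message.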
Konstantin Belousov
80a8b0f3bf Trapsignal() and postsig() call kern_sigprocmask() with both process
lock and curproc->p_sigacts->ps_mtx. Reschedule_signals may need to have
ps_mtx locked to decide and wakeup a thread, causing recursion on the
mutex.

Inform kern_sigprocmask() and reschedule_signals() about lock state
of the ps_mtx by new flag SIGPROCMASK_PS_LOCKED to avoid recursion.

Reported and tested by:	keramida
MFC after:	1 month
2009-10-30 10:10:39 +00:00
Ed Maste
8e43cc231b Add additional featuresState.fBits entries to simplify compiling and
testing Adaptec's vendor driver.

Submitted by:	Adaptec, driver 17517
2009-10-29 17:21:41 +00:00
Alexander Motin
21a2ac1953 Define identify fields described in CF specification. 2009-10-29 13:52:34 +00:00
Ruslan Ermilov
052e971d25 HZ is now 1000 on most platforms, update a comment.
Reviewed by:	phk, markm
2009-10-29 09:27:09 +00:00
Konstantin Belousov
550ca2a8a3 Regenerate 2009-10-27 11:01:40 +00:00
Konstantin Belousov
066d836b02 Current pselect(3) is implemented in usermode and thus vulnerable to
the well-known race condition whose elimination was the reason for the
function's appearance in the first place. If the sigmask supplied as an
argument to pselect() enables a signal, the signal might be delivered before
the thread calls select(2), causing a lost wakeup. Reimplement pselect() in
the kernel, making the change of sigmask and the sleep atomic.

Since signal shall be delivered to the usermode, but sigmask restored,
set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK
should be cleared by ast() in case signal was not delivered during
syscall execution.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:55:34 +00:00
Konstantin Belousov
84440afb54 In kern_sigsuspend(), better manipulate thread signal mask using
kern_sigprocmask() to properly notify other possible candidate threads
for signal delivery.

Since sigsuspend() shall only return to usermode after a signal was
delivered, do cursig/postsig loop immediately after waiting for
signal, repeating the wait if wakeup was spurious due to race with
other thread fetching signal from the process queue before us. Add
thread_suspend_check() call to allow the thread to be stopped or killed
while in loop.

Modify last argument of kern_sigprocmask() from boolean to flags,
allowing the function to be called with locked proc. Conversion of the
callers that supplied 1 to the old argument will be done in the next
commit, and due to SIGPROCMASK_OLD value being equal to 1, code is formally
correct in between.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:42:24 +00:00
John Baldwin
5ca4819ddf - Fix several off-by-one errors when using MAXCOMLEN. The p_comm[] and
  td_name[] arrays are actually MAXCOMLEN + 1 in size and a few places that
  created shadow copies of these arrays were just using MAXCOMLEN.
- Prefer using sizeof() of an array type to explicit constants for the
  array length in a few places.
- Ensure that all of p_comm[] and td_name[] is always zero'd during
  execve() to guard against any possible information leaks.  Previously
  trailing garbage in p_comm[] could be leaked to userland in ktrace
  record headers via td_name[].

Reviewed by:	bde
2009-10-23 15:14:54 +00:00
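The three points in this message (the + 1 for the terminator, sizeof() instead of a repeated constant, and zeroing the whole array) can be shown in a small userland sketch. `struct demo_proc`, `demo_set_comm()`, and the value 19 for `DEMO_MAXCOMLEN` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAXCOMLEN 19	/* assumed maximum name length, without NUL */

struct demo_proc {
	char p_comm[DEMO_MAXCOMLEN + 1];	/* + 1 for the terminator */
};

/*
 * Store a command name.  The whole array is zeroed first so that no
 * trailing bytes from an earlier, longer name survive, and the length
 * limit comes from sizeof() rather than from a repeated constant.
 */
static void
demo_set_comm(struct demo_proc *p, const char *name)
{
	size_t n = strlen(name);

	if (n > sizeof(p->p_comm) - 1)
		n = sizeof(p->p_comm) - 1;	/* truncate, keep the NUL */
	memset(p->p_comm, 0, sizeof(p->p_comm));
	memcpy(p->p_comm, name, n);
	assert(p->p_comm[sizeof(p->p_comm) - 1] == '\0');
}
```

A shadow copy declared as `char buf[DEMO_MAXCOMLEN]` would be one byte too small for a maximum-length name, which is the off-by-one the first item refers to.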
John Baldwin
0deb032554 Style fix. 2009-10-23 15:10:41 +00:00
John Baldwin
62486e93c4 Properly sort the intr_event_describe_handler() prototype.
Submitted by:	bde
2009-10-23 13:28:33 +00:00
Ruslan Ermilov
e64585bdc2 Random number generator initialization cleanup:
- Introduce new SI_SUB_RANDOM point in boot sequence to make it
clear from where one may start using random(9).  It should be as
early as possible, so place it just after SI_SUB_CPU where we
have some randomness on most platforms via get_cyclecount().

- Move stack protector initialization to be after SI_SUB_RANDOM
as before this point we have no randomness at all.  This fixes
stack protector to actually protect stack with some random guard
value instead of a well-known one.

Note that this patch doesn't try to address arc4random(9) issues.
With current code, it will be implicitly seeded by stack protector
and hence will get the same entropy as random(9).  It will be
securely reseeded once /dev/random is fed some entropy from
userland.

Submitted by:	Maxim Dounin <mdounin@mdounin.ru>
MFC after:	3 days
2009-10-20 16:36:51 +00:00
Ed Schouten
6015f6f35a Properly set the low watermarks when reducing the baud rate.
Now that buffers are deallocated lazily, we should not use
tty*q_getsize() to obtain the buffer size to calculate the low
watermarks. Doing this may cause the watermark to be placed outside the
typical buffer size.

This caused some regressions after my previous commit to the TTY code,
which allows pseudo-devices to resize the buffers as well.

Reported by:	yongari, dougb
MFC after:	1 week
2009-10-19 07:17:37 +00:00
John Baldwin
8dfed8b0a1 Style fixes to the function prototypes for bus_alloc_resources() and
bus_release_resources().
2009-10-15 14:55:11 +00:00
John Baldwin
37b8ef16cd Add a facility for associating optional descriptions with active interrupt
handlers.  This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
  a description with an active interrupt handler setup by BUS_SETUP_INTR.
  It has a default method (bus_generic_describe_intr()) which simply passes
  the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
  printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
  an interrupt handler and copy the name passed to intr_event_add_handler()
  into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
  to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64 and i386 by having
  the nexus(4) driver supply a custom bus_describe_intr method that invokes
  a new intr_describe() MD routine which in turn looks up the associated
  interrupt event and invokes intr_event_describe_handler().

Requested by:	many
Reviewed by:	scottl
MFC after:	2 weeks
2009-10-15 14:54:35 +00:00
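The effect of keeping the handler name in a fixed buffer and appending a description to it can be sketched as follows. This is not the kernel routine: `struct demo_intr_handler`, `demo_describe_handler()`, the ':' separator, and the buffer size are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAXCOMLEN 19	/* assumed name limit, without NUL */

/* The name is stored in the structure rather than kept as a pointer. */
struct demo_intr_handler {
	char ih_name[DEMO_MAXCOMLEN + 1];
};

/*
 * Append ":descr" to the handler's name if it fits in the buffer.
 * Returns 0 on success and -1 (leaving the name untouched) otherwise.
 */
static int
demo_describe_handler(struct demo_intr_handler *ih, const char *descr)
{
	size_t used, dlen;

	assert(ih != NULL && descr != NULL);
	used = strlen(ih->ih_name);
	dlen = strlen(descr);
	if (used + 1 + dlen > sizeof(ih->ih_name) - 1)
		return (-1);
	ih->ih_name[used] = ':';
	memcpy(ih->ih_name + used + 1, descr, dlen + 1);	/* copies NUL */
	return (0);
}
```

For a device with several MSI vectors, a driver could label each handler this way (for example "rx0", "rx1", "tx") so that the handlers are distinguishable in interrupt listings.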
Konstantin Belousov
6b286ee8b5 Currently, when signal is delivered to the process and there is a thread
not blocking the signal, signal is placed on the thread sigqueue. If
the selected thread is in kernel executing thr_exit() or sigprocmask()
syscalls, then signal might be not delivered to usermode for arbitrary
amount of time, and for exiting thread it is lost.

Put process-directed signals to the process queue unconditionally,
selecting the thread to deliver the signal only by the thread returning
to usermode, since only then the thread can handle delivery of signal
reliably. For exiting thread or thread that has blocked some signals,
check whether the newly blocked signal is queued for the process, and
try to find a thread to wakeup for delivery, in reschedule_signals(). For
exiting thread, assume that all signals are blocked.

Change cursig() and postsig() to look both into the thread and process
signal queues. When there is a signal that thread returning to usermode
could consume, TDF_NEEDSIGCHK flag is not necessarily set now. Do
unlocked read of p_siglist and p_pendingcnt to check for queued signals.

Note that thread that has a signal unblocked might get spurious wakeup
and EINTR from the interruptible system call now, due to the possibility
of being selected by reschedule_signals(), while other thread returned
to usermode earlier and removed the signal from process queue. This
should not cause compliance issues, since the thread has not blocked a
signal and thus should be ready to receive it anyway.

Reported by:	Justin Teller <justin.teller gmail com>
Reviewed by:	davidxu, jilles
MFC after:	1 month
2009-10-11 16:49:30 +00:00
Robert Watson
44a43f00ed Add a new errno, ENOTCAPABLE, to be returned when a process requests an
operation on a file descriptor that is not authorized by the descriptor's
capability flags.

MFC after:	1 month
Sponsored by:	Google
2009-10-07 20:20:51 +00:00