Commit graph

6551 commits

Author SHA1 Message Date
Ed Schouten
54a1c2b5aa Add MAP_ANONYMOUS.
Many operating systems also provide MAP_ANONYMOUS. It's not hard to
support this ourselves, we'd better add it to make it more likely for
applications to work out of the box.

Reviewed by:	alc (mman.h)
2009-11-06 07:17:31 +00:00
Alexander Motin
c1bd46c2d3 MFp4:
- Add support for sector size > 512 bytes and physical sector of several
logical sectors, introduced by ATA-7 specification.
- Remove some obsoleted code.
2009-11-04 15:24:32 +00:00
Alexander Motin
1f45f0733b Fix constants. 2009-11-03 23:26:58 +00:00
Ed Schouten
ca1d2f657a Make /dev/klog and kern.msgbuf* MPSAFE.
Normally msgbufp is locked using Giant. Switch it to use the
msgbuf_lock. Instead of changing the tsleep() calls to msleep(), just
convert it to condvar(9).

In my opinion the locking around msgbuf_peekbytes() still remains
questionable. It looks like locks are dropped while performing copies of
multiple blocks to userspace, which may cause the msgbuf to be reset in
the mean time. At least getting it underneath from Giant should make it
a little easier for us to figure out how to solve that.

Reminded by:	rdivacky
2009-11-03 21:06:19 +00:00
Jung-uk Kim
761eeb5fff Fix VESA color palette corruption:
- VBE 3.0 says palette format resets to 6-bit mode when video mode changes.
We simply set 8-bit mode when we switch modes if the adapter supports it.
- VBE 3.0 also says if the mode is not VGA compatible, we must use VBE
function to save/restore palette.  Otherwise, VGA function may be used.
Thus, reinstate the save/load palette functions only for non-VGA compatible
modes regardless of its palette format.
- Let vesa(4) set VESA modes even if vga(4) claims to support it.
- Reset default palette if VESA pixel mode is set initially.
- Fix more style nits.
2009-11-03 20:22:09 +00:00
Attilio Rao
1b9d701fee Split P_NOLOAD into a per-thread flag (TDF_NOLOAD).
This improvements aims for avoiding further cache-misses in scheduler
specific functions which need to keep track of average thread running
time and further locking in places setting for this flag.

Reported by:	jeff (originally), kris (currently)
Reviewed by:	jhb
Tested by:	Giuseppe Cocomazzi <sbudella at email dot it>
2009-11-03 16:46:52 +00:00
Ed Schouten
2d2a89dd2d Turn unused structure fields of cdevsw into spares.
d_uid, d_gid and d_mode are unused, because permissions are stored in
cdevpriv nowadays. d_kind doesn't seem to be used at all. We no longer
keep a list of cdevsw's, so d_list is also unused.

uid_t and gid_t are 32 bits, but mode_t is 16 bits, Because of alignment
constraints of d_kind, we can safely turn it into three 32-bit integers.
d_kind and d_list is equal in size to three pointers.

Discussed with:	kib
2009-10-31 10:35:41 +00:00
Konstantin Belousov
80a8b0f3bf Trapsignal() and postsig() call kern_sigprocmask() with both process
lock and curproc->p_sigacts->ps_mtx. Reschedule_signals may need to have
ps_mtx locked to decide and wakeup a thread, causing recursion on the
mutex.

Inform kern_sigprocmask() and reschedule_signals() about lock state
of the ps_mtx by new flag SIGPROCMASK_PS_LOCKED to avoid recursion.

Reported and tested by:	keramida
MFC after:	1 month
2009-10-30 10:10:39 +00:00
Ed Maste
8e43cc231b Add additional featuresState.fBits entries to simplify compiling and
testing Adaptec's vendor driver.

Submitted by:	Adaptec, driver 17517
2009-10-29 17:21:41 +00:00
Alexander Motin
21a2ac1953 Define identify fields described in CF specification. 2009-10-29 13:52:34 +00:00
Ruslan Ermilov
052e971d25 HZ is now 1000 on most platforms, update a comment.
Reviewed by:	phk, markm
2009-10-29 09:27:09 +00:00
Konstantin Belousov
550ca2a8a3 Regenerate 2009-10-27 11:01:40 +00:00
Konstantin Belousov
066d836b02 Current pselect(3) is implemented in usermode and thus vulnerable to
well-known race condition, which elimination was the reason for the
function appearance in first place. If sigmask supplied as argument to
pselect() enables a signal, the signal might be delivered before thread
called select(2), causing lost wakeup. Reimplement pselect() in kernel,
making change of sigmask and sleep atomic.

Since signal shall be delivered to the usermode, but sigmask restored,
set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK
should be cleared by ast() in case signal was not gelivered during
syscall execution.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:55:34 +00:00
Konstantin Belousov
84440afb54 In kern_sigsuspend(), better manipulate thread signal mask using
kern_sigprocmask() to properly notify other possible candidate threads
for signal delivery.

Since sigsuspend() shall only return to usermode after a signal was
delivered, do cursig/postsig loop immediately after waiting for
signal, repeating the wait if wakeup was spurious due to race with
other thread fetching signal from the process queue before us. Add
thread_suspend_check() call to allow the thread to be stopped or killed
while in loop.

Modify last argument of kern_sigprocmask() from boolean to flags,
allowing the function to be called with locked proc. Convertion of the
callers that supplied 1 to the old argument will be done in the next
commit, and due to SIGPROCMASK_OLD value equial to 1, code is formally
correct in between.

Reviewed by:	davidxu
Tested by:	pho
MFC after:	1 month
2009-10-27 10:42:24 +00:00
John Baldwin
5ca4819ddf - Fix several off-by-one errors when using MAXCOMLEN. The p_comm[] and
td_name[] arrays are actually MAXCOMLEN + 1 in size and a few places that
  created shadow copies of these arrays were just using MAXCOMLEN.
- Prefer using sizeof() of an array type to explicit constants for the
  array length in a few places.
- Ensure that all of p_comm[] and td_name[] is always zero'd during
  execve() to guard against any possible information leaks.  Previously
  trailing garbage in p_comm[] could be leaked to userland in ktrace
  record headers via td_name[].

Reviewed by:	bde
2009-10-23 15:14:54 +00:00
John Baldwin
0deb032554 Style fix. 2009-10-23 15:10:41 +00:00
John Baldwin
62486e93c4 Properly sort the intr_event_describe_handler() prototype.
Submitted by:	bde
2009-10-23 13:28:33 +00:00
Ruslan Ermilov
e64585bdc2 Random number generator initialization cleanup:
- Introduce new SI_SUB_RANDOM point in boot sequence to make it
clear from where one may start using random(9).  It should be as
early as possible, so place it just after SI_SUB_CPU where we
have some randomness on most platforms via get_cyclecount().

- Move stack protector initialization to be after SI_SUB_RANDOM
as before this point we have no randomness at all.  This fixes
stack protector to actually protect stack with some random guard
value instead of a well-known one.

Note that this patch doesn't try to address arc4random(9) issues.
With current code, it will be implicitly seeded by stack protector
and hence will get the same entropy as random(9).  It will be
securely reseeded once /dev/random is feeded by some entropy from
userland.

Submitted by:	Maxim Dounin <mdounin@mdounin.ru>
MFC after:	3 days
2009-10-20 16:36:51 +00:00
Ed Schouten
6015f6f35a Properly set the low watermarks when reducing the baud rate.
Now that buffers are deallocated lazily, we should not use
tty*q_getsize() to obtain the buffer size to calculate the low
watermarks. Doing this may cause the watermark to be placed outside the
typical buffer size.

This caused some regressions after my previous commit to the TTY code,
which allows pseudo-devices to resize the buffers as well.

Reported by:	yongari, dougb
MFC after:	1 week
2009-10-19 07:17:37 +00:00
John Baldwin
8dfed8b0a1 Style fixes to the function prototypes for bus_alloc_resources() and
bus_release_resources().
2009-10-15 14:55:11 +00:00
John Baldwin
37b8ef16cd Add a facility for associating optional descriptions with active interrupt
handlers.  This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
  a description with an active interrupt handler setup by BUS_SETUP_INTR.
  It has a default method (bus_generic_describe_intr()) which simply passes
  the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
  printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
  an interrupt handler and copy the name passed to intr_event_add_handler()
  into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
  to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64 and i386 by having
  the nexus(4) driver supply a custom bus_describe_intr method that invokes
  a new intr_describe() MD routine which in turn looks up the associated
  interrupt event and invokes intr_event_describe_handler().

Requested by:	many
Reviewed by:	scottl
MFC after:	2 weeks
2009-10-15 14:54:35 +00:00
Konstantin Belousov
6b286ee8b5 Currently, when signal is delivered to the process and there is a thread
not blocking the signal, signal is placed on the thread sigqueue. If
the selected thread is in kernel executing thr_exit() or sigprocmask()
syscalls, then signal might be not delivered to usermode for arbitrary
amount of time, and for exiting thread it is lost.

Put process-directed signals to the process queue unconditionally,
selecting the thread to deliver the signal only by the thread returning
to usermode, since only then the thread can handle delivery of signal
reliably. For exiting thread or thread that has blocked some signals,
check whether the newly blocked signal is queued for the process, and
try to find a thread to wakeup for delivery, in reschedule_signal(). For
exiting thread, assume that all signals are blocked.

Change cursig() and postsig() to look both into the thread and process
signal queues. When there is a signal that thread returning to usermode
could consume, TDF_NEEDSIGCHK flag is not neccessary set now. Do
unlocked read of p_siglist and p_pendingcnt to check for queued signals.

Note that thread that has a signal unblocked might get spurious wakeup
and EINTR from the interruptible system call now, due to the possibility
of being selected by reschedule_signals(), while other thread returned
to usermode earlier and removed the signal from process queue. This
should not cause compliance issues, since the thread has not blocked a
signal and thus should be ready to receive it anyway.

Reported by:	Justin Teller <justin.teller gmail com>
Reviewed by:	davidxu, jilles
MFC after:	1 month
2009-10-11 16:49:30 +00:00
Robert Watson
44a43f00ed Add a new errno, ENOTCAPABLE, to be returned when a process requests an
operation on a file descriptor that is not authorized by the descriptor's
capability flags.

MFC after:	1 month
Sponsored by:	Google
2009-10-07 20:20:51 +00:00
Rui Paulo
f49e371d3b Reserve numbers for XScale.
Reviewed by:	jkoshy
2009-10-02 11:14:12 +00:00
Edward Tomasz Napierala
2c29cfa083 Provide default implementation for VOP_ACCESS(9), so that filesystems which
want to provide VOP_ACCESSX(9) don't have to implement both.  Note that
this commit makes implementation of either of these two mandatory.

Reviewed by:	kib
2009-10-01 17:22:03 +00:00
Attilio Rao
ddce63ca73 When releasing a read/shared lock we need to use a write memory barrier
in order to avoid, on architectures which doesn't have strong ordered
writes, CPU instructions reordering.

Diagnosed by:	fabio
Reviewed by:	jhb
Tested by:	Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
2009-09-30 13:26:31 +00:00
Robert Watson
718dbcfeaa Regenerate system call files following r197636. 2009-09-30 08:48:59 +00:00
Xin LI
82aebf697c Add two new fcntls to enable/disable read-ahead:
- F_READAHEAD: specify the amount for sequential access.  The amount is
   specified in bytes and is rounded up to nearest block size.
 - F_RDAHEAD: Darwin compatible version that use 128KB as the sequential
   access size.

A third argument of zero disables the read-ahead behavior.

Please note that the read-ahead amount is also constrainted by sysctl
variable, vfs.read_max, which may need to be raised in order to better
utilize this feature.

Thanks Igor Sysoev for proposing the feature and submitting the original
version, and kib@ for his valuable comments.

Submitted by:	Igor Sysoev <is rambler-co ru>
Reviewed by:	kib@
MFC after:	1 month
2009-09-28 16:59:47 +00:00
Alexander Motin
4598c08b9b Add more defines from recent and not only specs. 2009-09-27 20:48:10 +00:00
Stanislav Sedov
50989f36fc - Bump __FreeBSD_version to reflect the point when EVFILT_USER kevent filter
has been implemented.
2009-09-23 12:33:32 +00:00
Roman Divacky
1c2825bd80 Change unsigned foo to u_foo as required by style(9).
Requested by:	bde
Approved by:	ed (mentor)
2009-09-22 16:16:02 +00:00
Edward Tomasz Napierala
a9315dded6 Add pieces of infrastructure required for NFSv4 ACL support in UFS.
Reviewed by:	rwatson
2009-09-22 15:15:03 +00:00
Konstantin Belousov
51a6ef34fb Remove forward_roundrobin(), it is unused for quite some time.
Reviewed by:	jhb
MFC after:	1 week
2009-09-21 13:09:56 +00:00
Alan Cox
aa35c4db08 Add getpagesizes(3). This functions either the number of supported page
sizes or some number of the sizes themselves.  It is functionally
compatible with a function by the same name under Solaris.

Reviewed by:	jhb
2009-09-19 18:01:32 +00:00
Ed Schouten
b05f9c86d1 Make the keyboard layer Unicode aware.
Just take keyent_t to use an u_int to store the Unicode codepoints.
Unfortunately the keymap is now too big to be loaded using an ioctl
argument, so change the ioctl to pick a pointer.

This change breaks kbdcontrol ABI. It doesn't break X11, because X11
doesn't do anything with syscons keymaps. It just switches the device
out of K_XLATE.

Obtained from:	//depot/user/ed/newcons/...
2009-09-19 17:56:26 +00:00
Alan Cox
fe105d45a2 Add a new sysctl for reporting all of the supported page sizes.
Reviewed by:	jhb
MFC after:	3 weeks
2009-09-18 17:04:57 +00:00
Roman Divacky
6413e27b3a Fix the style of the previous commit.
Approved by:	ed (mentor, implicit)
2009-09-17 17:48:13 +00:00
Roman Divacky
abc8594d70 Make these argument/variable unsigned as the defines for them don't fit
into signed 32bit integer.

Approved by:	ed (mentor, implicit)
Approved by:	sson
2009-09-17 17:41:28 +00:00
Ed Schouten
46ce5e26a7 Extend the keyboard character size to 24 bits.
Because we use an int to store keyboard chacacters and their flags, we
can easily store the flags in the top byte instead of the second byte.
This means it's a lot easier to make Unicode work.

The only change that still needs to be made, is that keyent_t's map is
extended to u_int.

Obtained from:	//depot/projects/newcons/sys/sys/kbio.h
2009-09-16 07:01:11 +00:00
Stacey Son
fdc1a1131e Add EV_RECEIPT to kevents.
EV_RECEIPT is useful to disambiguating error conditions when multiple
events structures are passed to kevent(2).  The error code is returned
in the data field and EV_ERROR is set.

Approved by:	rwatson (co-mentor)
2009-09-16 03:49:54 +00:00
Stacey Son
1a921c410a Add the EV_DISPATCH flag to kevents.
When the EV_DISPATCH flag is used the event source will be disabled
immediately after the delivery of an event.   This is similar to the
EV_ONESHOT flag but it doesn't delete the event.

Approved by:	rwatson (co-mentor)
2009-09-16 03:37:39 +00:00
Stacey Son
2c2e449905 Add EVFILT_USER to kevents.
Add user events support to kernel events which are not associated with any
kernel mechanism but are triggered by user level code.  This is useful for
adding user level events to an event handler that may also be monitoring
kernel events.

Approved by:	rwatson (co-mentor)
2009-09-16 03:30:12 +00:00
Stacey Son
95128e983f Add optional touch event filter hooks to kevents.
The touch event filter is called when a kernel event data is possibly
updated.  There are two hook points.  First, during a kevent() system
call.  Second, when an event has been triggered.

Approved by:	rwatson (co-mentor)
2009-09-16 03:15:57 +00:00
Attilio Rao
9a3ca99927 Use explicit int values for the device states in order to allow,
if necessary, in the future, adds of new states without breaking ABI
between revisions.

Proposed by:	kib
Approved by:	imp
2009-09-15 16:59:52 +00:00
Attilio Rao
4c68dee0fb Revert r196779 in order to implement a different scheme for newbus locking
methodology.

Requested by:	imp
2009-09-13 15:08:19 +00:00
Dag-Erling Smørgrav
edefbdd7cb If a certain feature that was present in FreeBSD 7 was removed or changed in
FreeBSD 8, the compatibility shims should be built not just when FreeBSD 7
compatibility is requested, but also when compatibility with any older
FreeBSD version where that feature was present is requested.o

Without this patch, a kernel config that sets COMPAT_FREEBSD6 but not *7
would fail to build due to inconsistencies between the declaration of the
compatibility shims and their use in the SysV code.

There are similar errors in other *proto.h headers in the tree.

MFC after:	3 weeks
2009-09-10 08:33:28 +00:00
Konstantin Belousov
b55ef216fe kern_select(9) copies fd_set in and out of userspace in quantities of
longs. Since 32bit processes longs are 4 bytes, 64bit kernel may copy in
or out 4 bytes more then the process expected.

Calculate the amount of bytes to copy taking into account size of fd_set
for the current process ABI.

Diagnosed and tested by:	Peter Jeremy <peterjeremy acm org>
Reviewed by:	jhb
MFC after:	1 week
2009-09-09 20:59:01 +00:00
Warner Losh
fedf3e9ed3 kern_execve.c hasn't been around in ages, so update the file(s) where
a_magic is used instead of the a_midmag....

# maybe we can retire this hack soon...
2009-09-09 06:49:49 +00:00
Xin LI
efba048eb5 - Port x86emu to FreeBSD.
- Connect x86emu to build.

Tested with:	make universe
Submitted by:	swell.k at gmail com
2009-09-09 05:53:26 +00:00
Poul-Henning Kamp
a254d1f16d Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.
2009-09-08 20:45:40 +00:00