Commit graph

605 commits

Author SHA1 Message Date
Alexander Leidinger
3680a41902 Backout the linux aio stuff. Several problems where identified and the
dynamic nature (if no native aio code is available, the linux part
returns ENOSYS because of missing requisites) should be solved differently
than it is.

All this will be done in P4.

Not included in this commit is a backout of the changes to the native aio
code (removing static in some places). Those changes (and some more) will
also be needed when the reworked linux aio stuff will reenter the tree.

Requested by:	rwatson
Discussed with:	rwatson
2006-10-29 14:02:39 +00:00
Alexander Leidinger
c4ce314b40 Fix style(9).
Noticed by:	rwatson
2006-10-28 16:47:38 +00:00
Alexander Leidinger
955d762aca MFP4:
Implement prctl().

Submitted by:	rdivacky
Tested with:	LTP
2006-10-28 10:59:59 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Alexander Leidinger
6474221698 Fix compile (use the right variable name). 2006-10-15 14:34:03 +00:00
Alexander Leidinger
6a1162d4cd MFP4 (with some minor changes):
Implement the linux_io_* syscalls (AIO). They are only enabled if the native
AIO code is available (either compiled in to the kernel or as a module) at
the time the functions are used. If the AIO stuff is not available there
will be a ENOSYS.

From the submitter:
---snip---
DESIGN NOTES:

1. Linux permits a process to own multiple AIO queues (distinguished by
   "context"), but FreeBSD creates only one single AIO queue per process.
   My code maintains a request queue (STAILQ of queue(3)) per "context",
   and throws all AIO requests of all contexts owned by a process into
   the single FreeBSD per-process AIO queue.

   When the process calls io_destroy(2), io_getevents(2), io_submit(2) and
   io_cancel(2), my code can pick out requests owned by the specified context
   from the single FreeBSD per-process AIO queue according to the per-context
   request queues maintained by my code.

2. The request queue maintained by my code stores contrast information between
   Linux IO control blocks (struct linux_iocb) and FreeBSD IO control blocks
   (struct aiocb). FreeBSD IO control block actually exists in userland memory
   space, required by FreeBSD native aio_XXXXXX(2).

3. It is quite troubling that the function io_getevents() of libaio-0.3.105
   needs to use Linux-specific "struct aio_ring", which is a partial mirror
   of context in user space. I would rather take the address of context in
   kernel as the context ID, but the io_getevents() of libaio forces me to
   take the address of the "ring" in user space as the context ID.

   To my surprise, one comment line in the file "io_getevents.c" of
   libaio-0.3.105 reads:

             Ben will hate me for this

REFERENCE:

1. Linux kernel source code:   http://www.kernel.org/pub/linux/kernel/v2.6/
   (include/linux/aio_abi.h, fs/aio.c)

2. Linux manual pages:         http://www.kernel.org/pub/linux/docs/manpages/
   (io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2))

3. Linux Scalability Effort:   http://lse.sourceforge.net/io/aio.html
   The design notes:           http://lse.sourceforge.net/io/aionotes.txt

4. The package libaio, both source and binary:
       http://rpmfind.net/linux/rpm2html/search.php?query=libaio
   Simple transparent interface to Linux AIO system calls.

5. Libaio-oracle:              http://oss.oracle.com/projects/libaio-oracle/
   POSIX AIO implementation based on Linux AIO system calls (depending on
   libaio).
---snip---

Submitted by:	Li, Xiao <intron@intron.ac>
2006-10-15 14:22:14 +00:00
Alexander Leidinger
687c23be1d MFP4 (107868 - 107870):
Use a macro to test for a valid signal instead of doing it my hand everywhere.

Submitted by:	rdivacky
2006-10-15 12:51:43 +00:00
John Baldwin
8528552b0d Don't pass unused bufsz to kern_shmctl(). 2006-10-10 22:46:50 +00:00
John Baldwin
f3ea244ea9 Only try to copyin a msqid for the IPC_SET command to msgctl(). Other
commands (such as IPC_RMID) were bogusly failing with EFAULT.

Tested by:	jkim
2006-10-10 22:46:22 +00:00
John Baldwin
7f4c1dd0d6 Remove unnecessary casts before PTRIN(). 2006-10-10 22:44:59 +00:00
Alexander Leidinger
28638377ad - change if (cond) panic() to KASSERT.
- Dont forget to free em in a case of error.

Suggested by:	ssouhlal
Submitted by:	rdivacky
Tested with:	LTP
2006-10-08 17:10:34 +00:00
Alexander Leidinger
7660ace19c - Replace homegrown check for FIFO with S_ISFIFO. [1]
- Check the status of the options before messing with it.

Inspired by:	NetBSD [1]
Submitted by:	rdivacky
Tested with:	LTP
2006-10-08 17:08:27 +00:00
Alexander Leidinger
d4b7423fa1 MFp4:
- Linux returns ENOPROTOOPT in a case of not supported opt to setsockopt.
- Return EISDIR in pread() when arg is a directory.
- Return EINVAL instead of EFAULT when namelen is not correct in accept().
- Return EINVAL instead of EACCESS if invalid access mode is entered in
  access().
- Return EINVAL instead of EADDRNOTAVAIL in a case of bad salen param
  to bind().

Submitted by:	rdivacky
Tested with:	LTP (vfork01 fails now, but it seems to be a race and
		not caused by those changes)
MFC after:	1 week
2006-09-23 19:06:54 +00:00
Alexander Leidinger
18f81b3dfa - don't reboot() when feed with wrong parameters (and enough permissions) [1]
- add support to power off the system [2]
- check the linux magic values [3]

Submitted by:	Marcin Cieslak <saper@SYSTEM.PL> [1,2]
Modelled after:	linux man page of the reboot() syscall [3]
Found by:	LTP testcase "reboot02" [1]
Tested with:	LTP testcase "reboot02" [1,3]
MFC after:	1 week
2006-09-16 14:12:04 +00:00
Alexander Leidinger
db0d964062 The Linux unlink syscall uses a different errno value when trying to unlink
a directory.

PR:		102897 [1]
Noticed by:	Knut Anders Hatlen <kahatlen@gmail.com>, testrun with LTP [1]
Submitted by:	Marcin Cieslak <saper@SYSTEM.PL>
Tested by:	netchild (LTP test run)
2006-09-10 13:47:56 +00:00
Alexander Leidinger
8618fd85a3 - Extend the coverage of PROC_LOCK to cover wakeup(&p->p_emuldata);
- Lock the emuldata in a case when we just created it.

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Suggested by:	jhb
2006-09-09 16:55:55 +00:00
Alexander Leidinger
bb59e63f8f Change futex lock from mutex to sx. Make futex_get atomic (protected by the
futex lock).

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Suggested by:	jhb
2006-09-09 16:25:25 +00:00
Alexander Leidinger
c19ddeda07 - don't wake every sleeper just the first one [1]
- remove debuging printf			[2]

Submitted by:	intron <mag@intron.ac> [1], rdivacky [2]
2006-09-09 13:04:28 +00:00
Suleiman Souhlal
c67e0cc9e7 FREE -> free
Submitted by:	rdivacky
2006-08-28 13:52:27 +00:00
Alexander Leidinger
835e506190 Add the linux statfs64 call. This allows Tivoli backup to proceed a little
but further on -current (still not successful, but a step into the right
direction).

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Tested by:	Paul Mather <paul@gromit.dlib.vt.edu>
2006-08-27 08:56:54 +00:00
Alexander Leidinger
84ed9f91d8 Correct the number of retries in a futex_wake() call.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
2006-08-26 10:36:16 +00:00
Robert Watson
3e8df637c0 Don't call suser_cred() directly from linux_sethostname(), as it just
wraps userland_sysctl(), which performs necessary privilege checks as
part of its normal operation.

MFC after:	1 week
2006-08-25 11:02:42 +00:00
Alexander Leidinger
1a28c0df09 Sync the MI parts for amd64 with i386 and remove the corresponding special
handling for amd64 in the common code. The MD parts for amd64 are still
outstanding, but at least this fixes some panics on amd64.

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Tested by:	bsam
2006-08-20 13:50:27 +00:00
Alexander Leidinger
29ddc19bbf Get rid of some nested includes.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Noticed by:	jhb
2006-08-19 15:13:01 +00:00
Suleiman Souhlal
5342db0872 MALLOC -> malloc and FREE -> free
Submitted by:	rdivacky
Pointed out by:	jhb
2006-08-19 11:54:19 +00:00
Suleiman Souhlal
b273d5aa72 ifdef DEBUG a printf
Submitted by:	rdivacky
2006-08-19 11:07:22 +00:00
Alexander Leidinger
590e3a06e8 - disable some more code when osrelease=2.4.2
- protect td->td_proc->p_pid with the proc lock in linux_getpid
  in the amd64 (= non i386) case [1]

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Noticed by:	netchild [1]
2006-08-17 21:21:30 +00:00
Alexander Leidinger
94cb2ecf79 Move some stuff into headers where they belong.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Noticed by:	jhb, ssouhlal
2006-08-17 21:06:48 +00:00
Alexander Leidinger
9b0bcbfbda Fix the DEBUG build:
- linux_emul.c [1]
 - linux_futex.c [2]

Sponsored by:	Google SoC 2006	[1]
Submitted by:	rdivacky	[1]
		netchild	[2]
2006-08-17 09:50:30 +00:00
Alexander Leidinger
0eef2f8a4e Style fixes to comments.
Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
Noticed by:	jhb, ssouhlal
2006-08-16 18:54:51 +00:00
Alexander Leidinger
a43eeaabe4 Disable some parts of the code on amd64 for now to prevent a panic. A better
fix will come later.

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
2006-08-15 15:15:17 +00:00
Alexander Leidinger
9b44bfc556 Add the linux 2.6.x stuff (not used by default!):
- TLS - complete
 - pid/tid mangling - complete
 - thread area - complete
 - futexes - complete with issues
 - clone() extension - complete with some possible minor issues
 - mq*/timer*/clock* stuff - complete but untested and the mq* stuff is
   disabled when not build as part of the kernel with native FreeBSD mq*
   support (module support for this will come later)

Tested with:
 - linux-firefox - works, tested
 - linux-opera - works, tested
 - linux-realplay - doesnt work, issue with futexes
 - linux-skype - doesnt work, issue with futexes
 - linux-rt2-demo - works, tested
 - linux-acroread - doesnt work, unknown reason (coredump) and sometimes
   issue with futexes
 - various unix utilities in linux-base-gentoo3 and linux-base-fc4:
   everything tried worked

On amd64 not everything is supported like on i386, the catchup is planned for
later when the remaining bugs in the new functions are fixed.

To test this new stuff, you have to run
	sysctl compat.linux.osrelease=2.6.16
to switch back use
	sysctl compat.linux.osrelease=2.4.2

Don't switch while running a linux program, strange things may or may not
happen.

Sponsored by:			Google SoC 2006
Submitted by:			rdivacky
Some suggestions/help by:	jhb, kib, manu@NetBSD.org, netchild
2006-08-15 12:54:30 +00:00
Alexander Leidinger
ad2056f2c4 Add some new files needed for linux 2.6.x compatibility.
Please don't style(9) the NetBSD code, we want to stay in sync. Not imported
on a vendor branch since we need local changes.

Sponsored by:	Google SoC 2006
Submitted by:	rdivacky
With help from:	manu@NetBSD.org
Obtained from:	NetBSD (linux_{futex,time}.*)
2006-08-15 12:20:59 +00:00
John Baldwin
b4c63329d5 - Pass the MPSAFE flag to namei() in linux_uselib() and handle conditional
Giant VFS locking in that function.
- Remove bogus code to handle the case where namei() returns success but a
  NULL vnode pointer.
- Note that this code duplicates exec_check_permissions() and annotate
  where it differs.
- Hold the vnode lock longer to protect the write to set VV_TEXT in
  v_vflag.
- Mark linux_uselib() MPSAFE.

Reviewed by:	rwatson
2006-07-21 20:22:13 +00:00
John Baldwin
b33887ea31 Don't free the sockaddr in kern_bind() and kern_connect() as not all
callers pass a sockaddr allocated via malloc() from M_SONAME anymore.
Instead, free it in the callers when necessary.
2006-07-19 18:28:52 +00:00
John Baldwin
be5747d5b5 - Add conditional VFS Giant locking to getdents_common() (linux ABIs),
ibcs2_getdents(), ibcs2_read(), ogetdirentries(), svr4_sys_getdents(),
  and svr4_sys_getdents64() similar to that in getdirentries().
- Mark ibcs2_getdents(), ibcs2_read(), linux_getdents(), linux_getdents64(),
  linux_readdir(), ogetdirentries(), svr4_sys_getdents(), and
  svr4_sys_getdents64() MPSAFE.
2006-07-11 20:52:08 +00:00
John Baldwin
c1cccebe8b Add a kern_close() so that the ABIs can close a file descriptor w/o having
to populate a close_args struct and change some of the places that do.
2006-07-08 20:03:39 +00:00
John Baldwin
b1ee5b654d Rework kern_semctl a bit to always assume the UIO_SYSSPACE case. This
mostly consists of pushing a few copyin's and copyout's up into
__semctl() as all the other callers were already doing the UIO_SYSSPACE
case.  This also changes kern_semctl() to set the return value in a passed
in pointer to a register_t rather than td->td_retval[0] directly so that
callers can only set td->td_retval[0] if all the various copyout's succeed.

As a result of these changes, kern_semctl() no longer does copyin/copyout
(except for GETALL/SETALL) so simplify the locking to acquire the semakptr
mutex before the MAC check and hold it all the way until the end of the
big switch statement.  The GETALL/SETALL cases have to temporarily drop it
while they do copyin/malloc and copyout.  Also, simplify the SETALL case to
remove handling for a non-existent race condition.
2006-07-08 19:51:38 +00:00
John Baldwin
ad6d226d43 - Protect the list of linux ioctl handlers with an sx lock.
- Hold Giant while calling linux ioctl handlers for now as they aren't all
  known to be MPSAFE yet.
- Mark linux_ioctl() MPSAFE.
2006-07-06 21:42:36 +00:00
John Baldwin
4db580972e Axe the stackgap macros as the Linux ABIs no longer use the stackgap. 2006-06-27 18:30:49 +00:00
John Baldwin
49d409a108 - Add a kern_semctl() helper function for __semctl(). It accepts a pointer
to a copied-in copy of the 'union semun' and a uioseg to indicate which
  memory space the 'buf' pointer of the union points to.  This is then used
  in linux_semctl() and svr4_sys_semctl() to eliminate use of the stackgap.
- Mark linux_ipc() and svr4_sys_semsys() MPSAFE.
2006-06-27 18:28:50 +00:00
Alexander Leidinger
555f86b8b6 The linux times syscall can be called with a NULL pointer, so keep cool
and don't panic.

This fix is different from the patch submitted as it not only prevents
a NULL-pointer dereference, but also skips some work in this case.

Noticed by:	Dmitry Ganenko <dima@apk-inform.com>
Reviewed by:	rdivacky (the original version as in emulation@)
MFC after:	1 week
Security:	This is a RELENG_x_y candidate (local DoS).
Go ahead by:	secteam (cperciva)
2006-06-23 18:49:38 +00:00
Doug Ambrisko
edb75eca27 Fix file leaking in translate_path_major_minor. 2006-05-16 17:57:00 +00:00
Alexander Leidinger
01e0ffbae8 Now that we don't have a linuxolator on alpha anymore:
- unifdef __alpha__
 - revert rev. 1.66 of linux_socket.c
2006-05-10 20:38:16 +00:00
Alexander Leidinger
17138b619c Implement rt_sigpending in the linuxolator.
PR:		92671
Submitted by:	Markus Niemist"o <markus.niemisto@gmx.net>
2006-05-10 18:17:29 +00:00
Doug Ambrisko
03487601c2 Fix the the duplicate cut-n-paste in linux_fstat64 pointed out by
Alexander Leidinger.  I forget to fix it in this version.
2006-05-05 16:17:59 +00:00
Doug Ambrisko
060e488247 Enhance the Linux emulation layer to make MegaRAID SAS managements tool happy.
Add back in a scheme to emulate old type major/minor numbers via hooks into
stat, linprocfs to return major/minors that Linux app's expect.  Currently
only /dev/null is always registered.  Drivers can register via the Linux
type shim similar to the ioctl shim but by using
linux_device_register_handler/linux_device_unregister_handler functions.
The structure is:

    struct linux_device_handler {
        char    *bsd_driver_name;
        char    *linux_driver_name;
        char    *bsd_device_name;
        char    *linux_device_name;
        int     linux_major;
        int     linux_minor;
        int     linux_char_device;
    };

Linprocfs uses this to display the major number of the driver.  The
soon to be available linsysfs will use it to fill in the driver name.
Linux_stat uses it to translate the major/minor into Linux type values.

Note major numbers are dynamically assigned via passing in a -1 for
the major number so we don't need to keep track of them.

This is somewhat needed due to us switching to our devfs.  MegaCli
will not run until I add in the linsysfs and mfi Linux compat changes.

Sponsored by:	IronPort Systems
2006-05-05 16:10:45 +00:00
Robert Watson
f7f45ac8e2 Annotate uses of fgetsock() with indications that they should rely
on their existing file descriptor references to sockets, rather than
use fgetsock() to retrieve a direct socket reference.

MFC after:	3 months
2006-04-01 15:25:01 +00:00
Tai-hwa Liang
d9d46ed258 Unbreaking build by removing a now unused variable. 2006-03-27 23:27:11 +00:00
John Baldwin
b77619bd7f Use td_ucred rather than p_ucred to avoid panics and general unhappiness.
Pointy hat to:	netchild
2006-03-27 19:16:31 +00:00
Alexander Leidinger
1daa386fcf Fix the LINT build on alpha:
- rename some file local structure definitions, the names clash with
  autogenerated names
- on !alpha add some compatibility defines for those renamed structures
- make some functions globally visible on alpha
2006-03-21 21:56:04 +00:00
Alexander Leidinger
61da9d97fb Fix tinderbox on alpha.
Tested by:	cross-compile
2006-03-20 19:46:56 +00:00
Ruslan Ermilov
aefce619cf Unbreak COMPAT_LINUX32 option support on amd64.
Broken by:	netchild
2006-03-19 11:10:33 +00:00
Alexander Leidinger
d4a3f5ddb6 Fixup some problems in my previous commit (COMPAT_43).
Pointyhat to:	netchild
2006-03-18 20:47:36 +00:00
Alexander Leidinger
5c8919adf4 Get rid of the need of COMPAT_43 in the linuxolator.
Submitted by:	Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Obtained from:	DragonFly (some parts)
2006-03-18 18:20:17 +00:00
Jeff Roberson
c4be19469a - Remove ifdef disabled code that doesn't have a chance of working anymore. 2006-02-06 10:10:42 +00:00
Jeff Roberson
d6791f7615 - vn_lock with LK_RETRY can not return an error. The code that handled this
case was not necessary.

Sponsored by:	Isilon Systems, Inc.
2006-01-30 08:22:56 +00:00
Olivier Houchard
d425dbec89 Fix a typo : deivce => device
Spotted by:	rwatson
2006-01-26 21:48:50 +00:00
Olivier Houchard
e83d253beb Linux compat bits needed to make linux programs use the new ptys :
linux_ioctl.[ch] : Implement LINUX_TIOCGPTN, which returns the pty number
linux_stats.c :
	- Return the magic number for devfs.
	- In various stats()-related functions, check that we're stating a
file in /dev/pts, and if so, change the st_rdev field to match what linux
expects to be there for a slave pty device. The glibc checks for this, and
their openpty() fails if it is no correct.
2006-01-26 01:32:46 +00:00
Tom Rhodes
0e36e11d57 Cast tv_sec to intmax_t and print with %jd in some ifdef'ed code. 2005-12-28 07:08:54 +00:00
Gleb Smirnoff
3c6160327d Add \n to log() message.
Submitted by:	Stanislaw Halik <weirdo tehran.lain.pl>
2005-12-27 00:17:11 +00:00
John Baldwin
410d857972 Remove linux_mib_destroy() (which I actually added in between 5.0 and 5.1)
which existed to cleanup the linux_osname mutex.  Now that MTX_SYSINIT()
has grown a SYSUNINIT to destroy mutexes on unload, the extra destroy here
was redundant and resulted in panics in debug kernels.

MFC after:	1 week
Reported by:	Goran Gajic ggajic at afrodita dot rcub dot bg dot ac dot yu
2005-12-15 16:30:41 +00:00
Xin LI
1278dd6847 In Linux, kernel parameters passed to ioctl are by value, while in FreeBSD
they are passed by reference.  Handle the difference within the
linux_ioctl_termio on the LINUX_TCFLSH path.

Submitted by:	Jaroslav Drzik <jaro_AT_coop-voz_dot_sk>
2005-12-13 15:32:52 +00:00
Gleb Smirnoff
7a14354549 Suppress logging about unimplemented syscalls to one time per process. This
prevents hard flood of the system console.

Reviewed by:	bde
2005-12-08 13:33:57 +00:00
Ruslan Ermilov
f4e9888107 Fix -Wundef. 2005-12-04 02:12:43 +00:00
David Xu
9104847f21 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
   sendsig use discrete parameters, now they uses member fields of
   ksiginfo_t structure. For sendsig, this change allows us to pass
   POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
   generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
   blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
   thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
   be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
   an extra siginfo list holds additional siginfo_t data for signals.
   kernel code uses psignal() still behavior as before, it won't be failed
   even under memory pressure, only exception is when deleting a signal,
   we should call sigqueue_delete to remove signal from sigqueue but
   not SIGDELSET. Current there is no kernel code will deliver a signal
   with additional data, so kernel should be as stable as before,
   a ksiginfo can carry more information, for example, allow signal to
   be delivered but throw away siginfo data if memory is not enough.
   SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
   not be caught or masked.
   The sigqueue() syscall allows user code to queue a signal to target
   process, if resource is unavailable, EAGAIN will be returned as
   specification said.
   Just before thread exits, signal queue memory will be freed by
   sigqueue_flush.
   Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
2005-10-14 12:43:47 +00:00
Robert Watson
5f419982c2 Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer leads to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by:	bde
2005-09-28 07:03:03 +00:00
Robert Watson
84d2b7df26 Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions.  In most cases, this means adding an
acquisition of Giant immediately around the function.  In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset.  This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.

MFC after:	1 week
2005-09-19 16:51:43 +00:00
Xin LI
e68796868a Fix kernel build.
Reported by:	tinderbox
2005-08-28 13:11:08 +00:00
Craig Rodrigues
8739cd44d0 Rewrite linux_ifconf() to be more like ifconf() in net/if.c
so that we do not call uiomove() while IFNET_RLOCK() is held.
This eliminates the witness warning:

Calling uiomove() with the following non-sleepable locks held:
exclusive sleep mutex ifnet r = 0 (0xc096dd60) locked @
/usr/src/sys/modules/linux/../../compat/linux/linux_ioctl.c:2170

MFC after:	2 days
2005-08-27 14:44:10 +00:00
Robert Watson
13f4c340ae Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags.  Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags.  This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by:	pjd, bz
MFC after:	7 days
2005-08-09 10:20:02 +00:00
John Baldwin
813a5e14ec Move MODULE_DEPEND() statements for SYSVIPC dependencies to linux_ipc.c
so that they aren't duplicated 3 times and are also in the same file as
the code that depends on the SYSVIPC modules.
2005-07-29 19:40:39 +00:00
John Baldwin
02295eedc7 Add Giant around linux_getcwd_common() in linux_getcwd().
Approved by:	re (scottl)
2005-07-09 12:34:49 +00:00
John Baldwin
4641373fde Add missing locking to linux_connect() so that it can be marked MP safe:
- Conditionally grab Giant around the EISCONN hack at the end based on
  debug.mpsafenet.
- Protect access to so_emuldata via SOCK_LOCK.

Reviewed by:	rwatson
Approved by:	re (scottl)
2005-07-09 12:26:22 +00:00
John Baldwin
8d948cd1ec Fix the computation of uptime for linux_sysinfo(). Before it was returning
the uptime in seconds mod 60 which wasn't very useful.

Approved by:	re (scottl)
2005-07-07 19:17:55 +00:00
Pawel Jakub Dawidek
06a137780b Actually only protect mount-point if security.jail.enforce_statfs is set to 2.
If we don't return statistics about requested file systems, system tools
may not work correctly or at all.

Approved by:	re (scottl)
2005-06-23 22:13:29 +00:00
Pawel Jakub Dawidek
820a0de9a9 Rename sysctl security.jail.getfsstatroot_only to security.jail.enforce_statfs
and extend its functionality:

value	policy
0	show all mount-points without any restrictions
1	show only mount-points below jail's chroot and show only part of the
	mount-point's path (if jail's chroot directory is /jails/foo and
	mount-point is /jails/foo/usr/home only /usr/home will be shown)
2	show only mount-point where jail's chroot directory is placed.

Default value is 2.

Discussed with:	rwatson
2005-06-09 18:49:19 +00:00
Maxim Sobolev
bc165ab0fe Properly convert FreeBSD priority values into Linux values in the
getpriority(2) syscall.

PR:		kern/81951
Submitted by:	Andriy Gapon <avg@icyb.net.ua>
2005-06-08 20:41:28 +00:00
Pawel Jakub Dawidek
d0cad55da8 Remove (now) unused argument 'td' from bsd_to_linux_statfs(). 2005-05-27 19:25:39 +00:00
Pawel Jakub Dawidek
672d95c55d The code is under '#ifdef not_that_way', but anyway:
- Add missing prison_check_mount() check.
2005-05-22 22:30:31 +00:00
Pawel Jakub Dawidek
a0e96a49df If we need to hide fsid, kern_statfs()/kern_fstatfs() will do it for us,
so do not duplicate the code in cvtstatfs().
Note, that we now need to clear fsid in freebsd4_getfsstat().

This moves all security related checks from functions like cvtstatfs()
and will allow to add more security related stuff (like statfs(2), etc.
protection for jails) a bit easier.
2005-05-22 21:52:30 +00:00
Jeff Roberson
7625cbf3cc - Pass the ISOPEN flag to namei so filesystems will know we're about to
open them or otherwise access the data.
2005-04-27 09:05:19 +00:00
Jeff Roberson
4585e3ac5a - Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case.  Se vfs_lookup.c r1.79 for details.

Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:59:09 +00:00
Matthew N. Dodd
f9763094f1 Implement SOUND_MIXER_INFO ioctl in compat layer. 2005-04-13 04:33:06 +00:00
Matthew N. Dodd
73c730a694 Add support for O_NOFOLLOW and O_DIRECT to Linux fcntl() F_GETFL/F_SETFL. 2005-04-13 04:31:43 +00:00
John Baldwin
98df9218da - Change the vm_mmap() function to accept an objtype_t parameter specifying
the type of object represented by the handle argument.
- Allow vm_mmap() to map device memory via cdev objects in addition to
  vnodes and anonymous memory.  Note that mmaping a cdev directly does not
  currently perform any MAC checks like mapping a vnode does.
- Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the
  cdev the ioctl is acting on rather than trying to find a suitable vnode
  to map from.

Reviewed by:	alc, arch@
2005-04-01 20:00:11 +00:00
Jeff Roberson
9f3d9acd26 - Initial cn_lkflags to LK_EXCLUSIVE.
Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:16:12 +00:00
Brooks Davis
044ba81b85 Use the CTASSERT() macro instead of rolling my own, non-portable one
using #error.

Suggested by:	jhb
2005-03-24 19:26:50 +00:00
Brooks Davis
fe753c29f7 Compile errors are way more useful then panics later.
Replace a KASSERT of LINUX_IFNAMSIZ == IFNAMSIZ with a preprocessor
check and #error message.  This will prevent nasty suprises if users
change IFNAMSIZ without updating the linux code appropriatly.
2005-03-24 17:51:15 +00:00
David Schultz
aa675b572f Reject packets larger than IP_MAXPACKET in linux_sendto() for sockets
with the IP_HDRINCL option set.  Without this change, a Linux process
with access to a raw socket could cause a kernel panic.  Raw sockets
must be created by root, and are generally not consigned to untrusted
applications; hence, the security implications of this bug are
minimal.  I believe this only affects 6-CURRENT on or after 2005-01-30.

Found by:	Coverity Prevent analysis tool
Security:	Local DOS
2005-03-23 08:28:00 +00:00
Poul-Henning Kamp
bbbc2d967e Neuter the duplicated disk-device magic code for now. Somebody with
serious linux-clue is necessary to fix this properly.
2005-03-15 11:58:40 +00:00
Maxim Sobolev
8d6e40c3f1 Add kernel-only flag MSG_NOSIGNAL to be used in emulation layers to surpress
SIGPIPE signal for the duration of the sento-family syscalls. Use it to
replace previously added hack in Linux layer based on temporarily setting
SO_NOSIGPIPE flag.

Suggested by:	alfred
2005-03-08 16:11:41 +00:00
Maxim Sobolev
2302f0fea8 Handle MSG_NOSIGNAL flag in linux_send() by setting SO_NOSIGPIPE on socket
for the duration of the send() call. Such approach may be less than ideal
in threading environment, when several threads share the same socket and it
might happen that several of them are calling linux_send() at the same time
with and without SO_NOSIGPIPE set.

However, such race condition is very unlikely in practice, therefore this
change provides practical improvement compared to the previous behaviour.

PR:		kern/76426
Submitted by:	Steven Hartland <killing@multiplay.co.uk>
MFC after:	3 days
2005-03-07 07:26:42 +00:00
Maxim Sobolev
e3478fe000 Handle unimplemented syscall by instantly returning ENOSYS instead of sending
signal first and only then returning ENOSYS to match what real linux does.

PR:		kern/74302
Submitted by:	Travis Poppe <tlp@LiquidX.org>
2005-03-07 00:18:06 +00:00
John Baldwin
501ce30561 Remove linux_emul_find() and the CHECKALT*() macros as they are no longer
used.
2005-03-01 17:57:45 +00:00
Poul-Henning Kamp
1e247cc2ce Neuter linux_ustat() until somebody finds time to try to fix it.
The fundamental problem is that we get only the lower 8 bits of the
minor device number so there is no guarantee that we can actually
find the disk device in question at all.

This was probably a bigger issue pre-GEOM where the upper bits
signaled which slice were in use.

The secondary problem is how we get from (partial) dev_t to vnode.

The correct implementation will involve traversing the mount list
looking for a perfect match or a possible match (for truncated
minor).
2005-02-22 13:39:46 +00:00
Nate Lawson
1e8d246eee Unbreak the kernel build. Pointy hat to: sobomax. 2005-02-13 19:50:57 +00:00
Maxim Sobolev
1a88a252fd Backout previous change (disabling of security checks for signals delivered
in emulation layers), since it appears to be too broad.

Requested by:   rwatson
2005-02-13 17:37:20 +00:00
Maxim Sobolev
d8ff44b79f Split out kill(2) syscall service routine into user-level and kernel part, the
former is callable from user space and the latter from the kernel one. Make
kernel version take additional argument which tells if the respective call
should check for additional restrictions for sending signals to suid/sugid
applications or not.

Make all emulation layers using non-checked version, since signal numbers in
emulation layers can have different meaning that in native mode and such
protection can cause misbehaviour.

As a result remove LIBTHR from the signals allowed to be delivered to a
suid/sugid application.

Requested (sorta) by:	rwatson
MFC after:	2 weeks
2005-02-13 16:42:08 +00:00
Maxim Sobolev
282fae35d6 Semctl with IPC_STAT command should return zero in case of success.
PR:		73778
Submitted by:	Andriy Gapon <avg@icyb.net.ua>
MFC after:	2 weeks
2005-02-11 13:46:55 +00:00