Commit graph

191 commits

Author SHA1 Message Date
Dmitry Chagin
e667ee63f6 Whitespaces and style(9) fix. No functional changes.
MFC after:	1 week
2016-03-27 08:10:20 +00:00
Andrey V. Elsukov
86a9058b01 Add support for IPPROTO_IPV6 socket layer for getsockopt/setsockopt calls.
Also add mapping for several options from RFC 3493 and 3542.

Reviewed by:	dchagin
Tested by:	Joe Love <joe at getsomwhere dot net>
MFC after:	2 weeks
2016-03-09 09:12:40 +00:00
Dmitry Chagin
91f514e413 Does not leak fp. While here remove bogus cast of fp->f_data.
MFC after:	1 week
2016-03-08 15:55:43 +00:00
Dmitry Chagin
fc4b98fb88 Linux accept() system call return EOPNOTSUPP errno instead of EINVAL
for UDP sockets.

MFC after:	1 week
2016-03-08 15:15:34 +00:00
Dmitry Chagin
67968b35a0 Prevent double free of control in common sendmsg path as sosend
already freeing it.
2016-01-17 19:28:13 +00:00
Jung-uk Kim
1a01bdf906 Properly initialize flags for accept4(2) not to return spurious EINVAL.
Note this fixes a Linuxulator regression introduced in r283490.

PR:		200662
2015-06-08 20:03:15 +00:00
Dmitry Chagin
bbf392d5ef Convert SCM_TIMESTAMP in recvmsg(). 2015-05-24 18:13:21 +00:00
Dmitry Chagin
4f65e9cff4 Fix an mbuf(9) leak in sendmsg() under failure condition and
remove unneeded check for failed M_WAITOK allocation.

Found by: Brainy Code Scanner
Reported by: Maxime Villard
2015-05-24 18:10:07 +00:00
Dmitry Chagin
4cf10e2934 Since FreeBSD supports SOCK_CLOEXEC & SOCK_NONBLOCK options
remove its emulation via fcntl call from Linuxulator.
2015-05-24 18:06:12 +00:00
Dmitry Chagin
e1ff74c0f7 Implement recvmmsg() and sendmmsg() system calls. 2015-05-24 18:04:04 +00:00
Dmitry Chagin
4d0f380d87 To avoid code duplication move open/fcntl definitions to the MI
header file.

Differential Revision:	https://reviews.freebsd.org/D1087
Reviewed by:	trasz
2015-05-24 16:31:44 +00:00
Dmitry Chagin
9599b0ec3a Rewrite linux_recvfrom. To avoid double conversion of sockaddr use
kern_recvit() directly.
And check fromlen parameter before sockaddr copyin and conversion.

Differential Revision:	https://reviews.freebsd.org/D1082
2015-05-24 16:26:55 +00:00
Dmitry Chagin
e0d3ea8c65 Where possible we will use M_LINUX malloc(9) type.
Move M_FUTEX defines to the linux_common.ko.

Differential Revision:	https://reviews.freebsd.org/D1077
Reviewed by:	emaste
2015-05-24 16:14:41 +00:00
Dmitry Chagin
7f8f1d7f7a Disable i386 call for x86-64 Linux.
Differential Revision:	https://reviews.freebsd.org/D1067
Reviewed by:	trasz
2015-05-24 15:43:53 +00:00
Dmitry Chagin
a12b9b3d96 64-bit paltforms, like x86_64, do not use multiplexing on
socketcall system calls.

Differential Revision:	https://reviews.freebsd.org/D1065
Reviewed by:	trasz
2015-05-24 15:41:27 +00:00
Dmitry Chagin
857ad5a31b Fix Clang -Wpointer-sign warnings.
MFC after:	1 week
2015-01-01 20:53:38 +00:00
Konstantin Belousov
6e646651d3 Remove the no-at variants of the kern_xx() syscall helpers. E.g., we
have both kern_open() and kern_openat(); change the callers to use
kern_openat().

This removes one (sometimes two) levels of indirection and
consolidates arguments checks.

Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-13 18:01:51 +00:00
Robert Watson
4a14441044 Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
Gleb Smirnoff
eedc7fd9e8 Provide includes that are needed in these files, and before were read
in implicitly via if.h -> if_var.h pollution.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 18:18:50 +00:00
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
Eitan Adler
1eb9ea583b Remove check for NULL prior to free(9) and m_freem(9).
Approved by:	cperciva (mentor)
2013-03-04 02:21:34 +00:00
John Baldwin
fb709557a3 Don't assume that all Linux TCP-level socket options are identical to
FreeBSD TCP-level socket options (only the first two are).  Instead,
using a mapping function and fail unsupported options as we do for other
socket option levels.

MFC after:	2 weeks
2013-01-23 21:44:48 +00:00
Gleb Smirnoff
eb1b1807af Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
Ulrich Spörlein
9a14aa017b Convert files to UTF-8 2012-01-15 13:23:18 +00:00
Jung-uk Kim
bf3a36cc7b Use the caculated length instead of maximum length. 2011-10-06 21:55:05 +00:00
Jung-uk Kim
b6f96462ba Remove a now-defunct variable. 2011-10-06 21:40:08 +00:00
Jung-uk Kim
3106d6704b Use uint32_t instead of u_int32_t. Fix style(9) nits. 2011-10-06 21:17:46 +00:00
Jung-uk Kim
c02637c717 Make sure to ignore the leading NULL byte from Linux abstract namespace. 2011-10-06 21:09:28 +00:00
Jung-uk Kim
f05531a392 Restore the original socket address length if it was not really AF_INET6. 2011-10-06 20:48:23 +00:00
Jung-uk Kim
43399111a7 Retern more appropriate errno when Linux path name is too long. 2011-10-06 20:28:08 +00:00
Jung-uk Kim
0007f669ca Inline do_sa_get() function and remove an unused return value. 2011-10-06 20:20:30 +00:00
Jung-uk Kim
c15cdbf2f3 Unroll inlined strnlen(9) and make it easier to read. No functional change. 2011-10-06 19:59:14 +00:00
Colin Percival
5da3eb94fc Fix a bug in UNIX socket handling in the linux emulator which was
exposed by the security fix in FreeBSD-SA-11:05.unix.

Approved by:	so (cperciva)
Approved by:	re (kib)
Security:	Related to FreeBSD-SA-11:05.unix, but not actually
		a security fix.
2011-10-04 19:07:38 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Robert Watson
a9d2f8d84f Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
Andriy Gapon
a930718af1 Revert r220032:linux compat: add SO_PASSCRED option with basic handling
I have not properly thought through the commit.  After r220031 (linux
compat: improve and fix sendmsg/recvmsg compatibility) the basic
handling for SO_PASSCRED is not sufficient as it breaks recvmsg
functionality for SCM_CREDS messages because now we would need to handle
sockcred data in addition to cmsgcred.  And that is not implemented yet.

Pointyhat to:	avg
2011-03-31 08:14:51 +00:00
Andriy Gapon
01a9e1a11b linux compat: add SO_PASSCRED option with basic handling
This seems to have been a part of a bigger patch by dchagin that either
haven't been committed or committed partially.

Submitted by:	dchagin, nox
MFC after:	2 weeks
2011-03-26 11:25:36 +00:00
Andriy Gapon
605da56bc3 linux compat: improve and fix sendmsg/recvmsg compatibility
- implement baseic stubs for capget, capset, prctl PR_GET_KEEPCAPS
  and prctl PR_SET_KEEPCAPS.
- add SCM_CREDS support to sendmsg and recvmsg
- modify sendmsg to ignore control messages if not using UNIX
  domain sockets

This should allow linux pulse audio daemon and client work on FreeBSD
and interoperate with native counter-parts modulo the differences in
pulseaudio versions.

PR:		kern/149168
Submitted by:	John Wehle <john@feith.com>
Reviewed by:	netchild
MFC after:	2 weeks
2011-03-26 11:05:53 +00:00
Xin LI
5cb9c68cc9 - Return EAFNOSUPPORT instead of EINVAL for unsupported address family,
this matches the Linux behavior.
 - Check if we have sufficient space allocated for socket structure, which
   fixes a buffer overflow when wrong length is being passed into the
   emulation layer. [1]

PR:		kern/138860
Submitted by:	Mateusz Guzik <mjguzik gmail com>
Reported by:	Alexander Best [1]
MFC after:	2 weeks
2010-02-09 22:30:51 +00:00
Bjoern A. Zeeb
d97bee3e7e Unconditionally call the setsockopt for IPV6_V6ONLY for v6 linux sockets
no matter whether we are compiled as module or if our default of the
net.inet6.ip6.v6only sysctl already matches what we would set.

This avoids unnecessary complications with modules, VIMAGES, INET6 and
the sysctl value, especially considering that most users will use
linux compat as a module.

Discussed with:	kib, rwatson (weeks ago)
Reviewed by:	rwatson
MFC after:	6 weeks
2009-10-25 09:58:56 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Robert Watson
eddfbb763d Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
Dmitry Chagin
f83427b833 Add forgotten in previous commit flags argument.
Approved by:	kib (mentor)
MFC after:	1 month
2009-06-01 20:54:41 +00:00
Dmitry Chagin
f8cd0af232 Implement accept4 syscall.
Approved by:	kib (mentor)
MFC after:	1 month
2009-06-01 20:48:39 +00:00
Dmitry Chagin
93e694c9df Implement a variation of the accept_common() which takes
a flags argument.

Do not preserve td_retval before kern_fcntl(F_SETFL) as it does not
changed.

Approved by:	kib (mentor)
MFC after:	1 month
2009-06-01 20:44:58 +00:00
Dmitry Chagin
c8f37d612d Split linux_accept() syscall onto linux_accept_common() which should
be used by linuxulator and linux_accept() itself.

Approved by:	kib (mentor)
MFC after:	1 month
2009-06-01 20:42:27 +00:00
Dmitry Chagin
39253cf9bb Implement a variation of the socketpair() syscall which takes a flags
in addition to the type argument.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-31 12:16:31 +00:00
Dmitry Chagin
38a18e9760 Move new socket flags handling into a separate function as Linux
introduced more syscalls which uses these flags.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-31 12:04:01 +00:00
Dmitry Chagin
20a4ff27b0 Remove empty lines.
Approved by:	kib (mentor)
MFC after:	1 month
2009-05-31 12:00:16 +00:00
Dmitry Chagin
ea7b81d2bd Validate user-supplied arguments values.
Args argument is a pointer to the structure located in user space in
which the socketcall arguments are packed. The structure must be
copied to the kernel instead of direct dereferencing.

Approved by:	kib (mentor)
MFC after:	1 week
2009-05-19 09:10:53 +00:00
Dmitry Chagin
3a72bf04c4 Implement MSG_CMSG_CLOEXEC flag for linux_recvmsg().
Approved by:	kib (mentor)
MFC after:	1 month
2009-05-18 04:07:46 +00:00
Dmitry Chagin
3933bde22e Somewhere between 2.6.23 and 2.6.27, Linux added SOCK_CLOEXEC and
SOCK_NONBLOCK flags, that allow to save fcntl() calls.

Implement a variation of the socket() syscall which takes a flags
in addition to the type argument.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-16 18:48:41 +00:00
Dmitry Chagin
eeb63e515f Return EINVAL in case when the incorrect or unsupported
type argument is specified.

Do not map type argument value as its Linux values are
identical to FreeBSD values.

Approved by:	kib (mentor)
2009-05-16 18:46:51 +00:00
Dmitry Chagin
6994ea543f Use the protocol family constants for the domain argument validation.
Return immediately when the socket() failed.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-16 18:44:56 +00:00
Dmitry Chagin
d4dd69c46c Emulate SO_PEERCRED socket option.
Temporarily use 0 for pid member as the FreeBSD does not cache remote
UNIX domain socket peer pid.

PR:		kern/102956
Reviewed by:	rwatson
Approved by:	kib (mentor)
MFC after:	1 month
2009-05-16 18:42:18 +00:00
Dmitry Chagin
03cc95d21a Translate l_timeval arg to native struct timeval in
linux_setsockopt()/linux_getsockopt() for SO_RCVTIMEO,
SO_SNDTIMEO opts as l_timeval has MD members.

Remove bogus __packed attribute from l_timeval struct on __amd64__.

PR:		kern/134276
Submitted by:	Thomas Mueller <tmueller sysgo com>
Approved by:	kib (mentor)
MFC after:	2 weeks
2009-05-11 13:50:42 +00:00
Dmitry Chagin
3980a435a2 Add forgotten linux to bsd flags argument mapping into the linux_recv().
PR:		kern/134276
Submitted by:	Thomas Mueller <tmueller sysgo com>
Approved by:	kib (mentor)
MFC after:	2 weeks
2009-05-11 13:42:40 +00:00
Dmitry Chagin
d9b063cc9d Return EAFNOSUPPORT instead of EINVAL in case when the incorrect or
unsupported domain argument is specified.

Approved by:	kib (mentor)
2009-05-07 09:34:02 +00:00
Dmitry Chagin
1a52a4abf7 Rework r191742.
Use the protocol family constants for the domain argument validation.

Return EAFNOSUPPORT in case when the incorrect domain argument
is specified.

Return EPROTONOSUPPORT instead of passing values that are not 0
to the BSD layer.

Suggested by:   rwatson

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-07 03:23:22 +00:00
Dmitry Chagin
40092d93b4 Linux socketpair() call expects explicit specified protocol for
AF_LOCAL domain unlike FreeBSD which expects 0 in this case.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-02 10:51:40 +00:00
Marko Zec
093f25f8c8 In preparation for turning on options VIMAGE in next commits,
rearrange / replace / adjust several INIT_VNET_* initializer
macros, all of which currently resolve to whitespace.

Reviewed by:	bz (an older version of the patch)
Approved by:	julian (mentor)
2009-04-26 22:06:42 +00:00
Bjoern A. Zeeb
4b79449e2f Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by:	brooks, gnn, des, zec, imp
Sponsored by:	The FreeBSD Foundation
2008-12-02 21:37:28 +00:00
Konstantin Belousov
74f5d68011 Make linux_sendmsg() and linux_recvmsg() work on linux32/amd64.
Change types used in the linux' struct msghdr and struct cmsghdr
definitions to the properly-sized architecture-specific types.
Move ancillary data handler from linux_sendit() to linux_sendmsg().

Submitted by:	dchagin
2008-11-29 17:14:06 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Marko Zec
8b615593fc Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
Konstantin Belousov
745aaef5b5 Remove superfluous copyin() of args, structures are already in kernel space.
Submitted by:	dchagin
MFC after:	1 week
2008-09-09 13:01:14 +00:00
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
Robert Watson
0bf686c125 Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet.  As that
has now been removed, they are no longer required.  Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option.  Clean up some related gotos for
consistency.

Reviewed by:	bz, csjp
Tested by:	kris
Approved by:	re (kensmith)
2007-08-06 14:26:03 +00:00
Robert Watson
d72a615878 Some Linux applications (ping) pass a non-NULL msg_control argument to
sendmsg() while using a 0-length msg_controllen.  This isn't allowed in
the FreeBSD system call ABI, so detect this case and set msg_control to
NULL.  This allows Linux ping to work.

Submitted by:	rdivacky
2007-04-14 10:35:09 +00:00
Konstantin Belousov
d0b2365eec Introduce some more SO_ option equivalents from Linux to FreeBSD.
The msg variable in linux_recvmsg() was not initialized.
Copy it from userspace.

Submitted by: rdivacky
2007-02-01 13:36:19 +00:00
Alexander Leidinger
d4b7423fa1 MFp4:
- Linux returns ENOPROTOOPT in a case of not supported opt to setsockopt.
- Return EISDIR in pread() when arg is a directory.
- Return EINVAL instead of EFAULT when namelen is not correct in accept().
- Return EINVAL instead of EACCESS if invalid access mode is entered in
  access().
- Return EINVAL instead of EADDRNOTAVAIL in a case of bad salen param
  to bind().

Submitted by:	rdivacky
Tested with:	LTP (vfork01 fails now, but it seems to be a race and
		not caused by those changes)
MFC after:	1 week
2006-09-23 19:06:54 +00:00
John Baldwin
b33887ea31 Don't free the sockaddr in kern_bind() and kern_connect() as not all
callers pass a sockaddr allocated via malloc() from M_SONAME anymore.
Instead, free it in the callers when necessary.
2006-07-19 18:28:52 +00:00
John Baldwin
c1cccebe8b Add a kern_close() so that the ABIs can close a file descriptor w/o having
to populate a close_args struct and change some of the places that do.
2006-07-08 20:03:39 +00:00
Alexander Leidinger
01e0ffbae8 Now that we don't have a linuxolator on alpha anymore:
- unifdef __alpha__
 - revert rev. 1.66 of linux_socket.c
2006-05-10 20:38:16 +00:00
Robert Watson
f7f45ac8e2 Annotate uses of fgetsock() with indications that they should rely
on their existing file descriptor references to sockets, rather than
use fgetsock() to retrieve a direct socket reference.

MFC after:	3 months
2006-04-01 15:25:01 +00:00
Alexander Leidinger
1daa386fcf Fix the LINT build on alpha:
- rename some file local structure definitions, the names clash with
  autogenerated names
- on !alpha add some compatibility defines for those renamed structures
- make some functions globally visible on alpha
2006-03-21 21:56:04 +00:00
Ruslan Ermilov
aefce619cf Unbreak COMPAT_LINUX32 option support on amd64.
Broken by:	netchild
2006-03-19 11:10:33 +00:00
Alexander Leidinger
d4a3f5ddb6 Fixup some problems in my previous commit (COMPAT_43).
Pointyhat to:	netchild
2006-03-18 20:47:36 +00:00
Alexander Leidinger
5c8919adf4 Get rid of the need of COMPAT_43 in the linuxolator.
Submitted by:	Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Obtained from:	DragonFly (some parts)
2006-03-18 18:20:17 +00:00
Gleb Smirnoff
3c6160327d Add \n to log() message.
Submitted by:	Stanislaw Halik <weirdo tehran.lain.pl>
2005-12-27 00:17:11 +00:00
Robert Watson
5f419982c2 Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer leads to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by:	bde
2005-09-28 07:03:03 +00:00
Robert Watson
84d2b7df26 Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions.  In most cases, this means adding an
acquisition of Giant immediately around the function.  In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset.  This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.

MFC after:	1 week
2005-09-19 16:51:43 +00:00
John Baldwin
4641373fde Add missing locking to linux_connect() so that it can be marked MP safe:
- Conditionally grab Giant around the EISCONN hack at the end based on
  debug.mpsafenet.
- Protect access to so_emuldata via SOCK_LOCK.

Reviewed by:	rwatson
Approved by:	re (scottl)
2005-07-09 12:26:22 +00:00
David Schultz
aa675b572f Reject packets larger than IP_MAXPACKET in linux_sendto() for sockets
with the IP_HDRINCL option set.  Without this change, a Linux process
with access to a raw socket could cause a kernel panic.  Raw sockets
must be created by root, and are generally not consigned to untrusted
applications; hence, the security implications of this bug are
minimal.  I believe this only affects 6-CURRENT on or after 2005-01-30.

Found by:	Coverity Prevent analysis tool
Security:	Local DOS
2005-03-23 08:28:00 +00:00
Maxim Sobolev
8d6e40c3f1 Add kernel-only flag MSG_NOSIGNAL to be used in emulation layers to surpress
SIGPIPE signal for the duration of the sento-family syscalls. Use it to
replace previously added hack in Linux layer based on temporarily setting
SO_NOSIGPIPE flag.

Suggested by:	alfred
2005-03-08 16:11:41 +00:00
Maxim Sobolev
2302f0fea8 Handle MSG_NOSIGNAL flag in linux_send() by setting SO_NOSIGPIPE on socket
for the duration of the send() call. Such approach may be less than ideal
in threading environment, when several threads share the same socket and it
might happen that several of them are calling linux_send() at the same time
with and without SO_NOSIGPIPE set.

However, such race condition is very unlikely in practice, therefore this
change provides practical improvement compared to the previous behaviour.

PR:		kern/76426
Submitted by:	Steven Hartland <killing@multiplay.co.uk>
MFC after:	3 days
2005-03-07 07:26:42 +00:00
Maxim Sobolev
a6886ef173 Extend kern_sendit() to take another enum uio_seg argument, which specifies
where the buffer to send lies and use it to eliminate yet another stackgap
in linuxlator.

MFC after:	2 weeks
2005-01-30 07:20:36 +00:00
David E. O'Brien
1997c537be Match the LINUX32's style with existing style
Submitted by:	Jung-uk Kim <jkim@niksun.com>

Use positive, not negative logic.
2005-01-14 04:44:56 +00:00
John Baldwin
2ca25ab53e Fix the ABI wrappers to use kern_fcntl() rather than calling fcntl()
directly.  This removes a few more users of the stackgap and also marks
the syscalls using these wrappers MP safe where appropriate.

Tested on:	i386 with linux acroread5
Compiled on:	i386, alpha LINT
2004-08-24 20:21:21 +00:00
Dag-Erling Smørgrav
72261b9f61 Don't try to translate the control message unless we're certain it's
valid; otherwise a caller could trick us into changing any 32-bit word
in kernel memory to LINUX_SOL_SOCKET (0x00000001) if its previous value
is SOL_SOCKET (0x0000ffff).

MFC after:	3 days
2004-08-23 12:41:29 +00:00
Tim J. Robbins
4af2762336 Changes to MI Linux emulation code necessary to run 32-bit Linux binaries
on AMD64, and the general case where the emulated platform has different
size pointers than we use natively:
- declare certain structure members as l_uintptr_t and use the new PTRIN
  and PTROUT macros to convert to and from native pointers.
- declare some structures __packed on amd64 when the layout would differ
  from that used on i386.
- include <machine/../linux32/linux.h> instead of <machine/../linux/linux.h>
  if compiling with COMPAT_LINUX32. This will need to be revisited before
  32-bit and 64-bit Linux emulation support can coexist in the same kernel.
- other small scattered changes.

This should be a no-op on i386 and Alpha.
2004-08-16 07:28:16 +00:00
David Malone
fb75797e40 I missed two pieces of the commit to this file. Robert has already
added one, this adds the other.
2004-07-18 09:26:34 +00:00
Robert Watson
38da2381cd Remove 'sg' argument to linux_sendto_hdrincl, which is what I think was
intended.  This fixes the build, but might require revision.
2004-07-18 04:09:40 +00:00
David Malone
e140eb430c Add a kern_setsockopt and kern_getsockopt which can read the option
values from either user land or from the kernel. Use them for
[gs]etsockopt and to clean up some calls to [gs]etsockopt in the
Linux emulation code that uses the stackgap.
2004-07-17 21:06:36 +00:00
Poul-Henning Kamp
552afd9c12 Clean up and wash struct iovec and struct uio handling.
Add copyiniov() which copies a struct iovec array in from userland into
a malloc'ed struct iovec.  Caller frees.

Change uiofromiov() to malloc the uio (caller frees) and name it
copyinuio() which is more appropriate.

Add cloneuio() which returns a malloc'ed copy.  Caller frees.

Use them throughout.
2004-07-10 15:42:16 +00:00
Poul-Henning Kamp
87d72a8f27 Use a couple of regular kernel entry points, rather than COMPAT_43
entry points.
2004-07-08 10:18:07 +00:00
Bruce Evans
3db2a84395 Quick fix for LINT breakage caused by interface changes in accept(2), etc.
The log message for rev.1.160 of kern/uipc_syscalls.c and associated
changes only claimed to add restrict qualifiers (which have no effect in
the kernel so they probably shouldn't be added), but the following
interface changes were also made:
- caddr_t to `void *' and `struct sockaddr_t *'
- `int *' to `socklen_t *'.
These interface changes are not quite null, and this fix is quick (like
the changes in uipc_syscalls 1.160) because it uses bogus casts instead
of complete bounds-checked conversions.

Things should be fixed better when the conversions can be done without
using the stack gap.  linux_check_hdrincl() already uses the stack gap
and is fixed completely though the type mismatches in it were not fatal
(there were only fatal type mismatches from unopaquing pointers to
[o]sockaddr't's -- the difference between accept()'s args and oaccept()'s
args is now non-opaque, but this is not reflected in their args structs).
2003-12-25 09:59:02 +00:00
David Malone
5a8a13e0fc Use kern_sendit rather than sendit for the Linux send* syscalls.
This means we can avoid using the stack gap for most send* syscalls
now (it is still used in the IP_HDRINCL case).
2003-11-09 17:04:04 +00:00
Mitsuru IWASAKI
84b11cd80a Fix some problems in linux_sendmsg() and linux_recvmsg().
- Allocate storage for uap->msg always because it is copyin()'ed in
   native sendmsg().
 - Convert sockopt level from Linux to FreeBSD after native recvmsg() calling.
 - Some cleanups.

Tested with:	Oracle 9i shared server connection mode.

MFC after:	1 week
2003-10-11 15:08:32 +00:00
David E. O'Brien
16dbc7f228 Use __FBSDID(). 2003-06-10 21:29:12 +00:00