Commit graph

1116 commits

Author SHA1 Message Date
Edward Tomasz Napierala
ae6b6ef6cb Replace sys_ftruncate() with kern_ftruncate() in various compats.
Reviewed by:	kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9368
2017-01-30 11:50:54 +00:00
Gleb Smirnoff
9565357905 Use getsock_cap() instead of fgetsock().
Reviewed by:	dchagin
2017-01-06 04:38:38 +00:00
Konstantin Belousov
2f304845e2 Do not allocate struct statfs on kernel stack.
Right now size of the structure is 472 bytes on amd64, which is
already large and stack allocations are indesirable.  With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.

Extracted from:	ino64 work by gleb
Discussed with:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-05 17:19:26 +00:00
Mariusz Zaborski
85b0f9de11 capsicum: propagate rights on accept(2)
Descriptor returned by accept(2) should inherits capabilities rights from
the listening socket.

PR:		201052
Reviewed by:	emaste, jonathan
Discussed with:	many
Differential Revision:	https://reviews.freebsd.org/D7724
2016-09-22 09:58:46 +00:00
Ed Maste
df4336ddfa Catch up to sys/capability.h rename to sys/capsicum.h in r263232
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-09-19 18:44:43 +00:00
Dmitry Chagin
ede2869c4c Implement BLKSSZGET ioctl for the Linuxulator.
PR:		212700
Submitted by:	Erik Cederstrand
Reported by:	Erik Cederstrand
MFC after:	1 week
2016-09-17 08:10:01 +00:00
Ed Schouten
93d9ebd82e Eliminate use of sys_fsync() and sys_fdatasync().
Make the kern_fsync() function public, so that it can be used by other
parts of the kernel. Fix up existing consumers to make use of it.

Requested by:	kib
2016-08-15 20:11:52 +00:00
Dmitry Chagin
97d06da692 Fix a copy/paste bug introduced during X86_64 Linuxulator work.
FreeBSD support NX bit on X86_64 processors out of the box, for i386 emulation
use READ_IMPLIES_EXEC flag, introduced in r302515.

While here move common part of mmap() and mprotect() code to the files in compat/linux
to reduce code dupcliation between Linuxulator's.

Reported by:    Johannes Jost Meixner, Shawn Webb

MFC after:	1 week
XMFC with:	r302515, r302516
2016-07-10 08:22:04 +00:00
Dmitry Chagin
23e8912c60 Implement Linux personality() system call mainly due to READ_IMPLIES_EXEC flag.
In Linux if this flag is set, PROT_READ implies PROT_EXEC for mmap().
Linux/i386 set this flag automatically if the binary requires executable stack.

READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.
2016-07-10 08:15:50 +00:00
Dmitry Chagin
3a49978f45 Fix a bug introduced in r283433.
[1] Remove unneeded sockaddr conversion before kern_recvit() call as the from
argument is used to record result (the source address of the received message) only.

[2] In Linux the type of msg_namelen member of struct msghdr is signed but native
msg_namelen has a unsigned type (socklen_t). So use the proper storage to fetch fromlen
from userspace and than check the user supplied value and return EINVAL if it is less
than 0 as a Linux do.

Reported by:	Thomas Mueller <tmueller at sysgo dot com> [1]
Reviewed by:	kib@
Approved by:	re (gjb, kib)
MFC after:	3 days
2016-06-26 16:59:59 +00:00
Konstantin Belousov
5c2cf81845 Update comments for the MD functions managing contexts for new
threads, to make it less confusing and using modern kernel terms.

Rename the functions to reflect current use of the functions, instead
of the historic KSE conventions:
  cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads)
  cpu_set_upcall -> cpu_copy_thread (for forks)
  cpu_set_upcall_kse -> cpu_set_upcall (for new threads creation)

Reviewed by:	jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (hrs)
Differential revision:	https://reviews.freebsd.org/D6731
2016-06-16 12:05:44 +00:00
Gleb Smirnoff
34e05ebe72 Fix kernel stack disclosures in the Linux and 4.3BSD compat layers.
Submitted by:	CTurt
Security:	SA-16:20
Security:	SA-16:21
2016-05-31 16:56:30 +00:00
Dmitry Chagin
ab610366b5 Don't leak fp in case where fo_ioctl() returns an error.
Reported by:	C Turt <ecturt@gmail.com>
MFC after:	1 week
2016-05-24 05:29:41 +00:00
Dmitry Chagin
df964aa45d Convert proto family in both directions. The linux and native values for
local and inet are identical, but for inet6 values differ.

PR:		155040
Reported by:	Simon Walton
MFC after:	2 week
2016-05-22 19:08:29 +00:00
Dmitry Chagin
d56e689e7d Add a missing errno translation for SO_ERROR optname.
PR:		135458
Reported by:	Stefan Schmidt @ stadtbuch.de
MFC after:	1 week
2016-05-22 12:49:08 +00:00
Dmitry Chagin
f8d72f5312 For future use move futex timeout code to the separate function and
switch to the high resolution sbintime_t.

MFC after:	1 week
2016-05-22 12:37:40 +00:00
Dmitry Chagin
a03566dd95 Due to lack the priority propagation feature replace sx by mutex. WIth this
commit NPTL tests are ends in 1 minute faster.

MFC after:	1 week
2016-05-22 12:35:50 +00:00
Dmitry Chagin
ea53658ed7 Add my copyright as I rewrote most of the futex code. Minor style(9) cleanup
while here.

MFC after:	1 week
2016-05-22 12:28:55 +00:00
Dmitry Chagin
56c4f83d2e Minor style(9) cleanup, no functional changes.
MFC after:	1 week
2016-05-22 12:26:03 +00:00
Konstantin Belousov
2a339d9e3d Add implementation of robust mutexes, hopefully close enough to the
intention of the POSIX IEEE Std 1003.1TM-2008/Cor 1-2013.

A robust mutex is guaranteed to be cleared by the system upon either
thread or process owner termination while the mutex is held.  The next
mutex locker is then notified about inconsistent mutex state and can
execute (or abandon) corrective actions.

The patch mostly consists of small changes here and there, adding
neccessary checks for the inconsistent and abandoned conditions into
existing paths.  Additionally, the thread exit handler was extended to
iterate over the userspace-maintained list of owned robust mutexes,
unlocking and marking as terminated each of them.

The list of owned robust mutexes cannot be maintained atomically
synchronous with the mutex lock state (it is possible in kernel, but
is too expensive).  Instead, for the duration of lock or unlock
operation, the current mutex is remembered in a special slot that is
also checked by the kernel at thread termination.

Kernel must be aware about the per-thread location of the heads of
robust mutex lists and the current active mutex slot.  When a thread
touches a robust mutex for the first time, a new umtx op syscall is
issued which informs about location of lists heads.

The umtx sleep queues for PP and PI mutexes are split between
non-robust and robust.

Somewhat unrelated changes in the patch:
1. Style.
2. The fix for proper tdfind() call use in umtxq_sleep_pi() for shared
   pi mutexes.
3. Removal of the userspace struct pthread_mutex m_owner field.
4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls
   the lifetime of the shared mutex associated with a vnode' page.

Reviewed by:	jilles (previous version, supposedly the objection was fixed)
Discussed with:	brooks, Martin Simmons <martin@lispworks.com> (some aspects)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2016-05-17 09:56:22 +00:00
Pedro F. Giffuni
1ce4275dd2 sys/compat/linux*: spelling fixes.
Mostly on comments but there are some user-visible messages as well.

MFC after: 2 weeks
2016-04-30 00:53:10 +00:00
Conrad Meyer
aa90aec270 osd(9): Change array pointer to array pointer type from void*
This is a minor follow-up to r297422, prompted by a Coverity warning.  (It's
not a real defect, just a code smell.)  OSD slot array reservations are an
array of pointers (void **) but were cast to void* and back unnecessarily.
Keep the correct type from reservation to use.

osd.9 is updated to match, along with a few trivial igor fixes.

Reported by:	Coverity
CID:		1353811
Sponsored by:	EMC / Isilon Storage Division
2016-04-26 19:57:35 +00:00
Jamie Gritton
d56cf22d22 linux_map_osrel doesn't need to be checked in linux_prison_set,
since it already was in linux_prison_check.
2016-04-25 06:08:45 +00:00
Pedro F. Giffuni
b66bb393f2 Cleanup redundant parenthesis from existing howmany()/roundup() macro uses. 2016-04-22 16:57:42 +00:00
Pedro F. Giffuni
02abd40029 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
Pedro F. Giffuni
500ed14d6e compat/linux: for pointers replace 0 with NULL.
plvc is a pointer, no functional change.

Found with devel/coccinelle.
2016-04-15 16:21:13 +00:00
Dmitry Chagin
5743aa47f5 More complete implementation of /proc/self/limits.
Fix the way the code accesses process limits struct - pointed out by mjg@.

PR:		207386
Reviewed by:	no objection form des@
MFC after:	3 weeks
2016-04-10 07:11:29 +00:00
Pedro F. Giffuni
ae26eab161 Fix indentation oops. 2016-04-03 14:40:54 +00:00
Dmitry Chagin
8bc21bafba Move Linux specific times tests up to guarantee the values are defined.
CID:		1305178
Submitted by:	pfg@
MFC after:	1 week
2016-04-03 06:33:16 +00:00
Jamie Gritton
7ab25e3d18 Use osd_reserve / osd_jail_set_reserved, which is known to succeed.
Also don't work around nonexistent osd_register failure.
2016-03-30 17:05:04 +00:00
Dmitry Chagin
7c5982000d Revert r297310 as the SOL_XXX are equal to the IPPROTO_XX except SOL_SOCKET.
Pointed out by:	ae@
2016-03-27 10:09:10 +00:00
Dmitry Chagin
c826fcfe22 iConvert Linux SOL_IPV6 level.
MFC after:	1 week
2016-03-27 08:12:01 +00:00
Dmitry Chagin
e667ee63f6 Whitespaces and style(9) fix. No functional changes.
MFC after:	1 week
2016-03-27 08:10:20 +00:00
Dmitry Chagin
09806d8e3e When write(2) on eventfd object fails with the error EAGAIN do not return
the number of bytes written.

MFC after:	1 week
2016-03-26 19:16:53 +00:00
Dmitry Chagin
2bb14e7541 Implement O_NONBLOCK flag via fcntl(F_SETFL) for eventfd object.
MFC after:	1 week
2016-03-26 19:15:23 +00:00
Dmitry Chagin
2ad0231309 Check bsd_to_linux_statfs() return value. Forgotten in r297070.
MFC after:	1 week
2016-03-20 19:06:21 +00:00
Dmitry Chagin
525c9796c3 Return EOVERFLOW in case when actual statfs values are large enough and
not fit into 32 bit fileds of a Linux struct statfs.

PR:		181012
MFC after:	1 week
2016-03-20 18:31:30 +00:00
Dmitry Chagin
7958a34cb5 Whitespaces, style(9) fixes. No functional changes.
MFC after:	1 week
2016-03-20 14:06:27 +00:00
Dmitry Chagin
99546279d6 Implement fstatfs64 system call.
PR:		181012
Submitted by:	John Wehle
MFC after:	1 week
2016-03-20 13:21:20 +00:00
Dmitry Chagin
4525bb829f Rework r296543:
1. Limit secs to INT32_MAX / 2 to avoid errors from kern_setitimer().
   Assert that kern_setitimer() returns 0.
   Remove bogus cast of secs.
   Fix style(9) issues.

2. Increment the return value if the remaining tv_usec value more than 500000 as a Linux does.

Pointed out by: [1] Bruce Evans

MFC after:	1 week
2016-03-20 11:40:52 +00:00
Andrey V. Elsukov
86a9058b01 Add support for IPPROTO_IPV6 socket layer for getsockopt/setsockopt calls.
Also add mapping for several options from RFC 3493 and 3542.

Reviewed by:	dchagin
Tested by:	Joe Love <joe at getsomwhere dot net>
MFC after:	2 weeks
2016-03-09 09:12:40 +00:00
Dmitry Chagin
a87488d1e4 Better english.
Submitted by:	Kevin P. Neal
MFC after:	1 week
2016-03-08 19:40:01 +00:00
Dmitry Chagin
649ca5e9dc Put a commit message from r296502 about Linux alarm() system call
behaviour to the source.

Suggested by:	emaste@

MFC after:	1 week
2016-03-08 19:20:57 +00:00
Dmitry Chagin
91f514e413 Does not leak fp. While here remove bogus cast of fp->f_data.
MFC after:	1 week
2016-03-08 15:55:43 +00:00
Dmitry Chagin
fc4b98fb88 Linux accept() system call return EOPNOTSUPP errno instead of EINVAL
for UDP sockets.

MFC after:	1 week
2016-03-08 15:15:34 +00:00
Dmitry Chagin
15c3b371e2 According to POSIX and Linux implementation the alarm() system call
is always successfull.
So, ignore any errors and return 0 as a Linux do.

XXX. Unlike POSIX, Linux in case when the invalid seconds value specified
always return 0, so in that case Linux does not return proper remining time.

MFC after:	1 week
2016-03-08 15:12:49 +00:00
Dmitry Chagin
9f4e66afb9 Link the newly created process to the corresponding parent as
if CLONE_PARENT is set, then the parent of the new process will
be the same as that of the calling process.

MFC after:	1 week
2016-03-08 15:08:22 +00:00
Svatopluk Kraus
35a0bc1260 As <machine/vmparam.h> is included from <vm/vm_param.h>, there is no
need to include it explicitly when <vm/vm_param.h> is already included.

Suggested by:	alc
Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D5379
2016-02-22 09:08:04 +00:00
Mateusz Guzik
33fd9b9a2b fork: pass arguments to fork1 in a dedicated structure
Suggested by:	kib
2016-02-04 04:22:18 +00:00
Dmitry Chagin
67968b35a0 Prevent double free of control in common sendmsg path as sosend
already freeing it.
2016-01-17 19:28:13 +00:00
Gleb Smirnoff
c8358c6e0d Call crextend() before copying old credentials to the new credentials
and replace crcopysafe by crcopy as crcopysafe is is not intended to be
safe in a threaded environment, it drops PROC_LOCK() in while() that
can lead to unexpected results, such as overwrite kernel memory.

In my POV crcopysafe() needs special attention. For now I do not see
any problems with this function, but who knows.

Submitted by:	dchagin
Found by:	trinity
Security:	SA-16:04.linux
2016-01-14 10:16:25 +00:00
Gleb Smirnoff
037f750877 Change linux get_robust_list system call to match actual linux one.
The set_robust_list system call request the kernel to record the head
of the list of robust futexes owned by the calling thread. The head
argument is the list head to record.
The get_robust_list system call should return the head of the robust
list of the thread whose thread id is specified in pid argument.
The list head should be stored in the location pointed to by head
argument.

In contrast, our implemenattion of get_robust_list system call copies
the known portion of memory pointed by recorded in set_robust_list
system call pointer to the head of the robust list to the location
pointed by head argument.

So, it is possible for a local attacker to read portions of kernel
memory, which may result in a privilege escalation.

Submitted by:	mjg
Security:	SA-16:03.linux
2016-01-14 10:13:58 +00:00
Dmitry Chagin
6437b8e7d9 Unlock process lock when return error from getrobustlist call and add
an forgotten dtrace probe when return the same error.

MFC after:	3 days
XMFC with:	r292743
2016-01-10 07:36:43 +00:00
Dmitry Chagin
bfb5568a3c Return EINVAL in case of incorrect sigev_signo value specified instead of panicing. 2015-12-26 09:09:49 +00:00
Dmitry Chagin
6e5549717a Do not allow access to emuldata for non Linux processes.
Pointed out by:	mjg@
Security:	https://admbugs.freebsd.org/show_bug.cgi?id=679
2015-12-26 09:04:47 +00:00
Mark Johnston
3616095801 Fix style issues around existing SDT probes.
- Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect
  at the moment, but will be needed for some future changes.
- Don't hardcode the module component of the probe identifier. This is
  set automatically by the SDT framework.

MFC after:	1 week
2015-12-16 23:39:27 +00:00
Konstantin Belousov
cd0a26c53f Fix build for the KTR-enabled kernels.
Sponsored by:	The FreeBSD Foundation
2015-10-23 11:41:55 +00:00
Bryan Drewery
a730673058 Remove redundant RFFPWAIT/vfork(2) handling in Linux fork(2) and clone(2) wrappers.
r161611 added some of the code from sys_vfork() directly into the Linux
module wrappers since they use RFSTOPPED.  In r232240, the RFFPWAIT handling
was moved to syscallret(), thus this code in the Linux module is no longer
needed as it will be called later.

This also allows the Linux wrappers to benefit from the fix in r275616 for
threads not getting suspended if their vforked child is stopped while they
wait on them.

Reviewed by:	jhb, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3828
2015-10-07 19:10:38 +00:00
Andriy Gapon
2f2f522b5d save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE
SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters
where n is typically smaller than 5.

Perhaps SDT_PROBE should be made a private implementation detail.

MFC after:	20 days
2015-09-28 12:14:16 +00:00
Edward Tomasz Napierala
089d32934a Fixes a panic triggered by threaded Linux applications when running
with RACCT/RCTL enabled.

Reviewed by:	ngie@, ed@
Tested by:	Larry Rosenman <ler@lerctr.org>
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3470
2015-09-02 14:04:13 +00:00
Ed Schouten
a2034cc98a Allow the creation of kqueues with a restricted set of Capsicum rights.
On CloudABI we want to create file descriptors with just the minimal set
of Capsicum rights in place. The reason for this is that it makes it
easier to obtain uniform behaviour across different operating systems.

By explicitly whitelisting the operations, we can return consistent
error codes, but also prevent applications from depending OS-specific
behaviour.

Extend kern_kqueue() to take an additional struct filecaps that is
passed on to falloc_caps(). Update the existing consumers to pass in
NULL.

Differential Revision:	https://reviews.freebsd.org/D3259
2015-08-05 07:36:50 +00:00
Ed Schouten
367a13f905 Limit rights on process descriptors.
On CloudABI, the rights bits returned by cap_rights_get() match up with
the operations that you can actually perform on the file descriptor.

Limiting the rights is good, because it makes it easier to get uniform
behaviour across different operating systems. If process descriptors on
FreeBSD would suddenly gain support for any new file operation, this
wouldn't become exposed to CloudABI processes without first extending
the rights.

Extend fork1() to gain a 'struct filecaps' argument that allows you to
construct process descriptors with custom rights. Use this in
cloudabi_sys_proc_fork() to limit the rights to just fstat() and
pdwait().

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-31 10:21:58 +00:00
Ed Schouten
8328babdd0 Make pipes in CloudABI work.
Summary:
Pipes in CloudABI are unidirectional. The reason for this is that
CloudABI attempts to provide a uniform runtime environment across
different flavours of UNIX.

Instead of implementing a custom pipe that is unidirectional, we can
simply reuse Capsicum permission bits to support this. This is nice,
because CloudABI already attempts to restrict permission bits to
correspond with the operations that apply to a certain file descriptor.

Replace kern_pipe() and kern_pipe2() by a single kern_pipe() that takes
a pair of filecaps. These filecaps are passed to the newly introduced
falloc_caps() function that creates the descriptors with rights in
place.

Test Plan:
CloudABI pipes seem to be created with proper rights in place:

https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/unistd/pipe_test.c#L44

Reviewers: jilles, mjg

Reviewed By: mjg

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3236
2015-07-29 17:18:27 +00:00
Konstantin Belousov
b4490c6e93 The si_status field of the siginfo_t, provided by the waitid(2) and
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).

Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig.  Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig.  p_xexit contains complete status
and copied out into si_status.

Requested by:	Joerg Schilling
Reviewed by:	jilles (previous version), pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2015-07-18 09:02:50 +00:00
Mateusz Guzik
f131759f54 fd: make 'rights' a manadatory argument to fget* functions 2015-07-05 19:05:16 +00:00
Dmitry Chagin
3c91646b46 Add EPOLLRDHUP support.
Tested by:	abi at abinet dot ru
2015-06-20 05:40:35 +00:00
Mateusz Guzik
4da8456f0a Replace struct filedesc argument in getvnode with struct thread
This is is a step towards removal of spurious arguments.
2015-06-16 13:09:18 +00:00
Mateusz Guzik
6871c7c3f1 linux: make sure to grab all cow structs when creating a thread
This is a fixup for r284214.

Reported and tested by: Ivan Klymenko <fidaj ukr.net>
2015-06-10 15:34:43 +00:00
Mateusz Guzik
f6f6d24062 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
Jung-uk Kim
1a01bdf906 Properly initialize flags for accept4(2) not to return spurious EINVAL.
Note this fixes a Linuxulator regression introduced in r283490.

PR:		200662
2015-06-08 20:03:15 +00:00
Dmitry Chagin
32ba368ba9 Finish r283544. In exec case properly detach threads from user space
before suicide.
2015-06-06 06:12:14 +00:00
Dmitry Chagin
d707582f83 When I merged the lemul branch I missied kib@'s r282708 commit.
This is not the final fix as I need properly cleanup thread resources
before other threads suicide.

Tested by:	Ruslan Makhmatkhanov
2015-05-25 20:44:46 +00:00
Dmitry Chagin
5c2748d5e7 Linux nanosleep() and clock_nanosleep() system calls always
writes the remaining time into the structure pointed to by rmtp
unless rmtp is NULL. The value of *rmtp can then be used to call
nanosleep() again and complete the specified pause if the previous
call was interrupted.

Note. clock_nanosleep() with an absolute time value does not write
the remaining time.

While here fix whitespaces and typo in SDT_PROBE.
2015-05-24 18:14:38 +00:00
Dmitry Chagin
bbf392d5ef Convert SCM_TIMESTAMP in recvmsg(). 2015-05-24 18:13:21 +00:00
Dmitry Chagin
5989b75bdb The latest cp tool is trying to use the btrfs clone operation that is
implemented via ioctl interface. First of all return ENOTSUP for this
operation as a cp fallback to usual method in that case. Secondly, do
not print out the message about unimplemented operation.
2015-05-24 18:12:04 +00:00
Dmitry Chagin
4f65e9cff4 Fix an mbuf(9) leak in sendmsg() under failure condition and
remove unneeded check for failed M_WAITOK allocation.

Found by: Brainy Code Scanner
Reported by: Maxime Villard
2015-05-24 18:10:07 +00:00
Dmitry Chagin
9802eb9ebc Implement Linux specific syncfs() system call. 2015-05-24 18:08:01 +00:00
Dmitry Chagin
d9cbe8f0ef Properly check tv_nsec value. The tv_nsec field can also be one
of the special value UTIME_NOW or UTIME_OMIT.
2015-05-24 18:06:46 +00:00
Dmitry Chagin
4cf10e2934 Since FreeBSD supports SOCK_CLOEXEC & SOCK_NONBLOCK options
remove its emulation via fcntl call from Linuxulator.
2015-05-24 18:06:12 +00:00
Dmitry Chagin
e1ff74c0f7 Implement recvmmsg() and sendmmsg() system calls. 2015-05-24 18:04:04 +00:00
Dmitry Chagin
b7aaa9fdb0 Reduce duplication between MD Linux code by moving msg related
struct definitions out into the compat/linux/linux_socket.h
2015-05-24 18:03:14 +00:00
Dmitry Chagin
6e4c8004dc Implement epoll_pwait() system call. 2015-05-24 18:00:14 +00:00
Dmitry Chagin
b7c4ebdb56 Convert signal number to native for VT_SETMODE ioctl and remove
strange and invalid ISSIGVALID macro.
The code has not been tested right way but it was originally broken.
2015-05-24 17:59:17 +00:00
Dmitry Chagin
19d8b461f4 Add utimensat() system call.
The patch developed by Jilles Tjoelker and Andrew Wilcox and
adopted for lemul branch by me.
2015-05-24 17:57:07 +00:00
Dmitry Chagin
5885e5ab29 Convert Linux signal number to the FreeBSD. 2015-05-24 17:49:09 +00:00
Dmitry Chagin
4ab7403bbd Rework signal code to allow using it by other modules, like linprocfs:
1. Linux sigset always 64 bit on all platforms. In order to move Linux
sigset code to the linux_common module define it as 64 bit int. Move
Linux sigset manipulation routines to the MI path.

2. Move Linux signal number definitions to the MI path. In general, they
are the same on all platforms except for a few signals.

3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion
tables to avoid conversion errors.

4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside
of allowed on Linux signal numbers.

PR:		197216
2015-05-24 17:47:20 +00:00
Dmitry Chagin
a7ac457613 According to Linux man sigaltstack(3) shall return EINVAL if the ss
argument is not a null pointer, and the ss_flags member pointed to by ss
contains flags other than SS_DISABLE. However, in fact, Linux also
allows SS_ONSTACK flag which is simply ignored.

For buggy apps (at least mono) ignore other than SS_DISABLE
flags as a Linux do.

While here move MI part of sigaltstack code to the appropriate place.

Reported by:	abi at abinet dot ru
2015-05-24 17:44:08 +00:00
Dmitry Chagin
76672e1113 Add EPOLLERR flag handling to epoll.
Tested by:	abi at abinet dot ru
2015-05-24 17:42:45 +00:00
Dmitry Chagin
e2ff4b9864 As fo_fill_kinfo() does not check fo_fill_kinfo to NULL
add a fo_fill_kinfo op to eventfdops.

Reported by:	trinity
2015-05-24 17:40:14 +00:00
Dmitry Chagin
b6aeb7d5dd Add preliminary fallocate system call implementation
to emulate posix_fallocate() function.

Differential Revision:	https://reviews.freebsd.org/D1523
Reviewed by:	emaste
2015-05-24 17:33:21 +00:00
Dmitry Chagin
16ac71bc4f Delete the duplicate of linux_to_native_clockid() function.
Differential Revision:	https://reviews.freebsd.org/D1521
Reviewed by:	trasz
2015-05-24 17:30:31 +00:00
Dmitry Chagin
680982281b Do not use struct l_timespec without conversion. While here move
args->timeout handling before acquiring the futex key at FUTEX_WAIT path.

Differential Revision:	https://reviews.freebsd.org/D1520
Reviewed by:	trasz
2015-05-24 17:29:18 +00:00
Dmitry Chagin
7e947ccc81 Add prototypes for static futex functions.
Differential Revision:	https://reviews.freebsd.org/D1519
Reviewed by:	trasz
2015-05-24 17:27:59 +00:00
Dmitry Chagin
2166e4e0a5 As for now our tmpfs is no longer being considered
"highly experimental" remove /dev/shm magic commited
in r218497 and convert tmpfs type to an expected magic number.

Differential Revision:	https://reviews.freebsd.org/D1497
Reviewed by:	emaste, trasz
2015-05-24 17:26:58 +00:00
Dmitry Chagin
5dd1d097f8 Print out unsupported futex operation message only once for the process.
Differential Revision:	https://reviews.freebsd.org/D1498
2015-05-24 17:25:57 +00:00
Dmitry Chagin
2711aba97e Add some clock mappings used in glibc 2.20.
Differential Revision:	https://reviews.freebsd.org/D1465
Reviewd by:	trasz
2015-05-24 17:23:08 +00:00
Dmitry Chagin
7d96520b25 Improve ktr(9) records in thread managment code.
Differential Revision:	https://reviews.freebsd.org/D1464
Reviewed by:	trasz
2015-05-24 17:09:07 +00:00
Dmitry Chagin
68cf0367e9 Use local struct proc * varable instead of dereferencing td->td_proc.
Differential Revision:	https://reviews.freebsd.org/D1463
Reviewed by:	emaste
2015-05-24 17:08:25 +00:00
Dmitry Chagin
97cfa5c899 Avoid unnecessary em zeroing in non-exec path
as it already zeroed by malloc with M_ZERO flag
and move zeroing to the proper place in exec path.

Differential Revision:	https://reviews.freebsd.org/D1462
Reviewed by:	trasz
2015-05-24 17:07:10 +00:00
Dmitry Chagin
e0327ddba0 Remove the unnecessary cast.
Differential Revision:	https://reviews.freebsd.org/D1461
Reviewed by:	emaste
2015-05-24 17:05:59 +00:00
Dmitry Chagin
a6b40812ec Implement ppoll() system call.
Differential Revision:	https://reviews.freebsd.org/D1105
Reviewed by:	trasz
2015-05-24 16:59:25 +00:00
Dmitry Chagin
3d7b4b3720 td_sigmask of a newly created thread copied from td.
Remove excess initialization of td_sigmask.

Differential Revision:	https://reviews.freebsd.org/D1128
Reviewed by:	emaste
2015-05-24 16:56:32 +00:00
Dmitry Chagin
2c4f134b25 Update Linux compat revision to 32.
Differential Revision:	https://reviews.freebsd.org/D1122
Reviewed by:	emaste
2015-05-24 16:55:32 +00:00
Dmitry Chagin
520e9c187d Fix linux_common module build with KTR option.
Differential Revision:	https://reviews.freebsd.org/D1096
Reviewed by:	trasz
2015-05-24 16:52:45 +00:00
Dmitry Chagin
a31d76867d Implement eventfd system call.
Differential Revision:	https://reviews.freebsd.org/D1094
In collaboration with:	Jilles Tjoelker
2015-05-24 16:49:14 +00:00
Dmitry Chagin
3e89b64168 Put the correct value for the abi_nfdbits parameter of kern_select() for
all supported Linuxulators.

Differential Revision:	https://reviews.freebsd.org/D1093
Reviewed by:	trasz
2015-05-24 16:47:13 +00:00
Dmitry Chagin
e16fe1c730 Implement epoll family system calls. This is a tiny wrapper
around kqueue() to implement epoll subset of functionality.
The kqueue user data are 32bit on i386 which is not enough for
epoll user data, so we keep user data in the proc emuldata.

Initial patch developed by rdivacky@ in 2007, then extended
by Yuri Victorovich @ r255672 and finished by me
in collaboration with mjg@ and jillies@.

Differential Revision:	https://reviews.freebsd.org/D1092
2015-05-24 16:41:39 +00:00
Dmitry Chagin
d2b6dbc06f Implement F_DUPFD_CLOEXEC fcntl flag.
Differential Revision:	https://reviews.freebsd.org/D1089
Reviewed by:	trasz
2015-05-24 16:34:57 +00:00
Dmitry Chagin
bfa4d74baf Add several fcntl flags.
Differential Revision:	https://reviews.freebsd.org/D1088
Reviewed by:	trasz
2015-05-24 16:32:52 +00:00
Dmitry Chagin
4d0f380d87 To avoid code duplication move open/fcntl definitions to the MI
header file.

Differential Revision:	https://reviews.freebsd.org/D1087
Reviewed by:	trasz
2015-05-24 16:31:44 +00:00
Dmitry Chagin
26c68e1fe5 Use the BSD_TO_LINUX_SIGNAL() wherever there is no need
to check the ABI as it is known.

Differential Revision:	https://reviews.freebsd.org/D1086
2015-05-24 16:30:23 +00:00
Dmitry Chagin
2245df381a Convert Linux wait options to the FreeBSD.
Check wait options as a Linux do.
Linux always set WEXITED option not a WUNTRACED|WNOHANG
which is a strange bug.

Differential Revision:	https://reviews.freebsd.org/D1085
Reviewed by:	trasz
2015-05-24 16:28:58 +00:00
Dmitry Chagin
7a7a6efc25 Set WIFCONTINUED to the wait status if needed.
Differential Revision:	https://reviews.freebsd.org/D1083
Reviewed by:	trasz
2015-05-24 16:27:38 +00:00
Dmitry Chagin
9599b0ec3a Rewrite linux_recvfrom. To avoid double conversion of sockaddr use
kern_recvit() directly.
And check fromlen parameter before sockaddr copyin and conversion.

Differential Revision:	https://reviews.freebsd.org/D1082
2015-05-24 16:26:55 +00:00
Dmitry Chagin
4048f59cd0 Add AT_RANDOM and AT_EXECFN auxiliary vector entries which are used by
glibc. At list since glibc version 2.16 using AT_RANDOM is mandatory.

Differential Revision:	https://reviews.freebsd.org/D1080
2015-05-24 16:24:24 +00:00
Dmitry Chagin
baa232bbfd Change linux faccessat syscall definition to match actual linux one.
The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented
within the glibc wrapper function for faccessat().  If either of these
flags are specified, then the wrapper function employs fstatat() to
determine access permissions.

Differential Revision:	https://reviews.freebsd.org/D1078
Reviewed by:	trasz
2015-05-24 16:18:03 +00:00
Dmitry Chagin
e0d3ea8c65 Where possible we will use M_LINUX malloc(9) type.
Move M_FUTEX defines to the linux_common.ko.

Differential Revision:	https://reviews.freebsd.org/D1077
Reviewed by:	emaste
2015-05-24 16:14:41 +00:00
Dmitry Chagin
0edc82b564 Move FEATURE macros for v4l and v4l2 to the common module.
Differential Revision:	https://reviews.freebsd.org/D1075
Reviewed by:	emaste
2015-05-24 16:00:01 +00:00
Dmitry Chagin
bc27367760 Refund the proc emuldata struct for future use. For now move flags from
thread emuldata to proc emuldata as it was originally intended.

As we can have both 64 & 32 bit Linuxulator running any eventhandler
can be called twice for us. To prevent this move eventhandlers code
from linux_emul.c to the linux_common.ko module.

Differential Revision:	https://reviews.freebsd.org/D1073
2015-05-24 15:54:58 +00:00
Dmitry Chagin
67d3974849 Introduce a new module linux_common.ko which is intended for the
following primary purposes:

1. Remove the dependency of linsysfs and linprocfs modules from linux.ko,
which will be architecture specific on amd64.

2. Incorporate into linux_common.ko general code for platforms on which
we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit).

3. Move malloc(9) declaration to linux_common.ko, to enable getting memory
usage statistics properly.

Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c
and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko.

Temporarily remove dtrace garbage from linux_mib.c and linux_util.c

Differential Revision:	https://reviews.freebsd.org/D1072
In collaboration with:	Vassilis Laganakos.

Reviewed by:	trasz
2015-05-24 15:51:18 +00:00
Dmitry Chagin
606bcc1741 Add newfstatat system call for 64-bit Linuxulator.
Differential Revision:	https://reviews.freebsd.org/D1071
Reviewed by:	trasz
2015-05-24 15:48:34 +00:00
Dmitry Chagin
4ca75bed31 Fix compilation with -DDEBUG option.
Differential Revision:	https://reviews.freebsd.org/D1070
Reviewed by:	trasz
2015-05-24 15:47:15 +00:00
Dmitry Chagin
36204c3016 Add 64 bit support to the vdso.
Differential Revision:	https://reviews.freebsd.org/D1069
Reviewed by:	trasz
2015-05-24 15:45:36 +00:00
Dmitry Chagin
31eb438886 x86_64 Linux do not use multiplexing on ipc system calls.
Move struct ipc_perm definition to the MD path as it differs for 64 and
32 bit platform.

Differential Revision:	https://reviews.freebsd.org/D1068
Reviewed by:	trasz
2015-05-24 15:44:41 +00:00
Dmitry Chagin
7f8f1d7f7a Disable i386 call for x86-64 Linux.
Differential Revision:	https://reviews.freebsd.org/D1067
Reviewed by:	trasz
2015-05-24 15:43:53 +00:00
Dmitry Chagin
a12b9b3d96 64-bit paltforms, like x86_64, do not use multiplexing on
socketcall system calls.

Differential Revision:	https://reviews.freebsd.org/D1065
Reviewed by:	trasz
2015-05-24 15:41:27 +00:00
Dmitry Chagin
297f61cc01 Get ready to commit x86_64 Linux emulation.
All fields of type l_int in struct statfs are defined
as l_long on i386 and amd64.

Differential Revision:	https://reviews.freebsd.org/D1064
Reviewed by:	trasz
2015-05-24 15:39:08 +00:00
Dmitry Chagin
0020bdf13a Put linux_platform into the vdso to avoid copying it onto the stack at
every exec.

Differential Revision:	https://reviews.freebsd.org/D1062
Reviewed by:	trasz
2015-05-24 15:30:52 +00:00
Dmitry Chagin
bdc379344a Implement vdso - virtual dynamic shared object. Through vdso Linux
exposes functions from kernel with proper DWARF CFI information so that
it becomes easier to unwind through them.
Using vdso is a mandatory for a thread cancelation && cleanup
on a modern glibc.

Differential Revision:	https://reviews.freebsd.org/D1060
2015-05-24 15:28:17 +00:00
Dmitry Chagin
ae50b4d7b5 Implement pselect6() system call.
Differential Revision:	https://reviews.freebsd.org/D1051
Reviewed by:	trasz
2015-05-24 15:21:25 +00:00
Dmitry Chagin
c3978c7bb1 Implement prlimit64() system call.
Differential Revision:	https://reviews.freebsd.org/D1050
Reviewed by:	emaste, trasz
2015-05-24 15:18:19 +00:00
Dmitry Chagin
254a937ee5 Implement dup3() system call.
Differential Revision:	https://reviews.freebsd.org/D1049
Reviewed by:	emaste
2015-05-24 15:14:51 +00:00
Dmitry Chagin
44e93b234f Sched_rr_get_interval returns EINVAL in case when the invalid pid
specified. This silence the ltp tests.

Differential Revision:	https://reviews.freebsd.org/D1048
Reviewed by:	trasz
2015-05-24 15:13:56 +00:00
Dmitry Chagin
7ac9766db4 Implement rt_sigqueueinfo() system call.
Differential Revision:	https://reviews.freebsd.org/D1047
Reviewed by:	trasz
2015-05-24 15:11:32 +00:00
Dmitry Chagin
e5fe4ccf59 Implement waitid() system call.
Differential Revision:	https://reviews.freebsd.org/D1046
2015-05-24 15:06:39 +00:00
Dmitry Chagin
001398c4c5 To reduce code duplication introduce linux_copyout_rusage() method.
Use it in linux_wait4() system call and move linux_wait4() to the MI path.
While here add a prototype for the static bsd_to_linux_rusage().

Differential Revision:	https://reviews.freebsd.org/D2138
Reviewed by:	trasz
2015-05-24 15:03:09 +00:00
Dmitry Chagin
a7ae3c557f Add a function for converting wait options.
Differential Revision:	https://reviews.freebsd.org/D1045
Reviewed by:	trasz
2015-05-24 15:00:27 +00:00
Dmitry Chagin
fe4ed1e768 Add a siginfo_t conversion function.
Differential Revision:	https://reviews.freebsd.org/D1044
Reviewed by:	emaste, trasz
2015-05-24 14:58:30 +00:00
Dmitry Chagin
86bda7a02d Remove a now unused define.
Differential Revision:	https://reviews.freebsd.org/D1043
Reviewed by:	trasz
2015-05-24 14:57:39 +00:00
Dmitry Chagin
a6326909bb Introduce LINUX_VERSION_STR, LINUX_VERSION_CODE macro for use instead
of harcoded pr_osrelease, pr_osrel values. This will be used later in
the VDSO.

Differential Revision:	https://reviews.freebsd.org/D1042
Reviewed by:	trasz
2015-05-24 14:56:21 +00:00
Dmitry Chagin
5e609834bd pthread_join() caller do futex_wait on child_clear_tid. As a results
of multiple simultaneous calls to pthread_join() specifying the same
target thread are undefined wake up the one thread.

Differential Revision:	https://reviews.freebsd.org/D1040
2015-05-24 14:54:12 +00:00
Dmitry Chagin
81338031c4 Switch linuxulator to use the native 1:1 threads.
The reasons:
1. Get rid of the stubs/quirks with process dethreading,
   process reparent when the process group leader exits and close
   to this problems on wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
   managment routines in Linuxulator.

Implementation details:

1. The thread is created via kern_thr_new() in the clone() call with
   the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has a threads is done via P_HADTHREADS
   bit p_flag of struct proc.
3. Per thread emulator state data structure is now located in the
   struct thread and freed in the thread_dtor() hook.
   Mandatory holdig of the p_mtx required when referencing emuldata
   from the other threads.
4. PID mangling has changed. Now Linux pid is the native tid
   and Linux tgid is the native pid, with the exception of the first
   thread in the process where tid and pid are one and the same.

Ugliness:

   In case when the Linux thread is the initial thread in the thread
   group thread id is equal to the process id. Glibc depends on this
   magic (assert in pthread_getattr_np.c). So for system calls that
   take thread id as a parameter we should use the special method
   to reference struct thread.

Differential Revision:	https://reviews.freebsd.org/D1039
2015-05-24 14:53:16 +00:00
Dmitry Chagin
2003907d45 Implement a Linux version of sched_getparam() && sched_setparam().
Temporarily use the first thread in proc.

Differential Revision:	https://reviews.freebsd.org/D1036
Reviewed by:	trasz
2015-05-24 14:45:57 +00:00
Dmitry Chagin
1aa90eca33 In preparation for switching linuxulator to the use the native 1:1
threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval().
Add a kern_sched_rr_get_interval() counterpart which takes a targettd
parameter to allow specify target thread directly by callee (new Linuxulator).

Linuxulator temporarily uses first thread in proc.

Move linux_sched_rr_get_interval() to the MI part.

Differential Revision:	https://reviews.freebsd.org/D1032
Reviewed by:	trasz
2015-05-24 14:39:26 +00:00
Dmitry Chagin
161acbb670 In preparation for switching linuxulator to the use the native 1:1
threads introduce linux_exit() stub instead of sys_exit() call
(which terminates process).
In the new linuxulator exit() system call terminates the calling
thread (not a whole process).

Differential Revision:	https://reviews.freebsd.org/D1027
Reviewed by:	trasz
2015-05-24 14:33:19 +00:00
Edward Tomasz Napierala
310e931198 Simplify linux_getcwd(), removing code that was longer used.
Differential Revision:	https://reviews.freebsd.org/D2326
Reviewed by:	dchagin@, kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-23 08:41:50 +00:00
Edward Tomasz Napierala
6289b482ec Modify kern___getcwd() to take max pathlen limit as an additional
argument.  This will be used for the Linux emulation layer - for Linux,
PATH_MAX is 4096 and not 1024.

Differential Revision:	https://reviews.freebsd.org/D2335
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-21 13:55:24 +00:00
Edward Tomasz Napierala
565716e60e Add back fdrop() missed in r281726.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-19 07:35:18 +00:00
Edward Tomasz Napierala
92f7441328 Optimize the O_NOCTTY handling hack in linux_common_open().
Differential Revision:	https://reviews.freebsd.org/D2323
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-19 07:12:16 +00:00
Edward Tomasz Napierala
94d014f079 Remove unused code from linux_mount(), and make it possible to mount
any kind of filesystem instead of harcoded three.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-18 09:49:09 +00:00