Commit graph

1010 commits

Author SHA1 Message Date
Dmitry Chagin
e16fe1c730 Implement epoll family system calls. This is a tiny wrapper
around kqueue() to implement epoll subset of functionality.
The kqueue user data are 32bit on i386 which is not enough for
epoll user data, so we keep user data in the proc emuldata.

Initial patch developed by rdivacky@ in 2007, then extended
by Yuri Victorovich @ r255672 and finished by me
in collaboration with mjg@ and jillies@.

Differential Revision:	https://reviews.freebsd.org/D1092
2015-05-24 16:41:39 +00:00
Dmitry Chagin
d2b6dbc06f Implement F_DUPFD_CLOEXEC fcntl flag.
Differential Revision:	https://reviews.freebsd.org/D1089
Reviewed by:	trasz
2015-05-24 16:34:57 +00:00
Dmitry Chagin
bfa4d74baf Add several fcntl flags.
Differential Revision:	https://reviews.freebsd.org/D1088
Reviewed by:	trasz
2015-05-24 16:32:52 +00:00
Dmitry Chagin
4d0f380d87 To avoid code duplication move open/fcntl definitions to the MI
header file.

Differential Revision:	https://reviews.freebsd.org/D1087
Reviewed by:	trasz
2015-05-24 16:31:44 +00:00
Dmitry Chagin
26c68e1fe5 Use the BSD_TO_LINUX_SIGNAL() wherever there is no need
to check the ABI as it is known.

Differential Revision:	https://reviews.freebsd.org/D1086
2015-05-24 16:30:23 +00:00
Dmitry Chagin
2245df381a Convert Linux wait options to the FreeBSD.
Check wait options as a Linux do.
Linux always set WEXITED option not a WUNTRACED|WNOHANG
which is a strange bug.

Differential Revision:	https://reviews.freebsd.org/D1085
Reviewed by:	trasz
2015-05-24 16:28:58 +00:00
Dmitry Chagin
7a7a6efc25 Set WIFCONTINUED to the wait status if needed.
Differential Revision:	https://reviews.freebsd.org/D1083
Reviewed by:	trasz
2015-05-24 16:27:38 +00:00
Dmitry Chagin
9599b0ec3a Rewrite linux_recvfrom. To avoid double conversion of sockaddr use
kern_recvit() directly.
And check fromlen parameter before sockaddr copyin and conversion.

Differential Revision:	https://reviews.freebsd.org/D1082
2015-05-24 16:26:55 +00:00
Dmitry Chagin
4048f59cd0 Add AT_RANDOM and AT_EXECFN auxiliary vector entries which are used by
glibc. At list since glibc version 2.16 using AT_RANDOM is mandatory.

Differential Revision:	https://reviews.freebsd.org/D1080
2015-05-24 16:24:24 +00:00
Dmitry Chagin
baa232bbfd Change linux faccessat syscall definition to match actual linux one.
The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented
within the glibc wrapper function for faccessat().  If either of these
flags are specified, then the wrapper function employs fstatat() to
determine access permissions.

Differential Revision:	https://reviews.freebsd.org/D1078
Reviewed by:	trasz
2015-05-24 16:18:03 +00:00
Dmitry Chagin
e0d3ea8c65 Where possible we will use M_LINUX malloc(9) type.
Move M_FUTEX defines to the linux_common.ko.

Differential Revision:	https://reviews.freebsd.org/D1077
Reviewed by:	emaste
2015-05-24 16:14:41 +00:00
Dmitry Chagin
0edc82b564 Move FEATURE macros for v4l and v4l2 to the common module.
Differential Revision:	https://reviews.freebsd.org/D1075
Reviewed by:	emaste
2015-05-24 16:00:01 +00:00
Dmitry Chagin
bc27367760 Refund the proc emuldata struct for future use. For now move flags from
thread emuldata to proc emuldata as it was originally intended.

As we can have both 64 & 32 bit Linuxulator running any eventhandler
can be called twice for us. To prevent this move eventhandlers code
from linux_emul.c to the linux_common.ko module.

Differential Revision:	https://reviews.freebsd.org/D1073
2015-05-24 15:54:58 +00:00
Dmitry Chagin
67d3974849 Introduce a new module linux_common.ko which is intended for the
following primary purposes:

1. Remove the dependency of linsysfs and linprocfs modules from linux.ko,
which will be architecture specific on amd64.

2. Incorporate into linux_common.ko general code for platforms on which
we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit).

3. Move malloc(9) declaration to linux_common.ko, to enable getting memory
usage statistics properly.

Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c
and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko.

Temporarily remove dtrace garbage from linux_mib.c and linux_util.c

Differential Revision:	https://reviews.freebsd.org/D1072
In collaboration with:	Vassilis Laganakos.

Reviewed by:	trasz
2015-05-24 15:51:18 +00:00
Dmitry Chagin
606bcc1741 Add newfstatat system call for 64-bit Linuxulator.
Differential Revision:	https://reviews.freebsd.org/D1071
Reviewed by:	trasz
2015-05-24 15:48:34 +00:00
Dmitry Chagin
4ca75bed31 Fix compilation with -DDEBUG option.
Differential Revision:	https://reviews.freebsd.org/D1070
Reviewed by:	trasz
2015-05-24 15:47:15 +00:00
Dmitry Chagin
36204c3016 Add 64 bit support to the vdso.
Differential Revision:	https://reviews.freebsd.org/D1069
Reviewed by:	trasz
2015-05-24 15:45:36 +00:00
Dmitry Chagin
31eb438886 x86_64 Linux do not use multiplexing on ipc system calls.
Move struct ipc_perm definition to the MD path as it differs for 64 and
32 bit platform.

Differential Revision:	https://reviews.freebsd.org/D1068
Reviewed by:	trasz
2015-05-24 15:44:41 +00:00
Dmitry Chagin
7f8f1d7f7a Disable i386 call for x86-64 Linux.
Differential Revision:	https://reviews.freebsd.org/D1067
Reviewed by:	trasz
2015-05-24 15:43:53 +00:00
Dmitry Chagin
a12b9b3d96 64-bit paltforms, like x86_64, do not use multiplexing on
socketcall system calls.

Differential Revision:	https://reviews.freebsd.org/D1065
Reviewed by:	trasz
2015-05-24 15:41:27 +00:00
Dmitry Chagin
297f61cc01 Get ready to commit x86_64 Linux emulation.
All fields of type l_int in struct statfs are defined
as l_long on i386 and amd64.

Differential Revision:	https://reviews.freebsd.org/D1064
Reviewed by:	trasz
2015-05-24 15:39:08 +00:00
Dmitry Chagin
0020bdf13a Put linux_platform into the vdso to avoid copying it onto the stack at
every exec.

Differential Revision:	https://reviews.freebsd.org/D1062
Reviewed by:	trasz
2015-05-24 15:30:52 +00:00
Dmitry Chagin
bdc379344a Implement vdso - virtual dynamic shared object. Through vdso Linux
exposes functions from kernel with proper DWARF CFI information so that
it becomes easier to unwind through them.
Using vdso is a mandatory for a thread cancelation && cleanup
on a modern glibc.

Differential Revision:	https://reviews.freebsd.org/D1060
2015-05-24 15:28:17 +00:00
Dmitry Chagin
ae50b4d7b5 Implement pselect6() system call.
Differential Revision:	https://reviews.freebsd.org/D1051
Reviewed by:	trasz
2015-05-24 15:21:25 +00:00
Dmitry Chagin
c3978c7bb1 Implement prlimit64() system call.
Differential Revision:	https://reviews.freebsd.org/D1050
Reviewed by:	emaste, trasz
2015-05-24 15:18:19 +00:00
Dmitry Chagin
254a937ee5 Implement dup3() system call.
Differential Revision:	https://reviews.freebsd.org/D1049
Reviewed by:	emaste
2015-05-24 15:14:51 +00:00
Dmitry Chagin
44e93b234f Sched_rr_get_interval returns EINVAL in case when the invalid pid
specified. This silence the ltp tests.

Differential Revision:	https://reviews.freebsd.org/D1048
Reviewed by:	trasz
2015-05-24 15:13:56 +00:00
Dmitry Chagin
7ac9766db4 Implement rt_sigqueueinfo() system call.
Differential Revision:	https://reviews.freebsd.org/D1047
Reviewed by:	trasz
2015-05-24 15:11:32 +00:00
Dmitry Chagin
e5fe4ccf59 Implement waitid() system call.
Differential Revision:	https://reviews.freebsd.org/D1046
2015-05-24 15:06:39 +00:00
Dmitry Chagin
001398c4c5 To reduce code duplication introduce linux_copyout_rusage() method.
Use it in linux_wait4() system call and move linux_wait4() to the MI path.
While here add a prototype for the static bsd_to_linux_rusage().

Differential Revision:	https://reviews.freebsd.org/D2138
Reviewed by:	trasz
2015-05-24 15:03:09 +00:00
Dmitry Chagin
a7ae3c557f Add a function for converting wait options.
Differential Revision:	https://reviews.freebsd.org/D1045
Reviewed by:	trasz
2015-05-24 15:00:27 +00:00
Dmitry Chagin
fe4ed1e768 Add a siginfo_t conversion function.
Differential Revision:	https://reviews.freebsd.org/D1044
Reviewed by:	emaste, trasz
2015-05-24 14:58:30 +00:00
Dmitry Chagin
86bda7a02d Remove a now unused define.
Differential Revision:	https://reviews.freebsd.org/D1043
Reviewed by:	trasz
2015-05-24 14:57:39 +00:00
Dmitry Chagin
a6326909bb Introduce LINUX_VERSION_STR, LINUX_VERSION_CODE macro for use instead
of harcoded pr_osrelease, pr_osrel values. This will be used later in
the VDSO.

Differential Revision:	https://reviews.freebsd.org/D1042
Reviewed by:	trasz
2015-05-24 14:56:21 +00:00
Dmitry Chagin
5e609834bd pthread_join() caller do futex_wait on child_clear_tid. As a results
of multiple simultaneous calls to pthread_join() specifying the same
target thread are undefined wake up the one thread.

Differential Revision:	https://reviews.freebsd.org/D1040
2015-05-24 14:54:12 +00:00
Dmitry Chagin
81338031c4 Switch linuxulator to use the native 1:1 threads.
The reasons:
1. Get rid of the stubs/quirks with process dethreading,
   process reparent when the process group leader exits and close
   to this problems on wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
   managment routines in Linuxulator.

Implementation details:

1. The thread is created via kern_thr_new() in the clone() call with
   the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has a threads is done via P_HADTHREADS
   bit p_flag of struct proc.
3. Per thread emulator state data structure is now located in the
   struct thread and freed in the thread_dtor() hook.
   Mandatory holdig of the p_mtx required when referencing emuldata
   from the other threads.
4. PID mangling has changed. Now Linux pid is the native tid
   and Linux tgid is the native pid, with the exception of the first
   thread in the process where tid and pid are one and the same.

Ugliness:

   In case when the Linux thread is the initial thread in the thread
   group thread id is equal to the process id. Glibc depends on this
   magic (assert in pthread_getattr_np.c). So for system calls that
   take thread id as a parameter we should use the special method
   to reference struct thread.

Differential Revision:	https://reviews.freebsd.org/D1039
2015-05-24 14:53:16 +00:00
Dmitry Chagin
2003907d45 Implement a Linux version of sched_getparam() && sched_setparam().
Temporarily use the first thread in proc.

Differential Revision:	https://reviews.freebsd.org/D1036
Reviewed by:	trasz
2015-05-24 14:45:57 +00:00
Dmitry Chagin
1aa90eca33 In preparation for switching linuxulator to the use the native 1:1
threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval().
Add a kern_sched_rr_get_interval() counterpart which takes a targettd
parameter to allow specify target thread directly by callee (new Linuxulator).

Linuxulator temporarily uses first thread in proc.

Move linux_sched_rr_get_interval() to the MI part.

Differential Revision:	https://reviews.freebsd.org/D1032
Reviewed by:	trasz
2015-05-24 14:39:26 +00:00
Dmitry Chagin
161acbb670 In preparation for switching linuxulator to the use the native 1:1
threads introduce linux_exit() stub instead of sys_exit() call
(which terminates process).
In the new linuxulator exit() system call terminates the calling
thread (not a whole process).

Differential Revision:	https://reviews.freebsd.org/D1027
Reviewed by:	trasz
2015-05-24 14:33:19 +00:00
Edward Tomasz Napierala
310e931198 Simplify linux_getcwd(), removing code that was longer used.
Differential Revision:	https://reviews.freebsd.org/D2326
Reviewed by:	dchagin@, kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-23 08:41:50 +00:00
Edward Tomasz Napierala
6289b482ec Modify kern___getcwd() to take max pathlen limit as an additional
argument.  This will be used for the Linux emulation layer - for Linux,
PATH_MAX is 4096 and not 1024.

Differential Revision:	https://reviews.freebsd.org/D2335
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-21 13:55:24 +00:00
Edward Tomasz Napierala
565716e60e Add back fdrop() missed in r281726.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-19 07:35:18 +00:00
Edward Tomasz Napierala
92f7441328 Optimize the O_NOCTTY handling hack in linux_common_open().
Differential Revision:	https://reviews.freebsd.org/D2323
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-19 07:12:16 +00:00
Edward Tomasz Napierala
94d014f079 Remove unused code from linux_mount(), and make it possible to mount
any kind of filesystem instead of harcoded three.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-18 09:49:09 +00:00
Mateusz Guzik
daf63fd2f9 cred: add proc_set_cred helper
The goal here is to provide one place altering process credentials.

This eases debugging and opens up posibilities to do additional work when such
an action is performed.
2015-03-16 00:10:03 +00:00
Dmitry Chagin
9f7a06f27e Indeed, instead of hiding the kern___getcwd() bug by bogus cast
in r276564, change path type to char * (pathnames are always char *).
And remove bogus casts of malloc().
kern___getcwd() internally doesn't actually use or support u_char *
paths, except to copy them to a normal char * path.

These changes are not visible to libc as libc/gen/getcwd.c misdeclares
__getcwd() as taking a plain char * path.

While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as
we always have sysproto.h.

Pointed out by:	bde

MFC after:	1 week
2015-01-04 10:34:02 +00:00
Dmitry Chagin
9fa04b52ec Cast *path to silence clang -Wpointer-sign warning.
MFC after:	1 week
2015-01-02 19:29:32 +00:00
Dmitry Chagin
de90b09a79 Remove Giant from linux_getcwd() due to VFS is MPSAFE now.
Discussed with:	kib
MFC after:	1 week
2015-01-02 18:36:08 +00:00
Dmitry Chagin
857ad5a31b Fix Clang -Wpointer-sign warnings.
MFC after:	1 week
2015-01-01 20:53:38 +00:00
Dmitry Chagin
5072ad67ae Fix Clang warning: passing 'unsigned int *' to parameter of type 'int *' converts between pointers to integer types with different sign.
MFC after:	1 week
2015-01-01 19:57:24 +00:00
Konstantin Belousov
5c7bebf961 The process spin lock currently has the following distinct uses:
- Threads lifetime cycle, in particular, counting of the threads in
  the process, and interlocking with process mutex and thread lock.
  The main reason of this is that turnstile locks are after thread
  locks, so you e.g. cannot unlock blockable mutex (think process
  mutex) while owning thread lock.

- Virtual and profiling itimers, since the timers activation is done
  from the clock interrupt context.  Replace the p_slock by p_itimmtx
  and PROC_ITIMLOCK().

- Profiling code (profil(2)), for similar reason.  Replace the p_slock
  by p_profmtx and PROC_PROFLOCK().

- Resource usage accounting.  Need for the spinlock there is subtle,
  my understanding is that spinlock blocks context switching for the
  current thread, which prevents td_runtime and similar fields from
  changing (updates are done at the mi_switch()).  Replace the p_slock
  by p_statmtx and PROC_STATLOCK().

The split is done mostly for code clarity, and should not affect
scalability.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-26 14:10:00 +00:00
Konstantin Belousov
6e646651d3 Remove the no-at variants of the kern_xx() syscall helpers. E.g., we
have both kern_open() and kern_openat(); change the callers to use
kern_openat().

This removes one (sometimes two) levels of indirection and
consolidates arguments checks.

Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-13 18:01:51 +00:00
Alexander Motin
6a9bcacfcf Remake Linux' SOUND_MIXER_INFO IOCTL as a wrapper around new FreeBSD's one.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	3 days
2014-09-24 08:18:11 +00:00
Sean Bruno
d143d69857 Bump minimum linux compat version to support Centos6 ports updates for linux.
Update linux compat minimum revision to match linux-c6 now in ports.  This
is a candidate for 10.1 R as it matches the current state of supported
linux compat packages in the ports tree.

PR:		187786
Reviewed by:	xmj
MFC after:	2 days
Relnotes:	yes
2014-09-22 17:26:07 +00:00
Bjoern A. Zeeb
0a041f3b47 Implement most of timer_{create,settime,gettime,getoverrun,delete}
for amd64/linux32.  Fix the entirely bogus (untested) version from
r161310 for i386/linux using the same shared code in compat/linux.

It is unclear to me if we could support more clock mappings but
the current set allows me to successfully run commercial
32bit linux software under linuxolator on amd64.

Reviewed by:		jhb
Differential Revision:	D784
MFC after:		3 days
Sponsored by:		DARPA, AFRL
2014-09-18 08:36:45 +00:00
Alexander Motin
94fe9f959c - Add support for SG_GET_SG_TABLESIZE IOCTL to report that we don't support
scatter/gather lists.
- Return error for still unsupported SG 3.x API read/write calls.

MFC after:	1 month
2014-06-04 12:05:47 +00:00
Alexander Motin
fcaf473cfc Overhaul CAM SG driver IOCTL interfaces.
Make it really work for native FreeBSD programs.  Before this it was broken
for years due to different number of pointer dereferences in Linux and
FreeBSD IOCTL paths, permanently returning errors to FreeBSD programs.
This change breaks the driver FreeBSD IOCTL ABI, making it more strict,
but since it was not working any way -- who bother.

Add shims for 32-bit programs on 64-bit host, translating the argument
of the SG_IO IOCTL for both FreeBSD and Linux ABIs.

With this change I was able to run 32-bit Linux sg3_utils tools and simple
32 and 64-bit FreeBSD test tools on both 32 and 64-bit FreeBSD systems.

MFC after:	1 month
2014-06-02 19:53:53 +00:00
Dmitry Chagin
fb6bf8bba9 Glibc was switched to the FUTEX_WAIT_BITSET op and CLOCK_REALTIME
flag has been added instead of FUTEX_WAIT to replace the FUTEX_WAIT
logic which needs to do gettimeofday() calls before the futex syscall
to convert the absolute timeout to a relative timeout.
Before this the CLOCK_MONOTONIC used by the FUTEX_WAIT_BITSET op.

When the FUTEX_CLOCK_REALTIME is specified the timeout is an absolute
time, not a relative time. Rework futex_wait to handle this.
On the side fix the futex leak in error case and remove useless
parentheses.

Properly calculate the timeout for the CLOCK_MONOTONIC case.

MFC after:	3 days
2014-05-31 14:58:53 +00:00
Dmitry Chagin
32fd44657c In r218101 I have not changed properly the futex syscall definition.
Some Linux futex ops atomically verifies that the futex address uaddr
(uval) contains the value val. Comparing signed uval and unsigned val
may lead to an unexpected result, mostly to a deadlock.

So copyin uaddr to an unsigned int to compare the parameters correctly.

While here change ktr records to print parameters in more readable format.

Tested by	eadler@

MFC after:	3 days
2014-05-28 05:57:35 +00:00
Bryan Drewery
44f1c91610 Rename global cnt to vm_cnt to avoid shadowing.
To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from:	arch@
Sponsored by:	EMC / Isilon Storage Division
2014-03-22 10:26:09 +00:00
Robert Watson
4a14441044 Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
Andriy Gapon
d9fae5ab88 dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE
In its stead use the Solaris / illumos approach of emulating '-' (dash)
in probe names with '__' (two consecutive underscores).

Reviewed by:	markj
MFC after:	3 weeks
2013-11-26 08:46:27 +00:00
Attilio Rao
54366c0bd7 - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
Gleb Smirnoff
af50ea380f Axe IFF_SMART. Fortunately this layering violating flag was never used,
it was just declared.
2013-11-05 12:52:56 +00:00
Gleb Smirnoff
eedc7fd9e8 Provide includes that are needed in these files, and before were read
in implicitly via if.h -> if_var.h pollution.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 18:18:50 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Mark Johnston
92c6196caa Fix some typos that were causing probe argument types to show up as unknown.
Reviewed by:	rwatson (mac provider)
Approved by:	re (glebius)
MFC after:	1 week
2013-10-01 15:40:27 +00:00
Roman Divacky
b12698e1a1 Revert r255672, it has some serious flaws, leaking file references etc.
Approved by:	re (delphij)
2013-09-18 18:48:33 +00:00
Roman Divacky
253c75c0de Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue
to implement epoll subset of functionality. The kqueue user data are 32bit
on i386 which is not enough for epoll user data so this patch overrides
kqueue fileops to maintain enough space in struct file.

Initial patch developed by me in 2007 and then extended and finished
by Yuri Victorovich.

Approved by:    re (delphij)
Sponsored by:   Google Summer of Code
Submitted by:   Yuri Victorovich <yuri at rawbw dot com>
Tested by:      Yuri Victorovich <yuri at rawbw dot com>
2013-09-18 17:56:04 +00:00
John Baldwin
edb572a38c Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space.  This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address.  While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by:	alc
Approved by:	re (kib)
2013-09-09 18:11:59 +00:00
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
Mark Johnston
1570438586 Remove a couple of unused macros.
MFC after:	3 days
2013-08-17 21:53:37 +00:00
Jeff Roberson
5df87b21d3 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
Hans Petter Selasky
a40a377cc7 Add some missing LIBUSB IOCTL conversion codes. 2013-07-14 10:13:01 +00:00
Alexander Leidinger
b85e1f7d05 - Move videodev headers from compat/linux to contrib/v4l (cp from vendor and
apply diff to compat/linux versions).
- The cp implies an update of videodev2.h to the linux kernel 2.6.34.14 one.

The update makes video in skype v4 work on FreeBSD.

Tested by:	Artyom Mirgorodskiy <artyom.mirgorodsky@gmail.com>
		(update of header only)
2013-07-06 19:59:06 +00:00
Jilles Tjoelker
d289dc7b73 Rename do_pipe() to kern_pipe2() and declare it properly. 2013-03-31 17:42:54 +00:00
Eitan Adler
1eb9ea583b Remove check for NULL prior to free(9) and m_freem(9).
Approved by:	cperciva (mentor)
2013-03-04 02:21:34 +00:00
Pawel Jakub Dawidek
2609222ab4 Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor
  has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrive
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNODE to CAP_MKNODEAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convinient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
John Baldwin
d825ce0a5d Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h
by moving bits that are MI out into headers in compat/linux.

Reviewed by:	Chagin Dmitry  dmitry | gmail
MFC after:	2 weeks
2013-01-29 18:41:30 +00:00
Dmitry Chagin
4d04cf1d9e Arithmetic on pointers takes into account the size of the type. Properly cast the pointer to avoid incorrect pointer scaling.
MFC after:	1 Week
2013-01-25 14:40:54 +00:00
John Baldwin
fb709557a3 Don't assume that all Linux TCP-level socket options are identical to
FreeBSD TCP-level socket options (only the first two are).  Instead,
using a mapping function and fail unsupported options as we do for other
socket option levels.

MFC after:	2 weeks
2013-01-23 21:44:48 +00:00
Gleb Smirnoff
eb1b1807af Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
Colin Percival
43f13bea35 MFS security patches which seem to have accidentally not reached HEAD:
Fix insufficient message length validation for EAP-TLS messages.

Fix Linux compatibility layer input validation error.

Security:	FreeBSD-SA-12:07.hostapd
Security:	FreeBSD-SA-12:08.linux
Security:	CVE-2012-4445, CVE-2012-4576
With hat:	so@
2012-11-23 01:48:31 +00:00
Konstantin Belousov
140dedb81c The r241025 fixed the case when a binary, executed from nullfs mount,
was still possible to open for write from the lower filesystem.  There
is a symmetric situation where the binary could already has file
descriptors opened for write, but it can be executed from the nullfs
overlay.

Handle the issue by passing one v_writecount reference to the lower
vnode if nullfs vnode has non-zero v_writecount.  Note that only one
write reference can be donated, since nullfs only keeps one use
reference on the lower vnode.  Always use the lower vnode v_writecount
for the checks.

Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is
currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT
to manipulate the v_writecount value, which manages a single bypass
reference to the lower vnode.  Caling the VOPs instead of directly
accessing v_writecount provide the fix described in the previous
paragraph.

Tested by:	pho
MFC after:	3 weeks
2012-11-02 13:56:36 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Konstantin Belousov
877d24ac8a Fix the mis-handling of the VV_TEXT on the nullfs vnodes.
If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.

Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.

Tested by:	pho (previous version)
MFC after:	2 weeks
2012-09-28 11:25:02 +00:00
Kevin Lo
457a9cfbc1 Remove redundant check 2012-09-12 10:12:03 +00:00
Konstantin Belousov
c5c1199c83 Extend the KPI to lock and unlock f_offset member of struct file. It
now fully encapsulates all accesses to f_offset, and extends f_offset
locking to other consumers that need it, in particular, to lseek() and
variants of getdirentries().

Ensure that on 32bit architectures f_offset, which is 64bit quantity,
always read and written under the mtxpool protection. This fixes
apparently easy to trigger race when parallel lseek()s or lseek() and
read/write could destroy file offset.

The already broken ABI emulations, including iBCS and SysV, are not
converted (yet).

Tested by:	pho
No objections from:	jhb
MFC after:    3 weeks
2012-07-02 21:01:03 +00:00
Alexander Leidinger
19e252baeb - >500 static DTrace probes for the linuxulator
- DTrace scripts to check for errors, performance, ...
  they serve mostly as examples of what you can do with the static probe;s
  with moderate load the scripts may be overwhelmed, excessive lock-tracing
  may influence program behavior (see the last design decission)

Design decissions:
 - use "linuxulator" as the provider for the native bitsize; add the
   bitsize for the non-native emulation (e.g. "linuxuator32" on amd64)
 - Add probes only for locks which are acquired in one function and released
   in another function. Locks which are aquired and released in the same
   function should be easy to pair in the code, inter-function
   locking is more easy to verify in DTrace.
 - Probes for locks should be fired after locking and before releasing to
   prevent races (to provide data/function stability in DTrace, see the
   man-page of "dtrace -v ..." and the corresponding DTrace docs).
2012-05-05 19:42:38 +00:00
Jung-uk Kim
d69a426fce - Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27
but GNU libc used it without checking its kernel version, e. g., Fedora 10.
- Move pipe(2) implementation for Linuxulator from MD files to MI file,
sys/compat/linux/linux_file.c.  There is no MD code for this syscall at all.
- Correct an argument type for pipe() from l_ulong * to l_int *.  Probably
this was the source of MI/MD confusion.

Reviewed by:	emulation
2012-04-16 21:22:02 +00:00
Konstantin Belousov
3494f31ad2 Fix misuse of the kernel map in miscellaneous image activators.
Vnode-backed mappings cannot be put into the kernel map, since it is a
system map.

Use exec_map for transient mappings, and remove the mappings with
kmem_free_wakeup() to notify the waiters on available map space.

Do not map the whole executable into KVA at all to copy it out into
usermode.  Directly use vn_rdwr() for the case of not page aligned
binary.

There is one place left where the potentially unbounded amount of data
is mapped into exec_map, namely, in the COFF image activator
enumeration of the needed shared libraries.

Reviewed by:   alc
MFC after:     2 weeks
2012-02-17 23:47:16 +00:00
Ed Schouten
7870adb640 Remove direct access to si_name.
Code should just use the devtoname() function to obtain the name of a
character device. Also add const keywords to pieces of code that need it
to build properly.

MFC after:	2 weeks
2012-02-10 12:35:57 +00:00
Ulrich Spörlein
9a14aa017b Convert files to UTF-8 2012-01-15 13:23:18 +00:00
Dimitry Andric
69ee3e2f52 In sys/compat/linux/linux_ioctl.c, work around a warning when a pointer
is compared to an integer, by casting the pointer to l_uintptr_t.  No
functional difference on both i386 and amd64.

Reviewed by:	ed, jhb
MFC after:	1 week
2012-01-03 18:49:39 +00:00
John Baldwin
dd01579cde Implement linux_fadvise64() and linux_fadvise64_64() using
kern_posix_fadvise().

Reviewed by:	silence on emulation@
MFC after:	2 weeks
2011-12-29 15:34:59 +00:00
Ed Schouten
767a32641c Make the Linux *at() calls a bit more complete.
Properly support:

- AT_EACCESS for faccessat(),
- AT_SYMLINK_FOLLOW for linkat().
2011-11-19 07:19:37 +00:00
Ed Schouten
d3a993d46b Improve *access*() parameter name consistency.
The current code mixes the use of `flags' and `mode'. This is a bit
confusing, since the faccessat() function as a `flag' parameter to store
the AT_ flag.

Make this less confusing by using the same name as used in the POSIX
specification -- `amode'.
2011-11-19 06:35:15 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Ed Schouten
d745c852be Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
Christian Brueffer
796fa5e465 Add curly braces missed in r226247.
Pointy hat to:	brueffer
Submitted by:	many
MFC after:	1 week
2011-10-11 13:40:37 +00:00
Christian Brueffer
05ad7ad667 Properly free linux_gidset in case of an error.
CID:		4136
Found with:	Coverity Prevent(tm)
MFC after:	1 week
2011-10-11 10:32:23 +00:00
Jung-uk Kim
bf3a36cc7b Use the caculated length instead of maximum length. 2011-10-06 21:55:05 +00:00
Jung-uk Kim
b6f96462ba Remove a now-defunct variable. 2011-10-06 21:40:08 +00:00
Jung-uk Kim
3106d6704b Use uint32_t instead of u_int32_t. Fix style(9) nits. 2011-10-06 21:17:46 +00:00
Jung-uk Kim
c02637c717 Make sure to ignore the leading NULL byte from Linux abstract namespace. 2011-10-06 21:09:28 +00:00
Jung-uk Kim
f05531a392 Restore the original socket address length if it was not really AF_INET6. 2011-10-06 20:48:23 +00:00
Jung-uk Kim
43399111a7 Retern more appropriate errno when Linux path name is too long. 2011-10-06 20:28:08 +00:00
Jung-uk Kim
0007f669ca Inline do_sa_get() function and remove an unused return value. 2011-10-06 20:20:30 +00:00
Jung-uk Kim
c15cdbf2f3 Unroll inlined strnlen(9) and make it easier to read. No functional change. 2011-10-06 19:59:14 +00:00
Colin Percival
5da3eb94fc Fix a bug in UNIX socket handling in the linux emulator which was
exposed by the security fix in FreeBSD-SA-11:05.unix.

Approved by:	so (cperciva)
Approved by:	re (kib)
Security:	Related to FreeBSD-SA-11:05.unix, but not actually
		a security fix.
2011-10-04 19:07:38 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Jonathan Anderson
cfb5f76865 Add experimental support for process descriptors
A "process descriptor" file descriptor is used to manage processes
without using the PID namespace. This is required for Capsicum's
Capability Mode, where the PID namespace is unavailable.

New system calls pdfork(2) and pdkill(2) offer the functional equivalents
of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
process for debugging purposes. The currently-unimplemented pdwait(2) will,
in the future, allow querying rusage/exit status. In the interim, poll(2)
may be used to check (and wait for) process termination.

When a process is referenced by a process descriptor, it does not issue
SIGCHLD to the parent, making it suitable for use in libraries---a common
scenario when using library compartmentalisation from within large
applications (such as web browsers). Some observers may note a similarity
to Mach task ports; process descriptors provide a subset of this behaviour,
but in a UNIX style.

This feature is enabled by "options PROCDESC", but as with several other
Capsicum kernel features, is not enabled by default in GENERIC 9.0.

Reviewed by: jhb, kib
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
2011-08-18 22:51:30 +00:00
Robert Watson
a9d2f8d84f Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
Bjoern A. Zeeb
74d7a2539e Remove the 'either' from the comment as it'll be less obvious that we
removed semmap in a bit of time from now.   Re-wrap.

Suggested by:	jhb
2011-07-17 05:33:22 +00:00
Bjoern A. Zeeb
1080a2c85d Remove semaphore map entry count "semmap" field and its tuning
option that is highly recommended to be adjusted in too much
documentation while doing nothing in FreeBSD since r2729 (rev 1.1).

ipcs(1) needs to be recompiled as it is accessing _KERNEL private
variables.

Reviewed by:	jhb (before comment change on linux code)
Sponsored by:	Sandvine Incorporated
2011-07-14 14:18:14 +00:00
Alexander Leidinger
f4cb7c85e6 Commit the missing linux_videdev2_compat.h (lost somewhere between
commit tree patch generation -> successful compile tree build test -> commmit).

Pointy hat to:	netchild
2011-05-04 13:09:20 +00:00
Alexander Leidinger
60c6d23685 Add FEATURE macros for v4l and v4l2 to the linuxulator.
Suggested by:	ae
2011-05-04 09:52:34 +00:00
Alexander Leidinger
15bf9014c9 This is v4l2 support for the linuxulator. This allows to access FreeBSD
native devices which support the v4l2 API from processes running within
the linuxulator, e.g. skype or flash can access the multimedia/pwcbsd
or multimedia/webcamd supplied drivers.

Submitted by:	nox
MFC after:	1 month
2011-05-04 09:05:39 +00:00
Alexander Leidinger
4c94038794 Fix typo in comment, improve comment. 2011-05-04 08:42:31 +00:00
Alexander Leidinger
d0f5ca6d40 Add explanation about the use-permission and FreeBSDify it. 2011-05-04 08:41:55 +00:00
Alexander Leidinger
41ebeb8e6f Copy the v4l2 header unchanged from the vendor branch. 2011-05-04 08:31:58 +00:00
Edward Tomasz Napierala
1ba5ad4210 Add accounting for most of the memory-related resources.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-04-05 20:23:59 +00:00
Andriy Gapon
a930718af1 Revert r220032:linux compat: add SO_PASSCRED option with basic handling
I have not properly thought through the commit.  After r220031 (linux
compat: improve and fix sendmsg/recvmsg compatibility) the basic
handling for SO_PASSCRED is not sufficient as it breaks recvmsg
functionality for SCM_CREDS messages because now we would need to handle
sockcred data in addition to cmsgcred.  And that is not implemented yet.

Pointyhat to:	avg
2011-03-31 08:14:51 +00:00
Andriy Gapon
01a9e1a11b linux compat: add SO_PASSCRED option with basic handling
This seems to have been a part of a bigger patch by dchagin that either
haven't been committed or committed partially.

Submitted by:	dchagin, nox
MFC after:	2 weeks
2011-03-26 11:25:36 +00:00
Andriy Gapon
605da56bc3 linux compat: improve and fix sendmsg/recvmsg compatibility
- implement baseic stubs for capget, capset, prctl PR_GET_KEEPCAPS
  and prctl PR_SET_KEEPCAPS.
- add SCM_CREDS support to sendmsg and recvmsg
- modify sendmsg to ignore control messages if not using UNIX
  domain sockets

This should allow linux pulse audio daemon and client work on FreeBSD
and interoperate with native counter-parts modulo the differences in
pulseaudio versions.

PR:		kern/149168
Submitted by:	John Wehle <john@feith.com>
Reviewed by:	netchild
MFC after:	2 weeks
2011-03-26 11:05:53 +00:00
Alexander Leidinger
0d7b5e545c Staticize functions which are not used somewhere else, move the
corresponding prototypes from the header to the code file.
2011-03-15 13:40:47 +00:00
Dmitry Chagin
31f7ad1545 Style(9) fixes. No functional changes.
MFC after:	2 Week
2011-03-12 07:47:05 +00:00
John Baldwin
c28a98e948 Remove now-obsolete comment.
Submitted by:	netchild
MFC after:	1 week
2011-03-10 19:50:12 +00:00
Dmitry Chagin
a2cd91cf28 Indeed, remove bogus since r219405 check of the Linux ABI.
Pointed out:	jhb

MFC after:	2 Week
2011-03-09 05:59:33 +00:00
Dmitry Chagin
e5d81ef1b5 Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with:	kib

MFC after:	2 Week
2011-03-08 19:01:45 +00:00
Dmitry Chagin
3a4bc25691 Print out shared flag for debug purpose.
MFC after:	1 Week
2011-03-03 18:29:55 +00:00
Dmitry Chagin
815cb72a0c Switch PROCESS_SHARE to AUTO_SHARE (as umtx do). Even for SHARED,
if page mapped MAP_ANON linux uses private algorithm too.

Disscussed with:	jhb

MFC after:	3 Days
2011-03-03 18:19:10 +00:00
John Baldwin
21f8f506fb Use umtx_key objects to uniquely identify futexes. Private futexes in
different processes that happen to use the same user address in the
separate processes will now be treated as distinct futexes rather than the
same futex.  We can now honor shared futexes properly by mapping them to a
PROCESS_SHARED umtx_key.  Private futexes use THREAD_SHARED umtx_key
objects.

In conjunction with:	dchagin
Reviewed by:	kib
MFC after:	1 week
2011-02-23 13:23:28 +00:00
Dmitry Chagin
f9e66923e5 Do not clobber %rdx.
Before calling vfork() syscall the linux user-space stores the current PID
in the %rdx and restore it when the parent process will leave the kernel.
2011-02-20 07:58:30 +00:00
Dmitry Chagin
09d6cb0a23 For realtime signals fill the sigval value. 2011-02-15 21:46:36 +00:00
Dmitry Chagin
f3481dd9ab Make a linux_rt_sigtimedwait() system call is actually working.
1) Translate the native signal number in the appropriate Linux signal.
2) Remove bogus code, which can lead to a panic as it calls
   kern_sigtimedwait with same ksiginfo.
3) Return the corresponding signal number.
2011-02-15 21:42:48 +00:00
Dmitry Chagin
8c50c56206 Style(9) fix. Wrap long lines in linux_rt_sigtimedwait(). 2011-02-15 21:24:50 +00:00
Dmitry Chagin
d207e753da Put the macro declaration in the relevant include file for future use. 2011-02-15 21:22:09 +00:00
Dmitry Chagin
e2ef00a426 Style(9) fix. Do not initialize variables in the declarations. 2011-02-14 17:24:58 +00:00
Dmitry Chagin
49fa1a745e Sort include files in the alphabetical order. 2011-02-13 20:07:48 +00:00
Dmitry Chagin
4ca49f41ec Remove comment about 'ftlk' LOR. 2011-02-13 18:46:34 +00:00
Dmitry Chagin
890c582fe5 Stop printing the LOR, as this is expected behavior. 2011-02-13 18:41:40 +00:00
Dmitry Chagin
7c3b05b99c The bitset field of freshly created futex should be initialized explicity.
Otherwise, REQUEUE operations fails.
2011-02-13 17:56:22 +00:00
Dmitry Chagin
d14cc07d07 Rename used_requeue and use it as bitwise field to store more flags.
Reimplement used_requeue logic with LINUX_XDEPR_REQUEUEOP flag.
2011-02-12 20:58:59 +00:00
Dmitry Chagin
cfa57401b0 Slightly rewrite linux_fork:
1) Remove bogus error checking.
2) A new process exit from kernel through fork_trampoline(),
   so remove bogus check.
2011-02-12 20:16:25 +00:00
Dmitry Chagin
9588e04dde Remove bogus include <machine/frame.h> 2011-02-12 19:14:57 +00:00
Dmitry Chagin
222198ab0b Move linux_clone(), linux_fork(), linux_vfork() to a MI path. 2011-02-12 18:17:12 +00:00
Alexander Leidinger
529844c77c Linux' shm_open() fails because it wants to find some funky shmfs
to construct the full pathname. It starts to search at the default
mountpoint which is /dev/shm. If this fails it runs through fstab
and searches for shmfs and tmpfs. Whatever it finds will be
statfs()'ed to be checked for Linux' fs magic for shmfs (0x01021994).

Ideally our tmpfs should deliver this fs magic to Linux processes, but
as our tmpfs is considered to be an experimental feature we can not
assume that there is always a tmpfs available.

To make shared memory work in the Linuxulator, force the fs type of
/dev/shm (which can be a symlink) to match what Linux expects. The user
is responsible (info has to be added to the linux base ports and the docs)
to setup a suitable link for /dev/shm.

Noticed by:	Andre Albsmeier <Andre.Albsmeier@siemens.com>
Submitted by:	Andre Albsmeier <Andre.Albsmeier@siemens.com>
MFC after:	1 month
2011-02-09 20:23:22 +00:00
Dmitry Chagin
78ec1867a2 Yet another unimplemented futex operation, print out about.
Submitted by:	arundel
MFC after:	1 month.
2011-01-31 06:06:23 +00:00
Dmitry Chagin
5163762354 Implement a futex BITSET op.
Submitted by:	arundel
MFC after:	1 month.
2011-01-31 05:59:05 +00:00