syscalls — Linux system calls
The system call is the fundamental interface between an application and the Linux kernel.
System calls are generally not invoked directly, but rather via wrapper functions in glibc (or perhaps some other library). For details of direct invocation of a system call, see intro(2). Often, but not always, the name of the wrapper function is the same as the name of the system call that it invokes. For example, glibc contains a function truncate() which invokes the underlying "truncate" system call.
Often the glibc wrapper function is quite thin, doing little work other than copying arguments to the right registers before invoking the system call, and then setting errno appropriately after the system call has returned. (These are the same steps that are performed by syscall(2), which can be used to invoke system calls for which no wrapper function is provided.) Note: system calls indicate a failure by returning a negative error number to the caller; when this happens, the wrapper function negates the returned error number (to make it positive), copies it to errno, and returns -1 to the caller of the wrapper.
Sometimes, however, the wrapper function does some extra work before invoking the system call. For example, nowadays there are (for reasons described below) two related system calls, truncate(2) and truncate64(2), and the glibc truncate() wrapper function checks which of those system calls are provided by the kernel and determines which should be employed.
Below is a list of the Linux system calls. In the list, the Kernel column indicates the kernel version for those system calls that were new in Linux 2.2, or have appeared since that kernel version. Note the following points:
Where no kernel version is indicated, the system call appeared in kernel 1.0 or earlier.
Where a system call is marked "1.2" this means the system call probably appeared in a 1.1.x kernel version, and first appeared in a stable kernel with 1.2. (Development of the 1.2 kernel was initiated from a branch of kernel 1.0.6 via the 1.1.x unstable kernel series.)
Where a system call is marked "2.0" this means the system call probably appeared in a 1.3.x kernel version, and first appeared in a stable kernel with 2.0. (Development of the 2.0 kernel was initiated from a branch of kernel 1.2.x, somewhere around 1.2.10, via the 1.3.x unstable kernel series.)
Where a system call is marked "2.2" this means the system call probably appeared in a 2.1.x kernel version, and first appeared in a stable kernel with 2.2.0. (Development of the 2.2 kernel was initiated from a branch of kernel 2.0.21 via the 2.1.x unstable kernel series.)
Where a system call is marked "2.4" this means the system call probably appeared in a 2.3.x kernel version, and first appeared in a stable kernel with 2.4.0. (Development of the 2.4 kernel was initiated from a branch of kernel 2.2.8 via the 2.3.x unstable kernel series.)
Where a system call is marked "2.6" this means the system call probably appeared in a 2.5.x kernel version, and first appeared in a stable kernel with 2.6.0. (Development of kernel 2.6 was initiated from a branch of kernel 2.4.15 via the 2.5.x unstable kernel series.)
Starting with kernel 2.6.0, the development model changed, and new system calls may appear in each 2.6.x release. In this case, the exact version number where the system call appeared is shown. This convention continues with the 3.x kernel series, which followed on from kernel 2.6.39.
In some cases, a system call was added to a stable kernel series after it branched from the previous stable kernel series, and then backported into the earlier stable kernel series. For example, some system calls that appeared in 2.6.x were also backported into a 2.4.x release after 2.4.15. When this is so, the version where the system call appeared in both of the major kernel series is listed.
The list of system calls that are available as of kernel 3.9 (or in a few cases only on older kernels) is as follows:
On many platforms, including x86-32, socket calls are all multiplexed (via glibc wrapper functions) through socketcall(2) and similarly System V IPC calls are multiplexed through ipc(2).
Although slots are reserved for them in the system call table, the following system calls are not implemented in the standard kernel: afs_syscall(2), break(2), ftime(2), getpmsg(2), gtty(2), idle(2), lock(2), madvise1(2), mpx(2), phys(2), prof(2), profil(2), putpmsg(2), security(2), stty(2), tuxcall(2), ulimit(2), and vserver(2) (see also unimplemented(2)). However, ftime(3), profil(3), and ulimit(3) exist as library routines. The slot for phys(2) has been in use since kernel 2.1.116 for umount(2); phys(2) will never be implemented. The getpmsg(2) and putpmsg(2) calls are for kernels patched to support STREAMS, and may never be in the standard kernel.
There was briefly a set_zone_reclaim(2) system call, added in Linux 2.6.13 and removed in 2.6.16; it was never available to user space.
Roughly speaking, the code belonging to the system call with number __NR_xxx defined in /usr/include/asm/unistd.h can be found in the Linux kernel source in the routine sys_xxx(). (The dispatch table for i386 can be found in /usr/src/linux/arch/i386/kernel/entry.S.)
There are many exceptions, however, mostly because older
system calls were superseded by newer ones, and this has been
treated somewhat unsystematically. On platforms with
proprietary operating-system emulation, such as parisc,
sparc, sparc64 and alpha, there are many additional system
calls; mips64 also contains a full set of 32-bit system
calls.
Over time, changes to the interfaces of some system calls have been necessary. One reason for such changes was the need to increase the size of structures or scalar values passed to the system call. Because of these changes, there are now various groups of related system calls (e.g., truncate(2) and truncate64(2)) which perform similar tasks, but which vary in details such as the size of their arguments. (As noted earlier, applications are generally unaware of this: the glibc wrapper functions do some work to ensure that the right system call is invoked, and that ABI compatibility is preserved for old binaries.) Examples of system calls that exist in multiple versions are the following:
By now there are three different versions of stat(2): sys_stat() (slot __NR_oldstat), sys_newstat() (slot __NR_stat), and sys_stat64() (slot __NR_stat64), with the last being the most current. A similar story applies for lstat(2) and fstat(2).
Similarly, the defines __NR_oldolduname, __NR_olduname, and __NR_uname refer to the routines sys_olduname(), sys_uname(), and sys_newuname().
In Linux 2.0, a new version of vm86(2) appeared, with the old and the new kernel routines being named sys_vm86old() and sys_vm86().
In Linux 2.4, a new version of getrlimit(2) appeared, with the old and the new kernel routines being named sys_old_getrlimit() (slot __NR_getrlimit) and sys_getrlimit() (slot __NR_ugetrlimit).
Linux 2.4 increased the size of user and group IDs from 16 to 32 bits. To support this change, a range of system calls were added (e.g., chown32(2), getuid32(2), getgroups32(2), setresuid32(2)), superseding earlier calls of the same name without the "32" suffix.
Linux 2.4 added support for applications on 32-bit architectures to access large files (i.e., files whose sizes and file offsets can't be represented in 32 bits). To support this change, replacements were required for system calls that deal with file offsets and sizes. Thus, the following system calls were added: fcntl64(2), ftruncate64(2), getdents64(2), stat64(2), statfs64(2), and their analogs that work with file descriptors or symbolic links. These system calls supersede the older system calls which, except in the case of the "stat" calls, have the same name without the "64" suffix.
On newer platforms that only have 64-bit file access and 32-bit uids (e.g., alpha, ia64, s390x) there are no *64 or *32 calls. Where the *64 and *32 calls exist, the other versions are obsolete.
The rt_sig* calls were added in kernel 2.2 to support the addition of real-time signals (see signal(7)). These system calls supersede the older system calls of the same name without the "rt_" prefix.
The select(2) and mmap(2) system calls use five or more arguments, which caused problems in the way argument passing on the i386 used to be set up. Thus, while other architectures have sys_select() and sys_mmap() corresponding to __NR_select and __NR_mmap, on i386 one finds old_select() and old_mmap() (routines that use a pointer to an argument block) instead. These days passing five arguments is no longer a problem, and there is a __NR__newselect that corresponds directly to sys_select(), and similarly __NR_mmap2.
This page is part of release 3.52 of the Linux man-pages project. A description of the project, and information about reporting bugs, can be found at http://www.kernel.org/doc/man-pages/.
Copyright (C) 2007 Michael Kerrisk <mtk.manpages@gmail.com>
with some input from Stepan Kasal <kasal@ucw.cz>

Some content retained from an earlier version of this page:
Copyright (C) 1998 Andries Brouwer (aeb@cwi.nl)
Modifications for 2.2 and 2.4 Copyright (C) 2002 Ian Redfern <redferni@logica.com>

%%%LICENSE_START(VERBATIM)
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Since the Linux kernel and libraries are constantly changing, this manual page may be incorrect or out-of-date. The author(s) assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein. The author(s) may not have taken the same level of care in the production of this manual, which is licensed free of charge, as they might when working professionally.

Formatted or processed versions of this manual, if unaccompanied by the source, must acknowledge the copyright and authors of this work.
%%%LICENSE_END