
2. The Kernel Contexts

Unlike userspace, the Linux kernel is not fully preemptive; if it were, many of the tricks it uses would not work.

There are two contexts (patterns of execution flow) in the Linux kernel: interrupt context and user(space) context. User context is code entered from userspace, such as a system call. Unless the kernel code sleeps for some reason (explicitly allowing other code to run), no other user context will run on that CPU; this is the non-preemptive part. User context is always associated with a particular process.

However, an interrupt can occur at any time, which halts the user context in its tracks and runs an interrupt context. This is not associated with any process; it is caused by a timer, an external hardware interrupt, or a bottom-half (bottom halves may be run off the timer or other interrupts, see below). When it is finished, the user context will resume.

We'll see a number of ways in which user context can block interrupts to prevent this, making itself truly non-preemptible.

2.1 The Magic of Bottom Halves

Interrupt handlers are sometimes divided into two parts: a top and a bottom half. The top half is the real interrupt handler: often it just tells the kernel to run the bottom half, and exits. The kernel guarantees that the top half is never re-entered: if another interrupt arrives, it is queued until the top half is finished. Because the top half disables interrupts, it has to be fast.

The bottom half is run after the interrupts are processed, or off the timer interrupt (which occurs HZ times per second, see include/asm/param.h). The interrupts are not off while the bottom half is run, so it can do slower actions.

In Linux, bottom halves don't have to be reentrant: the kernel will not run a bottom half again until it has finished, even on SMP machines. This will be changed for 2.5, so that the same bottom half can run on many CPUs at the same time.
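Bottom halves can only really be shown inside the kernel, but the same top/bottom split appears in userspace signal handling, which makes a convenient model. In this sketch (all names are hypothetical, none of them are kernel interfaces), a SIGUSR1 handler plays the top half: it only records that work is pending and returns. An ordinary function called later plays the bottom half and does the slow work outside the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* State written by the "top half". volatile sig_atomic_t is the type
 * C guarantees can be safely written from a signal handler. */
static volatile sig_atomic_t work_pending = 0;
static volatile sig_atomic_t events_seen = 0;

/* How many times the deferred work actually ran. */
static int bottom_half_runs = 0;

/* Hypothetical "top half": record the event and return at once. */
static void top_half(int sig)
{
    (void)sig;
    events_seen++;
    work_pending = 1;   /* marking twice still means "run once" */
}

/* Hypothetical "bottom half": the slow part, run from ordinary context. */
static void bottom_half(void)
{
    bottom_half_runs++;
    printf("bottom half: %d event(s) so far\n", (int)events_seen);
}

/* Called from the main loop: run the bottom half if it was marked. */
static void run_pending_bottom_half(void)
{
    if (work_pending) {
        work_pending = 0;
        bottom_half();
    }
}

/* Install top_half() as the SIGUSR1 handler; returns 0 on success. */
static int install_top_half(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = top_half;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}
```

If SIGUSR1 is raised twice before run_pending_bottom_half() is called, the bottom half runs only once, roughly like marking a bottom half that is already pending. Because the bottom half is only ever called from the main loop, it is never run on top of itself, which loosely mirrors the non-reentrancy guarantee above.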

2.2 Some Basic Rules

No memory protection

If you corrupt memory the whole machine will crash. Are you sure you can't do what you want in userspace?

No floating point or MMX

The FPU context is not saved; you would mess with some user process's FPU state. If you really want to do this, you have to explicitly save and restore the full FPU state (and avoid context switches). It is generally a bad idea; try fixed-point arithmetic first.
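As a sketch of what "use fixed-point arithmetic" can look like, here is a minimal 16.16 format (16 integer bits, 16 fraction bits) in plain userspace C. The format, the fix16 name, and the helper functions are illustrative choices, not kernel interfaces. Everything is done with integer operations, so no FPU state is touched.

```c
#include <stdint.h>

/* 16.16 fixed point: the stored integer x represents x / 65536. */
typedef int32_t fix16;

#define FIX16_ONE ((int32_t)1 << 16)   /* 1.0 */

/* Convert an integer in [-32768, 32767] to fixed point. */
static fix16 fix16_from_int(int32_t n)
{
    return n * FIX16_ONE;
}

/* Build the fraction num/den in fixed point (den must be non-zero). */
static fix16 fix16_from_frac(int32_t num, int32_t den)
{
    return (fix16)(((int64_t)num * FIX16_ONE) / den);
}

/* Integer part, truncated toward zero. */
static int32_t fix16_to_int(fix16 x)
{
    return x / FIX16_ONE;
}

/* Widen to 64 bits so the intermediate product cannot overflow. */
static fix16 fix16_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) / FIX16_ONE);
}

/* Pre-scale the dividend so the quotient keeps 16 fraction bits. */
static fix16 fix16_div(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * FIX16_ONE) / b);
}
```

For example, fix16_mul(fix16_from_frac(7, 2), fix16_from_frac(9, 4)) computes 3.5 * 2.25 exactly as 7.875. In a 32-bit kernel, the 64-bit divisions here may need a helper such as do_div() rather than the plain / operator.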

A rigid stack limit

The stack is about 6K in 2.2 (for most architectures; it's about 14K on the Alpha), and it is shared with interrupts, so you can't use it all. Avoid deep recursion and huge local arrays on the stack (allocate them dynamically instead).
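One way to follow both pieces of advice at once is to replace a recursive walk with a loop that keeps its pending work in a dynamically allocated array. This userspace sketch sums a binary tree that way; struct node and tree_sum() are hypothetical, and malloc()/free() stand in for the kernel's kmalloc()/kfree(). Note that a GFP_KERNEL allocation may sleep, so it belongs in user context, not interrupt context.

```c
#include <stdlib.h>

/* Hypothetical binary tree node. */
struct node {
    int value;
    struct node *left, *right;
};

/* Sum every value in the tree without recursion. The explicit stack of
 * pending nodes lives on the heap, so tree depth costs no call-stack
 * space. max_nodes bounds that stack (each node is pushed at most once,
 * so the tree's node count is always enough). Returns 0 on success,
 * -1 on allocation failure or if max_nodes is too small. */
static int tree_sum(const struct node *root, size_t max_nodes, long *out)
{
    const struct node **stack;
    size_t top = 0;
    long sum = 0;

    if (root == NULL) {
        *out = 0;
        return 0;
    }
    if (max_nodes == 0)
        return -1;

    stack = malloc(max_nodes * sizeof *stack); /* kernel: kmalloc(..., GFP_KERNEL) */
    if (stack == NULL)
        return -1;

    stack[top++] = root;
    while (top > 0) {
        const struct node *n = stack[--top];

        sum += n->value;
        if (n->right) {
            if (top == max_nodes)
                goto overflow;
            stack[top++] = n->right;
        }
        if (n->left) {
            if (top == max_nodes)
                goto overflow;
            stack[top++] = n->left;
        }
    }
    free(stack);                                /* kernel: kfree(stack) */
    *out = sum;
    return 0;

overflow:
    free(stack);
    return -1;
}
```

The goto-based cleanup path keeps a single place where the buffer is released, a pattern common in kernel error handling.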

The Linux kernel is portable

Let's keep it that way. Your code should be 64-bit clean and endian-independent. You should also minimize CPU-specific code; for example, inline assembly should be cleanly encapsulated and kept to a minimum to ease porting.
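Here is a small sketch of endian-independent and 64-bit clean code, written for userspace with C99 <stdint.h> types (kernel code uses its own fixed-size types such as u32 instead). The helper names are hypothetical. In kernel code you would normally use the byte-order helpers from <asm/byteorder.h>, such as be32_to_cpu(); this sketch spells out the same idea by hand.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian field from a byte buffer. Building the value
 * with shifts gives the same result on little- and big-endian CPUs, and
 * never dereferences a possibly misaligned uint32_t pointer. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Store a 32-bit value in big-endian byte order. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* 64-bit clean: keep pointer differences and lengths in size_t rather
 * than int, so they are not truncated on 64-bit machines. */
static size_t bytes_between(const unsigned char *start,
                            const unsigned char *end)
{
    return (size_t)(end - start);
}
```

Casting the buffer to a uint32_t pointer instead would give different answers on different-endian machines and could fault on architectures that do not allow unaligned loads.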

[FIXME: Really expand on this ]

