Being able to easily run and debug a simple operating system is really useful when you want to learn how low-level components are implemented. Xv6 is a very simple Unix-like operating system that lets you do just that.

sillysaurus2 made the case for this in the Hacker News thread on Xv6:

Have you ever:

  • Wondered how a filesystem can survive a power outage?
  • Wondered how to organize C code?
  • Wondered how memory allocation works?
  • Wondered how memory paging works?
  • Wondered about the difference between a kernel function and a userspace function?
  • Wondered how a shell works? (How it parses your commands, or how to write your own, etc)
  • Wondered how a mutex can be implemented? Or how to have multiple threads executing safely?
  • How multiple processes are scheduled by the OS? Priority, etc?
  • How permissions are enforced by the OS? Security model? Why Unix won while Multics didn’t (simplicity)?
  • How piping works? Stdin/stdout and how to compose them together to build complicated systems without drowning in complexity?
  • So much more!

I credit studying xv6 as being one of the most important decisions I’ve made; up there with learning vim or emacs, or touch typing. This is foundational knowledge which will serve you the rest of your life in a thousand ways, both subtle and overt. Spend a weekend or two dissecting xv6 and you’ll love yourself for it later. (Be sure to study the book, not just the source code. It’s freely available online. The source code is also distributed as a PDF, which seems strange till you start reading the book. Both PDFs are meant to be read simultaneously, rather than each alone.)

Download, Compile, and Run

You can download Xv6’s source code, compile it, and run it under QEMU using the following commands:

# Download the code
git clone https://github.com/mit-pdos/xv6-public.git xv6
cd xv6

# Compile
make

# Install QEMU (on newer Debian/Ubuntu releases the package may be named qemu-system-x86)
sudo apt-get install qemu

# Run under QEMU
make qemu

It looks like this:

[Screenshot: Xv6 under QEMU]

Very few commands are available, but the Unix heritage is easy to recognize:

xv6...
cpu1: starting
cpu0: starting
sb: size 1000 nblocks 941 ninodes 200 nlog 30 logstart 2 inodestart 32 bmap start 58
init: starting sh
$ ls
.              1 1 512
..             1 1 512
README         2 2 1973
cat            2 3 13320
echo           2 4 12428
forktest       2 5 8144
grep           2 6 15020
init           2 7 13084
kill           2 8 12580
ln             2 9 12424
ls             2 10 14812
mkdir          2 11 12572
rm             2 12 12560
sh             2 13 23564
stressfs       2 14 13280
usertests      2 15 58584
wc             2 16 13848
zombie         2 17 12196
console        3 18 0
$
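
Each line of that listing reads as name, type, inode number, and size in bytes. As far as I can tell from xv6's stat.h, type 1 is a directory, 2 is a regular file, and 3 is a device. The small helper below is only an illustration of how to read a line; `describe_entry` is a hypothetical name and is not part of xv6.

```shell
# describe_entry NAME TYPE INODE SIZE
# Spell out one line of xv6's `ls` output.
# Assumption: type codes follow xv6's stat.h (1 = directory, 2 = file, 3 = device).
describe_entry() {
    case "$2" in
        1) kind="directory" ;;
        2) kind="file" ;;
        3) kind="device" ;;
        *) kind="unknown type $2" ;;
    esac
    printf '%s: %s, inode %s, %s bytes\n' "$1" "$kind" "$3" "$4"
}

describe_entry README 2 2 1973    # README: file, inode 2, 1973 bytes
describe_entry console 3 18 0     # console: device, inode 18, 0 bytes
```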

Remote Debug Xv6 Under QEMU using GDB

Xv6’s Makefile has a rule to make this very easy (qemu-gdb):

$ make qemu-gdb
...
sed "s/localhost:1234/localhost:26000/" < .gdbinit.tmpl > .gdbinit
*** Now run 'gdb'.
qemu-system-i386 -serial mon:stdio -hdb fs.img xv6.img -smp 2 -m 512  -S -gdb tcp::26000

Execution stops before the first instruction, and QEMU waits for GDB to connect and take control of execution (set breakpoints, continue, and so on). Open GDB from another shell:

$ gdb kernel
...
Reading symbols from kernel...done.
+ target remote localhost:26000
warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
of GDB.  Attempting to continue with the default i8086 settings.

The target architecture is assumed to be i8086
[f000:fff0]    0xffff0: ljmp   $0xf000,$0xe05b
0x0000fff0 in ?? ()
+ symbol-file kernel
(gdb)
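
The port 26000 in the `target remote` line is not fixed. Assuming the Makefile in the xv6-public repository still works the way I remember, it derives the port from your user ID as `(uid % 5000) + 25000` and writes it into `.gdbinit` with the `sed` command shown in the `make qemu-gdb` output. The sketch below mirrors that logic; `gdb_port` and `render_gdbinit` are hypothetical helper names, not something the Makefile defines.

```shell
# gdb_port UID
# Mirror the arithmetic the Makefile is assumed to use: (uid % 5000) + 25000.
gdb_port() {
    echo $(( $1 % 5000 + 25000 ))
}

# render_gdbinit PORT
# Apply the same sed substitution as `make qemu-gdb` to text read from stdin.
render_gdbinit() {
    sed "s/localhost:1234/localhost:$1/"
}

gdb_port 1000    # prints 26000, matching the session above
echo "target remote localhost:1234" | render_gdbinit "$(gdb_port 1000)"
# prints: target remote localhost:26000
```

If your user ID is not 1000, expect a different port in your own output; running `gdb_port "$(id -u)"` shows which one.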

Now you can set breakpoints, resume execution, and do whatever you want. Here I set a breakpoint at the exec function:

(gdb) break exec
Breakpoint 1 at 0x80100b5f: file exec.c, line 12.
(gdb) continue
Continuing.
[New Thread 2]
[Switching to Thread 2]
The target architecture is assumed to be i386
=> 0x80100b5f <exec>:   push   %ebp

Breakpoint 1, exec (path=0x1c "/init", argv=0x8dfffec8) at exec.c:12
12      {
(gdb) continue
Continuing.
=> 0x80100b5f <exec>:   push   %ebp

Breakpoint 1, exec (path=0x87d "sh", argv=0x8dffeec8) at exec.c:12
12      {
(gdb) continue
Continuing.
[Switching to Thread 1]
=> 0x80100b5f <exec>:   push   %ebp

Breakpoint 1, exec (path=0x19c0 "ls", argv=0x8df2bec8) at exec.c:12
12      {
(gdb) p argv[0]
$5 = 0x19c0 "ls"
(gdb) p argv[1]
$6 = 0x0
(gdb) backtrace
#0  exec (path=0x19c0 "ls", argv=0x8df2bec8) at exec.c:12
#1  0x801062ba in sys_exec () at sysfile.c:417
#2  0x80105619 in syscall () at syscall.c:133
#3  0x801067d7 in trap (tf=0x8df2bfb4) at trap.c:43
#4  0x801065cc in alltraps () at trapasm.S:23
#5  0x8df2bfb4 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) continue
Continuing.

You can see that /init was the first process and that it spawned sh; I then entered ls in the console myself. Things are magically simple!
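
If you would rather log every exec call than stop at each one, GDB's breakpoint command lists can do that. The sketch below writes a small GDB command file; the file name `exec-trace.gdb` and the helper `write_exec_trace` are made up for this example. It assumes GDB has already loaded the generated `.gdbinit` and connected to QEMU, as in the session above.

```shell
# write_exec_trace FILE
# Write a GDB command file that prints the path passed to each exec() call
# and then resumes execution automatically.
write_exec_trace() {
    cat > "$1" <<'EOF'
break exec
commands
  silent
  printf "exec: %s\n", path
  continue
end
continue
EOF
}

write_exec_trace exec-trace.gdb
# With `make qemu-gdb` running in another shell, load it with:
#   gdb -x exec-trace.gdb kernel
```

`silent` suppresses GDB's usual breakpoint banner, so only the `printf` line appears each time the breakpoint is hit.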

Useful Resources