DæmonNews: News and views for the BSD community

A Tour through the NetBSD Source Tree - Part III: Kernel

Hubert Feyrer <hubert@feyrer.de>

This is the third part of our tour through the NetBSD source tree. Having covered the various components that make up the userland, we now concentrate on the kernel source. It is located in /usr/src/sys, with the /sys symlink being a well-known shortcut to the system's kernel source.

Let's recall what happens when building a kernel: after editing the kernel config file located in /sys/arch/<arch>/conf and running config(8) on it, a number of files are created in /sys/arch/<arch>/compile/KERNELNAME. The header files describe which devices to include and how many of each, along with other data for the system's configuration. In addition, a Makefile is created that is used to build the kernel from source. The interesting point to note here is that this single Makefile locates and compiles all the needed sources and places the object files in the .../compile/KERNELNAME directory. In NetBSD, there is no recursive walk of the whole source tree using several Makefiles to build the various sub-trees of the kernel source. This allows building kernels for several configurations and platforms from the same source, without different builds tripping over one another.

Still, the various parts of the NetBSD kernel are placed in various subdirectories that we will have a closer look at now. Under /usr/src/sys, there are:

adosfs, coda, filecorefs, isofs, msdosfs, nfs, ntfs:
These are various filesystems used directly by NetBSD to access data. The primary goal of some of them is to help exchange data with a machine's native operating system (adosfs for AmigaOS, filecorefs for Acorn Computers' RISC OS, ...), while others implement filesystems that can be found on many systems (isofs, nfs, ...).

ufs:
The Unix File System (UFS) is the base of the native filesystem used in NetBSD. Ancient (AT&T) Unix filesystems had a number of limitations: filenames could be at most 14 characters long and there were no symlinks, for example. These problems were solved by the Berkeley computer scientists implementing BSD Unix. Their filesystem implementation serves as the base for several filesystems these days, which use various methods of laying out data on the disk.

These filesystems are stored in the "ufs" subdirectory, which contains:

  • ext2fs: Linux' ext2fs
  • lfs: Log structured filesystem
  • mfs: Memory filesystem, for things like in-core /tmp
  • ufs: General routines shared by the other UFS-based filesystems
  • ffs: The Berkeley Fast File System, the native NetBSD filesystem, including things like softdeps.

miscfs:
This directory contains further filesystems that aren't directly related to physical storage. Instead they implement various layered filesystems for services like data translation, or routines for implementing kernel features. Using the virtual filesystem operations table, it is easy to change the behaviour of an operation under certain conditions, e.g. mapping operations to deadfs on a file whose file descriptors were revoke(2)'d.
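The idea of swapping an operations table can be sketched in a few lines of C. This is a simplified illustration, not the kernel's actual code: the structure and function names (vnodeops, vnode_revoke, live_ops, dead_ops) are hypothetical stand-ins, and returning EIO from the dead operations is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct vnode;

/* Hypothetical, cut-down operations table: one function pointer per operation. */
struct vnodeops {
	int (*vop_read)(struct vnode *vp, char *buf, size_t len);
	int (*vop_write)(struct vnode *vp, const char *buf, size_t len);
};

/* Hypothetical, cut-down vnode: it only remembers its current table. */
struct vnode {
	const struct vnodeops *v_op;
};

/* "Live" operations: pretend the transfer succeeded. */
static int live_read(struct vnode *vp, char *buf, size_t len)
{ (void)vp; (void)buf; (void)len; return 0; }
static int live_write(struct vnode *vp, const char *buf, size_t len)
{ (void)vp; (void)buf; (void)len; return 0; }

/* "Dead" operations: touch no data and report invalid I/O (EIO assumed). */
static int dead_read(struct vnode *vp, char *buf, size_t len)
{ (void)vp; (void)buf; (void)len; return EIO; }
static int dead_write(struct vnode *vp, const char *buf, size_t len)
{ (void)vp; (void)buf; (void)len; return EIO; }

const struct vnodeops live_ops = { live_read, live_write };
const struct vnodeops dead_ops = { dead_read, dead_write };

/* Revoking access only swaps the table; callers need no special casing. */
void vnode_revoke(struct vnode *vp)
{
	assert(vp != NULL);
	vp->v_op = &dead_ops;
}

/* All access goes through whatever table the vnode currently points to. */
int vnode_read(struct vnode *vp, char *buf, size_t len)
{
	return vp->v_op->vop_read(vp, buf, len);
}

int vnode_write(struct vnode *vp, const char *buf, size_t len)
{
	return vp->v_op->vop_write(vp, buf, len);
}
```

A caller that keeps using vnode_read() after vnode_revoke() simply starts getting errors, without the read path ever checking for revocation itself.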

The filesystems included here are:

  • deadfs: Implements operations that don't modify any data and instead return indications of invalid IO. Used to revoke(2) file descriptors.

  • fdesc: Maps a process' file descriptors into filesystem space, depending on the accessing process. Can be mounted on /dev/fd using mount_fdesc(8).

  • fifofs: Implements FIFOs using Unix domain sockets internally.

  • genfs: Generic filesystem functions, most of which either return an error of some kind (bad file descriptor, bad operation) or do nothing at all. Used for implementing deadfs etc.

  • kernfs: This filesystem is usually mounted under /kern and provides various information about the running system, like kernel version, system time, etc.

  • nullfs: Used to "mirror" one directory tree onto another directory, providing the same tree on both mount points. Also known as loopback mount - see mount_null(8) for more information.

  • overlay: The operation of this filesystem is similar to the null filesystem. However, as all VFS operations are defined, the implementation can be used as a base for further layered filesystems. See mount_overlay(8) for more information.

  • portal: The portal filesystem provides a service that allows descriptors such as sockets to be made available in the filesystem namespace, following conversion rules given in a config file. See the mount_portal(8) manpage for further information.

  • procfs: Similar to kernfs, this filesystem is usually mounted on /proc and allows accessing various data about processes. It is used by ps(1) and other utilities. See mount_proc(8) for more information.

  • specfs: Implements routines to access special devices. The filesystem provides a filesystem interface, and calls the device-specific routines depending on the device's type, major and minor number.

  • syncfs: Operations used to implement the ioflush kernel thread that writes out modified pages to disk.

  • umapfs: A filesystem for re-mapping UIDs/GIDs, useful, e.g., when mounting an NFS volume from a server that has a different set of UIDs/GIDs than the local machine.

  • union: This layered filesystem allows merging two filesystems, providing a view as if they were mounted on the same mountpoint. Modifications go either to the "upper" or to the "lower" layer, which allows mounting a CDROM (read-only :), and mounting an empty but writable directory over it, making it possible, for example, to do a compile on a source expanded on the CDROM. See mount_union(8) for further details.
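To make the ID re-mapping done by a layer like umapfs more concrete, here is a minimal sketch of a table lookup. It is not the umapfs implementation: the names (idmap, id_remap, ID_NOBODY) are hypothetical, and both the linear table and the fallback value for unmapped IDs are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: IDs without a table entry collapse to an unprivileged ID. */
#define ID_NOBODY 32767u

/* One mapping entry: an ID as seen on one side, and its counterpart. */
struct idmap {
	uint32_t from;
	uint32_t to;
};

/* Translate an ID through the table; unmapped IDs become ID_NOBODY. */
uint32_t id_remap(const struct idmap *map, size_t n, uint32_t id)
{
	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (map[i].from == id)
			return map[i].to;
	}
	return ID_NOBODY;
}
```

With a table containing the pair (1000, 501), id_remap() turns 1000 into 501, while an ID that is not listed comes out as ID_NOBODY.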

compat:
This directory contains code for emulating binary compatibility with various non-NetBSD operating systems as well as with old NetBSD binaries. It includes:

  • aout: This subsystem is used to run native NetBSD a.out binaries on systems that made the transition to the ELF executable format. As for most emulations, the shared library loader ld.so, shared libs etc. are looked for in /emul/aout first.

  • common: Various common routines used by all emulations, like system call table translation routines; also contains compat code for prior NetBSD releases, see the COMPAT_* kernel options in options(4).

  • freebsd: Mostly a few glue routines for running FreeBSD/i386 a.out and ELF binaries. See the compat_freebsd(8) manpage for details on setting things up!

  • hpux: To run native HP-UX programs on the Motorola based hp300/hp400 machines. Adjusts a fair number of calls, including terminal IO, signals, etc.

  • ibcs2: This code implements the Intel Binary Compatibility Standard version 2, used for running SCO programs on i386, but also for general compatibility with AT&T System V.3, which is used on the VAX port. Maybe it should have been named COMPAT_SVR3 - the compat_ibcs2(8) manpage contains more data.

  • linux: Code to run a.out and ELF Linux binaries for a number of hardware platforms, including alpha, arm32, i386, powerpc, mips, m68k, sparc and sparc64. One of the special things about the Linux emulation is that Linux uses a different system call table on each port, which makes maintaining things a bit more interesting. The code is separated into a "common" directory that applies to all platforms, and various architecture-specific directories for different CPUs. The compat_linux(8) manpage contains more information on using the system, and there are also several packages in pkgsrc that help in setting up the necessary shared libraries etc. to run Linux binaries like Netscape or Acrobat Reader.

  • m68k4k: Some of the m68k ports used to use a pagesize of 4k instead of the 8k common today. This code helps in maintaining binary compatibility with old binaries that still use 4k.

  • netbsd32: Used by 64bit systems like sparc64 to run native 32bit binaries. Maps the programs' 32bit args to the 64bit args used by LP64 systems' kernels.

  • osf1: The compat_osf1(8) system allows running OSF/1 (AKA Digital Unix AKA Tru64) binaries on the Alpha platform.

  • ossaudio: This software layer provides Open Sound System compatible ioctl calls that are then mapped to the native NetBSD audio model by this code. Enabled when compiling in support for Linux and/or FreeBSD binary compatibility.

  • pecoff: This subsystem allows running programs that are in the PEcoff executable format, which is found on the Microsoft Windows platform. Of course mapping system calls is a real challenge here, as the API to present to the upper layer is definitely nothing that is even remotely near to the API used on all the Unix-like compat systems, and as such there's no easy mapping of the calls to NetBSD functions. Much of the work is done by libraries in the userspace instead, which then talk to the X server, etc. See the compat_pecoff(8) manpage for further details.

  • sunos: If users still have SPARC or m68k applications built for SunOS 4.x, this emulation layer will help run them. See compat_sunos(8) for more information.

  • svr4: The System V compat system allows binary compatibility for several systems, e.g. Solaris (SunOS 5.x) on i386, sparc and sparc64, Amix on m68k and SCO/Xenix on i386. The compat_svr4(8) manpage contains further information.

  • ultrix: For pmax and other MIPS based systems as well as VAX systems, to run Ultrix binaries. See compat_ultrix(8).

  • vax1k: For VAX binaries that still use 1k pagesizes, this allows running them. No idea where these originate - probably very historic. :)
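What most of these emulations have in common is a per-system table of system calls: the number a foreign binary passes is looked up in the table belonging to its emulation, and the handler found there translates to the native routine. The following sketch illustrates that dispatch. All names in it are hypothetical, the system call numbers are invented for the example and do not match any real operating system, and the fixed return value stands in for real kernel state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Handler signature: arguments in, result out, errno-style return value. */
typedef int (*sy_call_t)(const long *args, long *retval);

/* Hypothetical description of one emulation: a name and its call table. */
struct emul {
	const char *e_name;
	const sy_call_t *e_sysent;
	size_t e_nsysent;
};

/* Native implementation; the fixed value stands in for a real process ID. */
static int native_getpid(const long *args, long *retval)
{
	(void)args;
	*retval = 42;
	return 0;
}

/* Foreign wrapper: would convert arguments/results, then call the native code. */
static int foreign_getpid(const long *args, long *retval)
{
	return native_getpid(args, retval);
}

/* The same service sits at different (invented) numbers in each table. */
static const sy_call_t native_sysent[]  = { NULL, native_getpid };
static const sy_call_t foreign_sysent[] = { NULL, NULL, foreign_getpid };

const struct emul emul_native  = { "native",  native_sysent,  2 };
const struct emul emul_foreign = { "foreign", foreign_sysent, 3 };

/* Dispatch a system call number through the table of the given emulation. */
int emul_syscall(const struct emul *e, size_t code, const long *args, long *retval)
{
	assert(e != NULL && retval != NULL);
	if (code >= e->e_nsysent || e->e_sysent[code] == NULL)
		return ENOSYS;
	return e->e_sysent[code](args, retval);
}
```

In this sketch call number 1 reaches getpid for a native process, while a "foreign" process has to use number 2; a number that is valid in one table fails with ENOSYS in the other.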

conf:
The /sys/conf directory contains the main list of files to include into kernel builds as well as scripts and files used to update the OS version and compile it into the kernel. The operating system's version is stored in the "osrelease.sh" script, which is used from a number of places to determine the OS version.

crypto:
This directory contains code for various data encryption standards (arc4, blowfish, DES, Rijndael etc.) that is subject to crypto export regulations. The code is used by the IPSec kernel subsystem.

ddb:
The DDB kernel debugger that can be used to do post mortem debugging is found here. The debugger is used on all NetBSD ports.

dev:
This directory contains device drivers that use the machine independent bus_dma(9) and bus_space(9) interfaces and that work on all platforms that support the necessary bus glue routines. There are several subdirectories grouping drivers by various categories:

  • bus interface: cardbus, eisa, ieee1394, isa, isapnp, mca, pci, pcmcia, sbus, tc, usb, vme, qbus, xmi

  • functionality: ata, i2c, i2o, mii, ofw, pckbc, raidframe, rasops, rcons, scsipi, sysmon, wscons, wsfont

  • general interfaces that are backed by bus-specific drivers: audio, midi, rnd

The directory structure is mostly oriented towards the bus system that a hardware device attaches to, not towards the functionality it provides. There are no special categories for things like audio, network etc. - these are in their bus-specific directories like pci, isa etc. containing (only) the bus-specific attachment routines.

If a chip implements some functionality like audio, network or SCSI, it is often used on several cards that all have the same chip, but different bus interfaces - ISA, PCI, etc. To avoid maintaining several drivers that have identical core functionality, NetBSD drivers are separated into bus-glue code kept in the bus-specific directories mentioned above, and the core functionality of the integrated circuit. Naming conventions help identify e.g. network cards (if_*), but unfortunately aren't applied consistently.
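The split between bus glue and chip core can be sketched as follows. This is an illustration of the structure only: the chip, its registers and all function names are made up, and a small array plays the role of the hardware that real drivers would reach through bus_space(9).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHIP_REG_CTRL   0	/* hypothetical control register */
#define CHIP_REG_ID     1	/* hypothetical chip ID register */
#define CHIP_CTRL_RESET 0x80
#define CHIP_NREGS      4

/* Per-device state shared by the core and the bus glue. */
struct chip_softc {
	const char *sc_bus;	/* which glue attached us */
	uint8_t (*sc_read)(struct chip_softc *sc, unsigned reg);
	void (*sc_write)(struct chip_softc *sc, unsigned reg, uint8_t val);
	uint8_t sc_regs[CHIP_NREGS];	/* fake register file, stands in for hardware */
};

/* Core driver: knows the chip, knows nothing about the bus it sits on. */
int chip_attach(struct chip_softc *sc)
{
	assert(sc != NULL && sc->sc_read != NULL && sc->sc_write != NULL);
	sc->sc_write(sc, CHIP_REG_CTRL, CHIP_CTRL_RESET);
	return sc->sc_read(sc, CHIP_REG_ID);
}

/* Register accessors; a real glue layer would go through its bus here. */
static uint8_t fake_read(struct chip_softc *sc, unsigned reg)
{
	return sc->sc_regs[reg % CHIP_NREGS];
}

static void fake_write(struct chip_softc *sc, unsigned reg, uint8_t val)
{
	sc->sc_regs[reg % CHIP_NREGS] = val;
}

/* Common part of the glue: install accessors, preset the fake ID register. */
static int glue_attach(struct chip_softc *sc, const char *bus)
{
	sc->sc_bus = bus;
	sc->sc_read = fake_read;
	sc->sc_write = fake_write;
	sc->sc_regs[CHIP_REG_ID] = 0x5a;	/* pretend the hardware reports this ID */
	return chip_attach(sc);
}

/* Bus glue: one small attachment routine per bus, same core underneath. */
int chip_isa_attach(struct chip_softc *sc) { return glue_attach(sc, "isa"); }
int chip_pci_attach(struct chip_softc *sc) { return glue_attach(sc, "pci"); }
```

Supporting the same chip on another bus then means writing one more small attachment routine rather than a second driver.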

The drivers for the core functionality are stored in the "ic" subdirectory, with the file names indicating the IC's chip numbers:
	% ls /sys/dev/ic
	CVS                 cac.c               isp_target.c        pckbc.c
	Makefile            cacreg.h            isp_target.h        pckbcvar.h
	README.ncr5380sbc   cacvar.h            isp_tpublic.h       pdq.c
	ac97.c              cd1190reg.h         ispmbox.h           pdq_ifsubr.c
	ac97reg.h           cd1400reg.h         ispreg.h            pdqreg.h 
	...

ipkdb:
An IP-based debugger interface to a remote machine. This is another way to debug the NetBSD kernel besides the DDB kernel debugger and gdb, which can be used for debugging both userland and kernel.

kern:
This directory contains the core kernel code including a number of facilities:

  • loaders for executables in various formats (a.out, ELF, COFF, scripts ...)

  • process and (kernel) thread management

  • signal delivery and handling

  • terminal IO subsystem

  • sockets and other interprocess communication primitives

  • virtual filesystem layer, providing the framework used by the filesystems in /sys/miscfs.

  • many auxiliary routines used from all places
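As an illustration of the first item in this list, the sketch below shows how a set of loaders can be asked in turn whether they recognise the start of a file. It is not the kernel's exec code: the names (exec_fmt, execsw_entry, exec_identify) are hypothetical, and only the two checks with well-known signatures are included, the ELF magic bytes and the "#!" that starts a script.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum exec_fmt { EXEC_UNKNOWN, EXEC_ELF, EXEC_SCRIPT };

/* One entry per supported format: an identifier and a probe function. */
struct execsw_entry {
	enum exec_fmt fmt;
	int (*probe)(const unsigned char *hdr, size_t len);
};

/* ELF files start with the four bytes 0x7f 'E' 'L' 'F'. */
static int probe_elf(const unsigned char *hdr, size_t len)
{
	return len >= 4 && memcmp(hdr, "\177ELF", 4) == 0;
}

/* Interpreter scripts start with "#!". */
static int probe_script(const unsigned char *hdr, size_t len)
{
	return len >= 2 && hdr[0] == '#' && hdr[1] == '!';
}

static const struct execsw_entry execsw[] = {
	{ EXEC_ELF, probe_elf },
	{ EXEC_SCRIPT, probe_script },
};

/* Try every loader in turn; the first one that recognises the header wins. */
enum exec_fmt exec_identify(const unsigned char *hdr, size_t len)
{
	assert(hdr != NULL || len == 0);
	for (size_t i = 0; i < sizeof(execsw) / sizeof(execsw[0]); i++) {
		if (execsw[i].probe(hdr, len))
			return execsw[i].fmt;
	}
	return EXEC_UNKNOWN;
}
```

Adding a format to this sketch means adding one probe function and one table entry; the loop itself stays unchanged.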

lib:
Throughout the NetBSD kernel, there are many routines that are needed in many places. They are collected in a few libraries that are used only in the kernel:

  • libkern: This is basically what libc is for the userland, with functions used for providing various arithmetic operations that can't be inlined by gcc as well as string/memory copy/comparison operations.

  • libsa: The StandAlone library provides functions used for loading the kernel, when there's no operating system running yet and thus many of the services provided by the NetBSD operating system are not available. The library includes code for netbooting (rarp, RPC, NFS), locating/loading the kernel from UFS, LFS, ISO 9660 or tar-structured media, memory management and others.

  • libz: In-kernel decompression library for loading gzip compressed kernels.
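To give an idea of the kind of arithmetic helper libkern supplies, consider 64-bit division on a CPU that has no instruction for it, where the compiler emits a call to a library routine instead. That this is a typical case is our assumption, and the routine below is only a sketch of the classic shift-and-subtract method under the hypothetical name udiv64; it is not libkern's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Divide n by d one bit at a time (binary long division).
 * Returns the quotient; the remainder is stored in *rem if rem is not NULL.
 * The caller must not pass d == 0.
 */
uint64_t udiv64(uint64_t n, uint64_t d, uint64_t *rem)
{
	uint64_t q = 0;
	uint64_t r = 0;

	assert(d != 0);
	for (int i = 63; i >= 0; i--) {
		/* Bring down the next bit of the dividend. */
		r = (r << 1) | ((n >> i) & 1);
		/* If the divisor fits, subtract it and set this quotient bit. */
		if (r >= d) {
			r -= d;
			q |= (uint64_t)1 << i;
		}
	}
	if (rem != NULL)
		*rem = r;
	return q;
}
```

The loop uses only shifts, comparisons and subtractions, which is what makes this approach usable where no divide instruction of the required width exists.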

stand:
This directory contains source for several standalone programs that aren't used by NetBSD currently.

lkm:
NetBSD supports loadable kernel modules, and the sources are in this directory. LKMs include a floppy driver for mac68k, various binary emulations, IPFilter logging and several filesystems.

net:
NetBSD's networking framework contains many routines that are independent of any specific protocol, and that are used by several networking protocols/stacks. These components are collected in this directory. They include packet filtering (BPF), access routines for all kinds of network hardware (ARCNET, ATM, Ethernet, FDDI, IEEE 802.11, PPP, Token Ring, etc.) that hand device access to drivers in the /sys/dev directory, routing code etc.

netatalk:
The code in this directory implements the kernel part of the AppleTalk protocol stack. The userland part is not included in NetBSD; it can be installed from pkgsrc/net/netatalk(-sun).

netccitt, netiso:
NetBSD comes with an ISO/OSI protocol stack, which is located in these directories. It is not in widespread use these days.

netinet:
Internet stuff - the NetBSD TCP/IP (v4) stack. Documentation on this is available in section 9 of the NetBSD manual pages as well as in Richard Stevens' "TCP/IP Illustrated" books.

netinet6:
Internet, next generation - this directory contains the KAME IPv6 stack that is shipping with NetBSD. See http://www.kame.net/ for further information.

netkey:
Key management for IPSec - see the ipsec(4) manpage for more details.

netnatm:
The code in this directory implements native mode ATM to transport other protocols like IP.

netns:
NetBSD has support for the Xerox Network Systems (XNS) protocols, which can be found in this directory. Although not in widespread use any more, the protocols are described in the first edition of Richard Stevens' "UNIX Network Programming" book.

sys:
This directory contains only header files that get installed into /usr/include/sys.

uvm:
The code in this directory implements NetBSD's new virtual memory system, UVM, which replaced the old Mach-based VM system some time ago. See the uvm(9) manpage for more information.

vm:
This directory has only the header files of the old Mach-based virtual memory system left, for use with various programs. The VM system itself is not used any longer.

arch:
Code specific to one hardware platform is collected under this directory. Directories are present for each port as well as for CPU-specific functions that are shared by several ports that use the same CPU, avoiding redundancy.

Port-specific directories contain several subdirectories, with the following ones being present for all ports:

  • conf: contains kernel config files, a list of files specific to the port and a template for the Makefile used to build a kernel

  • compile: This directory is initially empty. It gets populated by config(8) with directories that contain a Makefile and header files to build a kernel.

  • <port>: Port-specific functions, CPU/MMU initialisation code, etc. - all the machine specific code that cannot be shared across various hardware architectures.

  • include: machine specific include files that describe the CPU and MMU layout, data formats used by the FPU, limits, etc.

  • stand: This directory contains sources for loading the kernel into the system - usually it contains code for bootblocks, secondary stage bootloaders, netboot miniroots and other facilities used to boot the system.

Further directories may exist in the arch specific directories. These contain bus-specific, machine dependent device drivers that don't fit into /sys/dev as they work on one port only. Ideally, a port only uses machine independent drivers, of course.

We have now described all the important directories that are available in the NetBSD source tree. To get used to the directory structure, it is recommended that you browse the directories and have a look at the various files to fully explore things.

    Author maintains all copyrights on this article.
    Images and layout Copyright © 1998-2006 Dæmon News. All Rights Reserved.