The purpose of this document is to introduce the basics of programming
and developing KLDs under the FreeBSD operating system. Using the
"learn by example" method, I hope to share with you skeleton code so that
you, as the reader, may be able to learn what makes up KLD code at the
simplest level.
As a quick note for those who are not familiar with the dynamic
kernel linker facility: KLDs replaced Loadable Kernel Modules (LKMs)
in FreeBSD 3.1, as the interface "allows the system administrator to
dynamically add and remove functionality from a running system. This
ability also helps software developers to develop new parts of the
kernel without constantly rebooting to test their changes" [ref. 1].
This is seen a great deal among those developing drivers, especially for
network cards. One thing to note is that KLDs are, in and of themselves,
LKMs. The term KLD is used to make it clear that changes were made to
the kernel module subsystem.
Moreover, here's a long quote from Peter Wemm on IRC discussing
the FreeBSD KLD vs. LKM issue a bit more in depth (excuse the
brokenness of the quote... it is IRC after all ;):
- The LKM system used a userland linker to push pre-relocated
  binary data into the kernel.
- The KLD system does the relocation itself in the kernel.
- LKMs had special data structures that the lkm driver knew
  about and used those to wire it into the kernel; e.g., the
  VFS lkm had a structure that pointed to the vfs tables.
- LKMs were single purpose and were quite difficult to change
  from LKM to actual kernel code.
- With KLDs, things were made to be more generic (generic files,
  modules). A file could contain 0 or more modules.
- Each module is self-contained and self-initializing and registering.
- KLDs and kernel code are compiled the same.
- It's possible to take a piece of the kernel and easily make it a
  KLD without much difficulty.
- The dependencies and versioning are now at the module level.
This tutorial is directed towards those who are interested in learning the
basics of writing KLD code. The only recommended prerequisites are some
detailed knowledge of the FreeBSD kernel and the ability to program in
K&R C. As an important note, the examples in this tutorial were
intended for use on FreeBSD 4.0 (they were actually developed on -STABLE).
The following topics will be covered:
Characteristics common to all KLDs.
KLD syscall implementation skeleton.
KLD character device implementation skeleton.
The goal of this text is to help those who are familiar with KLDs gain
the ability to understand what goes into writing a simple example.
Therefore, the extended goal is for those who learn to program KLDs
to be able to go forth and utilize the KLD interface functionality
for higher purposes.
Characteristics common to all KLDs
There are two main functions/macros that must be included in all
KLDs, plus one convenience that makes building them easy:
Load handler function.
DECLARE_MODULE macro.
Easy compile via Makefile.
Basically, the load handler function is, as it states, a function that
handles the loading and unloading of a KLD. Hence, when a KLD is
kldloaded or kldunloaded, this handler is what, at a very simplistic
level, gets called. The following is a snippet of code that shows a
simple load handler:
static int
load_handler(module_t mod, int what, void *arg)
{
	int err = 0;

	switch (what) {
	case MOD_LOAD:
		uprintf("KLD loaded successfully!\n");
		break;
	case MOD_UNLOAD:
		uprintf("KLD unloaded successfully!\n");
		break;
	default:
		err = EINVAL;
		break;
	}
	return(err);
}
This load handler fits the function pointer type defined in
/usr/include/sys/module.h:
	typedef int (*modeventhand_t)(module_t mod, int what, void *arg);
The 'module_t mod' argument is a pointer to the module structure.
This structure is part of a linked list of currently loaded modules.
It contains links to the other loaded modules, the KLD ID number and
other such useful information.
The 'int what' is going to be from modeventtype_t (enum modeventtype)
which will be one of the following:
MOD_LOAD : Set when module is loaded (kldload).
MOD_UNLOAD : Set when module is unloaded (kldunload).
MOD_SHUTDOWN : Set on shutdown.
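To see how a handler reacts to each of these events, the switch can be modelled in an ordinary userland program. This is only a sketch under stated assumptions: the enum is a stand-in with made-up values for the real modeventtype_t, the module_t argument is dropped, and last_action is an invented variable that takes the place of uprintf() so the result can be inspected. Unlike the skeleton above, this model accepts the shutdown event explicitly instead of letting it fall into the default case.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel's modeventtype_t; the numeric values are assumed. */
enum model_event { MODEL_LOAD, MODEL_UNLOAD, MODEL_SHUTDOWN };

/* Hypothetical: records what the handler did, in place of uprintf(). */
static const char *last_action = "none";

/* Same shape as load_handler() above, minus the module_t argument. */
static int
model_load_handler(int what, void *arg)
{
	int err = 0;

	(void)arg;
	switch (what) {
	case MODEL_LOAD:
		last_action = "loaded";
		break;
	case MODEL_UNLOAD:
		last_action = "unloaded";
		break;
	case MODEL_SHUTDOWN:
		/* Nothing to clean up in this skeleton; accept the event. */
		last_action = "shutdown";
		break;
	default:
		err = EINVAL;	/* unknown event: refuse it */
		break;
	}
	return(err);
}
```

Whether a real module should accept or refuse the shutdown event depends on what it has to tear down; the choice made here is only for illustration.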
The DECLARE_MODULE macro is also something that is basic to all KLDs.
However, it is not always seen as DECLARE_MODULE. There are a couple of
macros which can be used instead to more easily declare the module as a
certain type. DECLARE_MODULE is itself a macro:
#define DECLARE_MODULE(name, data, sub, order) \
SYSINIT(name##module, sub, order, module_register_init, &data) \
struct __hack
which is defined in /usr/include/sys/module.h. Now let us go through and see
what each parameter is...
name  : The generic module name; this will be used further
        down in the SYSINIT call.
data  : The moduledata structure, which is filled in and then
        passed (by address) as the data field. This structure
        contains two main items:
          char *name:
              The official module name, which
              will be used in the module structure.
          modeventhand_t evhand:
              This is our load handler function pointer;
              therefore, this field gets filled with the
              name of our load handler function.
sub   : This is an argument more directed at the SYSINIT
        macro. The valid entries for this can be found
        in /usr/include/sys/kernel.h in the sysinit_sub_id
        enumeration list. These are known types for system
        startup interfaces. For example, the SI_SUB_DRIVERS
        type is used when developing a KLD that is used
        as a device, as well as for other purposes.
order : This is another argument that is intended for the
        later calling of SYSINIT. It represents the KLD's
        order of initialization within the subsystem.
        Valid values for this field can be found in
        /usr/include/sys/kernel.h in the sysinit_elem_order
        enumeration.
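The combined effect of the sub and order arguments can be pictured as a two-level sort: entries run in ascending subsystem order, and within one subsystem in ascending order value. The userland sketch below illustrates that idea only. The structure, the function names and any numbers fed to it are hypothetical and are not taken from sys/kernel.h.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one SYSINIT entry. */
struct model_sysinit {
	const char	*name;
	unsigned int	 sub;	/* plays the role of the 'sub' argument */
	unsigned int	 order;	/* plays the role of the 'order' argument */
};

/* Compare by subsystem first, then by order within the subsystem. */
static int
model_sysinit_cmp(const void *a, const void *b)
{
	const struct model_sysinit *x = a;
	const struct model_sysinit *y = b;

	if (x->sub != y->sub)
		return(x->sub < y->sub ? -1 : 1);
	if (x->order != y->order)
		return(x->order < y->order ? -1 : 1);
	return(0);
}

/* Sort the entries into the sequence in which they would be run. */
static void
model_sysinit_sort(struct model_sysinit *v, size_t n)
{
	qsort(v, n, sizeof(v[0]), model_sysinit_cmp);
}
```

Two modules declared in the same subsystem are therefore told apart only by their order value, which is why both arguments matter.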
There are, however, two other *_MODULE macros which are very useful.
They are SYSCALL_MODULE and DEV_MODULE, both of which wrap the
DECLARE_MODULE macro. They are 1) better suited for writing a syscall
or device module and 2) easier on anyone reading the module code: it is
that much easier to tell that the code implements a syscall when
SYSCALL_MODULE is being used.
The SYSCALL_MODULE macro, defined in /usr/include/sys/sysent.h gets passed
the following parameters:
name       : This is used in the same manner as in DECLARE_MODULE.
offset     : Meant to hold the syscall number value. Usually
             when one is writing a syscall via KLD, there is no
             reserved syscall number for it. In this case, the
             correct value to set this parameter to is
             NO_SYSCALL. This will then tell the subsystem to
             find the next available syscall number value, and
             assign it to our new syscall.
new_sysent : The defined sysent structure for the new system
             call.
evh        : The modeventhand_t load handler (as seen in the
             moduledata structure above).
arg        : Used in the syscall_module_data structure. This
             parameter is usually set to NULL.
The DEV_MODULE macro, defined in /usr/include/sys/conf.h gets passed the
following parameters:
name : Used in the same way as in the two previous *_MODULE
       macros.
evh  : The modeventhand_t load handler.
arg  : Used in the moduledata structure. Usually set to NULL.
Depending on how your module was developed, you will have at least one
load handler and at least one *_MODULE macro. In this tutorial, I will not
discuss situations in which more than one load handler and DECLARE_MODULE
are needed; however, [ref. 2] discusses this situation quite clearly.
It also provides some more in-depth examples of modules, which are
helpful to those who are interested in seeing more.
A very neat piece of make(1) functionality is the ".include"
directive. Basically, we can have a generic Makefile that works from a
predefined set of variables. We set these variables in our own Makefile,
then simply include the generic one. This saves us from having to
write a complete Makefile for compiling our KLDs. The
".include"-able Makefiles are located in /usr/share/mk.
The one we are interested in, however, is bsd.kmod.mk [ref. 3]. I suggest
looking through the comments at the top of this Makefile before writing
yours. A few of the key variables that you may set are:
SRCS : Listing of sources.
KMOD : Name of module to build.
These are just a couple of a handful of useful variables. Examples of
using a Makefile with a ".include" can be seen in the skeleton pieces
of code in the following sections. My suggestion is to make use of them.
There is no real sense in reinventing the wheel, especially for something
as trivial as a Makefile.
KLD Syscall Implementation Skeleton
The following is a very generic example of how to create and add a
syscall via the dynamic kernel linker interface. There are a few important
pieces to creating a syscall that are, again, basically generic to all
modules that add a syscall. Besides the load handler and the *_MODULE
macro, there are four main parts that must be fulfilled:
Declaring the syscallname_args structure.
A function that is static and returns int that will be the syscall.
Filling the sysent structure according to our syscall.
Setting our 'offset' variable to NO_SYSCALL.
For all syscalls, the parameter list seen in the kernel code is as follows:
struct proc *
struct syscallname_args *
The parameters that one would pass to the syscall from userland are defined
in the syscallname_args structure. Ordinary system calls can be invoked
with just these parameters, without a pointer to a proc structure or a
pointer to the arguments structure, because libc provides a wrapper that
does that work. Since we are dynamically adding a syscall, and are not
adding a wrapper for it to libc, we must use syscall(2) to call our new
syscall. This will be explained a bit more as the example grows.
For this example, we will have the following syscall arguments:
struct sc_example_args {
	char	*str;
	int	 val;
};
I am including an integer and a pointer to a character string so that
we may see how both are used (i.e. going from user land to kernel land
and vice versa).
The following is the example syscall:
static int
sc_example(struct proc *p, struct sc_example_args *uap)
{
	char kstr[1024+1];	/* Holds kernel land copy of uap->str */
	int err = 0;		/* Generic return(err) */
	size_t size = 0;	/* Bytes copied by copyinstr(), NUL included */

	/*
	 * _IMPORTANT_:
	 *
	 * When one has a contiguous set of data and wishes to copy it from
	 * userland to kernel land (or vice versa), the copy(9) functions
	 * are recommended for doing this.
	 */

	/*
	 * Copy the string located at the user land address uap->str to
	 * the kernel land buffer kstr. Any failure (EFAULT for a bad
	 * address, ENAMETOOLONG for a string that does not fit) leaves
	 * kstr unusable, so bail out on every non-zero return.
	 */
	err = copyinstr(uap->str, kstr, sizeof(kstr), &size);
	if (err != 0)
		return(err);

	/*
	 * Print out the values we have gathered.
	 *
	 * uprintf() is a kernel land function that acts like printf().
	 * The kernel land printf() writes to the system message buffer
	 * (the dmesg facility); uprintf(), on the other hand, outputs
	 * directly to the tty currently in use.
	 */
	uprintf("The string passed was: %s\n", kstr);
	uprintf("The value passed was: %d\n", uap->val);
	return(0);
}
This function just takes the parameters passed to it (a character string
and an integer) and displays them on the tty currently in use (the
terminal that is running the program that called the syscall).
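The contract of copyinstr, as described in copy(9), shows why its return value should be checked before the kernel buffer is used. The function below is a userland model of that contract, not the kernel routine: model_copyinstr is a made-up name, and the real function can additionally fail with EFAULT on a bad user address, which an ordinary program cannot reproduce.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Userland model of the copyinstr(9) contract (hypothetical helper).
 * Copies at most len bytes, including the terminating NUL, from src to dst.
 */
static int
model_copyinstr(const char *src, char *dst, size_t len, size_t *done)
{
	size_t i;

	for (i = 0; i < len; i++) {
		dst[i] = src[i];
		if (src[i] == '\0') {
			if (done != NULL)
				*done = i + 1;	/* count includes the NUL */
			return(0);
		}
	}
	/* No NUL within len bytes: dst is NOT terminated. */
	if (done != NULL)
		*done = len;
	return(ENAMETOOLONG);
}
```

In the failure case the destination holds len bytes and no terminator, so handing it to a "%s" format would read past the end of the buffer.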
The next thing we do in our code is fill in a sysent structure for our
system call. The sysent structure, defined in /usr/include/sys/sysent.h,
is the following:
struct sysent {
	int		 sy_narg;
	sy_call_t	*sy_call;
};
There is a sysent structure defined for each system call. 'int sy_narg'
is the variable that defines how many parameters are passed to the
system call being defined. In the case of our skeleton code, we have 2
parameters being passed: char *str and int val. Therefore, we will set
sy_narg to 2. 'sy_call_t *sy_call' is a function pointer to our static
int system call. sy_call_t, defined in /usr/include/sys/sysent.h, is actually
the following:
typedef int sy_call_t __P((struct proc *, void *));
So, in our code we will have the following:
static struct sysent sc_example_sysent = {
	2,		/* Number of parameters for our system call. */
	sc_example	/* A function pointer to our new system call. */
};
Now, if you remember from above, there is a parameter passed to the
SYSCALL_MODULE macro called the offset. This parameter is meant to hold
the number by which the syscall being declared will be called.
However, as mentioned above, when we create a system call and
implement it via something dynamic like a KLD, it is usually not good
practice to assign a fixed slot number. What one _should_ do
is set the offset value to NO_SYSCALL. This says: "find the next available
system call number." In the SYSCALL_MODULE call used in this tutorial,
what is handed to the macro is the address of a static int variable that
is initialized to NO_SYSCALL, so that when we load our module we are able
to find out which number was assigned. The registration code that the
macro wires in (not _really_ the macro itself) replaces the NO_SYSCALL
held in that variable with the available system call number that we
have been given. Quick note: the list of already implemented system
calls and their numbers is in /usr/include/sys/syscall.h. So,
our next line of code should simply look like:
	static int syscall_num = NO_SYSCALL;
One should note that NO_SYSCALL is defined in /usr/include/sys/sysent.h and
is actually the value -1.
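The "find the next available slot and write it back" behaviour can be sketched in userland as follows. This is a model, not the kernel's registration routine: the table size, the free-slot placeholder, the error codes chosen and every model_* name are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_NO_SYSCALL	(-1)	/* mirrors NO_SYSCALL */
#define MODEL_MAXSYSCALL	8	/* tiny table, for illustration only */

typedef int model_call_t(void *args);

/* Placeholder that marks a slot as free (assumed convention). */
static int
model_nosys(void *args)
{
	(void)args;
	return(ENOSYS);
}

/* A trivial "system call" used to exercise the model. */
static int
model_sc_example(void *args)
{
	(void)args;
	return(0);
}

static model_call_t *model_table[MODEL_MAXSYSCALL];

/* Mark every slot as free. */
static void
model_table_init(void)
{
	int i;

	for (i = 0; i < MODEL_MAXSYSCALL; i++)
		model_table[i] = model_nosys;
}

/*
 * Install 'fn'. If *offset is MODEL_NO_SYSCALL, pick the first free slot
 * (slot 0 is skipped in this model) and write its number back.
 */
static int
model_syscall_register(int *offset, model_call_t *fn)
{
	int i;

	if (*offset == MODEL_NO_SYSCALL) {
		for (i = 1; i < MODEL_MAXSYSCALL; i++)
			if (model_table[i] == model_nosys)
				break;
		if (i == MODEL_MAXSYSCALL)
			return(ENFILE);		/* table is full */
		*offset = i;
	} else if (*offset < 0 || *offset >= MODEL_MAXSYSCALL) {
		return(EINVAL);
	} else if (model_table[*offset] != model_nosys) {
		return(EEXIST);			/* slot already taken */
	}
	model_table[*offset] = fn;
	return(0);
}
```

Because the number is written back through the pointer, the caller's variable holds the assigned slot afterwards, which is what lets a load handler print it.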
We have already completed the necessary parts for implementing a system
call, therefore, all we have left to do is write our load handler and
call the SYSCALL_MODULE macro.
static int
load_handler(struct module *m, int what, void *arg)
{
	int err = 0;

	switch (what) {
	case MOD_LOAD:
		/* Print out syscall_num so we know the value to call */
		printf("System call loaded at slot: %d\n", syscall_num);
		break;
	case MOD_UNLOAD:
		printf("System call unloaded from slot: %d\n", syscall_num);
		break;
	default:
		err = EINVAL;
		break;
	}
	return(err);
}

SYSCALL_MODULE(sc_example,
    &syscall_num,
    &sc_example_sysent,
    load_handler,
    NULL);
Now our skeleton is complete (code without textual comments can be found
at [ref. 4]) and is ready for compile. Our Makefile should simply be:
SRCS=sc_example.c
KMOD=sc_example
.include <bsd.kmod.mk>
Just type `make` and after it compiles, as root, just do:
`kldload ./sc_example.ko`. This will output the following (as an
example):
System call loaded at slot: 210
What is not discussed here is how to call these syscalls after they
have been loaded. However, in the code provided by [ref. 4], there are
a couple of examples on different ways to accomplish the calling of the
function. It is also recommended that you read syscall(2).
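As a rough idea of what such a caller can look like, here is a minimal userland sketch built on syscall(2). It assumes the slot number is the one printed when the module was loaded and is supplied by the user as text. The names parse_slot and call_sc_example are hypothetical, the upper bound in parse_slot is an arbitrary sanity limit, and the _DEFAULT_SOURCE define is only there so that glibc-based systems declare syscall().

```c
#define _DEFAULT_SOURCE	/* lets glibc declare syscall(); assumed harmless elsewhere */
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Parse the slot number printed at kldload time.
 * Returns -1 if the text is not a positive decimal number.
 */
int
parse_slot(const char *text)
{
	char *end;
	long n;

	errno = 0;
	n = strtol(text, &end, 10);
	/* 65535 is an arbitrary sanity bound, not a kernel limit. */
	if (errno != 0 || end == text || *end != '\0' || n <= 0 || n > 65535)
		return(-1);
	return((int)n);
}

/*
 * Invoke the example system call through syscall(2).
 * The two arguments match struct sc_example_args: a string and an int.
 */
int
call_sc_example(int slot, const char *str, int val)
{
	return((int)syscall(slot, str, val));
}
```

A main() that passes argv[1] through parse_slot() and then calls call_sc_example(slot, "hello", 42) should, with the module loaded, cause the two uprintf() lines to appear on the calling terminal.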
Character Device KLD Implementation Skeleton
A useful type of device on most any UNIX system is a
character device. These are used, not really to represent
a physical device, but rather to act as an interface to
either read, write, set flags, etc on something specific
to the kernel. For example, we could set up a character
device so that we may read data from this device related
to network traffic. Like the system call implementation
discussion, this will discuss, step-by-step how to program
a generic KLD that creates a character device that has some
minor functionality. Hopefully, you will come to grasp
that it is not too hard to create a character device
and that they are quite useful.
The following things are usually common to all character
device implementations:
Prototype device functions.
Declaring the cdevsw structure.
Functions that follow the declared cdevsw structure.
Load handler and DEV_MODULE macro.
These are what make up the basic skeleton code that will be
shown below. Remember, to get the actual code without
textual comments, please go to [ref. 4]. Compiling, running,
and seeing that the code works helps out a great deal ;)
The functions we are prototyping here are to be the only
calls available to directly access our new device. They will
be included in the cdevsw structure below. For our example, we
will have four functions: example_open, example_close,
example_read, and example_write.
d_open_t example_open;
d_close_t example_close;
d_read_t example_read;
d_write_t example_write;
The cdevsw structure defines a great deal of things
regarding the character device we are implementing.
The structure, (aka the character device switch table)
as defined in /usr/include/sys/conf.h, is as follows:
struct cdevsw {
	d_open_t	*d_open;	/* Func. pointer to dev open function */
	d_close_t	*d_close;	/* Func. pointer to dev close function */
	d_read_t	*d_read;	/* Func. pointer to dev read function */
	d_write_t	*d_write;	/* Func. pointer to dev write function */
	d_ioctl_t	*d_ioctl;	/* Func. pointer to dev ioctl function */
	d_poll_t	*d_poll;	/* Func. pointer to dev poll function */
	d_mmap_t	*d_mmap;	/* Func. pointer to dev mmap function */
	d_strategy_t	*d_strategy;	/* Func. pointer to dev strategy func. */
	const char	*d_name;	/* Device name in /dev */
	int		 d_maj;		/* Device major value */
	d_dump_t	*d_dump;	/* Func. pointer to dev dump function */
	d_psize_t	*d_psize;	/* Func. pointer to dev psize function */
	u_int		 d_flags;	/* D_TAPE, D_DISK, D_TTY, D_MEM */
	int		 d_bmaj;	/* Block Device major value (used by D_DISK) */
};
Not all the function pointers have to be defined. Why
is this? What if you wanted a write-only device? Then you
would not only set the file permissions accordingly but also,
when declaring the cdevsw structure, omit the d_read function
by declaring that entry as noread. In our example, we will only
provide d_open, d_close, d_read, and d_write functions to keep
the discussion simple. Our cdevsw structure looks like:
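static struct cdevsw example_cdevsw = {
	example_open,	/* d_open */
	example_close,	/* d_close */
	example_read,	/* d_read */
	example_write,	/* d_write */
	noioctl,	/* d_ioctl */
	nopoll,		/* d_poll */
	nommap,		/* d_mmap */
	nostrategy,	/* d_strategy */
	"example",	/* d_name */
	33,		/* d_maj, see /usr/src/sys/conf/majors */
	nodump,		/* d_dump */
	nopsize,	/* d_psize */
	D_TTY,		/* d_flags */
	-1		/* d_bmaj */
};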
So, as you can tell from the "no*" declarations, we will only have
functions for d_open_t, d_close_t, d_read_t and d_write_t. For another
example of character device code, please refer to [ref. 5]. This example
provides functions for d_open_t, d_close_t, d_read_t, and
d_write_t. Also, please note the use of 33 for the major value. 33
is one of the majors reserved for example use. Please check
/usr/src/sys/conf/majors for other example majors as well as those
reserved for real purposes.
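The idea behind the "no*" entries is that a caller can dispatch through the switch table blindly, and an unsupported operation is answered by a stub instead of a NULL pointer. The cut-down userland model below sketches that pattern. The two-entry structure and all model_* names are hypothetical, and the ENODEV return value is an assumption about what such a stub reports.

```c
#include <errno.h>
#include <stddef.h>

typedef int model_rw_t(char *data, size_t n);

/* Hypothetical, cut-down switch table with only two entry points. */
struct model_cdevsw {
	model_rw_t	*d_read;
	model_rw_t	*d_write;
	const char	*d_name;
};

/* Stand-in for noread: the operation is not supported. */
static int
model_noread(char *data, size_t n)
{
	(void)data;
	(void)n;
	return(ENODEV);
}

/* A write routine that accepts everything it is given. */
static int
model_write(char *data, size_t n)
{
	(void)data;
	(void)n;
	return(0);
}

/* Write-only device: d_read is explicitly the "no" stub, never NULL. */
static struct model_cdevsw model_wronly = {
	model_noread,
	model_write,
	"model"
};

/* The caller dispatches without checking; the stub supplies the error. */
static int
model_dev_read(struct model_cdevsw *sw, char *data, size_t n)
{
	return(sw->d_read(data, n));
}
```

Filling every slot, even with a stub, keeps the dispatch code free of NULL checks.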
The idea behind this example is to show some interaction with the
device driver. Therefore the flow of the code that this driver is
aimed at is as follows:
open(2) -> write(2) -> read(2) -> close(2).
We will first open the device in the /dev/ directory; then we will
write a small string via the write(2) call. This string we write
to the device will be stored in a static buffer, and later will be
accessible via the read(2) call. Finally, we will close(2) our
open()'d device so that we may no longer make read or write calls on
it.
/* Stores string recv'd by _write() */
static char buf[512+1];
static int len;

/*
 * Used as the variable that is the reference to our device
 * in devfs... we must keep this variable sane until we
 * call kldunload.
 */
static dev_t sdev;

/*
 * This open function accepts any open(2) flags and simply resets
 * the stored string. A stricter driver could inspect oflags here,
 * for example to allow only a read-only device.
 */
int
example_open(dev_t dev, int oflags, int devtype, struct proc *p)
{
	int err = 0;

	memset(buf, '\0', sizeof(buf));
	len = 0;
	uprintf("Opened device \"example\" successfully.\n");
	return(err);
}

/*
 * Simply "closes" our device that was opened with example_open.
 */
int
example_close(dev_t dev, int fflag, int devtype, struct proc *p)
{
	memset(buf, '\0', sizeof(buf));
	len = 0;
	uprintf("Closing device \"example.\"\n");
	return(0);
}

/*
 * The read function just takes the buf that was saved
 * via example_write() and returns it to userland for
 * accessing.
 */
int
example_read(dev_t dev, struct uio *uio, int ioflag)
{
	int amt;

	if (len <= 0)
		return(0);	/* nothing stored: read(2) returns 0 bytes */

	/*
	 * uiomove(9) copies from kernel land to the user buffer
	 * described by uio and updates uio_resid for us. Every
	 * read starts again from the beginning of buf.
	 */
	amt = MIN(uio->uio_resid, len);
	return(uiomove(buf, amt, uio));
}

/*
 * example_write takes in a character string and saves it
 * to buf for later accessing.
 */
int
example_write(dev_t dev, struct uio *uio, int ioflag)
{
	int amt, err;

	/* Copy at most sizeof(buf) - 1 bytes, leaving room for the NUL. */
	amt = MIN(uio->uio_resid, (int)sizeof(buf) - 1);
	err = uiomove(buf, amt, uio);
	if (err != 0) {
		uprintf("Write to \"example\" failed.\n");
		return(err);
	}
	buf[amt] = '\0';
	len = amt;
	return(0);
}
So, now as you can see, implementing simple character device driver
code is fairly easy to do. It's a nifty way of passing information
in and out of kernel land when there is more to do than what a
sysctl can offer.
Below is our code for the function that handles the loading and unloading
of our actual KLD. For device drivers, we must do one thing specific
to load and unload. On MOD_LOAD, we must register our device with
devfs using make_dev. devfs is the Device File System which provides
access to the device namespace in the FreeBSD kernel. And on MOD_UNLOAD,
we must call destroy_dev, using the dev_t variable that was returned
from make_dev as the sole parameter.
/*
 * chardev_example_load()
 *
 * This is used as the function that handles what is to occur
 * when the KLD binary is loaded and unloaded via the kldload
 * and kldunload programs.
 */
static int
chardev_example_load(struct module *m, int what, void *arg)
{
	int err = 0;

	switch (what) {
	case MOD_LOAD:		/* kldload */
		/* Register the device; see the explanation above. */
		sdev = make_dev(&example_cdevsw,
		    0,
		    UID_ROOT,
		    GID_WHEEL,
		    0600,
		    "example");
		printf("Example device loaded.\n");
		break;
	case MOD_UNLOAD:	/* kldunload */
		/* Hand back the dev_t that make_dev returned. */
		destroy_dev(sdev);
		printf("Example device unloaded.\n");
		break;
	default:
		err = EINVAL;
		break;
	}
	return(err);
}
As with any KLD, we must have a *_MODULE macro that basically says which
function is our load handler and a name for our kld for reference purposes.
DEV_MODULE(chardev_example, chardev_example_load, NULL);
Now our very simple character device skeleton is complete. All that
remains is to create a Makefile and to actually create the file
for our device in the /dev directory. This is very easy to do:
# cd /dev
# mknod example c 33 0
# ls -al | grep example
crw-r--r-- 1 root wheel 33, 0 Aug 14 04:40 example
#
Now, after kldload'ing, the open(), close(), read() and write() calls will
work on /dev/example. However, remember to close() the device you are
working with prior to kldunload'ing... or else ;)
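The open(2) -> write(2) -> read(2) -> close(2) flow described earlier can be exercised from userland with a small helper such as the one below. It is a sketch only: example_roundtrip is a hypothetical name, and the helper assumes that the device returns the previously written string on the next read of the same descriptor.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Open 'path', write 'msg', read the reply into 'out', then close.
 * Returns the number of bytes read, or -1 on any failure.
 */
static int
example_roundtrip(const char *path, const char *msg, char *out, size_t outlen)
{
	ssize_t n;
	int fd;

	if (outlen == 0)
		return(-1);
	fd = open(path, O_RDWR);
	if (fd == -1)
		return(-1);
	if (write(fd, msg, strlen(msg)) == -1) {
		close(fd);
		return(-1);
	}
	n = read(fd, out, outlen - 1);
	if (n == -1) {
		close(fd);
		return(-1);
	}
	out[n] = '\0';	/* terminate whatever came back */
	close(fd);	/* always close before kldunload'ing */
	return((int)n);
}
```

Calling example_roundtrip("/dev/example", "hello", out, sizeof(out)) as root, with the module loaded and the device node in place, is the intended use; any failure along the way is reported as -1.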
Please see the example code in [ref. 4] for compilable code to play with.
Conclusion
This concludes the introduction to coding KLDs on FreeBSD.
As stated in the introduction, it was meant to be fairly brief, yet
to inform those who wish to write KLDs but currently cannot. The paper
produced by THC [ref. 2] is an excellent place to look for a more
in-depth treatment of KLDs, with a black hat touch. Also, for more
examples, please look at [ref. 4]; it contains others besides the
ones explained in this tutorial.
Contact
Feel free to contact the author regarding _any_ piece of this
tutorial.
E-Mail: awr@blackops.org
References
1. kld(4) man page.
   [Much help]
2. THC's FreeBSD Kernel Attack paper.
   [Good place for taking your white hat and turning it black.]
3. /usr/share/mk/*
   [Key for any Makefile creation under FreeBSD]
4. Example code from tutorial
   [Location of the examples plus more code.]
5. /usr/share/examples/kld/cdev/
   [Old example in FreeBSD tree]
Recognition
Peter Wemm - Discussion regarding LKM->KLD changes + long quote [intro]
Eivind Eklund - style(9) harassment.
Daniel O'Connor - Random comments.