DæmonNews: News and views for the BSD community


Dynamic Kernel Linker (KLD) Facility Programming Tutorial [Intro]

by Andrew Reiter <arr@watson.org>

Overview

The purpose of this document is to introduce the basics of programming and developing KLDs under the FreeBSD operating system. Using the "learn by example" method, I hope to share with you skeleton code so that you, as the reader, may be able to learn what makes up KLD code at the simplest level.

As a quick note for those who are not familiar with the dynamic kernel linker facility: KLDs replaced Loadable Kernel Modules (LKMs) in FreeBSD 3.1, as the interface "allows the system administrator to dynamically add and remove functionality from a running system. This ability also helps software developers to develop new parts of the kernel without constantly rebooting to test their changes" [ref. 1]. This is seen a great deal among those developing drivers, especially for network cards. One thing to note is that KLDs are, in and of themselves, LKMs. The term KLD is used to make it clear that changes were made to the kernel module subsystem.

Moreover, here's a long quote from Peter Wemm on IRC discussing the FreeBSD KLD vs. LKM issue a bit more in depth (excuse the brokenness of the quote... it is IRC after all ;):

  • The LKM system used a userland linker to push pre-relocated binary data into the kernel.
  • The KLD system does the relocation itself in the kernel. LKMs had special data structures that the lkm driver knew about and used those to wire it into the kernel; eg., the VFS lkm had a structure that pointed to the vfs tables.
  • LKMs were single purpose and were quite difficult to change from LKM to actual kernel code.
  • With KLDs, things were made to be more generic (generic files, modules). A file could contain 0 or more modules.
  • Each module is self-contained and self-initializing and registering.
  • KLDs and kernel code are compiled the same.
  • It's possible to take a piece of the kernel and easily make it a KLD without much difficulty.
  • The dependencies and versioning are now at the module level.

This tutorial is directed towards those who are interested in learning the basics of writing KLD code. The only recommended prerequisites are some detailed knowledge of the FreeBSD kernel and the ability to program in K&R C. As an important note, the examples in this tutorial were intended for use on FreeBSD 4.0 (and were actually developed on -STABLE).

The following topics will be covered:

  • Characteristics common to all KLDs.
  • KLD syscall implementation skeleton.
  • KLD character device implementation skeleton.

The goal of this text is to help those who are familiar with KLDs gain the ability to understand what goes into writing a simple example. Therefore, the extended goal is for those who learn to program KLDs to be able to go forth and utilize the KLD interface functionality for higher purposes.


    Characteristics common to all KLDs

    There are two pieces that must be included in all KLDs, plus a Makefile convention that makes building them easy:

  • Load handler function.
  • DECLARE_MODULE macro.
  • Easy compile via Makefile.

    Basically, the load handler function is, as its name states, a function that handles the loading and unloading of a KLD. When a KLD is kldloaded or kldunloaded, this handler is what, at a very simplistic level, gets called. The following snippet shows a simple load handler:

    static int
    load_handler(module_t mod, int what, void *arg)
    {
            int err = 0;

            switch (what) {
            case MOD_LOAD:
                    uprintf("KLD loaded successfully!\n");
                    break;
            case MOD_UNLOAD:
                    uprintf("KLD unloaded successfully!\n");
                    break;
            default:
                    err = EINVAL;
                    break;
            }
            return(err);
    }
    

    This load handler fits the function pointer type defined in /usr/include/sys/module.h:

     typedef int (*modeventhand_t)(module_t mod, int what, void *arg);
    

    The 'module_t mod' argument is a pointer to the module structure. This structure is part of a linked list of currently loaded modules. It contains links to the other loaded modules, the KLD ID number, and other such useful information.

    The 'int what' is going to be from modeventtype_t (enum modeventtype) which will be one of the following:

    MOD_LOAD     : Set when module is loaded (kldload).
    MOD_UNLOAD   : Set when module is unloaded (kldunload).
    MOD_SHUTDOWN : Set on shutdown.
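
    To experiment with the dispatch logic without loading anything into a kernel, the handler can be modelled in userland. The following is a minimal sketch under that assumption: the MOCK_* enumeration and mock_load_handler() are hypothetical stand-ins for modeventtype_t and a real load handler, not kernel interfaces.

```c
#include <errno.h>

/* Hypothetical userland stand-in for the kernel's modeventtype_t. */
enum mock_modevent {
	MOCK_MOD_LOAD,
	MOCK_MOD_UNLOAD,
	MOCK_MOD_SHUTDOWN
};

/*
 * Same shape as the load handler above, minus the kernel-only
 * arguments: return 0 for events we handle, EINVAL otherwise.
 */
int
mock_load_handler(int what)
{
	int err = 0;

	switch (what) {
	case MOCK_MOD_LOAD:
	case MOCK_MOD_UNLOAD:
		break;		/* handled: nothing to set up or tear down */
	default:
		err = EINVAL;	/* includes MOCK_MOD_SHUTDOWN */
		break;
	}
	return (err);
}
```

    Note that, as in the handler above, the shutdown event falls through to the default case and is reported as EINVAL.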

    The DECLARE_MODULE macro is also something that is basic to all KLDs. However, it is not always seen as DECLARE_MODULE. There are a couple of macros which can be used instead to more easily declare the module as a certain type. DECLARE_MODULE is itself a macro:

    
      #define DECLARE_MODULE(name, data, sub, order) \
        SYSINIT(name##module, sub, order, module_register_init, &data) \
        struct __hack
    

    which is defined in /usr/include/sys/module.h. Now let us go through and see what each parameter is...

    name :

    The generic module name, this will be used further down in the SYSINIT call.

    data :

    A moduledata structure that is filled in and then passed as the data field (the macro itself takes its address, as the &data in the definition shows). This structure contains two main items:

  • char *name:
    The official module name, which will be used in the module structure.
  • modeventhand_t evhand:
    This is our load handler function pointer, therefore, this field gets filled with the name of our load handler function.

    sub :

    This is an argument directed more at the SYSINIT macro. The valid entries for this can be found in /usr/include/sys/kernel.h in the sysinit_sub_id enumeration list. These are known types for system startup interfaces. For example, the SI_SUB_DRIVERS type is used when developing a KLD that is used as a device, as well as for other purposes.

    order :

    This is another argument that is intended for the later calling of SYSINIT. It represents the KLD's order of initialization within the subsystem. Valid values for this field can be found in /usr/include/sys/kernel.h in the sysinit_elem_order enumeration.
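
    The way the data argument ties a name to a load handler can be modelled in userland. The sketch below is only an illustration: struct mock_moduledata, mock_handler(), and mock_register() are hypothetical names, and mock_register() merely imitates, very roughly, the idea that registering the structure leads to the handler receiving a load event.

```c
#include <stddef.h>

/* Hypothetical userland stand-ins; none of these are kernel types. */
enum mock_event { MOCK_LOAD, MOCK_UNLOAD };

typedef int (*mock_evhand_t)(int what, void *arg);

struct mock_moduledata {
	const char	*name;		/* official module name */
	mock_evhand_t	 evhand;	/* load handler */
	void		*priv;		/* extra data handed to the handler */
};

/* Counts how many times the handler saw a load event. */
static int mock_loads;

static int
mock_handler(int what, void *arg)
{
	(void)arg;
	if (what == MOCK_LOAD)
		mock_loads++;
	return (0);
}

/* The filled-in structure: a name paired with its handler. */
static struct mock_moduledata mock_mod = {
	"mock_example",
	mock_handler,
	NULL
};

/*
 * Stand-in for the registration step: look at the filled-in
 * structure and deliver the load event to its handler.
 */
int
mock_register(struct mock_moduledata *data)
{
	if (data == NULL || data->evhand == NULL)
		return (-1);
	return (data->evhand(MOCK_LOAD, data->priv));
}
```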

    There are, however, two other *_MODULE macros which are very useful: SYSCALL_MODULE and DEV_MODULE, both of which wrap the DECLARE_MODULE macro. They are designed 1) to be better suited for writing a syscall or device module and 2) to make module code easier to read. It is that much easier to tell that the code implements a syscall when one sees SYSCALL_MODULE being used.

    The SYSCALL_MODULE macro, defined in /usr/include/sys/sysent.h, gets passed the following parameters:

    name :

    This is used in the same manner as in DECLARE_MODULE

    offset :

    Meant to hold the syscall number value. Usually when one is writing a syscall via KLD, there is no reserved syscall number for it. In this case, the correct value to set this parameter to is NO_SYSCALL. This will then tell the subsystem to find the next available syscall number value, and assign it to our new syscall.

    new_sysent :

    The defined sysent structure for the new system call.

    evh :

    The modeventhand_t load handler (as seen in the moduledata structure above).

    arg :

    Used in the syscall_module_data structure. This parameter is usually set to NULL.

    The DEV_MODULE macro, defined in /usr/include/sys/conf.h, gets passed the following parameters:

    name :

    Used in the same way as in the two previous *_MODULE macros.

    evh :

    The modeventhand_t load handler.

    arg :

    Used in the moduledata structure. Usually set to NULL.

    Depending on how your module was developed, you will have at least one load handler and at least one *_MODULE macro. In this tutorial, I will not discuss situations in which more than one load handler and DECLARE_MODULE are needed; however, [ref. 2] discusses this situation quite clearly. It also provides some more in-depth examples of modules, which I find very neat and helpful to those who are interested in seeing more examples.

    A very neat piece of the Makefile functionality is the ".include" command. Basically, we can have a generic Makefile, that has a predefined set of variables. We can set these variables in our Makefile, then just simply call the generic Makefile. This allows us to not have to worry about writing a Makefile for compiling our KLDs. The ".include"-able Makefiles are located in /usr/share/mk.

    The ".include" we are interested in, however, is bsd.kmod.mk [ref. 3]. I suggest looking through the comments at the top of this Makefile before writing yours. A few of the key variables that you may set are:

    SRCS :

    Listing of sources.

    KMOD :

    Name of module to build.

    These are just a couple of a handful of useful variables. Examples of using a Makefile with a ".include" can be seen in the skeleton pieces of code below. My suggestion is to make use of them. There is no real sense in reinventing the wheel, especially for something as trivial as a Makefile.


    KLD Syscall Implementation Skeleton

    The following is a very generic example of how to create and add a syscall via the dynamic kernel linker interface. There are a few important pieces to creating a syscall that are, again, basically generic to all modules that add a syscall. There are four main parts, besides the load handler and the *_MODULE macro, that must be fulfilled:

  • Declaring the syscallname_args structure.
  • A function that is static and returns int that will be the syscall.
  • Filling the sysent structure according to our syscall.
  • Setting our 'offset' variable to NO_SYSCALL.

    For all syscalls, the parameter list seen in the kernel code is as follows:

  • struct proc *
  • struct syscallname_args *

    The parameters that one would pass the syscall from userland are defined in the syscallname_args structure. The reason why we can call the syscall by these parameters and not pass it a pointer to a proc structure and a pointer to the arguments structure is because of libc's work. Since we are dynamically adding a syscall, and are not adding the calling functionality to libc, we must use syscall(2) to call our new syscall. This will be explained a bit more as the example grows.

    For this example, we will have the following syscall arguments:

    
    struct sc_example_args {
            char *str;
            int val;
    };
    

    I am including an integer and a pointer to a character string so that we may see how both are used (ie. user->kernel lands and vice versa).

    The following is the example syscall:

    
    static int
    sc_example(struct proc *p, struct sc_example_args *uap)
    {
            char kstr[1024+1];      /* Holds kernel land copy of uap->str */
            int err = 0;            /* Generic return(err) */
            size_t size = 0;        /* Bytes copied, including the NUL */

            /*
             * _IMPORTANT_:
             *
             * When one has a contiguous set of data and wishes to copy it
             * from userland to kernel land (or vice versa), the copy(9)
             * functions are recommended for doing this.
             */

            /*
             * Copy the string located at the user land address uap->str to
             * the kernel land buffer kstr.
             */
            err = copyinstr(uap->str, kstr, 1024, &size);
            if (err != 0)           /* EFAULT or ENAMETOOLONG */
                    return(err);

            /*
             * Print out the values we have gathered.
             *
             * uprintf() is a kernel land function that acts like printf().
             * When using printf() in kernel land, it uses the dmesg
             * facility; uprintf(), on the other hand, will output directly
             * to the currently used tty.
             */
            uprintf("The string passed was: %s\n", kstr);
            uprintf("The value passed was: %d\n", uap->val);
            return(0);
    }
    

    This function just takes the parameters passed to it (a character string and an integer) and displays them on the tty currently in use (the terminal that is running the program that called the syscall).

    The next thing we do in our code is fill in a sysent structure for our system call. The sysent structure, defined in /usr/include/sys/sysent.h, is the following:

    
      struct sysent {
              int sy_narg;
              sy_call_t *sy_call;
      };
    

    There is a sysent structure defined for each system call. 'int sy_narg' is the variable that defines how many parameters are passed to the system call being defined. In the case of our skeleton code, we have 2 parameters being passed: char *str and int val. Therefore, we will set sy_narg to 2. 'sy_call_t *sy_call' is a function pointer to our static int system call. sy_call_t, defined in /usr/include/sys/sysent.h, is actually the following:

    
      typedef int sy_call_t __P((struct proc *, void *));
    

    So, in our code we will have the following:

      static struct sysent sc_example_sysent = {
              2,              /* Number of parameters for our system call. */
              sc_example      /* A function pointer to our new system call. */
      };

    Now, if you remember from above, there is a parameter passed to the SYSCALL_MODULE macro called the offset. This parameter holds the number by which the syscall being declared will be called. However, as mentioned above, when a system call is implemented via something dynamic like a KLD, it is usually not good practice to assign a designated slot value. What one _should_ do is set the offset value to NO_SYSCALL. This says: "find the next available system call number." The macro is handed the address of the offset, so we pass a pointer to a static int variable that is set to NO_SYSCALL. When the module is loaded, the registration code behind the macro (not _really_ the macro itself) overwrites that variable with the system call number we have been assigned, so that our load handler can find out what it is. As a quick note, the list of already implemented system calls and their numbers is in /usr/include/sys/syscall.h. So, our next line of code should simply look like:

    
      static int syscall_num = NO_SYSCALL;
    

    One should note that NO_SYSCALL is defined in /usr/include/sys/sysent.h and is actually the value -1.
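
    Why a pointer to a variable is passed, rather than the constant itself, can be seen in a small userland model of slot assignment. This is a sketch under stated assumptions only: the eight-entry table, MOCK_NO_SYSCALL, and mock_syscall_register() are hypothetical and do not reproduce the kernel's actual bookkeeping.

```c
#define MOCK_NO_SYSCALL	(-1)	/* hypothetical mirror of NO_SYSCALL */
#define MOCK_NSLOTS	8	/* tiny table, for illustration only */

/* 1 = slot already taken by an existing system call, 0 = free. */
static int mock_slot_used[MOCK_NSLOTS] = { 1, 1, 1, 0, 1, 0, 0, 0 };

/*
 * If *offset is MOCK_NO_SYSCALL, claim the first free slot and write
 * its index back through the pointer; otherwise claim the requested
 * slot.  Returns 0 on success, -1 if no suitable slot is available.
 */
int
mock_syscall_register(int *offset)
{
	int i;

	if (*offset == MOCK_NO_SYSCALL) {
		for (i = 0; i < MOCK_NSLOTS; i++)
			if (!mock_slot_used[i])
				break;
		if (i == MOCK_NSLOTS)
			return (-1);
		*offset = i;
	} else if (*offset < 0 || *offset >= MOCK_NSLOTS ||
	    mock_slot_used[*offset]) {
		return (-1);
	}
	mock_slot_used[*offset] = 1;
	return (0);
}
```

    Because the caller's variable is updated in place, the code that requested "any free slot" can afterwards read back which slot it was actually given.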

    We have already completed the necessary parts for implementing a system call, therefore, all we have left to do is write our load handler and call the SYSCALL_MODULE macro.

    
      static int
      load_handler(struct module *m, int what, void *arg)
      {
              int err = 0;

              switch (what) {
              case MOD_LOAD:
                      /* Print out syscall_num so we know the value to call */
                      printf("System call loaded at slot: %d\n", syscall_num);
                      break;
              case MOD_UNLOAD:
                      printf("System call unloaded from slot: %d\n", syscall_num);
                      break;
              default:
                      err = EINVAL;
                      break;
              }
              return(err);
      }

      SYSCALL_MODULE(sc_example,
                     &syscall_num,
                     &sc_example_sysent,
                     load_handler,
                     NULL);
    

    Now our skeleton is complete (code without textual comments can be found at [ref. 4]) and is ready to compile. Our Makefile should simply be:

    SRCS=sc_example.c
    KMOD=sc_example

    .include <bsd.kmod.mk>

    Just type `make` and after it compiles, as root, just do: `kldload ./sc_example.ko`. This will output the following (as an example):

     System call loaded at slot: 210
    

    What is not discussed here is how to call these syscalls after they have been loaded. However, in the code provided by [ref. 4], there are a couple of examples on different ways to accomplish the calling of the function. It is also recommended that you read syscall(2).
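
    As a starting point, a userland caller might look like the sketch below. It assumes the slot number is taken from the message printed at load time; parse_slot() and call_sc_example() are hypothetical helper names, and the only system interface used is syscall(2) itself.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Parse the slot number printed at kldload time (for example "210").
 * Returns 0 and stores the value in *slot, or -1 if the text is not a
 * non-negative decimal integer that fits in an int.
 */
int
parse_slot(const char *text, int *slot)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno != 0 || end == text || *end != '\0' ||
	    val < 0 || val > INT_MAX)
		return (-1);
	*slot = (int)val;
	return (0);
}

/*
 * Invoke the new system call through syscall(2), passing the two
 * members of struct sc_example_args in declaration order.
 */
int
call_sc_example(int slot, const char *str, int val)
{
	return ((int)syscall(slot, str, val));
}
```

    With the module loaded at slot 210, as in the sample output above, call_sc_example(210, "hello", 42) would be expected to make the kernel print both values on the calling terminal. Only call it with a slot number that the load handler actually reported.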


    Character Device KLD Implementation Skeleton

    A useful type of device on most any UNIX system is a character device. These are used, not really to represent a physical device, but rather to act as an interface to either read, write, set flags, etc on something specific to the kernel. For example, we could set up a character device so that we may read data from this device related to network traffic. Like the system call implementation discussion, this will discuss, step-by-step how to program a generic KLD that creates a character device that has some minor functionality. Hopefully, you will come to grasp that it is not too hard to create a character device and that they are quite useful.

    The following things are usually common to all character device implementations:

  • Prototype device functions.
  • Declaring the cdevsw structure.
  • Functions that follow the declared cdevsw structure.
  • Load handler and DEV_MODULE macro.

    These are what make up the basic skeleton code that will be shown below. Remember, to get the actual code without textual comments, please go to [ref. 4]. Compiling, running, and seeing that the code works helps out a great deal ;)

    The functions we are prototyping here are to be the only calls available to directly access our new device. They will be included in the cdevsw structure below. For our example, we will have four functions: example_open, example_close, example_read, and example_write.

    
      d_open_t 	example_open;
      d_close_t	example_close;
      d_read_t	example_read;
      d_write_t	example_write;
    

    The cdevsw structure defines a great deal of things regarding the character device we are implementing. The structure, (aka the character device switch table) as defined in /usr/include/sys/conf.h, is as follows:

    
      struct cdevsw {
              d_open_t *d_open;	     /* Func. pointer to dev open function */
              d_close_t *d_close; 	     /* Func. pointer to dev close function */ 
              d_read_t *d_read;	     /* Func. pointer to dev read function */
              d_write_t *d_write; 	     /* Func. pointer to dev write function */ 
              d_ioctl_t *d_ioctl;	     /* Func. pointer to dev ioctl function */
              d_poll_t *d_poll;	     /* Func. pointer to dev poll function */
              d_mmap_t *d_mmap;	     /* Func. pointer to dev mmap function */
              d_strategy_t *d_strategy;  /* Func. pointer to dev strategy func. */
              const char *d_name;	     /* Device name in /dev */
              int d_maj;		     /* Device major value */
              d_dump_t *d_dump;	     /* Func. pointer to dev dump function */
              d_psize_t *d_psize;	     /* Func. pointer to dev psize function */
              u_int d_flags;	     /* D_TAPE, D_DISK, D_TTY, D_MEM */
              int d_bmaj;	      /* Block Device major value (used by D_DISK) */
      };
    

    Not all the function pointers have to be defined. Why is this? Suppose you wanted a write-only device: besides setting the file permissions, you would leave out d_read when declaring the cdevsw structure by filling that slot with noread. In our example, we will provide only d_open, d_close, d_read, and d_write functions so that we may simplify our discussion. Our cdevsw structure looks like:

    
      static struct cdevsw example_cdevsw = {
              example_open,
              example_close,
              example_read,
              example_write,
              noioctl,
              nopoll,
              nommap,
              nostrategy,
              "example",
              33,			/* /usr/src/sys/conf/majors */		
              nodump,
              nopsize,
              D_TTY,
              -1
      };
    
    

    So, as you can tell from the "no*" declarations, we will only have functions for d_open_t, d_close_t, d_read_t, and d_write_t. For another example of character device code, please refer to [ref. 5], which also provides example functions for d_open_t, d_close_t, d_read_t, and d_write_t. Also, please note the use of 33 for the major value. 33 is one of the majors reserved for example uses. Please check /usr/src/sys/conf/majors for other example majors as well as those reserved for real purposes.
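
    The idea behind the "no*" entries, a table of function pointers in which unsupported operations point at a refusing default instead of being left empty, can be tried out in userland. The following sketch is a much reduced model: struct mock_devsw and all mock_* functions are hypothetical, and ENODEV is simply the refusal code chosen here; consult sys/conf.h for what the real defaults do.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, much reduced model of a device switch table. */
typedef int mock_rw_t(char *data, size_t len);

struct mock_devsw {
	mock_rw_t	*d_read;
	mock_rw_t	*d_write;
};

/* Stand-in for the "no*" defaults: always refuse the operation. */
static int
mock_norw(char *data, size_t len)
{
	(void)data;
	(void)len;
	return (ENODEV);
}

/* A write handler that accepts anything. */
static int
mock_write(char *data, size_t len)
{
	(void)data;
	(void)len;
	return (0);
}

/* Write-only device: the read slot holds the refusing default. */
static struct mock_devsw mock_wronly = {
	mock_norw,
	mock_write
};

/* The caller never tests for NULL; it just calls through the table. */
int
mock_dev_read(struct mock_devsw *sw, char *data, size_t len)
{
	return (sw->d_read(data, len));
}

int
mock_dev_write(struct mock_devsw *sw, char *data, size_t len)
{
	return (sw->d_write(data, len));
}
```

    Filling every slot keeps the dispatch code free of special cases: an unsupported operation is just another function that happens to return an error.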

    The idea behind this example is to show some interaction with the device driver. Therefore the flow of the code that this driver is aimed at is as follows:

    
    	open(2) -> write(2) -> read(2) -> close(2).
    

    We will first open the device in the /dev/ directory; then we will write a small string via the write(2) call. This string we write to the device will be stored in a static buffer, and later will be accessible via the read(2) call. Finally, we will close(2) our open()'d device so that we may no longer make read or write calls on it.

    
      /* Stores string recv'd by _write() */
      static char buf[512+1];
      static size_t len;      /* size_t, as copyinstr(9) expects */
    
      /* 
       * Used as the variable that is the reference to our device
       * in devfs... we must keep this variable sane until we 
       * call kldunload.
       */
      
      static dev_t sdev;
    
      /*
       * This open function ignores the open(2) flags and simply resets
       * the buffer.  A stricter driver could inspect oflags here to
       * allow, for example, read-only access.
       */
    
      int 
      example_open(dev_t dev, int oflags, int devtype, struct proc *p)
      {
              int err = 0;
    
              memset(&buf, '\0', 513);
              len = 0;
              uprintf("Opened device \"example\" successfully.\n");
              return(err);
      }
    
      /*
       * Simply "closes" our device that was opened with example_open.
       */
    
      int 
      example_close(dev_t dev, int fflag, int devtype, struct proc *p)
      {
              memset(&buf, '\0', 513);
              len = 0;
              uprintf("Closing device \"example.\"\n"); 
              return(0);
      } 
    
      /*
       * The read function just takes the buf that was saved 
       * via example_write() and returns it to userland for
       * accessing.
       */
    
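      int
      example_read(dev_t dev, struct uio *uio, int ioflag)
      {
              int err = 0;

              if (len <= 0) {
                      err = EINVAL;   /* nothing has been written yet */
              } else {
                      /*
                       * The destination is a userland address, so use
                       * copyout(9) rather than copystr(9).
                       */
                      err = copyout(buf, uio->uio_iov->iov_base,
                          MIN(len, uio->uio_iov->iov_len));
              }
              return(err);
      }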
    
      /*
       * example_write takes in a character string and saves it
       * to buf for later accessing.
       */
      
      int
      example_write(dev_t dev, struct uio *uio, int ioflag)
      {
              int err = 0;
      
              err = copyinstr(uio->uio_iov->iov_base, &buf, 512, &len);
              if (err != 0) {
                    uprintf("Write to \"example\" failed.\n");
              }
              return(err);
      }
    

    So, now as you can see, implementing simple character device driver code is fairly easy to do. It's a nifty way of passing information in and out of kernel land when there is more to do than what a sysctl can offer.

    Below is our code for the function that handles the loading and unloading of our actual KLD. For device drivers, we must do one thing specific to load and unload. On MOD_LOAD, we must register our device with devfs using make_dev. devfs is the Device File System which provides access to the device namespace in the FreeBSD kernel. And on MOD_UNLOAD, we must call destroy_dev, using the dev_t variable that was returned from make_dev as the sole parameter.

     
      /*
       * chardev_example_load()
       *
       * This is used as the function that handles what is to occur
       * when the KLD binary is loaded and unloaded via the kldload
       * and kldunload programs.
       */
     
      static int
      chardev_example_load(struct module *m, int what, void *arg)
      {
              int err = 0;

              switch (what) {
              case MOD_LOAD:          /* kldload */
                      sdev = make_dev(&example_cdevsw,    /* explained above */
                                      0,
                                      UID_ROOT,
                                      GID_WHEEL,
                                      0600,
                                      "example");
                      printf("Example device loaded.\n");
                      break;
              case MOD_UNLOAD:        /* kldunload */
                      destroy_dev(sdev);                  /* explained above */
                      printf("Example device unloaded.\n");
                      break;
              default:
                      err = EINVAL;
                      break;
              }
              return(err);
      }
    

    As with any KLD, we must have a *_MODULE macro that basically says which function is our load handler and a name for our kld for reference purposes.

    
      DEV_MODULE(chardev_example, chardev_example_load, NULL);
    

    Now our very simple character device skeleton is complete. All that remains is to create a Makefile and to actually create the file for our device in the /dev directory. This is very easy to do:

    
     # cd /dev
     # mknod example c 33 0 
     # ls -al  | grep example
     crw-r--r--  1 root  wheel   33,   0 Aug 14 04:40 example
     #
    

    Now, after kldload'ing, the open(), close(), read() and write() calls will work on /dev/example. However, remember to close() the device you are working with prior to kldunload'ing... or else ;)
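
    The open, write, read, close flow described earlier could be driven from userland with something like the sketch below. example_roundtrip() is a hypothetical helper name; it assumes the device node exists and that the caller has permission to open it read-write.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Open the device, write msg (with its terminating NUL), read the
 * stored string back into out, and close the device again.
 * Returns 0 on success and -1 if any step fails.
 */
int
example_roundtrip(const char *path, const char *msg, char *out, size_t outlen)
{
	int fd, ok;

	fd = open(path, O_RDWR);
	if (fd == -1)
		return (-1);

	ok = write(fd, msg, strlen(msg) + 1) != -1 &&
	    read(fd, out, outlen) != -1;

	/* Always close, even after a failed write or read. */
	if (close(fd) == -1)
		ok = 0;
	return (ok ? 0 : -1);
}
```

    Run as root after kldload, example_roundtrip("/dev/example", "hello", out, sizeof(out)) should exercise all four driver entry points. The terminating NUL is written deliberately, because example_write() relies on copyinstr().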

    Please see the example code in [ref. 4] for compilable code to play with.


    Conclusion

    This concludes the introduction to the FreeBSD KLD coding system. As stated in the introduction, it was meant to be fairly brief, yet to inform those who wish to write KLDs but currently cannot. The paper produced by THC [ref. 2] is an excellent place to look for more in-depth KLDs, with a black hat touch. Also, for more examples, please look at [ref. 4]; it contains others besides the ones explained in this tutorial.

    Contact

    Feel free to contact the author regarding _any_ piece of this tutorial.

    E-Mail: awr@blackops.org

    
    

    References

    1. kld(4) man page.
      [Much help]
    2. THC's FreeBSD Kernel Attack paper.
      [Good place for taking your white hat and turning it black.]
    3. /usr/share/mk/*
      [Key for any Makefile creation under FreeBSD]
    4. Example code from tutorial
      [Location of the examples plus more code.]
    5. /usr/share/examples/kld/cdev/
      [Old example in FreeBSD tree]

    Recognition

    Peter Wemm 	- Discussion regarding LKM->KLD changes + long quote [intro]
    Eivind Eklund 	- style(9) harassment.
    Daniel O'Connor - Random comments.
    


    Author maintains all copyrights on this article.
    Images and layout Copyright © 1998-2006 Dæmon News. All Rights Reserved.