DæmonNews: News and views for the BSD community


C BSD Run

Matthew Alton, Matthew.Alton@anheuser-busch.com

The immortal Isaac Asimov on at least one occasion responded to an obvious question with a seemingly paradoxical answer. Dr. Asimov, who held a Ph.D. in biochemistry, was asked why, when he had written literally hundreds of expository essays and books on such diverse topics as theoretical physics, computer science, and psychology, he had not seen fit to write on his chosen field. He answered, "It is too difficult. I know too much about it." Strange, indeed. One might well assume that biochemistry would be the simplest subject in the world for Asimov to exposit. Alas, such is not the case. As anyone who has attempted such a feat can attest, it is damnably difficult.

The act of writing popular material on one's principal area of expertise is fraught with travails, not the least of which may be termed the "terminology trap." Our dauntless author enters the terminology trap at about the time that, fresh from choosing a starting point from an imposingly vast array of alternatives by means of a series of coin tosses, he begins to attempt to explain some facet of his subject in sufficiently simple terms only to discover that his work is replete with arcane words and phrases. Each of these is so thoroughly etched upon our author's mind by the force of years-long habit that he simply cannot notice them flowing onto the page. The truly insidious nature of the trap becomes evident as he begins to attempt to backtrack and define each of the offensive gobbets, only to discover yet again that he has used even more of them in the effort. It is easy to become discouraged in such straits. My present effort to restart my monthly column here at Daemon News has, to my palate, much the same flavor as Asimov's conundrum.

Asimov eventually managed to write some fine popular pieces on biochemistry in spite of the difficulty. Perhaps, encouraged by his example, I may meet with some success here as well. After all, it's not as though my challenge is on the same order of magnitude. I am only a professional UNIX systems administrator specializing in systems programming -- hardly a Ph.D. Also, I do not intend to write strictly popular columns. A certain amount of computer science knowledge is assumed of the reader, along with a proficiency in the C programming language and a familiarity with the UNIX operating system at least insofar as it functions as a development environment. A respectable tolerance for florid prose reminiscent of the Victorian Era is, as you have already deduced, oddly expected as well. My brain constructs long, breathless sentences full of commas and subclauses of its own accord. I hope that they will suit.

To business. I intend, rather than merely to explain portions of existing code which I already fully understand and which require little if anything in the way of bazaar-style evolution, to assist in creating something entirely new. I say "assist" because I am hoping for the help of my readers. In the process I hope to learn constantly. The fruits of our labors will be a library of useful subroutines and a collection of utilities suitable for use in a production environment. This software will not be a mere reimplementation of existing capabilities, but will serve to fill existing gaps in open source functionality. We will "scratch new itches" as the phrase turns. We will also form a collection of preferred practices and methods gathered from all useful sources without regard to non-technical considerations. We should constantly evaluate these methods and hone them by means of testing and discussion. We will also maintain a body of resources useful to UNIX/C programming and to software engineering in general.

To begin with, I submit for your perusal an example of the sort of programming ubiquitous in the UNIX world, the quick-and-dirty utility designed to scratch an immediate and irritating itch.

First the itch. I quite often write programs which need to read configuration files of the ASCII text variety plentiful in UNIXland. I therefore quite often write code to open, read, parse and close these files -- precisely once too often, as a matter of fact. And so it is that I have taken it upon myself to write, once and for all, a library of subroutines designed to handle, with the enormous efficiency and robustness for which good library code is justly noted, not only my immediate needs but those constituting a reasonably general case.

The software should be able to parse text formats more sophisticated, and commensurately more powerful, than the mundane, terse, one-line-per-entry columnar variety such as, say, /etc/services. The reason for this is that, especially as more of my code morphs to the multithreaded variety, far more fields are needed than can be accommodated in an 80-column text display. I have chosen a format based on the "paragraph" style used throughout IBM's AIX operating system. Basically, the format is similar to the monopolyware .ini style without the useless brackets. An example of a well-formatted configuration file conforming to my specification is here.

As far as the parser is concerned, there are three types of lines in a file: 1) white noise such as blank lines, lines consisting of spaces and tabs only, and comment lines; 2) stanza labels, which necessarily begin in column one and end with a colon, with trailing whitespace ignored; 3) entries, logically grouped with the closest previous valid stanza label, consisting of a name and a value separated by one or more spaces or tabs, exactly one equal sign, and another one or more spaces or tabs. White noise is summarily ignored without any effect on parser state. Stanza labels serve to delimit and to name stanzas. Entries form the stanza bodies and are stanza attributes. Lines containing entries must have a space or tab character in the first column. It is worth noting that we do not allow trailing comments, i.e. those occurring after stanza labels and entries and before the newline character. Allowing for trailing comments considerably complicates parsing without compensating benefit.
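The three line types described above can be sketched as a small classifier. This is only an illustration of the rules as stated, not the code from stzck: the names classify_line and line_type are hypothetical, the use of '#' as the comment character is an assumption (the specification above does not name one), and the first properly spaced equal sign is taken to be the separator.

```c
#include <stdio.h>
#include <string.h>

/* The three line types from the text, plus one for malformed lines. */
enum line_type { LT_NOISE, LT_LABEL, LT_ENTRY, LT_ERROR };

/*
 * classify_line()
 * Classify one NUL-terminated line; a trailing newline is optional.
 * Assumption: '#' as the first non-blank character marks a comment.
 */
enum line_type
classify_line(const char *line)
{
	size_t len = strlen(line);
	const char *p, *eq;

	/* Trailing whitespace, including the newline, is ignored. */
	while (len > 0 && strchr(" \t\r\n", line[len - 1]) != NULL)
		len--;

	/* 1) White noise: blank, whitespace-only, or comment lines. */
	p = line + strspn(line, " \t");
	if (len == 0 || *p == '#')
		return LT_NOISE;

	/* 2) Stanza label: begins in column one and ends with a colon. */
	if (line[0] != ' ' && line[0] != '\t')
		return line[len - 1] == ':' ? LT_LABEL : LT_ERROR;

	/* 3) Entry: blank in column one, then name, blanks, '=', blanks, value. */
	eq = memchr(line, '=', len);
	if (eq == NULL)
		return LT_ERROR;
	if (eq[-1] != ' ' && eq[-1] != '\t')
		return LT_ERROR;	/* no blank before the equal sign */
	if (eq + 1 != line + len && eq[1] != ' ' && eq[1] != '\t')
		return LT_ERROR;	/* a value follows with no blank */
	return LT_ENTRY;
}
```

Under these assumptions, "default:" classifies as a stanza label, a tab followed by "port = 8080" as an entry, and the same entry typed in column one as an error. Null names and values pass, since trailing whitespace is trimmed before the checks.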

My method of software development is quite informal and, I think, quite typical. I first kludge up a funny-looking prototype in the form of a program which seems to do the job. I take pains even at this early stage, however, to observe at least the major niceties such as consistent coding style and correct buffer handling. We should try to deny bad habits a chance of taking even shallow root. My quick-and-dirty stanza parser is here. I called the utility "stzck" out of a healthy respect for the UNIX tradition of terseness and in honor of the venerable "fsck" utility which checks filesystems for consistency. Stzck, unlike fsck, does not offer to repair discovered errors.

On to the code. Lines 5-7 follow a practice common in BSD code. We want to make the RCS ident tags available to us while in a debugger and visible to such utilities as strings(1), but we would like to avoid complaints from static code evaluators like lint(1) about unused variables. Fortunately, lint defines the cpp macro LINT during its execution so that the #if wrappers work to eliminate the problem. Note that "#ifdef", though in heavy use, is non-portable. Lines 9-23 are a fairly standard and self-explanatory comment block. Note the convention of placing the function name left-justified on a new line so that a quick egrep(1) of "^foo()" from *.c in a source directory will quickly turn up the file containing the function body. Without this, we can only grep(1) foo() and come up with external declarations, invocations, and everything else -- quite a mess in some cases.

There are some nasty things in this code, including a so-called "magic number" at line 31. The "128" line buffer size is utterly arbitrary, being the smallest power of two not smaller than the estimated reasonable case -- 80 (columns) here. This number should certainly be factored out to a header #define or, better still, a global enum so that we can print its value in a debugger. The real problem with magic numbers is that they manifest in evil ways such as the reappearance on line 187, this time as "127" owing to the fact that I've allowed room for the NUL string terminator. All this will have to be cleaned up. Overall, though, not a bad effort for a straight-through write-up. I would be surprised, however, if there were not at least a few bugs lurking in there somewhere. I've only tested this code very lightly.
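To make the magic-number cleanup concrete, here is a minimal sketch of what factoring the buffer size into an enum might look like. It is not taken from stzck: the names STZ_LINE_MAX and read_line are hypothetical, and the handling of over-long lines (discard the remainder and report an error) is an assumption of mine. The function name sits left-justified on its own line, per the convention noted above.

```c
#include <stdio.h>
#include <string.h>

/* One named constant in place of scattered 128s and 127s (hypothetical name). */
enum { STZ_LINE_MAX = 128 };

/*
 * read_line()
 * Read one line from fp into buf, which must hold STZ_LINE_MAX bytes.
 * The newline is stripped. Returns 1 on success, 0 at end of file,
 * and -1 if the line did not fit, in which case its remainder is discarded.
 */
int
read_line(FILE *fp, char *buf)
{
	size_t len;
	int c;

	/* fgets() itself reserves one byte for the NUL terminator. */
	if (fgets(buf, STZ_LINE_MAX, fp) == NULL)
		return 0;

	len = strlen(buf);
	if (len > 0 && buf[len - 1] == '\n') {
		buf[len - 1] = '\0';
		return 1;
	}
	if (len < (size_t)STZ_LINE_MAX - 1)
		return 1;	/* final line of the file, no newline */

	/* Buffer is full: the line fits only if it ends right here. */
	c = getc(fp);
	if (c == EOF || c == '\n')
		return 1;
	while (c != EOF && c != '\n')
		c = getc(fp);	/* skip the rest of the long line */
	return -1;
}
```

A caller declares the buffer as char buf[STZ_LINE_MAX], so the size appears in exactly one place and the "127" never needs to be written by hand.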

This code builds cleanly on FreeBSD 4.3 and Slackware Linux 8.0. You are heartily encouraged to compile and test this code. I am interested in any and all comments. Bug reports, indentation style comments, optimizations, complete redesigns, rewrites, all are welcome. I would also very much appreciate input on the program specification itself. I have erred exclusively on the side of permissiveness. Null stanza labels, entry names and values are permitted, for instance. Should the design be more restrictive? Why?

In future columns, I hope to add routines to handle such things as daemonizing, error reporting and network connections as part of a Daemon News software library constructed of, by and for the readers of Daemon News. All reader input of any appreciable import to this project will be properly credited both in the column and in the source code as appropriate. Help make the Daemon News Library all that it can be. Next month, we'll explore methods of turning the stanza parser into a library with an opaque API. Until then, happy hacking!






    Author maintains all copyrights on this article.
    Images and layout Copyright © 1998-2006 Dæmon News. All Rights Reserved.