DæmonNews: News and views for the BSD community

Trawling the Ports Collection

Greg Lehey <grog@lemis.com>

Last month we looked at how the Ports Collection can give you a more comfortable environment in which to work. One of the things I mentioned there was the issue of printed documentation for the ports. As I said then,

You still need to print it out, of course. That's straightforward enough if you have a PostScript printer, but otherwise you'll need ghostscript, which we'll look at some other time.

Let's start that other time now. This month we'll look at how to print some things with the aid of the Ports Collection. Owing to the sheer number of possibilities, we won't get done this time.

Document formats

The documentation you get nowadays can come in a number of formats. In rough order of increasing desirability, you might find:

  • Plain text. For small programs it's bearable, but it's so easy to do better.

  • HTML gives much better markup than plain text, but it's also sensitive to browser settings, and it sometimes looks really bad. You need a web browser to read it, of course, but since you're reading this, you have obviously already solved that problem.

  • ``man pages''. Strictly speaking, man pages are documents in groff source form, specifically those to be formatted with the man or mdoc macros. They're okay up to a point, but are frequently too long to read online. When you print them out, they tend to come out as plain text. We'll look at how to do better below.

  • GNU info files. These are designed to be processed in two different ways: for viewing online (processed by makeinfo) and for printing (processed by TeX with the texinfo macros). Compared to man pages they have the advantage of hyperlinks, but the format hasn't really caught on outside the GNU project.

  • Documents in groff source form. At a certain level, they're identical to man pages, but you don't have a convenient way to view them online. We'll look at them below as well.

  • Documents in TeX or LaTeX. These bear the same relationship to info that groff does to man pages. TeX is another can of worms, so I'll defer its discussion to another article.

  • Documents in SGML, nowadays usually DocBook. They are becoming increasingly popular, but formatting them still seems like pulling teeth. I'll defer this one as well.

  • Documents in PostScript. These can be very nicely formatted, but they're strictly read-only documents. It's possible to change PostScript if you really want, but it falls in the category of ``don't try this at home''. Printing PostScript is simple if you have a PostScript printer. Do you? Probably not: they're much more expensive than normal laser printers, and there's a good way around it that we'll look at in the section on Ghostscript.

  • PDF is a development of PostScript. The big difference for Microsoft users is that they can download the Acrobat Reader and read the documents. We can do that too; we'll discuss it below.

Formatting groff documents

Unlike most text formatting packages, groff is part of the base system, since it's needed for man pages. You can format a man page for printing pretty easily:

$ man bash | lpr

That will give you 5144 lines of plain text without page breaks, and even the primitive markup that man offers will get lost. It looks a lot better if you print in PostScript, which you can do by specifying the -t option:

$ man -t bash | lpr

What you get out of the printer depends on the printer. If it understands PostScript, you'll get a nicely printed man page. If it doesn't, you'll get something which starts like this:

%!PS-Adobe-3.0
%%Creator: groff version 1.17.2
%%CreationDate: Sun Apr 28 14:10:19 2002
%%DocumentNeededResources: font Times-Roman

If that's the case, we'll look at a solution in the section on Ghostscript.

If your groff source isn't a man page, you first need to know which macros the source file wants. You should be able to tell that by the name: by convention, the last few letters in the name of a groff source file are the name of the macro package it expects. For example, you'd format the file /usr/src/usr.bin/gprof/PSD.doc/abstract.me with the me macros:

$ nroff -me abstract.me 
gprof: a Call Graph Execution Profiler1

by Susan L. Graham Peter B. Kessler Marshall K. McKusick

Computer Science Division Electrical  Engineering  and  Com-
puter  Science Department University of California, Berkeley
Berkeley, California 94720
(etc)

Use nroff to create plain text and groff to create PostScript output. groff also has a large number of other knobs to tweak, but there isn't room in this article to go into them. The good news is that the documentation for troff, which used to be subject to an AT&T license, is now free and should become available in the BSD distributions soon.
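The suffix convention described above is easy to turn into a small helper. What follows is only a sketch: the function name macro_flag is made up for this illustration, and the list covers just a few common macro packages, so extend it to taste.

```shell
#!/bin/sh
# macro_flag is a hypothetical helper, not part of groff.
# It prints the macro option suggested by a file name's suffix.
macro_flag() {
    case "$1" in
        *.me)  printf '%s\n' "-me" ;;
        *.ms)  printf '%s\n' "-ms" ;;
        *.mm)  printf '%s\n' "-mm" ;;
        *.man) printf '%s\n' "-man" ;;
        *)     echo "macro_flag: cannot guess macros for $1" >&2
               return 1 ;;
    esac
}
```

With this in place, something like groff $(macro_flag abstract.me) abstract.me > abstract.ps should leave a PostScript version of the document in abstract.ps, since groff produces PostScript unless told otherwise.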

Ghostscript

One of the problems with most formatting programs is that they produce output in PostScript. At the very least you'll want to look at the documents before you print them, so one way or another you're going to need ghostscript.

There are a number of things that you can do with ghostscript:

  • You can use it to display PostScript documents on the screen.

  • You can run it as a print filter to print PostScript on an ordinary printer.

  • You can use it to convert between PDF and PostScript.

Displaying PostScript

You can use ghostscript directly to display PostScript on your screen, but it's rather a complicated tool. Two wrappers are available to make life easier: ghostview and gv. ghostview is the older of the two, and it has a relatively primitive-looking interface. gv looks more professional. It's not until you use both tools that you discover that you can do more with ghostview.

Both tools work in pretty much the same way: you start them either from the command line or from a menu. The PgUp and PgDn keys move between pages. You can also enlarge sections of the image. With ghostview you just mouse click on the area you're interested in; the left button gives the least magnification, the right button the most. With gv you need to press the middle button and then drag through a menu to get the magnification you want. You can also magnify or reduce the entire document, and of course you can print individual pages.

Printing with ghostscript

ghostscript has a rather annoying syntax, and in general you don't want to run it from the command line. As a print filter, you'd put it in your lpd configuration. For example, I have an Epson Stylus 740 colour printer to which I want to print web pages with Netscape Communicator. Netscape produces PostScript output. The entry in /etc/printcap looks like this:

c|ep740ps|epson740ps|Epson Stylus Color 740 with PostScript:\
        :lp=/dev/lpt0:sd=/var/spool/output/lsp:lf=/var/log/lpd-errs:sh:mx#0:\
        :if=/usr/local/libexec/psfilter2:

Most of the magic here is in the filter /usr/local/libexec/psfilter2, which looks like this:

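#!/bin/sh
# Consume the first line of the job; it is not passed on to gs.
read first_line
# Convert the rest of the job and pipe the result to the print queue "ls".
/usr/local/bin/gs @stc500ph.upp -q -sPAPERSIZE=a4 -sOutputFile="|lpr -Pls" -dNOPAUSE - -c quit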

This tells ghostscript to use the definition file stc500ph.upp (part of the ghostscript distribution) to convert from PostScript to something the printer can understand.
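The filter above reads the first line of the job but never looks at it. One plausible use for that line, offered here as an assumption and not as a description of the real filter, is to decide whether the job is PostScript at all: PostScript files start with the characters %!. The helper name job_type is invented for this sketch.

```shell
#!/bin/sh
# job_type is a hypothetical helper: classify a print job by its first line.
# PostScript jobs start with "%!"; anything else is treated as plain text.
job_type() {
    case "$1" in
        '%!'*) printf '%s\n' postscript ;;
        *)     printf '%s\n' text ;;
    esac
}
```

In a filter you could read the first line with IFS= read -r first_line and then branch on the output of job_type "$first_line", sending PostScript jobs through gs and copying anything else straight to the printer. Bear in mind that the line consumed by read has to be written back in front of the rest of the job if the program downstream needs to see it.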

Converting to PDF

Although ghostscript handles mainly PostScript, it also has some understanding of PDF, and it can convert between the two formats. The easiest way to do this is to use the scripts ps2pdf and pdf2ps which are supplied with ghostscript:

$ ps2pdf doc.ps doc.pdf

pdf2ps works in the same manner. Note that you can also use acroread to convert PDF into PostScript; it has this facility to enable it to print to PostScript printers.
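If you convert in both directions a lot, you can let the file name decide which script to run. This is a minimal sketch: convert_command is a made-up name, and it only prints the command it would run, so you can check it before trusting it with real files.

```shell
#!/bin/sh
# convert_command is a hypothetical helper. It prints (but does not run)
# the ps2pdf or pdf2ps command that converts the named file to the other format.
convert_command() {
    case "$1" in
        *.ps)  printf '%s\n' "ps2pdf $1 ${1%.ps}.pdf" ;;
        *.pdf) printf '%s\n' "pdf2ps $1 ${1%.pdf}.ps" ;;
        *)     echo "convert_command: $1 is neither .ps nor .pdf" >&2
               return 1 ;;
    esac
}
```

For example, convert_command doc.pdf prints pdf2ps doc.pdf doc.ps. File names containing spaces would need extra quoting before the printed command could be fed back to the shell.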

If you're creating text for a book or some such, you may find that ps2pdf is not specific enough for what you want to do. I use groff to format my book, ``The Complete FreeBSD''. Here's a slightly simplified version of the Makefile target for creating PDF versions of the chapters:

# Remake individual chapters in PDF.
${CHAPTERS:.mm=.pdf}: ${@:.pdf=.ps}
	$(GS) -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sPAPERSIZE=a4 \
		-sOutputFile=pdf/$@ -c save pop -f \
		ps/${@:.pdf=.ps}

In this Makefile, CHAPTERS is a list of the source file names (as you can guess, they use the mm macros), and GS is the name of the ghostscript executable. The PostScript versions are put in the subdirectory ps, and the PDF versions are put in the subdirectory pdf.
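If make's suffix substitutions look cryptic, it may help to see the same name juggling done in the shell. The sketch below is an illustration only, not part of the book's build: the function name pdf_rule and the chapter name in the example are assumptions, and the function prints the ghostscript command without running it.

```shell
#!/bin/sh
# Name of the ghostscript executable; override it in the environment if needed.
GS=${GS:-gs}

# pdf_rule is a hypothetical helper. Given a chapter source name such as
# intro.mm, it prints the gs command that the Makefile rule would run.
pdf_rule() {
    target=${1%.mm}.pdf          # same effect as ${CHAPTERS:.mm=.pdf}
    ps_file=${target%.pdf}.ps    # same effect as ${@:.pdf=.ps}
    printf '%s\n' "$GS -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sPAPERSIZE=a4 -sOutputFile=pdf/$target -c save pop -f ps/$ps_file"
}
```

Running pdf_rule intro.mm shows a command that reads ps/intro.ps and writes pdf/intro.pdf, which is what the Makefile does for each chapter in turn.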

Handling PDF with Acrobat Reader

Adobe charges real money for their Acrobat text processing software, and if that weren't enough, it only runs on Microsoft and Mac OS X. The reader is a different matter, though: you can download a Linux version for free. If you don't like the proprietary smell of Acrobat Reader, you don't have to use it: ghostview handles PDF as well.

Where are we now?

As you can see, text processing is a can of worms. I've barely scratched the surface here; we'll get back to the subject in the months to come, but for next month I plan to look at some more ``fun'' ports. I'm open to suggestions.

    Author maintains all copyrights on this article.
    Images and layout Copyright © 1998-2006 Dæmon News. All Rights Reserved.