DæmonNews: News and views for the BSD community

DOSSIER and the Meta Project (Part 2)

Rich Morin, <rdm@cfcl.com>


Last month, I discussed some problems with the current state of Free and Open Source documentation. I then sketched out how DOSSIER and the Meta Project hope to resolve some of these problems. This month, I will discuss the goals and design of an online Meta system.

Note: Much of this article is speculation; until such a system has been built, I won't really know how to build it! Nor can I promise any completion (or even starting) date for this work; we'll just have to see how much time I can free up from DOSSIER (:-).

System Overview

Like the Meta Demo (aka the "FreeBSD Browser"), Meta will have the basic function of accumulating and disseminating operating system metadata. The scope of the system will be dramatically larger, however, and many implementation details will be different:

  • Data Collection

    The demo uses data (e.g., file relationships) from static snapshots of released systems, supplemented by my own annotations. Meta will accept a continuous influx of data, including reports from systems (and humans!) in the field.

  • Data Format

    The demo uses an informal variant of XML which I call Ostensible Mark-up Language (OML). Meta will use some OML internally, but well-defined XML will be used for all "published" interfaces.

  • Data Storage

    The demo uses a Perl "tied hash" (backed by a dbm(3) file), faking the existence of multiple tables. Meta will use an object/relational database such as PostgreSQL, possibly augmented by other (e.g., graph-structured) databases.

  • Breadth of coverage

    The demo only covers the base FreeBSD and Mac OS X distributions. Meta will cover several OS variants and thousands of packages.

  • Cross-OS relationships

    The demo treats each OS separately. Meta will use information from other OSes to supplement its information on the "target" OS: a FILES reference for a BSD system may well be applicable to a Linux system.

  • Depth of coverage

    The demo treats documents and files as atomic items; for example, it does not support browsing of man pages, let alone source code. Meta will support hyperlinked browsing, where possible, of all items it covers.

  • Modularity

    The demo's CGI script performs both data retrieval and user interface duties. Meta will divide up these tasks, using an XML-based interface (e.g., SOAP) for inter-process communication.
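To make the storage and cross-OS points a bit more concrete, here is a minimal sketch in Python. It uses the standard library's sqlite3 module as a stand-in for PostgreSQL, and the `file_info` table, its columns, and the `describe()` helper are hypothetical names, not part of any existing Meta code. The query prefers an entry for the target OS and otherwise falls back to another OS's entry, reporting which OS the answer came from.

```python
import sqlite3

# Hypothetical schema: one description per (OS, path) pair. A relational
# database makes this a real table instead of keys faked into one dbm file.
SCHEMA = """
CREATE TABLE file_info (
    os          TEXT NOT NULL,
    path        TEXT NOT NULL,
    description TEXT NOT NULL,
    PRIMARY KEY (os, path)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_info(conn: sqlite3.Connection, os_name: str, path: str, description: str) -> None:
    """Record one OS's description of a file."""
    conn.execute("INSERT INTO file_info VALUES (?, ?, ?)", (os_name, path, description))


def describe(conn: sqlite3.Connection, target_os: str, path: str):
    """Return (source_os, description) for path, or None if no OS knows it.

    Rows for target_os sort first; otherwise another OS's entry is used,
    and the returned source_os shows that the answer was borrowed.
    """
    return conn.execute(
        """
        SELECT os, description FROM file_info
        WHERE path = ?
        ORDER BY (os = ?) DESC, os
        LIMIT 1
        """,
        (path, target_os),
    ).fetchone()


if __name__ == "__main__":
    conn = open_db()
    add_info(conn, "FreeBSD", "/etc/fstab", "static information about the file systems")
    # No Linux entry exists, so the FreeBSD description is borrowed:
    print(describe(conn, "Linux", "/etc/fstab"))
    # prints ('FreeBSD', 'static information about the file systems')
```

Returning the source OS alongside the description lets a user interface mark borrowed answers, since a BSD description of a file may be only approximately right on Linux.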

Distributed Operation

Although the increased modularity in Meta will be beneficial from a software engineering perspective, its real benefit is that it will allow Meta to integrate multiple kinds of clients and servers.
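As a rough illustration of that split, the sketch below marshals a front-end request and a back-end reply as XML. Python's standard library has no SOAP support, so it substitutes XML-RPC (`xmlrpc.client.dumps` and `xmlrpc.client.loads`); the `meta.describe` method name and the `KNOWLEDGE` table are hypothetical. No network is involved here: in a real deployment, the XML strings would travel over HTTP between processes.

```python
import xmlrpc.client

# Hypothetical stand-in for the Meta database, keyed by (OS, path).
KNOWLEDGE = {
    ("FreeBSD", "/etc/rc.conf"): {
        "format": "sh(1) variable assignments",
        "purpose": "system configuration",
    },
}


def encode_request(target_os: str, path: str) -> str:
    """Front end: encode a call to the hypothetical meta.describe method."""
    return xmlrpc.client.dumps((target_os, path), methodname="meta.describe")


def handle(request_xml: str) -> str:
    """Back end: decode a request, look the item up, and encode a reply."""
    params, method = xmlrpc.client.loads(request_xml)
    if method != "meta.describe":
        fault = xmlrpc.client.Fault(1, f"unknown method {method!r}")
        return xmlrpc.client.dumps(fault, methodresponse=True)
    info = KNOWLEDGE.get(tuple(params), {})
    # A method response must carry exactly one value, hence the 1-tuple.
    return xmlrpc.client.dumps((info,), methodresponse=True)


def decode_response(response_xml: str) -> dict:
    """Any client: decode the reply (raises xmlrpc.client.Fault on error)."""
    (info,), _ = xmlrpc.client.loads(response_xml)
    return info


if __name__ == "__main__":
    reply = handle(encode_request("FreeBSD", "/etc/rc.conf"))
    print(decode_response(reply))
```

Because every client sees only the XML, a command-line browser, a GUI browser, and an unrelated system could all share the same back end.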

For example, Meta will support "local browsers" (both command-line and GUI-based) which run on the user's system. These will examine the local system, then call upon the Meta back-end, producing integrated results:

  • Dynamic files

    Some subsystems create files as they work. Others only use a file if someone else has created it. Finally, some files may disappear because of operator decisions or system operations.

    A local browser can use the Meta back-end to help it identify files from their names and other attributes. It can then describe the file's format, purpose, etc. Alternatively, it can examine a directory, describing files that don't currently exist on the local system.

  • Best and/or common practice

    By collecting and analyzing information from cooperating sites, Meta can build up a repository of "common practice". Users can also submit annotations, indicating suggested practices, useful lore, warnings, etc.

    Armed with this information, a browser can examine a given system or network, highlighting changes from the distributed versions, permissions that seem unusual and/or potentially unsafe, etc.

  • The Meta Project spans a number of problem domains, including document archiving, indexing, and retrieval. Meta will take advantage of other systems, where appropriate. Similarly, its XML-based interface will allow it to serve as a resource for other systems.
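The following sketch shows how a local browser might combine the dynamic-file and common-practice ideas: identifying files by name, noting expected files that are absent, and flagging permissions that differ from what cooperating sites report. The pattern table, the field reports, and the "most frequently reported mode" rule are all hypothetical stand-ins for data and analysis that would come from the Meta back end.

```python
import fnmatch
import os
import stat
from collections import Counter

# Hypothetical descriptions the Meta back end might return for a directory.
DESCRIPTIONS = {
    "*.core": "core dump left by a crashed program",
    "rc.conf": "system configuration variables",
    "master.passwd": "password database source",
}

# Hypothetical field reports from cooperating sites: (filename, mode).
REPORTS = [
    ("rc.conf", 0o644), ("rc.conf", 0o644), ("rc.conf", 0o600),
    ("master.passwd", 0o600), ("master.passwd", 0o600),
]


def common_practice(reports):
    """Map each filename to its most frequently reported mode."""
    counts = {}
    for name, mode in reports:
        counts.setdefault(name, Counter())[mode] += 1
    return {name: c.most_common(1)[0][0] for name, c in counts.items()}


def identify(name):
    """Return the description of the first pattern matching name, if any."""
    for pattern, description in DESCRIPTIONS.items():
        if fnmatch.fnmatchcase(name, pattern):
            return description
    return None


def survey(directory, reports=REPORTS):
    """Describe a directory's files, flagging unsafe or unusual modes."""
    common = common_practice(reports)
    present = set(os.listdir(directory))
    report = []
    for name in sorted(present):
        description = identify(name)
        if description is None:
            report.append(f"{name}: unknown")
            continue
        mode = stat.S_IMODE(os.stat(os.path.join(directory, name)).st_mode)
        note = ""
        if mode & stat.S_IWOTH:
            note = " [WARNING: world-writable]"
        elif name in common and mode != common[name]:
            note = f" [unusual mode {mode:o}; common practice is {common[name]:o}]"
        report.append(f"{name}: {description}{note}")
    # Describe literally named (non-wildcard) files that do not exist locally.
    for pattern, description in DESCRIPTIONS.items():
        if not any(ch in pattern for ch in "*?[") and pattern not in present:
            report.append(f"{pattern}: (absent) {description}")
    return report


if __name__ == "__main__":
    for line in survey("/etc"):
        print(line)
```

Treating world-writable files as a warning regardless of the reports reflects the idea that some settings are potentially unsafe even when they are common.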

Although the software described in this article has yet to be written, none of it requires any dramatic discoveries or inventions. Consequently, I am pretty certain that it can be built. That said, there remains the larger issue of getting useful knowledge from the assembled information.

Next month, I'll look at some totally speculative notions, including cluster analysis, data mining, and expert system technology. I'm not an expert in any of these areas, so I may get some things wrong. On the other hand, exploring new and interesting problems is one of the joys of volunteer software development!

In the meanwhile, please drop by the DOSSIER web site and look over the current offering of document collections. Each volume you buy helps to fund the Meta Project!

    Author maintains all copyrights on this article.
    Images and layout Copyright © 1998-2006 Dæmon News. All Rights Reserved.