DOSSIER and the Meta Project (Part 2)
Last month,
I discussed some problems with the current state of
Free and
Open Source documentation.
I then sketched out how
DOSSIER and the
Meta Project
hope to resolve some of these problems.
This month, I will discuss the goals and design
of an online Meta system.
Note:
Much of this article is speculation;
until such a system has been built,
I won't really know how to build it!
Nor can I promise any completion (or even starting)
date for this work;
we'll just have to see how much time
I can free up from DOSSIER (:-).
System Overview
Like the Meta Demo (aka the "FreeBSD Browser"),
Meta's basic function will be
to accumulate and distribute operating system metadata.
The scope of the system will be dramatically larger, however,
and many implementation details will be different:
- Data Collection
The demo uses data (e.g., file relationships)
from static snapshots of released systems,
supplemented by my own annotations.
Meta will accept a continuous influx of data,
including reports from systems (and humans!) in the field.
- Data Format
The demo uses an informal variant of XML which I call
Ostensible Mark-up Language.
Meta will use some OML internally,
but well-defined XML will be used
for all "published" interfaces.
- Data Storage
The demo uses a Perl "tied hash" (backed by a dbm(3) file),
faking the existence of multiple tables.
Meta will use an object/relational database such as PostgreSQL,
possibly augmented by other
(e.g., graph-structured) databases.
- Breadth of coverage
The demo only covers the base FreeBSD
and Mac OS X distributions.
Meta will cover several OS variants
and thousands of packages.
- Cross-OS relationships
The demo treats each OS separately.
Meta will use information
from other OSes to supplement its information
on the "target" OS:
a FILES reference for a BSD system
may well be applicable to a Linux system.
- Depth of coverage
The demo treats documents and files as atomic items;
for example, it does not support browsing
of man pages, let alone source code.
Meta will support hyperlinked browsing,
where possible, of all items it covers.
- Modularity
The demo's CGI script performs
both data retrieval and user interface duties.
Meta will divide up these tasks,
using an XML-based interface (e.g., SOAP)
for inter-process communication.
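
To make the Modularity item concrete, here is a minimal sketch in Python
of a retrieval layer and a user interface layer that share nothing but an XML document.
Since Meta has not been built, everything in it is an assumption:
the CATALOG contents, the element and attribute names,
and the retrieve() and render() functions are all hypothetical.
It also uses plain XML text rather than SOAP, to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for the back-end database.
CATALOG = {
    "/etc/fstab": {"os": "FreeBSD", "kind": "config", "doc": "fstab(5)"},
}


def retrieve(path):
    """Data retrieval side: answer a query with an XML document (as text)."""
    item = ET.Element("item", {"path": path})
    record = CATALOG.get(path)
    if record is None:
        item.set("status", "unknown")
    else:
        item.set("status", "known")
        # Emit one child element per field, in a stable (sorted) order.
        for name, value in sorted(record.items()):
            ET.SubElement(item, name).text = value
    return ET.tostring(item, encoding="unicode")


def render(xml_text):
    """User interface side: sees only the XML, never the database."""
    item = ET.fromstring(xml_text)
    if item.get("status") != "known":
        return f"{item.get('path')}: no information"
    fields = ", ".join(f"{child.tag}={child.text}" for child in item)
    return f"{item.get('path')}: {fields}"
```

Because render() depends only on the XML text,
the same retrieve() output could feed a CGI script, a command-line tool, or another system entirely.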
Distributed Operation
Although the increased modularity in Meta
will be beneficial from a software engineering perspective,
its real benefit is that it will allow Meta
to integrate multiple kinds of clients and servers.
For example, Meta will support "local browsers"
(both command-line and GUI-based)
which run on the user's system.
These will examine the local system,
then call upon the Meta back-end,
producing integrated results:
- Dynamic files
Some subsystems create files as they work.
Others only use a file if someone else has created it.
Finally, some files may disappear
because of operator decisions or system operations.
A local browser can use the Meta back-end
to help it identify files
from their names and other attributes.
It can then describe the file's format, purpose, etc.
Alternatively, it can examine a directory,
describing files that don't currently exist
on the local system.
- Best and/or common practice
By collecting and analyzing information
from cooperating sites,
Meta can build up a repository of "common practice".
Users can also submit annotations,
indicating suggested practices, useful lore, warnings, etc.
Armed with this information,
a browser can examine a given system or network,
highlighting changes from the distributed versions,
permissions that seem unusual and/or potentially unsafe, etc.
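
The two items above can be sketched as a toy "local browser" in Python.
It scans a directory, merges the result with descriptions from the back-end,
and reports files that are absent, files the back-end does not know,
and permissions that differ from the usual ones.
The BACKEND table, its file descriptions and modes, and the browse() and demo() functions
are hypothetical stand-ins; a real browser would query the Meta back-end over the network.
The sketch assumes a POSIX system.

```python
import os
import stat
import tempfile

# Hypothetical answer from the Meta back-end for one directory:
# file name -> (description, usual permission bits).
BACKEND = {
    "motd": ("message of the day", 0o644),
    "passwd": ("user account database", 0o644),
    "nologin": ("if present, blocks logins", 0o644),
}


def browse(directory, backend=BACKEND):
    """Merge a scan of the local directory with back-end descriptions."""
    report = []
    local = set(os.listdir(directory))
    for name in sorted(local | set(backend)):
        if name not in backend:
            report.append((name, "present, not known to the back-end"))
            continue
        description, usual_mode = backend[name]
        if name not in local:
            # The back-end can describe files that do not exist locally.
            report.append((name, f"absent ({description})"))
            continue
        mode = stat.S_IMODE(os.stat(os.path.join(directory, name)).st_mode)
        note = description
        if mode != usual_mode:
            note += f"; mode {mode:o} differs from usual {usual_mode:o}"
        report.append((name, note))
    return report


def demo():
    """Build a throwaway directory and browse it."""
    with tempfile.TemporaryDirectory() as directory:
        for name, mode in (("motd", 0o644), ("passwd", 0o666), ("local.conf", 0o600)):
            path = os.path.join(directory, name)
            open(path, "w").close()
            os.chmod(path, mode)  # set exact bits, regardless of umask
        return browse(directory)
```

Calling demo() returns one (name, note) pair per file, covering all four cases:
a known file, a known file with unusual permissions,
an absent file, and a file the back-end has never heard of.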
The Meta Project spans a number of problem domains,
including document archiving, indexing, and retrieval.
Meta will take advantage
of other systems, where appropriate.
Similarly, its XML-based interface
will allow it to serve as a resource for other systems.
Although the software described in this article
has yet to be written,
none of it requires any dramatic discoveries or inventions.
Consequently, I am pretty certain that it can be built.
That said, there remains the larger issue
of getting useful knowledge from the assembled information.
Next month, I'll look at some totally speculative notions,
including cluster analysis, data mining,
and expert system technology.
I'm not an expert in any of these areas,
so I may get some things wrong.
On the other hand, exploring new and interesting problems
is one of the joys of volunteer software development!
In the meantime, please drop by the
DOSSIER web site
and look over the current offering of
document collections.
Each volume you buy helps to fund the Meta Project!