DæmonNews: News and views for the BSD community


AFS: network filesystem beyond NFS weaknesses

From Emmanuel Dreyfus <manu@netbsd.org>

Unix users have long known network-attached storage through NFS. NFS is reliable and performs well, but it is infamous for its weak security.

The biggest problem with NFS is that the client is responsible for controlling user file access. The NFS server simply accepts file system operations on behalf of a given UID and enforces almost no control of its own. NFS requires you to trust your clients, which may not be acceptable.

Andrew File System (AFS) is an alternative network file system. In this interview, I ask Ty Sarna about his experience with AFS. Ty Sarna has been an AFS user since 1992 and a NetBSD developer since 1998.

ED: Hello and thank you for accepting this interview. All I know about AFS is that it is a network filesystem, so my first question is: how is AFS different from NFS?

TS: A long answer, but there are a lot of differences!

The main thing that sets AFS apart from other common network filesystems like NFS and SMB is that it is a complete distributed file storage solution. AFS features sophisticated volume management with backups and replication, access control and encryption (based on Kerberos), and more. As part of this, AFS mediates all access to the files it contains. Unlike NFS/SMB/etc, which simply export a local filesystem that may be accessible by other means, files in AFS can only be accessed through an AFS client, even on the AFS fileserver machine. Managing AFS is a little more like managing a database server than a traditional fileserver.

The client only needs basic configuration about the AFS cell it belongs to and the other cells it wishes to talk to; the location of volumes is then handled automatically. Unlike with NFS, there is no need to update every client's mounts when a filesystem is moved to a different disk or server. The client also caches files from AFS on local disk for speed.

AFS also has a single, consistent global namespace. All my organization's files are under /afs/my.org, and they are visible from any machine under that path... including from your AFS site, if I choose! Likewise, I can transparently access files from /afs/yoursite.edu or other sites, if they've decided to make them available.

Finally, AFS has good cross-platform support and can be accessed comfortably from Unix, NT, or Mac, whereas using NFS on Windows or SMB on Unix feels rather foreign. Though some might say it's just that AFS feels consistently foreign on all platforms!

ED: That gives us a lot to talk about. Let us start with security. How is AFS access control done? Is it completely based on Kerberos, or is it possible to use it with other authentication schemes?

[Side note: Kerberos is a single sign-on authentication system. In a network using Kerberos, the user authenticates once and obtains a Kerberos ticket, which grants access to other services without authenticating again. This requires client and server applications to be Kerberos-aware.]

TS: It's completely based on Kerberos. For a new site this would generally be Kerberos 5 only, but there are various legacy setups possible.

AFS user identities are based on Kerberos principals. When you first log into the system, you get Kerberos credentials ("tickets") that allow you to access various services. To access AFS, your tickets are used to get you a "token". All access to the servers is authenticated against the tokens. If you have no token or have an unrecognized one (say, you are jsmith@foo.com trying to access /afs/bar.com, which has no idea who any foo.com users are), you are treated as the anonymous user ("system:anyuser"). Even if you are root on a client system, without a valid token you are only anonymous. Unlike NFS, the clients aren't trusted by the servers. Also unlike NFS, traffic between the clients and servers is encrypted, though the method used is by now quite dated and rather weak. I expect the AFS maintainers will address this in the near future. Still, it's better than nothing.
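
[Side note: the rule described above can be modeled in a few lines. This is a simplified sketch, not OpenAFS or Arla code: the Token class is a hypothetical stand-in for a real token, and the model ignores any cross-cell trust a site may configure. It shows that the identity used for access checks depends only on the token, never on the client's local root privileges.]

```python
from dataclasses import dataclass
from typing import Optional

ANONYMOUS = "system:anyuser"


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for an AFS token: the user it names and the issuing cell."""
    user: str
    cell: str


def effective_identity(token: Optional[Token], server_cell: str) -> str:
    """Identity a fileserver in `server_cell` uses when checking ACLs.

    Being root on the client is irrelevant here: only the token counts.
    """
    if token is None:
        return ANONYMOUS  # no token at all
    if token.cell != server_cell:
        return ANONYMOUS  # token from a cell this server does not know about
    return token.user


print(effective_identity(None, "bar.com"))                        # system:anyuser
print(effective_identity(Token("jsmith", "foo.com"), "bar.com"))  # system:anyuser
print(effective_identity(Token("jsmith", "bar.com"), "bar.com"))  # jsmith
```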

AFS's permissions model is based on ACLs (access control lists), which are set on a whole directory and combined with the Unix permissions on each file (so you can still have individual read-only files, etc.). Each entry in the ACL grants particular permissions to a user or group.
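
[Side note: a rough model of how a directory ACL and a file's Unix bits combine. It assumes the standard AFS rights letters (r read, l lookup, i insert, d delete, w write, k lock, a administer) and the AFS convention of consulting only the file's owner read/write bits. Negative rights and the other mode bits are left out, and the function names are illustrative only.]

```python
import stat

# The seven standard AFS ACL rights: read, lookup, insert, delete, write, lock, administer.
AFS_RIGHTS = frozenset("rlidwka")


def directory_rights(acl: dict, user: str, groups: set) -> set:
    """Union of the rights an ACL grants to `user` directly or through `groups`."""
    granted = set()
    for entry, rights in acl.items():
        if entry == user or entry in groups:
            granted |= set(rights) & AFS_RIGHTS
    return granted


def can_read_file(acl: dict, user: str, groups: set, mode: int) -> bool:
    # Needs 'r' on the directory ACL AND the owner read bit on the file itself.
    return "r" in directory_rights(acl, user, groups) and bool(mode & stat.S_IRUSR)


def can_write_file(acl: dict, user: str, groups: set, mode: int) -> bool:
    # Needs 'w' on the directory ACL AND the owner write bit, so clearing the
    # owner write bit makes the file read-only for everybody.
    return "w" in directory_rights(acl, user, groups) and bool(mode & stat.S_IWUSR)


acl = {"system:anyuser": "rl", "tsarna": "rlidwka"}
anon = {"system:anyuser"}
print(can_read_file(acl, "jdoe", anon, 0o644))     # True
print(can_write_file(acl, "jdoe", anon, 0o644))    # False
print(can_write_file(acl, "tsarna", anon, 0o444))  # False: owner write bit is off
```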

AFS has its own groups. There are system-defined groups (system:administrators, for example), and administrators can define other global groups. However, one of the nicest features is that users can define and manage their own groups. For example, I can create "tsarna:friends" and manage its membership without any help from an administrator.
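
[Side note: on a real cell this is done with the "pts creategroup" and "pts adduser" commands, and a regular user may only create groups whose name starts with their own username. The toy class below is a hypothetical model of just that naming and ownership rule. It leaves out group-creation quotas and administrator overrides.]

```python
class GroupDB:
    """Hypothetical toy model of user-managed AFS groups named "owner:name"."""

    def __init__(self) -> None:
        self.members: dict = {}  # group name -> set of member names
        self.owner: dict = {}    # group name -> owning user

    def create_group(self, requester: str, name: str) -> None:
        prefix, sep, suffix = name.partition(":")
        if not sep or not suffix or prefix != requester:
            raise PermissionError(f"{requester} may only create groups named '{requester}:...'")
        if name in self.members:
            raise ValueError(f"group {name} already exists")
        self.members[name] = set()
        self.owner[name] = requester

    def add_member(self, requester: str, group: str, user: str) -> None:
        # Only the group's owner manages its membership; no administrator needed.
        if self.owner.get(group) != requester:
            raise PermissionError(f"{requester} does not own {group}")
        self.members[group].add(user)

    def groups_of(self, user: str) -> set:
        return {g for g, m in self.members.items() if user in m}


db = GroupDB()
db.create_group("tsarna", "tsarna:friends")
db.add_member("tsarna", "tsarna:friends", "jsmith")
print(db.groups_of("jsmith"))  # {'tsarna:friends'}
```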

ED: So, as I understand it, AFS is not something one could try out without first setting up a Kerberos domain?

TS: Actually, you can! There are a number of resources publicly available in AFS space, and because they're available anonymously, you don't need tickets or tokens. You can simply use an AFS client in "freelance mode".

Having your own AFS storage space does require a Kerberos domain, and has an unfortunately steep learning curve (especially if you are learning Kerberos at the same time), but there is a tutorial by Tracy Di Marco White and Thomas L. Kula that is very helpful, which I used to set up my own AFS cell on NetBSD.

ED: The replication feature sounds exciting. Does that mean one can have multiple AFS servers for a single volume, all available with read/write access?

TS: Replication is limited to read-only volumes, but it is still enormously useful. AFS encourages the use of many small volumes. Generally each user's home directory, each large installed software package, etc., would have its own volume. With such fine-grained volumes, it's easy to separate infrequently modified data into read-only volumes.

There is built-in "release management" for the content of read-only volumes. Read-only volumes have an associated read-write volume, mounted in a different location, where changes are made. When it's time to update, the administrator issues a "vos release" command and the read-only replicas are updated to match the read-write volume.
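
[Side note: the release cycle can be pictured with the toy model below, which is not OpenAFS code. It assumes, as in OpenAFS, that a replica site is first defined (with "vos addsite") and only receives content at the next release. Edits to the read-write volume stay invisible to readers of the replicas until release() runs.]

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedVolume:
    """Toy model: one read-write volume plus read-only replicas on several servers."""
    name: str
    rw_files: dict = field(default_factory=dict)  # path -> contents
    replicas: dict = field(default_factory=dict)  # server -> {path: contents}

    def add_replica(self, server: str) -> None:
        # Like defining a replica site: it stays empty until the next release.
        self.replicas[server] = {}

    def release(self) -> None:
        """Analogue of `vos release`: make every replica match the RW volume."""
        for server in self.replicas:
            self.replicas[server] = dict(self.rw_files)


vol = ReplicatedVolume("sw.emacs")  # hypothetical volume name
vol.add_replica("afs1")
vol.add_replica("afs2")
vol.rw_files["bin/emacs"] = "v2"    # change made in the read-write volume
print(vol.replicas["afs1"])         # {}
vol.release()
print(vol.replicas["afs2"])         # {'bin/emacs': 'v2'}
```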

ED: How does it work, from a practical point of view? Assuming I have the Kerberos domain properly set up, what tools will I need to install to have an AFS client or an AFS server?

TS: OpenAFS, from http://www.openafs.org/, is the server software. There is a NetBSD pkgsrc package for it. The setup is a little involved, and the tutorial mentioned above is probably the best source of instructions. It also has some information on setting up Kerberos.

Briefly, you need to install the software, create a service keyfile that the servers will use to talk to each other (even on a single physical AFS server, there are several server processes responsible for different aspects of the system), bootstrap at least one user with admin privileges, create the volume that represents the root directory of your cell, and so on. There are a lot of steps, especially if you have to set up Kerberos too. AFS is optimized for low ongoing administrative costs at larger sites, at the cost of a steep learning curve and a complicated setup process.

The client is much easier. If you already have an AFS cell at your school or employer, it's very easy to join.

First, join the Kerberos realm. This generally involves getting a krb5.conf file from your local administrators, possibly getting a "host principal" for your system, and configuring PAM (NetBSD's default PAM configuration activates Kerberos automatically when a suitable krb5.conf is installed in /etc).

For NetBSD, Arla (http://www.stacken.kth.se/project/arla/), also available in pkgsrc, is the AFS client I use. It contains an LKM for the kernel part of the filesystem, a user-space daemon called arlad, and some utility programs. I added a PAM module to NetBSD that creates AFS tokens from your Kerberos tickets on login, so everything is fairly automatic.

ED: Is there any reason why the kernel part of the filesystem was kept outside the NetBSD kernel in an LKM rather than integrated into it?

TS: At least one other OS has bundled Arla. There has been some discussion of importing Arla into NetBSD, either the complete package or at least the kernel part. The idea seems generally popular, and it would ease maintenance and prevent problems with kernel API skew, but it hasn't happened yet. I think it will eventually.

ED: You said a lot of processes are involved on the server side. An overview of the server processes and the files they manage is often a good way to get familiar with a service. Can you tell us about the different server processes and their roles?

TS: The primary server process is of course the fs server, which actually serves the files and volumes on that machine. It is really three servers rolled into one: the fileserver, the volume server, and the salvager (which is like fsck).

The vlserver (volume location server, not to be confused with the volume server) manages the database of volumes for the cell and their locations. Clients use this to find which fileservers to talk to for files.
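
[Side note: a toy picture of what the volume location database does for clients. The class and method names are hypothetical, not the vlserver protocol. It relies on the AFS convention that a volume's read-only replicas are named "<volume>.readonly", and it shows why moving a volume (for instance with "vos move") needs no client reconfiguration: only the database entry changes.]

```python
from dataclasses import dataclass, field


@dataclass
class VLDBEntry:
    rw_server: str
    ro_servers: list = field(default_factory=list)


class VolumeLocationDB:
    """Hypothetical toy volume location database: volume name -> servers holding it."""

    def __init__(self) -> None:
        self.entries: dict = {}

    def locate(self, volume: str) -> list:
        # AFS names a volume's read-only replicas "<name>.readonly".
        if volume.endswith(".readonly"):
            return list(self.entries[volume[: -len(".readonly")]].ro_servers)
        return [self.entries[volume].rw_server]

    def move(self, volume: str, new_server: str) -> None:
        # Only the database changes; clients find the new server on their next lookup.
        self.entries[volume].rw_server = new_server


vldb = VolumeLocationDB()
vldb.entries["user.tsarna"] = VLDBEntry("afs1")
vldb.entries["sw.emacs"] = VLDBEntry("afs1", ["afs2", "afs3"])
print(vldb.locate("sw.emacs.readonly"))  # ['afs2', 'afs3']
vldb.move("user.tsarna", "afs4")
print(vldb.locate("user.tsarna"))        # ['afs4']
```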

The ptserver (protection server) maintains information on users and groups.

The buserver (backup server) manages the backup database, including backup schedules, backup volume sets, and so on.

At legacy sites, the kaserver (Kerberos authentication server) is used. A pure Kerberos 5 installation will simply use a regular Kerberos 5 KDC such as Heimdal or MIT Kerberos 5 instead.

Finally, there is the bosserver (basic overseer server). This is the ringleader, and it manages starting up and shutting down other servers. The "bos" client program can be used to remotely control the other servers through the bosserver.

A very small site might only have one physical server, running everything. A somewhat larger site will have several for redundancy, since all the database-related servers (vl, pt, bu, ka) support replication of their databases.

A very large site won't bother to run database servers on every server machine. They'll have a few running them, and the rest will only run the fs and bosserver programs.
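
[Side note: the deployment pattern just described, sketched as a function that lists which processes a hypothetical site would run on each machine. Whether the database machines also serve files is a per-site choice. This sketch assumes they do, and it treats kaserver as optional because it is only used at legacy sites.]

```python
# Database servers, each of which can be replicated across several machines.
DB_SERVERS = ["vlserver", "ptserver", "buserver"]


def processes_for(host_index: int, db_hosts: int, legacy_kaserver: bool = False) -> list:
    """Processes a hypothetical site runs on its host number `host_index`.

    Every machine runs bosserver and fs (fileserver + volume server + salvager);
    only the first `db_hosts` machines also run the database servers.
    """
    procs = ["bosserver", "fs"]
    if host_index < db_hosts:
        procs += DB_SERVERS
        if legacy_kaserver:
            procs.append("kaserver")  # only at legacy sites without a Kerberos 5 KDC
    return procs


# A very small site: one machine runs everything.
print(processes_for(0, db_hosts=1))
# A large site: three database machines, the rest only run fs and bosserver.
print([len(processes_for(i, db_hosts=3)) for i in range(6)])  # [5, 5, 5, 2, 2, 2]
```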

ED: Let us carry on with the administrative point of view: how do I configure an AFS volume for export?

TS: All AFS volumes are exported in a sense (unless you have firewalled your servers), but ACLs are used to keep volumes from being publicly available. To create a volume, an administrator issues a "vos create" command, specifying the volume name, a quota (the size of the volume), and the server and partition on which to create it.

Next, the volume is attached to a point in the AFS tree with "fs mkmount". Then the permissions are set with "fs setacl". The volume can be made public by granting permissions to "system:anyuser", restricted to any Kerberos-authenticated member of the cell using "system:authuser", or restricted to specific users or groups of users. Different levels of permissions can be granted to different users, of course.
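
[Side note: the three steps can be scripted. The dry-run sketch below only builds and prints the command lines. It uses the documented long-form arguments of "vos create" (quota in kilobytes), "fs mkmount" and "fs setacl". The server, partition, volume and path names are hypothetical, and actually running the commands would require administrator tokens.]

```python
import shlex


def volume_setup_commands(server, partition, volume, quota_kb, mount_dir, acl):
    """Return the vos/fs invocations that create, mount and protect a volume."""
    cmds = [
        ["vos", "create", "-server", server, "-partition", partition,
         "-name", volume, "-maxquota", str(quota_kb)],
        ["fs", "mkmount", "-dir", mount_dir, "-vol", volume],
    ]
    # One "fs setacl" per ACL entry: who gets which rights on the mount point.
    for who, rights in acl.items():
        cmds.append(["fs", "setacl", "-dir", mount_dir, "-acl", who, rights])
    return cmds


# Hypothetical names throughout; this only prints the commands (dry run).
cmds = volume_setup_commands(
    "afs1.example.org", "/vicepa", "proj.web", 500000,
    "/afs/example.org/proj/web",
    {"system:anyuser": "rl", "tsarna:friends": "rlidwk"})
for argv in cmds:
    print(shlex.join(argv))
```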

The power of volumes is very central to AFS. Once created, the administrator can do advanced things, like move volumes between partitions or servers, dump and restore individual volumes, create read-only replicas, and so on.

ED: That is for the server. What about the client?

TS: On the client side, Arla needs files telling it which cell it is a member of (ThisCell) and how to contact the cells you want to talk to (either a CellServDB, which lists IPs, or a DynRootDB, which lists names whose IPs are found in DNS via "AFSDB" records).
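
[Side note: a parser for the CellServDB layout traditionally used by AFS clients, where a line starting with ">" names a cell, each following line gives one database server IP, and text after "#" is a comment. This sketch assumes Arla reads the same layout. It does not cover DynRootDB or AFSDB DNS lookups, and the cells and addresses in the sample are hypothetical, taken from documentation ranges.]

```python
def parse_cellservdb(text: str) -> dict:
    """Map each cell name to the IPs of its database servers."""
    cells = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments such as hostnames
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()        # start of a new cell entry
            cells[current] = []
        elif current is not None:
            cells[current].append(line.split()[0])
    return cells


def read_thiscell(text: str) -> str:
    """ThisCell holds just the name of the local cell."""
    return text.strip()


sample = """\
>example.org        #Hypothetical example cell
192.0.2.10          #afsdb1.example.org
192.0.2.11          #afsdb2.example.org
>example.net        #Another hypothetical cell
198.51.100.7        #db.example.net
"""
print(parse_cellservdb(sample)["example.org"])  # ['192.0.2.10', '192.0.2.11']
print(read_thiscell("stacken.kth.se\n"))        # stacken.kth.se
```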

For freelance mode, if you just want read-only access to public AFS resources, you can simply skip all the Kerberos and PAM setup. The client does still need a configuration, including ThisCell, but you can use the default "stacken.kth.se" just fine. A CellServDB file listing major AFS cells is included.

Once the Arla "nnpfs" LKM is loaded, nnpfs is mounted on /afs, and arlad is started, that's pretty much it. The client knows how to contact the volume location server(s) for a given cell from the configuration provided, and the server tells it everything else it needs to know, such as which fileservers the volumes are located on.

The entire global AFS filespace, encompassing thousands of servers at hundreds of locations, appears as a single mountpoint ("/afs") on the client. All the component cells, servers, and volumes are transparent to the user, unless you go looking for specifics using AFS client tools like the "fs" command (not to be confused with the fs server).
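
[Side note: the first step a client takes with a path in the global namespace is to identify the cell, which is the component right after /afs. The helper below is a hypothetical simplification. Inside a cell, the real client goes on to follow mount points from volume to volume, which this sketch does not model.]

```python
from pathlib import PurePosixPath


def split_afs_path(path: str) -> tuple:
    """Split '/afs/<cell>/<rest>' into (cell, '/<rest>')."""
    parts = PurePosixPath(path).parts  # e.g. ('/', 'afs', 'my.org', 'home', 'tsarna')
    if len(parts) < 3 or parts[:2] != ("/", "afs"):
        raise ValueError(f"not a path inside an AFS cell: {path}")
    return parts[2], "/" + "/".join(parts[3:])


print(split_afs_path("/afs/my.org/home/tsarna"))  # ('my.org', '/home/tsarna')
print(split_afs_path("/afs/yoursite.edu"))        # ('yoursite.edu', '/')
```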

ED: I have also heard about Coda, which is based on AFS if I remember correctly. Can you tell us about it?

TS: I don't have any experience with Coda. I believe it's derived from a very much older AFS codebase, so it is likely missing some features. On the other hand, it carries the idea of local caching of files much further, the idea being to support disconnected operation -- where the user can access files in Coda even when disconnected from the network, if they have been cached beforehand, and have their local changes integrated back into the server later when they reconnect. Coda is not nearly as widely deployed as AFS.
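
[Side note: the disconnected-operation idea can be illustrated with a small cache that logs writes made while offline and replays them on reconnection. This is a toy model of the concept, not Coda's implementation. Coda also has to detect and resolve conflicting updates, which this sketch ignores entirely.]

```python
class DisconnectedCache:
    """Toy model of Coda-style disconnected operation (no conflict handling)."""

    def __init__(self, server: dict) -> None:
        self.server = server  # stands in for the remote file server
        self.cache = {}       # local copies of files
        self.log = []         # writes made while disconnected, in order
        self.connected = True

    def read(self, path: str) -> str:
        if self.connected and path in self.server:
            self.cache[path] = self.server[path]  # refresh the local copy
        if path not in self.cache:
            raise FileNotFoundError(path)         # never cached before disconnecting
        return self.cache[path]

    def write(self, path: str, data: str) -> None:
        self.cache[path] = data
        if self.connected:
            self.server[path] = data
        else:
            self.log.append((path, data))         # remember it for reintegration

    def disconnect(self) -> None:
        self.connected = False

    def reconnect(self) -> None:
        """Replay logged changes to the server, oldest first."""
        self.connected = True
        for path, data in self.log:
            self.server[path] = data
        self.log.clear()


server = {"notes.txt": "v1"}
client = DisconnectedCache(server)
client.read("notes.txt")         # cached while still online
client.disconnect()
client.write("notes.txt", "v2")  # works offline; the server still has "v1"
client.reconnect()
print(server["notes.txt"])       # v2
```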

ED: Well, thank you for answering my questions. AFS seems an interesting thing to try, but it seems I will have to focus on Kerberos first. Do you have anything to add on AFS?

TS: AFS has been popular in large organizations for a long time, but there are many medium and smaller organizations that can benefit from its advanced features, especially now that it has become open source and available on more platforms (including a good-quality Windows client).

AFS can be intimidating to get into for the beginner, but I hope this interview will encourage those who can benefit from its powerful features to give it a closer look and try it out!

Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for the NetBSD project.

He's been a Unix user since 1996, and the first Unix system he installed and managed was a Mac68k running NetBSD in 1998. Emmanuel Dreyfus became a NetBSD developer in January 2001, and his job in the project has been to integrate various binary compatibility layers: Linux/powerpc, Linux/mips, IRIX, Darwin... More recently, he worked on VPN and the ipsec-tools package.

Emmanuel Dreyfus is the author of the French book The Administrator's Notebook: BSD. The complete list of his publications can be found at http://hcpnet.free.fr/pubz


  • Author maintains all copyrights on this article.
    Images and layout Copyright © 1998-2006 Dæmon News. All Rights Reserved.