DæmonNews: News and views for the BSD community


Interview: der Mouse

By Emmanuel Dreyfus <manu@netbsd.org>

Hard disks tend to be one of the weakest points in today's machines. This is a shame because this is the piece of hardware that holds your data.

If you can afford it, RAID setups will address the problem. But they require duplicate hard disks. If you have a lot of machines, this gets expensive. And if you cannot afford it, you probably cannot afford a NAS or a SAN either. The poor man's solution is to regularly back up the information from one disk to another. Of course if your disk dies between two backups, you lose.

der Mouse is a Canadian open source developer who has produced a range of valuable software, from anti-spam tools to a PPPoE implementation. At the BSDCan 2005 conference, he presented an innovative approach to hard disk replication. In this interview, der Mouse explains his idea and how he implemented it.

ED: Good morning, and thank you for answering my questions. So what is this new disk backup idea about?

DM: What it was about for me was protecting my data from disks dying. I use almost exclusively old hardware, old enough that I get it for free or nearly; this means I also use old disks, and their dying is a real danger. I was thinking about backups one day and it occurred to me that it probably wouldn't be that hard to replicate writes over the net in real time. A little more thought and I had a possible way to deal with the hard part blocked out; I tried implementing it, and it worked. "Works", actually; it's in production use right now on my house machines.

ED: As I understand, you modified the NetBSD kernel so that any write to a filesystem is sent over the network to your real-time backup server. At what level is the thing implemented? Is it a stackable filesystem, à la cgd (the cryptographic disk driver available in NetBSD), or is it a hook in each filesystem, or in each device driver? What would have been the pros and cons of each of these approaches?

DM: It's a hook in each device driver. I've added such hooks to sd, wd, ccd, and ed - the last not being stock NetBSD; it's akin to cgd, but my own work. (The reason I haven't done others is that I haven't needed them myself.) The hooks are fairly simple to add.

As for making it a stackable filesystem, I didn't think of that. It's certainly a sensible approach. The major problem from my point of view is that stackable filesystems don't work very well in the version I use; the second-biggest problem is that it means that the result works only for filesystems, and backs up the filesystem view of the result. (For example, if you put ffs on a cgd, with the stackable-filesystem approach you have to back up the plaintext version of it.) Hooks in the filesystems have similar properties, with the additional disadvantage that you need to add code to each filesystem type you want to back up. Also, filesystem-level mechanisms work at the file level; while there's nothing wrong with that, I wanted to work at the disk-block level. It would be entirely reasonable to build something similar that worked at the file level, and I can think of various possible advantages. (I can also think of possible disadvantages.)

Doing it with hooks in the disk drivers means I have a choice of backing up the plaintext or encrypted version of an ed (or cgd, if the version I worked with had cgd), it means that it can back up a database raw partition, it means that the backup can incrementally track a partition while I unmount, newfs, and remount it....

ED: What about the network protocol? I assume you use TCP or UDP, with the data being sent with a header? What kind of information do you have to send in the header?

DM: It's TCP. Besides an encryption key and verifier exchange, which occurs only at connection startup, the only data attached to a disk block's contents is the block number. Demultiplexing different data streams between the same client and server machines is done based on the TCP port number the connection is on.
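To illustrate the framing described above (a block number attached to each block's contents), here is a minimal Python sketch. It is not der Mouse's code: the 8-byte big-endian block number, the 512-byte block size, and the names `pack_block` and `unpack_block` are assumptions made for illustration, since the interview does not give the actual wire layout. The encryption layer is left out.

```python
import struct
from typing import Tuple

BLOCK_SIZE = 512               # assumed block size; not stated in the interview
HEADER = struct.Struct(">Q")   # assumed header: 8-byte big-endian block number


def pack_block(block_number: int, data: bytes) -> bytes:
    """Build one record: block number followed by the block contents."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    return HEADER.pack(block_number) + data


def unpack_block(record: bytes) -> Tuple[int, bytes]:
    """Split one record back into (block number, block contents)."""
    if len(record) != HEADER.size + BLOCK_SIZE:
        raise ValueError("record has the wrong length")
    (block_number,) = HEADER.unpack_from(record)
    return block_number, record[HEADER.size:]
```

Because every record has a fixed length in this sketch, a receiver can read the TCP stream in chunks of `HEADER.size + BLOCK_SIZE` bytes without any extra delimiters.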

ED: As I understand, you encrypt all the data in your protocol. Why not use IPsec? You would have got the security for free.

DM: Everything on the wire is encrypted to avoid exposing the disk data - I had in mind potentially using this over non-private long-haul links, though as far as I know it hasn't actually been used that way. IPsec probably could have got it for free, yes. I didn't use it because I've never played with IPsec enough to know how to do that. (I could have treated this as a chance to learn that, but at the time I was more concerned with getting it running - I had no recent backup then and I was getting uncomfortable with that state of affairs.)

ED: What level of security does the built-in encryption give us? Are there authentication methods that prevent traffic injection or replays? Is there a rekeying method that prevents the private key from being guessed by passive monitoring?

DM: As for level of security, I'm not sure how to quantify it. It all leverages a shared secret, so you have all the key-distribution issues that go with that. Bulk encryption is done with arcfour, with a 256-byte key (which admittedly has only 256 bits of entropy, 128 provided by each end). There is an authentication step designed to, among other things, prevent replay attacks. Data is not signed; it is not robust against an active attacker attempting to disrupt it - but nothing that depends on an unsecured TCP connection is. There is currently no rekeying, which I recognize is something to fix next time I pick it up to work on it more.
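For readers unfamiliar with arcfour (RC4), the stream cipher named above, here is a textbook Python sketch of the algorithm. It shows only the cipher itself; how the two ends combine their contributions into the shared key is not described in the interview, so the function name `arcfour` and the way a key is passed in are assumptions for illustration.

```python
def arcfour(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4; the operation is its own inverse."""
    if not 1 <= len(key) <= 256:
        raise ValueError("key must be 1 to 256 bytes long")

    # Key scheduling: permute the state array under control of the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Keystream generation: XOR each input byte with one keystream byte.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)
```

Since the same keystream is produced for the same key, applying `arcfour` twice with one key returns the original data, which is also why reusing a key across connections is dangerous and why each end contributing fresh key material matters.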

ED: How do you handle the situation where the machine performs a fsck while the network has not yet been configured? How are the changes sent to the server in that situation?

DM: That is normally before the backup software starts, though it doesn't have to be. At startup, or at any other time when it doesn't have a connection to the server, it pays no attention to changes; when the server connection comes back up, it does a full rescan of the disk, updating the server's copy to match the client's. (It sends block checksums when doing this, not full block contents. I've been thinking about ways to cut the network traffic down even further.)
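The checksum-based rescan can be pictured with the following Python sketch, which compares per-block checksums and reports which blocks would need their full contents transferred. This is an illustration under assumptions, not the real implementation: the use of SHA-256, the 512-byte block size, and the function names are all hypothetical, and the interview does not say which side performs the comparison.

```python
import hashlib
from typing import List

BLOCK_SIZE = 512  # assumed block size; not stated in the interview


def block_checksums(image: bytes) -> List[bytes]:
    """Return one checksum per block of the image (SHA-256 is an assumption)."""
    return [
        hashlib.sha256(image[offset:offset + BLOCK_SIZE]).digest()
        for offset in range(0, len(image), BLOCK_SIZE)
    ]


def blocks_to_resend(client_image: bytes, server_checksums: List[bytes]) -> List[int]:
    """List block numbers whose checksum differs from, or is missing on, the server."""
    stale = []
    for number, checksum in enumerate(block_checksums(client_image)):
        if number >= len(server_checksums) or checksum != server_checksums[number]:
            stale.append(number)
    return stale
```

Only the checksums cross the network for blocks that already match, so a rescan of a mostly unchanged disk costs far less traffic than copying the whole partition again.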

ED: How does the thing look on the server? I can imagine a daemon dumping the data to a text file. Is that text file a plain copy of the partition? Can I mount it using vndconfig? (vndconfig is NetBSD's utility for manipulating a file as if it were a disk partition.)

DM: The backup is a copy of the partition, as if made with dd. You can use vnd to look at it on the server, yes, just as you would for a dd image - though the backup can have holes if the client disk has blocks full of 0x00s; some versions of vnd don't like files with holes. (You need FFS_EI for ffs if the client and server are of different endiannesses, of course - same as for a dd image.) The backup image won't normally be plain text, of course, since disks normally contain non-plain-text data such as filesystem superblocks.

You may not *want* to access it with vnd, of course, if the client is live. But, for example, if the client is dormant, vnd on the server is a reasonable way to look at important data from the backup image.
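A dd-style image with holes can be sketched in a few lines of Python: each incoming block is written at the offset given by its block number, and any region never written stays a hole that reads back as zeros. This is a hypothetical illustration, not the server daemon's code; the 512-byte block size and the name `apply_block` are assumptions.

```python
import os

BLOCK_SIZE = 512  # assumed block size; not stated in the interview


def apply_block(image_path: str, block_number: int, data: bytes) -> None:
    """Write one block into the backup image at its on-disk offset."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    # Create the image if needed, without truncating an existing one.
    fd = os.open(image_path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+b") as image:
        # Seeking past the end and writing leaves a hole on filesystems
        # that support sparse files; the skipped range reads as zeros.
        image.seek(block_number * BLOCK_SIZE)
        image.write(data)
```

Writing block 3 into a fresh image, for example, yields a file four blocks long whose first three blocks read as zeros, which is the kind of file some versions of vnd reportedly dislike.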

ED: What about performance? Do you get a noticeable slowdown? Did you try to measure it?

DM: In normal operation - normal for me, that is - I notice no performance hit. When it's doing a rescan (on startup, or after a connection loss and recovery) there is a significant CPU load, especially on slow machines (68k, 486, that sort of thing). When my backup server machine reboots, there is also a fairly large performance hit (affecting primarily the network) as all my machines reestablish their connections and start doing rescans.

The only other performance hit I've noticed is when a machine is receiving data over the network and writing it to disk: the data coming in competes for network bandwidth with the data going back out to the backup machine, effectively cutting my network bandwidth in half. Fortunately, most of the times I do this I'm pulling data from a machine far enough away that the data flow is less than half the local bandwidth anyway, so even when doubled there's still excess capacity.

I have not done quantitative measurements of the performance impact, only "measurements" of the "gee this feels slow" or "I don't notice the hit" qualitative variety.

ED: And what about the network load? You said it was designed with use over the Internet in mind, but does that require a very high speed connection?

DM: It is designed to be usable over the long-distance Internet. Of course, for it to really live up to its potential, you need to have enough network bandwidth from client to server to handle the average rate at which stuff gets written to disk; otherwise, writing will get ahead of backing up and it will go into catch-up mode and stay there forever. You'll still have a backup, but it won't be very real-time and it will lose the nice write ordering properties it has in normal mode, increasing the chance that a crash will leave the backup holding a corrupted filesystem.
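The bandwidth condition can be made concrete with a small arithmetic sketch in Python. It is a deliberate simplification, assuming constant rates and a simple queue of unsent bytes, which may not match how the real software tracks pending blocks; the function name and parameters are hypothetical.

```python
def backlog_after(write_rate: float, link_rate: float,
                  seconds: float, backlog: float = 0.0) -> float:
    """Bytes still waiting to be sent after `seconds` at constant rates.

    write_rate and link_rate are in bytes per second. The backlog grows
    when writes outpace the link and drains (down to zero) otherwise.
    """
    return max(0.0, backlog + (write_rate - link_rate) * seconds)
```

With writes averaging 200,000 bytes per second over a link that carries 100,000, `backlog_after(200_000, 100_000, 60)` gives 6,000,000 bytes outstanding after one minute, and the figure keeps growing for as long as the imbalance lasts.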

ED: Are there some feature additions planned for this project?

DM: Yes. I'd like to add rekeying. I don't know how crackable the keystream is in practice, but rekeying is unlikely to be a bad idea. I have some ideas for cutting down the amount of data sent over the wire when doing a rescan; I'd like to implement them and see how well they work in practice. And, while I have no plans to try to port it to other OSes myself, I support such attempts and would be happy to work with anyone doing such a thing to help where I can. I know of at least one person who wants it under FreeBSD.

ED: I hope someone will answer your call. Thank you for the interview, and good luck with your future developments.

DM: Thank you for the opportunity to talk about it. If anyone would like to write to me about it, ed-dm-2006-01@rodents.montreal.qc.ca is a suitable address, or my NetBSD address, mouse@netbsd.org.

Coming soon in this column: Jan Schaumann interviewed about NetBSD as a desktop, Alistair G. Crooks on SAN and iSCSI, Jared D. McNeill about kernel framebuffer support, and Ty Sarna on the Andrew File System (AFS). Stay tuned!





    Author maintains all copyrights on this article.
    Images and layout Copyright © 1998-2006 Dæmon News. All Rights Reserved.