Distributed Data Storage on a LAN?
AgentSmith2 asks: "I have 8 computers at my house on a LAN. I make backups of important files, but not very often. If I could create a virtual RAID by storing data on multiple disks on my network I could protect myself from the most common form of data failure - a disk crash. I am looking for a solution that will let me mount the distributed storage as a shared drive on my Windows and Linux computers. Then when data is written, it is redundantly stored on all the machines that I have designated as my virtual RAID. And if I loose one of the disks that comprise the raid, the image would automatically reconstruct itself when I add a replacement system to the virtual RAID. Basically, I'm looking to emulate the features of high-end RAIDs, but with multiple PCs instead of multiple disks within a single RAID subsystem. Are there any existing technologies that will let me do this?"
NBD Does this (Score:5, Insightful)
"Network Block Device (TCP version)
What is it: With this thing compiled into your kernel, Linux can use a remote server as one of its block devices. Every time the client computer wants to read
Limitations: It is impossible to use NBD as a root file system, as a user-land program is required to start it (but you could get away with initrd; I never tried that). (Patches to change this are welcome.) It also allows you to run a read-only block device in user-land (making server and client physically the same computer, communicating over loopback). Please note that read-write nbd with client and server on the same machine is a bad idea: expect deadlock within seconds (this may vary between kernel versions; maybe one sunny day it will even be safe?). More generally, it is a bad idea to create a loop in the 'rw mounts graph'. I.e., if machineA is using a device from machineB read-write, it is a bad idea for machineB to use a device from machineA.
Read-write nbd with client and server on the same machine has a rather fundamental problem: when the system is short of memory, it tries to write back dirty pages. So the nbd client asks the nbd server to write back data, but as nbd-server is a userland process, it may require memory to fulfill the request. That way lies the deadlock.
Current state: It currently works. Network block device seems to be pretty stable. I originally thought that it was impossible to swap over TCP. It turned out not to be true - swapping over TCP now works and seems to be deadlock-free.
If you want swapping to work, first get nbd working. (You'll have to mkswap on the server; mkswap tries to fsync, which will fail.) Now you have a version which mostly works. Ask me for kreclaimd if you see deadlocks.
Network block device was included in the standard (Linus') kernel tree in 2.1.101.
I've successfully run raid5 and md over nbd. (A pretty recent version is required to do so, however.)"
Comment removed (Score:5, Informative)
Re:NBD Does this (Score:2)
Re:NBD Does this (Score:2)
That being said, do some benchmarks. RAID1+0 might be more sane. (That is, a RAID1 array overtop a RAID0 array.)
Re:NBD Does this (Score:5, Informative)
Just to clarify what this guy is saying:
1) Make all your machines NBD servers. NBD for Linux [sourceforge.net], NBD for Windows [vanheusden.com]. NBD stands for "network block device" and allows a client to use a server's block device.
2) Set up a master client/server (using Linux or something else with a decent software RAID stack). This machine will be the only NBD *client*, and it will use all the NBD block devices exported by the rest of your network.
3) On the master set up in 2), create a Linux MD RAID array overtop all the NBD devices that are available.
4) Create a filesystem on the brand-spanking-new multi-machine RAID array.
5) Export it back to the other machines via Samba or NFS or AFS or what have you.
Why does only one machine (the "master server") access the NBD devices, you ask? Because for a given block device, there can only be one client accessing it safely. Thus, if you want to make the RAID array available to anything other than the machine which is *running* the array off the NBD devices, you need to use something which allows concurrent access; something like NFS, Samba, or AFS.
Hope that clears it up a bit.
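The steps above could be sketched as a dry-run plan. This Python snippet only assembles and prints the commands the "master" might run; nothing is executed. The host names, port 2000, /dev/md0, ext3 and RAID level 5 are illustrative assumptions, and `nbd-client host port device` is the classic invocation (newer nbd-client releases use named exports), so check your own man pages first. Step 5, the Samba/NFS export, is configuration and is left out.

```python
def build_master_plan(servers, port=2000, level=5, md_device="/dev/md0"):
    """Return the commands the 'master' would run, as argument lists.

    servers: hypothetical host names of machines already running nbd-server.
    Nothing is executed here; this only assembles a plan to review.
    """
    if level == 5 and len(servers) < 3:
        raise ValueError("RAID5 needs at least three member devices")
    plan = []
    members = []
    for index, host in enumerate(servers):
        device = f"/dev/nbd{index}"
        # Step 2: attach each remote export as a local block device.
        plan.append(["nbd-client", host, str(port), device])
        members.append(device)
    # Step 3: build one MD array on top of all the NBD devices.
    plan.append(["mdadm", "--create", md_device, f"--level={level}",
                 f"--raid-devices={len(members)}", *members])
    # Step 4: put a filesystem on the new array.
    plan.append(["mkfs", "-t", "ext3", md_device])
    return plan


if __name__ == "__main__":
    for command in build_master_plan(["pc1", "pc2", "pc3"]):
        print(" ".join(command))
```

Keeping the plan as data makes it easy to read through before anything touches a disk.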
Yes. (Score:3, Informative)
NOTE!
You shouldn't leave any NBD-exported volumes on the new master. Make it into a physical, local volume, but reference it in the "same place" in your RAID configuration.
DRBD does it as well... (Score:3, Informative)
Re:NBD Does this (Score:3, Informative)
Note: Network Block Device is now experimental, which approximately
means, that it works on my computer, and it worked on one of school
computers.
That doesn't sound very promising to me. Usually stuff that's been in the kernel since 2.1 days is rock solid.
Isn't AFS/Coda more like what the guy wants (excluding Windows-ability, although I seem to remember there being something for Andrew for Windows)?
Re:NBD Does this (Score:3, Insightful)
Basically, in order to execute the network device request you often have to get more memory. In order to get more memory you have to execute a network request. And so on and so forth.
Also, AFAIK RAID does not work properly over NBD.
Re:NBD Does this (Score:3, Interesting)
Re:What if one of the nodes goes down? (Score:5, Insightful)
Assuming a RAID5 with three nodes, if two go down, even at different moments, will all your data be lost?
I would think very carefully about these issues before putting all your valuable data on it. RAID isn't really designed for frequently unreliable connections like this. It's meant to prevent data loss if a hard drive crashes, which should be a fairly uncommon thing within a single system.
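To make the two-failure question concrete, here is a minimal Python sketch of single-parity redundancy. It is a simplification (real RAID5 rotates parity across members and works on stripes), and the block contents are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the single missing block from all the others (data plus parity)."""
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    node_a = b"HOME"
    node_b = b"DATA"
    parity = xor_blocks([node_a, node_b])   # stored on the third node

    # One node lost: XOR of the two survivors gives the lost block back.
    print(rebuild_missing([node_b, parity]) == node_a)

    # Two nodes lost: a single surviving block cannot determine the other
    # two, so with three members a second failure loses the array.
```

With one parity block per stripe, the array survives exactly one missing member, which is why a second node dropping off before the first is rebuilt is the case to worry about.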
yes (Score:2)
Re:yes (Score:2)
we use afs (pre-openafs, tho i'm sure openafs will work just fine) on top of nbd (link escapes me right now). works pretty well.
Re:rsync Re:yes (Score:2)
Win2k (Score:5, Informative)
Re:Win2k (Score:2)
Re:Win2k (Score:3)
If you look further [microsoft.com] into DFS, I believe you'll find that you can have multiple servers synchronizing the same share name.
It's pretty snazzy; it'll even try to figure out the 'closest' server to you at any given time, skip over servers that are down, and so on.
Re:Win2k (Score:2)
The distributed feature would be quite worthless if there wasn't some synchronization taking place to make sure the data was synched across all servers in the DFS namespace.
rdist would work... (Score:5, Informative)
But if you don't want to get into nbd, you can tolerate delayed writes to your virtualized disks, and all you want is the network equivalent of RAID level 1, then you could always just set up an rdist script that synchronizes your local data disk with a remote repository (or eight) every so often...
--ZS
Speed (Score:5, Interesting)
Re:Speed (Score:2)
--ZS
Standard Linux kernel maybe? (Score:3)
Anyone tried this?
Re:Standard Linux kernel maybe? (Score:4, Informative)
If you're curious about using the enhanced NBD w/ failover and HA, you can read about it at:
http://www.it.uc3m.es/~ptb/nbd/#How_to_make_ENB
Re:Standard Linux kernel maybe? (Score:2)
Ok. But does it work under Windows? That was one of the requirements.
AFS (Score:4, Informative)
http://www.psc.edu/general/filesys/afs/afs.html
There's another alternative with a different name, but I forget what it's called.
Re:AFS (Score:5, Informative)
It's still running and running well at CMU (AFAIK - as of late 90's). Every student gets an "Andrew" ID. Actually the very first networked computer I ever logged into (other than dialing a bbs) was a 'node' on Andrew, in 1988. Very very cool at the time, and still is.
Re:AFS (Score:5, Interesting)
It requires its own partition for each mount of it; you can't just share disks you've already got.
Setup also takes hours, and it probably won't work the first time. Online documentation is incredibly outdated, which doesn't help matters at all. It also takes a hefty chunk of computer to run it, because it requires a lot of watchdog type programs to fix the frequent corruption that happens to it as you use it.
The servers' time has to be matched exactly, so it's also best if you've got an NTP server running and clients on all the machines.
It's also about ten times slower than Samba (which you might use instead to share with Windows machines), and it chokes when you try to move/copy/delete large files.
I tried it for a month before it completely corrupted its own partition and I switched back to NFS and Samba.
I can't wait for the day when these problems are but a memory and such a system works flawlessly.
Re:AFS (Score:3, Insightful)
I've been using it for years. I've found nothing that works better. I've got ``clients'' that are IRIX, NetBSD, Solaris, SunOS 4, NetBSD, MacOS X and FreeBSD and I use it to serve my web root, home directories, various applications (my mail server etc...) I can't imagine using something else.
It requires its own partition for each mount of it; you can't just share disks you've already got.
This is very misleading. A f
Re:AFS (Score:3, Informative)
The Windows client did have some notable slowness issues, but performance with Linux is excellent, and it scales much better than NFS. Clients are available for a large number of OSs. Doesn't matter if it's the right time, just A time. So set up NTP on one machine as a primary, and the others can use ntpdate to set their time once a day.
AFS started around 1986 as a com
Why? (Score:2, Funny)
I mean, let's be honest here. We are all dorks, but this guy is king dorkus dweedius maximus. Don't fool yourself about the "important data" - it is just pr0n and pirated MP3s.
If it was real work, there would be a real IT guy with real RAID and real backup tapes working on the problem. But we know it isn't real work, because if this guy had a real IT job, he couldn't stand coming home and dealing with 8 frigg
Most common form of data loss? (Score:5, Insightful)
In my 14 years as a Network Administrator I think I've restored backups due to failed hard disks about twice (RAID catches the rest).
But I restore data accidentally deleted or changed by a user at least weekly! A distributed storage system won't help you there.
However, I will grant that the average
Re:Most common form of data loss? (Score:2)
Re:Most common form of data loss? (Score:2)
CVS isn't designed for that, unless you only store documents or have some pretty stringent filters set up on CVS. CVS is for versioning, and you don't really want to maintain a backlog of every version of every file in your home directory.
Re:Most common form of data loss? (Score:3, Interesting)
AFAIK, there's at least one project out there to turn CVS into a filesystem, and a few others to add MVCC functionality into a filesystem (somewhat like the Clearcase filesystem does).
It's a good feature, somethi
Re:Most common form of data loss? (Score:2)
It's been 15 seconds since you hit 'reply'!
Goddamn it. It only took me 15 seconds to type it.
Re:Most common form of data loss? (Score:5, Insightful)
File versioning useful, VMS variant not so sure (Score:3, Interesting)
Try #1:
DELETE FOO.TXT
This is really the wrong answer. If you have FOO.TXT;1 and FOO.TXT;2, then this command deletes FOO.TXT;2 and any attempt to access FOO.TXT will get you FOO.TXT;1.
Try #2:
DELETE FOO.TXT;*
This is the common recommendation, but you've now lost the ability to see any of the old versions.
The GNU file utiliti
Re:Most common form of data loss? (Score:3, Interesting)
That feature doesn't need to be in the kernel, since it can easily and transparently be provided in user space.
If you like, you can enable this right now using a simple hack on top of PlasticFS [sourceforge.net] or your own, custom LD_PRELOAD hack.
Providing file versioning in the kernel or enabling it globally in some other form has not caught on because it is a huge hassle and causes lots of problems, even in systems that know about it.
For example, when you retag one MP
Re:Most common form of data loss? (Score:2)
Business environments are generally more robust - especially when it comes to things like power. Not only the mains power, but power supplies. A lousy power supply can kill a hard disk as easily as a line surge. In the last ten years I've personally lost a 4.3 GB Atlas Wide SCSI and a couple of Maxtor 60GB IDE drives. In both cases my backups were a month out of date.
Also ha
Re:Most common form of data loss? (Score:2)
Re:Most common form of data loss? (Score:4, Informative)
1) You could make a partition that is 10% of your disk, make another identical one on another disk, and mirror those. Then put your 10% critical data in there.
2) Do what I do: set up a RAID server, and keep all critical data on that. This is good if you have a home network with multiple computers. It also makes data sharing easy among the computers.
steveha
Re:Most common form of data loss? (Score:2)
Re:Most common form of data loss? (Score:2)
Yes, but all us users of more than one home PC (i.e., enthusiasts) use RAID 0, which has the opposite effect. So for us, a supplemental distributed RAID is a GREAT idea for our documents, e-mail backups, and other stuff we want to keep permanently and access from any of our home stations.
Re:Most common form of data loss? (Score:5, Informative)
Use rdiff-backup!
http://rdiff-backup.stanford.edu/
Configurable, secure, distributed, versioning incremental backups.
It's not a replacement for RAID, but is good for nightly inter-machine backups.
There's also a related project where the far-end repository is encrypted, so you can have it on any public server without fear of having your data read by the wrong people.
Very cool. It's saved my ass a few times.
Intermezzo (Score:5, Informative)
It isn't particularly high-performance, from what I know, and may be more complexity than you need.
Or try Groove workspace for Windows (Score:2, Informative)
Worth considering because:
- Files are encrypted and sent in an encrypted format.
- Files placed in the shared space are mirrored on all systems that are members of the workspace.
- The software is free for non-commercial use.
- Lots of other interesting features to play with.
- You can even mirror with a machine across the Internet.
Limited by:
- The speed of your connection.
- W
Re:Intermezzo (Score:5, Informative)
We have looked at various distributed filesystems for use in a clustered setup of webservers. We wanted to remove the single point of failure from a central NFS server - Intermezzo was one of the filesystems we had a look at.
The idea behind Intermezzo is fairly simple and the documentation is good. The Intermezzo system looked like an ideal solution for our setup (Coda and OpenAFS are far too complex for use in a distributed filesystem on a closed internal net).
We tested the system but sadly it's not really production stable and I can't advise that you use it.
If you are looking for a SAFE solution then Intermezzo is not for you - you will just end up with garbled data, deadlocks and tons of wasted time ...
My 2 cents.
Re:Intermezzo (Score:2)
Re:Intermezzo (Score:2, Informative)
NFS is a proven filesystem and it has been tested for years. It's compatible with all major UNIX flavors and BSD/Linux systems.
Bandwidth (Score:4, Insightful)
The obvious solution (Score:4, Funny)
Then the only remaining issue is number of pigeons.
Re:Bandwidth (Score:3, Insightful)
RAID on Files (Score:3, Insightful)
This would be really useful for SOHO type places to allow me to have a hot offsite backup at multiple friends' places (and vice versa).
Re:RAID on Files (Score:2)
--ZS
Backing up all within your house (Score:5, Insightful)
8 copies of the same document all nicely toasted!
Re:Backing up all within your house (Score:3, Funny)
Come on, this'll never happen. I live in San Diego!
Re:Backing up all within your house (Score:3, Interesting)
Loose Hard Drive? (Score:2, Funny)
Speed would be an issue... (Score:5, Informative)
I get what you mean though... it's a nice idea, but it would be costly to implement vs. what I suggested above.
When I went to see a presentation on HP's SAN solutions last year, I was very impressed with the ideas they had. One big hardware box with multiple disks that are controlled by the hardware. They are then presented to any systems over a fiber link as any number of drives you wish for any OS. Finally, their "snapshot" ability was pretty impressive. (Also called Business Copy) All they would do is quiesce the data bus, then create a bunch of pointers to the original data. As data is altered on the "copy" (just the pointers, not a real copy), the real data is then copied to the "copy" with changes put in place. I imagine something similar could be accomplished with CVS...
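The pointer-based "snapshot" described above might look roughly like this in Python. It is a toy sketch, not HP's implementation: the class and method names are invented, blocks live in memory, and it only covers writes made to the copy, not later writes to the original volume.

```python
class Snapshot:
    """A 'copy' of a volume that starts out as nothing but pointers."""

    def __init__(self, base_blocks):
        self._base = base_blocks   # shared reference; no block data is copied
        self._changed = {}         # block index -> this snapshot's private version

    def read(self, index):
        # Unmodified blocks are served straight from the original.
        return self._changed.get(index, self._base[index])

    def write(self, index, data):
        # Only now does this block take up storage of its own.
        self._changed[index] = data

    def private_block_count(self):
        return len(self._changed)


if __name__ == "__main__":
    volume = (b"boot", b"docs", b"mail")   # hypothetical original blocks
    copy = Snapshot(volume)
    copy.write(1, b"DOCS")
    print(copy.read(0), copy.read(1), copy.private_block_count())
```

Creating the snapshot is nearly instant because it costs one reference; space is consumed only in proportion to what is later changed.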
Re:Speed would be an issue... (Score:2)
Re:Speed would be an issue... (Score:2)
Re:Speed would be an issue... (Score:3, Informative)
I took a spare machine, added a 3ware 6800 ATA RAID controller ($130 on eBay), and installed eight 120GB Maxtor hard drives ($1200 when I bought them last year) and put them in eight Genica hot-swap trays ($60). For about $1500, I now have an 800GB formatted RAID5 array. (Had to throw in a dedicated 400W Antec power supply for HDs.) In a year, two of the drives have flunked,
Coda (Score:3, Redundant)
Re:Coda (Score:3, Interesting)
Seriously, I looked into Coda a couple months ago and the design looks really cool, but it just doesn't seem to work very well unless you're only storing tiny text files. It also doesn't scale very well on large servers (i.e. it h
Distributed Network Block Device (Score:2, Informative)
data loss (Score:2)
In my experience, the most common form of data loss is not hardware failure, but user error. RAID is great for protecting against hardware failure, but be sure to still make backups to protect against accidental deletion.
Try Rsync or DRBD (Score:4, Informative)
see http://drbd.cubit.at/ [cubit.at] DRBD is described as RAID1 over a network.
Rsync with a cron script would work too. I think there is a recipe in the linux hacks books to do something like what you are looking for: #292 [oreilly.com].
Venti needs a mention (Score:4, Informative)
http://plan9.bell-labs.com/sys/doc/venti/venti.
Abstract
This paper describes a network storage system, called Venti, intended for archival data. In this system, a unique hash of a block's contents acts as the block identifier for read and write operations. This approach enforces a write-once policy, preventing accidental or malicious destruction of data. In addition, duplicate copies of a block can be coalesced, reducing the consumption of storage and simplifying the implementation of clients. Venti is a building block for constructing a variety of storage applications such as logical backup, physical backup, and snapshot file systems.
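The core idea in that abstract, a block addressed by the hash of its own contents, can be illustrated with a few lines of Python. This is a toy in-memory sketch, not Venti's code or protocol; the class name is invented, and SHA-1 is used here simply as one example of a content hash.

```python
import hashlib


class BlockStore:
    """Toy write-once store where a block's address is the hash of its contents."""

    def __init__(self):
        self._blocks = {}

    def write(self, data: bytes) -> str:
        address = hashlib.sha1(data).hexdigest()
        # Identical contents hash to the same address, so duplicates coalesce,
        # and an existing block is never overwritten with different data.
        self._blocks.setdefault(address, data)
        return address

    def read(self, address: str) -> bytes:
        data = self._blocks[address]
        # The address doubles as an integrity check on what comes back.
        if hashlib.sha1(data).hexdigest() != address:
            raise IOError("block contents do not match their address")
        return data

    def __len__(self):
        return len(self._blocks)


if __name__ == "__main__":
    store = BlockStore()
    first = store.write(b"tax return 2002")
    second = store.write(b"tax return 2002")
    print(first == second, len(store))
```

Because there is no way to "update" an address, a careless or malicious client can add blocks but cannot destroy what is already stored.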
Expensive but reliable solution (Score:3, Interesting)
According to pricewatch the 4 160's could be had for around $400 total with about another $400 for the backup. Add a 3ware RAID controller for another $245 and you're looking at about $1045 to convert a system into supporting 450 GB of usable network storage and backup.
From all indications IDE hard drives are now the cheapest form of backup there is. I've looked at CD, DVD, Tape, but it keeps coming back to IDE hard drives. This is far cheaper than similar storage and backup would be on tape.
hyper scsi (Score:2, Informative)
HyperSCSI is a networking protocol designed for the transmission of SCSI commands and data across a network. To put this in "ordinary" terms, it can allow one to connect to and use SCSI and SCSI-based devices (like IDE, USB, Fibre Channel) over a network as if it was directly attached locally.
http://nst.dsi.a-star.edu.sg/mcsa/hyperscsi/ [a-star.edu.sg]
iSCSI? (Score:2)
Rsync and Ssh (Score:5, Informative)
First, set up ssh to use pubkey authentication instead of interactive password. You can read the man pages for details but it basically boils down to running keygen on the trusted source (an RSA key is used here because OpenSSH only accepts 1024-bit DSA keys):
ssh-keygen -b 2048 -t rsa -f ~/.ssh/identity
Then copy|append the newly created ~/.ssh/identity.pub to the remote hosts into their
Now you can run rsync with ssh as the transport (instead of rsh) by exporting:
export RSYNC_RSH=ssh or also passing --rsh=ssh on the command line.
So to sync directories you could use a find command to update regularly:
while true; do
find . -follow -cnewer
if (( $? == 0 )) ; then
rsync -rz --delete . destination:/some/path/
touch
fi
sleep 60
done
Obviously this is pretty hackish and could be improved. But the point is that with ssh and rsync you could do automatic mirroring of specific filesystems or directories to remote locations securely.
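One way the loop above could be tightened up, sketched in Python. Several details are assumptions, not the poster's: the stamp file name `.last_sync` is hypothetical (the original file name is cut off in the post), modification time is compared where the post used `-cnewer`, and the decision to sync is based on whether any changed file was found, since `find` exits with status 0 even when nothing matches.

```python
import os
import subprocess
import time

MARKER = ".last_sync"                      # hypothetical stamp file name
DESTINATION = "destination:/some/path/"    # placeholder taken from the post


def changed_since(root, stamp):
    """Return files under root whose mtime is later than stamp (epoch seconds)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if name != MARKER and os.path.getmtime(path) > stamp:
                    changed.append(path)
            except OSError:
                continue                   # file vanished or dangling symlink
    return changed


def sync_once(root="."):
    """Run rsync over ssh only if something changed since the last sync."""
    marker = os.path.join(root, MARKER)
    last = os.path.getmtime(marker) if os.path.exists(marker) else 0.0
    started = time.time()
    if not changed_since(root, last):
        return False
    subprocess.run(["rsync", "-rz", "--delete", "-e", "ssh",
                    root.rstrip("/") + "/", DESTINATION], check=True)
    with open(marker, "a"):
        pass                               # create the stamp file if missing
    os.utime(marker, (started, started))   # like `touch`, dated at scan start
    return True


def run_forever(root=".", interval=60):
    """Poll once a minute, as the shell loop does."""
    while True:
        sync_once(root)
        time.sleep(interval)
```

Calling `run_forever()` from the directory to be mirrored mimics the original loop. Dating the stamp at the start of the scan means a file modified while rsync is running gets picked up on the next pass.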
Re:Rsync and Ssh (Score:4, Informative)
Re:Rsync and Ssh (Score:3, Informative)
Rsync and ssh can work with Windows using Cygwin. See this document [unimelb.edu.au] for example.
The holy grail (Score:2)
So far, I've not seen anything that exists that does what you are asking for. Several technologies come somewhat close.
What I've been hopeful of is the recent donations by Oracle for database clustering, but I haven't seen any decent fallout from that... yet.
For now, on my home-based work network, I have two network drives (both IDE 120 GB) and do nightly rsync from one to the other.
(sigh)
You aren't gonna get a real RAID. (Score:5, Insightful)
Instead of trying to implement a shoestring SAN, go the simple route: throw up a Linux box running Samba for your "backup server;" it doesn't need much horsepower, just fairly fast drives and a network connection. Then schedule copies of your documents and home directories (using a cron-type tool on Linux and XCOPY called by the Task Scheduler on Windows, you should be able to hack something together that copies only changed files) every night at midnight, or some other time when you aren't using your computers. Although you might lose a bit of work if the system goes down, you won't ever lose more than 24 hours' worth.
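The "copies only changed files" part of that nightly job could be sketched like this in Python, which runs the same way from cron or the Task Scheduler. The function names are invented for illustration, and "changed" is assumed to mean missing, different in size, or older on the backup side. The backup directory must live outside the source tree.

```python
import os
import shutil


def needs_copy(source, target):
    """True if target is missing, differs in size, or is older than source."""
    if not os.path.exists(target):
        return True
    src, dst = os.stat(source), os.stat(target)
    return src.st_size != dst.st_size or src.st_mtime > dst.st_mtime


def backup_changed(source_root, backup_root):
    """Copy new or changed files into backup_root; return the paths written."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        relative = os.path.relpath(dirpath, source_root)
        target_dir = os.path.normpath(os.path.join(backup_root, relative))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            source = os.path.join(dirpath, name)
            target = os.path.join(target_dir, name)
            if needs_copy(source, target):
                shutil.copy2(source, target)   # copy2 preserves the timestamp
                copied.append(target)
    return copied
```

Pointing `backup_root` at a mounted Samba share of the backup server gives the behaviour described above; a second run with nothing modified copies nothing.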
If you have more money to blow, then I would suggest that you invest in an honest-to-dog hardware RAID card and some good drives and put them into a server, then do everything across the network (put the /home tree and My Documents folders on the server). You can of course mount the /home directory in Linux via NFS or smbmount, and Group Policy in Windows 2K/XP will allow you to change the location of the My Documents folder to whatever you choose. You might be able to do the same via the System Policy Editor on 9x; it's been a while and I can't find the information after a brief Google.
To sum up:
Re:You aren't gonna get a real RAID. (Score:4, Informative)
Re:You aren't gonna get a real RAID. (Score:3, Informative)
Set up a server with multiple hard disks in a Linux software RAID, and run Samba and NFS on that. The Linux software RAID HOWTO explains all you need to know.
steveha
Re:You aren't gonna get a real RAID. (Score:3, Informative)
I'm currently running some benchmarks on an XFS filesystem built upon a Linux MD RAID1 array, which is in turn built upon a local disk and a remote disk (which is at the end of a switched 100mbit network, the NBD server itself having
You probably don't want to do this. (Score:4, Insightful)
Really. If you're on a 100-megabit LAN, that gives you a max of about 10 megaBYTES per second. So, if you have to transmit information to two other computers for every disk write, you're effectively limiting yourself to a maximum of about 5 megabytes/second disk transfer. And that's under GOOD situations. If you're doing random I/O, where the latency will be the determining factor, then take the latency of the hard drives, add in the latency of the networking, and the latency of the software layers, and you're looking at some pretty abysmal performance.
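The arithmetic behind those figures, as a small Python sketch. The 0.8 efficiency factor is an assumption chosen to match the "about 10 megabytes" rule of thumb (100 megabits is 12.5 megabytes per second before overhead), and latency is ignored entirely.

```python
def replicated_write_mb_per_s(link_mbit, remote_copies, efficiency=0.8):
    """Rough upper bound on sustained write throughput, in megabytes per second.

    link_mbit: nominal link speed in megabits per second.
    remote_copies: how many machines each write must be sent to over that link.
    efficiency: assumed fraction of the nominal rate left after protocol overhead.
    """
    usable_mb_per_s = link_mbit / 8 * efficiency
    return usable_mb_per_s / remote_copies


if __name__ == "__main__":
    print(replicated_write_mb_per_s(100, 1))   # one remote copy
    print(replicated_write_mb_per_s(100, 2))   # two remote copies
```

Every extra remote copy divides the ceiling again, since all copies leave through the writer's single network link.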
Using rsync in a cron job will solve your backup problems. In fact, your script can use rsync to do the synchronization, and tar/gzip to archive the backup - giving you "point in time" snapshots for when someone says "I deleted this file 4 days ago, can you get it back?"
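The tar/gzip half of that suggestion could be sketched as follows in Python, meant to run after the rsync step has refreshed the mirror. The function name and the `backup-YYYY-MM-DD` naming scheme are assumptions; the archive directory should sit outside the mirrored tree.

```python
import os
import tarfile
import time


def archive_snapshot(mirror_dir, archive_dir, when=None):
    """Write mirror_dir into a dated .tar.gz under archive_dir; return its path."""
    stamp = time.strftime("%Y-%m-%d", time.localtime(when))
    os.makedirs(archive_dir, exist_ok=True)
    archive_path = os.path.join(archive_dir, f"backup-{stamp}.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps absolute paths out of the archive.
        archive.add(mirror_dir, arcname=f"backup-{stamp}")
    return archive_path
```

One archive per night gives the "deleted 4 days ago" recovery point, at the cost of a full compressed copy per snapshot; pruning old archives is left out of the sketch.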
steve
I can't believe... (Score:2, Interesting)
If all you're worried about is disk failures, mirror each disk locally. Disks are cheap, and real operating systems don't have any trouble with software mirroring.
Why would you want to make all of your machines suddenly non-functional, just because one of them lost a network card? Or the switch failed? Or
P2P solutions: Freenet, Oceanstore? (Score:2)
What you might be able to do is put together a microcosm of Freenet [sourceforge.net] or something like it, running on just your home computers. There may be other Peer-to-Peer solutions available that are faster/more stable. Do some searching
Been meaning to do something like this with unison (Score:2)
Though not real time like a true RAID, I think what you're really after is something like rsync, as many other posters have mentioned. When this came up in an earlier story I found a link to Unison, which seems to be better for my needs at least.
http://www.cis.upenn.edu/~bcpierce/unison/ [upenn.edu]
Might be interesting to combine this with FSRaid [fluidstudios.com] (Parity Archive or PAR files) to get some extra redundancy.
B
I do this.... (Score:4, Funny)
I do this every night to thousands of machines...
The software I use is Kazaa-lite.
Oh, you mean files other than my MP3s/jpegs/mpegs? Sorry, I can't help you there.
Re:I do this.... (Score:3, Funny)
I blame Jack Valenti for this whole mess.
Off the mark (Score:2)
Many responses, even highly-rated ones, seem to be talking about simple replication via NBD (worst-written code I've ever seen) or DRBD. That's not the same as what the original poster was asking about. Neither are fully-distributed but non-transparent file stores such as HiveCache [mojonation.net]. AFS/DFS/Coda/Intermezzo are probably the closest in the sense of being both transparent and resistant to failures. There have also been a couple of very closely related projects at Microsoft (Farsite and Pastiche) but I'm no
Parallel Virtual File System (Score:4, Informative)
"The goal of the Parallel Virtual File System (PVFS) Project is to explore the design, implementation, and uses of parallel I/O. PVFS serves as both a platform for parallel I/O research as well as a production file system for the cluster computing community. PVFS is currently targeted at clusters of workstations, or Beowulfs."
"In order to provide high-performance access to data stored on the file system by many clients, PVFS spreads data out across multiple cluster nodes, which we call I/O nodes. By spreading data across multiple I/O nodes, applications have multiple paths to data through the network and multiple disks on which data is stored. This eliminates single bottlenecks in the I/O path and thus increases the total potential bandwidth for multiple clients, or aggregate bandwidth."
Or there are many others to choose from, google for clustered filesystems:
http://www.yolinux.com/TUTORIALS/LinuxClustersA
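The striping idea in the PVFS description quoted above can be illustrated with generic round-robin striping. This Python sketch is not PVFS's actual layout code; the function names, the 64 KB stripe size and the four I/O nodes are assumptions for the example.

```python
def locate(offset, stripe_size, node_count):
    """Map a byte offset in a file to (I/O node index, offset within that node)."""
    stripe_index = offset // stripe_size
    node = stripe_index % node_count
    local_stripe = stripe_index // node_count
    return node, local_stripe * stripe_size + offset % stripe_size


def nodes_touched(start, length, stripe_size, node_count):
    """Return the set of I/O nodes a read of `length` bytes at `start` involves."""
    first = start // stripe_size
    last = (start + length - 1) // stripe_size
    return {stripe % node_count for stripe in range(first, last + 1)}


if __name__ == "__main__":
    stripe = 64 * 1024
    print(locate(5 * stripe + 10, stripe, 4))
    print(sorted(nodes_touched(0, 4 * stripe, stripe, 4)))
```

A large sequential read spans every node, so the disks and network paths work in parallel. Note that striping alone adds bandwidth, not redundancy.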
Slow? (Score:2, Informative)
However, while small files would be fine, I would think the speed of the network would make for some fairly slow storage on a 100mbit network.
Add more users saving files across the network to the equation and things would get out of hand fast.
I guess I would just buy a serial ata raid motherboard (the intel D865GBFLK is one I have bee
Raid != Backup (Score:3, Informative)
Personally I have a server with a RAID 5 array that is shared via Samba to Windows and Linux clients, which works fine, though I may adjust this if good suggestions are made here. The only real issue would be disk space, and all my computers now have 120G+ hard drives or RAID arrays....
here ya go (Score:2)
copy c:\porncollection\*.* \\backup1\bak
copy c:\porncollection\*.* \\backup2\bak
.
.
.
copy c:\porncollection\*.* \\backup8\bak
New kind of network file system needed (Score:2, Interesting)
Amanda (Score:2)
Don't play around with something "cool" like a distributed RAID disk. Just spend the money on a decent tape drive and tapes, design a tape backup rotation strategy, get a safety deposit box at a local (or not-so-local) bank for off-site storage, and set up Amanda [amanda.org] to do the backups.
I don't worry... (Score:2)
It's my wife and her need to open any email she gets using outlook on her windows box. She's just enough of a geek to be dangerous and "enjoys" the preview feature.
And she wonders why her 'puter can't log into the LAN without being Virus checked first.
-Goran
Lustre and PVFS (Score:3, Insightful)
Why? (Score:5, Funny)
Why would you want to "loose" one of the disks? Don't you know they're supposed to stay tightly enclosed in their little boxes?
And why do you think that "loosing" the disk would help the image "automatically reconstruct itself?"
Actually, if you did that the disk would carom around the room like a very fast, very lethal Frisbee and you would be too busy trying to survive to worry about where your data went!
Just a thought
Otherwise, your plan sounds peachy.
Check out HiveCache (Score:3, Informative)
While a pure Linux solution seems to score the most points here, this particular one lets you combine your Windows, OS X, and Linux systems into a single distributed storage mesh. There is safety in numbers, and the more systems you can add to this sort of distributed storage system the more reliable it becomes.
HiveCache is more of a backup solution, but I do know that it is possible to use this with a WebDAV front-end for archival storage and other interesting storage possibilities.
NBD for Windows (Score:2, Redundant)
(I haven't used this, but it exists)
Re:Comment (Score:2)
http://www.techsoftpl.com/backup/
Re:So... (Score:5, Funny)
If you were a medieval ass-kicker, would you want your moniker to be the butt of thousands of canned-jokes that weren't even funny to begin with?
Hmm...that's like a Beowulf cluster of usb thumb drives...
Yeah. Maybe the cheap super-computer idea Beowulf would find cool, but not the jokes and the impossible-to-Beowulf devices.
So those jokes aren't funny and probably won't get you (not you in particular, Pingular) modded up. If you want to talk about networked clusters of non-networkable devices, say:
"That's like a Duke Nukem Forever/Bit Boys graphics card/Mac OS X on a 386 cluster"
No wait, on second thought, that's not funny either.
Re:I used to do this, years ago.. (Score:3, Insightful)