Poster: KirbyMeist Date: Aug 2, 2004 2:08pm
Forum: petabox Subject: Filesystem

I'm wondering: what filesystem are we using for storing this? Do you think we could get a good performance boost by, say, writing our own distributed filesystem or something? Just a suggestion.

Poster: brewster Date: Aug 2, 2004 3:11pm
Forum: petabox Subject: Re: Filesystem

we are looking for archival qualities, so we trade off ultimate performance for preservation and simplicity. we use reiserfs on each node, then store an "item" in a directory. these items are located by using a udp broadcast to find the machines that have them.

simple, effective, secure.

-brewster
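
A minimal sketch of how such a broadcast locator could work. The port number and message format here are invented for illustration; the real petabox code may differ.

# hypothetical UDP "who has item X?" locator
import os
import socket

PORT = 8010  # invented port

def find_item(item_id, timeout=2.0):
    """Broadcast a query and collect the hosts that answer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.settimeout(timeout)
    sock.sendto(item_id.encode(), ('<broadcast>', PORT))
    hosts = []
    try:
        while True:
            data, (host, _) = sock.recvfrom(1024)
            if data == b'HAVE ' + item_id.encode():
                hosts.append(host)
    except socket.timeout:
        pass
    return hosts

def serve(storage_root='/items'):
    """Run on each node: answer queries for items held on local disk."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('', PORT))
    while True:
        data, addr = sock.recvfrom(1024)
        if os.path.isdir(os.path.join(storage_root, data.decode())):
            sock.sendto(b'HAVE ' + data, addr)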

Poster: Plasma 000 Date: Aug 2, 2018 9:58pm
Forum: petabox Subject: Re: Filesystem

Have you considered replacing the broadcast-based system with some kind of DHT, a la BitTorrent, for finding the correct node and locating files on the nodes?
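
For illustration, the core of a DHT-style lookup is deterministic placement: hash the item name and pick the node whose hashed ID is closest by XOR distance, as Kademlia does. A toy sketch (node names invented; a real DHT discovers peers dynamically):

import hashlib

def hash_id(name):
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), 'big')

NODES = ['petabox01', 'petabox02', 'petabox03']  # invented node names

def responsible_node(item_id):
    """Pick the node whose hashed ID is XOR-closest to the item's hash."""
    key = hash_id(item_id)
    return min(NODES, key=lambda node: hash_id(node) ^ key)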

Poster: grignak Date: Oct 4, 2004 12:33am
Forum: petabox Subject: Re: Filesystem

> we use reiserfs on each node, then store an "item" in a directory. these items are located by
> using a udp broadcast to find the machines that have them.

Hmm, could we have a peek at that code, maybe? Please?

Poster: foundation Date: Nov 10, 2004 11:50pm
Forum: petabox Subject: Re: Filesystem

So, do you have a special "uploader" that distributes new files to drives with free space?
I've been thinking about writing something like this for my company's image repository, and have been pondering how to represent the physical topology of the network so that duplicate copies are not placed on the same drive, controller, server, rack, or location if possible.
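
One way to sketch that placement rule: greedily pick each replica on the drive that differs from the already-chosen copies at as many topology levels as possible. The Drive fields below are invented descriptors, not a real API.

from dataclasses import dataclass

@dataclass
class Drive:
    location: str
    rack: str
    server: str
    controller: str
    drive: str
    free_bytes: int

LEVELS = ('location', 'rack', 'server', 'controller', 'drive')

def diversity(candidate, chosen):
    """Count topology levels at which candidate differs from every chosen copy."""
    return sum(
        all(getattr(candidate, lvl) != getattr(c, lvl) for c in chosen)
        for lvl in LEVELS
    )

def place_replicas(drives, size, copies):
    """Pick drives for each copy, preferring maximum topology spread."""
    chosen = []
    for _ in range(copies):
        candidates = [d for d in drives
                      if d not in chosen and d.free_bytes >= size]
        if not candidates:
            break
        # most diverse first; break ties by free space
        chosen.append(max(candidates,
                          key=lambda d: (diversity(d, chosen), d.free_bytes)))
    return chosen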

Poster: brewster Date: Nov 11, 2004 12:39am
Forum: petabox Subject: Re: Filesystem

we pair machines that are sitting next to each other. these pairs rsync between themselves (saving changed or deleted versions).

we also sync between datacenters on different continents.

this gets around a lot of types of errors for us while keeping it simple.

-brewster

Poster: foundation Date: Nov 11, 2004 1:18am
Forum: petabox Subject: Re: Filesystem

So do you worry about viruses/worms/stupid users that delete or corrupt a file, which then gets rsynced so it's deleted or corrupt on the other box too? Or do you use something like an rsync backup that keeps versions?

Also, if a box goes down, do you have a spare box that then gets rsynced with the mirror to become the host? Or does that have to happen manually?

Poster: brewster Date: Nov 11, 2004 1:40am
Forum: petabox Subject: Re: Filesystem

we use rsync with a backup directory to allow us to watch for changes and be able to get things back from the trash bin if needed.
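
In rsync terms that is roughly the --backup/--backup-dir pair. A sketch of such a pass, with the host and paths invented:

# sketch: sync to the paired machine, diverting anything changed or
# deleted into a dated trash directory on the mirror instead of
# discarding it; host and paths are invented
import datetime
import subprocess

def sync_pair(src='/items/', mirror='petabox02:/items/'):
    trash = '/trash/' + datetime.date.today().isoformat()
    subprocess.run(['rsync', '-a', '--delete',
                    '--backup', '--backup-dir=' + trash,
                    src, mirror], check=True)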

Poster: dunno Date: Jun 16, 2005 1:55pm
Forum: petabox Subject: Re: Filesystem

I assume the UDP broadcast system for finding which nodes have what is easier to implement than, say, a couple of small dedicated boxes with a database of all of the file locations... but it seems that unless you have a small number of large files, the UDP system... well, I'll just say that it seems like a time bomb.

Just a spur-of-the-moment thought, but you could have a 2-tier data system, where the first tier is JBOD and is generally the front end, and a second tier that has the same dataset as the first, except that it uses RAID 5 at some level... maybe it could also be staggered time-wise: the backup could be 2 days behind the first tier, with a rather reliable pool keeping the changelog between the first tier and the 2-day-old backup tier... that way you'd have all your information in two places, and you'd have some measure of protection against virus-type corruption that bypasses safeguards like redundancy... ah well.
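
A sketch of that staggered idea: queue each change in the changelog and replay it on the second tier only after it has aged past the window, leaving time to catch corruption before it propagates. All names here are invented.

import time

DELAY = 2 * 24 * 3600   # the 2-day stagger, in seconds
changelog = []          # (timestamp, change) records from tier 1

def record(change):
    changelog.append((time.time(), change))

def replay_aged():
    """Apply to tier 2 only the changes older than the stagger window."""
    now = time.time()
    while changelog and now - changelog[0][0] >= DELAY:
        _, change = changelog.pop(0)
        apply_to_tier2(change)

def apply_to_tier2(change):
    print('would apply to the RAID 5 tier:', change)  # placeholder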

Poster: foundation Date: Jul 15, 2005 5:08am
Forum: petabox Subject: Re: Filesystem

At my company (not the archive) we're implementing a large storage system for images (almost entirely write once; read many for some, and read almost never for the rest), and we are looking at MogileFS. MogileFS uses MySQL to track file locations and automatically maintains the number of copies required. So you can say: at all times I want there to be two copies of this data and three copies of this other data. When you lose a server, it detects that a copy is inaccessible and starts replicating a new one. It does the transfers over HTTP or NFS. Because we wrote the front end ourselves, we don't need a POSIX-compliant filesystem; we can use the client libraries. Something to consider for people implementing large systems, and a way to avoid RAID (it's RAID-ish over the network, really).
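
Conceptually the repair logic is just a loop over a location table. A toy version, with a dict standing in for the MySQL tracker and a print standing in for the HTTP/NFS copy:

desired = {'img/123.jpg': 2, 'img/456.jpg': 3}           # copies wanted
locations = {'img/123.jpg': ['host1', 'host2'],          # current copies
             'img/456.jpg': ['host1', 'host3', 'host4']}

def repair(alive_hosts):
    """Re-replicate any file whose live copy count fell below target."""
    for path, want in desired.items():
        live = [h for h in locations[path] if h in alive_hosts]
        while len(live) < want:
            spare = [h for h in alive_hosts if h not in live]
            if not spare:
                break  # nowhere left to put another copy
            print('replicate', path, '->', spare[0])     # stand-in for the copy
            live.append(spare[0])
        locations[path] = live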