Poster: illtud_llgc Date: May 11, 2004 10:03pm
Forum: petabox Subject: Re: Selective powering of large petasites

Richard,

I would suggest that your needs would best be served by a tape library, or probably by an HSM (hierarchical storage management) solution in which a portion of the tape library's content is cached on disk. Presumably a wait of a few minutes would not be a problem when accessing full-quality bitstreams. Tape libraries can also give you automated tape duplication for offsite storage (disaster recovery) and media refreshing (digital preservation). They also cause far fewer problems with power and climate (lots of disks mean lots of heat).
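To illustrate the HSM idea, here is a minimal Python sketch, assuming a simple least-recently-used policy: a fixed number of objects are kept on disk, and anything else is recalled from the slow tape tier on demand. The class `HsmCache`, its `read_from_tape` callback, and the LRU policy are hypothetical choices for illustration; real HSM products use their own migration and recall rules.

```python
from collections import OrderedDict
from typing import Callable


class HsmCache:
    """Hypothetical HSM front end: an LRU disk cache in front of a slow tape tier."""

    def __init__(self, capacity: int, read_from_tape: Callable[[str], bytes]):
        self.capacity = capacity              # max number of objects kept on disk
        self.read_from_tape = read_from_tape  # slow path (minutes on a real library)
        self.disk = OrderedDict()             # key -> data, least recently used first
        self.tape_reads = 0

    def get(self, key: str) -> bytes:
        if key in self.disk:
            self.disk.move_to_end(key)        # cache hit: mark as most recently used
            return self.disk[key]
        data = self.read_from_tape(key)       # cache miss: recall from tape
        self.tape_reads += 1
        self.disk[key] = data
        if len(self.disk) > self.capacity:
            self.disk.popitem(last=False)     # evict the least recently used object
        return data


# Usage with an in-memory stand-in for the tape library.
tape = {"reel1/a.mpg": b"A", "reel1/b.mpg": b"B", "reel2/c.mpg": b"C"}
cache = HsmCache(capacity=2, read_from_tape=tape.__getitem__)
for key in ["reel1/a.mpg", "reel1/a.mpg", "reel1/b.mpg", "reel2/c.mpg", "reel1/a.mpg"]:
    cache.get(key)
print(cache.tape_reads)  # 4: the immediate repeat read of a.mpg is served from disk
```

With a larger cache, frequently viewed items stay on disk and only cold items pay the tape-recall wait.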

Here at the National Library of Wales we have only a smallish tape library (tens of terabytes), but if cost is no object, Hitachi or Sony will gladly sell you larger solutions. Your main headache will be developing the management and cataloging side.

Poster: brewster Date: May 11, 2004 11:12pm
Forum: petabox Subject: Re: Selective powering of large petasites

Richard and "illtud"--

Thank you for the notes. We have had some experience with both tapes and hard drives at the Internet Archive and the Television Archive, and all of it points to the same solution: keep multiple copies, and keep them as active as possible.

The Television Archive, whose holdings are closing in on a petabyte, started on tape and is now recording onto hard drives that are kept offline. We don't have much experience reading this collection back, except for reading back September 11 to 18, 2001, and that all worked fine.

A bigger tape experiment involved reading 1,000 DLT tapes recorded by the Internet Archive from 1996 to 1999. Some tapes had faults that made them difficult to read, and a limited amount of data was lost. Reading was also very slow (it took months of an administrator's time). Since then, all data has been recorded onto hard drives that are kept online.

Spinning disks seem to have a failure rate of 6% per year, but we are working on better measurements. When a disk "fails" it does not always lose data, or sometimes loses only one block, so recovery can be effective. Even so, failures like these mean we should never keep just one copy.

Our data protection policy is to keep at least 2 copies, preferably in distant locations (we have found that human error accounts for real losses as well, so having different administrative bodies helps). We keep copies in San Francisco and at the Library of Alexandria in Egypt.
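A rough back-of-the-envelope model of why a second copy matters, sketched in Python. It treats the 6% per year figure as a constant, independent per-disk failure rate and assumes a hypothetical re-replication window (the days needed to copy a failed disk's data again from the surviving copy). Independence is the big assumption: shared administrators, sites, or hardware batches produce correlated losses, which is exactly what distant copies under different administrative bodies guard against.

```python
# Rough model: treat the quoted 6%/year as a small, constant, independent per-disk
# failure rate. Ignores correlated failures (shared admins, sites, hardware batches).
ANNUAL_FAILURE_RATE = 0.06


def expected_failures_per_year(num_disks: int, afr: float = ANNUAL_FAILURE_RATE) -> float:
    """How many disk replacements to budget for per year."""
    return num_disks * afr


def single_copy_loss_rate(afr: float = ANNUAL_FAILURE_RATE) -> float:
    """With one copy, every disk failure is a potential data loss."""
    return afr


def two_copy_loss_rate(repair_days: float, afr: float = ANNUAL_FAILURE_RATE) -> float:
    """Approximate annual rate of losing both copies: either disk fails (2 * afr),
    then the other fails before re-replication finishes (afr * window)."""
    window_years = repair_days / 365
    return 2 * afr * (afr * window_years)


print(expected_failures_per_year(1000))   # hypothetical 1,000-disk cluster: about 60
print(single_copy_loss_rate())            # 0.06
print(two_copy_loss_rate(repair_days=7))  # about 1.4e-4, i.e. roughly 1 in 7,000
```

Under these assumptions, one copy is exposed at roughly the 6% rate, while two copies with a week-long repair window bring the annual loss rate down by a factor of several hundred.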

We are developing the petabox for exactly this reason. It is designed from the bottom up for reliability, low power, and low cost. The low cost means that we can keep 2 or more copies of even large datasets.

I am in Europe for the next 2 months setting up a European Internet Archive, which will host these machines in Amsterdam. If this is of interest, I would be very happy to talk with anyone about what we are doing.

I can be reached directly at brewster (at) archive.org

-brewster
Digital Librarian

Poster: JTW Date: May 12, 2004 4:42pm
Forum: petabox Subject: Re: Selective powering of large petasites

As you’ve been saying, once you go beyond a couple of terabytes, most data is rarely accessed again, and some of it never is. This is one of the problems I see with very large databases. We have massive servers running 24/7 on which only about 1% of the stored data is ever used, and 97% of that data was added in the last 3 to 6 months. But because of the database software we run (and a management decision), all the data is kept in one large database that is always powered on. The part that might interest you, though, is that we also store reports long-term.
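If most of the data that is actually used is recent, a first-cut placement rule can be as simple as an age threshold. The Python sketch below assumes each file has a known last-access time and uses a hypothetical 180-day window (roughly the 3 to 6 months mentioned above); anything untouched for longer becomes a candidate for the rarely powered tier. The names `FileRecord` and `split_tiers` are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class FileRecord:
    path: str
    last_access: datetime  # assumed to be tracked per file (e.g. atime or an access log)


def split_tiers(files: List[FileRecord], now: datetime,
                hot_window: timedelta = timedelta(days=180)) -> Tuple[List[str], List[str]]:
    """Return (always-on paths, sleep-tier paths) using a simple age threshold."""
    hot, cold = [], []
    for f in files:
        if now - f.last_access <= hot_window:
            hot.append(f.path)   # recently used: keep in the always-on subsystem
        else:
            cold.append(f.path)  # stale: move to the on-demand (sleeping) boxes
    return hot, cold


now = datetime(2004, 5, 12)
files = [
    FileRecord("reports/2004/04.pdf", now - timedelta(days=10)),
    FileRecord("reports/2002/11.pdf", now - timedelta(days=400)),
]
hot, cold = split_tiers(files, now)
print(hot)   # ['reports/2004/04.pdf']
print(cold)  # ['reports/2002/11.pdf']
```

An age threshold is only a starting point; it misplaces old data that suddenly becomes popular, which the on-demand wake-up then has to absorb.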

What I’ve been looking into is a system with “on demand wake up” functionality for these reports. The computers, and more importantly their hard drives, spend most of their time in a sleep mode, i.e. actually switched off, and a network signal to the BIOS brings them back to life when required. For large archives this could save thousands in power consumption, reduce heat problems, and should cut hard drive failure rates. On the management side, if it’s possible to work out in advance which data will be accessed least, that data can be placed in this long-term storage system, while the more heavily used data stays in an always-on subsystem.
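The "network signal to the BIOS" described above is usually implemented as Wake-on-LAN: the sleeping machine's network card watches for a "magic packet" consisting of six 0xFF bytes followed by the target MAC address repeated 16 times. A minimal Python sketch of the sender follows; it assumes Wake-on-LAN is enabled in each box's BIOS and NIC, and the MAC address in the usage lines is a placeholder.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Wake-on-LAN payload: six 0xFF bytes, then the 6-byte MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is a common choice)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


packet = build_magic_packet("00:11:22:33:44:55")  # placeholder MAC address
print(len(packet))  # 102 bytes = 6 + 16 * 6
# send_magic_packet("00:11:22:33:44:55")  # would wake the box with this NIC, if enabled
```

The packet is broadcast rather than addressed because a sleeping machine has no active IP stack; only the NIC is listening.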

From a topology point of view everything appears to be online 24/7, but in reality it’s the requests for data that determine which systems are powered up at any moment. I’m on the lookout for anybody else doing this before I invest time in building our own solution for feasibility testing with off-the-shelf components and Linux. Initially that will be 4 boxes, each with a single 100GB drive (400GB of data in total), plus a master controller. The controller will mount all the subsystems over NFS or Samba, depending on which OS works best with hibernation/suspend modes. The idea is that the other computers power up only when someone accesses data in the corresponding subdirectories on the master controller. Of course, there needs to be some controlling program that knows where all the data is, which means you can’t just let users browse the network looking for files, or the systems will end up starting and stopping every couple of minutes.
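The controlling program could start out as little more than a catalog lookup plus idle timers. The Python sketch below is hypothetical throughout (the class name, catalog format, and 180-second default idle timeout are all assumptions): it maps path prefixes to storage nodes, wakes a node only when a request touches its data, and puts nodes back to sleep once they have been idle for the timeout. The `wake` and `sleep` callbacks stand in for whatever actually powers the boxes up and down, such as a Wake-on-LAN sender and a remote suspend command.

```python
import time
from typing import Callable, Dict, Optional


class WakeOnDemandController:
    """Hypothetical master controller: catalog of path prefixes -> storage nodes."""

    def __init__(self, catalog: Dict[str, str], wake: Callable[[str], None],
                 sleep: Callable[[str], None], idle_timeout: float = 180.0,
                 clock: Callable[[], float] = time.monotonic):
        self.catalog = catalog            # path prefix -> node name
        self.wake = wake                  # powers a node up (e.g. magic packet + remount)
        self.sleep = sleep                # suspends a node
        self.idle_timeout = idle_timeout  # seconds of inactivity before sleeping
        self.clock = clock
        self.last_used: Dict[str, float] = {}  # awake nodes -> time of last request

    def node_for(self, path: str) -> Optional[str]:
        # Longest matching prefix wins, so users never need to browse the nodes.
        matches = [p for p in self.catalog if path.startswith(p)]
        return self.catalog[max(matches, key=len)] if matches else None

    def access(self, path: str) -> str:
        node = self.node_for(path)
        if node is None:
            raise FileNotFoundError(path)
        if node not in self.last_used:
            self.wake(node)               # only real requests wake a machine
        self.last_used[node] = self.clock()
        return node

    def sweep(self) -> None:
        # Call periodically: put nodes to sleep once they have been idle long enough.
        now = self.clock()
        for node, last in list(self.last_used.items()):
            if now - last >= self.idle_timeout:
                self.sleep(node)
                del self.last_used[node]


# Usage with a fake clock and callbacks that just record what would happen.
events = []
fake_now = [0.0]
ctl = WakeOnDemandController(
    catalog={"/archive/reports/2003/": "node1", "/archive/reports/2004/": "node2"},
    wake=lambda n: events.append(("wake", n)),
    sleep=lambda n: events.append(("sleep", n)),
    clock=lambda: fake_now[0],
)
ctl.access("/archive/reports/2003/q4.pdf")
ctl.access("/archive/reports/2003/q3.pdf")  # node1 already awake: no second wake
fake_now[0] = 200.0
ctl.sweep()
print(events)  # [('wake', 'node1'), ('sleep', 'node1')]
```

Because users go through the catalog rather than browsing the nodes directly, stray directory listings cannot cause the start/stop churn described above.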

Poster: brewster Date: May 12, 2004 5:32pm
Forum: petabox Subject: Re: Selective powering of large petasites

The Library of Alexandria has a copy of much of the Internet Archive's web collection. They run their systems with a setting that puts them to sleep after 3 minutes of inactivity, and they report that it works fine. In a separate test, Bruce Baumgart found that it takes 9 to 10 seconds to spin a disk back up.
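The post does not say how the Library of Alexandria's sleep setting is configured. At the disk level on Linux, one common way is `hdparm -S`, which takes an encoded standby timeout: values 1 to 240 are multiples of 5 seconds, and 241 to 251 are multiples of 30 minutes. Below is a small Python helper (hypothetical, not part of hdparm) that converts a timeout in seconds to that value, rounding up to the next representable timeout.

```python
import math


def hdparm_standby_value(seconds: float) -> int:
    """Encode an idle timeout for `hdparm -S`, rounding up to the next representable value."""
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    if seconds == 0:
        return 0                                # 0 disables the standby timer
    if seconds <= 240 * 5:
        return math.ceil(seconds / 5)           # 1..240: multiples of 5 s (up to 20 min)
    if seconds <= 11 * 30 * 60:
        return 240 + math.ceil(seconds / 1800)  # 241..251: multiples of 30 min (up to 5.5 h)
    raise ValueError("timeouts above 5.5 hours cannot be encoded this way")


print(hdparm_standby_value(180))  # 36 -> `hdparm -S 36 /dev/sdX` for a 3-minute timeout
```

So a 3-minute timeout corresponds to `hdparm -S 36 /dev/sdX` (the device name is a placeholder), and each wake-up then costs roughly the 9 to 10 seconds of spin-up measured above.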

We have not done a large-scale test of this approach, but it sounds promising for many applications.

The petabox with spun-down disks would save 1/2 the power.

-brewster

Poster: Jp7733 Date: Feb 5, 2022 11:50am
Forum: petabox Subject: Re: Selective powering of large petasites

Question: you said it takes 9 to 10 seconds to spin a disk back up. I'm thinking of a basic Windows 98 tower, maybe a CD drive, though I'm leaning more towards a floppy disc. I don't know much about those other than that their endurance and reliability are unquestionable. As for the main question, though: laser read or not?

Poster: Jp7733 Date: Feb 5, 2022 11:54am
Forum: petabox Subject: Re: Selective powering of large petasites

I created a formula that not only corrects itself and regenerates memory, but also adds 330 kb of memory for every 8mg of new data. Please reply; I need pointers on 2 missing pieces. It is almost flawless. The problem is that it does not stop: rather than regenerating or making new "cells", it just keeps generating and becomes useless, producing empty z or xsv files that are wasted.