Poster: Michael Birk Date: Mar 7, 2005 4:08pm
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

The problem is that the archive.org Live Audio Collection doesn't support resumption of .zip files. It doesn't matter which browser, HTTP client, or download manager you use. If it is a .zip file, you *can't* resume it. If it is any other file type (.mp3, .ogg, .shn), then you *can* resume it.

[Note: My guess is that they are creating uncompressed .zip files "on the fly." This would reduce the required disk space by 50%; however, it makes implementing the HTTP resume feature more difficult (but not impossible).]

There is some good news, however -- those ".m3u" files for streaming simply contain the list of files that you want to download! So here are step-by-step instructions for downloading all of the MP3 files for a show on Microsoft Windows XP (something similar should work on older versions of Windows as well):

1. Download and install the program "wget.exe" from http://www.interlog.com/~tcharron/wgetwin-1_5_3_1-binary.zip -- this is a "command-line download manager." Put the "wget.exe" file in the "c:\" directory: "c:\wget.exe". (Note: Other builds of wget, such as the Cygwin one, should work too.)
2. Click on "My Computer", and choose "Tools/Folder Options..." from the menu.
3. Click on the "File Types" tab and scroll down in the list until you see the "M3U file" file type.
4. Click on the "M3U file" type and click the "Advanced" button on the bottom-right. This pops up the "Edit File Type" dialog box.
5. Click on the "New..." button. This brings up the "New Action" dialog box.
6. Type "Download" in the blank "Action" field.
7. In the blank "Application used to perform action" field, enter exactly the following:

   c:\wget.exe -c -P c:\incoming -i "%1"

8. Click on the "OK" button to close the dialog box.
9. Click on the "Set Default" button, so that the "Download" command is highlighted.
10. Click on the "OK" button.

Whew!

Now, to download all of the MP3s for a show, simply right-click on the "Stream" link (either "64 Kbps M3U" or "VBR M3U") and choose "Save Target As..." from the context menu. Save the .m3u file anywhere -- it doesn't really matter. The .m3u file should download immediately. When it is done, choose "Open" from the "Download complete" dialog box. This will start downloading all of the .mp3 files. An icky black window will open up, and you can monitor your progress.

If your download is interrupted, simply find the .m3u file and double-click it (or right-click and choose "Download"). It will pick up right where it left off, skipping any .mp3 files that have already completed.

All of the downloaded .mp3 files will be located in the "c:\incoming" folder. You can change this by adjusting the command above.

Hope this helps,
mcb
This post was modified by Michael Birk on 2005-03-08 00:08:04
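
For readers on a Unix-like shell, the wget command from step 7 can be sketched roughly as follows. This is only an illustration, not part of the original instructions: the function names `m3u_urls` and `fetch_playlist` are made up here, and the sketch assumes the playlist is plain text with one http:// or https:// URL per line.

```shell
# Print the download URLs listed in a playlist file, one per line.
# Carriage returns from Windows line endings are stripped, and any line
# that does not start with http:// or https:// (blank lines, "#EXTM3U"
# style directives) is dropped.
m3u_urls() {
    tr -d '\r' < "$1" | grep -E '^https?://'
}

# Hypothetical wrapper: fetch every URL in playlist $1 into directory $2
# (default: current directory).
# -c resumes partial files, -P sets the target directory,
# -i - reads the list of URLs from standard input.
fetch_playlist() {
    m3u_urls "$1" | wget -c -P "${2:-.}" -i -
}
```

Running `fetch_playlist show.m3u incoming` a second time after an interruption should continue where the first run stopped, because of the -c flag.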

Poster: Brad Leblanc Date: Mar 8, 2005 2:28am
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

The problem is that the archive.org Live Audio Collection doesn't support resumption of .zip files. It doesn't matter which browser, HTTP client, or download manager you use. If it is a .zip file, you *can't* resume it. If it is any other file type (.mp3, .ogg, .shn), then you *can* resume it.

This is true for recordings that were uploaded prior to Jan 2004. Anything uploaded after that month has a Zip file created and hosted inside of the show folder, so you can log in via FTP, see that file, and download it (and resume when you get disconnected).

We are now creating and hosting zips of the lossless files, the 64kb MP3's, and the VBR MP3's (where applicable - some items don't have lossy files) for all new items. Eventually we will implement this for all the older items too and the "zip on the fly" issues will go away.

Great instructions for circumventing the issue for now though Michael! Thanks!

-Brad
This post was modified by Brad Leblanc on 2005-03-08 10:28:46

Poster: Michael Birk Date: Mar 8, 2005 2:27am
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

Thanks for the correction. However, is there any way to know beforehand whether the .zip file supports resumption? I don't see any upload dates on the details pages.

Poster: Brad Leblanc Date: Mar 8, 2005 2:52am
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

Sure, if you look at the URL used for the ZIP on this item:

http://www.archive.org/compress/sci2000-01-27.dsbd.shnf

And for this item:

http://www.archive.org/download/sci1998-04-16.flac16/sci1998-04-16.flac16_flac.zip

The "Compress" vs. "Download" in the URL is your tip-off if it's zip on the fly or already created.

HTH
-Brad
This post was modified by Brad Leblanc on 2005-03-08 10:52:00
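
The tip-off above can be turned into a small check. The sketch below is illustrative only: `zip_link_kind` is a made-up name, and it assumes the path segments "/compress/" and "/download/" appear exactly as in the two example URLs.

```shell
# Classify a ZIP link by its path, following the tip in the post:
# "/compress/" suggests the ZIP is built on the fly,
# "/download/" suggests it is a file that already exists on disk.
zip_link_kind() {
    case "$1" in
        */compress/*) echo "on-the-fly" ;;
        */download/*) echo "pre-built" ;;
        *)            echo "unknown" ;;
    esac
}
```

For example, `zip_link_kind http://www.archive.org/compress/sci2000-01-27.dsbd.shnf` reports the first item as on-the-fly.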

Poster: Michael Birk Date: Mar 8, 2005 2:45am
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

Using HTTP, I was unable to resume an aborted download of either one of those files. More specifically, both responses ignored the HTTP "Range" header and returned a "200 OK" rather than the required "206 Partial Content".

I did notice some differences between the two, however. The first one (with the "compress" in the URL) returned a non-standard "X-Content-Minimum-Length" header rather than the (important) "Content-Length" header.

Does "resume" for these files only work with FTP? Let me know if there is anything I can do to help get HTTP-based resume working. If disk space is an issue (obviously you have tons ;-), it may be better to fix the zip-on-the-fly script rather than statically creating all of those .zip files.
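
The "200 OK" versus "206 Partial Content" test described above can be reproduced along these lines. This is a sketch under assumptions: the post does not say which client was used, so curl stands in here, and the names `range_verdict` and `probe_resume` are invented for illustration.

```shell
# Interpret the status code of a response to a request that carried a
# Range header. 206 means the server honoured the range, so resuming
# works; 200 means it ignored the header and is sending the whole file.
range_verdict() {
    case "$1" in
        206) echo "resumable" ;;
        200) echo "not resumable" ;;
        *)   echo "inconclusive" ;;
    esac
}

# Ask for only the first byte of URL $1 and report the verdict.
# -s silences progress output, -L follows redirects,
# -o /dev/null discards the body, -r 0-0 sends "Range: bytes=0-0",
# -w '%{http_code}' prints the status code of the final response.
probe_resume() {
    range_verdict "$(curl -s -L -o /dev/null -r 0-0 -w '%{http_code}' "$1")"
}
```

One caveat: if the server ignores the Range header, curl will read the entire file before printing anything, so on a large .zip it may be worth interrupting the probe once it is clearly taking too long.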

Poster: Brad Leblanc Date: Mar 8, 2005 4:06am
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

it may be better to fix the zip-on-the-fly script rather than statically creating all of those .zip files.

Last time it was discussed (off-forum) we wanted to do away with it because it is a very CPU-intensive solution and is not very scalable. -- Imagine 150 people downloading 150 different recordings from 1 server, with a single 1.8 GHz CPU - all while using "Zip on the Fly" - that's 150 simultaneous threads not only transferring info but trying to compress it while doing so. :)

Does "resume" for these files only work with FTP?

I don't know the answer to this. I think other fans have had success resuming zip files (that are complete and *not* on-the-fly) with HTTP here, but I never use it. IMO - FTP is a much better solution for transferring large files. It always resumes, and it easily allows you to queue up a bunch of items and walk away.

If disk space is an issue

We're moving to one of these later in 2005:

http://www.archive.org/web/petabox.php

Early estimates are between 500 and 1000 terabytes (1024TB = 1 petabyte). Possibly bigger. The current LMA collection is somewhere between 20 and 35 terabytes. Not all of that room is being assigned to LMA expansion, but you get the idea... :)

-Brad
This post was modified by Brad Leblanc on 2005-03-08 12:06:13

Poster: Michael Birk Date: Mar 8, 2005 6:22am
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

Well, my apologies if I am re-hashing an old conversation. However, a few points:

It should be possible to do scalable, on-the-fly zipping that supports resumption. There is no need to use ZIP file compression, since these audio files are already compressed (with MP3, OGG, Shorten, or Flac). Without the ZIP compression, it should not be CPU-intensive.

HTTP resumption pretty much works for all clients, assuming the server supports it. As we are discussing, it is a bit tricky to implement for dynamic content, but certainly not impossible.

There are some advantages to HTTP over FTP for content distribution (even large files). In particular, caching is much more straightforward, since the HTTP protocol specifically supports it.
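
To illustrate the first two points: when every entry is stored without compression, the total length of the archive follows from the file names and sizes alone, so a server could announce a Content-Length (and work out which bytes a Range request covers) before sending anything. The sketch below is not the Archive's script; `stored_zip_size` is a hypothetical helper, and it assumes ASCII file names, no extra fields, no data descriptors, no comments, and no ZIP64.

```shell
# Total byte length of a ZIP archive whose entries are all stored
# (compression method 0). Arguments are alternating "name size" pairs.
# Each entry costs a 30-byte local header and a 46-byte central
# directory record, each followed by the file name, plus the file data;
# the archive ends with one 22-byte end-of-central-directory record.
stored_zip_size() {
    total=22
    while [ "$#" -ge 2 ]; do
        name_len=${#1}
        total=$((total + 30 + name_len + $2 + 46 + name_len))
        shift 2
    done
    echo "$total"
}
```

For a single 100-byte file named a.mp3, `stored_zip_size a.mp3 100` gives 30 + 5 + 100 + 46 + 5 + 22 = 208 bytes. The headers also carry a CRC-32 for each file, so a real implementation would need those checksums available up front (for example, precomputed and kept alongside the files) to avoid reading every file on each request.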

I sent an email last night to info@archive.org offering to help with the on-the-fly-zip. Any chance you will take me up on the offer? If I just implement it as, say, a PHP script, could you use it?

thanks,
mcb

p.s. The petabox looks pretty cool! :-) However, if you store the .zip files, is it really a 500-gigabox?

Poster: Brad Leblanc Date: Mar 8, 2005 10:50am
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

It should be possible to do scalable, on-the-fly zipping that supports resumption. There is no need to use ZIP file compression, since these audio files are already compressed (with MP3, OGG, Shorten, or Flac). Without the ZIP compression, it should not be CPU-intensive

Well, I guess I'm still at the point where I don't see the benefit. If space isn't an issue, what does the on-the-fly stuff gain us?

I sent an email last night to info@archive.org offering to help with the on-the-fly-zip. Any chance you will take me up on the offer? If I just implement it as, say, a PHP script, could you use it?

I responded to that around 2 or 3 this afternoon Michael. Not sure why you haven't seen it yet. Let me know if I need to resend.

If I just implement it as, say, a PHP script, could you use it?

I'm not the person that will be implementing it (I'm just a librarian and middleman for the real engineers), but I guess if you can convince me of why we should use on-the-fly then I will send it to them. If we're retiring it to free up resources (CPU), what does keeping it around help with? We appreciate your offer to help.

The petabox looks pretty cool! :-) However, if you store the .zip files, is it really a 500-gigabox?

No, it's a 500,000 gigabox, or a 500 terabox. :) And when that fills up in 10-15 years we talk about rolling in another bigger one. We'll see...

-Brad
This post was modified by Brad Leblanc on 2005-03-08 18:50:31

Poster: Diana Hamilton Date: Mar 8, 2005 11:07pm
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

No, it's a 500,000 gigabox, or a 500 terabox. :) And when that fills up in 10-15 years we talk about rolling in another bigger one. We'll see...

Gosh, remember when we had etree01 and etree02, and were imagining in this forum the day we'd be up to etree38 and etree39... "yeah, that will be really cool". Same feeling here and now. :)

Poster: Michael Birk Date: Mar 7, 2005 4:18pm
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

Oh ... for you command-line junkies, you can do all of this in one line (assuming you use the bash shell):

download () { wget -O - "$@" | wget -c -i -; }

Just put that in your .bashrc (or .bash_profile). To download, right-click on the "Stream" link and choose "Copy Shortcut" to get the URL in your clipboard. Then just type

download "the-url"

(Of course, paste the real url inside the quotes.)