ARCHIVE TEAM
we are going to rescue your shit

P R E S E N T S

THE ARCHIVE TEAM ANNIVERSARY GEOCITIES TORRENT
VERSION 1.0

or "Your webpage isn't classy without a MIDI soundtrack background"
or "Seriously, what the shit, Yahoo!?"

=========================================================================
HERE IS THE IMPORTANT MESSAGE WHICH YOU SHOULD READ BEFORE DOING TOO MUCH
=========================================================================

This is a collection of Geocities data downloaded by a bunch of people who call themselves ARCHIVE TEAM. We scraped the Yahoo! Geocities site over a six-month period in 2009, before Yahoo! shut down geocities.com on October 26th, 2009.

This collection is packaged for a UNIX filesystem as 7zip archives that contain tape archives (gtar).

If you're a bit of a data tourist and just want to breathe in the scent of a web era gone by, please go to one of the Geocities mirrors that went up after Geocities shut down. As of this writing, these mirrors include:

http://www.reocities.com
http://www.geocities.ws
http://www.geociti.es
http://www.oocities.org/

You'll get your fix, and you won't fly into internet rage when you find you've downloaded hundreds of gigabytes of THING YOU DO NOT WANT.

=========================================================================

This collection was put together by nearly 100 folks who assembled at the news of the death of Geocities, a website that offered free hosting of web pages from roughly 1994 (in beta) to 2009. In 1999, it was purchased by Yahoo! for three billion dollars.
We're not kidding here: billion with a b. At the time of the purchase, Geocities was the THIRD most popular website on the Internet. Even at the time of its shutdown, it was in the top 250.

We don't know for certain why it was shut down, but all signs point to Yahoo! trying to get back to basics (like, uh, having a huge audience?). Geocities somehow didn't fit this new "focus", and it lacked an internal cheerleader to keep it alive through the meetings.

Yahoo! destroyed more history in less time than anyone in known memory, and it certainly did so on purpose. Millions of files, user accounts, all gone.

We are unsure how much of Geocities was rescued in the package you have, but we do know we got enough for it to represent a good share. Attempts to get hard numbers from Yahoo! were consistently rebuffed; we suspect even Yahoo! didn't know exactly how many accounts and files they had.

Others were downloading Geocities at the same time and used different methods of discovery, so our datasets do not overlap 100%. We hope more people will contribute datasets over time, so that a good share of Geocities becomes available for study.

===========================================================================
SO WHO IN THE GOOD GODDAMN WOULD WANT ALL OF THESE FILES
===========================================================================

While we don't feel the need to act like a 1950s commercial inventing new ways to use hula hoops and baking powder, the most likely candidates for this Geocities Anniversary Collection are researchers, scientists, historians and developers who want to work with a large collection of material hand-made by the free labor of millions.

We foresee application tests, sociology studies, academic articles and history texts putting this to good use.

Our job is not to find a use for it. Our job was to save it. Now we're giving it to whoever wants it.
============================================================================
DISCLAIMER
============================================================================

If you go "but what about..." when you think about the repercussions of having this data set, please save us all a lot of trouble: just delete it off your hard drive, go watch some TV, and don't talk of it again.

============================================================================
THE VERY BORING BUT PROBABLY RATHER IMPORTANT TECHNICAL NOTES FOR YOU
============================================================================

Inside this torrent collection are the following directories:

ARCHIVES
GEOCITIES
LOWERCASE
MEDIA
NUMBERS
SUBSITES
UPPERCASE
WORKSHOP
YAHOO

MEDIA is just a quick set of press releases from Yahoo! and an MP3 interview about Archive Team and the importance of saving this digital history.

The rest are collections of .7z files, the archive format used by 7ZIP. Unpacking these archives with 7zip gives you... well, a bunch of large files. These large files are GNU tar archives, which in turn unpack into a collection of directories related to Geocities.

And then it gets weird. Because a scraper (wget) was used to grab all these files, and the resulting data set was enormous, the archives were then sorted under some rough headings. UPPERCASE holds Yahoo! IDs on Geocities that start with an uppercase letter (something like http://www.geocities.com/DigitalHolocaust). LOWERCASE holds IDs that start with a lowercase letter, like http://www.geocities.com/deletegeocities. NUMBERS holds IDs that start with a digit, like http://www.geocities.com/69convent.

WORKSHOP is our own junk bin of lists, scripts, and other tools we used for grabbing Geocities, plus the URL sets we pieced together from lots of Google and other searches to find seed URLs to start grabbing from. Almost nobody wants this, trust us; we're just handing over what we generated along the way.
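For the curious, here is a minimal Python sketch of the two practical steps described above. It is illustrative only and is not one of the tools in WORKSHOP. `collection_dir_for` (a hypothetical helper) applies the first-character rule to guess whether a Geocities URL should live under UPPERCASE, LOWERCASE, or NUMBERS; it assumes the first path segment of the URL is the Yahoo! ID and returns None for anything the rule doesn't cover. `list_tar_members` (also hypothetical) peeks inside one of the inner GNU tar files, assuming you have already run 7zip (for example `7z x`) on the outer .7z layer. Listing first is cheaper than extracting hundreds of gigabytes just to see what you got.

```python
import tarfile
from string import ascii_lowercase, ascii_uppercase, digits
from typing import List, Optional
from urllib.parse import urlparse

# Hosts this sketch treats as "Geocities"; the mirrors are deliberately ignored.
GEOCITIES_HOSTS = {"geocities.com", "www.geocities.com"}


def collection_dir_for(url: str) -> Optional[str]:
    """Guess which torrent directory should hold a Geocities URL.

    Assumption: the first path segment is the Yahoo! ID, e.g.
    http://www.geocities.com/DigitalHolocaust -> "DigitalHolocaust".
    Returns None for non-Geocities hosts, empty paths, and IDs whose
    first character is not an ASCII letter or digit.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in GEOCITIES_HOSTS:
        return None
    segments = [seg for seg in parsed.path.split("/") if seg]
    if not segments:
        return None
    first = segments[0][0]
    if first in ascii_uppercase:
        return "UPPERCASE"
    if first in ascii_lowercase:
        return "LOWERCASE"
    if first in digits:
        return "NUMBERS"
    # e.g. "~someone": this README doesn't say where such IDs ended up.
    return None


def list_tar_members(tar_path: str) -> List[str]:
    """List member paths of an inner GNU tar archive without extracting it.

    Assumes 7zip has already unpacked the outer .7z layer into tar_path.
    """
    with tarfile.open(tar_path) as tar:
        return tar.getnames()


if __name__ == "__main__":
    for url in (
        "http://www.geocities.com/DigitalHolocaust",
        "http://www.geocities.com/deletegeocities",
        "http://www.geocities.com/69convent",
    ):
        print(url, "->", collection_dir_for(url))
```

This only narrows things down to a top-level directory; which .7z inside that directory actually holds a given ID is something the sketch does not know.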
When you run scrapers, they sometimes span hosts and come back with a bunch of other sites. That's what's in SUBSITES.

Finally, GEOCITIES is the www.geocities.com site itself, with TONS of links over to the /geocities/YAHOOIDS directory structure that UPPERCASE, LOWERCASE, and NUMBERS create when unpacked.

Make sense? Well, you'll figure it out.

===============================================================================
http://www.archiveteam.org
WE ARE GOING TO RESCUE YOUR SHIT
===============================================================================

Dropped on the world on October 29, 2010