
Poster: gojomo Date: Apr 7, 2011 2:36pm
Forum: web Subject: Forum reminders: adds, removes

A reminder: we don't accept 'add site' requests here. They will be ignored and deleted. (Any links in posts here, including as message signatures, may also be deleted.) See this FAQ entry about having your site archived:

http://www.archive.org/about/faqs.php#1

More generally: if your site is well-linked from elsewhere, and open to crawling, the archival crawls by the Internet Archive and our partners will eventually find it.

Also: you shouldn't request removals here, either. See this FAQ entry for information about our exclusion policies and how to effect exclusion via either robots.txt or direct request:

http://www.archive.org/about/faqs.php#2

- Gordon @ IA
This post was modified by gojomo on 2011-04-07 21:36:52

Reply

Poster: purplemath Date: Jan 23, 2011 5:33am
Forum: web Subject: Re: Forum reminders: adds, removes

When sites are added, crawled, archived, whatever, where will they be available (since archives for the last three years or so are not available here)?

Thank you.

Reply

Poster: gojomo Date: Jan 24, 2011 3:30pm
Forum: web Subject: Re: Forum reminders: adds, removes

Please see the new Wayback Machine now in public beta testing at...

http://waybackmachine.org

...for the most recent material.

- Gordon @ IA

Reply

Poster: schwarma Date: Feb 21, 2011 9:45am
Forum: web Subject: Re: Forum reminders: adds, removes

The FAQ says to change your robots.txt file to exclude ia_archiver, then "submit your site on the form on http://www.alexa.com/help/webmasters#crawl_site", but there is no form there anymore. How do I ask the Wayback Machine to re-spider my site so it'll notice my robots.txt and exclude the historical versions of my site now?

Reply

Poster: gojomo Date: Feb 22, 2011 3:23pm
Forum: web Subject: Re: Forum reminders: adds, removes

The only FAQ reference to any forms at Alexa (which as you note have gone away) should be in relation to *adding* your site.

Any changes to your robots.txt should be noticed within 24 hours, perhaps much sooner. (Active crawling usually requests robots.txt just before visiting a site, and then every 6-12 hours thereafter. The Wayback Machine generally checks robots.txt as needed, but can remember a previous version for up to 24 hours.)

If it has been more than 24 hours, your changes still do not seem to have been noticed, and an online robots.txt validator finds no problems with your file's format, please let us know and we'll investigate further.

- Gordon @ IA
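
For anyone who wants to sanity-check their rules locally before waiting out the 24 hours, here is a minimal sketch using the robots.txt parser in Python's standard library. The robots.txt content, the example.com URL, and the `is_excluded` helper are all hypothetical and only for illustration; the `ia_archiver` user-agent name is the one mentioned in the thread. This is Python's parser, not the one the Wayback Machine uses, so treat the result as a rough format check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: excludes ia_archiver from the whole site
# while leaving all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def is_excluded(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url.

    Hypothetical helper; it parses the text locally and makes no network requests.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "http://www.example.com/"  # placeholder URL, substitute your own
    print("ia_archiver excluded:", is_excluded(ROBOTS_TXT, "ia_archiver", site))
    print("other crawlers excluded:", is_excluded(ROBOTS_TXT, "SomeOtherBot", site))
```

To check a live file instead of pasted text, `RobotFileParser` also offers `set_url()` and `read()`, which fetch the robots.txt from the given address.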