

Poster: CoJaBo Date: Aug 16, 2011 11:21pm
Forum: web Subject: Why is Archived content purged retroactively? [Was: Re: Ezboard content suddenly not available in the new system - why?]

The domain ezboard.com now redirects to a domain "parking" page. This usually happens when a popular domain name expires and is taken over by a domain reseller. Standard practice for these pages is to exclude them (using robots.txt) from search engines, as they are web portals with no useful content (and indeed, are often search engines themselves). The problem is that the Archive interprets robots.txt *retroactively*: that is, if a future owner of a domain name decides that its content is unfit for web robots, then the Archive robot flags the content of *all* prior owners of that domain as unfit and excludes them from access.

I too am *very* interested in the official response to this, as it essentially means that when a domain expires, its prior content is indefinitely purged from the Archive. As the primary intent of the Archive is to preserve defunct sites, and defunct sites will invariably have their domain names expire, I am confused as to why such a careless measure would be implemented.

The obvious solution would be to only purge retroactively when the IA robot User-Agent is declared *specifically* in robots.txt, which would allow content owners to erase their previous content if they so desire. (If a declaration is made for any User-Agent, the Archive robot would still skip crawling the page as normal, but would not apply the restriction retroactively.) This still has the issue of a future owner electing to purge and wiping out a previous owner's content, but it would avoid 99% of the *unintended* deletions that now occur. Why has this not been done?
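A minimal sketch of the rule proposed above, assuming the Archive's robot identifies itself as `ia_archiver` (an assumption here, as are the helper names `named_agents` and `policy`). It separates two decisions: whether to skip crawling now, and whether to hide previously archived content.

```python
from urllib.robotparser import RobotFileParser

# Assumed User-Agent token for the Archive's robot (not confirmed by this thread).
ARCHIVE_AGENT = "ia_archiver"


def named_agents(robots_txt: str) -> set:
    """Return the lowercased User-agent values declared in a robots.txt body."""
    agents = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def policy(robots_txt: str, path: str = "/") -> dict:
    """Apply the proposed rule to one robots.txt body and one path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Any matching rule, including a wildcard one, stops crawling as usual.
    blocked = not parser.can_fetch(ARCHIVE_AGENT, path)
    # Only a group that names the Archive's robot triggers a retroactive purge.
    explicit = ARCHIVE_AGENT in named_agents(robots_txt)
    return {"skip_crawl": blocked, "purge_retroactively": blocked and explicit}


if __name__ == "__main__":
    parking_page = "User-agent: *\nDisallow: /\n"
    owner_opt_out = "User-agent: ia_archiver\nDisallow: /\n"
    print(policy(parking_page))
    print(policy(owner_opt_out))
```

Under this sketch a wildcard block from a parking page stops new crawls but leaves older captures visible, while a block addressed to the Archive's robot by name still removes them.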

Reply

Poster: Jeremy Walker Date: Sep 12, 2018 8:40am
Forum: web Subject: Re: Why is Archived content purged retroactively? [Was: Re: Ezboard content suddenly not available in the new system - why?]

EZBoard no longer redirects to a site with a robots.txt that blocks everything. It now redirects to https://www.tapatalk.com, and https://www.tapatalk.com/robots.txt only restricts access to the "topic" directory:

User-agent: *
Disallow: /topic/

Does this mean that the archive can restore EZ Board content now?
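For anyone who wants to check what the two quoted lines actually block, here is a small sketch using Python's standard robots.txt parser. The agent name `ia_archiver`, the helper `is_allowed`, and the example paths are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the post above.
QUOTED_RULES = """\
User-agent: *
Disallow: /topic/
"""


def is_allowed(path: str, agent: str = "ia_archiver") -> bool:
    """Report whether the quoted rules let the given agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(QUOTED_RULES.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise the rule.
    for path in ("/", "/topic/example", "/forum/example"):
        print(path, is_allowed(path))
```

This only shows how the rules parse. Whether archived EZBoard pages become visible again depends on how the Archive applies the rules of the redirect target, which the sketch does not model.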

Reply

Poster: eloyesp Date: Aug 18, 2011 9:22pm
Forum: web Subject: Re: Why is Archived content purged retroactively? [Was: Re: Ezboard content suddenly not available in the new system - why?]

This is terrible. I cannot access queridonotepad.tk (it has expired). Has it been erased? Forever?

This was one of the first sites from Santa Fe, Argentina with original content. I need to show it to others to show that it is still possible.

Reply

Poster: MissTheOldOne Date: Sep 10, 2011 6:40am
Forum: web Subject: Re: Why is Archived content purged retroactively? [Was: Re: Ezboard content suddenly not available in the new system - why?]

This is a horrible problem, and given the tendency of domains to expire over long periods of time, the current implementation guarantees that the archive will devolve into irrelevancy over time.

A technical solution is needed.

Thoughts?

Reply

Poster: jory2 Date: Sep 10, 2011 10:12am
Forum: web Subject: Re: Why is Archived content purged retroactively? [Was: Re: Ezboard content suddenly not available in the new system - why?]

Website owners (content owners) are not legally required to put a robots.txt file on their copyright-protected "Works" (websites).
The Copyright Act affords protection to “original works of authorship fixed in any tangible medium of expression." 75.
Works published on the Internet are fully protected and subject to the same qualifications and limitations as non-digital works. 76
Digital works are “fixed” if they can be perceived, reproduced, or otherwise communicated for more than a transitory duration. 77
The copyright owner of a website or web content has the same exclusive rights under the Copyright Act as copyright holders of non-digital works. 78
Under a strict interpretation of copyright law, archiving meets the threshold of copyright infringement and therefore makes archivists liable without a defense.
Archiving violates three exclusive rights of copyright owners:
the right to reproduction, 79
the right to distribution, 80
and the right to display.81

[ 75. 17 U.S.C. § 102 (2006). 76. They are still required to meet the subject matter requirements of § 102, must be fixed in a tangible medium, and be original works of authorship. 77. Whether fixation on a computer hard disk or random access memory (RAM) is enough is now controversial. Compare Triad Sys. v. Southeastern Express Co., 64 F.3d 1330, 1333 (9th Cir. 1995) (granting preliminary injunction when defendant copied software into RAM of computer) and MAI Sys. v. Peak Computers, 991 F.2d 511, 518 (9th Cir. 1993) (finding that the “representation created in the RAM is ‘sufficiently permanent or stable to permit it to be perceived, reproduced, or otherwise communicated for a period of more than transitory duration’”), with The Cartoon Network L.P. v. CSC Holdings, Inc., 536 F.3d 121, 129-30 (2d Cir. 2008) (finding that the cable television company’s embodiments of copyrighted television programs and movies in data buffers under 1.2 seconds did not last for a period of more than a transitory duration and therefore were not “fixed”) and CoStar Group, Inc. v. LoopNet, Inc., 373 F.3d 544, 551 (4th Cir. 2004) (holding information and data downloaded onto a user’s RAM are not “fixed” because they are for no more than a transitory duration). 78. Those rights include: the right of reproduction (i.e. copying), the right to display, the right to prepare derivative works, and the right to distribute. 17 U.S.C. § 106 (2006). 79. § 106(1). 80. § 106(3). 81. § 106(5). ]

First, archives necessarily make copies of each new webpage as their software crawls the Internet. This step is essential to the project of preservation. Second, archives distribute the copied web content when they make the material available via their website.
Lastly, archives violate the exclusive right to display copyrighted material when they make the pages available on a website that is open to the public. 82
As a result, digital archives like the Internet Archive will be found liable for copyright infringement without an exception or defense.
There are statutory exceptions to the exclusive rights of copyright holders like the library exception, 83, or the DMCA safe harbor for ISPs, 84.
Unfortunately, there are currently no similar exceptions for digital preservation by archives and their strongest defense, fair use, 85, is unpredictable, fact intensive, and uncertain.
The “library exception” of section 108 of the Copyright Act, 86, is a narrow limitation on the exclusive rights of copyright holders.
The primary purpose of the library provisions is to promote access to copyrighted works and reinforce preservation, while safeguarding against the commercial sale of works being supplanted by copying. 87.
While the library exception allows for preservation, flexibility, and access to knowledge by the public, it also reflects concerns about the unauthorized commercial exploitation of copyrighted works and disruption of markets.
Under section 108, a library may make a maximum of only three copies of a published work to replace a damaged, deteriorating, lost, or stolen copy, or if the existing format of the work becomes obsolete. Any copies must be only for the library’s own use. 88.
Libraries are only allowed isolated and unrelated reproduction or distribution of a single copy of copyrighted materials for patrons requesting library materials. 89
[ 82. See H.R. REP. 94-1476 at 64 (1976) (“'[D]isplay’ would include . . . the showing of an image on a cathode ray tube, or similar viewing apparatus connected with any sort of information storage and retrieval system.”).
83. 17 U.S.C. § 108 (2006). 84. 17 U.S.C. § 512 (2006). 85. 17 U.S.C. § 107 (2006). 86. § 108.
87. See Menell, supra note 9, at 1034-35 (explaining that the provisions “augment the general fair use privilege and afford libraries greater leeway in copying and distributing copyrighted works”). 88. § 108(c). 89. § 108(g). ]

Public libraries and archives are exempted from liability for the reproduction or distribution of a single copy of work, as long as the reproduction or distribution is not for commercial advantage, the collections are publicly available, and a notice of copyright is included in the reproduction or distribution of the work. 90.
Furthermore, nonprofit libraries, archives, and educational institutions are exempted from liability for circumventing technological protection measures to the extent necessary to determine whether to add copyrighted works to their collections. 91
Digital archives do not fall under the section 108 library exception. First, the exception does not apply to material the library or archive does not own. Second, under section 108 a library cannot distribute digital copies or make them available to patrons outside the library premises. 92
Furthermore, although the DMCA allows for the digital preservation of copyrighted works, it states that pure digital libraries and archives that exist only on the Internet are not part of the library exception.
The legislative history clearly shows congressional intent not to extend the library exception to libraries and archives existing wholly on the Internet.
The Senate Judiciary Committee stated:
Although online interactive digital networks have since given birth to online digital ‘libraries’ and ‘archives’ that exist only in the virtual (rather than physical) sense on websites, bulletin boards and homepages across the Internet, it is not the Committee’s intent that [17 U.S.C. § 108] as revised apply to such collections of information. The ease with which such sites are established on-line literally allows anyone to create his or her own digital ‘library’ or ‘archives.’ The extension of the application of section 108 to all such sites would be tantamount to creating an exception to the exclusive rights of copyright holders that would permit any person who has an online website, bulletin board or a homepage to freely reproduce and distribute copyrighted works. Such an exemption would swallow the general rule and severely impair the copyright owners’ right and ability to commercially exploit their copyrighted works. 93

[ 90. § 108(a). 91. Digital Millennium Copyright Act, § 103, 112 Stat. 2866 (codified at 17 U.S.C. § 1201(d) (2006)). 92. See § 108(b)-(c) (2006). However, the exception allows libraries and archives to reproduce, distribute, display, or perform in digital form a copy of a copyrighted work during the last twenty years of any term of copyright for purposes of preservation, scholarship, or research as long as the work is not still being commercially exploited, the work cannot be obtained at a reasonable price, or the copyright holder does not provide notice that one of the above conditions applies. § 108 (h)(1). 93. S. REP. NO. 105-190 at 62 (1998). ]

Reply

Poster: Detective John Carter of Mars Date: Sep 11, 2011 8:02am
Forum: web Subject: Re: Why is Archived content purged retroactively? [Was: Re: Ezboard content suddenly not available in the new system - why?]

@"The legislative history clearly shows congressional intent not to extend the library exception to libraries and archives existing wholly on the Internet. The Senate Judiciary Committee stated"

I googled that random bit, and it appears to come from a copyrighted 2009 journal article, according to https://litigation-essentials.lexisnexis.com/webcd/app?action=DocumentDisplay&crawlid=1&srctype=smi&srcid=3B15&doctype=cite&docid=24+Berkeley+Tech.+L.J.+437&key=3a1575b667a332f7df14ab57de5007f2

@"of section 108 of the Copyright Act, 86, is a narrow limitation on the exclusive rights of copyright holders."

The same link turned up for this one. Did you forget to cite it?

More reference links for the article:
http://scholar.google.com/scholar?cluster=9674958588746884061&hl=en&as_sdt=0,21&sciodt=0,21
http://heinonline.org/HOL/LandingPage?collection=journals&handle=hein.journals/berktech24&div=20&id=&page= (excerpt of the intro, with maybe the full article for purchase)

The article is licensed CC BY-NC-ND 3.0 (http://creativecommons.org/licenses/by-nc-nd/3.0/), so its use needs the BY part: 2009 Alyssa N. Knutson
This post was modified by Detective John Carter of Mars on 2011-09-11 15:02:40

Reply

Poster: jory2 Date: Sep 11, 2011 8:42am
Forum: web Subject: Re: Why is Archived content purged retroactively? [Was: Re: Ezboard content suddenly not available in the new system - why?]

I thought it was a very informative paper as well. Thank you for citing the author; I thought I had?
RE: Creative Commons
A webmaster or site owner would not be within his legal rights to transfer to any third party the rights to content he does not rightfully own.
When you consider that websites contain materials not owned by the webmaster but leased under a non-exclusive, time-sensitive, non-transferable limited license, Creative Commons licensing would be very difficult.
Unless your reference to Creative Commons had nothing to do with websites?

Reply

Poster: Detective John Carter of Mars Date: Sep 11, 2011 9:36am
Forum: web Subject: Re: Why is Archived content purged retroactively? [Was: Re: Ezboard content suddenly not available in the new system - why?]

@"your reference to Creative Commons had nothing to do with websites?"

Right. Knutson, the author of the law journal article you used, licensed her work, as noted in the first part of the footer on the first page of: Alyssa N. Knutson, Proceed with Caution: How Digital Archives Have Been Left in the Dark, 24 Berkeley Tech. L.J. 437 (2009).
This post was modified by Detective John Carter of Mars on 2011-09-11 16:36:24

Reply

Poster: jory2 Date: Sep 11, 2011 9:33am
Forum: web Subject: Re: Why is Archived content purged retroactively? [Was: Re: Ezboard content suddenly not available in the new system - why?]

The entire paper is available for download if anyone wants to read it.
I believe you posted the link to the PDF that's been made available.
Would you happen to know how long the moderator Jeff takes to respond to questions? If at all?

Reply

Poster: hochstenbach Date: Oct 14, 2011 11:33am
Forum: web Subject: Re: Why is Archived content purged retroactively? [Was: Re: Ezboard content suddenly not available in the new system - why?]

Same here. Our university changed domain names from rug.ac.be to ugent.be in 2000. All the old pages are now not accessible. This is very bad news.

Pat