Poster: Tom Gally Date: Dec 11, 2009 6:20pm
Forum: texts Subject: Missing images from Google book scans

Nearly every day, I check the following two URLs to see what books have been added to the Americana text collection, and nearly every day I come across unusual, interesting, and wonderful books.

http://www.archive.org/search.php?query=%28collection%3Aamericana%20AND%20format%3Apdf%20AND%20mediatype%3Atexts%29%20AND%20-mediatype%3Acollection&sort=-publicdate

http://www.archive.org/search.php?query=-description%3A%28Google%29%20AND%20%28collection%3Aamericana%20AND%20format%3Apdf%20AND%20mediatype%3Atexts%29%20AND%20-mediatype%3Acollection&sort=-publicdate

The first URL lists all of the most recent additions to the Americana collection, while the second URL excludes those taken from Google (nearly all added, it seems, by user tpb, who I presume is a bot). Even though it returns far fewer books, I prefer the latter URL because, too often, the scans from Google are bad in multiple ways: folded-over pages, visible fingers, resolution too coarse for the text to be read, and, worst of all, omitted images.
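For anyone who wants to build these two searches rather than bookmark them, here is a minimal sketch in Python. It only reconstructs the two URLs above from their decoded query strings; the names BASE, AMERICANA, and search_url are made up for illustration, and nothing here assumes anything about how the search page behaves beyond what the URLs themselves show.

```python
from urllib.parse import quote, urlencode

# Base address and decoded query, both taken from the two URLs above.
BASE = "http://www.archive.org/search.php"
AMERICANA = (
    "(collection:americana AND format:pdf AND mediatype:texts)"
    " AND -mediatype:collection"
)


def search_url(query, sort="-publicdate"):
    """Percent-encode the query and sort order into a search URL.

    quote_via=quote encodes spaces as %20 (not '+'), matching the URLs above.
    """
    return BASE + "?" + urlencode({"query": query, "sort": sort}, quote_via=quote)


# All recent additions to the Americana collection.
all_recent = search_url(AMERICANA)

# Same search, minus items whose description mentions Google.
without_google = search_url("-description:(Google) AND " + AMERICANA)

print(all_recent)
print(without_google)
```

The only difference between the two is the leading "-description:(Google) AND" clause, where the minus sign negates the match on the description field.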

Examples of missing images, all from books taken from the former URL a few minutes ago, can be seen here:

http://www.archive.org/stream/economicmininga01lockgoog#page/n410/mode/2up
http://www.archive.org/stream/edinburghphilos10edingoog#page/n381/mode/2up
http://www.archive.org/stream/earthadescripti01reclgoog#page/n34/mode/2up

Presumably Google removed these images in order to improve the accuracy of OCR conversion, but it's a shame that these files are being added in such large numbers to the Internet Archive when, one would hope, better scans are available somewhere.

Does anyone know why these defective versions, rather than versions with the illustrations intact, are being added? Do the libraries at which the books were scanned (Harvard University, University of California, etc.) know that defective versions of their books are being added to the Internet Archive? Can anything be done to replace those scans with better ones?

Reply

Poster: stringybark Date: Dec 11, 2009 7:15pm
Forum: texts Subject: Re: Missing images from Google book scans

In my experience, the Google digitisation is done with quantity rather than quality in mind. Eventually I concluded that they have an 'acceptable level of defect' policy under which, provided that only around 1 percent of pages are spoiled, the quality standards are met. So all "user tpb" books have a few dud pages. The contrast in quality with the older, far better MSN-sponsored digitisation is stark.

Reply

Poster: stbalbach Date: Dec 11, 2009 10:09pm
Forum: texts Subject: Re: Missing images from Google book scans

Concur with stringybark. Also, there's been a problem with public-domain books disappearing from Google's full-view access. Presumably Google wants to sell those books in the future once it resolves the legalities, although there may be other reasons. User tpb is doing a great service by getting books off Google ASAP while it's still possible, before they disappear behind limited-preview mode (i.e., a paywall). Even if the scans are poor quality, that's better than nothing!

Stephen

Reply

Poster: Time Traveller Date: Dec 11, 2009 11:10pm
Forum: texts Subject: Re: Missing images from Google book scans

The arrangement that Google has, under which it owns out-of-copyright books or something until authors opt out, is causing lots of discussion and legal action.

It may be that, with it all in the news media, authors are opting out because they did not know before how Google was using their book(s).

Peter

Reply

Poster: Time Traveller Date: Dec 11, 2009 10:13pm
Forum: texts Subject: Re: Missing images from Google book scans

Seeing that they might be losing the rights to lots of books soon, it might be that the more they have online today, the more rights they have to keep these books.

But they are sort of profit-making, while volunteers do the scanning for the IA: a labour of love as opposed to a salary from Google.

Peter