The Proliferation of Inaccurate Book Information on the Internet

Why is book publication information so often incorrect, and by so much?

If you buy books and look around the Internet, or just Amazon, long enough, you will notice pieces of information that are obviously incorrect, sometimes wildly so.

There are books that are duplicated, where one will have an ISBN and one will only have an ASIN (Amazon's proprietary ID). There are book pages with the incorrect format, and there are book pages with incorrect publishing dates.
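One way to tell the two kinds of identifier apart is the ISBN check digit. The sketch below is only an illustration: the function names are my own, and the ASIN-style string in the usage note is made up. It uses the standard ISBN-10 (mod 11) and ISBN-13 (mod 10) checksums, so anything that fails both is at least not a well-formed ISBN.

```python
import re


def clean(code: str) -> str:
    """Strip hyphens and spaces and upper-case the identifier."""
    return re.sub(r"[\s-]", "", code).upper()


def is_isbn10(code: str) -> bool:
    """ISBN-10: weights 10..1, total divisible by 11, 'X' means 10."""
    code = clean(code)
    if not re.fullmatch(r"\d{9}[\dX]", code):
        return False
    total = sum(
        (10 - i) * (10 if ch == "X" else int(ch))
        for i, ch in enumerate(code)
    )
    return total % 11 == 0


def is_isbn13(code: str) -> bool:
    """ISBN-13: alternating weights 1 and 3, total divisible by 10."""
    code = clean(code)
    if not re.fullmatch(r"\d{13}", code):
        return False
    total = sum(
        int(ch) * (1 if i % 2 == 0 else 3)
        for i, ch in enumerate(code)
    )
    return total % 10 == 0


def classify(code: str) -> str:
    """Label an identifier found on a listing page."""
    if is_isbn13(code):
        return "ISBN-13"
    if is_isbn10(code):
        return "ISBN-10"
    return "not an ISBN"
```

For example, classify("0-306-40615-2") gives "ISBN-10", while a made-up ASIN-style string such as "B00EXAMPLE" gives "not an ISBN". A passing checksum only shows the number is well formed, not that the listing describes the right book.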

And these are only the most common errors that you will find looking for books on Amazon and the Internet.

The most important pieces of information are the title, the author, the publisher, the format, and the publication date. Also useful are the number of pages, the size of the book, the cover image, the number of volumes, and the ISBN when applicable.

There are many sources for this book information. Bowker's Books in Print is a subscription service and it is the gold standard for correct book information. They release weekly, and sometimes daily, updates of all new books being published to keep their subscribers up to date.

The Library of Congress has been building a book database for many years, and it has as many or more records than Bowker's. But the data format is somewhat archaic, and the records are weighted heavily towards historical texts rather than consumer books. In 2016 the Library of Congress began to offer the vast majority of their database for free.

OCLC also has a massive database of titles. It is used for interlibrary loans, and is the database that powers the feature you'll see on many book sites to "find this book in a local library". OCLC also offers a subscription service to use their book data in other applications.

There are efforts like HathiTrust and Google Books, which are based on scanning physical books into digital storage. Google Books makes their data available by API, meaning you can look up one book's information at a time. HathiTrust offers a monolithic download of their titles, but the mix of books is very similar to that of the Library of Congress: more historical texts and important books than consumer or popular books.
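As a rough idea of what a one-book-at-a-time lookup looks like, here is a minimal sketch against the public Google Books volumes endpoint, using only the Python standard library. The helper names are my own, and I am assuming the response carries the usual volumeInfo fields (title, authors, publisher, publishedDate, pageCount); any of them can be missing for a given book.

```python
import json
import urllib.parse
import urllib.request

API = "https://www.googleapis.com/books/v1/volumes"


def lookup_url(isbn: str) -> str:
    """Build the search URL for a single ISBN."""
    return API + "?" + urllib.parse.urlencode({"q": f"isbn:{isbn}"})


def summarize(volume: dict) -> dict:
    """Pull the basic publication fields out of one result item."""
    info = volume.get("volumeInfo", {})
    return {
        "title": info.get("title"),
        "authors": info.get("authors", []),
        "publisher": info.get("publisher"),
        "published": info.get("publishedDate"),
        "pages": info.get("pageCount"),
    }


def fetch(isbn: str) -> list:
    """Query the API (network access required) and summarize each hit."""
    with urllib.request.urlopen(lookup_url(isbn), timeout=10) as resp:
        data = json.load(resp)
    # "items" is absent when nothing matches, so default to an empty list.
    return [summarize(v) for v in data.get("items", [])]
```

Calling fetch() with an ISBN returns a list of small dictionaries, one per match, which makes it easy to compare against whatever a listing page claims.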

With all of these sources, how is it that so many Amazon books have incorrect data?

Amazon itself is probably the leading source of book information on the internet, in terms of use. Their API is orders of magnitude more popular than any of the previously listed sources. So the errors in Amazon propagate throughout the internet.

Even more baffling is that Amazon purchased AbeBooks some years ago, and AbeBooks seems to have very accurate book information. Abe is intended much more as a collector's bookstore than Amazon is, albeit a much smaller one. But it is strange that Amazon wouldn't put Abe's data to use to correct their own database.

A simple example on Amazon is "How to Live on Almost Nothing and Have Plenty". First published in 1979, it exists as an original hardcover, a later paperback, and a 2011 paperback reprint. But Amazon has eight listings for this book, three of which are for the identical printing of the hardback.

Because these are relatively expensive books, the inconsistencies make purchasing a copy more difficult. One listing has the correct ISBN in the information, but another, which claims to be a hardcover, has lower prices. Unfortunately, its print date and everything else are incorrect, and there is no book image.

From a collector's standpoint, these errors can actually protect the buyer, though. The seller is concerned about feedback, and Amazon's incorrect information gives you an easy reason to request a refund or discount after the fact.

But it is still annoying and a waste of time, because many times book listings do not have an ISBN. Less scrupulous sellers will list far less valuable versions of a book, knowing that the listing information is incorrect, and hoping to get more money from an unsuspecting purchaser who hasn't surveyed the entire set of available books.

I have used the book data correction mechanism on Amazon's price page to improve their product information. But sellers are still free to create an entirely new product page with their own book information entered, so incorrect versions proliferate.

As with anything, try to make use of what you can't change. Curiously, Amazon's incorrect book data can be used to make your book collecting a little less risky.

If you see a publishing date of 1773 on a book from 1979, you can rest assured that the book you receive is not going to completely match Amazon's description. And if the book is in marginal condition, or you just change your mind, it's an easy return with little risk.
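If you keep notes on the listings you are considering, a check like this can be automated. The sketch below is purely illustrative: the Listing record, the listing IDs, and the idea of a hand-maintained set of known printing years are all my own assumptions, not anything Amazon provides.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    listing_id: str  # hypothetical label for a product page
    year: int        # publication year shown on that page


def suspicious(listings, known_years, tolerance=1):
    """Return listings whose year is not near any known printing year."""
    return [
        item
        for item in listings
        if all(abs(item.year - known) > tolerance for known in known_years)
    ]


# Printing years for the example book: 1979 original, 2011 reprint.
known = {1979, 2011}
pages = [
    Listing("page-A", 1979),
    Listing("page-B", 1773),
    Listing("page-C", 2011),
]
flagged = suspicious(pages, known)
print([item.listing_id for item in flagged])
```

Here only the page claiming 1773 is flagged. The tolerance of one year is an arbitrary choice to allow for editions dated late in one year and sold in the next.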

As for what to do when selling your books, be sure to look through all of the versions of your book for sale on Amazon. In general, list by ISBN first so that your copy ends up in the correct spot. This is usually the more expensive listing, and a book listed there is far less likely to be returned.

On the off-chance that you find a listing with an ASIN instead of an ISBN, but it correctly describes your book and has higher listing prices, look further. If it seems to come up more readily in search than the completely correct listing, consider listing where the higher price is.

But add good images of the book you are selling, so that the buyer is aware of what they're getting. You will greatly reduce the chances of return or negative feedback from the start.

Will Amazon correct its often wildly inaccurate book data? I highly doubt it. Amazon is extremely successful, and the portion of their business that this affects is likely minuscule now, as opposed to 25 years ago when they only sold books.

So it is a landscape to get used to, rather than to try to change.

