Friday, January 8, 2010

How Does Google Scan Books?

We all know that Google Books has millions of books in its database, and I have always wondered how many people they hire to scan new books every day. This is interesting. With some luck, I found out only this evening how they do it.

For book scanning, Google has come up with a system that uses two cameras and infrared light to automatically correct for the curvature of the pages in a book. It's pretty cool. By constructing a 3D model of each page and then de-warping it afterward, Google Books can present flat-looking pages online without having to slice books up or mash them onto a flatbed scanner.

According to the patent, it works like this:

This diagram shows patented Google technology for correcting for curved pages while scanning books.

1) The book is placed on a flat surface. Above it, an infrared projector projects a special mazelike pattern onto the pages.

2) Two infrared cameras photograph the infrared pattern from different perspectives.

3) Photos of the page taken with conventional cameras can then be de-warped using the 3D shape recovered from the pattern, permitting easier OCR and a better image when showing the real book in conjunction with search results based on the text.
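
To get a feel for the geometry behind these steps, here is a rough sketch in Python. It is not Google's patented method, just my own simplified guess at the two ideas involved: estimating depth from the shift of a pattern point between two cameras (the textbook stereo formula), and flattening a curved page by measuring distance along its surface. The function names, units, and sample numbers are all made up for illustration.

```python
import math


def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Textbook rectified-stereo relation: depth = focal * baseline / disparity.

    Assumption: two idealized, side-by-side cameras. The patent may use
    a different setup.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px


def flattened_positions(xs, heights):
    """Map samples on a curved page cross-section to a flat page.

    xs:      horizontal positions across the page (mm), in increasing order
    heights: page height above the table at each position (mm)

    Returns the cumulative distance along the page surface, which is where
    each sample would sit if the page were pressed flat.
    """
    if len(xs) != len(heights):
        raise ValueError("xs and heights must have the same length")
    if not xs:
        return []
    flat = [0.0]
    for i in range(1, len(xs)):
        # Length of the straight segment between neighbouring samples.
        step = math.hypot(xs[i] - xs[i - 1], heights[i] - heights[i - 1])
        flat.append(flat[-1] + step)
    return flat


if __name__ == "__main__":
    # A pattern point shifted 20 px between cameras 100 mm apart (made-up values).
    print(depth_from_disparity(focal_px=800, baseline_mm=100, disparity_px=20))
    # A page rising 4 mm for every 3 mm across: each segment is 5 mm long.
    print(flattened_positions([0, 3, 6], [0, 4, 8]))
```

A real scanner would repeat this for many pattern points across the whole page and then resample the ordinary photo onto the flattened grid, but the arc-length idea is the heart of it.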

The sophistication of this technology means competitors who want to feature their own digitized libraries won't have a trivial time catching up to Google, which has already scanned more than 7 million books.

Any unskilled laborer can plop a book on an ordinary scanner and run some optical character recognition (OCR) software that converts the imagery into textual data, but it is another matter to get high-quality images while doing so rapidly.

Have fun, folks.


OnitaSPedrosa said...


Vincent Lee said...

Don't really understand your comments, but thanks anyway.

Copyright 2009 Ekimkee.