In July 2007 Google filed a patent application, which has only recently become public, claiming methods by which machines can read and understand text in images and video. The basic idea is for Google to be able to index videos and images and make them available and searchable by text or keywords located inside the image or the video. Alongside Google Inc., the application was filed by Luc Vincent of Palo Alto, Calif., and Adrian Ulges of Delaware, US, who are named as the inventors.
“Digital images can include a wide variety of content. For example, digital images can illustrate landscapes, people, urban scenes, and other objects. Digital images often include text. Digital images can be captured, for example, using cameras or digital video recorders. Image text (i.e., text in an image) typically includes text of varying size, orientation, and typeface. Text in a digital image derived, for example, from an urban scene (e.g., a city street scene) often provides information about the displayed scene or location. A typical street scene includes, for example, text as part of street signs, building names, address numbers, and window signs.”
If Google manages to implement this technology, consumer search will be taken to the next level: Google would gain access to a much wider array of information, far beyond the text-only search in which it already plays a leading role.
This, of course, raises additional privacy issues, as InformationWeek has properly noted. Google has already faced privacy issues with Google Maps Street View, and if this technology starts to index and recognize textual information from millions of videos and billions of pictures around the Web, things might get more complicated.
Nonetheless, if the technology bears the fruit it promises, it will represent a gigantic leap forward for general search technology.
There are open source solutions to the problem. They are perhaps not as scalable and effective as whatever Google develops, yet they do exist.
Andrey Kucherenko from Ukraine has made a very interesting project in this area. His classes can recognize text in monochrome graphical images after a training phase. The training phase is necessary to let the class build recognition data structures from images containing known characters. Those data structures are then used during the recognition process to identify text in real images using a corner algorithm. The project is called PHPOCR, and more information can be found over here.
PHPOCR won the PHPClasses Innovation Award for March 2006, and it shows what can be implemented with PHP5. Certain types of applications require reading text from documents that are stored as graphical images, as is the case with scanned documents.
An OCR (Optical Character Recognition) tool can be used to recover the original text that is written in scanned documents. These are sophisticated tools that are trained to recognize text in graphical images.
This class provides a base implementation for an OCR tool. It can be trained to learn how to recognize each letter drawn in an image. Then it can be used to recognize longer texts in real documents.
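PHPOCR itself is written in PHP, but the train-then-recognize idea described above can be illustrated with a minimal sketch (shown in Python here for brevity). The feature extraction below is a deliberately simplified stand-in for the corner algorithm the class actually uses, and all names in the sketch are hypothetical:

```python
# Minimal train-then-recognize OCR sketch (illustrative only, not PHPOCR's code).
# Glyphs are tiny monochrome bitmaps: tuples of 0/1 rows.

def features(glyph):
    """Flatten a monochrome bitmap into a feature vector.

    A real system would extract structural features (e.g. corners);
    raw pixels keep the sketch short.
    """
    return [px for row in glyph for px in row]

def train(samples):
    """Build recognition data structures from glyphs with known characters."""
    return {char: features(glyph) for char, glyph in samples.items()}

def recognize(glyph, model):
    """Return the trained character whose features best match the glyph."""
    vec = features(glyph)

    def distance(char):
        # Hamming distance between the stored and observed feature vectors.
        return sum(a != b for a, b in zip(model[char], vec))

    return min(model, key=distance)

# Training phase: 3x3 bitmaps of two known "characters".
samples = {
    "I": ((0, 1, 0),
          (0, 1, 0),
          (0, 1, 0)),
    "L": ((1, 0, 0),
          (1, 0, 0),
          (1, 1, 1)),
}
model = train(samples)

# Recognition phase: a noisy "I" (one flipped pixel) is still matched.
noisy_i = ((0, 1, 0),
           (1, 1, 0),
           (0, 1, 0))
print(recognize(noisy_i, model))  # → I
```

The two-phase split is the essential point: the expensive learning from labeled examples happens once, and recognition afterwards is only a lookup against the stored structures.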
Another very interesting start-up believed to be deploying text recognition inside videos heavily is CastTV. The company is based in San Francisco and, with just $3M in funding, is trying to build one of the Web’s best video search engines. CastTV lets users find all their favorite online videos, from TV shows and movies to the latest celebrity, sports, news, and viral Internet videos. The company’s proprietary technology addresses two main video search challenges: finding and cataloging videos from the web, and delivering relevant video results to users.
CastTV was one of the presenters at TechCrunch40, where it was noticed by Marissa Mayer from Google. She asked CastTV the following question: “Would like to know more about your matching algo for the video search engines?” CastTV replied: “We have been scaling as the video market grows – relevancy is a very tough problem – we are matching 3rd party sites and supplementing the meta data.”
Today, in light of the patent application above, Marissa’s question reads quite differently, and CastTV’s answer did not address Google’s concerns. Is CastTV working on something similar to what the patent is trying to cover for Google? We do not know, but time will tell. CastTV’s investors are Draper Fisher Jurvetson and Ron Conway. Let’s hope they make a nice exit from CastTV.
Adobe has also made some advances in this particular area. You can use Acrobat to recognize text in previously scanned documents that have already been converted to PDF; its OCR (Optical Character Recognition) runs even on image PDF files that carry headers, footers, or Bates numbers.
It is also interesting that Microsoft had, in fact, applied for a very similar patent (called “Visual and Multi-Dimensional Search”). Even more interesting is that Microsoft beat Google to the punch by filing three days earlier: Microsoft filed on June 26, 2007, while Google filed on June 29.
Full abstract, description and claims can be read below: