Science

The Next Trick for Literate Robots? Reading Closed Books

The new tech can only get through 20 pages, but it's getting faster.

Dec. 10, 2016

Researchers at Georgia Tech and the Massachusetts Institute of Technology are developing spectral imaging and machine learning algorithms to read closed books, according to a recently published research paper.

The technology uses terahertz radiation (electromagnetic radiation between infrared light and microwaves) to scan pages in a book and identify the different letters. The camera shoots radiation at the book, producing distinct signals that indicate where the page is blank and where there’s printed ink. The algorithm then processes the data to distinguish individual letters. So far, however, the algorithm can detect 20 pages but only read through nine, researcher Barmak Heshmat tells Inverse, because the signal from deeper pages is weaker.

So how does this technology recognize letters? The algorithm, which is smart enough to beat the text-based captcha systems that were common on the internet before the “No CAPTCHA reCAPTCHA” was introduced in 2014, uses a “dictionary” of possible letters to determine words.

“Our ultimate mission in camera culture at MIT is to make the invisible visible,” Heshmat says. The implications are still thrilling for anybody who cares about making digital what now only exists on yellowed pages.

Alireza Aghasi, a research scientist at MIT, explains the implications of this technology, especially regarding the captcha.

Can you explain how the algorithm reads through books?

When you try to read through a closed book using electromagnetic waves, separating the contents of one page from another can often become challenging, and we might see overlapping characters coming from the superposition of different pages, or partially occluded characters because of the noise. The algorithm tries to address such problems and enable us to read through the book more accurately.

How is this technology related to captchas?

This technology requires reading complicated combinations of characters, because of the overlapping and occlusion we get through our electromagnetic imaging system. In captchas we also deal with reading complicated characters, and the methods we develop here may also address some instances of the captcha problem.

What are the overall implications for this book-scanning technology, and the implications for captchas?

This technology has general applications in extracting content information from layered structures. An immediate application is reading through old and historical books or documents, where even touching them might cause damage. This technology allows us to safely reveal the content up to a certain depth.

What kind of advantages does it bring to the table?

Most OCR techniques rely on machine learning algorithms, where we train the algorithm with thousands or millions of examples. Also, many of these algorithms have issues with the overlapping or the occlusion of characters, that is, when we have characters that are partly overlapped by others, or parts of the characters are missing. In our algorithm, the learning phase is replaced with feeding the algorithm a “dictionary” of possible letters (teaching the algorithm how each letter looks). Then the algorithm itself deals with reading the words by trying to come up with the best combination of characters that visually looks like the given word, just like a child who reads the words character by character. The good thing is that when the letters overlap or are partially missing, the reading process still remains stable.
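To make the “dictionary” idea concrete, here is a toy sketch in Python. It is not the researchers' algorithm: the 3-by-5 letter shapes, the `?` marker for unreadable pixels, and the function names are all assumptions made up for illustration. It only shows how a partly hidden letter can still be matched against a fixed set of known shapes without any training step.

```python
# Hypothetical 3x5 letter shapes standing in for the "dictionary":
# '#' is ink, '.' is blank page.
DICTIONARY = {
    "C": ["###", "#..", "#..", "#..", "###"],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def score(observed, template):
    """Fraction of visible pixels (anything but '?') that agree with the template."""
    visible = 0
    agree = 0
    for obs_row, tpl_row in zip(observed, template):
        for o, t in zip(obs_row, tpl_row):
            if o == "?":  # occluded pixel: carries no evidence either way
                continue
            visible += 1
            agree += (o == t)
    return agree / visible if visible else 0.0


def read_glyph(observed):
    """Return the dictionary letter that best explains the visible pixels."""
    return max(DICTIONARY, key=lambda letter: score(observed, DICTIONARY[letter]))


# A "T" whose bottom two rows are unreadable.
damaged_t = ["###", ".#.", ".#.", "???", "???"]
print(read_glyph(damaged_t))  # T
```

Because hidden pixels are skipped rather than counted as errors, the match degrades gradually as more of the letter goes missing. It fails only when the visible part fits two letters equally well.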

How could the algorithm get through a captcha barrier?

The algorithm works in the same way somebody solves a puzzle. In solving a puzzle, by combining pieces of smaller objects, you try to construct the main object. This algorithm does the same thing: It tries to combine pieces of the puzzle, which now are the characters to construct words. That’s basically the main idea.
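The puzzle analogy can be illustrated with another toy sketch in Python. Again, this is an assumption-laden illustration and not the team's method: the 3-by-3 letter shapes, the one-column overlap between neighboring letters, and the brute-force search are invented for the example. The sketch tries every combination of letter “pieces,” rebuilds the image each combination would produce, and keeps the one that best matches what was observed.

```python
from itertools import product

# Hypothetical 3x3 letter shapes; '#' is ink, '.' is blank page.
GLYPHS = {
    "I": [".#.", ".#.", ".#."],
    "L": ["#..", "#..", "###"],
    "T": ["###", ".#.", ".#."],
    "O": ["###", "#.#", "###"],
    "U": ["#.#", "#.#", "###"],
}
HEIGHT, WIDTH, STRIDE = 3, 3, 2  # STRIDE < WIDTH: neighbors overlap by one column


def render(word):
    """Lay the letter pieces side by side, merging ink where they overlap."""
    canvas = [["."] * (STRIDE * len(word) + WIDTH - STRIDE) for _ in range(HEIGHT)]
    for i, letter in enumerate(word):
        for r, row in enumerate(GLYPHS[letter]):
            for c, pixel in enumerate(row):
                if pixel == "#":
                    canvas[r][i * STRIDE + c] = "#"
    return ["".join(row) for row in canvas]


def agreement(observed, candidate):
    """Count visible pixels ('?' marks unreadable ones) that the candidate explains."""
    return sum(
        o == c
        for obs_row, cand_row in zip(observed, candidate)
        for o, c in zip(obs_row, cand_row)
        if o != "?"
    )


def read_word(observed):
    """Try every combination of pieces; keep the one that best rebuilds the image."""
    n_letters = (len(observed[0]) - (WIDTH - STRIDE)) // STRIDE
    candidates = ("".join(combo) for combo in product(GLYPHS, repeat=n_letters))
    return max(candidates, key=lambda word: agreement(observed, render(word)))


# The word "LIT" with overlapping letters and two unreadable columns.
observed = ["#?.##?#", "#?.#.?.", "#?##.?."]
print(read_word(observed))  # LIT
```

Trying every combination is only practical for a handful of letters, so a real system would need a smarter search. The sketch is meant only to show the principle of assembling known pieces until they reproduce the image.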

How do the captchas actually work?

The main idea behind captchas is to tell humans and machines apart. Basically, a word written in a complicated way is shown and the person tries to read the word. That is something a computer should not be able to do. However, the learning tools in machine learning are getting stronger and stronger and this science is almost catching up with the level of human intelligence. We cannot keep making the captchas harder, because after a certain level, they also become unreadable to humans.

What are the implications for the future?

What we have now is more of a proof of concept. Clearly, future devices will be more accurate, and we expect to see more accurate reading algorithms as well. This framework can be considered a good initial step toward that goal.
