Vondrick teaches computers to see — by learning to see like computers

December 20, 2013

Object-recognition systems — software that tries to identify objects in digital images — typically rely on machine learning. They comb through databases of previously labeled images and look for combinations of visual features that seem to correlate with particular objects. Then, when presented with a new image, they try to determine whether it contains one of the previously identified combinations of features.
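The pipeline described above can be illustrated with a toy sketch. This is not any particular production system, just a hypothetical nearest-mean classifier: training averages the feature vectors seen for each label, and prediction assigns a new feature vector to the label whose average it sits closest to.

```python
def train(examples):
    """examples: list of (feature_vector, label) pairs from labeled images.

    Returns a per-label mean feature vector -- the 'combination of
    features that seems to correlate with' each object class.
    """
    sums, counts = {}, {}
    for feats, label in examples:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += f
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}

def predict(model, feats):
    """Label a new image's features by the closest learned mean (squared
    Euclidean distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda lab: dist(model[lab], feats))
```

Real detectors use far richer features and classifiers, but the shape is the same: learn feature statistics from labeled data, then score new images against them.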

Even the best object-recognition systems, however, succeed only around 30 or 40 percent of the time — and their failures can be totally mystifying. Researchers are divided in their explanations: Are the learning algorithms themselves to blame? Or are they being applied to the wrong types of features? Or — the “big-data” explanation — do the systems just need more training data?

Today, the feature set most widely used in object-detection research is called the histogram of oriented gradients, or HOG (hence the name of the MIT researchers’ system: HOGgles).
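As a rough illustration of what a HOG feature records, here is a stripped-down sketch of the per-cell computation: each pixel votes, weighted by its gradient magnitude, for the bin containing its gradient orientation. (The full HOG descriptor also normalizes these histograms over overlapping blocks and concatenates them across a grid of cells; this sketch covers a single cell only.)

```python
import math

def cell_histogram(img, n_bins=9):
    """Orientation histogram for one grayscale cell (a list of rows of floats).

    Gradients come from central differences; each interior pixel votes for
    the bin containing its unsigned (0-180 degree) gradient orientation,
    weighted by gradient magnitude.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # horizontal gradient
            gy = img[y + 1][x] - img[y - 1][x]   # vertical gradient
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            b = min(int(angle / (180.0 / n_bins)), n_bins - 1)
            hist[b] += mag
    return hist
```

A cell containing a vertical edge, for example, produces a purely horizontal gradient, so all of its mass lands in the first (0-degree) bin. Stacking many such histograms across an image is what makes the final feature vector so high-dimensional.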

“This feature space, HOG, is very complex,” says Carl Vondrick, an MIT graduate student in electrical engineering and computer science and first author on the new paper. “A bunch of researchers sat down and tried to engineer, ‘What’s the best feature space we can have?’ It’s very high-dimensional. It’s almost impossible for a human to comprehend intuitively what’s going on. So what we’ve done is built a way to visualize this space.”

Read more on MIT News.
