Why image recognition is the next big thing

As social networks, apps, and websites strive to make the most of the vast amounts of data users share with them, and deliver smarter, better services to the people who use them, there’s one approach that many of them have in common. It draws on methods of artificial intelligence and machine learning, and in just a few years we could see its sophisticated methods improving our search results, or making our social networks smarter. The next big thing in Silicon Valley? Image recognition.

Let’s start, as many stories in the tech world do, with the recent acquisition of a startup. TechCrunch reported in August that Google acquired the team behind Jetpac, an app that uses public Instagram data to create “Jetpac City Guides” to determine things like the happiest places in a town, or compile guides to scenic hikes or popular food trucks. Jetpac’s system looks for visual cues to determine contextual information about the area where a photo was taken, and reviews that visual information to determine what’s actually happening in a given location.

Jetpac’s chief technology officer, Pete Warden, is an expert in computer vision, the artificial intelligence subfield concerned with teaching computers to see and interpret images. Jetpac’s “city guides” for more than 6,000 destinations relied on neural network technology developed by Warden to process images. Neural networks mimic the way that the human brain processes information, and can be trained with large datasets to recognize objects and identify their presence or absence.

Jetpac’s technology uses automated processes to provide customized geographic information, which could be useful in Google’s efforts to build a superior personal assistant that draws on Google Now, Google+, and Google Maps. CNET reports that the Jetpac team is joining Google’s Knowledge team, which is working to build more sophisticated knowledge into Google Search. But that’s far from the limit of Google’s efforts to use image recognition and related methods to improve its services for users. Google announced in 2013 that it used computer vision and machine learning, another subfield of artificial intelligence, to enable users to search their Google+ images within Google Search. Jetpac, for its part, has achieved real-time object recognition, which could be useful for enhancements to Google Glass.

In a post on Google’s Research Blog, software engineer Christian Szegedy recently detailed Google’s latest research in image recognition, which placed first in the classification and detection tasks at the ImageNet Large-Scale Visual Recognition Challenge, the largest academic challenge in computer vision. The classification task measures an algorithm’s ability to assign correct labels to an image. The classification with localization task assesses how well an algorithm models both the labels of an image and the location of the underlying objects. The detection task is similar, but applies more stringent criteria and includes images with “tiny objects” that are difficult to recognize.
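To make the idea of measuring how well an algorithm assigns correct labels concrete, here is a minimal Python sketch of a top-k error rate, the kind of score the ImageNet challenge reports for classification (with k set to 5). It assumes each prediction is a list of labels ranked from most to least confident. The function name and the sample labels are hypothetical, chosen for illustration, and are not taken from the challenge's own tooling.

```python
def top_k_error(ranked_predictions, true_labels, k=5):
    """Return the fraction of images whose true label is NOT among the top-k guesses.

    ranked_predictions: one list of labels per image, most confident first.
    true_labels: the correct label for each image, in the same order.
    """
    if not true_labels or len(ranked_predictions) != len(true_labels):
        raise ValueError("need one non-empty list of guesses per true label")
    misses = sum(
        1
        for guesses, truth in zip(ranked_predictions, true_labels)
        if truth not in guesses[:k]
    )
    return misses / len(true_labels)


# Hypothetical model output for four images (three ranked guesses each).
predictions = [
    ["tabby", "tiger cat", "lynx"],
    ["beagle", "basset", "foxhound"],
    ["canoe", "kayak", "paddle"],
    ["espresso", "cup", "mug"],
]
labels = ["lynx", "beagle", "kayak", "teapot"]

top1 = top_k_error(predictions, labels, k=1)  # only the first guess counts
top3 = top_k_error(predictions, labels, k=3)  # any of the three guesses counts
print(top1, top3)
```

Allowing several guesses per image is forgiving by design: a photo may contain more than one plausible subject, so a model is not penalized for ranking a reasonable label first as long as the annotated one appears among its top guesses.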

Google’s research involved an algorithm, called GoogLeNet, for a “radically redesigned” convolutional network with increased depth and width, which enabled the system “to perform inference with low memory footprint.” (GoogLeNet was named in honor of Yann LeCun, who popularized convolutional networks, and recently joined Facebook, where he’s leading the company’s new artificial intelligence lab.)

Convolutional networks are a type of deep learning architecture, widely used for image recognition. As the MIT Technology Review explains, referencing the SuperVision algorithm that won the ImageNet challenge in 2012, convolutional neural networks consist of layers of “neuron” collections, which each evaluate a small section of an image. The results yielded by all of the collections in a layer overlap to create a representation of the whole image, and the layer below repeats the process on the new image representation. The Technology Review considers the unveiling of SuperVision a turning point for the field of computer vision. For its submission to the challenge, Google built on SuperVision and other implementations of convolutional networks to achieve an error rate of only 6.7 percent.
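The layered process described above can be sketched in a few lines of Python. This is a simplified illustration, not the SuperVision or GoogLeNet implementation: it assumes a tiny grayscale image stored as a list of rows, and it uses hand-picked filters (`edge_kernel` and `sum_kernel`, both hypothetical) where a real network would learn its filters from training data. Each output value comes from one small, overlapping window of the input, and the second layer repeats the same operation on the first layer's output.

```python
def convolve2d(image, kernel):
    """Slide a small kernel over the image and return the map of responses.

    Each output cell looks only at one kernel-sized window of the input.
    Neighboring windows overlap, so together they cover the whole image.
    Negative responses are clipped to zero (a ReLU activation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(max(0.0, total))
        output.append(row)
    return output


# A 4x4 image that is dark on the left half and bright on the right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds where brightness rises from left to right.
edge_kernel = [[-1, 1], [-1, 1]]
# Hand-picked filter that adds up the responses in each 2x2 neighborhood.
sum_kernel = [[1, 1], [1, 1]]

layer1 = convolve2d(image, edge_kernel)   # 3x3 map marking the vertical edge
layer2 = convolve2d(layer1, sum_kernel)   # 2x2 map built from layer1's output
print(layer1)
print(layer2)
```

The first layer fires only in the column where dark meets bright, and the second layer summarizes those responses over a wider area. Stacking many such layers, with learned rather than hand-written filters, is what lets a deep network move from edges to textures to whole objects.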

Google will be able to use the technology developed for the challenge to build better image understanding, and Szegedy notes that “the progress is directly transferable to Google products such as photo search, image search, YouTube, self-driving cars, and any place where it is useful to understand what is in an image as well as where things are.” Those implementations could also be helped by the talent acquired with DeepMind and DNNResearch, two of Google’s other recent purchases.

Google’s focus on deep learning, and explorations of its potential for image recognition, runs parallel to what other companies in Silicon Valley are researching. Pinterest recently acquired a startup called VisualGraph, which focused on identifying elements of images and making connections between images, so that users can find interesting images.

Twitter acquired Madbits, a deep learning startup focused on visual intelligence technology that can understand and organize information from raw media, whether that information consists of the content of an image or the tags associated with it. Yahoo-owned Tumblr is partnering with Ditto Labs to analyze photos for brand-related data, so that brands can understand the nature of Tumblr’s collective conversations about them. Facebook has implemented a facial recognition system, called DeepFace, that uses neural networks to detect and identify faces in photos. Even Amazon, with the launch of its Fire Phone, uses a version of image recognition software to identify books, DVDs, bar codes, phone numbers, addresses, and more.

A wide variety of web companies — including some of Silicon Valley’s biggest — can envision ways that deep learning and image recognition capabilities could improve their platforms, and the services that they offer to users. Olga Russakovsky, a Stanford PhD candidate who reviewed the annual results of the ImageNet Large Scale Visual Recognition Challenge in a recent paper, wrote that while current computer vision algorithms still struggle to identify objects that are small or thin in photographs, or images distorted with filters, it won’t be long before the technology is more efficient at analyzing images than we are. “It is clear that humans will soon outperform state-of-the-art image classification models only by use of significant effort, expertise, and time.”

Tech-savvy Internet users should expect to hear more about image recognition methods and tools in the near future — and to see their favorite websites and apps growing smarter and more sophisticated in the way they handle images and all of the information contained within them.

Full article available at Tech Cheat Sheet

Credit: illustration by Kyle McDonald