Chaitanya Kukde
Member • Oct 22, 2014
Flickr Completes An xkcd Challenge With Park or Bird Game
Suppose you are presented with the following xkcd webcomic.
What will you do? Chuckle and move on? Maybe. Think it over and move on? Maybe. Whatever the initial reaction, the final step in this two-part process will always be to move on, unless you work at Flickr. The Flickr guys took the comic seriously and created parkorbird.flickr.com. As the comic points out, checking whether a photo (with GPS data embedded in it) was taken in a US National Park is straightforward. Deciding whether the picture contains a bird was the tough part. The problem wasn't easy, but it did not take five years to solve, and it certainly wasn't 'virtually impossible'.
Last year the Flickr team employed a computer vision technique called a deep convolutional neural network, which enabled the computer to recognize more than a thousand kinds of things in an image, one of them being 'birds'.
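To see why the park half of the challenge counts as the easy one, here is a minimal Python sketch of a GPS-in-region check using the standard ray-casting point-in-polygon test. It is not Flickr's code: the function names and the rectangular `EXAMPLE_PARK` boundary are made up for illustration, and a real service would look up actual park boundary data.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test: True if (lat, lon) lies inside the polygon.

    polygon is a list of (lat, lon) vertices given in order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this edge straddle the line of constant latitude through the point?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that line
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


# Hypothetical rectangular boundary, NOT real park data.
EXAMPLE_PARK = [(44.0, -111.0), (44.0, -110.0), (45.0, -110.0), (45.0, -111.0)]


def taken_in_park(photo_gps, parks):
    """True if the photo's (lat, lon) falls inside any park boundary."""
    lat, lon = photo_gps
    return any(point_in_polygon(lat, lon, boundary) for boundary in parks)


print(taken_in_park((44.5, -110.5), [EXAMPLE_PARK]))
print(taken_in_park((40.0, -100.0), [EXAMPLE_PARK]))
```

The whole check is a few lines of arithmetic on coordinates, which is the contrast with the bird half that the comic is making.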
The Deep Convolutional Neural Network Model
The network transforms an input image into a representation in which different objects and scenes are easily distinguishable by a simple binary classification algorithm, such as a Support Vector Machine or a Bayesian network. It does this by passing the image through a series of layers, where each layer computes a function of the output of the layer below it. The layers are trained on millions of images, and they learn to recognize image features in ascending order of complexity. For example, the first layer may detect simple edges or lines, and the subsequent layers may go on to recognize various shapes. Further layers might recognize higher-level concepts, like eyes and beaks, and even further ones might recognize heads and wings.
Each layer is 'activated' according to how many of its features it detects in the input image, and a short floating-point vector summarizing the activations at each layer is passed to a binary classifier. The classifier, as mentioned, is trained using a million images and gives a yes/no answer for a specific object or scene class, one of the classes being birds.
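The idea that each layer computes a function of the output of the layer below it can be sketched in a few lines of plain Python. This is a toy illustration, not Flickr's network: the `VERTICAL_EDGE` kernel is hand-picked (a trained network learns its own filters), and the helper names `conv2d`, `relu`, `global_max` and `forward` are chosen for this example only.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation of a 2D list with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_max(feature_map):
    """Summarize a feature map by its strongest response."""
    return max(max(row) for row in feature_map)


# Hand-picked filter (an assumption for this sketch): responds where the
# image goes from dark to bright, left to right.
VERTICAL_EDGE = [[-1.0, 1.0],
                 [-1.0, 1.0]]


def forward(image, layers):
    """Feed the output of each layer into the layer above it."""
    out = image
    for layer in layers:
        out = layer(out)
    return out


LAYERS = [lambda x: conv2d(x, VERTICAL_EDGE), relu, global_max]

# 4x4 toy image: dark left half, bright right half
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
edge_strength = forward(image, LAYERS)
print(edge_strength)
```

A real deep network stacks many such filter layers, so that later layers respond to combinations of the simpler features found by earlier ones.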
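The last step, turning the activation vector into a yes/no answer, can be sketched as a linear decision rule of the kind a linear Support Vector Machine ends up with after training. Everything numeric here is an assumption for illustration: the three-element feature vectors, `BIRD_WEIGHTS` and `BIRD_BIAS` are invented, whereas a real classifier learns its weights from labelled images and works on a much longer vector.

```python
def linear_score(features, weights, bias):
    """Weighted sum of the features plus a bias term."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def contains_bird(features, weights, bias):
    """Yes/no decision: a positive score means 'bird'."""
    return linear_score(features, weights, bias) > 0.0


# Hypothetical weights and bias, not learned from any data.
BIRD_WEIGHTS = [1.5, 2.0, -1.0]
BIRD_BIAS = -1.0

# Hypothetical activation summaries for two photos.
bird_like = [0.9, 0.8, 0.1]
not_bird = [0.1, 0.0, 0.9]

print(contains_bird(bird_like, BIRD_WEIGHTS, BIRD_BIAS))
print(contains_bird(not_bird, BIRD_WEIGHTS, BIRD_BIAS))
```

Because the network has already done the hard work of producing a representation in which birds stand out, the final classifier can stay this simple.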
TL;DR: xkcd: Tasks (kind of indirectly), Introducing: Flickr PARK or BIRD | code.flickr.com, "#-Link-Snipped-#" is born
Source: Introducing: Flickr PARK or BIRD | code.flickr.com