Google Street View has become an essential part of the online mapping
experience. It allows users to drop down to street level to see the
local area in photographic detail.
But it’s also a useful resource for Google itself. The company uses the
images to read house numbers and match them to their geolocation, which
pins down the physical position of each building in its database.
That’s particularly useful in places where street numbers are otherwise
unavailable, or in places such as Japan and South Korea, where buildings
are rarely numbered in sequential order along the street but in other
ways, such as the order in which they were constructed. That system makes
many buildings impossibly hard to find, even for locals.
But the task of spotting and identifying these numbers is hugely
time-consuming. Google’s Street View cameras have recorded hundreds of
millions of panoramic images that together contain tens of millions of
house numbers. The task of searching these images manually to spot and
identify the numbers is not one anybody could approach with relish.
So, naturally, Google has solved the problem by automating it. And
today, Ian Goodfellow and pals at the company reveal how they’ve done
it. Their method relies on a deep neural network with 11 layers of
neurons that they have trained to spot numbers in images.
To start with, Goodfellow and co place some limits on the task at
hand to keep it as simple as possible. For example, they assume that the
building number has already been spotted and the image cropped so that
the number is at least one-third the width of the resulting frame. They
also assume that the number is no more than five digits long, a
reasonable assumption in most parts of the world.
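To make those two simplifications concrete, here is a minimal Python sketch of how a cropped example might be checked and its label stored as a length plus five fixed digit slots. The function names, the `BLANK` padding class and this fixed-slot encoding are illustrative assumptions, not code from Google’s paper.

```python
import re
from typing import List, Tuple

MAX_DIGITS = 5  # the article's assumption: house numbers have at most five digits
BLANK = 10      # hypothetical extra class marking an unused digit slot

def crop_is_large_enough(number_width: float, frame_width: float) -> bool:
    """True if the number spans at least one-third of the cropped frame's width."""
    return 3 * number_width >= frame_width

def encode_house_number(number: str) -> Tuple[int, List[int]]:
    """Encode a house number as (length, five digit slots), padding with BLANK."""
    if not re.fullmatch(r"[0-9]+", number):
        raise ValueError(f"not a plain digit string: {number!r}")
    if len(number) > MAX_DIGITS:
        raise ValueError(f"{number!r} breaks the {MAX_DIGITS}-digit assumption")
    digits = [int(ch) for ch in number]
    return len(digits), digits + [BLANK] * (MAX_DIGITS - len(digits))

# Usage
print(encode_house_number("221"))      # (3, [2, 2, 1, 10, 10])
print(crop_is_large_enough(120, 300))  # True
```

Fixing the number of slots at five is what lets a single network emit a whole number at once; a sixth digit simply has nowhere to go.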
Unlike many other groups, however, the team does not split the number
into individual digits. Instead, it localizes the entire number within
the cropped image and identifies it in one go, all with a single neural
network.
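One way a single network can read a whole number in one pass is sketched below in Python. It assumes, for illustration only, that the network ends in a length classifier (0 to 5 digits) plus one ten-way digit classifier per position, and that these outputs are independent; the article does not describe the output layer, and `decode_number` is a hypothetical name.

```python
import math
from typing import List, Tuple

def decode_number(length_probs: List[float],
                  digit_probs: List[List[float]]) -> Tuple[str, float]:
    """Pick the most probable digit string from hypothetical network outputs.

    length_probs[n]: assumed probability that the number has n digits (n = 0..5).
    digit_probs[i][d]: assumed probability that position i holds digit d.
    The outputs are treated as independent, so for each candidate length the
    best digit at each position can be chosen separately.
    """
    best_text, best_score = "", -math.inf
    for n, p_len in enumerate(length_probs):
        if p_len <= 0:
            continue  # log(0) is undefined; this length is impossible anyway
        score = math.log(p_len)
        text = ""
        for i in range(n):
            d = max(range(10), key=lambda k: digit_probs[i][k])
            score += math.log(digit_probs[i][d])
            text += str(d)
        if score > best_score:
            best_text, best_score = text, score
    return best_text, best_score

def peaked(digit: int, p: float) -> List[float]:
    """A made-up distribution putting p on `digit` and spreading the rest evenly."""
    row = [(1 - p) / 9] * 10
    row[digit] = p
    return row

# Usage with invented outputs that favour a three-digit number
lengths = [0.0, 0.1, 0.2, 0.6, 0.05, 0.05]
digits = [peaked(2, 0.9), peaked(2, 0.8), peaked(1, 0.7), [0.1] * 10, [0.1] * 10]
text, score = decode_number(lengths, digits)
print(text)  # 221
```

Summing log-probabilities compares candidate lengths fairly: a longer reading must earn its extra digits with confident per-position predictions.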
They train this net on the Street View House Numbers data set, a
publicly available collection of some 200,000 numbers photographed by
Google’s Street View cameras. The training takes about six days to
complete, they say.
Goodfellow and co say there is no point in using an automated system
that cannot match or beat the performance of human operators, who read
numbers correctly around 98 percent of the time. So matching that
98 percent accuracy is the team’s goal.
However, that doesn’t mean transcribing the numbers in 100 percent of
the images with 98 percent accuracy. Instead, Goodfellow and co say it
is acceptable to reach 98 percent accuracy on a subset of the images,
namely those the system is most confident about, which in this case
turns out to cover around 95 percent of the total.
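That accuracy-versus-coverage trade-off can be sketched in Python, assuming each transcription carries a confidence score and only those at or above a threshold are accepted. The function below (a hypothetical name, run on made-up validation data) finds the largest most-confident subset whose accuracy still meets a 98 percent target and reports what fraction of the images it covers.

```python
from typing import List, Optional, Tuple

def coverage_at_accuracy(confidences: List[float], correct: List[bool],
                         target: float = 0.98) -> Tuple[Optional[float], float]:
    """Return (threshold, coverage) for the largest confidence-ranked subset
    whose accuracy is at least `target`; (None, 0.0) if no subset qualifies.

    Predictions with confidence >= threshold are accepted; coverage is the
    fraction of all predictions accepted.
    """
    n = len(confidences)
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    hits = 0
    best_k = 0
    for k, i in enumerate(order, start=1):
        hits += correct[i]
        # Only cut between distinct scores, so ties are accepted or rejected together.
        at_boundary = k == n or confidences[order[k]] < confidences[i]
        if at_boundary and hits / k >= target:
            best_k = k
    if best_k == 0:
        return None, 0.0
    return confidences[order[best_k - 1]], best_k / n

# Made-up validation data: 100 predictions, most confident first,
# with wrong answers at ranks 50 and 96-99.
conf = [1 - r / 100 for r in range(100)]
ok = [r not in {50, 96, 97, 98, 99} for r in range(100)]
threshold, coverage = coverage_at_accuracy(conf, ok)
print(coverage)  # 0.96
```

Raising the target shrinks the accepted subset; the remaining, low-confidence images are simply not transcribed automatically.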
But even this is significantly better than any other team has been able
to achieve. “Worldwide, we automatically detected and transcribed close
to 100 million physical street numbers at [human] operator level
accuracy,” they say, describing this as an “unprecedented success.”
And they can do it at considerable speed. “We can transcribe all the
views we have of street numbers in France in less than an hour using our
Google infrastructure,” they say. Yep, that’s just one hour.
One interesting question is whether the same technique might help
extract other numbers such as telephone numbers on business signs or
even number plates.
However, Goodfellow and co are not optimistic. They say the success of
their technique rests heavily on the assumption that street numbers are
never more than five digits long. “For large [numbers of digits] our
method is unlikely to scale well,” they say.
And of course, the system is not yet perfect. That 2 percent of misidentified numbers is still a thorn in the team’s side.
But in the meantime, Google can rest assured that it has taken a
significant step forward in character extraction and recognition: the
localization and identification of numbers by a single neural network.
The big question, of course, is what’s next. And Goodfellow and co oblige
by opening the kimono just a fraction: “This approach of using a single
neural network as an entire end-to-end system could be applicable to
other problems such as general text transcription or speech
recognition.”
Source: Technology Review