Using Google’s Image Classification to Rank in Local SEO Image Queries

Google’s “Image Classification of Landmarks” can be used to drive more foot traffic to local businesses. It works by targeting what Google calls “trigger words,” a concept derived from its patented image recognition and categorisation methods.


In 2008, Flickr became the leading image repository on the internet. It was thought that Google and Yahoo, the major search engines of the time, were scraping its images and tags to learn how to identify images.

A study at Yahoo mentioned that: 

Using automatically generated location data, and software that can cluster together similar images to learn about images, goes beyond just looking at the words associated with pictures to learn what they are about.

This refers to metadata found in image collections, whereas Google was relying on its patent “INTERPRETING USER QUERIES BASED ON NEARBY LOCATIONS.” The difference is that Google was using live location data, while Yahoo was scraping image metadata. This distinction is vital to our understanding of image classification later.

Problems Faced with Image Classification

The go-to example of image classification is when a search engine has to determine whether or not a picture shows a landmark.

A quote from the Google AI blog states:

“Image classification technology has shown remarkable improvement over the past few years, exemplified in part by the Imagenet classification challenge, where error rates continue to drop substantially every year. In order to continue advancing the state of the art in computer vision, many researchers are now putting more focus on fine-grained and instance-level recognition problems – instead of recognizing general entities such as buildings, mountains and (of course) cats, many are designing machine learning algorithms capable of identifying the Eiffel Tower, Mount Fuji or Persian cats. However, a significant obstacle for research in this area has been the lack of large annotated data sets.”

The introduction described the two methods that Google and Yahoo had been using. Google was able to identify, with significant accuracy, the entity in each picture. However, for finer detail, and as its “next steps,” Google needed to train its machine learning algorithms on immensely large data sets.

Yahoo had taken the other course, refining its search engine to identify images based on metadata rather than the image itself. This led to an extensive library of annotated datasets.

Instead of contacting Yahoo, Google spent a full year developing their own datasets to improve the predictive accuracy by 7x. This was announced as Google-Landmarks-v2: An Improved Dataset for Landmark Recognition & Retrieval. It required more than 500 researchers, and 5,000,000 images.

However, it wasn’t the researchers who solved the problem of identifying local and regional landmarks. It was actually the hobbyist photographers:

“A particular problem in preparing Google-Landmarks-v2 was the generation of instance labels for the landmarks represented since it is virtually impossible for annotators to recognize all of the hundreds of thousands of landmarks that could potentially be present in a given photo. Our solution to this problem was to crowdsource the landmark labeling through the efforts of a world-spanning community of hobby photographers, each familiar with the landmarks in their region.”

As stated by Bingyi Cao, a Software Engineer at Google.

Image Classification of Landmarks

The above diagram (patent no: WO2016028696) explains how landmarks are identified within large digital image collections. Google owns Google Photos, which gives it access to extensive photo libraries on which to train and exercise its AI brains.

Images undergo three simultaneous steps upon introduction onto the World Wide Web. First, they are geo-clustered: the data surrounding their publishing and discovery is analysed. Second, they are scanned for relevant tags in the image metadata, as well as any tags that the uploader added to the image. Third, they are indexed into the image repository. Whether or not the image has been classified yet is irrelevant, as it will be moved later on according to the analyses. These steps are illustrated by 102, 110, and 122.
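To make the geo-clustering and tag-scanning steps more concrete, here is a minimal sketch in Python. It is not Google's implementation: it assumes a simple latitude/longitude grid as the clustering method, and the record fields ("lat", "lon", "tags"), the function names, and the sample data are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def geo_cluster(images, cell_deg=0.01):
    """Group images by the latitude/longitude grid cell they fall into.

    A fixed grid is an assumption made for simplicity; a real system
    would likely use a more robust clustering method.
    """
    clusters = defaultdict(list)
    for img in images:
        cell = (
            math.floor(img["lat"] / cell_deg),
            math.floor(img["lon"] / cell_deg),
        )
        clusters[cell].append(img)
    return dict(clusters)


def top_tags(cluster, n=3):
    """Return the n most frequent tags (case-insensitive) in one cluster."""
    counts = Counter(tag.lower() for img in cluster for tag in img["tags"])
    return [tag for tag, _ in counts.most_common(n)]


# Made-up sample records; coordinates and tags are for illustration only.
images = [
    {"id": "a", "lat": 48.8584, "lon": 2.2945, "tags": ["Eiffel Tower", "paris"]},
    {"id": "b", "lat": 48.8580, "lon": 2.2949, "tags": ["eiffel tower", "night"]},
    {"id": "c", "lat": 35.3606, "lon": 138.7274, "tags": ["Mount Fuji"]},
]

for cell, members in geo_cluster(images).items():
    print(cell, [m["id"] for m in members], top_tags(members, n=1))
```

The point of the sketch is that images taken close together end up in the same group, and the tags that repeat most often within that group become the likely label for whatever stands at that location.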

Next, the images are “visually-clustered” (103), where the content of the image is analysed. This is Google’s AI magic, where the image is recognised and then categorised. Recognition is not important for this article, as we are focusing on how to take advantage of the categorisation step. A subsection of the visual clustering is a “popularity” module, which is tied to the geolocation from the first step. This is the part that matters for Local SEO: the more popular a “trigger word” is, the more confident Google becomes that something related to the trigger word exists in that location. This is also what we will be using to improve the rankings of our images and our site.

Finally, the images are entered into a database within the index. Here the images are placed into the landmark database, but in theory there could be any number of databases for different genres of images.

The patent, however, does not save them from a particular problem:

“However, there is no known system that can automatically extract information such as the most popular tourist destinations from these large collections. As numerous new photographs are added to these digital image collections, it may not be feasible for users to manually label the photographs in a complete and consistent manner that will increase the usefulness of those digital image collections. What is needed, therefore, are systems and methods that can automatically identify and label popular landmarks in large digital image collections.”

How Does This Apply for SEOs?

The Google patent “Automatic discovery of popular landmarks” describes how the above workflow is carried into search results. It covers a method for enhancing user queries to retrieve images of landmarks, which includes the following stages:

  1. Receiving a user query
  2. Identifying one or more trigger words in the user query
  3. Selecting one or more corresponding tags from a landmark database
  4. Supplementing the user query with the one or more corresponding tags, generating a supplemented user query
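
The stages listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented method: the landmark database is a hypothetical in-memory dictionary keyed by a made-up region name ("example-town"), the function names are invented, and the supplemented query is formed simply by appending the quoted tags to the original query.

```python
import re

# Hypothetical landmark database: region -> trigger word -> landmark tags.
# "example-town" is a made-up region; "house grandeur" is the fictional
# monument used in this article's example.
LANDMARK_DB = {
    "example-town": {
        "monuments": ["house grandeur"],
        "famous buildings": ["house grandeur"],
    }
}


def find_trigger_words(query, region):
    """Stage 2: return the trigger words for this region found in the query."""
    text = query.lower()
    triggers = LANDMARK_DB.get(region, {})
    return [t for t in triggers if re.search(rf"\b{re.escape(t)}\b", text)]


def supplement_query(query, region):
    """Stages 3 and 4: select matching tags and append them to the query."""
    tags = []
    for trigger in find_trigger_words(query, region):
        for tag in LANDMARK_DB[region][trigger]:
            if tag not in tags:  # avoid repeating a tag in the query
                tags.append(tag)
    if not tags:
        return query  # no trigger words found: leave the query untouched
    return " ".join([query] + [f'"{tag}"' for tag in tags])


print(supplement_query("monuments near me", "example-town"))
print(supplement_query("pizza near me", "example-town"))
```

A query without a trigger word passes through unchanged, while one containing “monuments” picks up the local landmark tag. That is why a business whose pages and images carry the landmark's name can surface for a search that never mentioned it.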

Just before the filing of the above patent (May 2019), Google filed for another patent called Augmented Search Queries. Augmented Search Queries, much like the patent described above, involve modifying a user’s query without their knowledge. Augmented Search Queries apply to general queries, with Hybrid Search Results seeing greater presence in vague or poorly written queries. This opens up the door for a more “diverse” Search Results page.

In regard to images, “trigger words” are of key importance in the patent description. In the introduction we discussed Google’s use of locality to understand the search intent of a user. Google pairs the location with the search to provide locally relevant results. When images are submitted in a tight geographic region, and there is high search volume for “trigger words” such as “landmarks near me” or “famous buildings,” Google begins to understand that the images have relevance to “landmarks” or “famous buildings.”

Image SEO is often overlooked, yet it could drive significant visibility for businesses located around landmarks. By targeting the correct “trigger words” in titles, posts, and images, SEOs could rank alongside the landmark-based results.

For example: just down the road from the restaurant “Just Eat” is a famous building dating back to the 1200s. Just Eat has seen a reduction in foot traffic, while the total traffic near the monument has stayed the same. People are often hungry after a day walking around the building, so the owner knows the market opportunity exists. Just Eat begins including the monument’s name, “House Grandeur,” in its website titles, Google My Business name, and images in an effort to develop its local SEO. The owner begins to see more converting customers, but the website seems to maintain the same rankings and page views as before.

The resulting foot traffic has come from proper image SEO. String-based queries do not work as well for landmarks and monuments because the search results are mixed in a far greater variety of ways. However, users searching “monuments near me” will have an Augmented Search Query of “house grandeur near me.” If you include “House Grandeur” and are a locally available business, your rankings for string queries will see an increase, but your image rankings will see a tremendous rise. If your images have been properly optimised, your brand logo, a map of the area, and maybe even products you sell could be ranking on the first page.
