Google’s Augmented Search Queries and Hybrid Search Results

Google recently patented a method of “augmenting search queries,” which is based on entities and their attributes. Behind the scenes, Google uses these to turn a query into additional, augmented search queries, each of which produces its own search results. The augmented results are then mixed into the original search engine results page (SERP), forming a hybrid search results page.

Augmented Search Queries

Google LLC filed for a B2 publication (US20140280089) on the 15th of March 2013. The publication only became publicly available on the 18th of September 2014, and was finally granted on the 21st of August 2018. It is titled “Providing search results using augmented search queries,” and the named inventors are Emily Moxley and Sean Liu.

A search query is a specific enquiry made through a search box. For example, “how old is the universe?” is a search query. Hitting enter causes Google’s database and ranking systems to kick into play and generate a SERP (Search Engine Results Page).

Imagine you Google “how long is Harry Potter.” The meaning is extremely ambiguous, especially for a computer. A human may ask clarifying questions like “the movie?”, “the book?”, or “the actor?”, but Google doesn’t have time to do that. Instead, it identifies the key grammatical parts of the query, for example the entity “Harry Potter,” the verb “is,” and the adjective “long.”

Augmentation revolves around entities and attributes. Each entity has attributes. For example,

George Washington (entity)

  • is a: person (attribute)
  • is a: U.S. president (attribute)
  • has assumed office date: 1789 (attribute)
  • has Gender: male (attribute)
  • has date of birth: Feb 22 1732 (attribute)

Noticeably, each attribute is introduced by a short linking phrase such as “is a” or “has.” This helps Google identify characteristics when scraping data and monitoring internet traffic. The specific linking words used in an original search query will therefore produce different augmented search queries, which is particularly important for SEOs.

To make that clearer, searching “was George Washington a president of the U.S” will cause Google to pay particular attention to “was,” “U.S president,” and “George Washington.” This forms the Original Search Query (OSQ). The Augmented Search Query (ASQ) will be derived from the relevant entities and attributes found in this query.

Let’s say there is a famous singer named George Washington too! His entity may look a little something like this:

George Washington (entity)

  • is a: person (attribute)
  • is a: songwriter (attribute)
  • has property: Backstreet Boys (attribute)
  • has Gender: male (attribute)
  • has date of birth: Dec 11 1982 (attribute)

Now, Google will receive the OSQ and has to determine whether or not we are referring to George Washington – singer, or George Washington – president. Based on keywords alone, it is evident we are looking for George Washington – president. But that’s not always the case when the search query becomes vague.
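To make the idea concrete, here is a minimal sketch of how the two “George Washington” entities could be told apart by keyword overlap alone. It is an illustration, not Google’s method: the `ENTITIES` table and the `tokens`, `score`, and `disambiguate` helpers are all hypothetical names invented for this example.

```python
# Hypothetical toy data: the two "George Washington" entities from the text.
ENTITIES = {
    "George Washington (president)": {
        "is a": ["person", "U.S. president"],
        "has assumed office date": ["1789"],
        "has gender": ["male"],
        "has date of birth": ["Feb 22 1732"],
    },
    "George Washington (singer)": {
        "is a": ["person", "songwriter"],
        "has property": ["Backstreet Boys"],
        "has gender": ["male"],
        "has date of birth": ["Dec 11 1982"],
    },
}


def tokens(text):
    """Lowercase a string and split it into a set of bare words."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return set(cleaned.split())


def score(query, attributes):
    """Count the query words that also appear in any attribute value."""
    value_words = set()
    for values in attributes.values():
        for value in values:
            value_words |= tokens(value)
    return len(tokens(query) & value_words)


def disambiguate(query):
    """Return entity names ordered from best to worst keyword overlap."""
    return sorted(ENTITIES, key=lambda name: score(query, ENTITIES[name]), reverse=True)


# The full question points at the president...
print(disambiguate("was George Washington a president of the U.S"))
# ...but the bare name scores zero for both entities, so keywords cannot decide.
print({name: score("George Washington", attrs) for name, attrs in ENTITIES.items()})
```

With the bare name, both entities tie at zero, which is exactly the situation where keywords stop being enough and some other signal is needed.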

Let’s imagine we only search “George Washington.” Suddenly the keywords give us no context at all, yet we have two entities in our database that could match what “George Washington” is prompting us for. This is where Google’s black box comes in. Through natural language processing (NLP), Google has taught RankBrain (its AI processor) that “George Washington” actually refers to a specific entity rather than an attribute of another entity.

How they’ve done this is their secret, and it is what keeps them in business, so we won’t press any further into the “identification” of a Search Result.

Diving into the brain.

ASQs are based on query logs and structured data, both “table schema” and “schema data” (two types of structured data markup). Let’s unpack that:

Query logs refer to the queries you have recently made. Some SEOs call these strains. For example, if you search for medical advice concerning a broken toe, like “symptoms of a broken toe,” and then look up “medical facilities,” Google understands from the query logs that you are particularly interested in medical facilities that can help with a broken toe. Rankings are adjusted accordingly.

Secondly, structured data refers to the formal information kept in either table schema or schema data found across the web. Google aggregates this information to build an accurate representation of a particular entity from its attributes, as we discussed earlier. This categorisation allows Google to process information faster and improve the relevance of the rankings.

In some implementations, a system receives a search query containing an entity reference, such as a person’s name, that corresponds to one or more distinct entities. The system provides a set of results, where each result is associated with at least one of the distinct entities. The system uses the set of results to identify attributes of the entity and uses the identified attributes to generate additional, augmented search queries associated with the entity. The system updates the set of results based on one or more of these augmented search queries.

Abstract | Publication: US20140280089

The above is taken from the abstract of the publication. Stripped of the jargon, it explains that additional results are generated from the identified attributes of the entity, and from the different, distinct entities that may exist. Once the results are generated, they are included in the search results, forming a hybrid search results page.
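The steps in the abstract can be sketched in a few lines. This is a toy illustration under heavy assumptions, not the patented system: `DOCUMENTS`, `words`, `search`, and `hybrid_results` are hypothetical, the four pages are made up, and “ranking” here is nothing more than counting shared words.

```python
import re
from collections import Counter

# Hypothetical mini-index: each page carries text plus the attributes it covers.
DOCUMENTS = [
    {"url": "https://example.com/biography", "text": "Kate Middleton official biography",
     "attributes": ["sister", "wedding"]},
    {"url": "https://example.com/news", "text": "Kate Middleton latest news",
     "attributes": ["wedding"]},
    {"url": "https://example.com/pippa", "text": "Pippa, the sister of Kate Middleton",
     "attributes": ["sister"]},
    {"url": "https://example.com/royal-wedding", "text": "Royal wedding of Kate Middleton",
     "attributes": ["wedding"]},
]


def words(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"\w+", text.lower()))


def search(query, limit=2):
    """Rank pages by how many query words they contain and keep the top few."""
    scored = [(len(words(query) & words(doc["text"])), doc) for doc in DOCUMENTS]
    ranked = sorted((pair for pair in scored if pair[0] > 0),
                    key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:limit]]


def hybrid_results(query, max_augmented=2):
    """Run the original query, derive augmented queries, and merge the results."""
    original = search(query)
    # Identify attributes of the entity from the first set of results.
    counts = Counter(attr for doc in original for attr in doc["attributes"])
    augmented = [f"{query} {attr}" for attr, _ in counts.most_common(max_augmented)]
    # Update the result set with anything new the augmented queries surface.
    urls = [doc["url"] for doc in original]
    for augmented_query in augmented:
        for doc in search(augmented_query):
            if doc["url"] not in urls:
                urls.append(doc["url"])
    return augmented, urls


queries, urls = hybrid_results("Kate Middleton")
print(queries)
print(urls)
```

The searcher only ever typed “Kate Middleton,” yet the final list contains pages that were pulled in by the augmented queries rather than by the original one.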

To the left, we have Kate Middleton, as an entity. The diagram represents, according to the patent:

“An exemplary diagram of augmented search queries in accordance with some implementations of the present disclosure”

Basically, the diagram is a current example of how Augmented Search Queries are generated. The size of each bubble represents search volume; evidently, Kate Middleton and the attribute “sister” are commonly searched. The overlap between the diagrams represents relevant attributes that help generate Augmented Search Queries.

So if we were to take this and search “Kate Middleton” in Google, it is most likely that we will receive a hybrid results page including some resources for the augmented query “Kate Middleton St. Andrews University.”

The number of augmented results increases with the overlap, the size of the bubble, and some other intangible factors. So in the above example, the probability of seeing additional results for the augmented query “Kate Middleton St. Andrews University” is very high compared with the probability of seeing additional results for “Kate Middleton sister.” This all happens behind the scenes, while our OSQ was simply “Kate Middleton.”

Based on the abstract, and the diagram above, we can actually formulate our own hybrid search results.

A user searches “Kate Middleton.” This is recognised as an entity with many possible attributes: for example, she is female, she is married, and she was born on 9 Jan 1982. There will also be dozens of other entities with different attributes. One Augmented Search Query could very likely be “Kate Middleton” as a different entity, for example a local bricklayer, or a specific entity that has suddenly had lots of media coverage.

Let’s say that Google decides that Kate Middleton the Duchess is the entity we’re after, based on a high probability from query logs, recent search patterns, and general internet traffic.

Now that we’ve solidified the entity, Google will generate additional Augmented Search Results that involve variations in the attributes. Remember, attributes look something like this:

Kate Middleton (entity)

  • is a: person (attribute)
  • marital status: married (attribute)
  • has title: Duchess (attribute)
  • has Gender: female (attribute)
  • has date of birth: Jan 9 1982 (attribute)

So, based on the attributes, Google will generate variations of the Search Query. This is very important, as the change is not reflected in the SERP but at the root of the request. Behind the scenes, Augmented queries such as “Kate Middleton Married,” “Kate Middleton Title,” “Kate Middleton Female,” and “Kate Middleton Birth Date” will be generated. The brain behind the ASQs is referred to in the patent as a “high level block system.”
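As a rough sketch of that expansion, the snippet below turns an attribute record into candidate query strings. The `KATE_MIDDLETON` record and the `augmented_queries` helper are hypothetical, and appending the attribute value to the entity name is just one plausible way to form a query; the patent does not commit to this format.

```python
# Hypothetical attribute record, mirroring the list above.
KATE_MIDDLETON = {
    "is a": "person",
    "marital status": "married",
    "has title": "Duchess",
    "has gender": "female",
    "has date of birth": "Jan 9 1982",
}


def augmented_queries(entity, attributes, skip=("is a",)):
    """Build one value-based query per attribute, skipping overly generic ones."""
    return [f"{entity} {value}" for name, value in attributes.items() if name not in skip]


for query in augmented_queries("Kate Middleton", KATE_MIDDLETON):
    print(query)
```

Note that the original query string is never modified; the new strings are extra queries that sit alongside it.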

It’s unknown how many Augmented Search Queries are generated, and how many of them are included in the search results. What is certain is that some degree of additional information is provided. It’s important to note that the original query is not actually changed. The Augmented Search Queries provide peripheral information in case your original query does not satisfy your search intent, or in case there is other relevant information that can be provided.

Now we’re left with the generation of the results page. The augmented search queries are ranked against URLs, files, and most often structured data, in order to provide that additional information.

This, again, is classified. So no one knows how the augmented search results are ranked. Insight into this would allow an SEO to rank for multiple augmented queries, while only targeting one OSQ.

How to utilise this in an SEO capacity.

LSI Keywords are no longer applicable.

OK, so we know that augmented queries are generated and that, based on those, augmented results are mixed into the search engine results page to form hybrid results pages. I believe this is what has long been loosely referred to as “LSI keywords” (latent semantic indexing). LSI has recently resurfaced because it is seen as a property of machine learning, and it is believed to relate your article or post to queries with a similar meaning.

For example, searching “the best bakery” will also include things like “the greatest bakery,” “the most fantastic bakery,” “the nicest bakery,” and so on. What has actually happened is not a relation between words, but between attributes linked to entities. For example, your local baker would have a geographic attribute, which is certainly relevant to your search query; “best,” however, isn’t structured data. What it may reflect is common anchor text, ratings, reviews, or other unknown factors.

This generates a quasi-“best” attribute, rather than a word. In terms of databases, an attribute like that is far easier to store than a natural language based value such as “best.”

So in that sense, I believe that Google is becoming far more holistic, drawing on multiple angles to develop values for a single attribute, rather than relying on keywords. Especially not LSI keywords.

Structured Data Markup is more important than ever.

With all the evidence in the article, having the right markup for noticeable content is critical. Augmented search queries will mean URIs have far better “spread,” being able to rank for dozens of augmented queries simply under a search for something like “best baker.”

Using as much of the relevant schema.org vocabulary as possible will give you the greatest overlap in terms of attributes, allowing your content to rise to the top. Inclusion in rich results is likely the end goal for anyone using structured data markup in this sense.
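For a sense of what such markup looks like, here is a small sketch that builds a JSON-LD snippet for a bakery using schema.org’s `Bakery`, `PostalAddress`, and `AggregateRating` types. The `bakery_jsonld` helper and the sample values are made up for illustration, and whether any particular property feeds into augmented queries is an assumption on my part, not something the patent states.

```python
import json


def bakery_jsonld(name, locality, rating, review_count):
    """Return a JSON-LD <script> tag describing a bakery with schema.org terms."""
    data = {
        "@context": "https://schema.org",
        "@type": "Bakery",
        "name": name,
        # Geographic attribute: where the business is.
        "address": {"@type": "PostalAddress", "addressLocality": locality},
        # A possible source for a quasi-"best" attribute: aggregated ratings.
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Sample values are placeholders, not real data.
print(bakery_jsonld("Example Bakery", "Springfield", 4.7, 120))
```

The printed tag can be pasted into a page’s HTML; each key is an attribute a crawler can read without parsing the surrounding prose.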

Google moves to entity based ranking.

Keywords no longer hold very much footing. Simply including “Harry Potter” half a dozen times throughout the text of your article will no longer drive up the ranking, nor the organic keywords.

Instead, Google cares far more about the attributes that you assign the entity in your text. If you discuss Harry Potter, provide some insight into his character, the writer, and maybe the setting, especially in structured data, Google will be far more likely to rank you better.

This is actually a clever move by Google towards information that is easier to crawl, easier to display, and of higher value to its users. The results provided in modern Search Results will be driven by the nature of the noun, rather than by simply accepting the string as a combination of letters that happens to appear the most on a specific page of a website.

Non-string based SEO is gaining traction.

“Augmented Search queries may be textual, image-based, audio-based, video-based, based in any other suitable format, or any combination thereof.”

Description | Publication: US20140280089

Google recognises in its description that augmented search queries may be generated in a large variety of ways. This suggests that they are aware of the increasing trend towards a variety of information streams, whether image, audio, or video based.

For SEOs, it will be important to ensure that all images, audio files, and videos are well optimised to improve the likelihood of being used in augmented queries. Beyond just optimisation, the provision of all types of data will be a requirement in the near future based on the rate of information diversification Google is pursuing.

We’ll have to say goodbye to static <p> based blogs! Videos, images, and audio will all have to be included.

Information is value.

It’s a highly contested claim that more information will be preferred over less information in a SERP. Some argue it’s the specificity of an article that enables it to reach the top of a SERP, while others believe it’s the peripheral information provided along with the necessary information that gives it its strength.

Based solely on the above patent, I would argue that the more information provided to a Googlebot or crawler, the more likely it is to be included in augmented queries. So if you’re on the fence about combining two articles to bring your word count over 1,000, I’d recommend it.
