Google is now using machine learning to read and understand images without human annotation. This is currently being applied to identify relationships between entities and attributes, which are saved into knowledge bases and displayed as rich data in SERPs.
Knowledge bases are repositories of structured and unstructured data. A knowledge base usually relates to a specific entity and typically includes:
- Facts about entities
- Relationships between entities
This information is used to enrich the search experience for users, in the form of rich data extracts that increase the value of the SERP. The data in knowledge bases is enriched or expanded by harvesting information from a variety of sources. The most common traditional methods of enriching knowledge bases are:
- Crawling text on Internet web pages
- Collecting search behaviour on a specific term
- Testing augmented search query success rates
In its patent “Computerized systems and methods for enriching a knowledge base for search queries,” Google has begun including images as one of the knowledge base enrichment methods.
This has always been extremely complex, as unstructured data (such as audio and images) has only recently become understandable through machine learning algorithms. While Google has employed machine learning in its core search algorithm (Hummingbird) for quite some time, this is the first real application of ML to images for enhancing knowledge bases.
ML has previously been used on images for recognition purposes, mostly to identify whether an image has been used before, or whether it relates to the other images it is currently pooled with in a knowledge base. It has never been used to enrich a knowledge base.
Prior to this system, images were always added to knowledge bases simply through labels or tags applied by webmasters on the web, which enabled any image to potentially rank for any knowledge base. This new patent is the first instance of extracting valuable data from raw, unstructured data.
Entities and Attributes
As discussed in an earlier article, objects, entities, and attributes aren’t a new system for Google.
Learn more about entities and attributes: Augmented Search Queries
Augmented Search Queries revolve around specific object entities which are given attributes. These attributes are descriptive features, or “object attributes,” that describe the entity. This system is highly efficient for building understanding of specific topics within knowledge bases, such as different types of sharks. When a user searches “types of sharks,” Google can easily identify object attributes such as “Hammerhead,” “Wobbegong,” etc.
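To make the object entity and attribute model concrete, here is a minimal Python sketch that stores facts as (object entity, relationship, attribute) triples. The `KnowledgeBase` class and relationship labels such as `has_type` are hypothetical illustrations, not Google's actual schema.

```python
from collections import defaultdict


class KnowledgeBase:
    """Hypothetical minimal knowledge base of (entity, relationship, attribute) triples."""

    def __init__(self):
        # object entity -> set of (relationship, attribute) pairs
        self._facts = defaultdict(set)

    def add(self, entity, relationship, attribute):
        self._facts[entity].add((relationship, attribute))

    def attributes(self, entity, relationship=None):
        """Return an entity's attributes, optionally filtered by relationship."""
        return sorted(
            attribute
            for rel, attribute in self._facts.get(entity, set())
            if relationship is None or rel == relationship
        )


kb = KnowledgeBase()
kb.add("shark", "has_type", "Hammerhead")
kb.add("shark", "has_type", "Wobbegong")
kb.add("shark", "lives_in", "ocean")

print(kb.attributes("shark", "has_type"))  # ['Hammerhead', 'Wobbegong']
```

Storing facts as triples keeps each relationship explicit, which matches the shape the patent describes: an object entity, an attribute entity, and a relationship between them.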
Entity annotation has allowed for far better rich results when browsing images. This knowledge base system has also paved the way for suggested searches, such as:
- Recommending the more common spelling of an entity when a search is mistyped.
- Providing a list of alternative, common search phrases that fall under the same knowledge base.
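The first suggestion type can be sketched with Python's standard `difflib`, which scores string similarity. This is a stand-in technique for illustration only; the entity list and the 0.75 cutoff are assumptions, and nothing here reflects how Google actually corrects spelling.

```python
import difflib

# Hypothetical entity names from a "types of sharks" knowledge base.
KNOWN_ENTITIES = ["hammerhead", "wobbegong", "great white", "tiger shark"]


def suggest_spelling(query, entities=KNOWN_ENTITIES, cutoff=0.75):
    """Return the closest known entity name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(query.lower(), entities, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_spelling("Wobegong"))   # wobbegong
print(suggest_spelling("hamerhead"))  # hammerhead
```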
One key issue with the entity system has always been that entities can belong to multiple knowledge bases, with many different attributes in each repository.
For example, let’s take a bank. A bank can be the bank of a river, the terrain alongside a flowing river. A bank could also be a financial institution, or you may even be referring to a specific bank, like the Bank of America.
As an entity, “bank” belongs to different knowledge bases. One knowledge base may be a “financial knowledge base,” another a “geography terminology knowledge base,” and another a “list of American firms knowledge base.”
The problem arises when someone searches for something ambiguous related to bank, or mixes attributes of a financial bank with a sentence pertaining to a geographical bank. A search like “Banks near banks” begins to produce interesting search results, as Google desperately tries to understand not only the query, but also the ranking application, SERP presentation, and relevance factors.
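A simple way to see why “Banks near banks” is hard: score each candidate knowledge base by how many of its context terms appear in the query. The knowledge bases and their term sets below are hypothetical, and term overlap is a deliberately naive stand-in for real disambiguation.

```python
# Hypothetical context terms for three knowledge bases that all contain "bank".
KNOWLEDGE_BASES = {
    "financial": {"atm", "loan", "account", "branch", "mortgage"},
    "geography": {"river", "water", "erosion", "shore", "fishing"},
    "american_firms": {"america", "headquarters", "ceo", "stock"},
}


def rank_senses(query):
    """Rank knowledge bases by how many of their context terms appear in the query."""
    terms = set(query.lower().split())
    scores = {name: len(terms & context) for name, context in KNOWLEDGE_BASES.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(rank_senses("bank with an atm and loan"))  # financial ranks first
print(rank_senses("Banks near banks"))           # every score is 0: still ambiguous
```

The second query scores zero everywhere because the text alone carries no disambiguating context, which is the gap that image-derived relationships could help fill.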
The result is a SERP generated through augmented search queries, which brings us full circle.
Analysing Images for Knowledge Base Enrichment
Systems and methods are disclosed for enriching a knowledge base for search queries. According to certain embodiments, images are assigned annotations that identify entities contained in the images. An object entity is selected among the entities based on the annotations and at least one attribute entity is determined using annotated images containing the object entity. A relationship between the object entity and the at least one attribute entity is inferred and stored in the knowledge base. In some embodiments, confidence may be calculated for the entities. The confidence scores may be aggregated across a plurality of images to identify an object entity. (US Patent 10,534,810, Abstract)
According to the abstract, the “system” and “methods” are used to identify object entities in images through annotations. Annotated images are then used to identify attributes, a relationship between each attribute and the object is inferred, and the relationship is stored in the knowledge base.
The abstract then states confidence scores may be calculated. We’ll get to this concept later.
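The abstract's final sentence, aggregating confidence scores across many images to identify an object entity, can be sketched as follows. The patent excerpt does not state how scores are combined; summing them per entity is an assumption made here for illustration, and the annotation data is invented.

```python
from collections import defaultdict

# Hypothetical per-image annotations: entity -> confidence that it appears in the image.
annotated_images = [
    {"grizzly bear": 0.92, "fish": 0.81, "water": 0.64},
    {"grizzly bear": 0.88, "grass": 0.55},
    {"polar bear": 0.47, "fish": 0.70, "water": 0.90},
]


def select_object_entity(images):
    """Sum each entity's confidence across images (an assumed aggregation) and pick the top one."""
    totals = defaultdict(float)
    for annotations in images:
        for entity, confidence in annotations.items():
            totals[entity] += confidence
    best = max(totals, key=totals.get)
    return best, dict(totals)


entity, totals = select_object_entity(annotated_images)
print(entity)  # grizzly bear
```

Summing rewards entities recognised consistently across many images, so an entity seen only once with low confidence, such as the polar bear here, is outweighed.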
Embodiments of the present disclosure provide improved systems and methods for enriching a knowledge base for search queries. The information used to enrich a knowledge base may be learned or inferred from analyzing images and other data sources. (US Patent 10,534,810, §5)
This indicates that the system Google employs is machine-learning based, likely part of the Hummingbird algorithm, and learns by analysing images and other data sources.
In accordance with some embodiments, object recognition technology is used to annotate images stored in databases or harvested from Internet web pages. The annotations may identify who and/or what is contained in the images. (US Patent 10,534,810)
In other words, object recognition technology is applied to identify who or what is contained in images published on the web. This is slightly invasive of privacy, but currently within Google’s rights. With a data repository this large, the algorithm will likely become highly accurate very quickly.
The disclosed embodiments can learn which annotations are good indicators for facts by aggregating annotations over object entities and facts that are already known to be true. Grouping annotated images by object entity helps identify the top annotations for the object entity. (US Patent 10,534,810)
To determine which annotations are best, images are grouped by object entities, which are far easier for the algorithm to identify. The images are then annotated and their attributes extracted. The system then statistically determines the highest-probability attributes for each object.
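The grouping step can be sketched as: group annotated images by object entity, count how often each annotation co-occurs, and divide by the group size to estimate the probability that an image of that entity also shows the annotation. Frequency-based ranking is an assumed technique; the patent excerpt only says that grouping helps identify top annotations. The data is hypothetical and mirrors the grizzly bear example used later.

```python
from collections import Counter, defaultdict

# Hypothetical annotated images: (object entity, other annotations found in the image).
images = [
    ("grizzly bear", {"fish", "water"}),
    ("grizzly bear", {"fish", "water", "grass"}),
    ("grizzly bear", {"fish"}),
    ("grizzly bear", {"fish", "grass"}),
    ("grizzly bear", {"water"}),
    ("polar bear", {"ice", "water"}),
]


def top_annotations(images, k=3):
    """Group images by object entity and rank co-occurring annotations by frequency."""
    groups = defaultdict(list)
    for entity, annotations in images:
        groups[entity].append(annotations)

    result = {}
    for entity, annotation_sets in groups.items():
        counts = Counter(a for annotations in annotation_sets for a in annotations)
        # Sort by count (descending), then by name, so ties are deterministic.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]
        n = len(annotation_sets)
        # Estimated probability that an image of this entity also shows the annotation.
        result[entity] = [(annotation, count / n) for annotation, count in ranked]
    return result


print(top_annotations(images)["grizzly bear"])
# [('fish', 0.8), ('water', 0.6), ('grass', 0.4)]
```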
According to the disclosed embodiments, images may be assigned annotations that identify entities contained in the images. An object entity among the entities may be selected based on the annotations. An attribute entity for the object entity may be determined using annotated images containing the object entity. A relationship between the object entity and the attribute entity may be inferred and stored in a knowledge base for processing search queries. (US Patent 10,534,810)
Once attributes are selected, the ML system infers a relationship between the entity and its attributes. Because these relationships are inferred rather than explicitly stated, they sit where machine learning is weakest: models interpolate well within their training data but extrapolate poorly beyond it. The inferred relationships are then stored in the knowledge base and used for search queries.
As used herein, the term “inferring” refers to operations where an entity relationship is inferred from or determined using indirect factors such as image context, known entity relationships, and data stored in a knowledge base to draw an entity relationship conclusion instead of learning the entity-relationship from an explicit statement of the relationship such as in text on an Internet web page. (US Patent 10,534,810)
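One way to picture inference from indirect factors: instead of reading an explicit statement, vote on a relationship using known facts about similar entities, then scale the result by how often the attribute co-occurs with the object in images. The facts, the `SIMILAR_TO` mapping, and the scoring formula are all hypothetical; this is a sketch of the idea, not the patent's method.

```python
from collections import Counter

# Hypothetical facts already in the knowledge base: (entity, relationship, attribute).
KNOWN_FACTS = [
    ("brown bear", "eats", "fish"),
    ("black bear", "eats", "fish"),
    ("otter", "eats", "fish"),
    ("heron", "hunts_near", "water"),
]
# Hypothetical similarity between a new object entity and entities already known.
SIMILAR_TO = {"grizzly bear": {"brown bear", "black bear"}}


def infer_relationship(obj, attribute, cooccurrence):
    """Infer a relationship from indirect evidence rather than an explicit text statement.

    Votes come from known facts about similar entities with the same attribute;
    the image co-occurrence rate (0..1) scales the resulting score.
    """
    votes = Counter(
        rel for ent, rel, attr in KNOWN_FACTS
        if attr == attribute and ent in SIMILAR_TO.get(obj, set())
    )
    if not votes:
        return None  # no indirect evidence, so nothing is inferred
    relationship, count = votes.most_common(1)[0]
    score = cooccurrence * count / sum(votes.values())
    return relationship, score


print(infer_relationship("grizzly bear", "fish", 0.75))  # ('eats', 0.75)
```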
This is crucial for SEOs, as it reveals the underlying reasoning and the optimisation opportunities. Google appears to be pivoting away from relying solely on factual statements on sites and moving towards using images to cross-check that data. Combined, both sources provide powerful overlapping checks and balances.
Here are entity relationship optimisation methods as they stand:
- Indirect factors
  - Image context (site / URI)
  - Image information conveyed
  - Known entity relationships
  - Knowledge Base data storage
- Direct factors
  - Explicit statement (<p>)
  - Rich data extract
  - Aggregated structured data
The inferred relationships may be stored in a knowledge base and subsequently used to assist with or respond to user search queries processed by a search engine. (US Patent 10,534,810)
Again, this reiterates that the process will be applied to add value to the SERP for any search engine that deploys it.
The System in Action
This is an example of the traditional system.
This is the traditional system, which also forms the initial stages of the current system. Moderators assign annotations to the images. The algorithm then scans the objects to find object entities (the key subject of the image), identifies the key attributes, infers a relationship between the entity and each attribute, and stores the relationship in the knowledge base.
This is an example of the annotation process.
This is the example used throughout the patent: a grizzly bear eating a fish. The key point here is the range of images displayed by a current search engine, which is more diverse than intended: for example, the polar bear, and the two bears on the right that are not eating a fish.
In the polar bear image the entity has been misidentified, whereas the two grizzly bears on the right are missing the attribute specified in the search.
This is an example of attribute determination.
The grizzly bear is correctly identified as the entity, with fish, water, and grass as three attributes. The size of each image shows the weighting, or “strength,” of its relationship to the grizzly bear. These attributes cannot have relationships with each other at this stage of the process, but will in the later stages.
This is an example of relationship inference.
Through the same system discussed in augmented search queries, the entity and attribute are linked through a verb, noun, or other word to denote their relationship.
In this case, based on the relationship and annotations above, the ML algorithm believes that Grizzly Bears have an eating relationship with fish.
The size of the fish icon suggests that the algorithm is reasonably confident that this is the case, or at least as confident as it is that the image depicts a grizzly bear.
This is an example of the advanced system.
This is a look under the hood of the advanced system. The system's objective is now to scan the web for images, annotate them automatically, and carry out the entire process above with a high degree of accuracy.
This is what the patent enables, allowing Google to enhance its knowledge bases dramatically.
This is an example of applied search.
As displayed, upon search, the new system is far better at identifying images that contain the entity, the action (eat) and the attribute, fish.
The previous system had a minimum success rate of 50%, while the new system has a minimum success rate of roughly 90%, even in its infancy.
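A success rate like this can be read as precision: the share of returned images that contain the entity, the action, and the attribute. The sketch below classifies invented results the way the grizzly bear example does (wrong entity versus missing attribute). Treating the action "eat" as a recognised annotation is a simplifying assumption, and the data is illustrative, not meant to reproduce the percentages above.

```python
# Hypothetical search results: the annotations recognised in each returned image.
results = [
    {"grizzly bear", "eat", "fish"},
    {"grizzly bear", "eat", "fish", "water"},
    {"polar bear", "eat", "fish"},   # entity misidentified
    {"grizzly bear", "water"},       # attribute missing
    {"grizzly bear", "grass"},       # attribute missing
    {"grizzly bear", "eat", "fish"},
    {"grizzly bear", "eat", "fish", "grass"},
    {"grizzly bear", "eat", "fish"},
]


def classify(annotations, entity, action, attribute):
    """Label one result the way the grizzly bear example does."""
    if entity not in annotations:
        return "wrong entity"
    if action not in annotations or attribute not in annotations:
        return "missing attribute"
    return "match"


def success_rate(results, entity="grizzly bear", action="eat", attribute="fish"):
    """Share of results containing the entity, the action, and the attribute."""
    labels = [classify(r, entity, action, attribute) for r in results]
    return labels.count("match") / len(labels), labels


rate, labels = success_rate(results)
print(rate)  # 0.625
```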
This is an example of complex relationships.
As the algorithm learns, it begins to place the entity into multiple, more accurate knowledge bases. In this case, Google understands that there is an eating relationship loosely associated with grass (suggesting spring), and that under different search circumstances the grizzly bear hunts near water.
In this model, attributes also begin to form relationships with other entities and their attributes, which produces highly complex relationships. This is also the point of leverage Google uses to develop understanding between knowledge bases.
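Relationships that chain from attributes to other entities form a graph, and a breadth-first search shows how one entity becomes linked to concepts in other knowledge bases. The graph below is hypothetical; it simply reuses the grizzly bear example and the earlier “bank” ambiguity.

```python
from collections import deque

# Hypothetical relationship graph: node -> {(relationship, neighbour)}.
GRAPH = {
    "grizzly bear": {("eats", "fish"), ("eats", "grass"), ("hunts_near", "water")},
    "fish": {("lives_in", "water")},
    "water": {("part_of", "river")},
    "river": {("has", "bank")},
}


def related_within(start, max_hops):
    """Breadth-first search: every node reachable from start in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _, neighbour in GRAPH.get(node, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]
    return seen


print(related_within("grizzly bear", 3))
```

Three hops from “grizzly bear” reach “bank” in its geographical sense, illustrating how relationships between attributes could connect otherwise separate knowledge bases.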
This is an example of the tech stack deployed.
Google relies on a standard information systems pipeline up to the point where two components work in tandem: the “Inference Engine” and the “Confidence Score Engine.”
Together, these two systems establish whether there is statistically significant confidence in the inferences made, which is why they are required to operate together.
With high enough confidence of a valid inference, the annotations and relationships are stored, and the image is stored separately.
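The gating step can be sketched as a confidence check in front of the knowledge base: a relationship is stored only when its confidence clears a threshold, and the supporting images are stored separately. The mean-of-per-image-scores formula, the 0.8 threshold, and the field names are assumptions; the patent excerpts quoted here do not specify them.

```python
from dataclasses import dataclass, field

THRESHOLD = 0.8  # hypothetical "high enough" confidence


@dataclass
class Inference:
    obj: str
    relationship: str
    attribute: str
    per_image_confidence: list[float]
    image_ids: list[str] = field(default_factory=list)


def confidence_score(inference):
    """Hypothetical confidence score engine: mean of the per-image confidences."""
    scores = inference.per_image_confidence
    return sum(scores) / len(scores) if scores else 0.0


def store_if_confident(inference, knowledge_base, image_store, threshold=THRESHOLD):
    """Store the relationship only if its confidence clears the threshold."""
    score = confidence_score(inference)
    if score < threshold:
        return False
    knowledge_base.append((inference.obj, inference.relationship, inference.attribute, score))
    image_store.extend(inference.image_ids)  # images are kept separately from relationships
    return True


kb, image_store = [], []
strong = Inference("grizzly bear", "eats", "fish", [0.9, 0.85, 0.8], ["img1", "img2", "img3"])
weak = Inference("grizzly bear", "eats", "grass", [0.6, 0.7], ["img4", "img5"])
print(store_if_confident(strong, kb, image_store))  # True
print(store_if_confident(weak, kb, image_store))    # False
```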
Confidence Scores & SEO Value
The disclosed embodiments also provide improved systems and methods for calculating confidence scores for annotations assigned to images. Confidence scores may reflect likelihood that an entity identified by an annotation is actually contained in an image. Confidence scores may be calculated on a per-image basis and aggregated over groups of annotated images in order to improve image recognition and annotation techniques. (US Patent 10,534,810)
The nature of image processing and machine learning prevents any algorithm from being 100% accurate, as machine learning performs poorly when it must extrapolate. Consequently, the larger the data set it is exposed to, the more likely a given case is to have already occurred within its training data.
The above extract suggests that relationships are usually not defined unless confidence scores reach a critical value. This means three things for SEOs:
- Successful image rankings are likely to increase dramatically as accuracy of understanding increases, allowing images to be far more valuable to users in the SERP than in the past.
- Image indexation will decrease, as images will not be displayed unless there is high confidence in the relationship between the entity and the attribute. If your image does not satisfy the confidence threshold, it will likely not be displayed, or will be displayed very low in the results.
- Cross-views will increase dramatically as users are more likely to jump between rich data suggestions determined by relationships in the knowledge base. It is likely you will see rankings for terms you don’t naturally rank for, which increases opportunity to compete for high-value keywords.
Furthermore, images will now be grouped by object entity. This means your image is far more likely to be displayed correctly, reducing negative soft factors such as pogo-sticking and traffic dips caused by incorrect keyword attribution. In addition, soft views may become one of the main avenues through which SEOs obtain exposure on behalf of their clients.
The world looks bright in terms of image SEO and video optimisation as Google begins to move further into optimised unstructured data!
Learn more about optimising images: Image SEO & Classification of Images in Local Search Results