As Google rolls out AI Overviews, AI Mode in Search, and the Gemini ecosystem, we face a growing challenge: what happens when users get answers — and soon complete purchases — without leaving Google’s interfaces?
UCP is designed to help brands sell to consumers without leaving the Gemini or LLM experience. Consumers can check out within the LLM, apply rewards points, and fully execute the transaction.
How Google’s Universal Commerce Protocol works
At its core, UCP standardizes how consumer AI interfaces communicate with merchant checkout systems. When a user tells Gemini, “Find me a highly rated, waterproof hiking boot in size 10 under $200 and buy it,” UCP is the invisible bridge that allows the AI to securely fetch inventory, process the payment, and confirm the order.
While Google’s developer documentation leans into technical jargon like “Model Context Protocol (MCP)” and “Agent2Agent (A2A) interoperability,” the implications are remarkably straightforward:
It uses your existing feeds: UCP plugs directly into your existing Google Merchant Center (GMC) shopping feeds. The inventory data you’re already managing for your campaigns is the same data that will power these AI transactions.
You keep the data: Unlike selling on some third-party marketplaces, where you lose the customer relationship, UCP ensures you remain the merchant of record. You process the transaction, you own the first-party customer data, and you control the post-purchase experience.
Frictionless checkout: By enabling checkouts directly within Google’s AI ecosystem, UCP can reduce cart abandonment and increase conversion rates among high-intent shoppers.
Best practices for Google’s UCP
Like many LLM optimization recommendations, these steps come down to the fundamentals of managing your shopping feed and Merchant Center account.
Google outlined a few best practices. If you follow these four steps, you’ll be well-positioned for success.
1. Master your feed data hygiene
In an agentic commerce environment, your product feed is your primary sales tool. To ensure the AI accurately matches your products to highly specific user queries, you need to enrich your feed with granular details.
Write product titles that are 30 or more characters long.
Expand product descriptions to 500 or more characters.
Include Global Trade Item Numbers (GTINs), where relevant, to ensure accurate product matching.
Include three or more additional images alongside your primary product photo to engage visual shoppers.
Use lifestyle images, not just standard product shots on white backgrounds.
Ensure your image quality meets the standard of 1,500×1,500 pixels.
Categorize your inventory by product type and share key product highlights.
Prepare specific feed attributes required for UCP, such as returns, support information, and policy information.
Support Google’s Native Checkout when possible (checkout logic integrated directly into the AI interface). Google also offers another option called Embedded Checkout (an iframe-based solution for highly bespoke branding). This will work, but is suboptimal at this time.
2. Pass trust and convenience signals through your feed
To set your brand apart when AI is helping consumers make immediate, confident purchasing decisions, you must pass trust and convenience signals directly through your feed. These elements directly impact the bottom line:
Indicate clearly if your brand offers free shipping.
Share your shipping speed (next day, two-day, etc.).
Display your return policy.
Submit sale prices when available. Regardless, ensure the feed represents the most accurate pricing details.
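To make this concrete, here's a rough sketch of what a single enriched item could look like in a Google Merchant Center XML feed. The product details, URLs, and values are illustrative, and the UCP-specific attributes (returns, support, and policy information) are omitted because Google defines those in its own specification:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <title>Example Outdoor Store</title>
    <link>https://www.example.com</link>
    <item>
      <g:id>BOOT-1042-10</g:id>
      <!-- Title of 30+ characters and a 500+ character description, per the guidance above -->
      <g:title>TrailGuard Waterproof Leather Hiking Boot, Men's Size 10</g:title>
      <g:description>Fully waterproof leather hiking boot with a grippy outsole... (500+ characters in a real feed)</g:description>
      <g:link>https://www.example.com/products/trailguard-boot</g:link>
      <g:gtin>00012345678905</g:gtin>
      <g:brand>TrailGuard</g:brand>
      <g:condition>new</g:condition>
      <g:availability>in_stock</g:availability>
      <g:price>199.99 USD</g:price>
      <g:sale_price>179.99 USD</g:sale_price>
      <g:image_link>https://www.example.com/images/boot-main.jpg</g:image_link>
      <g:additional_image_link>https://www.example.com/images/boot-lifestyle-1.jpg</g:additional_image_link>
      <g:additional_image_link>https://www.example.com/images/boot-lifestyle-2.jpg</g:additional_image_link>
      <g:additional_image_link>https://www.example.com/images/boot-detail.jpg</g:additional_image_link>
      <g:product_type>Footwear &gt; Hiking &gt; Boots</g:product_type>
      <g:product_highlight>Waterproof full-grain leather upper</g:product_highlight>
      <g:shipping>
        <g:country>US</g:country>
        <g:service>Two-day</g:service>
        <g:price>0.00 USD</g:price>
      </g:shipping>
    </item>
  </channel>
</rss>
```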
3. Prepare your technical integrations
The shift to UCP requires foundational updates to how your backend systems interact with Google. Work hand in hand with your development and SEO teams to prepare for these AI search experiences.
Migrate from the Content API to the Merchant API to enable real-time inventory updates and programmatic access to data and insights.
Upgrade your tag in Data Manager and implement Conversion with Cart Data to effectively use first-party data in your campaigns.
Prioritize content-rich pages for indexing and crawling, and ensure structured data is always supported by visible content.
Create your Business Profile and claim your Brand Profile to highlight your business information and brand voice on Google platforms.
Have your development team explore and prototype with UCP open source on GitHub to map APIs for checkout, session creation, and order management.
4. Additional features and tools beyond UCP to consider
Google is actively rolling out pilot programs designed specifically for the agentic era. Be proactive in adopting these new solutions rather than waiting for wide release:
Prepare for the “Business Agent,” a virtual sales associate that acts like a brand representative to answer product questions right on Google.
Consider the “Direct Offers Pilot,” a new way for advertisers to present exclusive discounts directly in AI Mode.
Inquire about the “Conversational Attributes Pilot,” which introduces dozens of new Merchant Center attributes designed to enhance discovery in the conversational commerce era.
The launch of Google’s Universal Commerce Protocol signals a significant shift. The SERP is becoming a transactional engine that increasingly operates within large language models.
UCP presents a meaningful opportunity. By removing friction between discovery and purchase, conversion rates could increase.
However, taking advantage of this requires stepping outside the Google Ads interface and working directly in your feed data and technical integrations, much like with Google Shopping. While this isn’t new, it’s becoming more important.
Ultimately, this comes down to the quality of your product data.
For years, SEO followed a fairly predictable playbook: create valuable content, optimize it for search engines, and compete for rankings on Google. But the way people discover information online is changing quickly. Tools like ChatGPT, Perplexity, and Gemini are introducing a new layer between users and search engines, where answers are generated and synthesized rather than simply retrieved.
In a recent episode of the Get Discovered podcast, Joe Walsh, CEO of Prerender.io, sat down with Yoast’s Principal Architect Alain Schlesser to discuss what this shift means for SEO and online discoverability. Their conversation explores how AI answer engines are reshaping the search landscape and why many traditional SEO assumptions no longer fully apply.
Alain shares insights on:
How AI systems retrieve and surface information
Why brands must rethink their online positioning, and
What businesses should start preparing for as AI-driven discovery evolves over the next 12–18 months
The new discovery layer: AI is becoming the gatekeeper
“There’s now a layer in front of search that acts as a gatekeeper before you even hit those search engines.”
AI adds a new layer to the information discovery process for searchers
That’s how Alain describes one of the biggest structural shifts happening in online discovery today. For years, the flow of search was straightforward: a user typed a search term into a search engine, the engine returned a list of results, and the user decided which link to click.
But AI-powered systems have added a new layer to that process.
From search queries to conversational discovery
Today, many users begin their search journey by asking questions in tools like ChatGPT, Perplexity, or Gemini instead of typing traditional keyword queries. The AI system then determines whether it needs external information and may generate multiple search queries behind the scenes to retrieve relevant sources.
The discovery flow now looks something like this:
The traditional vs the new agentic search
Previously:
User → Search engine → Website
Now:
User → AI model → Search engine → Website → AI synthesis → User
Instead of presenting a list of links, the AI model interprets and combines information before generating an answer. Alain explains this process in more detail in the podcast, highlighting how AI systems now act as a filtering layer between users and the web.
Search is fragmenting beyond Google
“We were in a rather comfortable position where we were only dealing with a monopoly search.”
For much of the past two decades, SEO largely meant optimizing for one ecosystem: Google. Even though other search engines existed, Google dominated how people discovered information online.
But that environment is changing.
As Alain explains, AI systems are introducing a new layer of fragmentation in discovery. Different AI platforms rely on different combinations of search engines, indexes, and training data, which means results can vary widely between them.
In practice, that means a brand might appear prominently in one AI system while barely showing up in another. For SEO teams, this marks a shift toward thinking about visibility across multiple AI-driven environments rather than just one search engine.
Despite technological changes, Alain emphasizes that the core principles of good SEO remain intact.
“You shouldn’t try to game the search engine. You need to create valuable content that humans actually want to read, and structure it so search engines can understand it.”
At its core, search still aims to deliver the best possible answers to users. Whether the request comes from a person typing a query or an AI model generating one behind the scenes, the goal remains the same: surface useful, reliable information.
That means SEO teams should continue focusing on fundamentals such as valuable content, clear structure, and discoverable, indexable pages.
AI systems may change how information is surfaced, but they still rely on the same underlying signals of quality and relevance.
The “top results or nothing” reality
As the discovery landscape evolves, another important shift emerges in how AI systems interact with search results.
“They don’t see the full search result page. What the LLM typically sees is just the five topmost elements per search query.”
Unlike human users, AI systems typically work with a very small set of retrieved sources before generating an answer. That means if your content doesn’t appear among those top results, it may never reach the AI system at all.
In a world where AI answers rely on the summarization of modern content, only the sources that make it into that small retrieval window influence the final response.
This makes strong search visibility more important than ever. Ranking well isn’t just about earning clicks anymore. It determines whether your content is even considered when AI systems construct an answer.
Why “safe” content strategies are no longer enough
Even if your content reaches those top results, there’s another layer of filtering happening inside the AI model itself.
Large language models compress enormous amounts of information during training. As Alain explains:
What the model keeps are the dominant signal and the outliers. Everything in between is often compressed away as statistical noise.
In the podcast, Alain uses this idea to explain why brands that try to be broadly acceptable or “safe” may struggle to stand out in AI-driven discovery.
The takeaway is clear: in a world where AI systems summarize and compress information, having a clear and distinctive perspective becomes increasingly important.
Why Yoast launched AI visibility tracking
As AI systems reshape how information is discovered and summarized, a new challenge emerges for businesses: understanding how their brand appears in AI-generated answers. That’s the problem Yoast set out to address with Yoast SEO AI +, a feature designed to help businesses monitor how their brand shows up across major AI platforms.
Earlier in this article, we explored how AI systems now sit between users and search engines, retrieve only a small set of results, and synthesize answers through the summarization of modern content. Together, these changes create a new discovery layer that is far less transparent than traditional search.
As Alain explains in the podcast:
“We need more visibility and observability into that AI-based layer to figure out what is going on there. Right now, it’s mostly a black box.”
Unlike traditional search engines, AI systems don’t provide clear rankings, impressions, or click data that explain why a source was selected. Instead, answers are generated from a mix of retrieved content, training data, and model reasoning. For businesses, that makes it much harder to understand whether their brand is visible in AI-driven discovery.
This is where AI visibility tracking becomes valuable. Rather than focusing only on search rankings, teams also need insight into how their brand is represented inside AI responses.
Yoast SEO AI + helps surface that layer by allowing teams to observe how their brand appears across AI systems, such as ChatGPT, Perplexity, and Gemini.
The goal is not simply to track another metric. It’s to help businesses understand how AI systems interpret and represent their brand.
As Alain notes, visibility in AI systems can vary significantly depending on the platform, because each one relies on different combinations of:
search engines
indexes
training datasets
This means a brand might appear frequently in one AI system while barely showing up in another. Without visibility into those differences, it becomes difficult for teams to understand how their content performs in the new discovery landscape.
In that sense, tools like Yoast SEO AI + are less about selling a new SEO feature and more about helping businesses observe a rapidly changing ecosystem where discoverability no longer happens only in search results.
The next evolution: AI agents making decisions
“What we will increasingly see is automated transactions where AI agents navigate websites and initiate actions on behalf of users.”
So far, much of the discussion around AI and search has focused on how answers are generated. But according to Alain, the next phase of this evolution may go further.
Over the next 12–18 months, AI systems may begin moving beyond answering questions and start performing tasks on behalf of users. Instead of guiding someone toward a website to make a decision, AI agents could increasingly compare options, interact with websites, and complete actions automatically.
If that shift happens, the traditional customer journey could change significantly. Alain shares a fascinating perspective on what this might mean for businesses in the coming years in the full podcast conversation.
SEO matters more than ever
AI isn’t replacing SEO. If anything, it’s reinforcing why good SEO matters in the first place. What’s changing is the path between users and content. Instead of navigating search results themselves, users increasingly receive answers that AI systems retrieve, interpret, and synthesize.
That makes strong fundamentals more important than ever. Businesses still need to focus on:
valuable content
clear structure
discoverable and indexable pages
a distinctive brand identity
But the central question for SEO is evolving. It's no longer just whether you rank; it's whether AI systems retrieve, interpret, and accurately represent your content when they construct answers.
A good XML sitemap serves as a roadmap for your website, guiding Google to all your important pages. XML sitemaps can be beneficial for SEO, helping Google find your essential pages quickly, even if your internal linking isn’t perfect. This post explains what they are and how they help you rank better and get surfaced by AI agents.
An XML sitemap is crucial for SEO, as it guides search engines to your important pages, improving crawl efficiency
XML sitemaps list essential URLs and provide metadata, helping search engines understand content and prioritize crawling
With Yoast SEO, you can automatically generate and manage XML sitemaps, keeping them up to date
XML sitemaps support faster indexing of new content and help discover orphan pages that aren’t linked elsewhere
Add your XML sitemap to Google Search Console to help Google find it quickly and monitor indexing status
What are XML sitemaps?
An XML sitemap is a file that lists a website’s essential pages, ensuring Google can find and crawl them. It also helps search engines understand your website structure and prioritize important content.
Fun fact:
XML is not the only type of sitemap; there are several sitemap formats, each serving a slightly different purpose:
RSS, mRSS, and Atom 1.0 feeds: These are typically used for content that changes frequently, such as blogs or news sites. They automatically highlight recently updated content
Text sitemaps: The simplest format. These contain a plain list of URLs, one per line, without additional metadata
HTML sitemaps: These are created for visitors, not search engines. They list and link to important pages in a clear, hierarchical structure to improve user navigation. An XML sitemap, however, is specifically designed for search engines.
XML sitemaps include additional metadata about each URL, helping search engines better understand your content. For example, a sitemap can indicate:
When a page was last meaningfully updated
How important a URL is relative to other URLs
Whether the page includes images or videos, using sitemap extensions
Search engines use this information to crawl your site more intelligently and efficiently, especially if your website is large, new, or has complex navigation.
Looking to expand your knowledge of technical SEO? We have a course in the Yoast SEO Academy focusing on crawlability and indexability. One of the topics we tackle is how to use XML sitemaps properly.
What does an XML sitemap look like?
An XML sitemap follows a standardized format. It is a text file written in Extensible Markup Language (XML) that search engines can easily read and process. As it follows a structured format, search engines like Google can quickly understand which URLs exist on your website and when they were last updated.
Here is a very simple example of an XML sitemap that contains a single URL:
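```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative example: the URL and date are placeholders -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2026-03-01</lastmod>
  </url>
</urlset>
```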
Each URL in a sitemap is wrapped in specific XML tags that provide information about that page. Some of these tags are required, while others are optional but helpful for search engines.
Below is a breakdown of the most common XML sitemap tags:
<?xml> (mandatory): Declares the XML version and character encoding used in the file.
<urlset> (mandatory): The container for the entire sitemap. It defines the sitemap protocol and holds all listed URLs.
<url> (mandatory): Represents a single URL entry in the sitemap. Each page must be enclosed within its own <url> tag.
<loc> (mandatory): Specifies the full canonical URL of the page you want search engines to crawl and index.
<lastmod> (optional): Indicates the date when the page was last meaningfully updated, helping search engines know when to re-crawl the page.
<changefreq> (optional): Suggests how frequently the content on the page is expected to change, such as daily, weekly, or monthly.
<priority> (optional): Suggests the relative importance of a page compared to other pages on the same site, using a scale from 0.0 to 1.0.
Note: While the sitemaps.org protocol supports optional tags like <changefreq> and <priority>, Google and Bing generally ignore them; Google has said it no longer uses them. Instead, it relies on <lastmod> to signal when content was actually updated.
What is an XML sitemap index?
A sitemap index is a file that lists multiple XML sitemap files. Instead of containing individual page URLs, it acts as a directory that points search engines to several separate sitemaps.
This becomes useful when a website has a large number of URLs or when the site owner wants to organize sitemaps by content type. For example, a site may have separate sitemaps for pages, blog posts, products, or categories.
Here’s a breakdown of how an XML sitemap and an XML sitemap index differ:
Purpose: an XML sitemap lists individual URLs on a website; a sitemap index lists multiple sitemap files.
Content: an XML sitemap contains page URLs and optional metadata; a sitemap index contains links to sitemap files.
Use case: an XML sitemap suits small or medium-sized sites; a sitemap index is useful when a site has multiple sitemaps.
Structure: an XML sitemap uses <urlset> and <url> tags; a sitemap index uses <sitemapindex> and <sitemap> tags.
Search engines impose limits on sitemaps. A single sitemap can contain up to 50,000 URLs or be up to 50 MB in size. If your website exceeds these limits, you can create multiple sitemaps and group them together using a sitemap index.
Submitting a sitemap index to search engines allows them to discover and process all your sitemaps from a single file.
In short, an XML sitemap helps search engines discover pages, while a sitemap index helps search engines discover multiple sitemaps.
Below is a simple example of what a sitemap index file looks like:
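```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative example: the URLs and dates are placeholders -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/post-sitemap.xml</loc>
    <lastmod>2026-03-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/page-sitemap.xml</loc>
    <lastmod>2026-02-15</lastmod>
  </sitemap>
</sitemapindex>
```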
In this example, the sitemap index references two separate sitemaps. Each one can contain thousands of URLs. This structure helps search engines efficiently discover and crawl large websites.
Why do you need an XML sitemap?
Technically, you don’t need an XML sitemap. Search engines can often discover your pages through internal links and backlinks from other websites. However, having an XML sitemap is highly recommended because it helps search engines crawl and understand your site more efficiently.
Here are some key benefits of using an XML sitemap:
Improved crawl efficiency
Sitemaps help search engines like Google and Bing crawl large or complex websites more efficiently. By listing your important URLs in one place, you make it easier for crawlers to find and prioritize valuable pages.
Faster indexing of new content
When you update or add new pages to your site, including them in your sitemap helps search engines discover them sooner. This can lead to faster indexing, especially for websites that publish content frequently, such as blogs, news sites, or e-commerce stores with changing product listings.
Discovery of orphan pages
Orphan pages are pages that are not linked from other parts of your website. Because crawlers typically follow links to discover content, these pages can sometimes be missed. An XML sitemap can help ensure these pages are still discovered.
Additional metadata signals
XML sitemaps can include additional metadata about each URL, such as the <lastmod> tag. This information helps search engines understand when a page was last updated and whether it may need to be crawled again.
Support for specialized content
Sitemaps can also be extended to include specific types of content, such as images or videos. These specialized sitemaps help search engines better understand and surface media content in results like Google Images or video search.
Better understanding of site structure
A well-organized sitemap gives search engines a clearer overview of your website’s structure and the relationship between different sections or content types.
Indexing insights through Search Console
When you submit your sitemap to tools like Google Search Console, you can monitor how many URLs are discovered and indexed. This also helps you identify crawl issues or indexing errors.
Support for multilingual websites
For websites targeting multiple languages or regions, XML sitemaps can include alternate language versions of pages using hreflang annotations. This helps search engines serve the correct language version to users in different locations.
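As a rough sketch (the URLs and language codes are placeholders), a sitemap entry with alternate language versions looks like this:

```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/en/hiking-boots/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/en/hiking-boots/"/>
    <xhtml:link rel="alternate" hreflang="de" href="https://www.example.com/de/wanderschuhe/"/>
  </url>
</urlset>
```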
Do XML sitemaps matter for AI search?
Yes, but indirectly. AI-powered search experiences like AI Overviews or Bing Copilot still rely on the traditional search index to discover and retrieve content. That means your pages usually need to be crawled and indexed first before they can appear in AI-generated answers.
This is where XML sitemaps still help. By listing your important URLs in one place, a sitemap makes it easier for search engines to discover and index your content. Keeping the <lastmod> value accurate can also help search engines prioritize recently updated pages, which is especially useful for AI systems that aim to surface fresh information.
In short, a sitemap won’t make your content appear in AI answers by itself. But it helps ensure your pages are discoverable, indexed, and up to date, which increases their chances of being used in AI-powered search results.
Adding XML sitemaps to your site with Yoast
Because XML sitemaps play an important role in helping search engines discover and crawl your content, Yoast SEO automatically generates XML sitemaps for your website. This feature is available in both the free and premium versions (Yoast SEO Premium, Yoast WooCommerce SEO, and Yoast SEO AI+) of the plugin.
Instead of requiring you to manually create or maintain sitemap files, Yoast SEO handles everything automatically. As you publish, update, or remove content, the plugin updates your sitemap index and the individual sitemaps in real time. This ensures search engines always have an up-to-date overview of the pages you want them to crawl and index.
Yoast SEO also organizes your sitemaps intelligently. Rather than placing every URL in a single file, the plugin creates a sitemap index that groups separate sitemaps for different content types, such as posts, pages, and other public content types, with just one click.
Another important advantage is that Yoast SEO only includes content that should actually appear in search results. Pages set to noindex are automatically excluded from the XML sitemap. This helps keep your sitemap clean and focused on the URLs that matter for SEO.
Controlling what appears in your sitemap
While the plugin automatically manages sitemaps, you still have full control over which content is included.
For example, if you don’t want a specific post or page to appear in search results, you can change the setting “Allow search engines to show this content in search results?” in the Yoast SEO sidebar under the Advanced tab. When this option is set to No, the content will be marked as noindex and automatically excluded from the XML sitemap. When set to Yes, the content remains eligible to appear in search results and is included in the sitemap.
This makes it easy to keep your sitemap focused on the pages you actually want search engines to crawl and index. In some cases, developers can further customize sitemap behavior. For example, filters can be used to limit the number of URLs per sitemap or to programmatically exclude certain content types.
Because all of this happens automatically, most website owners never need to manage sitemap files manually. Yoast SEO keeps your XML sitemap clean, up to date, and optimized for search engines as your site grows.
If you want Google to find your XML sitemap quicker, add it to your Google Search Console account. You can find your sitemaps in the ‘Sitemaps’ section; if yours isn’t listed there yet, you can submit it at the top of the page.
Adding your sitemap lets you check whether Google has indexed all the pages in it. If there is a significant difference between the ‘submitted’ and ‘indexed’ counts for a particular sitemap, we recommend investigating further. Perhaps an error is preventing some pages from being indexed, or you may need to add more internal links pointing to the content that hasn’t been indexed yet.
Google correctly processed all URLs in a post sitemap
What websites need an XML sitemap?
Google’s documentation says sitemaps are beneficial for “really large websites,” “websites with large archives,” “new websites with just a few external links to them,” and “websites which use rich media content.” According to Google, proper internal linking should allow it to find all your content easily. Unfortunately, many sites don’t link their content logically.
While we agree that these websites benefit the most from having one, at Yoast, we think XML sitemaps benefit every website. As the web grows, it’s getting harder and harder to get sites indexed properly, so you should give search engines every available way to find your content. In addition, XML sitemaps make search engine crawling more efficient.
Every website needs Google to find essential pages easily and know when they were last updated. That’s why this feature is included in the Yoast SEO plugin.
Which pages should be in your XML sitemap?
How do you decide which pages to include in your XML sitemap? Always start by thinking about the relevance of a URL: when a visitor lands on that URL, is it a good result? Do you want visitors to land there? If not, the URL probably shouldn’t be in your sitemap. Keep in mind, though, that leaving a URL out of your sitemap doesn’t mean Google won’t index it: if Google can find it by following links, it can index the URL. If you don’t want a URL to appear in the search results at all, you must add a ‘noindex’ tag.
Example: A new blog
For example, you are starting a new blog. Of course, you want to ensure your target audience can find your blog posts in the search results. So, it’s a good idea to immediately include your posts in your XML sitemap. It’s safe to assume that most of your pages will also be relevant results for your visitors. However, a thank you page that people will see after they’ve subscribed to your newsletter is not something you want to appear in the search results. In this case, you don’t want to exclude all pages from your sitemap, only this one.
Let’s stay with the example of the new blog. In addition to your blog posts, you create some categories and tags. These categories and tags will have archive pages that list all posts in that specific category or tag. However, initially, there might not be enough content to fill these archive pages, making them ‘thin content’.
For example, tag archives that show just one post are not that valuable to visitors yet. You can exclude them from the sitemap when starting your blog and include them once you have enough posts. You can even exclude all your tag pages or category pages simultaneously using Yoast SEO.
However, this kind of page could also be excellent ranking material. So, if you think: well, yes, this tag page is a bit ‘thin’ right now, but it could be a great landing page, then enrich it with additional information and images. And don’t exclude it from your sitemap in this case.
Frequently asked questions about XML sitemaps
There are a lot of questions regarding XML sitemaps, so we’ve answered a couple in the FAQ below:
What happens when Google Search Console says an XML sitemap has errors?
An invalid or improperly read XML sitemap usually indicates a specific error that needs investigation. Check the reported issue to understand what is causing the problem. Make sure the sitemap has been submitted through the search engine’s webmaster tools. When the sitemap is marked as invalid, review the listed errors and apply the appropriate fixes for each one.
How can I check whether a website has an XML sitemap?
In most cases, you can find out if sites have an XML sitemap by adding sitemap.xml to the root domain. So, that would be example.com/sitemap.xml. If a site has Yoast SEO installed, you’ll notice that it’s redirected to example.com/sitemap_index.xml. sitemap_index.xml is the base sitemap that collects all the sitemaps on your site into a single page.
How can I update an XML sitemap?
There are ways to create and update sitemaps by hand, but you shouldn’t. There are also static generators that let you build a sitemap on demand, but you’d have to repeat that process every time you add or update content. The best way is simply to use Yoast SEO: turn on the XML sitemap feature, and all your updates will be applied automatically.
Can I use <priority> in my XML sitemap?
In the past, people believed that adding the <priority> attribute to sitemaps would signal to Google that specific URLs should be prioritized. Unfortunately, it doesn’t do anything, as Google has often said it doesn’t use this attribute to read or prioritize content in sitemaps.
Check your own XML sitemap!
Now you know how important it is to have an XML sitemap: it can help your site’s SEO. If you add the correct URLs, Google can easily access your most important pages and posts. Google will also find updated content easily, so it knows when a URL needs to be crawled again. Lastly, adding your XML sitemap to Google Search Console helps Google find it quickly and lets you check for sitemap errors.
So check your XML sitemap and find out if you’re doing it right!
Starting July 1st, Meta will add “location fees” to ad buys targeting users in six countries — effectively offloading the cost of European digital services taxes onto the advertisers themselves.
The numbers. Fees will match each country’s digital services tax rate:
France, Italy, Spain: 3%
Austria, Turkey: 5%
UK: 2%
How it works in practice. Per Meta’s email to advertisers — “$100 in ads delivered to Italy will cost $103, plus any applicable VAT on top of that.”
The fine print. The fees apply to where the ad is delivered, not where the advertiser is based — meaning a US brand running campaigns targeting French users will pay the French rate regardless.
Why we care. This is a direct, unavoidable cost increase hitting European campaigns on July 1 — with no opt-out. If you’re running ads targeting users in France, Italy, Spain, Austria, Turkey, or the UK, your effective CPM and CPA benchmarks are about to get more expensive, which means existing budgets will stretch less far and current ROAS targets may no longer be achievable without adjustment.
And since the fee is based on where the ad is delivered rather than where you’re based, even non-European brands aren’t off the hook.
The big picture for advertisers. This isn’t unique to Meta — Google and Amazon already charge similar pass-through fees. But it’s a meaningful shift in how European ad budgets need to be calculated, and campaign managers should revisit their cost models before July 1 to account for the added overhead across affected markets.
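As an illustrative example of what that recalculation looks like: a $50,000 budget delivered entirely to Italy now buys roughly $48,544 of delivered ads ($50,000 / 1.03), and at unchanged performance a campaign that previously reported a 4.0 ROAS will report about 3.88 (4.0 / 1.03) once the 3% fee is counted as spend.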
The backdrop. Digital services taxes have been a flashpoint between Europe and Washington. The Trump administration has threatened retaliation against European firms over the levies — adding geopolitical uncertainty to what is already a complex compliance landscape for global advertisers.
Positive coverage creates exposure, authority, trust, and often valuable backlinks.
But for many people, the path to getting it is a mystery. Others believe myths about how it works.
Some believe you have to be at the very top of your industry before the media will care about your story.
That’s simply false.
Others believe you can simply buy your way into media coverage.
There’s a small degree of truth to that.
You can find contributors willing to feature you (or your client) for a fee, but this blatantly violates every outlet’s contributor guidelines. You may land the feature, but editors will eventually find out.
What happens then?
First, the article gets deleted or any mention of you and your links gets removed. Then, the contributor gets removed from the platform and blacklisted in the media industry. Finally, you get blacklisted too.
Good luck getting featured again. It won’t happen.
The reality is that you can get featured in the media.
You just need to understand the process and execute it consistently.
Develop your story
You probably have a great story — you just may not realize it yet.
The media has to produce a constant stream of content. If you have a strong story, you’re already one-third of the way to getting featured.
Let’s start with what doesn’t make a great story.
You’re the first.
You think you’re the best (everyone thinks that, and no one cares except your mother).
You’re the biggest.
You want to change the world.
So what does make a great story?
Like the answer to most SEO questions: it depends.
A great story starts with an actual story.
You have to explain, in an engaging way, why anyone should care about what you have to say.
For example, I often tell the story of how I used PR to rebuild my success after being on my deathbed.
I explain that my agency’s specific PR approach comes from the exact process I used to rebuild my own business — and that I want to give others the same advantage.
And my story is easily verifiable.
But you don’t need a life-or-death struggle to have a compelling story.
You just need a story that shows a deeper purpose. A mission. Something people can get excited about and care about.
Craft your pitch
Even with the best story in the world, you still need an effective pitch.
Your pitch has to cut through the noise and grab attention. Journalists, producers, and others in the media are inundated with pitches — many receive hundreds every day. Your pitch has to tell your story clearly and quickly, and motivate them to respond.
Easier said than done.
Most pitches are sent by email, so most people start with the subject line. That’s the exact opposite of what you should do.
Start with the body of the email. There’s a reason for this, which we’ll get to shortly.
Find a way to connect your story to current events. If a topic is already popular in the media, other outlets are more likely to cover it.
But remember: while the story involves you, it isn’t about you.
You have to pitch from the perspective of what the audience wants. The journalist’s, editor’s, or producer’s needs come second, and yours come in a distant last place.
Sorry, that’s just the way it is.
You need to distill your story and why the audience should care into a few sentences. You can add a little more detail after that, but keep it short. If they see a wall of text, they’ll likely delete your email.
Once your pitch is solid, write your subject line. It should be short, punchy, and aligned with your pitch.
Short and punchy matters because the subject line determines whether they open your email.
If the pitch doesn’t align with the subject line, they’ll likely delete the email without reading it. Getting attention means nothing if they don’t read the message.
I once saw a publicist use a subject line that certainly grabbed attention, but it had zero positive impact and damaged his reputation.
What was it?
“Fuck You!”
Bottom line: your pitch must quickly and clearly show the value the audience will get, and your subject line must grab attention in a positive way while aligning with the pitch.
Build your media list
PR isn’t a numbers game.
Yet people treat it like one. They buy or compile lists of media contacts and blast their pitch to anyone they can find.
That’s no different from spam emails selling generic Viagra.
Success comes from sending the right pitch to the right people at the right time.
Finding the right people means identifying journalists, producers, and other media contacts who cover the types of stories you’re telling.
Several expensive tools can help you find these contacts and their information. But you can often find the same information with a search engine and social media. In fact, that’s how I built most of my media relationships.
As for the right time, that’s largely a matter of chance.
Send your pitch
There’s no magic formula.
The time of day you send your pitch doesn’t matter much unless it’s extremely time-sensitive, which most business topics aren’t. Producers often check email at certain times, but they won’t touch it while preparing for or running their show.
Now here’s something you need to avoid:
Don’t bombard them with follow-up emails!
For truly time-sensitive stories, it may be acceptable to follow up within the same week. In most cases, though, wait about a week. Frequent follow-ups will annoy journalists, producers, and other media contacts.
Stop after two or three follow-ups. If you haven’t received a response by then, they likely aren’t interested in the story.
Try not to take it personally. They probably won’t tell you it’s not a fit. Given the sheer volume of pitches they receive, responding to every one would be a full-time job.
Nurture your relationships
Most of your pitches won’t result in media coverage.
The problem is that most people stop after a rejection or no response.
That’s crazy to me.
I can’t tell you how many times I’ve heard “no” or received no reply before finally landing a feature.
It happened because I didn’t pitch once and move on. These contacts all started as strangers, but I invested time and energy in building real relationships.
As a result, when I reach out, they open and read my emails because I’m not a stranger. Those relationships make it far easier to turn a pitch into media coverage.
Most initial outreach won’t lead to coverage. But if you nurture the right relationships, you’ll eventually build a network of responsive press contacts.
Perplexity AI must stop using its Comet browser agent to make purchases on Amazon. A federal judge sided with Amazon in an early ruling over AI shopping bots.
Why we care. The case targets a core promise of AI agents: completing tasks like shopping on a user’s behalf. If courts restrict how agents access sites, AI agents could face strict limits when interacting with logged-in accounts on major websites.
What happened. U.S. District Judge Maxine Chesney granted Amazon a preliminary injunction Monday in San Francisco federal court.
The order blocks Perplexity from using its Comet browser agent to access password-protected parts of Amazon, including Prime subscriber accounts.
Chesney wrote that Amazon presented “strong evidence” that Comet accessed accounts “with the Amazon user’s permission but without authorization by Amazon.”
The ruling also requires Perplexity to destroy any Amazon data it previously collected.
Catch-up quick. Amazon sued Perplexity in November, accusing the startup of computer fraud and unauthorized access. The company said Comet made purchases from Amazon on behalf of users without properly identifying itself as a bot.
What’s next. The order is paused for one week to allow Perplexity to appeal.
What they’re saying. Amazon spokesperson Lara Hendrickson told Bloomberg (subscription required) the injunction “will prevent Perplexity’s unauthorized access to the Amazon store and is an important step in maintaining a trusted shopping experience for Amazon customers.”
Google Ads is rolling out auto end screens — a new feature that appends an interactive, auto-generated card to the end of eligible video ads to nudge viewers toward a conversion.
How it works. An interactive screen appears for a few seconds immediately after the video finishes playing.
Content is auto-populated from campaign data — app name, icon, price, and a direct install link for app campaigns
End screens appear by default on eligible ads, requiring no setup from advertisers
Why we care. Advertisers no longer need to manually build post-roll calls-to-action. This feature is on by default and changes the end of your video ads — and if you’ve already built custom YouTube end screens, they’ll be overridden without any warning. With end screens being the last thing a viewer sees before deciding to act, losing control of that moment matters.
And with broader expansion planned, now is the time to understand how it works before it reaches more of your campaigns.
The catch. Enabling auto end screens in Google Ads overrides any manually added YouTube end screens — meaning advertisers who’ve already customized their YouTube end cards will lose them.
Current limitations. The feature is only available for in-stream ads running in mobile app install campaigns, with broader expansion planned but not yet dated.
What stays the same. Auto end screens don’t affect billing or view counts — they’re purely an added engagement layer tacked on after a full video view.
Next steps. Advertisers running mobile app install campaigns should audit their video ads now — check whether auto end screens are serving as expected and verify that any manually added YouTube end screens aren’t being silently overridden. As Google expands the feature beyond app installs, it’s worth establishing a review process early so campaigns are ready when eligibility broadens.
The DSCRI-ARGDW pipeline maps 10 gates between your content and an AI recommendation across two phases: infrastructure and competitive. Because confidence multiplies across the pipeline, the weakest gate is always your biggest opportunity. Here, we focus on the first five gates.
The infrastructure phase (discovery through indexing) is a sequence of absolute tests: the system either has your content, or it doesn’t. Then, as you pass through the gates, there’s degradation.
For example, a page that can’t be rendered doesn’t get “partially indexed,” but it may get indexed with degraded information, and every competitive gate downstream operates on whatever survived the infrastructure phase.
If the raw material is degraded, the competition in the ARGDW phase starts with a handicap that no amount of content quality can overcome.
The industry compressed these five distinct DSCRI gates into two words: “crawl and index.” That compression hides five separate failure modes behind a single checkbox. This piece breaks the simplistic “crawl and index” into five clear gates that will help you optimize significantly more effectively for the bots.
If you’re a technical SEO, you might feel you can skip this. Don’t.
You’re probably doing 80% of what follows and missing the other 20%. The gates below provide measurable proof that your content reached the index with maximum confidence, giving it the best possible chance in the competitive ARGDW phase that follows.
Sequential dependency: Fix the earliest failure first
The infrastructure gates are sequential dependencies: each gate’s output is the next gate’s input, and failure at any gate blocks everything downstream.
If your content isn’t being discovered, fixing your rendering is wasted effort, and if your content is crawled but renders poorly, every annotation downstream inherits that degradation. Better to be a straight C student than three As and an F, because the F is the gate that kills your pipeline.
The audit starts with discovery and moves forward. The temptation to jump to the gate you understand best (and for many technical SEOs, that’s crawling) is the temptation that wastes the most money.
Discovery, selection, crawling: The three gates the industry already knows
Discovery and crawling are well-understood, while selection is often overlooked.
Discovery is an active signal. Three mechanisms feed it:
XML sitemaps (the census).
IndexNow (the telegraph).
Internal linking (the road network).
The entity home website is the primary discovery anchor for pull discovery, and confidence is key. The system asks not just “does this URL exist?” but “does this URL belong to an entity I already trust?” Content without entity association arrives as an orphan, and orphans wait at the back of the queue.
The push layer (IndexNow, MCP, structured feeds) changes the economics of this gate entirely, and I’ll explain what changes when you stop waiting to be found and start pushing.
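To illustrate the push layer, here’s a minimal sketch of an IndexNow submission in Python. The host, key, and URLs are placeholders, and the key file must actually be hosted at the keyLocation for search engines to accept the ping:

```python
import json
import urllib.request

# Minimal IndexNow "push" ping: tell participating search engines about new or
# updated URLs instead of waiting for them to be discovered by crawling.
payload = {
    "host": "www.example.com",                                   # placeholder
    "key": "your-indexnow-key",                                  # placeholder
    "keyLocation": "https://www.example.com/your-indexnow-key.txt",
    "urlList": [
        "https://www.example.com/new-article/",
        "https://www.example.com/updated-product/",
    ],
}

req = urllib.request.Request(
    "https://api.indexnow.org/indexnow",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json; charset=utf-8"},
    method="POST",
)
with urllib.request.urlopen(req, timeout=10) as resp:
    print("IndexNow response status:", resp.status)  # 200/202 means the ping was accepted
```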
Selection is the system’s opinion of you, expressed as crawl budget. As Microsoft Bing’s Fabrice Canel says, “Less is more for SEO. Never forget that. Less URLs to crawl, better for SEO.”
The industry spent two decades believing more pages equals more traffic. In the pipeline model, the opposite is true: fewer, higher-confidence pages get crawled faster, rendered more reliably, and indexed more completely. Every low-value URL you ask the system to crawl is a vote of no confidence in your own content, and the system notices.
Not every page that’s discovered in the pull model is selected. Canel states that the bot assesses the expected value of the destination page and will not crawl the URL if the value falls below a threshold.
Crawling is the most mature gate and the least differentiating. Server response time, robots.txt, redirect chains: solved problems with excellent tooling, and not where the wins are because you and most of your competition have been doing this for years.
What most practitioners miss, and what’s worth thinking about: Canel confirmed that context from the referring page carries forward during crawling.
Your internal linking architecture isn’t just a crawl pathway (getting the bot to the page) but a context pipeline (telling the bot what to expect when it arrives), and that context influences selection and then interpretation at rendering before the rendering engine even starts.
Rendering fidelity: The gate that determines what the bot sees
Rendering fidelity is where the infrastructure story diverges from what the industry has been measuring.
After crawling, the bot attempts to build the full page. It sometimes executes JavaScript (don’t count on this because the bot doesn’t always invest the resources to do so), constructs the document object model (DOM), and produces the rendered DOM.
I coined the term rendering fidelity to name this variable: how much of your published content the bot actually sees after building the page. Content behind client-side rendering that the bot never executes isn’t degraded, it’s gone, and information the bot never sees can’t be recovered at any downstream gate.
Every annotation, every grounding decision, every display outcome depends on what survived rendering. If rendering is your weakest gate, it’s your F on the report card, and remember: everything downstream inherits that grade.
The friction hierarchy: Why the bot renders some sites more carefully than others
The bot’s willingness to invest in rendering your page isn’t uniform. Canel confirmed that the more common a pattern is, the less friction the bot encounters.
I’ve reconstructed the following hierarchy from his observations. The ranking is my model. The underlying principle (pattern familiarity reduces selection, crawl, rendering, and indexing friction and processing cost) is confirmed:
WordPress + Gutenberg + clean theme (lowest friction): 30%+ of the web and the most common pattern; the bot has the highest confidence in its own parsing.
Established platforms such as Wix, Duda, and Squarespace (low friction): known patterns and predictable structure; the bot has learned these templates.
WordPress + page builders such as Elementor and Divi (medium friction): adds markup noise, so downstream processing has to work harder to find the core content.
Bespoke code with perfect HTML5 (medium-high friction): the bot does not know your code is perfect; it has to infer structure without a pattern library to validate against.
Bespoke code with imperfect HTML5 (high friction): the bot is guessing with degraded signals.
The critical implication, also from Canel, is that if the site isn’t important enough (low publisher entity authority), the bot may never reach rendering because the cost of parsing unfamiliar code exceeds the estimated benefit of obtaining the content. Publisher entity confidence has a huge influence on whether you get crawled and also how carefully you get rendered (and everything else downstream).
JavaScript is the most common rendering obstacle, but it isn’t the only one: missing CSS, proprietary elements, and complex third-party dependencies can all produce the same result — a bot that sees a degraded version of what a human sees, or can’t render the page at all.
JavaScript was a favor, not a standard
Google and Bing render JavaScript. Most AI agent bots don’t. They fetch the initial HTML and work with that. The industry built on Google and Bing’s favor and assumed it was a standard.
Perplexity’s grounding fetches work primarily with server-rendered content. Smaller AI agent bots have no rendering infrastructure.
The practical consequence: a page that loads a product comparison table via JavaScript displays perfectly in a browser but renders as an empty container for a bot that doesn’t execute JS. The human sees a detailed comparison. The bot sees a div with a loading spinner.
The annotation system classifies the page based on an empty space where the content should be. I’ve seen this pattern repeatedly in our database: different systems see different versions of the same page because rendering fidelity varies by bot.
Three rendering pathways that bypass the JavaScript problem
The traditional rendering model assumes one pathway: HTML to DOM construction. You now have two alternatives.
WebMCP, built by Google and Microsoft, gives agents direct DOM access, bypassing the traditional rendering pipeline entirely. Instead of fetching your HTML and building the page, the agent accesses a structured representation of your DOM through a protocol connection.
With WebMCP, you give yourself a huge advantage because the bot doesn’t need to execute JavaScript or guess at your layout, because the structured DOM is served directly.
Markdown for Agents uses HTTP content negotiation to serve pre-simplified content. When the bot identifies itself, the server delivers a clean markdown version instead of the full HTML page.
The semantic content arrives pre-stripped of everything the bot would have to remove anyway (navigation, sidebars, JavaScript widgets), which means the rendering gate is effectively skipped with zero information loss. If you’re using Cloudflare, you have an easy implementation that they launched in early 2026.
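As a minimal sketch of the content-negotiation idea (the exact headers and conventions that Markdown for Agents standardizes on may differ; this only illustrates serving two representations of the same URL), a server could check the Accept header and return markdown to agents that ask for it:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Two representations of the same page: full HTML for browsers,
# pre-simplified markdown for agents that request it.
HTML_PAGE = "<html><body><main><h1>Waterproof hiking boots</h1><p>Core content...</p></main></body></html>"
MARKDOWN_PAGE = "# Waterproof hiking boots\n\nPre-simplified core content for agents.\n"

class NegotiatingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        accept = self.headers.get("Accept", "")
        if "text/markdown" in accept:
            body, ctype = MARKDOWN_PAGE, "text/markdown; charset=utf-8"
        else:
            body, ctype = HTML_PAGE, "text/html; charset=utf-8"
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Vary", "Accept")  # caches must key on the Accept header
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), NegotiatingHandler).serve_forever()
```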
Both alternatives change the economics of rendering fidelity in the same way that structured feeds change discovery: they replace a lossy process with a clean one.
For non-Google bots, try this: disable JavaScript in your browser and look at your page, because what you see is what most AI agent bots see. You can fix the JavaScript issue with server-side rendering (SSR) or static site generation (SSG), so the initial HTML contains the complete semantic content regardless of whether the bot executes JavaScript.
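You can approximate the same check from a script: fetch the raw HTML without executing anything and look for a sentence from your core content. The URL and phrase below are placeholders:

```python
import urllib.request

# Fetch the initial HTML exactly as a non-rendering bot would: no JavaScript execution.
url = "https://www.example.com/product-comparison/"   # placeholder: use your own page
phrase = "a sentence from your core content"          # placeholder: pick real core content

req = urllib.request.Request(url, headers={"User-Agent": "rendering-fidelity-check"})
html = urllib.request.urlopen(req, timeout=10).read().decode("utf-8", errors="replace")

print("Core content present in raw HTML:", phrase.lower() in html.lower())
```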
But the real opportunity lies in new pathways: one architectural investment in WebMCP or Markdown for Agents, and every bot benefits regardless of its rendering capabilities.
Rendering produces a DOM. Indexing transforms that DOM into the system’s proprietary internal format and stores it. Two things happen here that the industry has collapsed into one word.
Rendering fidelity (Gate 3) measures whether the bot saw your content. Conversion fidelity (Gate 4) measures whether the system preserved it accurately when filing it away. Both losses are irreversible, but they fail differently and require different fixes.
The strip, chunk, convert, and store sequence
What follows is a mechanical model I’ve reconstructed from confirmed statements by Canel and Gary Illyes.
Strip: The system removes repeating elements: navigation, header, footer, and sidebar. Canel confirmed directly that these aren’t stored per page.
The system’s primary goal is to find the core content. This is why semantic HTML5 matters at a mechanical level. <nav>, <header>, <footer>, <aside>, <main>, and <article> tags tell the system where to cut. Without semantic markup, it has to guess.
Illyes confirmed at BrightonSEO in 2017 that finding core content at scale was one of the hardest problems they faced.
Chunk: The core content is broken into segments: text blocks, images with associated text, video, and audio. Illyes described the result as something like a folder with subfolders, each containing a typed chunk (he probably used the term “passage” — potato, potarto, tomato, tomarto). The page becomes a hierarchical structure of typed content blocks.
Convert: Each chunk is transformed into the system’s proprietary internal format. This is where semantic relationships between elements are most vulnerable to loss.
The internal format preserves what the conversion process recognizes, and everything else is discarded.
Store: The converted chunks are stored in a hierarchical structure.
The individual steps are confirmed. The specific sequence and the wrapper hierarchy model are my reconstruction of how those confirmed pieces fit together.
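To make the mechanics tangible, here’s a toy sketch of the strip and chunk steps, using BeautifulSoup and assuming the page uses semantic HTML5. The real systems’ internals are proprietary; this only shows why semantic tags give the process clean places to cut.

```python
# A toy sketch of the strip-and-chunk steps, assuming semantic HTML5 markup.
# Illustrative only - not a reconstruction of any search engine's actual code.
from bs4 import BeautifulSoup

def strip_and_chunk(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "html.parser")

    # Strip: remove the repeating elements that aren't stored per page.
    for tag in soup.find_all(["nav", "header", "footer", "aside"]):
        tag.decompose()

    # Semantic tags tell us where the core content lives.
    core = soup.find("main") or soup.find("article") or soup.body or soup

    # Chunk: break the core content into typed blocks.
    chunks = []
    for element in core.find_all(["h1", "h2", "h3", "p", "img", "table"]):
        if element.name == "img":
            chunks.append({"type": "image", "alt": element.get("alt", "")})
        elif element.name == "table":
            chunks.append({"type": "table", "text": element.get_text(" ", strip=True)})
        else:
            chunks.append({"type": "text", "tag": element.name,
                           "text": element.get_text(" ", strip=True)})
    return chunks
```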
In this model, the repeating elements stripped in the first step are not discarded but stored at the appropriate wrapper level: navigation at site level, category elements at category level. The system avoids redundancy by storing shared elements once at the highest applicable level.
Like my “Darwinism in search” piece from 2019, this is a well-informed, educated guess. And I’m confident it will prove to be substantively correct.
The wrapper hierarchy changes three things you already do:
URL structure and categorization: Because each page inherits context from its parent category wrapper, URL structure determines what topical context every child page receives during annotation (the first gate in the phase I’ll cover in the next article: ARGDW).
A page at /seo/technical/rendering/ inherits three layers of topical context before the annotation system reads a single word. A page at /blog/post-47/ inherits one generic layer. Flat URL structures and miscategorized pages create annotation problems that might appear to be content problems.
Breadcrumbs validate that the page’s position in the wrapper hierarchy matches the physical URL structure (i.e., match = confidence, mismatch = friction); a quick audit sketch follows this list. Breadcrumbs matter even when users ignore them because they’re a structural integrity signal for the wrapper hierarchy.
Meta descriptions: Google’s Martin Splitt suggested in a webinar with me that the meta description is compared to the system’s own LLM-generated summary of the page. If they match, a slight confidence boost. If they diverge, no penalty, but a missed validation opportunity.
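A simple way to audit the breadcrumb signal mentioned above is to check that each page’s breadcrumb trail mirrors its URL path. A minimal sketch, assuming schema.org BreadcrumbList data; the URL and names are placeholders.

```python
# A minimal audit: does the BreadcrumbList match the physical URL structure?
# Assumes breadcrumbs are declared as a schema.org BreadcrumbList (JSON-LD).
from urllib.parse import urlparse

def breadcrumbs_match_url(page_url: str, breadcrumb_items: list[dict]) -> bool:
    # URL path segments, e.g. /seo/technical/rendering/ -> ["seo", "technical", "rendering"]
    path_segments = [seg for seg in urlparse(page_url).path.split("/") if seg]
    # Breadcrumb trail minus the homepage item, e.g. Home > SEO > Technical > Rendering
    crumb_slugs = [item["name"].lower().replace(" ", "-") for item in breadcrumb_items[1:]]
    return crumb_slugs == path_segments

page = "https://example.com/seo/technical/rendering/"
breadcrumbs = [{"name": "Home"}, {"name": "SEO"}, {"name": "Technical"}, {"name": "Rendering"}]
print(breadcrumbs_match_url(page, breadcrumbs))  # True: match = confidence
```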
Where conversion fidelity fails
Conversion fidelity fails when the system can’t figure out which parts of your page are core content, when your structure doesn’t chunk cleanly, or when semantic relationships fail to survive format conversion.
The critical downstream consequence that I believe almost everyone is missing: indexing and annotation are separate processes.
A page can be indexed but poorly annotated (stored but semantically misclassified). I’ve watched it happen in our database: a page is indexed, it’s recruited by the algorithmic trinity, and yet the entity still gets misrepresented in AI responses because the annotation was wrong.
The page was there. The system read it. But it read a degraded version (rendering fidelity loss at Gate 3, conversion fidelity loss at Gate 4) and filed it in the wrong drawer (annotation failure at Gate 5).
Processing investment: Crawl budget was only the beginning
The industry built an entire sub-discipline around crawl budget. That’s important, but once you break the pipeline into its five DSCRI gates, you see that crawl budget is just one of several budgets: every gate consumes computational resources, and the system allocates those resources based on expected return. This is my generalization of a principle Canel confirmed at the crawl level.
| Gate | Budget type | What the system asks |
| --- | --- | --- |
| 1 (Selected) | Crawl budget | “Is this URL a candidate for fetching?” |
| 2 (Crawled) | Fetch budget | “Is this URL worth fetching?” |
| 3 (Rendered) | Render budget | “Is this page a candidate for rendering?” |
| 4 (Indexed) | Chunking/conversion budget | “Is this content worth carefully decomposing?” |
| 5 (Annotated) | Annotation budget | “Is this content worth classifying across all dimensions?” |
Each budget is governed by multiple factors:
Publisher entity authority (overall trust).
Topical authority (trust in the specific topic the content addresses).
Technical complexity.
The system’s own ROI calculation against everything else competing for the same resource.
The system isn’t just deciding whether to process but how much to invest. The bot may crawl you but render cheaply, render fully but chunk lazily, or chunk carefully but annotate shallowly (fewer dimensions). Degradation can occur at any gate, and the crawl budget is just one example of a general principle.
Structured data: The native language of the infrastructure gates
The SEO industry’s misconceptions about structured data run the full spectrum:
The magic bullet camp that treats schema as the only thing they need.
The sticky plaster camp that applies markup to broken pages, hoping it compensates for what the content fails to communicate.
The ignore-it-entirely camp that finds it too complicated or simply doesn’t believe it moves the needle.
None of those positions is quite right.
Structured data isn’t necessary. The system can — and does — classify content without it. But it’s helpful in the same way the meta description is: it confirms what the system already suspects, reduces ambiguity, and builds confidence.
The catch, also like the meta description, is that it only works if it’s consistent with the page. Schema that contradicts the content doesn’t just fail to help: it introduces a conflict the system has to resolve, and the resolution rarely favors the markup.
When the bot crawls your page, structured data requires no rendering, interpretation, or language model to extract meaning. It arrives in the format the system already speaks: explicit entity declarations, typed relationships, and canonical identifiers.
In my model, this makes structured data the lowest-friction input the system processes, and I believe it’s processed before unstructured content because it’s machine-readable by design. Semantic HTML tells the system which parts carry the primary semantic load, and semantic structure is what survives the strip-and-chunk process best because it maps directly to the internal representation.
Schema at indexing works the same way: instead of requiring the annotation system to infer entity associations and content types from unstructured text, schema declares them explicitly, like a meta description confirming what the page summary already suggested.
The system compares, finds consistency, and confidence rises. The entire pipeline is a confidence preservation exercise: pass each gate and carry as much confidence forward as possible. Schema is one of the cleaner tools for protecting that confidence through the infrastructure phase.
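To show what “explicit entity declarations, typed relationships, and canonical identifiers” look like in practice, here’s a small sketch that emits schema.org Product markup as JSON-LD. All values are placeholders; the point is that every field is declared rather than left for the system to infer.

```python
# Schema.org Product markup emitted as JSON-LD: explicit declarations, typed
# relationships, canonical identifiers. All values below are placeholders.
import json

product_jsonld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "TrailMaster Waterproof Hiking Boot",
    "gtin13": "0001234567890",                            # canonical identifier
    "brand": {"@type": "Brand", "name": "TrailMaster"},   # typed relationship
    "offers": {
        "@type": "Offer",
        "price": "179.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
}

script_tag = f'<script type="application/ld+json">{json.dumps(product_jsonld)}</script>'
print(script_tag)
```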
That said, Canel noted that Microsoft has reduced its reliance on schema. The reasons are worth understanding:
Schema is often poorly written.
It has attracted spam at a scale reminiscent of keyword stuffing 25 years ago.
Small language models are increasingly reliable at inferring what schema used to need to declare explicitly.
Schema’s value isn’t disappearing, but it’s shifting: the signal matters most where the system’s own inference is weakest, and least where the content is already clean, well-structured, and unambiguous.
Schema and HTML5 have been part of my work since 2015, and I’ve written extensively about them over the years. But I’ve always seen structured data as one tool among many for educating the algorithms, not the answer in itself. That distinction matters enormously.
Brand is the key, and for me it always has been.
Without brand, all the structured data in the world won’t save you. The system needs to know who you are before it can make sense of what you’re telling it about yourself.
Schema describes the entity; brand establishes that the entity is worth describing. Get that order wrong, and you’re decorating a house the system hasn’t decided to visit yet.
The practical reframe: structured data implementation belongs in the infrastructure audit, and it’s the format that makes feeds and agent data possible in the first place. But it’s a confirmation layer, not a foundation, and the system will trust its own reading over yours if the two diverge.
Why improve the infrastructure gates when you can skip them entirely?
The multiplicative nature of the pipeline means the same logic that makes your weakest gate your biggest problem also makes gate-skipping your biggest opportunity.
If every gate attenuates confidence, removing a gate entirely doesn’t just save you from one failure mode: it removes that gate’s attenuation from the equation permanently.
To make that concrete, here’s what the math looks like across seven approaches. The base case assumes 70% confidence at every gate, producing a 16.8% surviving signal across all five in DSCRI. Where an approach improves a gate, I’ve used 75% as the illustrative uplift.
These are invented numbers, not measurements. The point is the relative improvement, not the figures themselves.
| Approach | What changes | Entering ARGDW with |
| --- | --- | --- |
| Pull (crawl) | Nothing | 16.8% |
| Schema markup | I → 75% | 18.0% |
| WebMCP | R skipped | 24.0% |
| IndexNow | D skipped, S → 75% | 25.7% |
| IndexNow + WebMCP | D skipped, S → 75%, R skipped | 36.8% |
| Feed (Merchant Center, Product Feed) | D, S, C, R skipped | 70.0% |
| MCP (direct agent data) | D, S, C, R, I skipped | 100% |
The infrastructure phase is pre-competitive. The annotated, recruited, grounded, displayed, and won (ARGDW) gates are where your content competes against every alternative the system has indexed. Competition is multiplicative too, so what you carry into annotation is what gets multiplied.
A brand that navigated all five DSCRI gates with 70% enters the competitive phase with 16.8% confidence intact. A brand on a feed enters with 70%. A brand on MCP enters with 100%. The competitive phase hasn’t started yet, and the gap is already that wide.
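If you want to check the table’s figures, they fall straight out of the multiplication. Here’s the arithmetic as a short sketch using the same invented 70% and 75% figures; a skipped gate simply drops out of the product.

```python
# The arithmetic behind the table above, using the same invented figures:
# 70% confidence at each gate passed, 75% where an approach improves a gate,
# and a skipped gate drops out of the multiplication entirely.
gates = ["D", "S", "C", "R", "I"]  # Discovered, Selected, Crawled, Rendered, Indexed

def surviving_signal(skipped=(), improved=()):
    numerator, denominator = 100, 1  # start from 100%
    for gate in gates:
        if gate in skipped:
            continue  # no attenuation at all for a skipped gate
        numerator *= 75 if gate in improved else 70
        denominator *= 100
    return round(numerator / denominator, 1)

print(surviving_signal())                                    # 16.8  (pull/crawl)
print(surviving_signal(improved={"I"}))                      # 18.0  (schema markup)
print(surviving_signal(skipped={"R"}))                       # 24.0  (WebMCP)
print(surviving_signal(skipped={"D"}, improved={"S"}))       # 25.7  (IndexNow)
print(surviving_signal(skipped={"D", "R"}, improved={"S"}))  # 36.8  (IndexNow + WebMCP)
print(surviving_signal(skipped={"D", "S", "C", "R"}))        # 70.0  (feed)
print(surviving_signal(skipped={"D", "S", "C", "R", "I"}))   # 100.0 (MCP)
```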
There’s an asymmetry worth naming here. Getting through a DSCRI gate with a strong score is largely within your control: the thresholds are technical, the failure modes are known, and the fixes have playbooks.
Getting through an ARGDW gate with a strong score depends on how you compare to all the alternatives in the system. The playbooks are less well developed, some don’t exist at all (annotation, for example), and you can’t control the comparison directly — you can only influence it.
Which means the confidence you carry into annotation is the only part of the competitive phase you can fully engineer in advance.
Optimizing your crawl path with schema, WebMCP, IndexNow, or combinations of all three will move the needle, and the table above shows by how much. But a feed or MCP connection changes what game you’re playing.
Every content type benefits from skipping gates, but the benefit scales with the business stakes at the end of the pipeline, and nothing has more at stake than content where the end goal is a commercial transaction.
The MCP figure represents the best case for the DSCRI phase: direct data availability bypasses all five infrastructure gates. In practice, the number of gates skipped depends on what the MCP connection provides and how the specific platform processes it. The principle holds: every gate skipped is an exclusion risk avoided and potential attenuation removed before competition starts.
A product feed is only the first rung. Andrea Volpini walked me through the full capability ladder for agent readiness:
A feed gives the system inventory presence (it knows what exists).
A search tool gives the agent catalog operability (it can search and filter without visiting the website).
An action endpoint tips the model from assistive to agentic — the agent doesn’t just recommend the transaction, it closes it.
That distinction is what I built AI assistive agent optimization (AAO) around: engineering the conditions for an agent to act on your behalf, not just mention you.
Volpini’s ladder makes the mechanic concrete: each rung skips more gates, removes more exclusion risk, and eliminates more potential attenuation before competition starts. A brand with all three is playing a different game from a brand that’s still waiting for a bot to crawl its product pages.
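As a schematic only (the names and data shapes below are illustrative, not any platform’s actual API), the three rungs translate into the capabilities a brand exposes to an agent:

```python
# A schematic sketch of the three rungs of agent readiness. Illustrative only.

# Rung 1 - Feed: inventory presence. The system knows what exists.
product_feed = [
    {"id": "boot-tm-10", "title": "TrailMaster Waterproof Hiking Boot",
     "price": 179.00, "size": "10", "availability": "in_stock"},
]

# Rung 2 - Search tool: catalog operability. The agent can query and filter
# without visiting the website.
def search_catalog(query: str, max_price: float | None = None) -> list[dict]:
    results = [p for p in product_feed if query.lower() in p["title"].lower()]
    if max_price is not None:
        results = [p for p in results if p["price"] <= max_price]
    return results

# Rung 3 - Action endpoint: the agent doesn't just recommend the transaction,
# it closes it. A real integration would call the merchant's checkout system here.
def place_order(product_id: str, quantity: int, payment_token: str) -> dict:
    return {"status": "confirmed", "product_id": product_id, "quantity": quantity}
```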
Note: Always keep this in mind when optimizing your site and content — make your content friction-free for bots and tasty for algorithms.
DSCRI are absolute tests, ARGDW are competitive tests. The pivot is annotation.
Five gates. Five absolute tests. Pass or fail (and a degrading signal even on pass).
The solutions are well documented:
Discovery failures fixed with sitemaps and IndexNow (a minimal submission sketch follows this list).
Selection failures with pruning and entity signal clarity.
Crawling failures with server configuration.
Rendering failures with server-side rendering or the new pathways that bypass the problem entirely.
Indexing failures with semantic HTML, canonical management, and structured data.
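For the discovery fix, an IndexNow submission is a small script. This sketch follows the protocol documented at indexnow.org; the host, key, and URLs are placeholders.

```python
# A minimal IndexNow submission: tell participating engines about new or changed
# URLs instead of waiting for them to be discovered. Values are placeholders.
import requests

payload = {
    "host": "www.example.com",
    "key": "your-indexnow-key",
    "keyLocation": "https://www.example.com/your-indexnow-key.txt",
    "urlList": [
        "https://www.example.com/seo/technical/rendering/",
    ],
}

response = requests.post(
    "https://api.indexnow.org/indexnow",
    json=payload,
    headers={"Content-Type": "application/json; charset=utf-8"},
    timeout=10,
)
print(response.status_code)  # 200 or 202 means the submission was accepted
```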
The infrastructure phase is the only phase with a playbook, and opportunity cost is the cheapest failure pattern to fix.
But DSCRI is only half the pipeline, and it’s the easiest to deal with.
After indexing, the scoreboard turns on. The five ARGDW gates are competitive tests: your content doesn’t just need to pass, it needs to beat the competition. What your content carries into the kickoff of those gates is what survived DSCRI. And the entry gate to ARGDW is annotation.
The next piece opens annotation: the gate the industry has barely begun to address. It’s where the system attaches sticky notes to your indexed content across 24+ dimensions, and every algorithm in the ARGDW phase uses those notes to decide what your content means, who it’s for, and whether it deserves to be recruited, grounded, displayed, and recommended.
Those sticky notes are the be-all and end-all of your competitive position, and almost nobody knows they exist.
In “How the Bing Q&A / Featured Snippet Algorithm Works,” in a section I titled “Annotations are key,” I explained what Ali Alvi told me on my podcast: “Fabrice and his team do some really amazing work that we actually absolutely rely on.”
He went further: without Canel’s annotations, Bing couldn’t build the algos to generate Q&A at all. A senior Microsoft engineer, on the record, in plain language.
The evidence trail has been there for six years. That, for me, makes annotation the biggest untapped opportunity in search, assistive, and agential optimization right now.
This is the third piece in my AI authority series.