SerpApi is asking a federal court to dismiss Google’s lawsuit, arguing the company is misusing copyright law to restrict access to public search results.
The motion was filed Feb. 20, according to a blog post by SerpApi CEO and founder Julien Khaleghy.
Google sued SerpApi in December, alleging it bypassed technical protections to scrape and resell content from Google Search.
The details: SerpApi argues Google is improperly invoking the Digital Millennium Copyright Act (DMCA). According to Khaleghy:
The DMCA protects copyrighted works, not websites or ad businesses.
Google doesn’t own the underlying content displayed in search results.
Accessing publicly visible pages isn’t “circumvention” under the statute.
Google’s complaint alleged SerpApi:
Circumvented bot-detection and crawling controls.
Used rotating bot identities and large bot networks.
Scraped licensed content from Search features, including images and real-time data.
SerpApi said it doesn’t decrypt systems, disable authentication, or access private data. Khaleghy said SerpApi retrieves the same information available to any user in a browser, without requiring a login.
Khaleghy also argued Google admitted its anti-bot systems protect its advertising business — not specific copyrighted works — which he said undermines the DMCA claim.
SerpApi cites the Ninth Circuit’s hiQ v. LinkedIn decision warning against “information monopolies” over public data. It also cites the Sixth Circuit’s Lexmark v. Static Control ruling to argue that public-facing content can’t be shielded by technical measures alone.
Catch up quick: The lawsuit follows months of escalating legal fights over scraping and AI data use.
Oct. 22: Reddit sued SerpApi, Perplexity, Oxylabs, and AWMProxy in federal court, alleging they scraped Reddit content indirectly from Google Search and reused or resold it. Reddit claimed the companies hid their identities and scraped at “industrial scale.” Reddit said it set a “trap” post visible only to Google’s crawler that later appeared in Perplexity results. Reddit is seeking damages and a ban on further use of previously scraped data.
Dec. 19: Google sued SerpApi, alleging it bypassed security protections, ignored crawling directives, and scraped licensed Search content for resale. SerpApi responded that it operates lawfully and that accessing public search data is protected by the First Amendment.
By the numbers: SerpApi claims that, under Google’s interpretation of the DMCA, statutory damages could theoretically total $7.06 trillion — a figure it said exceeds U.S. GDP. The number reflects SerpApi’s calculation of potential per-violation penalties, not an actual damages demand.
What’s next: The case now moves to the court’s decision on whether Google’s claims can proceed.
Why we care: The outcome could reshape how SEO platforms, AI tools, and competitive intelligence software access SERP data. A win for Google could make third-party search data harder or riskier to obtain. A win for SerpApi could strengthen arguments that publicly accessible search results can be scraped and collected.
Wikipedia is a fascinating experiment. It’s a community-built encyclopedia that’s always in motion. It runs on volunteer energy and openly shared infrastructure, and it’s closer to an open-source project in how it’s built than a traditional encyclopedia book. Anyone can write, edit, and debate what belongs on a page.
And that’s the twist. The “truth” on Wikipedia isn’t handed down by a single editor or community member. It’s negotiated in public, guided by community standards, citations, and a whole lot of conversation. Contributors don’t so much control a subject’s story as they continually test it. They’re constantly asking questions: What can we verify? What deserves weight? What’s missing?
When you read a Wikipedia article, you’re seeing a current snapshot of a living, evolving community decision.
This whole experiment has scale, too. As of February 6, 2026, the English Wikipedia had 7.13 million articles, and the project spanned more than 340 languages.
If you’re thinking about creating a Wikipedia page for your company, it helps to know what you’re signing up for. Wikipedia isn’t a marketing channel, and it isn’t designed for companies to shape their narrative.
It’s designed to summarize what independent, reliable sources have already said about a company, so not every organization qualifies for a stand-alone article. Wikipedia cautions that only a small percentage of organizations meet the requirements for an article in the first place.
The easiest way to orient yourself with the platform is to keep Wikipedia’s “five pillars” top of mind. Wikipedia is, first and foremost, an encyclopedia. It aims for a neutral point of view, its content is free for anyone to use and edit, editors are expected to be civil, and it has no firm rules, only policies and guidelines applied with judgment and common sense.
If your company is genuinely notable by Wikipedia’s standards and you’re willing to play by its guidelines, there’s a real visibility upside in a solid, well-sourced page that holds up over time.
Key Takeaways
Wikipedia isn’t for marketing. If a Wikipedia page reads like company positioning, a feature brochure, or a pricing page, it’ll get rejected, reverted, or flagged. Even if other company pages “get away with it,” focus on a deeply researched, informative draft backed by sources that establish notability in Wikipedia’s eyes.
Notability = independent coverage. You need multiple strong secondary sources (real reporting with editorial standards). Press releases, paid placements, niche trade mentions, and contributor “interviews” don’t hold up.
Sources drive the outline (and the page). Build your outline from what your credible secondary sources already cover. Possible sections could include a lead, history, high-level operations, leadership, or controversies, if documented. Each company’s outline may look different depending on what information can be strongly sourced. If you can’t source a section cleanly, it doesn’t belong.
Use Wikipedia’s Articles for Creation (AfC) process to avoid conflict of interest (COI) roadblocks. If you’re connected to a company or paid to write a Wikipedia page for them, you must disclose it and lean on the AfC process instead of directly pushing a company page live.
Getting published isn’t the finish line. Volunteers continuously review pages. Expect ongoing edits, scrutiny, and occasional challenges, so monitor a live page and keep it updated with strong, independent citations.
What Are the Benefits of Creating a Wikipedia Page?
The most significant benefit of Wikipedia is its sheer size and reach. It is one of the most visited websites in the world, averaging more than 1.1 billion unique visitors per month.
In addition to the size of its audience, the platform offers other benefits to marketers and company owners:
Credibility via independent validation (earned, not claimed): A live Wikipedia page signals that reliable, third-party sources have covered your organization in a meaningful way. For journalists, partners, investors, and enterprise buyers, this can reduce skepticism during research.
Search and AI visibility (off-page, long-term): Wikipedia tends to surface prominently in search results and is commonly referenced by knowledge systems. A well-sourced page can support progress in how your company appears in search features, AI overviews (AIOs), and large language model (LLM) output, based on what independent sources say, not what a company wants to say.
A neutral orientation page for readers: Wikipedia’s format helps readers quickly understand a company’s basics, including history, products or services, leadership, milestones, and context. The tradeoff is accessible neutrality. Anything included needs support from reliable secondary sources, and promotional language rarely lasts.
Clarity and disambiguation: If your name overlaps with other companies, or your story includes mergers, rebrands, or multiple founders, Wikipedia can help people land on the right entity and timeline.
A durable reference hub: A good Wikipedia page often becomes a stable directory of the strongest independent sources about you, such as press, books, and other reputable coverage, so readers can verify details without relying on your website alone.
Consistency across the web (a quiet multiplier): Wikipedia and related knowledge sources are reused in many downstream places. When the facts are clean, cited, and consistent, it can improve how your company is represented across third-party profiles and information panels over time.
A Wikipedia page is rarely a conversion engine, and it isn’t a place to “own” your story. The value is credibility and discoverability that can compound, but benefits can vary based on the strength of independent coverage and ongoing community scrutiny.
Below, we’ll cover the 10 steps on how to create a Wikipedia page, as well as considerations to keep in mind.
1. Check to See If Your Company Is a Good Fit for a Wikipedia Page
Before you think about how to create a Wikipedia page for your company, you need to answer one question:
Would Wikipedia editors consider your company “notable”?
On Wikipedia, “notability” has nothing to do with how compelling your company story is. It means there’s enough independent, reliable coverage about your company that an article can be written from what third parties have already published, without filling in gaps with interpretation, insider knowledge, or marketing claims.
This is also where a lot of brand teams get tripped up. Again, Wikipedia isn’t a marketing channel. It’s not a place to shape messaging or control a narrative. If the only story you can tell is the one you want to tell, the page will be declined during initial submission review or deleted later.
What Notability Actually Looks Like
A company is usually considered notable when it receives significant coverage in multiple reliable sources independent of the company. “Significant coverage” is the key phrase here. Editors are looking for articles that discuss your company in real depth, not quick mentions or short blurbs.
A helpful way to think about it is this: if you can’t outline a neutral article using independent secondary sources alone, you probably don’t have enough notability yet.
Editors typically want coverage that checks these boxes:
Independent: Truly third-party reporting. Not press releases, paid placements, sponsored posts, advertorials, partner blogs, or content your PR team arranged. If a piece exists because the company made it happen, editors tend to discount it.
Significant: More than a passing mention. A funding announcement, product launch blurb, or event listing can be real coverage and still not be enough. The strongest sources are the ones that explain context, impact, history, or controversy in detail.
Secondary: Sources that analyze, summarize, or report on the company from the outside. Primary sources like your website, blog, press page, or social channels can support basic facts in limited cases, but they do not establish notability.
Reliable: Publications with editorial oversight and a reputation for accuracy. Big-name outlets can help, but they are not the only option. Trade and industry publications can be excellent sources when they have real editorial standards and provide in-depth coverage, but you can rarely use them to establish notability.
Multiple and sustained: A single great source is rarely enough on its own. Editors want to see more than one strong source, ideally across time, so the page can hold up after more people review it.
Neutral tone: Even when a source is independent, it can still be weak if it reads like promotion. Glowing profiles, “thought leadership” posts, or contributor content that feels like marketing often carry less weight than staff-reported coverage.
One nuance that matters a lot in practice is that “lots of links” does not equal notability. Companies can appear all over the internet through routine announcements and PR-driven writeups and still fail Wikipedia’s notability test.
What matters is whether independent sources have treated the company as worthy of real, substantive coverage. This is also why trade magazines and industry publications rarely count toward establishing notability. Many industry leaders also run trade organizations, which creates a conflict of interest (COI, in Wikipedia’s terms) when a trade publication covers its own backers’ companies or those of friends and contributors.
If your company does not meet this bar yet, that’s not a judgment on it. It just means a Wikipedia article is likely premature, and the better move is to wait until there is enough independent coverage to support a neutral, well-sourced page.
A Note on Conflict of Interest (COI)
If you’re writing about your own company (or you’re paid to write for a company), Wikipedia considers that a conflict of interest (COI). That doesn’t automatically ban you from participating, but it does change how you should approach it.
When creating a new page, submit it to Articles for Creation (AfC) to ensure community editors review it properly.
When editing an existing page, draft your changes in a Sandbox (a personal workspace where you can safely draft and refine changes to an article before submitting them for public review). Then, post a link to that Sandbox draft on the live Wikipedia page’s Talk page, along with a comment asking community members to review and collaborate on the edits you’ve suggested. Once a community consensus is reached, the edits or additions can go live.
It’s also a good idea to disclose your COI connection. Your disclosure should be one of the following:
A statement on your User page.
A statement on the Talk page accompanying any paid contributions.
A statement in the edit summary accompanying any paid contributions.
Avoid directly creating or heavily editing an article and stick to Wikipedia’s COI process to request edits for independent editors to review.
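If you’re disclosing a paid relationship, Wikipedia provides a dedicated template for it. Here’s a minimal sketch, assuming the English Wikipedia’s {{paid}} disclosure template and its employer/client parameters (check the template’s current documentation before using it; the company name is a placeholder):

```wikitext
{{paid|employer=Example Corp|client=Example Corp}}
```

Placed on your User page, this publicly discloses who is paying for your contributions, which is the point of Wikipedia’s paid-contribution disclosure rules.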
Again, this is about expectations. If your team is hoping to just write a draft and hit “publish,” like you do with a blog, you’re going to have a bad time. But if you do have strong, independent coverage from credible outlets, you’ve got a real shot and can move to the next step.
2. Create a Wikipedia Account
Creating an account is a practical next step if you plan to contribute to Wikipedia. While you don’t need an account to read Wikipedia (or even to edit some pages), registering gives you features that make collaboration and transparency easier.
With an account, you can:
Create a User page (a simple profile and a place to draft in a Sandbox).
Use your Talk page to communicate with other editors.
Build an edit history tied to your username (helpful for credibility and continuity).
Work through article creation more smoothly, including drafting and submitting via AfC.
If you add images to your User page, make sure they’re properly licensed. Wikipedia generally accepts only freely licensed uploads.
After that, you’re set up to start editing, drafting, and participating in the community.
3. Contribute to Existing Pages
Quick reminder from earlier: If you’re connected to the company, you’re dealing with a COI. That’s why Wikipedia prefers that company pages undergo independent review before publication.
As a newbie, a good way to get comfortable on Wikipedia is to start by editing existing articles that have nothing to do with your organization. When you spend time improving clarity, tightening wording, and backing up facts with solid sources, you learn how Wikipedia works, and you build a history of helpful contributions.
As you do that, your account may become autoconfirmed. That usually happens automatically once your account is at least four days old and has made at least 10 edits. Autoconfirmed status grants a few basic permissions, such as creating pages and editing some semi-protected articles.
Here’s the key point, though: “Autoconfirmed” does not change your COI situation. Even if you can technically publish a page directly, a company-related article should still be written as a draft and submitted through AfC. This is the step that gets you the independent review Wikipedia expects, and it’s the safest, most appropriate route for a company page.
4. Conduct Research and Gather Sources
Before you write a single line of your Wikipedia draft, do the homework. Wikipedia doesn’t reward unsourced storytelling. The platform cares about verifiability, meaning every meaningful claim must be backed by a reliable secondary source that an editor can check. Your company story can play well on Wikipedia, as long as there’s enough reliable evidence to back it up.
This is where most company pages fall apart. Not because the company isn’t real, but because the sources are thin, biased, or too “inside baseball.”
Why sources matter so much on Wikipedia
Wikipedia runs on two big rules:
No original research: You can’t introduce facts that no published source has reported, even if they’re true. Which leads to the next point…
Cite everything that matters: If it’s notable, controversial, or specific (revenue, awards, history, key dates, acquisitions), you need a secondary source to back it up.
Primary vs. secondary vs. tertiary sources (and how Wikipedia treats them)
Wikipedia breaks sources down into three categories: primary, secondary, and tertiary. Here is a look at each and how they play into the strength of your Wiki page:
Primary sources (you): Your website, press releases, investor decks, published reports, filings (e.g., with the Securities and Exchange Commission (SEC)).
Upside: Can work for basic, factual details (launch dates, historical milestones, etc.).
Downside: Biased by default. Editors won’t accept these for “notability” or big claims like “industry leader.”
Secondary sources (independent coverage): News reporting, books, and analyst or industry coverage that discusses your company from the outside.
Upside: The backbone of a Wikipedia article; these establish notability and support most meaningful claims.
Downside: You can’t control what they say, and thin or PR-driven coverage won’t count.
Tertiary sources (summaries of summaries): Encyclopedias, databases, and directories that compile primary and secondary material.
Upside: Useful for quick confirmation and context.
Downside: Often too shallow to prove notability on their own.
Overall, secondary sources are the most important to your success. By their nature, these sources are pivotal in helping you summarize what experts think about a company or topic in Wikipedia’s voice. Relying heavily on these gives you a really strong case for notability in Wikipedia’s eyes.
What Makes a Good Wikipedia Source?
Good Wikipedia sources cover topics while maintaining editorial standards. Think major publications, local newspapers of record, respected business outlets, and independent industry analysis. If you’re short on that kind of coverage, that’s usually a PR problem, not a Wikipedia problem. Strengthening your digital PR (DPR) efforts can help you earn credible mentions that hold up under editor scrutiny.
But DPR for a Wikipedia use case must be handled carefully. What tends to work is focusing on independent coverage first. This looks like pitching credible story angles to journalists and outlets that genuinely cover your industry, and accepting that they may say no, or cover the story in a way you can’t control.
When an outlet does publish real, editorial reporting, that’s the kind of secondary source Wikipedia editors are more likely to accept.
Reliable Sources at a Glance
After seeing what Wiki editors consider reliable sources, you might be wondering where you even find sources that hit all their criteria. It helps to look at real-world use cases of which sources are best for your company. Here are some of the types of sites you can choose from.
For company pages, the sources that matter most are the ones that provide significant, independent coverage; the kind that demonstrates notability and gives editors something substantial to cite.
Major national/international newsrooms (strongest for notability + facts): Reuters, AP, BBC, Financial Times, The Wall Street Journal, Bloomberg, The New York Times, The Washington Post, NPR (news reporting over opinion).
Reputable business and investigative reporting: Deep dives and investigations from established outlets (e.g., ProPublica) can be highly valuable, especially for controversies, legal issues, and accountability reporting.
High-quality trade press with editorial oversight (context-dependent): Useful for industry coverage when it’s independent and more than a product announcement or reposted PR. You cannot use trade press as a primary indicator of notability, though.
Books from reputable publishers: Especially helpful for founders, company history, and industry impact when written by independent authors and published by established presses.
Government and major non-governmental organization (NGO) reports (within remit): Strong for regulatory actions, enforcement, public contracts, or formal assessments (but not a substitute for independent secondary coverage).
Medical/health claims (only when relevant): For biomedical statements, prioritize high-quality secondary sources like systematic reviews and authoritative guidelines (MEDRS standard), not individual studies or marketing claims.
Check out Wikipedia’s Perennial Sources list to see how the community rates frequently discussed sources; it records which outlets have a strong track record of fact-checking and editorial standards, and which don’t. But remember, the ratings are still contextual; it’s not a whitelist.
Non-reliable Sources
To paint a clearer picture, here are some of the sources you should avoid:
Self-published/user-generated content (UGC): Personal blogs, Substack/Medium posts, self-hosted sites, most social media.
Press releases/advertorial: Company press rooms, PR wires; these are fine to state that an announcement occurred, not to establish third-party facts or notability.
Sensational/tabloid sources: Outlets known for gossip/sensationalism; poor for verifying facts.
Anonymous forums and crowdsourced threads: Message boards, comment sections, most Reddit/4chan/Discord posts.
Wikipedia views these types of sources as weaker because they aren’t research-backed, trustworthy, or credible. The common thread is that they undergo minimal editorial oversight (if any) or, in Reddit’s case, most of the content is UGC and self-published.
5. Research Your Competition
Like many things when it comes to Wikipedia, researching your competitors is fine if you do it the right way. As you start your research, view your competitors’ pages through the lens of what Wikipedia editors ultimately want.
The challenge here is that Wikipedia isn’t perfectly consistent. Some company pages are old, lightly monitored, or haven’t been updated to match today’s standards.
When someone says, “But other pages include feature lists and product tier breakdowns,” that doesn’t really matter. Editors don’t treat “other pages do it” as a justification. They judge your page on whether it reads like an encyclopedia entry and whether it’s backed by independent, reliable sources.
General Competitor Research Rules
Use competing Wiki pages to answer questions like:
What’s the typical structure for a company page in your category? Take note of the typical section titles. (We’ll dive into this next.)
What kind of claims survive without getting reverted? (Neutral, sourced, non-promotional.)
What sources are doing the heavy lifting on pages that stay live?
A “Wiki-safe” Research Method
Pick 3–5 competitors with live pages, then audit them like an editor would:
Scan the citations first. Are they mostly independent, secondary news coverage, press releases/company sites, or paid placements?
Check the tone. If it reads like a promotional brochure (feature-by-feature, pricing tiers, “best-in-class”), that’s a red flag, even if it hasn’t been removed yet.
Look at the page history and Talk page. Lots of reverts, banners, or sourcing disputes usually mean the page is shaky.
Note what’s missing. If competitors avoid detailed feature lists, that’s usually a sign that those details don’t belong on Wikipedia.
6. Create an Outline
Once you’ve got your sources, your outline has a starting point. The hard part is deciding what belongs.
On Wikipedia, an outline is not “everything you want to say.” It’s you making careful decisions about what independent, reliable sources have actually covered, what they have not covered, and what deserves space without turning the page into a brochure. That takes judgment, and it often takes multiple passes.
The mindset you want is simple: Wikipedia pages are built around what reliable secondary sources already said about the subject. Your outline is how you organize those sourced facts into a structure that editors recognize and are willing to review.
Infobox (quick facts): Founded, founders, headquarters, industry, key people, website, and similar basics. Only include items you can verify.
Lead (opening summary): 2–4 neutral sentences explaining what the company is, where it’s based, what it does at a high level, and why it’s notable. This is not a tagline.
History: Founding and major milestones, such as expansions, acquisitions, funding or an IPO, and major pivots, but only if independent sources cover them. Focus on events that third parties actually reported.
Operations/Business (optional, and only if sourced): What the company does at a high level and what markets it serves. Avoid feature-by-feature descriptions and pricing tiers.
Leadership/Ownership (optional): Only if reliable sources discuss executives, ownership changes, or governance in a meaningful way.
Reception/Controversies (only if they exist in sources): Reviews, notable criticism, legal issues, regulatory actions, all written neutrally and backed by sources.
See also / References / External links: References do the heavy lifting; external links are usually minimal (often just the official site).
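To make the outline concrete, here is a minimal sketch of what the infobox portion might look like in wikitext. The parameter names follow Wikipedia’s {{Infobox company}} template (verify them against the template’s current documentation), and all company details are invented placeholders:

```wikitext
{{Infobox company
| name                = Example Corp
| industry            = Software
| founded             = 2015
| founder             = Jane Smith
| hq_location_city    = Seattle
| hq_location_country = United States
| key_people          = Jane Smith (CEO)
| website             = {{URL|example.com}}
}}
```

As with everything else on the page, every value in the infobox should be verifiable; leave out any field you can’t back with a source.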
Using Your Sources to Build the Outline
Start with your strongest independent secondary sources and work outward. As you read through them, you’re identifying what the coverage actually emphasizes.
As you review sources, pull out:
Events they cover (those become history sections)
Claims they support (those become lead and operations sections)
Any recurring themes across sources (those become section headings)
Each major section in your outline should be supported by multiple secondary sources, not a single mention. Also, keep an eye on the length as you draft. Wikipedia discourages overly long articles unless the amount of independent coverage truly warrants it. If a section or topic isn’t discussed in depth by reliable secondary sources, it usually doesn’t belong at length in the article.
If you focus on covering the topic from an encyclopedic angle and you leave out anything that feels like marketing, you will give your draft a much better chance of surviving review.
7. Write a Draft of Your Wikipedia Page
Take your time as you write a draft of your Wikipedia page from your outline. You want your content to be source-backed, thorough, thoughtful, and genuinely useful, giving readers the information they came for.
At this stage, it’s best to write your draft in a Wikipedia Sandbox. As mentioned earlier, this is a personal workspace where you can draft safely, revise freely, and share the link with others for informal feedback without accidentally publishing anything live.
While a Wikipedia page can support your broader visibility, the platform’s purpose is encyclopedic and impartial. Anything that reads as emotional, salesy, or promotional is likely to be flagged and can lead to rejection later in the process.
Aim for short, direct sentences that stick to verifiable facts. And those facts need strong secondary sources. For example, if you write, “Spot ran to the big oak tree yesterday,” that claim would need a source. Not just any source, but a credible, independent secondary source that Wikipedia considers reliable.
It’s also critical to remember you’re writing on behalf of Wikipedia. In other words, you’re writing in Wikipedia’s unbiased, impartial, and neutral voice.
Here are some examples to show what this looks like in practice:
Example 1: Product Description
Promotional: “XYZ Software is a revolutionary, industry-leading platform that empowers businesses to achieve unprecedented productivity gains. With its cutting-edge AI technology and intuitive interface, XYZ transforms the way teams collaborate, delivering exceptional results that exceed expectations.”
Neutral: “XYZ Software is a project management platform that combines task tracking, team messaging, and file sharing. The software is used by businesses to coordinate work across departments.[1][2]”
Example 2: Company History
Promotional: “Founded by visionary entrepreneur Jane Smith, the company quickly rose to prominence as a game-changer in the industry. Through relentless innovation and unwavering commitment to excellence, it has become the trusted choice for Fortune 500 companies worldwide.”
Neutral: “The company was founded in 2015 by Jane Smith in Seattle.[3] It launched its enterprise tier in 2019 and rebranded from ‘TaskFlow’ to its current name in 2021.[4][5]”
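For reference, bracketed citation markers like [3] are produced by &lt;ref&gt; tags in wikitext, usually wrapping a citation template such as {{cite news}}. A sketch of one sourced sentence, with invented article details:

```wikitext
The company was founded in 2015 by Jane Smith in Seattle.<ref>{{cite news |title=Example Corp opens Seattle headquarters |work=The Seattle Times |date=June 1, 2015 |url=https://example.com/article}}</ref>
```

Wikipedia renders each ref as a numbered footnote and collects the full citations in the References section.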
Wikipedia also defines “promotional” language more broadly than you might expect. It’s more than simply using words like “revolutionary” or “legendary.” Factually correct statements can still be considered “promotional” in a Wikipedia editor’s eyes if they match certain patterns of structure and emphasis:
Long, comprehensive feature inventories.
Plan/tier breakdowns that resemble packaging (“Free vs. Premium vs. Enterprise”).
Performance claims that read like sales positioning.
Details that feel like purchase guidance (pricing, quotas, storage limits, admin entitlements).
Let’s talk about specs and features for a second. If your company is well-known for a particular product or service, it can be tempting to include a specification or feature list on your Wikipedia page. Unfortunately, that can cause problems with Wikipedia for several reasons.
Here’s why:
Wikipedia isn’t a manual or catalog: Wikipedia tries to avoid becoming vendor documentation. Specs and feature matrices belong on the company site, in the documentation center, in release notes, or on third-party comparison sites, not in an encyclopedia.
Specs change constantly: Feature sets, tiers, storage limits, and admin/security capabilities change frequently. Wikipedia content must remain stable and verifiable over time. Highly granular spec content becomes outdated quickly and attracts disputes.
It’s hard to verify neutrally: If the only source for a feature or tier is the vendor’s own site or press release, Wikipedia considers that primary sourcing; useful for limited factual verification, but not ideal for describing capabilities in detail or making value claims.
“Undue weight” and imbalance: Even accurate feature lists can give a product more prominence than independent sources do. Wikipedia tries to reflect external coverage: if reliable third parties don’t treat a feature as notable, Wikipedia typically won’t either.
What a Company’s Wikipedia Draft Should Look Like
As with sourcing, it can be hard to picture what an acceptable draft looks like, given all of Wikipedia’s guidelines. Here’s a brief rundown of what a solid draft should include when you’re done:
A clear, high-level description of what a company is (one paragraph, not a feature catalog).
A history/timeline of major milestones (launches, renames, major releases) backed by independent sources.
Widely covered integrations/partnerships only when reported by reliable third parties.
A short, selective “features” summary only for capabilities that independent sources treat as notable and cover in-depth.
8. Upload Your Page into the Article Wizard
Once your Sandbox draft is in good shape, move over to the Wikipedia Article Wizard. The Wizard is the guided tool that helps you move what you wrote from your Sandbox into Wikipedia’s Draft space, which is where new articles are typically prepared before they go live.
For company-related pages, the key takeaway is that the Wizard is the structured path to getting your draft into the right place so it can be submitted for independent review.
9. Submit Your Article for Review
Now that your draft is in Draft space, you’re ready for the step that triggers formal evaluation by the community. Submit your draft through Articles for Creation by clicking “Submit for review.” This is when your draft enters the AfC queue, and a volunteer reviewer takes a look.
The timeline can range from a few weeks to a few months, depending on backlog and whether the reviewer requests changes. It’s also common for drafts to be declined at first, with feedback you’ll need to address before approval.
At NPD, we’ve found that sticking with AfC is the best practice for companies looking to go live. Even though autoconfirmed accounts may have the technical ability to publish directly, that path often creates more friction for company-related topics. AfC sets expectations for independent review from the start and helps reduce avoidable issues related to COI and other Wikipedia guidelines.
10. Continue Making Improvements
Once your page is accepted, the work is not really over.
Wikipedia is editable by anyone, so changes can happen at any time. Some edits will be helpful, some will be mistaken, and some may reflect a negative point of view. The best approach is to keep an eye on the page so you can understand what is changing and respond appropriately, usually by suggesting improvements on the Talk page or updating the article with strong, independent sourcing.
As the page gets more visibility and gains traction on Google and LLMs, focus on accuracy and neutrality rather than “updating marketing messaging.” Wikipedia is not the place for routine product updates, but it is the right place to reflect significant, well-covered developments when reliable third-party sources have written about them.
You should also plan for the possibility that your draft will be declined. That is common, especially for company-related topics. If it happens, do not get discouraged. Read the reviewer’s comments carefully, make the requested changes, and resubmit when you have addressed the specific issues that kept the draft from being accepted.
FAQs
Should I build a Wikipedia page for my company?
A Wikipedia page can be a meaningful credibility asset, but it isn’t a fit for every company. The deciding factor is whether there’s enough independent, reliable secondary coverage to support a neutral article. If you can’t outline the page using third-party sources alone, it’s usually too early.
If your company does qualify, the value tends to be indirect: stronger brand legitimacy, clearer “who you are” context in search results, and more consistent entity information across the web. It’s less about immediate conversions and more about long-term visibility and trust signals that can compound.
Is creating a Wikipedia page for my company difficult?
Yes. Creating, publishing, and maintaining a company page is challenging because Wikipedia is community-reviewed and built around strict expectations: neutral tone, verifiable claims, and high-quality sourcing. You also have to plan for ongoing edits and scrutiny after the page goes live.
The opportunity is achievable if you have strong independent coverage and treat the process as encyclopedic documentation rather than company messaging.
How do I know if my Wikipedia page will be published?
There’s no guaranteed way to know. Even well-prepared drafts can be declined, revised, and resubmitted, especially for company topics.
Your best indicators are practical: you have multiple independent sources with significant coverage, your draft reads neutrally (not like marketing), and you submit through the Articles for Creation (AfC) process so reviewers can evaluate it in draft space.
How long will my Wikipedia article be under review before publication?
Review time varies widely. Some drafts are reviewed quickly, but it’s also common for company-related submissions to take weeks (or longer) depending on backlog and how many revisions are needed. A decline doesn’t mean “never”; it usually means “not yet” or “needs stronger sourcing and a more neutral rewrite.”
Conclusion
If you’re looking to increase traffic, improve your search everywhere visibility, or build credibility, Wikipedia can be part of the equation. But it’s not a marketing channel, and it isn’t built for companies to shape their narratives. It’s a community-edited encyclopedia that summarizes what independent, reliable sources have already said about you.
Where Wikipedia can help is in discovery and trust signals. A stable, well-sourced page often shows up prominently for company and topic queries, and it can reinforce consistent “entity facts” that search engines and other knowledge systems use to understand companies.
That’s also why Wikipedia often pairs well with entity SEO. When key details about your organization are documented consistently across reputable sources, your company is easier to interpret and surface accurately across platforms, including some LLM-style experiences. Results may vary based on implementation, the strength of independent coverage, and ongoing community review.
As you evaluate whether your company is a good fit for a Wikipedia page, keep in mind that the process is complicated, and it won’t be fully in your control. What matters most is having enough independent, reliable secondary coverage to justify a stand-alone article and being willing to follow Wikipedia’s COI expectations.
Search has changed, and so should your audience personas.
Your audience searches across Google, ChatGPT, Reddit, YouTube, and many other channels.
Knowing who they are isn’t enough anymore. You need to know how they search.
Search-focused audience personas fill gaps that traditional personas miss.
Think insights like:
Where this person actually goes for answers
What triggers them to look for solutions right now
Which proof points win their trust
And you don’t need months of research or expensive tools to build them.
An audience persona is a profile of who you’re creating for — what they need, how they search, and what makes them trust (or tune out). Done well, it aligns your team around a shared understanding of who you’re serving.
In this guide, I’ll walk you through nine strategic questions that dig deep into your persona’s search behavior. I’ve also included AI prompts to speed up your analysis.
They’ll help you spot patterns and synthesize findings without the manual work.
By the end, you’ll have a complete audience persona to guide your content strategy.
Free template: Download our audience persona template to document your insights. It includes a persona example for a fictional SaaS brand to guide you through the process.
1. Where Is Your Audience Asking Questions?
Answer this question to find out:
Where you need to build authority and presence
Which platforms to target for every persona
Which formats work well for each persona
Knowing where your persona hangs out tells you which channels influence their decisions.
So, you can show up in places they already trust.
It also reveals how they think and what will resonate with them.
For example, someone posting on Reddit wants honest advice based on lived experiences. But someone searching on TikTok wants visual content like tutorials or unboxing videos.
How to Answer This Question
Start with an audience intelligence tool that lets you identify your persona’s preferred platforms and communities.
I’ll be using SparkToro.
Note: Throughout this guide, I’ll walk you through this persona-building process using the example of Podlinko, a fictional podcasting software. You’ll see every step of the research in action, so you can replicate it for your own business.
For this example, we’re building out one of Podlinko’s core personas: Marcus, a marketing professional on a one-person or small team, so he’s scrappy and in the weeds.
Pro tip: Start with one primary persona and build it completely before adding others. Focus on your most valuable customer segment (the one driving the highest revenue for your business).
In SparkToro, enter a relevant keyword that describes your persona’s professional identity or core interests.
This could be their job title, industry, or a topic they care deeply about.
I went with “how to start a podcast.” Marcus would likely search for this early in his journey.
The report gives a pretty solid overview of Marcus’s online behavior.
For example, Google, ChatGPT, YouTube, and Facebook are his primary research channels.
But it could be worth testing a few other platforms too.
Compared to the average user, he’s 24.66% more likely to use X and 12.92% more likely to use TikTok.
The report also tells me the specific YouTube channels where he spends time.
He’s watching automation, editing, and business tutorials.
He’s also active in multiple industry-related Reddit communities.
Maybe he’s posting, commenting, or even just lurking to read advice.
Since Marcus uses ChatGPT, I also did a quick search on this platform to see which sources the platform frequently cites.
I searched for some prompts he might ask, like “Which podcast hosting platforms should I use for marketing?”
If you see large language models (LLMs) repeatedly mention the same sources, they likely carry authority for the topic.
And by extension, they influence your persona’s research as well.
Compare these sources to the ones you identified earlier. If they match, you have validation.
If they’re different, assess which ones to add to your persona document.
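That comparison is essentially a set operation. A minimal sketch in Python, using entirely made-up source lists for illustration:

```python
# Hypothetical sources found during earlier platform research.
earlier_sources = {"podnews.net", "buzzsprout.com", "reddit.com/r/podcasting"}

# Hypothetical sources an LLM repeatedly cited for the same prompts.
llm_cited_sources = {"buzzsprout.com", "riverside.fm", "reddit.com/r/podcasting"}

# Overlap validates your earlier research; the remainder is worth assessing.
validated = earlier_sources & llm_cited_sources
candidates = llm_cited_sources - earlier_sources

print(sorted(validated))   # sources confirmed by both methods
print(sorted(candidates))  # new sources to consider adding to the persona doc
```

Anything in both lists is validated; anything cited only by the LLM is a candidate for the persona document.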
Here’s how I filled out the persona template with Marcus’s search behavior:
2. What Exact Questions Are They Asking?
Answer this question to find out:
What language to mirror in your content
How to structure content for AI visibility
What content gaps exist in your market
Your buyer persona’s language rarely matches marketing jargon.
Companies might talk about “podcast production tools” and “integrated workflows.”
But personas use more personal and specific language:
What’s the cheapest way to record remote podcasts?
How long does it take to edit a 30-minute podcast?
Knowing your audience’s actual questions reveals the gap between how you describe your solution and how they experience the problem.
And shows you exactly how to bridge it.
How to Answer This Question
Start by going to the platforms and communities you identified in Question 1.
Search 3-5 topics related to your persona.
Review the context around headlines, posts, and comments:
How they phrase questions (exact words matter)
What emotions they express
What outcomes they’re trying to achieve
Pro tip: As you research, save persona comments, discussions, and reviews in full — not just snippets. You’ll analyze the same sources in Questions 3-5. But through different lenses (challenges, triggers, language patterns). Having everything saved means you won’t need to revisit platforms multiple times.
For example, I searched “how to start a podcast for a business” on Google.
Then, I checked People Also Ask for related questions Marcus might have:
On YouTube, I searched “how to edit a podcast” and reviewed video comments.
Users asked follow-up questions about mic issues and screen sharing.
This gave me insight into language and questions beyond the video’s main topic.
In Facebook Groups, I found users asking questions related to their goals, constraints, and challenges.
These posts also revealed the unfiltered language Marcus uses when he’s stuck.
Now, use a keyword research tool to visualize how your persona’s questions connect throughout their journey.
I used AlsoAsked for this task. But AnswerThePublic and Semrush’s Topic Research tool would also work.
For Marcus, I searched “Best AI podcasting editing software,” which revealed this path:
Which AI tool is best for audio editing? → Can I use AI to edit audio? → Which software do professionals use for audio editing? → How much does AI audio editor cost?
It’s helpful to visualize how Marcus’s questions change as he progresses through his search.
Next, learn the questions your persona asks in AI search.
Semrush’s AI Visibility Toolkit tells you the exact prompts people use when searching topics related to your brand.
(And if your brand appears in the answers.)
If you don’t have a subscription, sign up for a free trial of Semrush One, which includes the AI Visibility Toolkit and Semrush Pro.
Since Podlinko is fictional, I used a real podcasting platform (Zencastr.com) for this example.
This brand appears often in AI answers for user questions like:
What equipment do I need to create a professional podcast setup?
Can you recommend popular tools for managing and promoting online radio or podcasts?
You’ll also see citation gaps — questions where your brand isn’t mentioned. These reveal content opportunities.
For this brand, one gap includes:
“Which AI tools are best for recording, editing, and distributing an AI-focused podcast?”
After reviewing all the questions I gathered, I narrowed them down to the top 5 for the template:
3. What Challenges Influence Their Search Behavior?
Answer this question to find out:
What constraints influence their decision-making process
How to anticipate objections before they arise
What kind of solutions your persona needs
Challenges are the ongoing issues driving your persona’s search behavior. These overarching problems shape their decisions to find a solution.
Understanding these challenges can help you:
Position your solution in the context of these pain points
Anticipate and address objections before they come up
Structure your campaigns to speak directly to their limitations
How to Answer This Question
Review the questions you collected in Question 2 to identify underlying pain points.
For example, this Facebook Group post contains some telling language for Marcus’s persona:
Specific phrases highlight ongoing challenges:
“Tech support is no help”
“Can’t find an editing software that consistently works”
Now, visit industry-specific review platforms.
Check G2, Capterra, Trustpilot, Amazon, Yelp, or another site, depending on your niche.
Look for reviews where people describe recurring frustrations.
Positive reviews may mention what drove a user to seek a new solution. For example, this one references poor audio and video quality:
Negative reviews reveal what users constantly struggle with.
Unresolved pain points often push people to find workarounds or alternatives.
This user noted issues with a podcasting tool, including loss of backups, unreliable tech, and more.
Pay close attention to the language people use. Word choice can signal underlying feelings and constraints.
When someone asks for the “easiest” and “most cost-effective” solution, they’re signaling:
Limited resources
Low confidence
Risk aversion
After reviewing conversations and communities, you’ll likely have dozens of data points.
Copy the reviews, questions, and phrases into an AI tool to identify your persona’s top challenges.
Use this prompt:
Based on these reviews and discussions, identify the five biggest challenges for this persona.
For each challenge, show:
(1) exact phrases they use to describe it
(2) what constraints make it harder (budget, time, skills)
(3) how it influences where and when they search.
Format as a table.
This analysis helped me identify Marcus’s recurring challenges:
4. What Triggers Them to Search Right Now?
Answer this question to find out:
What emotional and situational context to address in your content
How to structure content for different urgency levels
Which pain points to lead with
Search triggers explain why your audience is ready to take action.
But they’re not the same as challenges.
Challenges are ongoing constraints your persona faces. This could be a limited budget, small team, or skill gap.
Triggers are the specific events or goals that push them to act right now. Like a looming deadline or a competitor launching a podcast.
Understanding triggers helps you reach your persona when they’re most receptive.
How to Answer This Question
If you have access to internal data, start there.
Your sales and customer support teams can spot patterns that push prospects from browsing to buying.
For example, your sales conversations might reveal that one of Marcus’s triggers is urgency. His manager might ask him to improve the sound quality by the next episode, prompting his search.
Next, check the community spaces you identified earlier. These are where people describe the exact moments they decide to take action: plateaus, milestones, and failed attempts.
When I searched “podcast marketing” on Reddit, I found a post from someone experiencing clear triggers:
This user has been unable to get a consistent flow of organic listeners despite high-quality content.
Trigger: A growth plateau that pushed him to ask for help.
He’s also trying to hit his first 1,000 listeners.
Trigger: A goal that pushed him to look for solutions.
If you collected a lot of content, upload it to an AI tool to quickly identify triggers.
Use this prompt:
Analyze these community posts and discussions. Identify the specific trigger moments that pushed people to actively search for solutions.
For each trigger, show:
The exact moment or event described (quote the language they use)
The type of trigger (situational, temporal, emotional, or goal-driven)
What action they took as a result
Format as a table.
After analyzing the content I gathered, I identified the key triggers pushing Marcus to search:
5. What Language Resonates (and What Turns Them Off)?
Answer this question to find out:
Which messaging angles resonate
What tones build trust with your audience
Which phrases trigger objections or skepticism
The words you use can affect whether your persona trusts you or tunes out.
The right language makes people feel understood. The wrong language creates friction and drives them away.
When you know what resonates, you can create messaging that builds trust and motivates your personas to act.
How to Answer This Question
Refer back to your research from Questions 3 and 4.
This time, focus specifically on language patterns in reviews and community discussions.
Look at:
Exact phrases people use to describe success, relief, or satisfaction
Words highlighting frustration, disappointment, and concerns
For example, on Capterra, users praised podcasting platforms that “do a lot” and let them “distribute with ease.”
This language signals Marcus’s preference for all-in-one platforms.
He would likely connect with messaging that emphasizes functionality without complexity.
Next, review the content you previously gathered from community spaces.
In r/podcasting, users like Marcus write with direct, benefit-focused language:
Notice what he values: simplicity and concrete outcomes (“automatic transcripts”).
He’s not mentioning jargon like “AI-powered transcription engine” or “enterprise-grade recording infrastructure.”
Plain language that emphasizes quick results over technical capabilities works best with this persona.
Once you have enough data, use this LLM prompt to identify language patterns:
Analyze these customer reviews and community discussions I’ve shared. Identify:
Most common words and phrases people use to describe positive experiences
Most common words and phrases that signal frustration or concerns
Emotional undertones in how they describe problems and solutions
Create a table organizing these insights.
This analysis revealed the specific language that Marcus reacts to positively (and negatively).
6. What Content Types Do They Engage With Most?
Answer this question to find out:
Content types to prioritize in your content strategy
How to structure content for maximum engagement
What length and style work best for each format
Knowing the content types your audience prefers has multiple benefits.
It lets you create content that captures your persona’s attention and keeps them engaged.
Think about it: You could write the most comprehensive guide on podcast equipment.
But if your ideal customer prefers video reviews, they’ll scroll right past it.
How to Answer This Question
You identified your persona’s most-used platforms in Question 1. Now analyze which content formats perform best on each.
Conduct a few Google searches to identify popular content types.
You’ll learn what users (and search engines) prefer for specific queries. Look at videos, written guides, infographics, carousels, podcasts, and more.
For example, when I search “how to set up podcast equipment,” the top results are a mix: long-form articles, video tutorials, and community discussions.
But you’ll ideally be able to validate them against real behavioral data.
If possible, survey recent customers to find concrete patterns about their search behavior.
Send a short survey to customers who converted in the last 90 days:
Where did you first hear about us?
Where do you go for advice about [primary pain points]?
What platforms do you use when researching [your product category]?
How do you prefer to learn about new solutions in your workflow?
Once responses come in, look for patterns in how each segment discovers, researches, and evaluates solutions.
Here’s a prompt you can use in an AI tool for faster analysis:
I surveyed recent customers about their search and discovery behavior.
Analyze this data and identify:
The top 3-5 platforms where customers discovered us or researched solutions
Common pain points or information needs they mentioned
Preferred content formats for learning about solutions
Any patterns in how different customer segments discover and evaluate us
Highlight the platforms and channels that appear most frequently, and flag any gaps between where customers search and where we currently have a presence.
Next, cross-reference your research against existing data in Google Analytics.
Open Google Analytics and navigate to Reports > Lifecycle > Acquisition > Traffic acquisition.
Sort by engagement rate or average session duration to see which channels drive genuinely engaged visitors.
Look for high time on site (2+ minutes) and multiple pages per session (3+).
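As a quick sanity check, those thresholds can be applied to an exported channel report with a few lines of Python. The channel data below is made up for illustration:

```python
# Hypothetical rows exported from a traffic-acquisition report.
channels = [
    {"channel": "Organic Search", "avg_session_min": 2.8, "pages_per_session": 3.4},
    {"channel": "Paid Social",    "avg_session_min": 0.9, "pages_per_session": 1.2},
    {"channel": "Referral",       "avg_session_min": 3.1, "pages_per_session": 4.0},
]

# Thresholds from the guide: 2+ minutes on site, 3+ pages per session.
engaged = [
    c["channel"]
    for c in channels
    if c["avg_session_min"] >= 2 and c["pages_per_session"] >= 3
]

print(engaged)  # channels driving genuinely engaged visitors
```

Only the channels clearing both thresholds survive the filter, which is the short list worth mapping to content formats.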
Then, map each platform to the content format that performs best there.
Combine insights from Question 1 (preferred platforms) and Question 6 (preferred formats) to build your distribution strategy.
Here’s what this looks like for Marcus:
9. What Keeps This Persona Coming Back?
Answer this question to find out:
What product features or experiences to double down on
How to position your solution beyond initial use cases
What content to create for existing customers
Winning your audience’s attention once is easy. Earning it repeatedly is the real challenge.
Understanding what keeps your persona engaged is the key to getting them to return.
How to Answer This Question
Review all the audience persona insights you’ve gathered so far to identify recurring needs.
Look at triggers, pain points, content preferences, and community discussions.
Pinpoint problems that can’t be solved with a single article or resource.
This could include:
Tasks they do every week (editing, distribution, promotion)
Decisions they face with each piece of content (format, platform, messaging)
Skills they’re continuously learning (new tools, changing algorithms)
Friction points that slow them down every time
Then, outline the content types that repeatedly solve these problems.
Think tools, templates, checklists, and guides they’ll come back to again and again.
If you don’t want to do this manually, drop this prompt into an AI tool to synthesize your findings:
Based on my audience persona research, here’s what I’ve learned:
Questions they ask: [Paste top questions from Q2]
Challenges they face: [Paste challenges from Q3]
Triggers that push them to act: [Paste triggers from Q4]
Their preferred content types: [Paste formats from Q6]
Identify recurring problems they face repeatedly (not one-time issues).
Use the output to guide your content creation, search strategy, and distribution efforts.
Your next move: Expand your visibility further with our guide to ranking in AI search. Our Seen & Trusted Framework will help you increase mentions, citations, and recommendations for your brand.
At just under 200 employees, Descript is not the biggest name in video editing software.
It’s not the most robust or the most popular, either.
But it’s punching way above its weight, competing with much bigger companies (like Adobe and CapCut) in LLM search.
Using Semrush’s AI Visibility score, you can see that Descript is competing closely with giant brands like Adobe.
Descript found the way in.
And so can you.
In this SaaS LLM visibility case study, we’ll break down exactly how Descript is getting seen.
And more importantly, what you can copy to improve visibility for your own product.
Choosing Clear Niche Messaging
For years, Descript has been known as a podcast editing tool.
That matters.
Because when people talk about podcast editing, Descript comes up naturally.
In blog posts.
In forums.
And now, in AI answers.
This isn’t accidental. Descript is clear about who it’s for, and their content reflects that focus.
Their product pages and blog posts consistently speak to one core audience: people who want to edit podcasts easily.
Here’s why this matters:
When I asked Google’s AI Mode for the best software to edit podcasts — specifically as someone with no video editing skills — Descript was one of the first tools mentioned.
And what shows up second in the list of sources?
One of Descript’s own blog posts about podcast editing.
Across Descript’s own website and other third-party sources, this tool is regularly mentioned as ideal for podcasters.
This matters because of a key difference between AI search and traditional SEO.
LLMs don’t just surface pages. They base their answers on query fan-out.
Here’s what that means: the AI expands the original query into multiple related searches, then looks for the answer that most directly matches what was asked.
That’s why even articles and websites that aren’t ranking well in Google can still get cited by AI when they provide the most relevant, specific answer to what users are asking.
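Purely as an illustration (the templates and seed query are assumptions, not how any particular LLM works internally), fan-out can be sketched as expanding one query into several related searches:

```python
def fan_out(query: str) -> list[str]:
    """Expand a seed query into related sub-queries, mimicking how an
    AI search system broadens its retrieval before composing an answer."""
    templates = [
        "{q}",
        "best {q} for beginners",
        "{q} comparison",
        "how to choose {q}",
    ]
    return [t.format(q=query) for t in templates]

print(fan_out("podcast editing software"))
```

Content that directly answers one of those expanded sub-queries can get cited even if it never ranked for the original query.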
Because Descript’s content is tightly focused on one audience, one use case, one problem, it maps cleanly to those AI queries.
That doesn’t necessarily translate to higher rankings in traditional search. In fact, Descript’s traffic from traditional SEO has been steadily decreasing since its peak in 2024:
But at the same time, branded traffic has increased.
So even while the brand isn’t succeeding in traditional search, more people are becoming aware of Descript and searching for the brand name specifically.
Why? In part, because the brand is known for exactly what it does: podcast editing.
AI knows that too. And I would bet that a higher number of mentions in AI search is helping with brand recognition and influencing that increase in branded search traffic.
Here’s the point: Descript isn’t just checking off boxes of what to talk about.
The way they write — and the way they present their product — shows exactly who they’re speaking to. They match the way their audience talks.
Take the blog article on podcast editing that we mentioned above as an example.
The copy flows naturally, includes quotes from an internal expert who describes the problem and solution in her own words, and speaks in an easy tone that matches the audience.
As a byproduct of this natural writing style and clear product positioning, their copy and content semantically match what their audience is searching for.
And their AI mentions keep increasing.
Action Item: Identify and Focus on Your Niche Market
Effort vs. Impact: Medium effort. High impact.
If you’re trying to be all things to everyone, AI is less likely to recommend you for anything specific.
Instead, narrow your focus like Descript does.
Of course, you also want to find balance.
For example, “Podcast editing software for true crime hosts who only record on Thursdays” may be a bit too niche.
To get the narrowest viable version of your core audience, look at your most successful customers.
Ask:
Who gets the most ROI from our product?
Who uses it weekly — or daily?
Which customers have become vocal advocates?
What do those users have in common? (Role, company size, industry, workflow)
That overlap is your niche.
Once that’s clear, your messaging gets easier.
You stop being an “All-in-one AI-powered platform for creators and teams.”
And start anchoring your product to a specific job: “Edit podcasts and spoken audio, without technical complexity.”
Then, your product becomes easier for AI systems to understand — and recommend — for specific use cases.
Once you’ve defined your niche, focus your content on what actually helps them.
Descript doesn’t target video editing professionals. So, they don’t show up in those searches.
They focus on content creators and podcasters. And their content reflects that.
To do the same:
Talk to people in your niche industry
Ask about their workflows, goals, and sticking points
Learn what slows them down
Pro tip: If you can’t speak directly to people in your audience or customer base, talk to your customer-facing teams. Customer success and sales teams have daily contact with your core audience. So, they’re in a better position to give you insights into what this audience cares about.
Online research also helps.
Find relevant subreddits to see what people are talking about. Check the comments section of relevant YouTube videos.
Look for recurring questions and complaints.
For example, the Descript team might peruse the r/podcasting subreddit to learn about their audience’s questions and opinions.
The goal: understanding.
When you deeply understand your audience’s day-to-day reality, creating helpful content becomes much easier.
And your content can become the source for AI answers.
Of course, getting citations back to your website isn’t the same as getting direct brand mentions. However, it’s still an opportunity to build awareness and authority.
Plus, building content around relevant core topics helps reinforce your niche messaging.
Showing Real In-Product Visuals
With image-processing models like contrastive language–image pre-training (CLIP), AI systems can understand what’s happening inside screenshots and videos — not just the words around them.
And those visuals now show up directly in AI answers. Especially for SaaS product queries in tools like ChatGPT.
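Under the hood, CLIP-style models embed an image and a caption into the same vector space and score them by cosine similarity. A toy sketch with made-up 4-dimensional vectors (real CLIP embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score how closely two embedding vectors point in the same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings (illustrative values, not real model output).
screenshot = np.array([0.9, 0.1, 0.4, 0.2])  # in-product screenshot
caption_a = np.array([0.8, 0.2, 0.5, 0.1])   # "podcast editing timeline"
caption_b = np.array([0.1, 0.9, 0.1, 0.8])   # "mountain landscape"

print(cosine_similarity(screenshot, caption_a))  # high: image matches the text
print(cosine_similarity(screenshot, caption_b))  # low: unrelated text
```

A screenshot whose embedding sits close to a query’s embedding is a candidate to be pulled into the answer, which is why real in-product visuals can surface for product queries.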
For example, when I search for “best CRM software for a small business,” the top AI result includes images of the actual product interface.
That’s a shift.
Highly polished mockups matter less. Real, in-product visuals matter more.
Which is why Descript shows up like this in ChatGPT:
Descript consistently shows real product images and videos across product pages, Help Center articles, and blog content.
These aren’t decorative.
They show:
What the product looks like
How features work
What users should expect when they log in
As a result, those same images and videos get pulled into AI answers — often with a link back to Descript’s site.
In this case, the link goes back to a very in-depth Help Center guide to getting started with podcast editing.
And most interestingly, that’s a near-perfect semantic match to the original query.
Action Item: Include In-Product Images in Your Marketing Content
Effort vs. Impact: Low effort. Medium impact.
Start with the basics.
For every feature you highlight, ask one question: Can someone see this working?
Then act on it. Add real screenshots of your core product screens to key product pages. Replace abstract diagrams with in-product visuals where possible.
Next, expand beyond product pages.
Mention a feature in a blog post? Include a screenshot of it in use.
Explaining a workflow in a Help Center article? Show each step visually.
Teaching a process? Record a short screen capture instead of relying on text alone.
The goal is clarity.
Clear visuals help users understand your product faster. And they give AI systems concrete material to reuse in answers.
Which makes your product easier to recommend — and easier to recognize — inside AI search.
Creating Detailed MoFu/BoFu Content
Content mapped to different awareness levels performs especially well in AI search.
Descript understands this.
They don’t just publish top-of-funnel guides. They create content for product-aware and solution-aware searches, too.
When you search in ChatGPT for video creation or editing tools, Descript often appears in the results.
But more importantly, their own content is cited as a source.
In this example, the cited source is a Descript-owned “best of” article comparing video tools.
Instead of generic recommendations, the page:
Breaks tools down by specific use cases
Includes clear pros and cons
Explains who each option is best for
Descript follows this same pattern with multiple “best of” lists and comparison pages against their main competitors.
The payoff?
When I asked AI to compare podcast video editing tools, Descript appeared with clear labels explaining:
Who it’s best for
Key features
When it makes sense to choose it
That context helps AI recommend Descript to the right people (not everyone).
Action Item: Create Citable MoFu and BoFu Content
Effort vs. Impact: High effort. High impact.
Different awareness levels need different content.
To increase product-level AI visibility, focus on Product Aware and Solution Aware queries.
For Product Aware audiences, create:
Comparison pages
“Best alternative” posts
Owned “best of” lists
Want more ideas?
Talk to your sales team.
Ask them: What features are convincing people to buy? Which competitors are commonly brought up in sales conversations?
Those answers map directly to comparison content AI likes to cite.
For Solution Aware audiences, focus on how-to content that naturally features your product.
For example, when I asked Google’s AI Mode how to reduce background noise from a microphone, it referenced a Descript how-to article.
This same pattern repeats itself across many of Descript’s blog posts: Find a clear problem, give a clear solution, add product mentions naturally.
It’s all about finding the right questions to answer.
To find these opportunities faster, use Semrush’s AI Visibility Toolkit. This data is powered by Semrush’s AI prompt database and clickstream data, organized into meaningful topics.
Head to “Competitor Research” and review:
Shared topics where competitors appear
Prompts where they earn more AI visibility than you
Then, dig into the specific questions behind those prompts.
The goal isn’t simply “more content.”
It’s answering the right questions — at the right stage — with content AI can confidently cite.
Building Positive Sentiment With Digital PR and Affiliate Marketing
AI visibility isn’t earned on your website alone.
LLMs look for signals across the web.
This is what we call consensus. And it means that positive sentiment has to exist outside your owned channels.
Descript is doing this in two ways:
Digital PR on sites AI already trusts
A creator-friendly affiliate program that drives third-party mentions
Here’s how it works: Google’s AI Mode tends to favor certain websites as sources when answering queries about software.
Semrush’s December 2025 AI visibility research for SaaS shows these sites dominate citations:
Zapier
PCMag
Gartner
LinkedIn
G2
Here’s what’s interesting.
Descript is mentioned in articles across nearly all of these top sources.
For example, in software listicles like this one on Zapier:
Or in real-world experience articles like this one on Medium:
Or in their clear listings on reviews sites like Gartner and G2:
When AI systems cite those favored sources, Descript comes along for the ride.
Not because it’s the biggest brand.
But because it’s present where AI is already looking.
The second lever is Descript’s affiliate program.
It’s simple:
$25 per new subscriber
30-day attribution window
Monthly payouts
No minimums
Those are solid incentives.
And they lead to more creator-driven content across the web.
For example, a YouTube walkthrough from VP Land explains how to use Descript and includes an affiliate link in the description.
When I later asked Google’s AI Mode how to use Descript, that exact video was cited as a source.
That’s the pattern.
Affiliate content creates citable, trusted references that AI systems reuse.
Action Item: Build a Strategy to Get More Mentions Online
Effort vs. Impact: High effort. High impact.
Getting third-party mentions is all about building relationships.
First, build relationships with publishers, starting with the ones AI already trusts.
Even if you’re not an enterprise SaaS company with a full-sized PR team, this is still possible.
Granted, it’s not the easy route — but when you find the right websites and perform regular outreach to those teams, you can get your brand on these sites.
Before you start outreach, get your bearings.
Start by going back to Semrush’s AI Visibility Toolkit. Head to the “Competitor Research” tab and select “Sources.”
This shows you:
Which sites LLMs cite for your category
Where competitors are already getting mentioned
Gaps where your brand doesn’t show up (yet)
Those sites become your shortlist.
Outreach works better when you’re aiming at sources AI already relies on.
Second, build relationships with creators.
Affiliate programs work when creators want to talk about you.
So, build an affiliate program people actually want to be part of.
This means the program has to be easy to join, with clear terms that make it worth their time.
At a minimum, make sure you have:
A simple signup
Transparent tracking
Reliable payouts
Pro tip: Use a tool like PartnerStack to handle all of the details automatically. Better signups, better tracking, and automated payouts build trust with your affiliates.
If you need inspiration, research top affiliate programs to learn more about the conditions creators expect.
But most importantly: Treat affiliates as distribution partners, not just a side channel.
This means enabling them with clear positioning on your product, example use cases, demo workflows, screenshots they can reuse, and other resources.
The better you equip them, the stronger their recommendations will be.
Once you have this set up, track the results.
Use AI visibility data to see:
Which publisher relationships are turning into citations in AI search
Which creators show up in AI answers
Which formats perform best
Then, double down.
Now that we’ve discussed what Descript is doing well, let’s look at where there’s room for improvement.
Where Descript Could Improve: Reddit Marketing
Descript is doing a great job in many areas that are important for AI search visibility.
That said, there’s one area they’re missing out on: Reddit.
And yes, Reddit matters. A lot.
It’s still one of the most-cited sources in Google’s AI Mode.
And in almost all of the searches I tested above, Reddit was cited as a source (especially conversations in the r/podcasting subreddit).
Here’s the problem: right now, Reddit is not doing Descript any favors.
Here are a few thread titles I found just by searching for Descript in a podcasting subreddit:
When LLMs scan Reddit for sentiment, that imbalance matters.
AI wants to see consensus. So when Reddit skews negative, recommendations may weaken, and alternatives get surfaced instead.
Even when the product is strong.
That’s why, while Descript’s AI visibility is good, it’s still not as good as it could be. And that vulnerability could hurt them in the long run, even if they’re still doing everything else right.
Here are some ways that Descript (and you) could turn the tide on Reddit:
Avoid promoting and start participating: Reddit punishes marketing language. Helpful, honest comments perform better than posts.
Respond to criticism directly (when appropriate): Not defensively, but with clear explanations and fixes
Be present before there’s a problem: Accounts that only show up during damage control don’t build trust
Focus on comments, not posts: High-value comments in active threads outperform standalone branded posts
Monitor brand mentions weekly: Focus especially on high-intent subreddits. In Descript’s case, that could be r/podcasting.
To be fair, it seems like Descript is taking steps in the right direction.
As of December 2025, the Descript team has taken control of a dedicated brand subreddit, with PMM Gabe at the helm.
And the team’s responses feel very Reddit-friendly, not using marketing jargon or being pushy.
But popular threads here still have very little interaction with the Descript team. And there seem to be very few (if any) comments from the Descript team outside of this branded subreddit.
It’s a step in the right direction, but there’s still a lot to work on.
Done right, Reddit becomes a sentiment stabilizer and a stronger input source for AI answers.
Ignore it, and Reddit can become a liability.
Remember: for AI visibility, silence isn’t neutral.
Further reading: If Reddit feels like a whole other world, we’ve got you covered. Read our full guide to Reddit Marketing.
What You Can Take Away from This SaaS LLM Visibility Case Study
Descript isn’t winning AI visibility because it’s the biggest brand.
It’s winning because it’s clear, focused, and consistently helpful.
None of that is accidental.
And none of it requires massive scale.
You can get started on this today by choosing one key action to work on.
Use the effort vs. impact lens from this article to choose where to start.
Add in-product screenshots and videos: Low effort, medium impact
Tighten your niche messaging: Medium effort, high impact
Build citable MoFu/BoFu content: High effort, high impact
Invest in digital PR, affiliates, and community participation: High effort, high impact
Create seriously helpful content: High effort, high impact
Pick one, start there. AI search visibility tools for SaaS companies — like Semrush’s AI Visibility Toolkit — can help you see exactly where you stand today, and where you can improve.
Remember: LLM visibility isn’t about chasing algorithms.
It’s about making your product easier to understand, easier to trust, and easier to recommend.
Do that consistently — and AI search will follow.
Want to learn how it all works on a deeper level? Read our LLM visibility guide to discover even more ways to increase your brand mentions and citations in AI search.
How a 200-Person Company Competes with a $160B Giant in AI Search (published Feb. 20, 2026)
Over the past year, Google has significantly accelerated its investment in artificial intelligence and machine learning across its products and platforms. While most marketers are familiar with ChatGPT, Google has been advancing its own AI capabilities in parallel, including the relaunch of Bard as Gemini and the steady rollout of AI-assisted features across Google Play.
For app marketers and ASO specialists, these developments are not abstract. They represent a fundamental shift in how apps are understood, categorized, and surfaced to users. Google Play is no longer relying primarily on keyword matching. Instead, it is moving toward a deeper, semantic understanding of apps, their functionality, and the problems they solve.
This evolution raises an important question. If Google increasingly generates, interprets, and evaluates app metadata itself, how do ASO teams maintain control, differentiation, and long-term competitive advantage?
One underutilized answer lies in a tool that has existed for years but is rarely discussed in an ASO context: the Google Natural Language API.
Key Takeaways
Google Play is moving away from keyword density and toward semantic understanding driven by machine learning and natural language processing.
The Google Natural Language API provides valuable insight into how Google interprets app metadata, including entities, sentiment, and category relevance.
Optimizing for category confidence and entity relevance can improve keyword coverage and resilience during algorithm updates.
ASO teams that align metadata with user intent and natural language patterns are better positioned for long-term discovery performance.
Using tools like the Google Natural Language API helps future-proof ASO strategies as automation and AI-driven ranking signals continue to expand.
Why Traditional ASO Signals Are Losing Impact
Before exploring how the Google Natural Language API can support ASO, it is important to understand the broader shifts in Google Play’s ranking algorithms.
Over the past two years, Google Play has shifted away from frequent, visible algorithm swings toward a more continuous learning model. While ASO teams still see volatility, it is now driven less by discrete updates and more by ongoing recalibration as models ingest new behavioral, linguistic, and performance data. Reindexing events still occur, but they are increasingly tied to semantic reassessment rather than simple metadata changes.
At the same time, the effectiveness of traditional optimization levers such as keyword density, exact-match repetition, and rigid keyword placement has continued to erode. These tactics no longer align with how Google Play evaluates relevance.
Like Google Search, Google Play is now firmly optimized for meaning, not mechanics. Its systems are designed to understand intent, function, and audience context rather than rely on surface-level keyword signals. The algorithm is increasingly capable of identifying what an app does, who it serves, and the problems it solves, even when those ideas are expressed using varied, natural language.
This is where natural language processing becomes central to modern ASO tools and practices.
What Is the Goal of the Google Natural Language API?
The Google Natural Language (GNL) API is designed to help machines understand human language in a way that more closely mirrors human interpretation. It powers a wide range of Google products and capabilities, including sentiment analysis, entity recognition, content classification, and contextual understanding.
In practical terms, it analyzes a body of text and identifies:
The overall sentiment and tone.
Key entities and their relative importance.
The categories and subcategories that the content most strongly aligns with.
For ASO teams, this offers a rare opportunity. Instead of guessing how Google might interpret app metadata, it provides a proxy for understanding how Google’s machine learning systems read and categorize text.
Used correctly, it can help ASO specialists align metadata more closely with Google’s evolving ranking logic.
How the Google Natural Language API Applies to ASO
When applied to app metadata, the Google Natural Language API can reveal how Google is likely to associate an app with certain concepts, categories, and keyword themes. This insight is particularly valuable as keyword density becomes less influential and semantic relevance takes priority.
Below are the key components that matter most for ASO.
Sentiment Analysis
Sentiment analysis evaluates the emotional tone of a piece of text and categorizes it as positive, negative, or neutral. While sentiment is not a primary ranking factor for app discovery, it does provide useful contextual information.
For example, overly promotional, aggressive, or unclear language can introduce noise into metadata. Reviewing sentiment outputs can help teams ensure that descriptions maintain a clear, neutral, and informative tone that supports both user trust and algorithmic interpretation.
Entity Recognition and Salience
Entity recognition identifies specific entities within a text and classifies them into predefined types such as company, product, feature, or concept. Each entity is assigned a salience score, which reflects how central that entity is to the overall content.
In an ASO context, entities might include:
Core app features
Functional use cases
Industry-specific terms
Recognizable product or service concepts
Salience scores range from 0 to 1.0. Higher scores indicate that an entity plays a more important role in defining the content.
From an optimization perspective, this is critical. If key features or use cases are not appearing as highly salient, it suggests Google may not be strongly associating the app with those concepts.
Strategically incorporating relevant entities into metadata in a natural, user-focused way can improve clarity and strengthen topical relevance. Placement also matters. Important entities that appear early in descriptions or are reinforced toward the end of the text tend to carry more weight.
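To make this concrete, here is a minimal sketch of how a team might rank entities by salience. The `response` dict is mocked sample data shaped like the API’s documented JSON output (entity name, type, salience), not a real API call, and the threshold is an arbitrary example:

```python
# Sketch: rank entities from an analyze_entities-style response by salience.
# The `response` dict below is mocked sample data, not real API output;
# in practice it would come from the Cloud Natural Language API.

def top_entities(response, min_salience=0.05):
    """Return (name, salience) pairs above a threshold, highest first."""
    entities = [
        (e["name"], e["salience"])
        for e in response["entities"]
        if e["salience"] >= min_salience
    ]
    return sorted(entities, key=lambda pair: pair[1], reverse=True)

# Mocked response for a hypothetical app description.
response = {
    "entities": [
        {"name": "podcast editing", "type": "OTHER", "salience": 0.42},
        {"name": "transcription", "type": "OTHER", "salience": 0.31},
        {"name": "screen recording", "type": "OTHER", "salience": 0.18},
        {"name": "free trial", "type": "OTHER", "salience": 0.02},
    ]
}

for name, salience in top_entities(response):
    print(f"{name}: {salience:.2f}")
```

If a core feature ranks low here (or falls below the threshold entirely), that is a signal to reposition it earlier or reinforce it in the metadata draft.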
Categories and Confidence Scores
Category classification is arguably the most impactful element of the Google Natural Language API for ASO.
When text is analyzed, the API assigns it to one or more categories and subcategories, each with an associated confidence score. These scores indicate how strongly the content aligns with a given category.
For Google Play, this has major implications. Higher category confidence increases the likelihood that an app will be associated with a broader range of relevant search queries within that category. Rather than ranking for a narrow set of exact keywords, apps can gain visibility across an expanded semantic keyword space.
In practice, we have seen that improving category confidence can significantly enhance keyword coverage and ranking stability, particularly during periods of algorithm change.
To increase category confidence:
Use clear, natural language that reflects real user intent
Focus on describing functionality and value, not just features
Avoid keyword stuffing or forced phrasing
Reinforce category-relevant concepts consistently throughout metadata
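As a rough illustration of the iteration loop, here is how two metadata drafts might be compared by confidence for a target category. The responses are mocked examples shaped like `classify_text` output, and the category path and scores are hypothetical:

```python
# Sketch: pick the metadata draft with the higher confidence for a target
# category, given classify_text-style responses. Responses are mocked.

def category_confidence(response, category):
    """Confidence for `category` in a classify_text-style response, else 0."""
    for c in response["categories"]:
        if c["name"] == category:
            return c["confidence"]
    return 0.0

target = "/Arts & Entertainment/Music & Audio/Podcasts"

# Two hypothetical drafts of the same app description, after analysis.
draft_a = {"categories": [{"name": target, "confidence": 0.61}]}
draft_b = {"categories": [{"name": target, "confidence": 0.83}]}

best = max([("draft_a", draft_a), ("draft_b", draft_b)],
           key=lambda item: category_confidence(item[1], target))
print(best[0])
```

In practice, you would re-run the analysis after each rewrite and keep the draft that raises confidence without sacrificing readability.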
Applying GNL Insights to Metadata Strategy
The real value of the Google Natural Language API lies not in isolated analysis, but in iterative optimization. By repeatedly testing metadata drafts through the API, ASO teams can refine language until category confidence, entity salience, and overall clarity improve.
This approach aligns well with broader 2026 ASO best practices, which emphasize:
User intent over keyword lists
Semantic relevance over repetition
Long-term stability over short-term gains
Case Study Insights
We have applied GNL-driven optimization techniques across multiple app categories. While results vary by vertical, the overall pattern has been consistent.
During periods of significant Google Play algorithm updates, apps optimized around category confidence and entity relevance showed greater resilience. In several cases, visibility improved despite widespread volatility elsewhere in the store.
In one example, keyword coverage expanded substantially following metadata updates that increased confidence across both a core category and secondary related categories. This translated into a more than fivefold increase in organic Explore installs over time.
These results reinforce an important principle. When ASO strategies align with how Google understands language, they are better positioned to benefit from algorithm evolution rather than being disrupted by it.
Connecting GNL to 2026 ASO Strategy
Looking ahead, the role of natural language processing in app discovery will only grow. As Google continues to automate metadata creation and interpretation, manual optimization will shift from mechanical execution to strategic guidance.
ASO teams that understand and leverage tools like the Google Natural Language API will be better equipped to:
Guide AI-generated content rather than react to it
Maintain differentiation in an increasingly automated ecosystem
Build metadata that supports both paid and organic discovery
This approach also complements broader trends such as AI-powered search, cross-platform discovery, and privacy-first measurement frameworks.
Conclusion
The rise of natural language processing does not signal the end of ASO. Instead, it marks a shift in how optimization should be approached.
By moving beyond keyword density and embracing semantic relevance, ASO teams can align more closely with Google’s evolving algorithms. The Google Natural Language API offers a practical way to understand how app metadata is interpreted and how it can be improved to support discovery, conversion, and long-term stability.
As automation continues to expand across Google Play, the teams that succeed will be those who understand the systems behind it and adapt their strategies accordingly. Natural language optimization is no longer optional. It is becoming a core pillar of modern ASO.
How to Leverage Google Natural Language to Boost Your ASO Efforts (published Feb. 19, 2026)
A core targeting lever in Google Demand Gen campaigns is changing. Starting March 2026, Lookalike audiences will act as optimization signals — not hard constraints — potentially widening reach and leaning more heavily on automation to drive conversions.
What is happening. Per an update to Google’s Help documentation, Lookalike segments in Demand Gen are moving from strict similarity-based targeting to an AI-driven suggestion model.
Before: Advertisers selected a similarity tier (narrow, balanced, broad), and campaigns targeted users strictly within that Lookalike pool.
After: The same tiers act as signals. Google’s system can expand beyond the Lookalike list to reach users it predicts are likely to convert.
Between the lines. This effectively reframes Lookalikes from a fence to a compass. Instead of limiting delivery to a defined cohort, advertisers are feeding intent signals into Google’s automation and allowing it to search for performance outside preset boundaries.
How this interacts with Optimized Targeting. The new Lookalike-as-signal approach resembles Optimized Targeting — but it doesn’t replace it.
When advertisers layer Optimized Targeting on top, Google says the system may expand reach even further.
In practice, this stacks multiple automation signals, increasing the algorithm’s freedom to pursue lower CPA or higher conversion volume.
Opt-out option. Advertisers who want to preserve legacy behavior can request continued access to strict Lookalike targeting through a dedicated opt-out form. Without that request, campaigns will default to the new signal-based model.
Why we care. This update changes how much control advertisers will have over who their ads reach in Google Demand Gen campaigns. Lookalike audiences will no longer strictly limit targeting — they’ll guide AI expansion — which can significantly affect scale, CPA, and overall performance.
It also signals a broader shift toward automation, similar to trends driven by Meta Platforms. Advertisers will need to test carefully, rethink audience strategies, and decide whether to embrace the added reach or opt out to preserve tighter targeting.
Zoom out. The shift mirrors a broader industry trend toward AI-first audience expansion, similar to moves by Meta Platforms over the past few years. Platforms are steadily trading granular manual controls for machine-led optimization.
Why Google is doing this. Digital marketer Dario Zannoni suggests two reasons why Google is doing this:
Strict Lookalike targeting can cap scale and constrain performance in conversion-focused campaigns.
Maintaining high-quality similarity models is increasingly complex, making broader automation more attractive.
The bottom line. For performance marketers, this is another step toward automation-centric buying. While reduced control may be uncomfortable, comparable platform changes have often produced performance gains in mainstream use cases. Expect a new testing cycle as advertisers measure how expanded Lookalike signals affect CPA, reach, and incremental conversions.
First seen. This update was spotted by Zannoni, who shared his thoughts on LinkedIn.
Google shifts Lookalike to AI signals in Demand Gen (published Feb. 17, 2026)
Jeff Dean says Google’s AI Search still works like classic Search: narrow the web to relevant pages, rank them, then let a model generate the answer.
In an interview on Latent Space: The AI Engineer Podcast, Google’s chief AI scientist explained how Google’s AI systems work and how much they rely on traditional search infrastructure.
The architecture: filter first, reason last. Visibility still depends on clearing ranking thresholds. Content must enter the broad candidate pool, then survive deeper reranking before it can be used in an AI-generated response. Put simply, AI doesn’t replace ranking. It sits on top of it.
Dean said an LLM-powered system doesn’t read the entire web at once. It starts with Google’s full index, then uses lightweight methods to identify a large candidate pool — tens of thousands of documents. Dean said:
“You identify a subset of them that are relevant with very lightweight kinds of methods. You’re down to like 30,000 documents or something. And then you gradually refine that to apply more and more sophisticated algorithms and more and more sophisticated sort of signals of various kinds in order to get down to ultimately what you show, which is the final 10 results or 10 results plus other kinds of information.”
Stronger ranking systems narrow that set further. Only after multiple filtering rounds does the most capable model analyze a much smaller group of documents and generate an answer. Dean said:
“And I think an LLM-based system is not going to be that dissimilar, right? You’re going to attend to trillions of tokens, but you’re going to want to identify what are the 30,000-ish documents that are with the maybe 30 million interesting tokens. And then how do you go from that into what are the 117 documents I really should be paying attention to in order to carry out the tasks that the user has asked me to do?”
Dean called this the “illusion” of attending to trillions of tokens. In practice, it’s a staged pipeline: retrieve, rerank, synthesize. Dean said:
“Google search gives you … not the illusion, but you are searching the internet, but you’re finding a very small subset of things that are relevant.”
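The retrieve-then-rerank staging Dean describes can be sketched in miniature. The corpus and both scorers below are toy stand-ins (simple term overlap), not Google’s actual signals, but the control flow is the same shape:

```python
# Sketch of the staged pipeline: a cheap lexical filter cuts the corpus
# to a small candidate pool, then a costlier scorer reranks that pool.
# Corpus and scoring are illustrative, not Google's systems.

def cheap_filter(query, docs, limit=3):
    """Stage 1: keep docs sharing any query term (lightweight retrieval)."""
    terms = set(query.lower().split())
    hits = [d for d in docs if terms & set(d.lower().split())]
    return hits[:limit]

def rerank(query, candidates):
    """Stage 2: rescore the small pool with a finer signal
    (here, term-overlap count standing in for a real model)."""
    terms = set(query.lower().split())
    return sorted(candidates,
                  key=lambda d: len(terms & set(d.lower().split())),
                  reverse=True)

docs = [
    "best podcast editing software",
    "how to edit a podcast in your browser",
    "gardening tips for beginners",
    "podcast hosting comparison",
]

candidates = cheap_filter("edit podcast", docs)
ranked = rerank("edit podcast", candidates)
print(ranked[0])
```

The point of the staging is cost: the expensive scorer only ever sees the small pool that survived the cheap filter.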
Matching: from keywords to meaning. Nothing new here, but we heard another reminder that covering a topic clearly and comprehensively matters more than repeating exact-match phrases.
Dean explained how LLM-based representations changed how Google matches queries to content.
Older systems relied more on exact word overlap. With LLM representations, Google can move beyond the idea that particular words must appear on the page and instead evaluate whether a page — or even a paragraph — is topically relevant to a query. Dean said:
“Going to an LLM-based representation of text and words and so on enables you to get out of the explicit hard notion of particular words having to be on the page. But really getting at the notion of this topic of this page or this page paragraph is highly relevant to this query.”
That shift lets Search connect queries to answers even when wording differs. Relevance increasingly centers on intent and subject matter, not just keyword presence.
Query expansion didn’t start with AI. Dean pointed to 2001, when Google moved its index into memory across enough machines to make query expansion cheap and fast. Dean said:
“One of the things that really happened in 2001 was we were sort of working to scale the system in multiple dimensions. So one is we wanted to make our index bigger, so we could retrieve from a larger index, which always helps your quality in general. Because if you don’t have the page in your index, you’re going to not do well.
“And then we also needed to scale our capacity because we were, our traffic was growing quite extensively. So we had a sharded system where you have more and more shards as the index grows, you have like 30 shards. Then if you want to double the index size, you make 60 shards so that you can bound the latency by which you respond for any particular user query. And then as traffic grows, you add more and more replicas of each of those.
“And so we eventually did the math that realized that in a data center where we had say 60 shards and 20 copies of each shard, we now had 1,200 machines with disks. And we did the math and we’re like, Hey, one copy of that index would actually fit in memory across 1,200 machines. So in 2001, we … put our entire index in memory and what that enabled from a quality perspective was amazing.”
Before that, adding terms was expensive because it required disk access. Once the index lived in memory, Google could expand a short query into dozens of related terms — adding synonyms and variations to better capture meaning. Dean said:
“Before, you had to be really careful about how many different terms you looked at for a query, because every one of them would involve a disk seek.
“Once you have the whole index in memory, it’s totally fine to have 50 terms you throw into the query from the user’s original three- or four-word query. Because now you can add synonyms like restaurant and restaurants and cafe and bistro and all these things.
“And you can suddenly start … getting at the meaning of the word as opposed to the exact semantic form the user typed in. And that was … 2001, very much pre-LLM, but really it was about softening the strict definition of what the user typed in order to get at the meaning.”
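Dean’s query-expansion example can be sketched like this; the synonym table is a toy illustration, not Google’s:

```python
# Sketch of in-memory query expansion: a short user query is widened
# with synonyms before matching, as Dean describes. Toy synonym table.

SYNONYMS = {
    "restaurant": {"restaurants", "cafe", "bistro", "eatery"},
}

def expand(query):
    """Expand each query term with its known synonyms."""
    expanded = set()
    for term in query.lower().split():
        expanded.add(term)
        expanded |= SYNONYMS.get(term, set())
    return expanded

query_terms = expand("best restaurant")
print(sorted(query_terms))
```

The key insight from 2001 is that once lookups are in memory, matching against six terms instead of two costs almost nothing, so the system can afford to chase meaning rather than exact wording.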
That change pushed Search toward intent and semantic matching years before LLMs. AI Mode (and Google’s other AI experiences) continues the company’s ongoing shift toward meaning-based retrieval, enabled by better systems and more compute.
Freshness as a core advantage. Dean said one of Search’s biggest transformations was update speed. Early systems refreshed pages as rarely as once a month. Over time, Google built infrastructure that can update pages in under a minute. Dean said:
“In the early days of Google, we were growing the index quite extensively. We were growing the update rate of the index. So the update rate actually is the parameter that changed the most.”
That improved results for news queries and affected the main search experience. Users expect current information, and the system is designed to deliver it. Dean said:
“If you’ve got last month’s news index, it’s not actually that useful.”
Google uses systems to decide how often to crawl a page, balancing how likely it is to change with how valuable the latest version is. Even pages that change infrequently may be crawled often if they’re important enough. Dean said:
“There’s a whole … system behind the scenes that’s trying to decide update rates and importance of the pages. So, even if the update rate seems low, you might still want to recrawl important pages quite often because the likelihood they change might be low, but the value of having updated is high.”
Why we care. AI answers don’t bypass ranking, crawl prioritization, or relevance signals. They depend on them. Eligibility, quality, and freshness still determine which pages are retrieved and narrowed. LLMs change how content is synthesized and presented — but the competition to enter the underlying candidate set remains a search problem.
Google’s Jeff Dean: AI Search relies on classic ranking and retrieval (published Feb. 17, 2026)
If you look at job postings on Indeed and LinkedIn, you’ll see a wave of acronyms added to the alphabet soup as companies try to hire people to boost visibility on large language models (LLMs).
Some people are calling it generative engine optimization (GEO). Others call it answer engine optimization (AEO). Still others call it artificial intelligence optimization (AIO). I prefer large model answer optimization (LMAO).
I find these new acronyms a bit ridiculous because while many like to think AI optimization is new, it isn’t. It’s just long-tail SEO — done the way it was always meant to be done.
Why LLMs still rely on search
Most LLMs (e.g., GPT-4o, Claude 4.5, Gemini 1.5, Grok-2) are transformers trained to do one thing: predict the next token given all previous tokens.
AI companies train them on massive datasets from public web crawls, such as:
Common Crawl.
Digitized books.
Wikipedia dumps.
Academic papers.
Code repositories.
News archives.
Forums.
The data is heavily filtered to remove spam, toxic content, and low-quality pages. Full pretraining is extremely expensive, so companies run major foundation training cycles only every few years and rely on lighter fine-tuning for more frequent updates.
So what happens when an LLM encounters a question it can’t answer with confidence, despite the massive amount of training data?
AI companies use real-time web search and retrieval-augmented generation (RAG) to keep responses fresh and accurate, bridging the limits of static training data. In other words, the LLM runs a web search.
To see this in real time, many LLMs let you click an icon or “Show details” to view the process. For example, when I use Grok to find highly rated domestically made space heaters, it converts my question into a standard search query.
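The fallback loop described above can be sketched in a few lines. Every function here is an assumption for illustration; real systems use the model's own uncertainty estimates and a live search API rather than keyword heuristics:

```python
# Minimal sketch of the search-augmented answering loop (RAG-style fallback).
# All functions and the confidence threshold are illustrative assumptions,
# not any vendor's actual implementation.

def model_confidence(question: str) -> float:
    """Stand-in for the model's estimate that training data suffices."""
    # Questions about current ratings or recency trigger low confidence here.
    return 0.3 if "latest" in question or "highly rated" in question else 0.9

def to_search_query(question: str) -> str:
    # LLMs rewrite a conversational prompt into a keyword-style query.
    stopwords = {"can", "you", "find", "me", "please", "the", "a", "some"}
    return " ".join(w for w in question.lower().split() if w not in stopwords)

def answer(question: str, threshold: float = 0.5) -> str:
    if model_confidence(question) >= threshold:
        return "answer from parameters (training data)"
    query = to_search_query(question)
    # In a real system: call a search API, fetch pages, ground the answer.
    return f"answer grounded in web results for: {query!r}"

print(answer("Can you find me some highly rated domestically made space heaters"))
```

The key takeaway for SEO: the rewritten query that hits the search engine is long and specific, not a two-word head term.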
Many of us long-time SEO practitioners have praised the value of long-tail SEO for years. But one main reason it never took off for many brands: Google.
As long as Google’s interface was a single text box, users were conditioned to search with one- and two-word queries. Most SEO revenue came from these head terms, so priorities focused on competing for the No. 1 spot for each industry’s top phrase.
Many brands treated long-tail SEO as a distraction. Some cut content production and community management because they couldn’t see the ROI. Most saw more value in protecting a handful of head terms than in creating content to capture the long tail of search.
Fast forward to 2026. People typing LLM prompts do so conversationally, adding far more detail and nuance than they would in a traditional search engine. LLMs take these prompts and turn them into search queries. They won’t stop at a few words. They’ll construct a query that reflects whatever detail their human was looking for in the prompt.
Suddenly, the fat head of the search curve is being replaced with a fat tail. While humans continue to go to search engines for head terms, LLMs are sending these long-tail search queries to search engines for answers.
While AI companies are coy about disclosing exactly who they partner with, most public information points to the following search engines as the ones their LLMs use most often:
ChatGPT – Bing Search.
Claude – Brave Search.
Gemini – Google Search.
Grok – X Search and its own internal web search tool.
Perplexity – Uses its own hybrid index.
Right now, humans conduct billions of searches each month on traditional search engines. As more people turn to LLMs for answers, we’ll see exponential growth in LLMs sending search queries on their behalf.
SEO is being reborn.
The principles of long-tail SEO haven’t changed much. It’s best summed up by Baseball Hall of Famer Wee Willie Keeler: “Keep your eye on the ball and hit ’em where they ain’t.”
Success has always depended on understanding your audience’s deepest needs, knowing what truly differentiates your brand, and creating content at the intersection of the two.
As straightforward as this strategy has been, few have executed it well, for understandable reasons.
Reading your customers’ minds is hard. Keyword research is tedious. Content creation is hard. It’s easy to get lost in the weeds.
Happily, there’s someone to help: your favorite LLM.
Here are a few best practices I’ve used to create strong long-tail content over the years, with a twist. What once took days, weeks, or even months, you can now do in minutes with AI.
1. Ask your LLM what people search when looking for your product or service
The first rule of long-tail SEO has always been to get into your audience’s heads and understand their needs. This once required commissioning surveys and hiring research firms.
But for most brands and industries, an LLM can handle at least the basics. Here’s a sample prompt you can use.
Act as an SEO strategist and customer research analyst. You're helping with long-tail keyword discovery by modeling real customer questions.
I want to discover long-tail search questions real people might ask about my business, products, and industry. I’m not looking for mere keyword lists. Generate realistic search questions that reflect how people research, compare options, solve problems, and make decisions.
Company name: [COMPANY NAME]
Industry: [INDUSTRY]
Primary product/service: [PRIMARY PRODUCT OR SERVICE]
Target customer: [TARGET AUDIENCE]
Geography (if relevant): [LOCATION OR MARKET]
Generate a list of 75 – 100 realistic, natural-language search queries grouped into the following categories:
AWARENESS
• Beginner questions about the category
• Problem-based questions (pain points, frustrations, confusion)
CONSIDERATION
• Comparison questions (alternatives, competitors, approaches)
• “Best for” and use-case questions
• Cost and pricing questions
DECISION
• Implementation or getting-started questions
• Trust, credibility, and risk questions
POST-PURCHASE
• Troubleshooting questions
• Optimization and advanced/expert questions
EDGE CASES
• Niche scenarios
• Uncommon but realistic situations
• Advanced or expert questions
Guidelines:
• Write queries the way real people search in Google or ask AI assistants.
• Prioritize specificity over generic keywords.
• Include question formats, “how to” queries, and scenario-based searches.
• Avoid marketing language.
• Include emotional, situational, and practical context where relevant.
• Don't repeat the same query structure with minor variations.
• Each query should suggest a clear content angle.
Output as a clean bullet list grouped by category.
You can tweak this prompt for your brand and industry. The key is to force the LLM (and yourself) to think like a customer and avoid the trap of generating keyword lists that are just head-term variations dressed up as long-tail queries.
With a prompt like this, you move away from churning out “keyword ideas” and toward understanding real customer needs you can build useful content around.
2. Mine your on-site search data
Most large brands and sites don’t realize they’ve been sitting on a treasure trove of user intelligence: on-site search data.
When customers type a query into your site’s search box, they’re looking for something they expect your brand to provide.
If you see the same searches repeatedly, it usually means one of two things:
You have the information, but users can’t find it.
You don’t have it at all.
In both cases, it’s a strong signal you need to improve your site’s UX, add meaningful content, or both.
There’s another advantage to mining on-site search data: it reveals the exact words your audience uses, not the terms your team assumes they use.
Historically, the challenge has been the time required to analyze it. I remember projects where I locked myself in a room for days, reviewing hundreds of thousands of queries line by line to find patterns — sorting, filtering, and clustering them by intent.
If you’ve done the same, you know the pattern. The first few dozen keywords represent unique concepts, but eventually you start seeing synonyms and variations.
All of this is buried treasure waiting to be explored. Your LLM can help. Here’s a sample prompt you can use:
You're an SEO strategist analyzing internal site search data.
My goal is to identify content opportunities from what users are searching for on my website – including both major themes and specific long-tail needs within those themes.
I have attached a list of site search queries exported from GA4. Please:
STEP 1 – Cluster by intent
Group the queries into logical intent-based themes.
STEP 2 – Identify long-tail signals inside each theme
Within each theme:
• Identify recurring modifiers (price, location, comparisons, troubleshooting, etc.)
• Identify specific entities mentioned (products, tools, features, audiences, problems)
• Call out rare but high-intent searches
• Highlight wording that suggests confusion or unmet expectations
STEP 3 – Generate content ideas
For each theme:
• Suggest 3 – 5 content ideas
• Include at least one long-tail content idea derived directly from the queries
• Include one “high-intent” content idea
• Include one “problem-solving” content idea
STEP 4 – Identify UX or navigation issues
Point out searches that suggest:
• Users cannot find existing content
• Misleading navigation labels
• Missing landing pages
Output format:
Theme:
Supporting queries:
Long-tail insights:
Content opportunities:
UX observations:
Again, customize this prompt based on what you know about your audience and how they search.
The detail matters. Many SEO practitioners stop at a prompt like “give me a list of topics for my clients,” but a detailed prompt like this one pushes the LLM beyond simple clustering to understand the intent behind the searches.
I used on-site search data because it’s one of the richest, most transparent, and most actionable sources. But similar prompts can uncover hidden value in other keyword lists, such as “striking distance” terms from Google Search Console or competitive keywords from Semrush.
Even better, if your organization keeps detailed customer interaction records (e.g., sales call notes, support tickets, chat transcripts), those can be more valuable. Unlike keyword datasets, they capture problems in full sentences, in the customer’s own words, often revealing objections, confusion, and edge cases that never appear in traditional keyword research.
3. Create the content yourself
Your goal is to create content so strong and authoritative that it’s picked up by sources like Common Crawl and survives the intense filtering AI companies apply when building LLM training sets. Realistically, only pioneering brands and recognized authorities can expect to operate in this rarefied space.
For the rest of us, the opportunity is creating high-quality long-tail content that ranks at the top across search engines — not just Google, but Bing, Brave, and even X.
This is one area where I wouldn’t rely on LLMs, at least not to generate content from scratch.
Why?
LLMs are sophisticated pattern matchers. They surface and remix information from across the internet, even obscure material. But they don’t produce genuinely original thought.
At best, LLMs synthesize. At worst, they hallucinate.
Many worry AI will take their jobs. And it will — for anyone who thinks “great content” means paraphrasing existing authority sources and competing with Wikipedia-level sites for broad head terms. Most brands will never be the primary authority on those terms. That’s OK.
The real opportunity is becoming the authority on specific, detailed, often overlooked questions your audience actually has. The long tail is still wide open for brands willing to create thoughtful, experience-driven content that doesn’t already exist everywhere else.
We need to face facts. The fat head is shrinking. The land rush is now for the “fat tail.” Here’s what brands need to do to succeed:
Dominate searches for your brand
Search your brand name in a keyword tool like Semrush and review the long-tail variations people type into Google. You’ll likely find more than misspellings. You’ll see detailed queries about pricing, alternatives, complaints, comparisons, and troubleshooting.
If you don’t create content that addresses these topics directly — the good and the bad — someone else will. It might be a Reddit thread from someone who barely knows your product, a competitor attacking your site, a negative Google Business Profile review, or a complaint on Trustpilot.
When people search your brand, your site should be the best place for honest, complete answers — even and especially when they aren’t flattering. If you don’t own the conversation, others will define it for you.
The time for “frequently asked questions” is over. You need to answer every question about your brand — frequent, infrequent, and everything in between.
Go long
Head terms in your industry have likely been dominated by top brands for years. That doesn’t mean the opportunity is gone.
Beneath those competitive terms is a vast layer of unbranded, long-tail searches that have likely been ignored. Your data will reveal them.
Review on-site search, Google Search Console queries, customer support questions, and forums like Reddit. These are real people asking real questions in their own words.
The challenge isn’t finding questions to write about. It’s delivering the best answers — not one-line responses to check a box, but clear explanations, practical examples, and content grounded in real experience that reflects what sets your brand apart.
Expertise is now a commodity: Lean into experience, authority, and trust
Publishing expert content still matters, but its role has changed. Today, anyone can generate “expert-sounding” articles with an LLM.
Whether that content ranks in Google is increasingly beside the point, as many users go straight to AI tools for answers.
As the “expertise” in E-E-A-T becomes table stakes, differentiation comes from what AI and competitors can’t easily replicate: experience, authority, and trust.
That means publishing:
Original insights and genuine thought leadership from people inside your company.
Real customer stories with measurable outcomes.
Transparent reviews and testimonials.
Evidence that your brand delivers what it promises.
This isn’t just about blog content. These signals should appear across your site — from your About page to product pages to customer support content. Every page should reinforce why a real person should trust your brand.
Stop paywalling your best content
I’m seeing more brands put their strongest content behind logins or paywalls. I understand why. Many need to protect intellectual property and preserve monetization. But as a long-term strategy, this often backfires.
If your content is truly valuable, the ideas will spread anyway. A subscriber may paraphrase it. An AI system may summarize it. A crawler may access it through technical workarounds. In the end, your insights circulate without attribution or brand lift.
When your best content is publicly accessible, it can be cited, linked to, indexed, and discussed. That visibility builds authority and trust over time.
In a search- and AI-driven ecosystem, discoverability often outweighs modest direct content monetization.
This doesn’t mean content businesses can’t charge for anything. It means being strategic about what you charge for. A strong model is to make core knowledge and thought leadership open while monetizing things such as:
Tools.
Community access.
Premium analysis or data.
Courses or certifications.
Implementation support.
Early access or deeper insights.
In other words, let your ideas spread freely and monetize the experience, expertise, and outcomes around them.
Stop viewing content as a necessary evil
I still see brands hiding content behind CSS “read more” links or stuffing blocks of “SEO copy” at the bottom of pages, hoping users won’t notice but search engines will.
Spoiler alert: they see it. They just don’t care.
Content isn’t something you add to check an SEO box or please a robot. Every word on your site must serve your customers. When content genuinely helps users understand, compare, and decide, it becomes an asset that builds trust and drives conversions.
If you’d be embarrassed for users to read your content, you’re thinking about it the wrong way. There’s no such thing as content that’s “bad for users but good for search engines.” There never was.
Embrace user-generated content
No article on long-tail SEO is complete without discussing user-generated content. I covered forums and Q&A sites in a previous article (see: The reign of forums: How AI made conversation king), and they remain one of the most efficient ways to generate authentic, unique content.
The concept is simple. You have an audience that’s already passionate and knowledgeable. They likely have more hands-on experience with your brand and industry than many writers you hire. They may already be talking about your brand offline, in customer communities, or on forums like Reddit.
Your goal is to bring some of those conversations onto your site.
User-generated content naturally produces the long-tail language marketing teams rarely create on their own. Customers:
Describe problems differently.
Ask unexpected questions.
Compare products in ways you didn’t anticipate.
Surface edge cases, troubleshooting scenarios, and real-world use cases that rarely appear in polished marketing copy.
This is exactly the kind of content long-tail SEO thrives on.
It’s also the kind of content AI systems and search engines increasingly recognize as credible because it reflects real experience rather than brand messaging many dismiss as inauthentic.
Brands that do this well don’t just capture long-tail traffic. They build trust, reduce support costs, and dominate long-tail searches and prompts.
In the age of AI-generated content, real human experience is one of the strongest differentiators.
The new SEO playbook looks a lot like the old one
For years, SEO has been shaped by the limits of the search box. Short queries and head terms dominated strategy, and long-tail content was often treated as optional.
LLMs are changing that dynamic. AI is expanding search, not eliminating it.
AI systems encourage people to express what they actually want to know. Those detailed prompts still need answers, and those answers come from the web.
That means the SEO opportunity is shifting from competing over a small set of keywords to becoming the best source of answers to thousands of specific questions.
Brands that succeed will:
Deeply understand their audience.
Publish genuinely useful content.
Build trust through real engagement and experience.
That’s always been the recipe for SEO success. But our industry has a habit of inventing complex tactics to avoid doing the simple work well.
Most of us remember doorway pages, exact match domains, PageRank sculpting, LSI obsession, waves of auto-generated pages, and more. Each promised an edge. Few replaced the value of helping users.
We’re likely to see the same cycle repeat in the AI era.
The reality is simpler. AI systems aren’t the audience. They’re intermediaries helping humans find trustworthy answers.
If you focus on helping people understand, decide, and solve problems, you’re already optimizing for AI — whatever you call it.
Why AI optimization is just long-tail SEO done right (published Feb. 17, 2026)
Over two months ago, Google began testing its AI-powered configuration tool, which lets you ask questions about Google Search Console performance reports in natural language and get answers back. Google is now rolling out the tool to everyone.
Google said on LinkedIn, “The Search Console’s new AI-powered configuration is now available to everyone!”
AI-powered configuration. The feature “lets you describe the analysis you want to see in natural language. Your inputs are then transformed into the appropriate filters and settings, instantly configuring the report for you,” Google said.
Rolling out now. If you log in to your Search Console account and click on the Performance report, you may see a note at the top that says “New! Customize your Performance report using AI.”
When you click on it, you get into the AI tool:
More details. As we reported earlier, Google said “The AI-powered configuration feature is designed to streamline your analysis by handling three key elements for you.”
Selecting metrics: Choose which of the four available metrics – Clicks, Impressions, Average CTR, and Average Position – to display based on your question.
Applying filters: Narrow down data by query, page, country, device, search appearance, or date range.
Configuring comparisons: Set up complex comparisons (like custom date ranges) without manual setup.
Why we care. This is only supported in the Performance report for Search results. It isn’t available for Discover or News reports yet. Plus, it is AI, so the answers may not be perfect. But it can be fun to play with, and it may get you thinking about things you haven’t considered yet.
Google Search Console AI-powered configuration rolling out (published Feb. 17, 2026)
Rand Fishkin’s conclusion – that AI tools produce wildly inconsistent brand recommendation lists, making “ranking position” a meaningless metric – is correct, well-evidenced, and long overdue.
But Fishkin stopped one step short of the answer that matters.
He didn’t explore why some brands appear consistently while others don’t, or what would move a brand from inconsistent to consistent visibility. That solution is already formalized, patent pending, and proven in production across 73 million brand profiles.
When I shared this with Fishkin directly, he agreed. The AI models are pulling from a semi-fixed set of options, and the consistency comes from the data. He just didn’t have the bandwidth to dig deeper, which is fair enough, but the digging has been done – I’ve been doing it for a decade.
Here’s what Fishkin found, what it actually means, and what the data proves about what to do about it.
Fishkin’s data killed the myth of AI ranking position
Fishkin and Patrick O’Donnell ran 2,961 prompts across ChatGPT, Claude, and Google AI, asking for brand recommendations across 12 categories. The findings were surprising for most.
Fewer than 1 in 100 runs produced the same list of brands, and fewer than 1 in 1,000 produced the same list in the same order. These are probability engines that generate unique answers every time. Treating them as deterministic ranking systems is – as Fishkin puts it – “provably nonsensical,” and I’ve been saying this since 2022. I’m grateful Fishkin finally proved it with data.
But Fishkin also found something he didn’t fully unpack. Visibility percentage – how often a brand appears across many runs of the same prompt – is statistically meaningful. Some brands showed up almost every time, while others barely appeared at all.
That variance is where the real story lies.
Fishkin acknowledged this but framed it as a better metric to track. The real question isn’t how to measure AI visibility, it’s why some brands achieve consistent visibility and others don’t, and what moves your brand from the inconsistent pile to the consistent pile.
That’s not a tracking problem. It’s a confidence problem.
AI systems are confidence engines, not recommendation engines
AI platforms – ChatGPT, Claude, Google AI, Perplexity, Gemini, all of them – generate every response by sampling from a probability distribution shaped by:
What the model knows.
How confidently it knows it.
What it retrieved at the moment of the query.
When the model is highly confident about an entity’s relevance, that entity appears consistently. When the model is uncertain, the entity sits at a low probability weight in the distribution – included in some samples, excluded in others – not because the selection is random but because the AI doesn’t have enough confidence to commit.
That’s the inconsistency Fishkin documented, and I recognized it immediately because I’ve been tracking exactly this pattern since 2015.
City of Hope appearing in 97% of cancer care responses isn’t luck. It’s the result of deep, corroborated, multi-source presence in exactly the data these systems consume.
The headphone brands at 55%-77% are in a middle zone – known, but not unambiguously dominant.
The brands at 5%-10% have low confidence weight, and the AI includes them in some outputs and not others because it lacks the confidence to commit consistently.
Confidence isn’t just about what a brand publishes or how it structures its content. It’s about where that brand stands relative to every other entity competing for the same query – a dimension I’ve recently formalized as Topical Position.
I’ve formalized this phenomenon as “cascading confidence” – the cumulative entity trust that builds or decays through every stage of the algorithmic pipeline, from the moment a bot discovers content to the moment an AI generates a recommendation. It’s the throughline concept in a framework I published this week.
Every piece of content passes through 10 gates before influencing an AI recommendation
The pipeline is called DSCRI-ARGDW – discovered, selected, crawled, rendered, indexed, annotated, recruited, grounded, displayed, and won. That sounds complicated, but I can summarize it in a single question that repeats at every stage: How confident is the system in this content?
Is this URL worth crawling?
Can it be rendered correctly?
What entities and relationships does it contain?
How sure is the system about those annotations?
When the AI needs to answer a question, which annotated content gets pulled from the index?
Confidence at each stage feeds the next. A URL from a well-structured, fast-rendering, semantically clean site arrives at the annotation stage with high accumulated confidence before a single word of content is analyzed. A URL from a slow, JavaScript-heavy site with inconsistent information arrives with low confidence, even if the actual content is excellent.
This is pipeline attenuation, and here’s where the math gets unforgiving. The relationship is multiplicative, not additive:
C_final = C_initial × ∏τᵢ
In plain English, the final confidence an AI system has in your brand equals the initial confidence from your entity home multiplied by the transfer coefficient at every stage of the pipeline. The entity home – the canonical web property that anchors your entity in every knowledge graph and every AI model – sets the starting confidence, and then each stage either preserves or erodes it.
Maintain 90% confidence at each of 10 stages, and end-to-end confidence is 0.9¹⁰ ≈ 35%. At 80% per stage, it’s 0.8¹⁰ ≈ 11%. One weak stage – say 50% at rendering because of heavy JavaScript – drops the total from 35% to 19%, even if every other stage is at 90%. One broken stage can undo the work of nine good ones.
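These attenuation numbers are easy to reproduce. This sketch simply applies the C_final formula with the per-stage coefficients from the examples above:

```python
from math import prod

def end_to_end_confidence(c_initial: float, transfer: list[float]) -> float:
    # C_final = C_initial x product of per-stage transfer coefficients.
    return c_initial * prod(transfer)

# Ten stages at 90% each: confidence collapses to roughly 35%.
uniform_90 = end_to_end_confidence(1.0, [0.9] * 10)

# Ten stages at 80% each: roughly 11%.
uniform_80 = end_to_end_confidence(1.0, [0.8] * 10)

# One weak stage (rendering at 50%) among nine at 90%: roughly 19%.
weak_render = end_to_end_confidence(1.0, [0.9] * 9 + [0.5])

print(f"{uniform_90:.0%} {uniform_80:.0%} {weak_render:.0%}")  # 35% 11% 19%
```

Because the relationship is multiplicative, raising the single weakest coefficient moves the final number far more than polishing the stages that are already strong.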
This multiplicative principle isn’t new, and it doesn’t belong to anyone. In 2019, I published an article, How Google Universal Search Ranking Works: Darwinism in Search, based on a direct explanation from Google’s Gary Illyes. He described how Google calculates ranking “bids” by multiplying individual factor scores rather than adding them. A zero on any factor kills the entire bid, no matter how strong the other factors are.
Google applies this multiplicative model to ranking factors within a single system, and nobody owns multiplication. But what the cascading confidence framework does is apply this principle across the full 10-stage pipeline, across all three knowledge graphs.
The system provides measurable transfer coefficients at every transition and bottleneck detection that identifies exactly where confidence is leaking. The math is universal, but the application to a multi-stage, multi-graph algorithmic pipeline is the invention.
This complete system is the subject of a patent application I filed with the INPI titled “Système et procédé d’optimisation de la confiance en cascade à travers un pipeline de traitement algorithmique multi-étapes et multi-graphes.” It’s not a metaphor, it’s an engineered system with an intellectual lineage going back seven years to a principle a Google engineer confirmed to me in person.
Fishkin measured the output – the inconsistency of recommendation lists. But the output is a symptom, and the cause is confidence loss at specific stages of this pipeline, compounded across multiple knowledge representations.
You can’t fix inconsistency by measuring it more precisely. You can only fix it by building confidence at every stage.
The corroboration threshold is where AI shifts from hesitant to assertive
There’s a specific transition point where AI behavior changes. I call it the “corroboration threshold” – the minimum number of independent, high-confidence sources corroborating the same conclusion about your brand before the AI commits to including it consistently.
Below the threshold, the AI hedges. It says “claims to be” instead of “is,” it includes a brand in some outputs but not others, and the reason isn’t randomness but insufficient confidence.
The brand sits in the low-confidence zone, where inconsistency is the predictable outcome. Above the threshold, the AI asserts – stating relevance as fact, including the brand consistently, operating with the kind of certainty that produces City of Hope’s 97%.
My data across 73 million brand profiles places this threshold at approximately 2-3 independent, high-confidence sources corroborating the same claim as the entity home. That number is deceptively small because “high-confidence” is doing the heavy lifting – these are sources the algorithm already trusts deeply, including Wikipedia, industry databases, and authoritative media.
Without those high-authority anchors, the threshold rises considerably because more sources are needed and each carries less individual weight. The threshold isn’t a one-time gate. Once crossed, the confidence compounds with every subsequent corroboration, which is why brands that cross it early pull further ahead over time, while brands that haven’t crossed it yet face an ever-widening gap.
Corroboration doesn’t require identical wording, but it does require equivalent conviction. The entity home states, “X is the leading authority on Y,” two or three independent, authoritative third-party sources confirm it with their own framing, and the AI encodes it as fact.
This fact is visible in my data, and it explains exactly why Fishkin’s experiment produced the results it did. In narrow categories like LA Volvo dealerships or SaaS cloud computing providers – where few brands exist and corroboration is dense – AI responses showed higher pairwise correlation.
In broad categories like science fiction novels – where thousands of options exist and corroboration is thin – responses were wildly diverse. The corroboration threshold aligns with Fishkin’s findings.
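As a toy illustration of the threshold logic: only independent, high-confidence sources count, which is why volume from a single origin never crosses it. The source types, origin names, and threshold value below are invented for the example; this is not a model of any actual AI system:

```python
# Illustrative sketch of the corroboration threshold: count independent,
# high-confidence sources backing the entity-home claim. Source categories
# and the threshold are assumptions for illustration only.

HIGH_CONFIDENCE = {"wikipedia", "industry-database", "authoritative-media"}

def corroboration(sources: list[dict]) -> int:
    # Only high-confidence source types count, and duplicates from the
    # same origin collapse to one: 600 articles seeded by one campaign
    # contribute nothing toward the threshold.
    origins = {s["origin"] for s in sources if s["type"] in HIGH_CONFIDENCE}
    return len(origins)

def ai_stance(sources: list[dict], threshold: int = 3) -> str:
    return "asserts" if corroboration(sources) >= threshold else "hedges"

seeded = [{"type": "press", "origin": "campaign-x"} for _ in range(600)]
organic = [
    {"type": "wikipedia", "origin": "wikipedia"},
    {"type": "industry-database", "origin": "trade-registry"},
    {"type": "authoritative-media", "origin": "national-paper"},
]

print(ai_stance(seeded))   # hedges: volume without independent trusted sources
print(ai_stance(organic))  # asserts: crosses the small corroboration threshold
```

The seeded campaign fails on both conditions at once, which is the pattern the Authoritas study below documents.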
Authoritas proved that fabricated entities can’t fool AI confidence systems
Authoritas published a study in December 2025 – “Can you fake it till you make it in the age of AI?” – that tested this directly, and the results confirm that Cascading Confidence isn’t just theory. Where Fishkin’s research shows the output problem – inconsistent lists – Authoritas shows the input side.
Authoritas investigated a real-world case where a UK company created 11 entirely fictional “experts” – made-up names, AI-generated headshots, faked credentials. They seeded these personas into more than 600 press articles across UK media, and the question was straightforward: Would AI models treat these fake entities as real experts?
The answer was absolute: Across nine AI models and 55 topic-based questions – “Who are the UK’s leading experts in X?” – zero fake experts appeared in any recommendation. Six hundred press articles, and not a single AI recommendation. That might seem to contradict a threshold of 2-3 sources, but it confirms it.
The threshold requires independent, high-confidence sources, and 600 press articles from a single seeding campaign are neither independent (they all trace to the same origin) nor high-confidence (press mentions live only in the document graph).
The AI models looked past the surface-level coverage and found no deep entity signals – no entity home, no knowledge graph presence, no conference history, no professional registration, no corroboration from the kind of authoritative sources that actually move the needle.
The fake personas had volume, they had mentions, but what they lacked was cascading confidence – the accumulated trust that builds through every stage of the pipeline. Volume without confidence means inconsistent appearance at best, while confidence without volume still produces recommendations.
AI evaluates confidence — it doesn’t count mentions. Confidence requires multi-source, multi-graph corroboration that fabricated entities fundamentally can’t build.
AI citability concentration increased 293% in under two months
Authoritas used the weighted citability score, or WCS, a metric that measures how much AI engines trust and cite entities, calculated across ChatGPT, Gemini, and Perplexity using cross-context questions.
I have no influence over their data collection or their results. Fishkin’s methodology and Authoritas’ aren’t identical. Fishkin pinged the same query repeatedly to measure variance, while Authoritas tracks varied queries on the same topic. That said, the directional finding is consistent.
Their dataset includes 143 recognized digital marketing experts, with full snapshots from the original study by Laurence O’Toole and Authoritas in December 2025 and their latest measurement on Feb. 2. The pattern across the entire dataset tells a story that goes far beyond individual scores.
The top 10 experts captured 30.9% of all citability in December. By February, they captured 59.5% – a 92% increase in concentration in under two months.
The HHI, or Herfindahl-Hirschman Index, the standard measure of market concentration, rose from 0.026 to 0.104 – a 293% increase in concentration. This happened while the total expert pool widened from 123 to 143 tracked entities.
More experts are being cited, the field is getting bigger, and the top is pulling away faster. Dominance is compounding while the long tail grows.
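The concentration figures above can be sanity-checked in a few lines of Python. This is a minimal sketch: the 0.026 and 0.104 HHI values, the 143-expert pool, and the top-10 shares come from the reported data, and the round 300% on these rounded inputs suggests the 293% figure was computed from unrounded scores.

```python
def hhi(shares):
    # Herfindahl-Hirschman Index: the sum of squared market shares,
    # with shares expressed as fractions that sum to 1.
    return sum(s * s for s in shares)

# Floor for a 143-expert pool: citability split perfectly evenly.
floor = hhi([1 / 143] * 143)  # == 1/143, about 0.007

# Percent change between the two reported HHI snapshots (Dec. vs. Feb.).
dec_hhi, feb_hhi = 0.026, 0.104
increase = (feb_hhi - dec_hhi) / dec_hhi * 100  # ~300% on these rounded values

# Same calculation for the top-10 share of all citability.
top10_increase = (59.5 - 30.9) / 30.9 * 100  # ~92.6%
```

Both December snapshots sit just a few multiples above the even-split floor, which is what makes the February jump so striking.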
This is cascading confidence at population scale. The experts who actively manage their digital footprint – clean entity home, corroborated claims, consistent narrative across the algorithmic trinity – aren’t just maintaining their position, they’re accelerating away from everyone else.
Each cycle of AI training and retrieval reinforces their advantage – confident entities generate confident AI outputs, which build user trust, which generate positive engagement signals, which further reinforce the AI’s confidence. It’s a flywheel, and once it’s spinning, it becomes very, very hard for competitors to catch up.
At the individual level, the data confirms the mechanism. I lead the dataset with a WCS of 23.50, up from 21.48 in December, a gain of 2.02. That’s not because I’m more famous than everyone else on the list.
It’s because we’ve been systematically building my cascading confidence for years – clean entity home, corroborated claims across the algorithmic trinity, consistent narrative, structured data, deep knowledge graph presence.
I’m the primary test case because I’m in control of all my variables – I have a huge head start. In a future article, I’ll dig into the details of the scores and why the experts have the scores they do.
The pattern across my client base mirrors the population data. Brands that systematically clean their digital footprint, anchor entity confidence through the entity home, and build corroboration across the algorithmic trinity don’t just appear in AI recommendations.
They appear consistently, their advantage compounds over time, and they exit the low-confidence zone to enter the self-reinforcing recommendation set.
AI retrieves from three knowledge representations simultaneously, not one
AI systems pull from what I call the Three Graphs model – the algorithmic trinity – and understanding this explains why some brands achieve near-universal visibility while others appear sporadically.
The entity graph, or knowledge graph, contains explicit entities with binary verified edges and low fuzziness – either a brand is in, or it’s not.
The document graph, or search engine index, contains annotated URLs with scored and ranked edges and medium fuzziness.
The concept graph, or LLM parametric knowledge, contains learned associations with high fuzziness, and this is where the inconsistency Fishkin documented comes from.
When retrieval systems combine results from multiple sources – and they do, using mechanisms analogous to reciprocal rank fusion – entities present across all three graphs receive a disproportionate boost.
The effect is multiplicative, not additive. A brand that has a strong presence in the knowledge graph and the document index and the concept space gets chosen far more reliably than a brand present in only one.
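The text names reciprocal rank fusion as an analogy for how results get combined, and a toy version shows why multi-graph presence compounds. This is a minimal sketch, not any vendor’s actual retrieval stack; the graph names and rank values are hypothetical.

```python
def rrf_score(ranks, k=60):
    # Reciprocal rank fusion: each source that returns the entity
    # contributes 1 / (k + rank); a source that misses the entity
    # contributes nothing. k=60 is the constant from the original
    # RRF paper (Cormack, Clarke and Buettcher, 2009).
    return sum(1.0 / (k + r) for r in ranks.values())

# Hypothetical brand present in all three graphs at middling ranks...
everywhere = rrf_score({"entity_graph": 3, "document_graph": 5, "concept_graph": 4})

# ...versus a brand that tops a single graph.
one_graph = rrf_score({"document_graph": 1})

# Three modest ranks beat one perfect rank.
assert everywhere > one_graph
```

Because a missing graph contributes zero to the sum, adding presence in a second or third graph raises the fused score faster than climbing ranks within a single graph ever can.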
This explains a pattern Fishkin noticed but didn’t have the framework to interpret – why visibility percentages clustered differently across categories. The brands with near-universal visibility aren’t just “more famous,” they have dense, corroborated presence across all three knowledge representations. The brands in the inconsistent pool are typically present in only one or two.
The Authoritas fake expert study confirms this from the negative side. The fake personas existed only in the document graph, press articles, with zero entity graph presence and negligible concept graph encoding. One graph out of three, and the AI treated them accordingly.
What I tell every brand after reading Fishkin’s data
Fishkin’s recommendations were cautious – visibility percentage is a reasonable metric, ranking position isn’t, and brands should demand transparent methodology from tracking vendors. All fair, but that’s analyst advice. What follows is practitioner advice, based on doing this work in production.
Stop optimizing outputs and start optimizing inputs
The entire AI tracking industry is fixated on measuring what AI says about you, which is like checking your blood pressure without treating the underlying condition. Measure if it helps, but the work is in building confidence at every stage of the pipeline, and that’s where I focus my clients’ attention from day one.
Start at the entity home
My experience clearly demonstrates that this single intervention produces the fastest measurable results. Your entity home is the canonical web property that should anchor your entity in every knowledge graph and every AI model. If it’s ambiguous, hedging, or inconsistent with what third-party sources say about you, it is actively training AI to be uncertain.
I’ve seen aligning the entity home with third-party corroboration produce measurable changes in bottom-of-funnel AI citation behavior within weeks, and it remains the highest ROI intervention I know.
Cross the corroboration threshold for the critical claims
I ask every client to identify the claims that matter most:
Who you are.
What you do.
Why you’re credible.
Then, I work with them to ensure each claim is corroborated by at least 2-3 independent, high-authority sources. Not just mentioned, but confirmed with conviction.
This is what flips AI from “sometimes includes” to “reliably includes,” and I’ve seen it happen often enough to know the threshold is real.
All three graphs need attention: knowledge graph presence (structured data, entity recognition), document graph presence (indexed, well-annotated content on authoritative sites), and concept graph presence (a consistent narrative across the corpus AI trains on).
The Authoritas study showed exactly what happens when a brand exists in only one – the AI treats it accordingly.
Work the pipeline from Gate 1, not Gate 9
Most SEO and GEO advice operates at the display stage, optimizing what AI shows. But if your content is losing confidence at discovery, selection, rendering, or annotation, it will never reach display consistently enough to matter.
I’ve watched brands spend months on display-stage optimization that produced nothing because the real bottleneck was three stages earlier, and I always start my diagnostic at the beginning of the pipeline, not the end.
Maintain it because the gap is widening
The WCS data across 143 tracked experts shows that AI citability concentration increased 293% in under two months. The experts who maintain their digital footprint are pulling away from everyone else at an accelerating rate.
Starting now still means starting early, but waiting means competing against entities whose advantage compounds every cycle. This isn’t a one-time project. It’s an ongoing discipline, and the returns compound with every iteration.
Fishkin proved the problem exists. The solution has been in production for a decade.
Fishkin’s research is a gift to the industry. He killed the myth of AI ranking position with data, he validated that visibility percentage, while imperfect, correlates with something real, and he raised the right questions about methodology that the AI tracking vendors should have been answering all along.
But tracking AI visibility without understanding why visibility varies is like tracking a stock price without understanding the business. The price is a signal, and the business is the thing.
AI recommendations are inconsistent when AI systems lack confidence in a brand. They become consistent when that confidence is built deliberately, through:
The entity home.
Corroborated claims that cross the corroboration threshold.
Multi-graph presence.
Every stage of the pipeline that processes your content before AI ever generates a response.
This isn’t speculation, and the evidence comes from every direction.
The process behind this approach has been under development since 2015 and is formalized in a peer-review-track academic paper. Several related patent applications have been filed in France, covering entity data structuring, prompt assembly, multi-platform coherence measurement, algorithmic barrier construction, and cascading confidence optimization.
The dataset supporting the work spans 25 billion data points across 73 million brand profiles. In tracked populations, shifts in AI citability have been observed — including cases where the top 10 experts increased their share from 31% to 60% in under two months while the overall field expanded. Independent research from Authoritas reports findings that align with this mechanism.
Fishkin proved the problem exists. My focus over the past decade has been on implementing and refining practical responses to it.
This is the first article in a series. The second piece, “What the AI expert rankings actually tell us: 8 archetypes of AI visibility,” examines how the pipeline’s effects manifest across 57 tracked experts. The third, “The ten gates between your content and an AI recommendation,” opens the DSCRI-ARGDW pipeline itself.