Most AI crawlers — GPTBot, ClaudeBot, PerplexityBot and others — fetch your raw HTML and do not execute JavaScript. If your content, titles or product details only appear after scripts run in a browser, AI engines see an empty shell and cannot cite you. The fix is server-side rendering or edge-injected HTML, so your facts live in the initial response.
The five-minute test that ruins someone’s day
Open a terminal and fetch your homepage the way a crawler does — no browser, no scripts:
curl -sL https://yoursite.com | grep -i "your product name"
If your core copy comes back, breathe. If what comes back is a skeleton — a div called "root", a wall of script tags, none of your actual words — then every crawler that doesn't execute JavaScript is reading that skeleton. For a growing share of the modern web, that's exactly what happens. We run this test inside every audit, and "the site is invisible in raw HTML" is the single most common critical finding.
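If you want to check several phrases at once, the one-liner above can be wrapped in a small script. This is a minimal sketch, not a finished tool: the function names, the URL and the phrases are placeholders invented for illustration.

```shell
# Sketch: check whether key phrases appear in a page's raw HTML.
# All function names here are hypothetical helpers.

# fetch_raw URL -> prints the response body as a non-rendering client sees it.
# -s hides progress output, -L follows ordinary redirects (e.g. to www).
fetch_raw() {
  curl -sL "$1"
}

# report_phrases PHRASE... -> reads HTML on stdin once, then prints
# FOUND or MISSING for each phrase (case-insensitive, literal match).
report_phrases() (
  html=$(cat)
  for phrase in "$@"; do
    if printf '%s' "$html" | grep -qiF -- "$phrase"; then
      printf 'FOUND    %s\n' "$phrase"
    else
      printf 'MISSING  %s\n' "$phrase"
    fi
  done
)

# Usage (placeholder URL and phrases):
#   fetch_raw https://yoursite.com | report_phrases "Acme Widgets" "Free shipping"
```

A page that prints MISSING for every phrase you care about is the skeleton case described above.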
There's a second, sneakier version of the failure: sites whose auth or consent middleware redirect-loops any visitor without cookies. The crawler never even gets the skeleton — it gets bounced around a handshake until it gives up. Your site works perfectly in every browser, and no machine can read it at all.
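A rough way to spot this failure is to fetch the page with no cookie jar and a cap on redirects. The sketch below assumes a limit of 10 redirects and uses hypothetical function names; curl exits with code 47 when it hits the redirect limit, which is the signal to look for.

```shell
# Sketch: detect a cookie-less redirect loop. Function names are hypothetical.

# probe_redirects URL -> follows at most 10 redirects without storing cookies,
# much like a stateless crawler. Prints:
#   "<curl exit code> <final HTTP status> <redirect count> <final URL>"
probe_redirects() (
  out=$(curl -s -o /dev/null -L --max-redirs 10 \
        -w '%{http_code} %{num_redirects} %{url_effective}' "$1")
  rc=$?
  printf '%s %s\n' "$rc" "$out"
)

# classify_probe EXIT_CODE STATUS -> one-word verdict.
classify_probe() {
  if [ "$1" -eq 47 ]; then
    echo "too-many-redirects"   # curl exit 47: redirect limit reached
  elif [ "$1" -ne 0 ]; then
    echo "fetch-failed"         # DNS, TLS, timeout and similar
  elif [ "$2" -ge 200 ] && [ "$2" -lt 300 ]; then
    echo "ok"                   # a 2xx can still be a login page: check the body too
  else
    echo "not-ok"
  fi
}

# Usage (placeholder URL):
#   set -- $(probe_redirects https://yoursite.com)
#   classify_probe "$1" "$2"
```

A "too-many-redirects" verdict from a cookie-less fetch, on a site that loads fine in a browser, is a strong hint that middleware is bouncing crawlers.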
Why Google forgives you and AI engines don’t
Google spent fifteen years and a fortune building a rendering pipeline: it fetches your HTML, queues the page, and eventually executes the JavaScript in a headless browser to see what a human sees. Slow, expensive — but it mostly works. This bred a generation of sites that lean entirely on client-side rendering and got away with it.
AI crawlers didn't inherit that machinery. They crawl at enormous scale on tight budgets, and running every page through a headless browser costs far more than a plain HTTP fetch. So the major AI crawlers read raw HTML, take what's there, and move on. No queue, no second pass, no mercy.
The result is a quiet inversion: a site can be in perfect standing with Google and completely absent from the data that AI engines learn from and cite. You won't see it in any dashboard. You'll just never be the answer.
Who’s most at risk
- JavaScript-framework sites without server rendering. Single-page apps where the HTML response is an empty shell and everything paints client-side.
- Sites behind aggressive middleware. Auth handshakes, bot walls and consent gates that bounce cookie-less visitors — including every AI crawler — before content is served.
- Pages that lazy-load their substance. Reviews, pricing tables and product details fetched after page load are invisible in the initial response.
- Widget-dependent content. If your testimonials, menus or booking info live inside a third-party embed, they're often not in your HTML at all.
Website builders and frameworks, by contrast, vary. Classic server-rendered platforms generally pass. Modern frameworks pass only if server-side rendering or static generation is actually configured, which is precisely the setting teams disable by accident.
The fix: put your facts in the first response
There are three levels of repair, in ascending order of effort:
- Turn on server rendering where you already have it. Most modern frameworks support SSR or static export. The content exists; it just needs to be rendered before shipping instead of after.
- Inject critical HTML at the edge. When you can't rebuild the site, a CDN-level rewrite can insert the essential facts — titles, descriptions, product data, schema — directly into the HTML response as it passes through. This is how we fix client sites without touching their codebase: the fix lives in the raw HTML permanently, visible to Google and every AI crawler. (Pixel-based SEO tools can't do this — they're JavaScript too, invisible to the same crawlers.)
- Whitelist the crawlers your middleware is bouncing. If auth or consent layers intercept document requests, exempt known crawler user agents — or better, serve the public content to everyone and gate only what's actually private.
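To picture what the edge-injection repair does to a response, here is a local simulation in shell. It is only an illustration of the transform: real CDN platforms expose their own rewriting APIs, and the function name and the snippet below are hypothetical.

```shell
# Sketch: simulate an edge rewrite by inserting a snippet before the first
# </head> in an HTML stream. inject_into_head is a hypothetical helper.

# inject_into_head SNIPPET -> reads HTML on stdin, writes rewritten HTML.
# The snippet is passed through the environment so awk does not
# interpret backslashes inside it (useful for JSON-LD).
inject_into_head() {
  SNIPPET="$1" awk '
    !done && (i = index(tolower($0), "</head>")) {
      print substr($0, 1, i - 1) ENVIRON["SNIPPET"] substr($0, i)
      done = 1
      next
    }
    { print }
  '
}

# Usage (placeholder URL and snippet):
#   curl -sL https://yoursite.com \
#     | inject_into_head '<title>Acme Widgets</title>' \
#     | grep -i "<title"
```

The point of the simulation is the outcome: after the rewrite, the facts are in the bytes a crawler receives, with no script needed to reveal them.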
Verify it like an engine would
After any fix, test like the machines do, not like a human:
- Fetch with curl and confirm your money copy is in the response body.
- Fetch with the actual AI crawler user-agent strings — some firewalls treat them differently than browsers.
- Check that titles, meta descriptions, headings and JSON-LD schema are present in raw HTML, not injected later by scripts.
- Re-test monthly. Deploys, new middleware and "performance optimizations" reintroduce this bug constantly.
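The checklist above can be scripted roughly like this. Treat it as a sketch under assumptions: the user-agent values are simplified tokens (real crawler strings are longer, so copy them from each vendor's documentation), the function names are hypothetical, and the grep patterns are line-based heuristics that will miss tags split across lines.

```shell
# Sketch: audit raw HTML for the elements a crawler needs.
# Function names are hypothetical; patterns are heuristics, not an HTML parser.

# fetch_as USER_AGENT URL -> raw response body fetched with that user agent.
fetch_as() {
  curl -sL -A "$1" "$2"
}

# report_element LABEL REGEX HTML -> prints "present" or "absent" for LABEL.
report_element() {
  if printf '%s' "$3" | grep -qiE -- "$2"; then
    printf 'present  %s\n' "$1"
  else
    printf 'absent   %s\n' "$1"
  fi
}

# audit_raw_html -> reads HTML on stdin and reports four basic elements.
audit_raw_html() (
  html=$(cat)
  report_element "title"            '<title[^>]*>[^<]+' "$html"
  report_element "meta description" '<meta[^>]+name=["'\'']description["'\'']' "$html"
  report_element "h1"               '<h1[ >]' "$html"
  report_element "JSON-LD"          '<script[^>]+type=["'\'']application/ld\+json' "$html"
)

# Usage (placeholder URL; simplified user-agent tokens):
#   for ua in GPTBot ClaudeBot PerplexityBot; do
#     echo "== $ua"
#     fetch_as "$ua" https://yoursite.com | audit_raw_html
#   done
```

If the report differs between user agents, a firewall or bot wall is probably treating crawlers differently from browsers.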
The bar is honestly low: serve your words in your HTML. It's just that almost nobody checks — which makes it one of the highest-leverage fixes in modern marketing.
Questions people ask
AI crawlers generally do not execute JavaScript. The major ones fetch the raw HTML response and do not run client-side scripts the way browsers do, so content that only appears after JavaScript executes is effectively invisible to them, even when Google, which does render JavaScript, sees it fine.
To check whether AI crawlers can read your site, fetch your pages without a browser (for example with curl) and check whether your actual content appears in the response. If you see an empty application shell or script tags instead of your copy, AI crawlers see the same nothing. Also confirm crawler requests are not being redirect-looped by auth or consent middleware.
Edge injection is technical SEO applied at the CDN level: as your HTML passes through the edge network, the missing elements (titles, descriptions, structured data, key facts) are inserted server-side into the response itself. Because the fix lives in raw HTML, it is visible to Google and to AI crawlers that never execute JavaScript.
Want this done for you?
Everything in this post is what our engine does daily for the brands we run. If reading it felt like work — that’s what we’re for.