Schema.org for GenAI: What Actually Gets Measured

Schema.org has been around since 2011. AI engines are not asking for new schema; they are asking for the schema that already exists, in the right shape, on the right pages. This is the technical breakdown of which seven types matter, the @graph pattern that connects them, and what we actually observed across six real German brand audits.

Dr. Florian Steiner

Claude AI Consultant & Trainer

Summary. Schema.org has existed since 2011. The mistake most teams make in 2026 is treating AI-engine optimisation as if it needs a new vocabulary. It does not. It needs the existing schema vocabulary, applied to the seven types that matter, connected with the @graph pattern, served as JSON-LD with the right @id references. Across the six brand audits we discussed earlier, the gap between sites that AI engines can parse and sites that AI engines have to guess at was almost entirely explained by whether those seven types were present and correctly linked. This post is the technical version of that finding — what to ship, how to validate it, and which rabbit holes are not worth your time.

This is the third in a short series on AI visibility. The first post walked through why most Mittelstand sites are invisible to ChatGPT. The second post sorted the tool market into honest categories. This one is the technical playbook for the single highest-leverage fix on most sites.

The seven types that move the needle

Schema.org has more than 800 types. Reading the full hierarchy is a waste of time. For AI-engine readability, the working set is small: Organization, WebSite, WebPage, Article, Product or Service, FAQPage, BreadcrumbList. If you have a local business, add LocalBusiness with a geo and address block and you are done.

The pattern that almost no agency-built site gets right is the @graph envelope. Most sites ship one JSON-LD block per page that describes the page itself, and that is it. The @graph pattern lets you describe the site, the page, the breadcrumb and the primary entity in one structured payload, with @id references that tie them together. AI engines that parse JSON-LD then walk the graph instead of doing entity-extraction from prose. Walking a graph is cheap and unambiguous; extracting entities from prose is expensive and lossy. Models prefer the cheap, unambiguous path. They cite what they can parse with confidence.

Here is the minimum viable shape for a B2B Mittelstand homepage. Copy, adapt, ship.

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://example.com/#organization",
      "name": "Example GmbH",
      "url": "https://example.com",
      "logo": "https://example.com/logo.png",
      "sameAs": [
        "https://www.linkedin.com/company/example",
        "https://www.wikidata.org/wiki/Q123456"
      ],
      "description": "Industrial automation for the Mittelstand."
    },
    {
      "@type": "WebSite",
      "@id": "https://example.com/#website",
      "url": "https://example.com",
      "name": "Example",
      "publisher": { "@id": "https://example.com/#organization" }
    },
    {
      "@type": "WebPage",
      "@id": "https://example.com/#webpage",
      "url": "https://example.com",
      "name": "Industrial Automation — Example GmbH",
      "isPartOf": { "@id": "https://example.com/#website" },
      "about": { "@id": "https://example.com/#organization" }
    }
  ]
}

The @id references are the load-bearing wall. Without them you have three disconnected schemas. With them you have one graph that says "this page is about this organisation, which publishes this website". An AI engine that wants to summarise the page now has a typed entity to point at instead of a prose blob to interpret.
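
Because a single mistyped @id silently splits the graph, it can help to check the references mechanically before shipping. What follows is a minimal sketch, not a full JSON-LD processor: it assumes every node sits at the top level of @graph and that a reference is an object whose only key is "@id". The function name find_dangling_ids and the sample graph (with a deliberate typo) are illustrative.

```python
def find_dangling_ids(doc):
    """Return @id references in a JSON-LD @graph that no node defines."""
    nodes = doc.get("@graph", [])
    defined = {n["@id"] for n in nodes if isinstance(n, dict) and "@id" in n}
    referenced = set()

    def walk(value, is_top_level_node=False):
        if isinstance(value, dict):
            # An object whose only key is "@id" is a pointer to another node.
            if set(value) == {"@id"} and not is_top_level_node:
                referenced.add(value["@id"])
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    for node in nodes:
        walk(node, is_top_level_node=True)
    return sorted(referenced - defined)


# Illustrative graph: the WebPage points at "#site", which no node defines.
graph = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "Organization", "@id": "https://example.com/#organization"},
        {"@type": "WebSite", "@id": "https://example.com/#website",
         "publisher": {"@id": "https://example.com/#organization"}},
        {"@type": "WebPage", "@id": "https://example.com/#webpage",
         "isPartOf": {"@id": "https://example.com/#site"},
         "about": {"@id": "https://example.com/#organization"}},
    ],
}

print(find_dangling_ids(graph))  # ['https://example.com/#site']
```

An empty list means every reference resolves inside the same payload. References that intentionally point at nodes defined on another page would show up here too, so treat the output as a prompt to look, not as a verdict.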

What we actually saw in the six audits

Carwow had Vehicle and Product schema almost everywhere because the marketplace business demands it — without product structured data, the inventory cannot appear in Google Shopping or rich results, so the company shipped schema early. The other five brands hit the same wall: technically valid HTML, modern responsive design, and almost no JSON-LD beyond the implicit WebSite shape that some CMS plugins emit by default. None of them had an Organization block with sameAs references to Wikidata or LinkedIn. None had BreadcrumbList. The HDBW had EducationalOrganization references inside HTML attributes but not in a JSON-LD block, which is the worst of both worlds — visible to a human reader who inspects the source, invisible to a machine reader expecting the standard format.

The fix in every case was the same five-day workstream: add a centrally-injected @graph block to the layout template, populate it from the existing CMS fields, add BreadcrumbList to inner pages, add FAQPage to the customer-questions section where one existed. None of these sites needed new content. They needed the content they already had to be rendered in a parseable shape. The schema score on a re-scan eight weeks later moved from 11% to 70-80% on the brands that shipped the fix.
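
To make the "populate it from the existing CMS fields" step concrete, here is a rough sketch of a template helper. Everything about the input is an assumption: the field names (legal_name, profiles, trail and so on) are hypothetical stand-ins for whatever your CMS exposes, and the URLs are placeholders. The output follows the same @graph shape as the homepage example, plus a BreadcrumbList for inner pages.

```python
import json


def build_page_graph(site, page):
    """Assemble one @graph payload from CMS fields (field names are hypothetical)."""
    base = site["url"].rstrip("/")
    org_id = f"{base}/#organization"
    site_id = f"{base}/#website"
    page_id = f"{page['url']}#webpage"
    crumbs_id = f"{page['url']}#breadcrumb"

    breadcrumb = {
        "@type": "BreadcrumbList",
        "@id": crumbs_id,
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(page["trail"], start=1)
        ],
    }
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "@id": org_id, "name": site["legal_name"],
             "url": base, "sameAs": site["profiles"]},
            {"@type": "WebSite", "@id": site_id, "url": base,
             "name": site["name"], "publisher": {"@id": org_id}},
            {"@type": "WebPage", "@id": page_id, "url": page["url"],
             "name": page["title"], "isPartOf": {"@id": site_id},
             "breadcrumb": {"@id": crumbs_id}},
            breadcrumb,
        ],
    }


def to_script_tag(graph):
    """Serialise the graph for injection into the layout template."""
    payload = json.dumps(graph, ensure_ascii=False, indent=2)
    # Escape "</" so CMS text cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder data standing in for CMS fields.
SITE = {
    "url": "https://example.com",
    "name": "Example",
    "legal_name": "Example GmbH",
    "profiles": ["https://www.linkedin.com/company/example"],
}
PAGE = {
    "url": "https://example.com/services/retrofit/",
    "title": "Retrofit Services",
    "trail": [
        ("Home", "https://example.com/"),
        ("Services", "https://example.com/services/"),
        ("Retrofit", "https://example.com/services/retrofit/"),
    ],
}

print(to_script_tag(build_page_graph(SITE, PAGE)))
```

Generating the block in one place, from fields editors already maintain, is what keeps it in sync with the page. Hand-written JSON-LD per template tends to drift.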

The other lesson from the audits is that schema markup decays. Carwow had clean product schema and then a CMS migration two years ago that broke half of it. The team is good. The system is good. Nobody had a quarterly check on whether the schema was still being emitted correctly, and so the regression compounded silently. The remediation for that is not a heavier tool; it is one calendar event every quarter that says "validate schema on the top ten pages". The cheapest measurement habit beats the most expensive one-shot fix.
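
The quarterly check can be as small as a script. The sketch below uses only the Python standard library and works on HTML you have already fetched; how you download the top ten pages is left out on purpose. The REQUIRED_TYPES set and the sample markup are assumptions for illustration, so adjust them to what each page type should emit.

```python
import json
from html.parser import HTMLParser

# Assumption: the types every page is expected to emit.
REQUIRED_TYPES = {"Organization", "WebSite", "WebPage", "BreadcrumbList"}


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # a list while inside a JSON-LD script, else None

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def audit_page(html, required=REQUIRED_TYPES):
    """Report required types that are missing and JSON-LD blocks that fail to parse."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found, broken = set(), 0
    for raw in parser.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            broken += 1
            continue
        for top in (doc if isinstance(doc, list) else [doc]):
            if not isinstance(top, dict):
                continue
            for node in top.get("@graph", [top]):
                if isinstance(node, dict):
                    declared = node.get("@type", [])
                    found.update([declared] if isinstance(declared, str) else declared)
    return {"missing": sorted(set(required) - found), "broken_blocks": broken}


# Sample page: a valid graph without breadcrumbs, plus one block with a trailing comma.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [
  {"@type": "Organization", "@id": "https://example.com/#organization"},
  {"@type": "WebSite", "@id": "https://example.com/#website"},
  {"@type": "WebPage", "@id": "https://example.com/#webpage"}
]}
</script>
<script type="application/ld+json">{"@type": "FAQPage",}</script>
</head><body></body></html>
"""

print(audit_page(SAMPLE_HTML))  # {'missing': ['BreadcrumbList'], 'broken_blocks': 1}
```

Run against the same ten URLs every quarter, a change in either field is the regression signal. It does not replace the validators, it only tells you when to open them.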

How to validate

Three tools, in this order. Google's Rich Results Test for the per-page sanity check: it confirms that the JSON-LD parses and that the type is one Google recognises. The Schema Markup Validator at validator.schema.org for the strict spec-conformance check: it catches subtle errors like a missing @context or a wrong-cased type name. Then a crawler with structured-data extraction (we use the @type extraction in our own AI Visibility Engine; Screaming Frog also exposes a JSON-LD column) to see coverage across the site, not just on one URL. Most teams stop after the per-page check, miss that the staging template ships schema but the production template does not, and only catch it weeks later when a re-audit reveals the gap.
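
The two "subtle errors" named above are easy to pre-check locally before a page ever reaches a validator. This is a deliberately narrow sketch, not a substitute for the official tools: it only knows the working set from this post (the KNOWN_TYPES set is our assumption, not the full vocabulary) and it ignores every type outside that set.

```python
# Assumption: the working set from this post, not the full Schema.org vocabulary.
KNOWN_TYPES = {
    "Organization", "WebSite", "WebPage", "Article", "Product",
    "Service", "FAQPage", "BreadcrumbList", "LocalBusiness",
}
_BY_LOWER = {name.lower(): name for name in KNOWN_TYPES}
_SCHEMA_CONTEXTS = {"https://schema.org", "http://schema.org"}


def lint_jsonld(doc):
    """Flag a missing or unexpected @context and wrong-cased type names."""
    issues = []
    context = doc.get("@context")
    if context is None:
        issues.append("missing @context")
    elif isinstance(context, str) and context.rstrip("/") not in _SCHEMA_CONTEXTS:
        issues.append(f"unexpected @context: {context!r}")

    for node in doc.get("@graph", [doc]):
        declared = node.get("@type")
        names = declared if isinstance(declared, list) else [declared]
        for name in names:
            if name is None:
                issues.append("node without @type")
            elif name not in KNOWN_TYPES and name.lower() in _BY_LOWER:
                expected = _BY_LOWER[name.lower()]
                issues.append(f"wrong-cased type {name!r}, expected {expected!r}")
    return issues


# Illustrative payload with both problems.
broken = {
    "@graph": [
        {"@type": "Website", "@id": "https://example.com/#website"},
        {"@type": "organization", "@id": "https://example.com/#organization"},
    ]
}

for issue in lint_jsonld(broken):
    print(issue)
# missing @context
# wrong-cased type 'Website', expected 'WebSite'
# wrong-cased type 'organization', expected 'Organization'
```

Type names in JSON-LD are case-sensitive, which is why "Website" fails where "WebSite" works. A check like this fits well in CI, where it runs on every template change.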

The validation step that matters more than any tool is the round-trip test against an actual AI engine. Open ChatGPT, Claude or Perplexity in a private window, ask it to "summarise what [company name] does", and read the answer. If the model describes your category correctly and cites your homepage as the source, the schema is doing work. If the model summarises your category in three sentences without mentioning your brand, the schema is invisible. This is unambiguous, free and easy to repeat, and it is the test that closes the loop the SaaS dashboards charge for.

Three rabbit holes that are not worth your time

JSON-LD generators that emit twenty types per page. Some tools will generate schemas for Person, ContactPoint, Place, OpeningHoursSpecification and a dozen others on every page. More is not better. AI engines prefer a clean graph with the seven types above to a noisy graph with twenty types and broken @id references. The signal-to-noise ratio matters more than the absolute volume.

Custom schema types under your own namespace. Schema.org allows extension via custom types. AI engines, in practice, ignore anything outside the public vocabulary. Stick to the standard types. If your business genuinely needs a type that does not exist, write a Schema.org extension proposal — but ship the standard types in the meantime so the model has something to cite.

Microdata or RDFa instead of JSON-LD. All three formats are still valid. JSON-LD is what every major AI engine reads with the best accuracy. Microdata and RDFa are legacy formats that some CMSes default to. Migrating off them is a one-day job and a permanent improvement.
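
Before planning that migration, it helps to know which pages still carry the older formats. The sketch below is a rough detector using only the Python standard library. The attribute heuristics are our assumption: itemscope or itemtype signals Microdata, typeof or vocab signals RDFa. The property attribute is deliberately ignored because Open Graph meta tags use it too. The sample markup is invented.

```python
from html.parser import HTMLParser


class MarkupFormatDetector(HTMLParser):
    """Records which structured-data syntaxes appear in a page."""

    def __init__(self):
        super().__init__()
        self.formats = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # attribute names arrive lowercased
        script_type = (attr.get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self.formats.add("json-ld")
        if "itemscope" in attr or "itemtype" in attr:
            self.formats.add("microdata")
        if "typeof" in attr or "vocab" in attr:
            self.formats.add("rdfa")


def detect_formats(html):
    """Return the sorted list of structured-data formats found in the HTML."""
    detector = MarkupFormatDetector()
    detector.feed(html)
    detector.close()
    return sorted(detector.formats)


# Invented example: entity data in HTML attributes only, no JSON-LD block.
LEGACY_HTML = """
<div itemscope itemtype="https://schema.org/EducationalOrganization">
  <span itemprop="name">Example University</span>
</div>
"""

print(detect_formats(LEGACY_HTML))  # ['microdata']
```

A page that reports only microdata or rdfa is a migration candidate. A page that reports both a legacy format and json-ld is worth a closer look for conflicting statements about the same entity.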

The next eight weeks

The work, ordered by leverage: add an @graph block to the site-wide layout template with Organization, WebSite, WebPage (one day). Add BreadcrumbList to product, service or insight pages (half a day). Add LocalBusiness if applicable, with geo and address (half a day). Add FAQPage to any page with a Q&A section (half a day). Run the validators. Wait eight weeks. Re-scan. If the schema score has moved and the round-trip ChatGPT test starts citing your homepage, ship the same pattern to the rest of the site. If it has not, the issue is not schema; it is one of the other two structural reasons from the first post — crawler configuration or source absence. Diagnose accordingly.
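
For the two half-day items that have no example yet, FAQPage and LocalBusiness, the node shapes can be sketched as follows. The property names are standard Schema.org vocabulary. The helper names, the @id suffixes, the address, the coordinates and the question text are placeholders we made up for illustration.

```python
import json


def faq_page(page_url, qa_pairs):
    """Build a FAQPage node from (question, answer) pairs already on the page."""
    return {
        "@type": "FAQPage",
        "@id": f"{page_url}#faq",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def local_business(base_url, name, street, postal_code, city, country, lat, lon):
    """Build a LocalBusiness node with the address and geo blocks."""
    return {
        "@type": "LocalBusiness",
        "@id": f"{base_url}/#localbusiness",
        "name": name,
        "url": base_url,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "postalCode": postal_code,
            "addressLocality": city,
            "addressCountry": country,
        },
        "geo": {"@type": "GeoCoordinates", "latitude": lat, "longitude": lon},
    }


# Placeholder values throughout; replace with real CMS data.
graph = {
    "@context": "https://schema.org",
    "@graph": [
        local_business("https://example.com", "Example GmbH", "Musterstraße 1",
                       "80331", "München", "DE", 48.137, 11.575),
        faq_page("https://example.com/support/",
                 [("Do you retrofit existing lines?", "Yes, on request.")]),
    ],
}

print(json.dumps(graph, ensure_ascii=False, indent=2))
```

The answers in FAQPage should match the text a visitor actually sees on the page. Markup that says something the page does not is the kind of noise the rabbit-hole section warns about.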

The audit run we did on six brands is reproducible against your own domain at produktentdecker.com/ai-visibility. The Schema section of the report breaks down exactly which of the seven types are present, which are missing, and which pages emit broken JSON-LD. That report is the input. The five-day workstream above is the output. The re-scan eight weeks later is the proof.

Dr. Florian Steiner

Claude AI Consultant, Trainer and Speaker. Anthropic Community Ambassador Munich. I help product teams adopt Claude Code productively.
