Designing a GraphQL supergraph your AI agents can safely use


Written by Seb Potter, Strategist

Agents are blocked by interfaces, not intelligence

You could be forgiven for thinking that the hardest problems in agentic commerce live inside the model. The story we tell ourselves is one of ever larger parameter counts, clever new architectures and a steady rise in benchmark scores.

Look a little closer at what happens inside real organisations and a different picture appears. Most teams can produce a convincing agent demo in a week. Give a model a few hand-picked APIs, wrap it in a polished interface, and it will quite happily talk its way through a plausible shopping journey.

The difficulties begin when you try to connect that same agent to the systems that actually run the business. Product data sits in several places at once. Prices live under their own set of rules. Stock, entitlement, tax, promotions and fulfilment all have their own APIs, their own failure modes and their own interpretations of what a “product” or an “order” really is.

In that environment, an agent does not meet a clean digital twin of your organisation. It meets a collection of historical compromises. If you let it roam freely across those boundaries, you are not only asking it to understand user intent. You are also asking it to reconcile a fragmented domain model, in real time, with no guarantees about what is safe to do.

This article looks at another direction. Instead of wiring agents directly to dozens of microservices or SaaS platforms, you design a single, typed GraphQL supergraph that represents your commerce domain. Human channels and internal tools use this graph first. AI agents become just another client. Orchestration, safety and compliance remain in a deterministic platform layer that you own, where they can be reasoned about, tested and changed in the open.


GraphQL as the AI surface: why the schema matters

Large language models work with text, but not only with the kind that reads well in a sentence. They handle structured formats confidently, as long as the structure is consistent and well defined. This is where GraphQL starts to matter as an interface for agents.

A GraphQL API comes with something most REST ecosystems never quite manage: a complete, machine-readable description of itself. Through introspection, a client can ask the API what types exist, which fields they expose and which arguments they expect. For a model, this is the difference between guessing at a private language and being handed a grammar.
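To make the "handed a grammar" idea concrete, here is a minimal sketch of how a client could condense an introspection result into a compact type summary for a model's context. The `introspection` dict is a hand-written stand-in for (part of) the response to the standard `__schema` query, not a live server response.

```python
# Hand-written stand-in for a (partial) introspection result.
introspection = {
    "types": [
        {"name": "Product", "fields": [
            {"name": "id", "type": "ID!"},
            {"name": "price", "type": "Price!"},
        ]},
        {"name": "Price", "fields": [
            {"name": "amount", "type": "Int!"},
            {"name": "currency", "type": "String!"},
        ]},
    ]
}

def summarise(schema: dict) -> str:
    """Render each type as `Name { field: Type, ... }` on one line."""
    lines = []
    for t in schema["types"]:
        fields = ", ".join(f"{f['name']}: {f['type']}" for f in t["fields"])
        lines.append(f"{t['name']} {{ {fields} }}")
    return "\n".join(lines)

print(summarise(introspection))
```

A summary like this can be refreshed automatically whenever the schema changes, so the agent's view of the grammar never drifts from the API itself.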

In a typical composable stack, the picture is messier. Commerce, PIM, search, pricing and content systems each describe products and customers in their own way. One API talks about SKUs, another about articles, a third about catalogue items. From a human engineer's point of view, those differences are manageable. For an agent that has to act quickly and safely, they are a source of avoidable confusion.

A supergraph lets you normalise this view. You choose to have a single Product type, a single Price, a single Customer, whether the underlying data comes from one system or several. Relationships between them are expressed as fields in the schema instead of hand-written glue in every client. A Basket contains LineItems. Each LineItem refers to a Product. A Price has a value and a currency. The schema becomes the shared vocabulary for how your organisation talks about itself.

To make this more concrete, imagine that the supergraph exposes part of your commerce domain like this:

type Product {
  id: ID!
  sku: String!
  name: String!
  price: Price!
  availability: Availability!
}

type Price {
  amount: Int!
  currency: String!
}

type Availability {
  inStock: Boolean!
  availableQuantity: Int!
}

type Basket {
  id: ID!
  lines: [LineItem!]!
  total: Price!
}

type LineItem {
  product: Product!
  quantity: Int!
  lineTotal: Price!
}

input AddToBasketInput {
  basketId: ID!
  productId: ID!
  quantity: Int!
}

type Mutation {
  addToBasket(input: AddToBasketInput!): Basket!
}

An agent that wants to help a customer add the cheapest in-stock variant of a product to their basket does not need to guess endpoint names or payload shapes. It can:

  1. Query products that match the user’s description.

  2. Filter on availability.inStock and price.amount.

  3. Call addToBasket with the chosen productId and a sensible quantity.

The fact that AddToBasketInput requires a basketId, a productId and a quantity gives the agent a clear contract. Any attempt to do less fails in a predictable way.
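The three steps above can be sketched as agent-side logic. The `candidates` list below stands in for products returned by a catalogue query, shaped like the schema; the ids and basket id are illustrative.

```python
# Products returned by a hypothetical catalogue query (step 1).
candidates = [
    {"id": "p1", "price": {"amount": 1299, "currency": "EUR"},
     "availability": {"inStock": True, "availableQuantity": 4}},
    {"id": "p2", "price": {"amount": 999, "currency": "EUR"},
     "availability": {"inStock": False, "availableQuantity": 0}},
    {"id": "p3", "price": {"amount": 1099, "currency": "EUR"},
     "availability": {"inStock": True, "availableQuantity": 12}},
]

# Step 2: keep only in-stock items, then pick the cheapest.
in_stock = [p for p in candidates if p["availability"]["inStock"]]
cheapest = min(in_stock, key=lambda p: p["price"]["amount"])

# Step 3: variables for the addToBasket mutation, matching AddToBasketInput.
variables = {"input": {"basketId": "basket-1",
                       "productId": cheapest["id"],
                       "quantity": 1}}
print(variables["input"]["productId"])  # → p3
```

Note that the cheapest product overall (p2) is skipped because it fails the availability filter; the schema's shape makes that check hard to forget.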

The type system does more than keep front-end developers honest. Non-null fields, enums and input types all act as guard rails. A CreateBasketInput that requires at least one LineItemInput prevents empty baskets. An enum for FulfilmentMethod stops the agent inventing delivery options your logistics cannot support.

The separation between queries and mutations adds another boundary. Queries are questions. Mutations are requests to change something in the world. That distinction gives you a natural place to draw a line between what an agent may explore freely and what requires tighter control.

Once you treat the supergraph as the AI surface, schema design stops being a convenience and becomes a form of governance. The decision to expose or hide a field, to offer or withhold a mutation, is a decision about what any client, human or agent, is allowed to do.


Safety and observability in the graph

When people worry about agents in production, they rarely start with the happy path. The concern is almost always the same: what happens when something goes wrong? A misplaced promotion, an unexpected price, a stock level misread at the wrong moment. In a human interface, these errors are often caught by someone raising an eyebrow. An agent will not hesitate in that way.

GraphQL does not remove this risk, but it gives you useful levers.

On the read side, the main concerns are privacy and performance. You can allow agents to explore product catalogues, stock levels, pricing options and a customer's own data within the boundaries of their authorisation. Field-level rules in the gateway ensure that cost prices or internal notes never appear in responses to external clients. Within those limits, queries are where agents learn how your domain behaves.

On the write side, you move business rules out of prompts and into resolvers. A placeOrder mutation insists that totals are recalculated by the pricing service, that promotions are re-validated and that stock is confirmed before an order moves beyond a draft state. No amount of model creativity bypasses those checks if they are enforced in the platform.
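A hedged sketch of what "recalculated by the platform, not trusted from the client" looks like in a resolver. The pricing and stock calls here are hypothetical stand-ins for your downstream services, and the draft/placed states are illustrative.

```python
def price_service_total(lines):
    # Hypothetical downstream pricing call: re-derive the total server-side.
    return sum(l["unit_amount"] * l["quantity"] for l in lines)

def stock_confirmed(lines):
    # Hypothetical downstream stock call: confirm every line is available.
    return all(l["quantity"] <= l["available"] for l in lines)

def place_order(basket, claimed_total):
    # Ignore any total supplied by the client (or the agent): recalculate it.
    total = price_service_total(basket["lines"])
    if not stock_confirmed(basket["lines"]):
        raise ValueError("stock not confirmed")
    if claimed_total != total:
        # The order stays a draft; the discrepancy is surfaced, not trusted.
        return {"state": "DRAFT", "total": total, "warnings": ["total mismatch"]}
    return {"state": "PLACED", "total": total, "warnings": []}

basket = {"lines": [{"unit_amount": 999, "quantity": 2, "available": 5}]}
print(place_order(basket, claimed_total=1500)["state"])  # → DRAFT
```

However persuasive a prompt injection might be, a claimed total that does not match the recalculated one never produces a placed order.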

Today, everything in the schema is consumable by any client — there is no built-in distinction between a human session and an agent session. But the building blocks for finer-grained control are already in place. Tools like Hive can manage permissions per endpoint and per client, which means you could express a rule like "this client may call createOrderDraft but not confirmOrder" without changing the schema itself.

In practice, a two-step mutation design makes that kind of boundary easy to introduce when you are ready:

type Mutation {
  createOrderDraft(basketId: ID!): OrderDraft!
  confirmOrder(orderDraftId: ID!): Order!
}

type OrderDraft {
  id: ID!
  basket: Basket!
  total: Price!
  warnings: [String!]!
}

An agent calls createOrderDraft. The gateway and downstream services recalculate totals, re-validate promotions, check stock and delivery options, and attach any warnings a human should see. A separate client — a human-facing UI, governed by a different permission set — calls confirmOrder only after showing the full draft back to the customer.

The schema does not need to know which caller is human and which is an agent. That decision lives in the permission layer, and it can evolve independently as your confidence in agent-driven workflows grows.
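A permission layer of that kind can be as simple as a per-client allow-list checked at the gateway. The sketch below is in the spirit of the Hive-style rule described earlier; the client names and the table itself are illustrative, not a real product API.

```python
# Per-client operation permissions, checked before any resolver runs.
PERMISSIONS = {
    "shopping-agent": {"createOrderDraft"},
    "checkout-ui": {"createOrderDraft", "confirmOrder"},
}

def authorise(client_id: str, operation: str) -> bool:
    """Unknown clients get no permissions at all."""
    return operation in PERMISSIONS.get(client_id, set())

assert authorise("shopping-agent", "createOrderDraft")
assert not authorise("shopping-agent", "confirmOrder")  # draft only
assert authorise("checkout-ui", "confirmOrder")
```

Widening an agent's abilities later is a one-line change to the table, with no schema migration and no redeployment of the agent itself.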

Guardrails extend to scale and frequency. Rate limits and quotas set expectations about how much an agent can do in a given window of time. An internal support agent may run many diagnostic queries but create only a limited number of returns without human review. Because all traffic flows through the gateway, these limits apply consistently no matter which framework or model is making the call.
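As a sketch of that kind of limit, here is a minimal fixed-window quota keyed by client and operation. A real gateway would back this with a shared store and rolling windows; the counts here live in process memory purely for illustration.

```python
from collections import defaultdict

class Quota:
    """Fixed-window quota: at most `limit` calls per (client, operation)."""

    def __init__(self, limit_per_window: int):
        self.limit = limit_per_window
        self.counts = defaultdict(int)  # (client, operation) -> calls so far

    def allow(self, client: str, operation: str) -> bool:
        key = (client, operation)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True

quota = Quota(limit_per_window=2)  # e.g. two returns without human review
assert quota.allow("support-agent", "createReturn")
assert quota.allow("support-agent", "createReturn")
assert not quota.allow("support-agent", "createReturn")  # third call blocked
```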

Once agents are live, you need to see what they are doing. A single supergraph gives you a natural vantage point. Every interaction between an agent and your domain passes through the gateway. Each GraphQL document, the variables supplied with it and the caller identity are visible there. You can log them, sample them and feed them into metrics and traces.

If you attach correlation identifiers from the AI layer to each request, you can trace a line from a user's instruction, through the agent's internal steps, to the specific queries and mutations it executed. Field-level usage metrics show which parts of the schema agents depend on and where they struggle. For experimentation, you can direct a small fraction of agent traffic to a staging graph, or to a version of the schema with additional fields, without reworking how the agent calls tools.
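The wiring for that trace can be as simple as a header carried on every request. The header name and request shape below are illustrative conventions, not part of GraphQL itself.

```python
import json
import uuid

def build_request(query: str, variables: dict, correlation_id: str) -> dict:
    """Package a GraphQL document with the correlation id from the AI layer."""
    return {
        "headers": {"x-correlation-id": correlation_id},
        "body": json.dumps({"query": query, "variables": variables}),
    }

# The AI layer mints one id per user instruction and reuses it for every
# query and mutation the agent issues while acting on that instruction.
cid = str(uuid.uuid4())
req = build_request("query { products { id } }", {}, cid)
assert req["headers"]["x-correlation-id"] == cid
```

Gateway logs keyed on that header can then be joined back to the agent's own trace of its reasoning steps.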

The result is not perfect safety. No architecture can promise that. It does provide clearer instruments. Queries, mutations and the rules wrapped around them become a shared language for discussing risk. Observability turns agent behaviour from a mystery into another stream of traffic you can understand and, when necessary, change.


Fitting MCP and agent frameworks around the graph

As soon as you talk about agents and tools, Model Context Protocol and similar standards enter the discussion. They exist for good reasons. If every model, IDE and assistant invented its own way of discovering and calling tools, the result would be chaotic.

MCP addresses the transport problem. It defines how tools describe themselves, how a host discovers them and how a model can call them in a consistent way. That makes it easier to plug an existing capability into many AI-enabled clients without writing a bespoke integration for each one.

What MCP does not do is decide how your domain is structured. It does not know what a Product should contain in your organisation, which combinations of actions are safe, or how pricing and stock relate to each other. Those questions still belong to the systems behind the protocol.

You do not have to choose between them. MCP can handle registration, discovery and invocation. The supergraph can stay focused on describing the domain in a way that makes sense to your organisation and to your agents.

In practice, this often leads to a simple pattern. You expose the supergraph as a single MCP tool. From the model's point of view there is one capability that accepts GraphQL documents as input and returns results from your domain. The gateway resolves those documents against the many services that power your platform.

In practice, the tool description might look something like:

{
  "name": "commerce_graph",
  "description": "Execute GraphQL queries and mutations against the commerce supergraph.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": { "type": "string" },
      "variables": { "type": "object" }
    },
    "required": ["query"]
  }
}

A single tool that accepts arbitrary GraphQL is the simplest starting point, but it is not the only option. Some teams are using schema annotations such as @mcpTool to generate a discrete MCP tool for each query or mutation they want to expose. The supergraph still owns the domain model, but the annotation decides exactly which entry points an agent can see. Because every tool maps to a known operation rather than an open-ended query, you can run it through security and governance tooling such as Hive before anything reaches production.
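A sketch of what per-operation generation might look like, in the spirit of the annotation pattern described above. The annotated-operation list and the generated shape are illustrative; they are not the actual Hive implementation, and a real generator would map GraphQL argument types to JSON Schema types rather than defaulting everything to strings.

```python
# Operations the schema owner has chosen to expose as discrete tools.
ANNOTATED_OPERATIONS = [
    {"name": "createOrderDraft", "args": {"basketId": "ID!"}},
    {"name": "productSearch", "args": {"term": "String!"}},
]

def to_mcp_tool(op: dict) -> dict:
    """Derive one MCP-style tool definition per annotated operation."""
    return {
        "name": op["name"],
        "input_schema": {
            "type": "object",
            # Simplification: every argument becomes a required string here.
            "properties": {a: {"type": "string"} for a in op["args"]},
            "required": list(op["args"]),
        },
    }

tools = [to_mcp_tool(op) for op in ANNOTATED_OPERATIONS]
assert tools[0]["name"] == "createOrderDraft"
assert tools[0]["input_schema"]["required"] == ["basketId"]
```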

The advantage is that roles stay clear. MCP and the agent framework look after discovery, ranking and orchestration between tools. The supergraph looks after business meaning and rules. You avoid turning every microservice into its own tool with a slightly different story about how the business works, and you keep your investment in domain modelling insulated from churn in the AI ecosystem.


A pragmatic path from today’s platform to agent clients

Most teams do not have the option to pause everything for a grand redesign. Any move towards a supergraph that serves both humans and agents has to align with work you need to do anyway.

A practical path starts with the schema rather than the model. Before you introduce a gateway, decide what you want the graph to describe: products, prices, baskets, orders, customers, availability, content. Writing these down in GraphQL SDL forces a level of clarity that architectural diagrams often gloss over. Disagreements about naming and boundaries surface while they are still cheap to address.

Once you have that shared picture, the gateway becomes the place where you implement it. Initially, it may sit in front of a monolith or a set of tightly coupled services. The value lies in the contract it enforces at the edge. Even if much of the work behind it is still done by legacy systems, clients now interact with a single, consistent surface.

The next step is to let human-facing channels benefit. Moving web and app front ends to the supergraph reduces the tangle of direct integrations and tests the schema under real load. Gaps in the schema surface quickly once a real checkout depends on it. So do awkward corners in the model that only become visible when designers try to build an experience on top.

At that point, adding agents becomes a smaller step. From the platform's perspective, an agent is another client of the graph. Its tool definition points at the same endpoint. Its permissions are expressed in the same authorisation system. Its behaviour shows up in the same logs. The difference lies in how you bound the agent's abilities and how you review what it does.

Capabilities can expand over time. You might begin with a support-focused agent that only reads from the graph and drafts responses for humans. Later, you might allow it to create draft baskets or propose subscription changes. Only once you have confidence in the patterns and guardrails would you consider letting it commit certain classes of changes on its own.

Throughout this, the AI layer remains replaceable. You can switch from one model provider to another, or from one agent framework to the next, without changing how your core systems present themselves. The investment sits in the graph and the gateway, which continue to serve web, app and partner channels regardless of fashion in AI.


If you control the graph, you control the direction

The pace of change in AI tooling can be distracting. New agents, protocols and frameworks arrive with impressive promises and an expectation that serious organisations will keep up.

If you step back and look at your own domain, a more useful question appears. What is the asset that will matter for the next five or ten years? Not a particular model or framework, but the way you choose to represent what the business can do, and the rules under which it is allowed to happen.

A GraphQL supergraph gives you a way to make that representation explicit. By treating the schema as a formal contract between your systems and any client that wants to act inside them, you own the interaction layer outright. Models, clients and back-end services will all change over time; the data model and the rules you expose through it are what give you control regardless. Human interfaces, partner integrations and AI agents all negotiate with the same description of reality.

This does not make the hard questions around safety, bias or regulation disappear, but it does give you somewhere concrete to work on them. You can decide which capabilities exist in the graph at all. You can express limits, checks and accountability in a place that is visible to everyone who builds on top of it. You can change your mind in public, instead of burying decisions inside prompts and glue code.

For CTOs and senior practitioners, that is a more hopeful way to approach agentic commerce. You do not have to wait for the perfect model or the final agent framework. By investing in the shape of your graph today, you make space for better experiments tomorrow and give your organisation a way to adopt agents on its own terms.



About the author

Seb Potter

Strategist

Seb has more than 30 years of experience helping clients turn business needs into programmes of technical and organisational transformation.
