How ChatGPT's New Marketplace Actually Works
#116: Apps in ChatGPT (Turning Conversations Into Interactive Workflows)
Share this post and I'll send you rewards for your referrals.
You’ve probably asked ChatGPT to help you book a flight or find a restaurant at some point.
It gives you a helpful list: airline breakdowns, price ranges, maybe some tips on timing. Then, you leave ChatGPT, open a travel website, and start your search from scratch.
The conversation helped you think, but it didn’t actually do anything…
Now imagine you say, “Find me a flight to Tokyo under $500,” and an interactive widget appears inside the chat. You browse options, compare prices, select your seat, and book, all without leaving the conversation.
That’s ChatGPT Apps.
Instead of responding with text and sending you elsewhere, ChatGPT can now surface interactive widgets from third-party apps directly in your conversation. OpenAI opened the app store for submissions on 17 December 2025, and with 800 million weekly active users, this may be the next great distribution wave for developers.
But how does it actually work?
How does a message like “find me a hotel” turn into an interactive booking widget?
In this newsletter, I'll break down the architecture behind ChatGPT Apps, walk through how tool calls and widgets work together, and cover everything you need to know to build your own.
We’ll follow one example throughout: building a restaurant finder.
Onward.
Blitzy: First Autonomous Software Development Platform (Partner)
Blitzy is the first autonomous software development platform with infinite code context, enabling Fortune 500 companies to ship 5x faster.
Large enterprises are adopting Blitzy at the beginning of every sprint. While individual developer tools like co-pilots struggle with context and only autocomplete code, Blitzy is engineered specifically for enterprise-scale codebases: clearing years of tech debt and executing large-scale refactors or new feature additions in weeks, not quarters.
The Blitzy Edge:
Infinite Code Context - Ingest millions of lines of code, mapping every line-level dependency.
Agent Orchestration - 3,000+ cooperative agents autonomously plan, build, and validate production-ready code using spec and test-driven development at the speed of compute.
End Result - Over 80% of the work delivered autonomously, 500% faster.
The future of autonomous software development is here:
I want to introduce Colin Matthews as a guest author.
He’s the founder of Chippy.build, an enterprise prototyping and observability platform for ChatGPT Apps. He’s also a popular Maven instructor and former healthtech product manager.
Find him on:
How Is This Different From Plugins?
If you’ve been following OpenAI for a while, you might think, “Wait, didn’t they already try this with Plugins?”
ChatGPT Plugins (launched March 2023, deprecated April 2024) were essentially API wrappers. You'd describe your API endpoints, and ChatGPT would call them and return text. The problem was that everything came back as text that ChatGPT had to interpret and re-present to the user. There was no native UI, no interactivity, and no way to create rich experiences.
GPTs are similar.
They let users upload content and shape their conversations, but they're difficult to share broadly and don't give brands control over how they appear in chats.
ChatGPT Apps take a fundamentally different approach.
Instead of just calling APIs and returning text, Apps can render full interactive widgets directly in the conversation. You can show a map, display a booking form, or let users interact with a spreadsheet.
Here's how the three approaches compare: Plugins wrapped your API and returned plain text for ChatGPT to re-present; GPTs let users shape conversations with custom content but offer no brand-controlled UI; Apps call your tools and render interactive, brand-controlled widgets directly in the conversation.
The key insight is that Apps aren’t just “GPTs with UI”; they’re a different architecture entirely.
The widget runs in a sandboxed iframe with its own state, its own event handling, and direct communication with your backend. ChatGPT orchestrates when to show your app, but once it's rendered, users interact with your UI directly.
What Is a ChatGPT App?
Here’s the simplest way to think about it: a chatbot talks, but an app does.
Ask ChatGPT to book a restaurant, and normally, it explains how. It lists options, gives links, and tells you what to search for. Helpful, but nothing actually changes in the world.
Now ask ChatGPT with a restaurant app installed, and something different happens.
An interactive widget appears right in the chat. You can see real availability, browse the menu, pick a time, and confirm your reservation. When you’re done, you have an actual booking, not just information about how to make one.
The launch partners tell you a lot about where OpenAI sees this going: Booking.com, Canva, Coursera, Expedia, Figma, Spotify, and Zillow.
Users install apps through Settings > Apps & Connectors.
Once installed, ChatGPT can automatically surface your app based on conversation context, or users can tag it manually.
What Can You Build?
Before we get into the technical details, it’s worth thinking about what kinds of apps work well in this model.
ChatGPT apps should be quick and conversational. The goal is not to export your entire web app to ChatGPT. OpenAI says that apps should show, do, or know something that ChatGPT doesn’t. For example, ChatGPT can’t deliver groceries to your house, but the Instacart app with ChatGPT can.
Now let’s look at how this all works under the hood.
Every ChatGPT App is built from three parts that work together:
The first is the MCP Server, which is your backend. It tells ChatGPT what your app can do by defining “tools” (functions the model can call) and “resources” (UI templates to render).
MCP stands for Model Context Protocol, an open standard that Anthropic created and OpenAI has now adopted.
The second component is the Widget, which is your frontend. It’s HTML that runs on ChatGPT in a sandboxed iframe.
The third is ChatGPT itself, which acts as the host. It decides when to call your tools, renders your widgets, and manages the conversation state.
Let’s look at each component in detail:
The MCP Server
The MCP Server is your backend—it’s where you define what your app can actually do.
It exposes two main things to ChatGPT: tools and resources.
Tools are actions that ChatGPT can call.
Each tool has a name (like search_restaurants), a description that helps ChatGPT decide when to use it, an input schema defining which parameters are required, and an output template specifying which UI resource to render the results in.
Resources are UI templates that ChatGPT renders when tools return data.
They’re served as HTML bundles (typically React apps compiled to a single file) and rendered inside a sandboxed iframe.
Here’s what a tool definition might look like for our restaurant finder:
Tool: search_restaurants
Description: “Find restaurants by location and cuisine type”
Inputs: location (required), cuisine (optional), price_range (optional)
Output: → ui://widget/restaurant-list.html

The description is especially important. ChatGPT uses it to decide whether your tool is relevant to the user's request. If your description is vague—like "do restaurant stuff"—ChatGPT won't know when to invoke it. Be specific about what the tool does and when it should be used.
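To make this concrete, here's a minimal sketch of that tool registered with the official TypeScript MCP SDK (@modelcontextprotocol/sdk), following the registerTool pattern from OpenAI's Apps SDK examples. Treat the details as assumptions; findRestaurants is a hypothetical helper standing in for your data source.

```js
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const server = new McpServer({ name: "restaurant-finder", version: "1.0.0" });

// The description drives ChatGPT's routing decision; the _meta template
// tells it which widget to render with the results.
server.registerTool(
  "search_restaurants",
  {
    title: "Search restaurants",
    description: "Find restaurants by location and cuisine type",
    inputSchema: {
      location: z.string(),               // required
      cuisine: z.string().optional(),     // optional
      price_range: z.string().optional()  // optional
    },
    _meta: { "openai/outputTemplate": "ui://widget/restaurant-list.html" }
  },
  async ({ location, cuisine, price_range }) => {
    const results = await findRestaurants(location, cuisine, price_range); // hypothetical helper
    return {
      content: [{ type: "text", text: `Found ${results.length} restaurants in ${location}` }],
      structuredContent: { restaurants: results },
      _meta: { "openai/outputTemplate": "ui://widget/restaurant-list.html" }
    };
  }
);
```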
Let’s trace what happens when you tell ChatGPT, “Find me Italian restaurants in Brooklyn”.
Understanding this flow is key to building apps that work well.
First, ChatGPT checks the available tools from all installed apps.
It sees your search_restaurants tool with the description "Find restaurants by location and cuisine type". Based on the user's message, it decides this tool is relevant.
Next, ChatGPT constructs a tool call:
{
  "name": "search_restaurants",
  "parameters": {
    "location": "Brooklyn",
    "cuisine": "Italian"
  }
}

Your MCP server receives this request, validates the parameters, queries your database (or an external API such as Yelp), and returns the results.
This includes a reference to your UI template if you have one.
return {
  content: [
    { type: "text", text: "Found 12 Italian restaurants in Brooklyn" }
  ],
  structuredContent: {
    restaurants: results
  },
  _meta: {
    "openai/outputTemplate": "ui://widget/restaurant-list.html"
  }
};

The content field is for ChatGPT: a text summary so the model knows what happened and can respond intelligently. The structuredContent field is for your widget, containing the raw data your UI will display. And the outputTemplate in _meta tells ChatGPT to fetch and render your widget HTML.
Finally, ChatGPT fetches your HTML bundle, renders it in a sandboxed iframe, and injects your structuredContent via a JavaScript bridge. The user sees an interactive list of restaurants right in their chat.
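For completeness, here's a hedged sketch of how that HTML bundle might be registered as an MCP resource, mirroring the pattern in OpenAI's Apps SDK examples (the text/html+skybridge MIME type comes from those examples; the bundle path is a placeholder):

```js
import { readFileSync } from "node:fs";

// Serve the compiled widget bundle at the URI the tool's outputTemplate references.
server.registerResource(
  "restaurant-list-widget",
  "ui://widget/restaurant-list.html",
  {},
  async () => ({
    contents: [
      {
        uri: "ui://widget/restaurant-list.html",
        mimeType: "text/html+skybridge",                          // Apps SDK widget MIME type
        text: readFileSync("dist/restaurant-list.html", "utf8")   // placeholder path
      }
    ]
  })
);
```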
The Widget
The Widget is your frontend.
When your widget loads, ChatGPT injects a window.openai object that gives you access to data and methods for interacting with the system.
Here’s what you can do with window.openai:
Read data:

toolOutput — The data returned from your tool call
toolInput — The parameters that were passed to your tool
widgetState — Any persisted UI state
theme — Whether the user is in light or dark mode
locale — The user's locale (e.g., "en-US")
Take actions:

callTool() — Call another MCP tool directly
setWidgetState() — Persist UI state
sendFollowUpMessage() — Send a message back to ChatGPT
requestDisplayMode() — Switch between inline, fullscreen, or picture-in-picture
openExternal() — Open an external URL
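To show how these fit together, here's a hedged sketch of widget code reading the injected data and persisting a preference (renderRestaurantList is a hypothetical helper; only the window.openai calls come from the API above):

```js
// Runs inside the widget bundle after ChatGPT injects window.openai.
const { restaurants } = window.openai.toolOutput;        // from structuredContent
const state = window.openai.widgetState ?? { sortOrder: "rating" };

document.body.dataset.theme = window.openai.theme;       // "light" or "dark"

renderRestaurantList(restaurants, state.sortOrder);      // hypothetical render helper

// Persist the user's sort choice so it survives re-renders of this message.
async function onSortChange(sortOrder) {
  await window.openai.setWidgetState({ ...state, sortOrder });
  renderRestaurantList(restaurants, sortOrder);
}
```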
This is what makes Apps fundamentally different from Plugins.
Your widget can trigger new tool calls, save state, and continue the conversation. Users interact directly with your UI, not through ChatGPT’s text interface. When they click on a restaurant to see more details, you have two options for handling that interaction.
The first option is a direct tool call.
Your widget can call tools directly using window.openai.callTool(), which bypasses the model entirely. Your widget requests restaurant details, your server returns the data, and the widget updates immediately. This is fast and efficient for straightforward data fetching.
The second option is a follow-up message.
Your widget can send a follow-up message with window.openai.sendFollowUpMessage(), which puts the model back in the loop. ChatGPT sees the message, decides what to do next, and might call additional tools or ask clarifying questions.
// Direct tool call (model not involved)
const details = await window.openai.callTool(
  "get_restaurant_details",
  { id: restaurant.id }
);

// Follow-up message (model decides next step)
await window.openai.sendFollowUpMessage({
  prompt: `I want to book ${restaurant.name} for 4 people`
});

This creates a continuous loop that makes the whole experience feel seamless:
The user speaks, ChatGPT calls a tool, the widget renders, the user interacts with the widget, the widget either calls another tool or sends a message, and the cycle continues.
Widgets can appear in three different formats depending on what makes sense for your app.
Inline widgets embed directly in the conversation flow. This is the default for all apps, and it works well for things like listings, search results, or quick selections.
Fullscreen mode takes over the entire viewport. This is better for maps, dashboards, or complex workflows where users need more space to work.
Picture-in-picture mode floats the widget while the user continues chatting. This is great for music players, timers, or other persistent tools that the user might want to keep visible while doing other things.
window.openai.requestDisplayMode({ mode: "fullscreen" });
window.openai.requestDisplayMode({ mode: "pip" });

One constraint to keep in mind: you can only show one widget per message.
If someone asks ChatGPT to “book a restaurant and order an Uber,” it will show one app at a time. Users work through these requests sequentially.
Security
Now that we understand how the pieces fit together, let’s talk about security.
With four parties involved (ChatGPT, your MCP server, your widget, and external APIs), it’s important to understand where the trust boundaries lie.
ChatGPT ↔ MCP Server:
ChatGPT calls your MCP server over HTTPS. Your server should validate that requests are actually coming from ChatGPT. ChatGPT trusts your server to return valid tool responses and UI resources.
ChatGPT ↔ Widget:
The widget runs in a heavily sandboxed iframe. ChatGPT injects the window.openai bridge, but the widget cannot access ChatGPT’s DOM, cookies, or any data from other apps. This is the strictest boundary.
Widget ↔ External APIs:
Your widget can only make network requests to domains you've explicitly declared in your Content Security Policy (CSP) configuration. All other requests are blocked.
App ↔ App Isolation:
Each app’s widget runs in its own isolated sandbox. Apps cannot access each other’s data, state, or DOM. Even if a malicious app tried to extract data from another app, the browser’s same-origin policy prevents it.
Your widget runs under strict restrictions. It cannot:

Access cookies (it's on a sandbox origin, not your domain)
Use localStorage or sessionStorage
Access the parent DOM (ChatGPT's interface)
Submit forms directly (use callTool() instead)
Open popups (use openExternal() instead)
Make network requests except to declared CSP domains

It can:

Execute JavaScript normally
Fetch from CSP-allowed domains
Communicate through the window.openai bridge
Store UI state via setWidgetState()
You declare your allowed connections in your tool’s _meta:
_meta: {
  "openai/widgetCSP": {
    "connect_domains": ["api.yourservice.com"],
    "resource_domains": ["cdn.yourservice.com"],
    "frame_domains": [] // Nested iframes trigger stricter review
  }
}

The key principle: external API calls should go through your MCP server, not the widget.
Let your widget handle the UI and let your server handle the business logic and sensitive operations.
Authentication
With the security model in mind, let's look at how authentication works:
There are two separate concerns: authenticating that requests to your MCP server are coming from ChatGPT, and authenticating users to your service.
For authenticating users to your service (so they can access their own data), ChatGPT acts as an OAuth 2.1 client while your MCP server acts as the resource server.
PKCE and Dynamic Client Registration are required. ChatGPT registers with your identity provider at runtime, generating a unique client_id for each connection.
Discovery endpoints are also required.
Your MCP server needs to expose /.well-known/oauth-protected-resource (RFC 9728), and your IdP needs to expose either /.well-known/oauth-authorization-server (RFC 8414) or /.well-known/openid-configuration.
One important security detail: your widget never sees tokens directly.
All authentication flows go through the ChatGPT host, which keeps credentials out of the browser sandbox. Your MCP server validates tokens on every incoming request: fetch JWKS signing keys, verify the signature and issuer, check expiry and audience claims, and validate scopes.
If validation fails, return a 401 with a WWW-Authenticate header.
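Here's a hedged sketch of that validation step as Express-style middleware using the jose library. The issuer, audience, and JWKS URL are placeholders for your identity provider's actual metadata:

```js
import { createRemoteJWKSet, jwtVerify } from "jose";

// Placeholder IdP values; substitute your provider's metadata.
const JWKS = createRemoteJWKSet(
  new URL("https://idp.example.com/.well-known/jwks.json")
);

export async function requireAuth(req, res, next) {
  try {
    const token = (req.headers.authorization ?? "").replace(/^Bearer\s+/i, "");
    // jwtVerify fetches the signing key, checks the signature,
    // and validates expiry plus the issuer and audience claims.
    const { payload } = await jwtVerify(token, JWKS, {
      issuer: "https://idp.example.com",
      audience: "https://mcp.yourservice.com",
    });
    req.user = payload; // scopes available on the payload for further checks
    next();
  } catch {
    res
      .set("WWW-Authenticate", 'Bearer error="invalid_token"')
      .status(401)
      .end();
  }
}
```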
User sessions are scoped to the conversation.
If a user authenticates with your app in one conversation, they'll need to re-authenticate in a new conversation. This is by design: it prevents credentials from leaking across contexts and gives users clear control over when they're authenticated.
State Management
State in ChatGPT Apps exists in three tiers, each with different owners, lifetimes, and purposes. Understanding which tier to use for what is important for building apps that feel reliable.
Widget state is stored using setWidgetState() and is scoped to a specific message and widget pair. Use it for UI preferences, such as which tab is selected or how results are sorted.
One important detail: this state is visible to the model, so keep it under 4k tokens and never store sensitive data here.
// Good: minimal UI state for preferences
await window.openai.setWidgetState({
  selectedTab: "reviews",
  sortOrder: "rating"
});

// Bad: don't store sensitive or large data here
await window.openai.setWidgetState({
  userEmail: "...",  // Never store credentials
  allResults: [...]  // Too large
});

Conversation context is maintained by ChatGPT.
The model remembers what’s happened in the conversation and uses that context when deciding what to do next. You don’t control this directly, but you influence it through the content field in your tool responses.
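In practice, that means writing the content text for the model rather than the user: a compact summary it can reason over in later turns. A hedged sketch (the shortlist data is illustrative):

```js
// The `content` text is what the model "remembers" about this tool call.
return {
  content: [{
    type: "text",
    text: "User shortlisted 2 restaurants: Lucali and L'industrie, both with 7pm availability."
  }],
  structuredContent: { shortlist } // full data goes to the widget, not the model
};
```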
Backend state is anything you persist in your own database.
This is the only state that survives across conversations, devices, and time. If something matters long term, like a user’s saved preferences, booking history, or account data, store it in your backend.
If the user refreshes the page, the widget state persists (it’s stored by ChatGPT, not in browser storage). If the user switches devices mid-conversation, widget state is lost, but the conversation context remains.
If the user starts a new conversation, everything resets except the backend state.
Running in Production
Once you’ve built your app, you need to actually run it.
This section covers everything from error handling to deployment:
Error Handling
Things will go wrong. Networks fail, APIs time out, servers crash.
The good news is that errors cascade through the system in predictable ways, and ChatGPT has built-in fallbacks for most failure modes.
If your MCP server is unreachable, ChatGPT falls back to text and continues the conversation without your app.
The user might see something like, "I wasn't able to connect to the restaurant app, but I can help you find Italian restaurants in Brooklyn another way."

Similarly, if your tool returns an error, ChatGPT explains what went wrong in natural language. Return clear error messages in your content field so ChatGPT can communicate the issue to users.
If an external API fails while your tool is running, try to return partial results or a useful error state rather than failing completely.
If you found 8 restaurants but couldn’t get ratings for 3 of them, return the 8 restaurants with partial data. Users generally prefer some information over none.
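A hedged sketch of what that partial response could look like (the field names are illustrative, not part of the Apps SDK):

```js
// Found 8 restaurants, but the ratings service failed for 3 of them.
return {
  content: [{
    type: "text",
    text: "Found 8 restaurants in Brooklyn. Ratings are temporarily unavailable for 3 of them."
  }],
  structuredContent: {
    restaurants,             // all 8 results, some with rating: null
    ratingsUnavailable: true // lets the widget show a graceful placeholder
  }
};
```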
One important detail: ChatGPT may automatically retry failed tool calls, so make sure your handlers are idempotent. If a user clicks “book this restaurant” and the network hiccups, you don’t want to create two bookings when the retry succeeds.
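A minimal sketch of an idempotent booking handler, assuming the caller supplies a stable key per booking attempt (the in-memory Map stands in for a database table with a unique constraint):

```js
// In production, back this with a database table keyed on idempotencyKey.
const completedBookings = new Map();

async function createBooking({ idempotencyKey, restaurantId, partySize, time }) {
  // A retried call with the same key returns the original booking
  // instead of creating a duplicate.
  if (completedBookings.has(idempotencyKey)) {
    return completedBookings.get(idempotencyKey);
  }
  const booking = { id: crypto.randomUUID(), restaurantId, partySize, time };
  // ...persist the booking to your real backend here...
  completedBookings.set(idempotencyKey, booking);
  return booking;
}
```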
If you’re concerned about overwhelming requests, there are three layers of rate limiting to think about:
The first layer is ChatGPT's loop detection, which prevents runaway tool calls. If your app gets stuck calling the same tool repeatedly, ChatGPT will eventually stop. But this is a backstop, not a strategy.

The second layer is your server's own rate limiting. Implement per-user limits on tool calls and track costs for external API usage. If your app calls the Google Maps API on every tool invocation, and a user asks a vague question that triggers dozens of calls, you'll see that in your bill.
The third layer is input validation: check inputs and set limits on expensive operations.
Consider what happens if someone asks ChatGPT to “find every restaurant in New York.” That’s potentially millions of results. Your tool should have sensible defaults (e.g., return 20 results, not 20,000) and reject queries that would be prohibitively expensive.
A good pattern is to return early with a clarifying question: “I found over 10,000 Italian restaurants in New York. Can you narrow down the search to a specific neighborhood?”
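A hedged sketch of that guardrail inside the search handler (the thresholds and queryRestaurants helper are illustrative):

```js
const MAX_RESULTS = 20;            // sensible default page size
const TOO_BROAD_THRESHOLD = 10000; // beyond this, ask the user to narrow down

async function searchRestaurants({ location, cuisine }) {
  const { total, items } = await queryRestaurants({ location, cuisine, limit: MAX_RESULTS }); // hypothetical helper

  if (total > TOO_BROAD_THRESHOLD) {
    // Return early with a clarifying question instead of a huge, expensive result set.
    return {
      content: [{
        type: "text",
        text: `I found over ${total.toLocaleString()} restaurants in ${location}. Can you narrow the search to a specific neighborhood?`
      }]
    };
  }

  return {
    content: [{ type: "text", text: `Found ${total} restaurants in ${location}; showing the top ${items.length}.` }],
    structuredContent: { restaurants: items },
    _meta: { "openai/outputTemplate": "ui://widget/restaurant-list.html" }
  };
}
```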
Finally, we have privacy. Building apps that handle user data comes with responsibilities, and OpenAI’s guidelines are specific about what you can and cannot do.
First, you should collect only the minimum data needed to perform your tool’s function.
Your input schema should be narrowly scoped and clearly linked to the task (no “just in case” fields or broad profile data). The same applies to responses: return only what’s directly relevant to the user’s request. Don’t include diagnostic info, telemetry, session IDs, or logging metadata unless strictly necessary.
Some data types are off-limits entirely. You cannot collect, solicit, or process:
Payment card information (PCI DSS data)
Protected health information (PHI)
Government identifiers (social security numbers, etc.)
Access credentials (API keys, passwords, MFA codes)
For sensitive data categories (as defined by local regulations), you can only collect if it’s strictly necessary, the user has given adequate consent, and you’ve clearly disclosed the collection.
Any tool that posts messages, sends emails, uploads files, or otherwise moves data outside the current boundary must be marked as a write action. This surfaces it to ChatGPT so users can confirm before execution. Misrepresenting a write action as read-only is a common rejection reason.
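MCP's standard tool annotations include a readOnlyHint for exactly this kind of signal. Here's a hedged sketch; whether ChatGPT keys its confirmation flow off this exact field is an assumption, so check the Apps SDK docs for the authoritative mechanism (sendConfirmationEmail is a hypothetical helper):

```js
// Sending email moves data outside the current boundary, so this is a write action.
server.registerTool(
  "email_booking_confirmation",
  {
    title: "Email booking confirmation",
    description: "Send the user's booking confirmation to their email address",
    inputSchema: { bookingId: z.string() },
    annotations: { readOnlyHint: false } // not read-only: users confirm before execution
  },
  async ({ bookingId }) => {
    await sendConfirmationEmail(bookingId); // hypothetical helper
    return { content: [{ type: "text", text: "Confirmation email sent." }] };
  }
);
```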
If you’re building an app that might have users in regions with strong privacy regulations, you’re responsible for compliance.
Your OAuth consent screen should clearly explain what data your app accesses. Users should be able to request copies of their data and request deletion. Remember that ChatGPT Apps involve data flowing through multiple parties (user → ChatGPT → your MCP server → external APIs), so your privacy policy should explain this chain clearly.
Building Your First App
Ready to build something?
There are a couple of good paths depending on how much control you want:
Chippy (chippy.build) is an AI agent trained on the Apps SDK (and built by me!). It can create MCP servers and widgets for you, includes built-in testing, and lets you connect to ChatGPT without dealing with deployment setup. This is probably the fastest way to get something running.
Replit gives you more control. Import the official examples from https://github.com/openai/openai-apps-sdk-examples, then configure: bundle your React widgets with Vite (set BASE_URL to your Replit domain), serve /assets with CORS on port 80, run your MCP server on port 8000 bound to 0.0.0.0, and allow all hosts.
Of course, you can also clone the official examples locally, set up an ngrok tunnel to your MCP endpoint, and connect directly.
In any case, you’ll need to enable Developer Mode in ChatGPT settings to connect. Once enabled, go to Settings > Apps & Connectors > New App, enter your MCP URL, and test by mentioning your app by name in a new conversation.
Putting It All Together
Here’s the mental model of how everything fits together:
Your MCP server defines what your app can do through tools and resources. Your widget handles how it looks and how users interact with it. ChatGPT orchestrates the whole thing—deciding when to invoke your tools and rendering your widgets in the conversation.
The communication loop is what makes it feel seamless: user speaks → ChatGPT calls tool → widget renders → user interacts → cycle continues.
Security is enforced through sandboxing. Your widget can't access cookies or storage, and it can't make arbitrary network requests. Everything flows through declared CSP domains or the window.openai bridge.
State exists at three levels: widget state is per-message, conversation context is per-session, and backend state is persistent in your database.
Errors cascade predictably. ChatGPT falls back to text and explains what went wrong in natural language.
With 800 million weekly active users and the app directory just launched, this is the moment to start building. The companies that figure out how to create genuinely useful apps will capture significant distribution in this new channel.
👋 I’d like to thank Colin for writing this newsletter!
Also, find him on:
I launched Design, Build, Scale (newsletter series exclusive to PAID subscribers).
When you upgrade, you’ll get:
High-level architecture of real-world systems.
Deep dive into how popular real-world systems actually work.
How real-world systems handle scale, reliability, and performance.
10x the results you currently get with 1/10th of your time, energy, and effort.
👉 CLICK HERE TO ACCESS DESIGN, BUILD, SCALE!
If you find this newsletter valuable, share it with a friend, and subscribe if you haven’t already. There are group discounts, gift options, and referral rewards available.
Want to reach 200K+ tech professionals at scale? 📰
If your company wants to reach 200K+ tech professionals, advertise with me.
Thank you for supporting this newsletter.
You are now 200,001+ readers strong, very close to 201k. Let's try to reach 201k readers by 2 February. Consider sharing this post with your friends to earn referral rewards.
Y’all are the best.