
Bruce Hart


Apr 27, 2026 · 6 min read

GPT-5.5 Is Better at UI When You Give It a Picture

GPT-5.5 still struggles to invent polished UI from text alone, but a visual-first workflow works much better: use GPT Image 2 to create a mockup, then give that mockup to Codex as the implementation target.

GPT-5.5 is still not a great UI designer. But it is getting very good at implementing a design once you give it something visual to aim at.

I have been impressed with GPT-5.5 for coding, data work, and agentic tasks, but UI is still one of the places where the model can drift into mush. It knows the vocabulary of good design. It can say "clean hierarchy" and "modern dashboard" all day. But if you ask it to invent a polished interface from text alone, the result can still feel generic, over-carded, weirdly spaced, or just a little off.

The pattern that has worked better for me is simple: stop asking the coding model to imagine the UI from scratch.

Give it a picture.

The mockup is the missing interface between taste and code

The best UI workflow I have found lately is not pure prompting. It is visual direction first, implementation second.

Use GPT Image 2 to create a mockup of the UI you actually want. If you have an old screenshot, use it. If there are other products or design references that capture the shape, density, or feel you are after, use those too. Then ask the image model for a concrete mockup: the page, the layout, the visual rhythm, the empty states, the controls, the proportions.
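If you want to script this step rather than work in a chat UI, the shape of the call is simple. Here is a minimal sketch assuming the OpenAI Node SDK; the model id "gpt-image-2", the file names, and the prompt are all placeholders, not confirmed values:

```ts
// Sketch: producing a mockup via the OpenAI Images API.
// ASSUMPTION: "gpt-image-2" is a placeholder model id; file names
// and prompt text are illustrative only.
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI();

async function makeMockup() {
  // Start from an existing screenshot and ask for a better version...
  const edited = await client.images.edit({
    model: "gpt-image-2", // placeholder id
    image: fs.createReadStream("old-dashboard.png"),
    prompt:
      "Same product, tighter spacing, clearer hierarchy, compact toolbar, " +
      "dense table, restrained left nav, one chart with room to breathe.",
  });

  // ...or generate one from a text brief alone:
  // const generated = await client.images.generate({
  //   model: "gpt-image-2",
  //   prompt: "Dashboard mockup: compact toolbar, dense data table, ...",
  //   size: "1536x1024",
  // });

  // Save the mockup so it becomes a file you can check into the repo
  // and point Codex at.
  const b64 = edited.data?.[0]?.b64_json;
  if (b64) fs.writeFileSync("mockup.png", Buffer.from(b64, "base64"));
}

makeMockup();
```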

Once you have that mockup, hand it to Codex as the reference.

In my experience, GPT-5.5 is much better at "implement this screenshot" than "invent a beautiful interface for this product." That is not a knock on the model. It is a useful division of labor. The image model is the sketchpad. Codex is the builder.

Text prompts are too lossy for layout

A lot of UI feedback is annoyingly visual. Spacing, density, alignment, and hierarchy are hard to pin down in prose, and every round of text description loses detail.

A mockup shortens the chain. Instead of asking Codex to infer the intended layout, you give it something it can inspect. It can see that the toolbar is compact, the table is dense, the actions are aligned, the chart has room to breathe, and the nav is restrained. It still has to implement all of that, but the target is less ambiguous.

This is especially useful when you are improving an existing UI. Start with the old screenshot, add a few references, and ask GPT Image 2 to produce a better version. You are not asking for production code yet. You are asking for direction.

Then Codex can do the engineering work: components, CSS, accessibility, responsive states, screenshot comparison, and iteration.

Maquette is pointing at the right shape of workflow

An interesting public example I have seen is Maquette, a Codex plugin by Ixel that is explicitly built around image-guided website workflows.

Its README describes a staged process: generate or edit a visual artifact first, inspect it, convert it into design contracts and CSS tokens, build reusable components, render screenshots, compare the implementation against the approved visual reference, and iterate.

The approved mockup, in other words, becomes the first version of the spec. You still need to turn it into real constraints: tokens, components, breakpoints, states, and QA notes. But the visual artifact gives the rest of the process something to converge on. Without it, the model is often optimizing against vibes.
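To make "real constraints" concrete, here is a rough sketch of what tokens distilled from an approved mockup could look like. To be clear, this is not Maquette's actual contract format; every name and value below is invented for illustration:

```ts
// Sketch only: invented design tokens standing in for values a human
// (or an agent) might read off an approved mockup. NOT Maquette's format.
export const tokens = {
  color: {
    surface: "#0f1115",
    surfaceRaised: "#161a21",
    textPrimary: "#e6e9ef",
    textMuted: "#8b93a3",
    accent: "#4f8cff",
  },
  space: { xs: "4px", sm: "8px", md: "16px", lg: "24px" },
  radius: { control: "6px", card: "10px" },
  font: { body: "14px/1.5 Inter, sans-serif", mono: "13px/1.4 ui-monospace" },
} as const;

// One way to enforce the contract: emit CSS custom properties so
// components reference tokens, never raw values.
export function toCssVars(obj: Record<string, unknown>, prefix = "-"): string {
  return Object.entries(obj)
    .map(([key, value]) =>
      typeof value === "object" && value !== null
        ? toCssVars(value as Record<string, unknown>, `${prefix}-${key}`)
        : `${prefix}-${key}: ${value};`
    )
    .join("\n");
}

// Usage: wrap the output in `:root { ... }` and ship it as a stylesheet.
// console.log(`:root {\n${toCssVars(tokens)}\n}`);
```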

I like this because it makes the workflow more honest. Human designers do this all the time. They do not usually start with a perfect implementation. They sketch, mock, critique, refine, and then build. AI coding tools should not be expected to skip that entire loop and magically land on taste.

Let the image model carry the aesthetic uncertainty

One way to think about this is that UI work has two different uncertainties.

The first is aesthetic uncertainty: what should this feel like? How dense should it be? What visual language fits the product? What is the right balance between a calm surface and information density?

The second is implementation uncertainty: how do we make this real in the codebase? Which components already exist? What CSS patterns are established? How should it respond on mobile? How do we avoid layout shifts and text overflow?

GPT-5.5 is better at the second one than the first. GPT Image 2 is better at exploring the first one quickly.

So split the job.

Let the image model explore the visual direction. Let Codex translate the approved direction into code. Then use screenshots to close the loop. That last step matters because generated mockups can cheat. They do not have to obey DOM constraints, real content, localization, accessibility, or browser quirks. The coded page does.
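That closing step can be automated. Below is a minimal sketch of one way to do it, assuming Playwright for rendering and the pngjs/pixelmatch pair for the diff; the URL, file names, viewport, and threshold are placeholder choices, not a prescribed setup. The diff ratio is only a crude convergence signal, not a score to drive to zero:

```ts
// Sketch of the "close the loop" step: render the coded page,
// screenshot it, and diff it against the approved mockup.
// Assumes playwright, pngjs, and pixelmatch are installed.
import fs from "node:fs";
import { chromium } from "playwright";
import { PNG } from "pngjs";
import pixelmatch from "pixelmatch";

async function compareToMockup(url: string, mockupPath: string) {
  const browser = await chromium.launch();
  const page = await browser.newPage({ viewport: { width: 1440, height: 900 } });
  await page.goto(url);
  const shot = PNG.sync.read(await page.screenshot());
  await browser.close();

  const mockup = PNG.sync.read(fs.readFileSync(mockupPath));
  // Pixel diffs only make sense at matching dimensions.
  if (shot.width !== mockup.width || shot.height !== mockup.height) {
    throw new Error("Screenshot and mockup dimensions differ; re-render first.");
  }

  const diff = new PNG({ width: shot.width, height: shot.height });
  const mismatched = pixelmatch(
    shot.data, mockup.data, diff.data,
    shot.width, shot.height,
    { threshold: 0.1 } // tolerate minor anti-aliasing noise
  );
  fs.writeFileSync("diff.png", PNG.sync.write(diff));

  // Fraction of pixels off-target: a crude signal for the iteration loop.
  return mismatched / (shot.width * shot.height);
}

// compareToMockup("http://localhost:3000", "mockup.png")
//   .then((r) => console.log(`pixels off-target: ${(r * 100).toFixed(1)}%`));
```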

The goal is not to worship the mockup. The goal is to give the implementation loop a strong starting point.

This may be how better design models get trained

I also wonder if this pattern becomes part of how the next generation of coding models gets better at design.

Imagine a harness where Codex can call an image model, generate a design direction, implement it, capture browser screenshots, compare the result to the mockup, and iterate. That loop produces exactly the kind of signal current text-only UI prompting often lacks: a visual target, a coded attempt, a measured difference, and a revised attempt.

OpenAI may already be thinking along these lines. Maybe not in this exact form. But it feels like the natural training shape for a future GPT-5.6 or whatever comes next. If you want a coding model to become better at design, you probably do not just need more examples of HTML and CSS. You need feedback loops where the model learns how visual intent survives contact with implementation.

That is the hard part of UI.

Not writing a div. Not choosing a border radius. Keeping the intent intact across the translation.

For now, use pictures as leverage

My practical takeaway is this: if GPT-5.5 gives you a mediocre UI from a text prompt, do not keep arguing with it in text form.

Make a mockup.

Use GPT Image 2. Use the old screenshot. Use inspiration from interfaces that already solve the density, workflow, or visual hierarchy problem you have. Then hand the mockup to Codex and ask it to build toward that reference. If you can, have it capture screenshots and compare them against the target.

This does not make AI a perfect designer. It does something more immediately useful: it gives the coding agent a visual contract.

And once the agent has a visual contract, GPT-5.5 gets a lot more useful at UI work.

