
Bruce Hart


Apr 27, 2026 · 6 min read

GPT-5.5 Is Better at UI When You Give It a Picture

GPT-5.5 still struggles to invent polished UI from text alone, but a visual-first workflow works much better: use GPT Image 2 to create a mockup, then give that mockup to Codex as the implementation target.

GPT-5.5 is still not a great UI designer. But it is getting very good at implementing a design once you give it something visual to aim at.

I have been impressed with GPT-5.5 for coding, data work, and agentic tasks, but UI is still one of the places where the model can drift into mush. It knows the vocabulary of good design. It can say "clean hierarchy" and "modern dashboard" all day. But if you ask it to invent a polished interface from text alone, the result can still feel generic, over-carded, weirdly spaced, or just a little off.

The pattern that has worked better for me is simple: stop asking the coding model to imagine the UI from scratch.

Give it a picture.

The mockup is the missing interface between taste and code

The best UI workflow I have found lately is not pure prompting. It is visual direction first, implementation second.

Use GPT Image 2 to create a mockup of the UI you actually want. If you have an old screenshot, use it. If there are other products or design references that capture the shape, density, or feel you are after, use those too. Then ask the image model for a concrete mockup: the page, the layout, the visual rhythm, the empty states, the controls, the proportions.
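If you want to script this step rather than work in a chat UI, the shape of the call is simple. Here is a minimal sketch assuming the OpenAI Node SDK; the model id "gpt-image-2", the file names, and the prompt are all placeholders, not confirmed values:

```ts
// Sketch: producing a mockup via the OpenAI Images API.
// ASSUMPTION: "gpt-image-2" is a placeholder model id; file names
// and prompt text are illustrative only.
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI();

async function makeMockup() {
  // Start from an existing screenshot and ask for a better version...
  const edited = await client.images.edit({
    model: "gpt-image-2", // placeholder id
    image: fs.createReadStream("old-dashboard.png"),
    prompt:
      "Same product, tighter spacing, clearer hierarchy, compact toolbar, " +
      "dense table, restrained left nav, one chart with room to breathe.",
  });

  // ...or generate one from a text brief alone:
  // const generated = await client.images.generate({
  //   model: "gpt-image-2",
  //   prompt: "Dashboard mockup: compact toolbar, dense data table, ...",
  //   size: "1536x1024",
  // });

  // Save the mockup so it becomes a file you can check into the repo
  // and point Codex at.
  const b64 = edited.data?.[0]?.b64_json;
  if (b64) fs.writeFileSync("mockup.png", Buffer.from(b64, "base64"));
}

makeMockup();
```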

Once you have that mockup, hand it to Codex as the reference.

In my experience, GPT-5.5 is much better at "implement this screenshot" than "invent a beautiful interface for this product." That is not a knock on the model. It is a useful division of labor. The image model is the sketchpad. Codex is the builder.

Text prompts are too lossy for layout

A lot of UI feedback is annoyingly visual. Spacing, density, alignment, and hierarchy are hard to pin down in prose, and every round of text description loses detail.

A mockup shortens the chain. Instead of asking Codex to infer the intended layout, you give it something it can inspect. It can see that the toolbar is compact, the table is dense, the actions are aligned, the chart has room to breathe, and the nav is restrained. It still has to implement all of that, but the target is less ambiguous.

This is especially useful when you are improving an existing UI. Start with the old screenshot, add a few references, and ask GPT Image 2 to produce a better version. You are not asking for production code yet. You are asking for direction.

Then Codex can do the engineering work: components, CSS, accessibility, responsive states, screenshot comparison, and iteration.

Maquette is pointing at the right shape of workflow

An interesting public example I have seen is Maquette, a Codex plugin by Ixel that is explicitly built around image-guided website workflows.

Its README describes a staged process: generate or edit a visual artifact first, inspect it, convert it into design contracts and CSS tokens, build reusable components, render screenshots, compare the implementation against the approved visual reference, and iterate.

The approved mockup, in other words, becomes the first version of the spec. You still need to turn it into real constraints: tokens, components, breakpoints, states, and QA notes. But the visual artifact gives the rest of the process something to converge on. Without it, the model is often optimizing against vibes.
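To make "real constraints" concrete, here is a rough sketch of what tokens distilled from an approved mockup could look like. To be clear, this is not Maquette's actual contract format; every name and value below is invented for illustration:

```ts
// Sketch only: invented design tokens standing in for values a human
// (or an agent) might read off an approved mockup. NOT Maquette's format.
export const tokens = {
  color: {
    surface: "#0f1115",
    surfaceRaised: "#161a21",
    textPrimary: "#e6e9ef",
    textMuted: "#8b93a3",
    accent: "#4f8cff",
  },
  space: { xs: "4px", sm: "8px", md: "16px", lg: "24px" },
  radius: { control: "6px", card: "10px" },
  font: { body: "14px/1.5 Inter, sans-serif", mono: "13px/1.4 ui-monospace" },
} as const;

// One way to enforce the contract: emit CSS custom properties so
// components reference tokens, never raw values.
export function toCssVars(obj: Record<string, unknown>, prefix = "-"): string {
  return Object.entries(obj)
    .map(([key, value]) =>
      typeof value === "object" && value !== null
        ? toCssVars(value as Record<string, unknown>, `${prefix}-${key}`)
        : `${prefix}-${key}: ${value};`
    )
    .join("\n");
}

// Usage: wrap the output in `:root { ... }` and ship it as a stylesheet.
// console.log(`:root {\n${toCssVars(tokens)}\n}`);
```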

I like this because it makes the workflow more honest. Human designers do this all the time. They do not usually start with a perfect implementation. They sketch, mock, critique, refine, and then build. AI coding tools should not be expected to skip that entire loop and magically land on taste.

Let the image model carry the aesthetic uncertainty

One way to think about this is that UI work has two different uncertainties.

The first is aesthetic uncertainty: what should this feel like? How dense should it be? What visual language fits the product? What is the right balance between a calm surface and information density?

The second is implementation uncertainty: how do we make this real in the codebase? Which components already exist? What CSS patterns are established? How should it respond on mobile? How do we avoid layout shifts and text overflow?

GPT-5.5 is better at the second one than the first. GPT Image 2 is better at exploring the first one quickly.

So split the job.

Let the image model explore the visual direction. Let Codex translate the approved direction into code. Then use screenshots to close the loop. That last step matters because generated mockups can cheat. They do not have to obey DOM constraints, real content, localization, accessibility, or browser quirks. The coded page does.
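That closing step can be automated. Below is a minimal sketch of one way to do it, assuming Playwright for rendering and the pngjs/pixelmatch pair for the diff; the URL, file names, viewport, and threshold are placeholder choices, not a prescribed setup. The diff ratio is only a crude convergence signal, not a score to drive to zero:

```ts
// Sketch of the "close the loop" step: render the coded page,
// screenshot it, and diff it against the approved mockup.
// Assumes playwright, pngjs, and pixelmatch are installed.
import fs from "node:fs";
import { chromium } from "playwright";
import { PNG } from "pngjs";
import pixelmatch from "pixelmatch";

async function compareToMockup(url: string, mockupPath: string) {
  const browser = await chromium.launch();
  const page = await browser.newPage({ viewport: { width: 1440, height: 900 } });
  await page.goto(url);
  const shot = PNG.sync.read(await page.screenshot());
  await browser.close();

  const mockup = PNG.sync.read(fs.readFileSync(mockupPath));
  // Pixel diffs only make sense at matching dimensions.
  if (shot.width !== mockup.width || shot.height !== mockup.height) {
    throw new Error("Screenshot and mockup dimensions differ; re-render first.");
  }

  const diff = new PNG({ width: shot.width, height: shot.height });
  const mismatched = pixelmatch(
    shot.data, mockup.data, diff.data,
    shot.width, shot.height,
    { threshold: 0.1 } // tolerate minor anti-aliasing noise
  );
  fs.writeFileSync("diff.png", PNG.sync.write(diff));

  // Fraction of pixels off-target: a crude signal for the iteration loop.
  return mismatched / (shot.width * shot.height);
}

// compareToMockup("http://localhost:3000", "mockup.png")
//   .then((r) => console.log(`pixels off-target: ${(r * 100).toFixed(1)}%`));
```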

The goal is not to worship the mockup. The goal is to give the implementation loop a strong starting point.

This may be how better design models get trained

I also wonder if this pattern becomes part of how the next generation of coding models gets better at design.

Imagine a harness where Codex can call an image model, generate a design direction, implement it, capture browser screenshots, compare the result to the mockup, and iterate. That loop produces exactly the kind of signal current text-only UI prompting often lacks: a visual target, a coded attempt, a measured difference, and a revised attempt.

OpenAI may already be thinking along these lines. Maybe not in this exact form. But it feels like the natural training shape for a future GPT-5.6 or whatever comes next. If you want a coding model to become better at design, you probably do not just need more examples of HTML and CSS. You need feedback loops where the model learns how visual intent survives contact with implementation.

That is the hard part of UI.

Not writing a div. Not choosing a border radius. Keeping the intent intact across the translation.

For now, use pictures as leverage

My practical takeaway is this: if GPT-5.5 gives you a mediocre UI from a text prompt, do not keep arguing with it in text form.

Make a mockup.

Use GPT Image 2. Use the old screenshot. Use inspiration from interfaces that already solve the density, workflow, or visual hierarchy problem you have. Then hand the mockup to Codex and ask it to build toward that reference. If you can, have it capture screenshots and compare them against the target.

This does not make AI a perfect designer. It does something more immediately useful: it gives the coding agent a visual contract.

And once the agent has a visual contract, GPT-5.5 gets a lot more useful at UI work.

