I Put Codex Inside My Bedtime Stories App

A bedtime story emergency turned into a pretty good test of what happens when you let an app hire an agent for a few minutes.

I still make AI-generated bedtime stories for my son every night.

It started as a fun parenting ritual, but it has also become one of my favorite little AI labs. The app is useful enough that we keep using it, weird enough that I can experiment freely, and low-stakes enough that if I break something, nobody is losing payroll data. Worst case, I owe a small child an improvised dinosaur story.

Recently I had to leave the house unexpectedly. I did not have a new story ready, and I wanted to make one from my phone.

I could have used the new remote-control capability in the Codex desktop app and driven the process from another machine. That would have worked. But it felt like the wrong abstraction.

If the app exists to make bedtime stories, why should I have to remote-control a dev environment at all?

So I tried something more interesting: I put the dev environment behind the app.

The feature was simple, but the boundary mattered

The shape of the feature was pretty straightforward.

My Bedtime Stories app already runs on Cloudflare Workers. The story pipeline itself is heavier: Codex CLI, ffmpeg, and custom Python scripts that talk to the Replicate API and stitch everything together.

A Worker is a great place for auth, UI, routing, state, and orchestration. It is not the place I want to install a pile of media tools and let an agent run around.

That is where Fly.io Sprites looked interesting. Fly describes Sprites as persistent, hardware-isolated Linux environments for running arbitrary code. The docs frame them as stateful computers that can wake on demand, keep their filesystem, and go idle without compute charges.

That is almost exactly the missing piece.

Not a serverless function.

Not a permanent server.

A small remote room with tools in it.

The Worker can kick off the job, track the status asynchronously, and return control to the site. The Sprite can do the messy work: run Codex, execute scripts, generate assets, and hand back the result.

That boundary is the whole product design.

The website stays boring. The agent gets a workshop.
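One way to picture that boundary is as a small job state machine. The sketch below is a hypothetical illustration, not the app's actual code. It is written in Python for brevity. The in-memory `JobStore` stands in for whatever durable state the Worker really keeps, and the status names and `result_url` field are assumptions. The Worker creates a job and returns immediately. The Sprite reports `running`, then `succeeded` or `failed`. The store rejects transitions that make no sense, including a "success" that does not say where the story ended up.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(str, Enum):
    QUEUED = "queued"        # Worker accepted the request
    RUNNING = "running"      # Sprite picked the job up
    SUCCEEDED = "succeeded"  # Sprite finished and reported where the story is
    FAILED = "failed"        # Sprite gave up and reported why


# Legal moves between states; anything missing here is rejected.
TRANSITIONS: dict[Status, set[Status]] = {
    Status.QUEUED: {Status.RUNNING, Status.FAILED},
    Status.RUNNING: {Status.SUCCEEDED, Status.FAILED},
    Status.SUCCEEDED: set(),
    Status.FAILED: set(),
}


@dataclass
class Job:
    id: str
    prompt: str
    status: Status = Status.QUEUED
    result_url: Optional[str] = None
    error: Optional[str] = None


class JobStore:
    """In-memory stand-in (hypothetical) for the Worker's durable job state."""

    def __init__(self) -> None:
        self._jobs: dict[str, Job] = {}

    def create(self, prompt: str) -> Job:
        # Worker side: record the request and hand control back to the site.
        job = Job(id=uuid.uuid4().hex, prompt=prompt)
        self._jobs[job.id] = job
        return job

    def get(self, job_id: str) -> Job:
        # Worker side: what the site polls while the Sprite works.
        return self._jobs[job_id]

    def update(
        self,
        job_id: str,
        new: Status,
        result_url: Optional[str] = None,
        error: Optional[str] = None,
    ) -> Job:
        # Sprite side: report progress back to the Worker.
        job = self._jobs[job_id]
        if new not in TRANSITIONS[job.status]:
            raise ValueError(f"illegal transition: {job.status.value} -> {new.value}")
        # A success that does not say where the story is cannot be shown to anyone.
        if new is Status.SUCCEEDED and not result_url:
            raise ValueError("succeeded requires a result_url")
        if new is Status.FAILED and not error:
            raise ValueError("failed requires an error message")
        job.status, job.result_url, job.error = new, result_url, error
        return job
```

In this sketch, the Worker calls `create()` and returns the job id to the browser. The Sprite calls `update(job_id, Status.RUNNING)` when it starts and `update(job_id, Status.SUCCEEDED, result_url=...)` when it finishes. Meanwhile, the site polls `get()`. A real version would persist this state and authenticate the Sprite's callbacks. Both are omitted here.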

Sprites make agents feel like callable infrastructure

The part I like about Sprites is not only that they spin up fast. It is the combination of fast startup, persistent disk, and normal Linux ergonomics.

Fly's guide to working with Sprites says the environment includes common dev tools and AI CLI tools, including OpenAI Codex. It also notes that installed packages and files persist across hibernation, while processes need to be restarted when the Sprite wakes.

For my use case, that is a useful trade.

I can install ffmpeg once. I can keep my story repo and Python scripts there. I can checkpoint a known-good setup. When bedtime rolls around, the app does not need to rebuild the world. It just needs to wake the room up and run the next job.

That is a different mental model from traditional hosting.

An EC2 or Linode box is a thing you provision and then feel mildly guilty about when it sits idle. A serverless function is cheap and wonderful until your task needs a real filesystem, long-running tools, or a slightly haunted pile of dependencies.

Sprites sit in the middle. They are not for every workload, but they are very good at this one: infrequent, tool-heavy, stateful work that should not live on my laptop.
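Installed files survive hibernation, but running processes do not. So a job runner on the Sprite can cheaply confirm that its workshop is intact before each job instead of assuming it. Here is a minimal sketch. The tool list and repo path are hypothetical, and neither comes from the app itself.

```python
import shutil
from collections.abc import Callable
from pathlib import Path
from typing import Optional

# Hypothetical requirements for the story pipeline (assumption, not the app's list).
REQUIRED_TOOLS: tuple[str, ...] = ("ffmpeg", "python3", "codex")
# Hypothetical location of the story repo and scripts on the Sprite's disk.
REPO_DIR = Path("/opt/bedtimestories")


def preflight(
    tools: tuple[str, ...] = REQUIRED_TOOLS,
    repo_dir: Path = REPO_DIR,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of problems; an empty list means the job can run."""
    # Installed tools should persist across hibernation; confirm they are on PATH.
    problems = [f"missing tool: {tool}" for tool in tools if which(tool) is None]
    # The repo checkout should persist too; confirm it is still there.
    if not repo_dir.is_dir():
        problems.append(f"missing repo: {repo_dir}")
    return problems
```

A runner would call `preflight()` right after waking and mark the job failed with the returned messages if the list is non-empty. That is better than starting ffmpeg or Codex and producing half a story. The `which` parameter exists only so the check can be exercised without touching the real PATH.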

The billing model matters too. Fly's Sprites page says billing is based on actual CPU cycles, resident memory, and consumed storage. For a bedtime-story generator that runs a few minutes per day, that changes the economics from server-month thinking to task thinking.

At this scale, the cost is basically pennies.

That makes silly ideas cheaper to try.

GPT-5.5 did not need me to code. It needed me to improve its visibility

I asked Codex to investigate whether this was feasible. It read the documentation and said yes.

Then I asked it to build the thing.

Not just sketch the architecture. Actually build it. Use the Sprite CLI. Create the VM. Set up the asynchronous link between the Cloudflare Workers site and Codex. Wire enough of the pieces together that I could generate a story from the website.

GPT-5.5 got surprisingly far.

It also got stuck.

The failure mode was familiar: subtle bugs between platforms. The Worker thought one thing was happening. The Sprite was doing something slightly different. A command completed but the result was not where the site expected it. A status transition was technically valid but operationally useless. All the little seams between systems became the problem.

At that point I had the urge to jump in and fix it by hand.

I could have. But I also realized that would be the less useful intervention.

The better intervention was to teach the model how to debug the system it had built.

I showed it how to log into the Sprites platform and read the logs being generated. Once it could see the actual runtime output, progress accelerated. The model stopped guessing and started diagnosing.

That is a lesson I keep re-learning with AI.

Just because I can fix the bug does not mean I should take the keyboard away.

Sometimes the most valuable human move is to improve the agent's feedback loop: show it the logs, name the failing boundary, ask it to prove each assumption, and let it keep working.
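Showing the agent the logs works better when each line can be tied to a specific job. The sketch below shows one common approach: JSON-lines logging keyed by a job id. The field names and the `log_event` and `events_for` helpers are hypothetical. This is not how the Sprites platform itself formats logs. It is just one way a job runner could make its own output easy to follow across the Worker/Sprite seam.

```python
import json
import sys
import time
from collections.abc import Iterable
from typing import Any, Optional, TextIO


def log_event(job_id: str, stage: str, stream: Optional[TextIO] = None, **fields: Any) -> str:
    """Write one JSON object per line, tagged with the job id and pipeline stage."""
    record = {"ts": round(time.time(), 3), "job_id": job_id, "stage": stage, **fields}
    line = json.dumps(record, sort_keys=True)
    # Flush so the line is visible to whoever is tailing the log right away.
    print(line, file=stream or sys.stdout, flush=True)
    return line


def events_for(job_id: str, lines: Iterable[str]) -> list[dict[str, Any]]:
    """Filter a mixed log stream down to one job, skipping non-JSON noise."""
    out = []
    for raw in lines:
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue  # e.g. raw ffmpeg output interleaved with our events
        if isinstance(rec, dict) and rec.get("job_id") == job_id:
            out.append(rec)
    return out
```

With every stage logged this way, "the command finished but the result is not where the site expected it" becomes something the agent can check directly. It can look for an `uploaded` event with a `result_url` for that job, or see that the event never happened.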

An agent without observability is a very confident autocomplete.

An agent with logs becomes much closer to a junior engineer who can actually learn from the system.

Just because you can does not mean you should, but it might teach you something

There is an obvious absurdity here.

I built an app feature that lets a cloud-hosted coding agent generate bedtime stories for my son by waking a tiny VM, running a custom dev pipeline, calling AI APIs, and sending the result back to a website.

That is a lot of machinery for a bedtime story.

I know.

But side projects are useful because they let you see real patterns in low-stakes places.

The pattern here is not really bedtime stories. The pattern is: an app can delegate a bounded job to an agent that has a full working environment.

That opens up a lot of possibilities.

A product could let an agent repair an import that failed. A dashboard could let an agent investigate a weird metric and generate a report. A developer tool could spin up an environment, reproduce a bug, and attach the logs to an issue. A personal app could run a weird media pipeline without asking me to keep a server warm forever.

The point is not to put a chatbot in every UI.

The point is to give software a controlled place where it can do real work for a short amount of time.

That feels much more powerful than another chat box.

The sandbox makes the security story less scary, not solved

The security part matters, especially when the thing you are running is an agent.

I would not casually point a coding agent at my personal machine, my browser profile, and a directory full of secrets. Prompt injection is real. Tool use expands blast radius. A model can misunderstand instructions or follow the wrong text with too much enthusiasm.

A Sprite does not make those problems disappear.

It does make the boundary easier to reason about.

My story Sprite has the tools it needs and my open-source Python scripts. It does not need my whole laptop. It does not need every credential I have ever accumulated. If the environment starts acting strange, I can delete it and create a new one. If I find a known-good state, I can checkpoint it. If a token is needed, it can be scoped to the smallest useful surface.

Fly's Sprites docs also call out isolated networking and network policy support, which is the kind of thing I want around agent workloads. I still need to be careful. But I get to be careful around a smaller box.

That is the right security posture for this kind of feature.

Worry less because the room is contained.

Do not stop worrying.

I want more apps to have a little agent room

The feature works now.

I can generate stories directly from the site, whether I am on my phone, my PC, or my parents' laptop. Anywhere with internet access where I can log in, I can make the bedtime story happen.

That is a small thing, but it changed how I think about agents in apps.

For a while, the default agent UI has been chat: ask a question, get an answer, maybe call a tool. This felt different. The agent was not the product surface. It was infrastructure behind the button.

That is probably where a lot of the useful stuff will live.

Not AI as a replacement for every interface.

AI as a temporary worker with a bounded job, a disposable workspace, and enough context to do something useful.

My bedtime-story generator is a trivial example. It is also exactly the kind of example that makes the future feel less abstract.

A few years ago, this would have meant provisioning a server, hardening it, paying for it all month, and maintaining a custom queue. Now I can connect a Worker to a small persistent VM, let Codex do the heavy lifting for a few minutes, and let the room go idle again when the story is done.

That is worth paying attention to.

If you are building something similar, especially a small app with a weird tool-heavy workflow, I would love to compare notes. In the meantime, the Bedtime Stories source is on GitHub if you want to see how it works under the hood: https://github.com/brucehart/bedtimestories