
Bruce Hart

Building with AI, keeping it human

Practical AI, automation, and the messy reality of shipping software. Expect experiments, tradeoffs, and the occasional “well, that didn’t work” postmortem.

Latest

Jan 23, 2026 2 min read

A HAR-First Pattern for Web Scraping with Codex

Capturing a HAR file and letting Codex trace the requests is my new shortcut for scraping complex sites—faster than guessing and easier to automate.

New pattern: capture a HAR file and let Codex reverse-engineer the data flow.

I used to do this manually: open DevTools, scan Network, guess which calls mattered, then stitch together a script. Now I just save a HAR, hand it to Codex, and let it do the archaeology. It’s been a big speedup for the messy, real-world sites that don’t want to be scraped.

HAR files are a map, not just a log

A HAR file is basically a time-stamped ledger of every request and response. The key trick is that it’s JSON, which means tools like jq (or a tiny Python parser) can slice it fast. That gives Codex a structured way to explore the site quickly without overloading the context window.
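To make that concrete, here's a minimal sketch of the kind of slicing a tiny Python parser can do, assuming the standard HAR 1.2 layout (`log.entries[].request` / `response`). The `list_api_calls` helper name is illustrative, not from the post:

```python
import json

def list_api_calls(har_path):
    """Surface likely API calls from a HAR: entries whose responses are JSON."""
    with open(har_path) as f:
        log = json.load(f)["log"]
    calls = []
    for entry in log["entries"]:
        req = entry["request"]
        mime = entry["response"].get("content", {}).get("mimeType", "")
        if "json" in mime:  # crude but effective filter for API traffic
            calls.append((req["method"], req["url"], entry["response"]["status"]))
    return calls
```

The same filter is a one-liner in jq; the point is that the structure is regular enough that either tool can cut hundreds of entries down to the handful that matter.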

The workflow that keeps me out of the weeds

I’m converging on a simple loop:

  1. Capture a HAR in Chrome DevTools while doing the exact thing I want to automate.
  2. Ask Codex to inspect the HAR, identify API-like calls, and trace where the data I care about shows up.
  3. Have it generate a Python script that replays those requests or extracts the HTML if that’s the only path.

When the site has dozens (or hundreds) of resources, this saves a ton of time. The HAR is a breadcrumb trail. Codex is the bloodhound.
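Step 3 of the loop (replaying the identified requests) can be sketched with the stdlib. This is an illustrative helper, not the post's actual script; it assumes HAR's headers-as-name/value-pairs shape, and it drops the HTTP/2 pseudo-headers (`:authority`, `:method`, etc.) that Chrome records, since those aren't valid header names for a replayed request:

```python
import urllib.request

def replay_entry(entry, extra_headers=None):
    """Build a urllib Request that replays a single HAR entry's call."""
    req = entry["request"]
    headers = {h["name"]: h["value"] for h in req.get("headers", [])
               if not h["name"].startswith(":")}  # skip HTTP/2 pseudo-headers
    headers.update(extra_headers or {})  # e.g. a pasted session cookie
    return urllib.request.Request(req["url"], headers=headers,
                                  method=req["method"])
```

Usage would be `urllib.request.urlopen(replay_entry(entry))` for each interesting entry; the same idea ports directly to `requests` or `httpx` if you prefer.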

Auth isn’t a blocker if you plan for it

For logged-in sites, I just have the generated script accept headers on the command line. That way I can paste in a session cookie or bearer token and keep the script generic. It’s boring, but it keeps the extraction logic clean and repeatable.
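One way to wire that up is curl-style repeatable `-H` flags via argparse; this is a sketch of the pattern, with hypothetical helper names, not the post's actual interface:

```python
import argparse

def parse_args(argv=None):
    """CLI: accept repeated -H 'Name: value' headers, curl-style."""
    p = argparse.ArgumentParser(description="Replay scraped API calls")
    p.add_argument("-H", "--header", action="append", default=[],
                   metavar="'Name: value'",
                   help="extra request header, e.g. a session cookie (repeatable)")
    p.add_argument("har", help="path to the captured HAR file")
    return p.parse_args(argv)

def headers_dict(pairs):
    """Turn ['Cookie: session=abc'] into {'Cookie': 'session=abc'}."""
    out = {}
    for raw in pairs:
        name, _, value = raw.partition(":")
        out[name.strip()] = value.strip()
    return out
```

Because the auth lives entirely in the flags, the same script works against a fresh session by pasting in a new token, no code changes.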

Do I need a library for this?

Maybe. I’ve been tempted to write a small library or Codex skill to normalize HAR parsing and expose “find me the data” helpers. But the reality is: Codex is already fine at reading JSON and writing a one-off parser. The tradeoff is tokens and time. If a simple helper set can shrink both, it might be worth it. If not, the DIY path wins.

I’ll keep experimenting and see what actually feels like friction. If you’re doing something similar—or have opinions on HAR tooling—hit me up.

Read the full piece

More articles

Coding Jan 17, 2026

When Coding Gets Cheap

LLMs are approaching top programmers; the bottleneck shifts from syntax to intent, Jevons Paradox likely drives more software, and software IP moats move beyond code.

5 min read
AI Jan 13, 2026

Chrome, the Clipboard, and Two Buttons

A practical AI workflow: use Gemini to generate one-off console scripts, and use small Tampermonkey buttons to turn web pages into clean Google Sheets rows and predictable Google Drive receipt filenames (Walgreens + HSA receipts case study).

6 min read
AI Jan 11, 2026

The Good-Enough Model Era

If open source catches up quickly and inference keeps getting cheaper, model IQ stops being the bottleneck. The next wins come from reliability, memory, context, and distribution - and that shift changes pricing too.

5 min read
Coding Jan 9, 2026

Web Tools: my tiny-tool factory on Cloudflare Workers

I built a single Cloudflare Worker that serves a suite of small web tools. Codex helps me spin up new ones fast, and when a tool needs login + persistence, I bolt on Google OAuth and D1 without turning the whole project into a full app.

6 min read
AI Jan 5, 2026

Introducing Codex Transcripts

codex-transcripts converts Codex CLI session JSONL files into clean, mobile-friendly HTML transcripts (with pagination, search, and optional Gist sharing). Inspired by Simon Willison’s claude-code-transcripts.

4 min read