I Built Some SMS Backup Tools So My Texts Could Escape XML
Bruce Hart
A backup is not the same thing as an archive.
I have used SMS Backup & Restore for a while, and I am glad it exists. It solves the important problem: getting my text messages off my phone and into a format I can keep. But after living with those backups for a bit, I realized I wanted something more than safe storage. I wanted something I could actually search, browse, and reuse.
So I built sms-backup-tools: a small set of Python scripts that makes those exports searchable and browsable. The scripts load SMS and MMS data into SQLite, extract MMS images to disk, and hash those images so I can upload them to Google Photos without spraying duplicates everywhere.
Nothing here is groundbreaking. That is exactly why I like it.
XML is a backup format, not a working format
XML is fine if your goal is disaster recovery. It is much less fine if your goal is "find that photo from three years ago" or "search for the conversation where we planned that trip."
One of my favorite rules for personal data is this: the safer the source format is, the more likely you still need a second format for daily use. XML is portable and boring, which is good. SQLite is also portable and boring, but in a way that is useful. You can inspect it with normal tools, write ad hoc queries, and build other scripts around it without treating the data like a special artifact.
That is what this repo does. It leaves the original export alone, then builds a searchable SQLite database next to it. Not a reinvention of messaging, but a better landing zone for message history.
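To make the idea concrete, here is a minimal sketch of the XML-to-SQLite step. It is not the repo's actual code. It assumes the export stores each text as an `<sms>` element with `address`, `date` (milliseconds since the epoch), `type`, and `body` attributes. The `load_sms` function name and the four-column schema are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_sms(xml_path, db_path):
    """Copy <sms> elements from an export into SQLite; return rows added.

    Assumption: each message is an <sms> element carrying address, date,
    type, and body attributes. The schema below is illustrative only.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sms (
               address TEXT NOT NULL,
               date_ms INTEGER NOT NULL,
               type    INTEGER NOT NULL,
               body    TEXT NOT NULL,
               UNIQUE (address, date_ms, type, body)
           )"""
    )
    before = conn.total_changes
    # iterparse streams the file, so a large export is not parsed into
    # one big in-memory tree. The XML file itself is only read.
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag != "sms":
            continue
        # INSERT OR IGNORE plus the UNIQUE constraint makes re-running
        # the import on an overlapping backup harmless.
        conn.execute(
            "INSERT OR IGNORE INTO sms VALUES (?, ?, ?, ?)",
            (
                elem.get("address", ""),
                int(elem.get("date", "0")),
                int(elem.get("type", "0")),
                elem.get("body", ""),
            ),
        )
        elem.clear()  # release attribute data we no longer need
    conn.commit()
    added = conn.total_changes - before
    conn.close()
    return added
```

Once the rows are in SQLite, "the conversation where we planned that trip" becomes an ordinary query, something like `SELECT address, body FROM sms WHERE body LIKE '%trip%' ORDER BY date_ms;`.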
MMS attachments are the part I actually care about
Text is only half the story. A lot of what I want to preserve from old message threads is the media: family photos, screenshots, memes, random receipts, the occasional deeply unimportant image that still feels weirdly worth keeping.
SMS Backup & Restore keeps those attachments in the backup, but again, not in a form that is pleasant to browse. So the second part of the toolchain is extraction. Pull the images out of MMS messages, name them sanely, and make them available to the rest of my photo workflow.
That matters because Google Photos is already where I search and browse images. I do not want a parallel archive hidden in an XML blob. I want the MMS photos to rejoin the rest of my library.
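The extraction step described above can be sketched roughly like this. Again, this is an illustration and not the repo's code. It assumes each `<mms>` element contains `<part>` children whose `ct` attribute holds the MIME type and whose `data` attribute holds the base64-encoded bytes. The `extract_mms_images` name and the `mms_<date>_<index>` filename scheme are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET
from pathlib import Path

# Only these MIME types are treated as images in this sketch.
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png", "image/gif": ".gif"}


def extract_mms_images(xml_path, out_dir):
    """Write image parts of each <mms> to out_dir; return the paths written.

    Assumption: image parts look like <part ct="image/..." data="<base64>"/>.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for _event, mms in ET.iterparse(xml_path, events=("end",)):
        if mms.tag != "mms":
            continue
        date = mms.get("date", "0")
        for index, part in enumerate(mms.iter("part")):
            ext = EXTENSIONS.get(part.get("ct", ""))
            data = part.get("data")
            if ext is None or not data:
                continue  # text, SMIL layout, or an unsupported type
            # Message date plus part index gives a stable, sortable name.
            target = out / f"mms_{date}_{index}{ext}"
            target.write_bytes(base64.b64decode(data))
            written.append(target)
        mms.clear()  # drop the decoded message before moving on
    return written
```

Naming files by message date and part position keeps them sortable and means a second run over the same export overwrites files in place and does not pile up copies.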
Deduplication is one of those boring features that makes the whole thing usable
The least glamorous part of this repo is probably the most important one: image hashing.
If you have years of message backups, duplicates happen fast. The same picture gets forwarded, restored, re-exported, or saved twice under slightly different filenames. Without some kind of fingerprinting, a cleanup project turns into a duplicate factory.
So the scripts hash images before upload and use those hashes to skip files I already have. It is not fancy computer vision. Just a simple, durable way to avoid doing dumb work twice. A lot of good tooling is like that. Not clever, just respectful of your future self.
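Here is a small sketch of that kind of fingerprinting. The choice of SHA-256 over the raw file bytes is an assumption made for illustration, and the `file_sha256` and `new_images` names are hypothetical. A content hash like this only catches byte-identical copies, so a re-encoded or resized version of the same photo would still count as new. The upload itself is left out.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def new_images(paths, seen_hashes):
    """Return the paths whose content hash is not yet in seen_hashes.

    seen_hashes is a set that is updated in place, so the same set can
    be reused (or persisted by the caller) across runs.
    """
    fresh = []
    for path in paths:
        fingerprint = file_sha256(path)
        if fingerprint in seen_hashes:
            continue  # same bytes under a different filename
        seen_hashes.add(fingerprint)
        fresh.append(Path(path))
    return fresh
```

Keying on content and ignoring filenames is what lets the same picture, saved twice under slightly different names, be recognized as one file.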
The interesting part is not the code; it is the lowered activation energy
This is also exactly the kind of project I might have postponed for months a few years ago.
Not because it is especially hard, but because it has just enough little moving parts to be annoying. Parse a somewhat weird export format. Design a schema that is simple without being useless. Handle edge cases around MMS attachments. Add hashing. Keep the scripts clean enough that I will understand them later.
That is where Codex keeps being useful for me. Not as a magical replacement for knowing what I want, but as a very good sidekick for turning a rough idea into working scripts. I still have to decide the data model, the tradeoffs, and what counts as done. But the distance between "this would be nice to have" and "this exists now" is a lot shorter than it used to be.
That change matters. I think we are going to see more personal infrastructure get built because the fixed cost of building small software has dropped.
Useful beats groundbreaking
I do not think sms-backup-tools is some big open source breakthrough. It is a repo full of Python scripts that solve a very specific problem I happened to have.
But those are often the best kinds of tools. Narrow, honest, and built by someone who actually needed them.
If you also use SMS Backup & Restore and wish your archive were easier to search, query, or mine for old images, maybe this will be useful to you too. And if nothing else, it is another reminder that a lot of good software is not about inventing a new category. It is about taking data out of a dead format and putting it somewhere you can actually use.
If you end up trying it, or if you have your own version of this kind of personal data plumbing, I would love to hear about it.