Here's something I wouldn't have done two months ago.

I've been building a proper mail transfer agent — not a wrapper around someone else's API, but an actual MTA. It does MX lookups, connects directly to the receiving server, and delivers the mail itself. Fallbacks, retries, the whole nine yards. If you've ever run your own mail infrastructure, you already know where this is going.

Email delivery has edge cases that are genuinely painful to test. Greylisting, for starters — that's when a receiving server hits you with a 450 error, which is polite server-speak for "I don't know you yet, come back in a few minutes and prove you're not a spammer." Most spam systems fire and forget, so they never retry. Legitimate servers do. It's an effective filter, but it means your MTA needs to be smart enough to queue the message and try again later.
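The queue-and-retry behavior described above boils down to a small decision: 2xx means delivered, 4xx means transient (re-queue with backoff), 5xx means permanent (bounce). Here's a minimal sketch of that logic; the backoff schedule and names are illustrative assumptions, not the post's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class RetryDecision:
    retry: bool          # should the message be re-queued?
    delay_seconds: int   # wait before the next attempt (0 if none)

def classify_reply(code: int, attempt: int, base_delay: int = 300) -> RetryDecision:
    """Map an SMTP reply code to a queue decision.

    2xx -> delivered, nothing to retry
    4xx -> transient failure (e.g. 450 greylisting): re-queue with backoff
    5xx -> permanent failure: bounce, never retry
    """
    if 200 <= code < 300:
        return RetryDecision(retry=False, delay_seconds=0)
    if 400 <= code < 500:
        # Exponential backoff: 5 min, 10 min, 20 min, ... (illustrative schedule)
        return RetryDecision(retry=True, delay_seconds=base_delay * (2 ** attempt))
    return RetryDecision(retry=False, delay_seconds=0)  # 5xx: hard bounce

# A greylisting 450 on the first attempt means "come back later":
print(classify_reply(450, attempt=0))  # retry, with a delay
print(classify_reply(550, attempt=0))  # permanent: no retry
```

The spam-filtering trick works precisely because the retry branch exists: a fire-and-forget sender never reaches attempt number two.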

Then there are bounces — not the instant kind where the server says "no such user" right away, but the delayed ones. The front-end mail server accepts the message, runs it through its spam filters and journaling, passes it along to the actual mailbox server, and then discovers the address doesn't exist. By that point you've already gotten a 250 OK and moved on with your life. Now you're dealing with an asynchronous bounce notification that your system needs to catch, interpret, and handle gracefully.
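Those asynchronous bounce notifications usually arrive as delivery status notifications (RFC 3464): a multipart/report message carrying a message/delivery-status part with per-recipient fields. Python's standard `email` parser splits that part into header blocks, so catching and interpreting a bounce can be sketched like this (the sample DSN below is fabricated for illustration):

```python
import email

def extract_bounces(raw: str) -> list[dict]:
    """Pull per-recipient status fields out of a DSN bounce message."""
    msg = email.message_from_string(raw)
    results = []
    for part in msg.walk():
        if part.get_content_type() == "message/delivery-status":
            # Payload is a list of header blocks: per-message first, then per-recipient.
            for block in part.get_payload():
                if block.get("Final-Recipient"):
                    results.append({
                        "recipient": block["Final-Recipient"].split(";")[-1].strip(),
                        "action": block.get("Action", "").strip(),
                        "status": block.get("Status", "").strip(),
                    })
    return results

# Fabricated example of a delayed bounce from a front-end mail server:
SAMPLE_DSN = """\
From: MAILER-DAEMON@example.net
To: sender@example.com
Subject: Delivery Status Notification (Failure)
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="BOUND"

--BOUND
Content-Type: text/plain

The following address failed: nobody@example.net
--BOUND
Content-Type: message/delivery-status

Reporting-MTA: dns; mx.example.net

Final-Recipient: rfc822; nobody@example.net
Action: failed
Status: 5.1.1
--BOUND--
"""

bounces = extract_bounces(SAMPLE_DSN)
print(bounces)
```

The `Status` field (here `5.1.1`, "bad destination mailbox address") is what tells your system the 250 OK it got earlier was only provisional.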

Add in relay denials, connection timeouts, and the dozen other ways an SMTP conversation can go sideways, and you've got a testing problem. A real one.

In any shop I've ever managed, the answer would have been the same: test what you can locally, and accept that the weird edge cases get validated in production. Not because anyone likes that answer, but because the alternative — building a full SMTP test harness that simulates greylisting, delayed bounces, and relay failures on demand — would eat an entire sprint. You'd burn thousands of dollars in developer time building infrastructure that only exists to test other infrastructure. I've seen proposals like that, and even when I thought the engineering was sound, I'd have had a hard time approving the spend.

But that calculus doesn't apply anymore.

It hit me mid-conversation with Claude. I was literally about to type "we'll just have to test those in the wild" when I stopped and thought: wait. I have an effectively unlimited development resource sitting right here. Why wouldn't I build the test harness?

So I asked. Not even a formal PRD — just a conversation. "Could you build a mock SMTP server that simulates these scenarios?" And what happened next is the part that really got me.

Claude didn't just start coding from scratch. It asked me what I needed, walked through the scenarios — and then went and looked to see what already existed. It found an IMAP testing framework that already handled a chunk of the mail verification work. So instead of reinventing the wheel, it integrated that framework and built everything else around it. The greylisting logic, the delayed bounce simulation, the configurable scenario routing — all of that was custom, wrapped around a foundation it found on its own.

That's the part that floored me. It did what a good senior engineer does: it scoped the problem, researched what was already out there, pulled in what made sense, and built the rest. No one told it to go look for existing tools. It just did.

The finished system accepts REST API calls that configure its behavior: send to this address and get a clean accept, send to that one and get greylisted with a 450, try another and get a delayed bounce after initial acceptance. Relay denials, unknown user rejections, the full menu — all configurable, all repeatable.

The whole thing was running within the session. No sprint planning, no backlog grooming, no arguments about whether test infrastructure belongs in this quarter's roadmap.

The Real Point

This isn't a story about email. It's about what happens when there's no incremental cost to building something.

There's a whole category of work that every experienced engineer knows would be valuable but isn't worth doing. Custom test harnesses. One-off data migration validators. Prototype UIs for conversations that are still in the "what if" stage. The ROI is there in theory, but the practical cost makes it a non-starter. So you skip it, work around it, and accept the risk.

Agentic development changes that math. I'm paying for Claude whether I build the test harness or not. The subscription is the same whether I use it for the main project or spin up side infrastructure in the same session. There's no incremental cost — no additional sprint, no extra headcount, no line item to justify. So the question flips from "is this worth building?" to "why wouldn't I build it?"

That's the part people miss when they talk about AI replacing developers. It's not about doing the same things faster — it's about doing things you'd never have greenlit in the first place.

And here's what I keep thinking about: right now I'm on a flat $200/month subscription. What happens when the pricing shifts to pure token-based billing, or when that subscription tier grows tenfold? The economics of what's worth building are already shifting. Whether the cost goes down or up from here, the calculus has permanently changed. Once you've seen what's possible when incremental cost isn't a factor, you can't unsee it.

My buddy Mike calls it "digital prepping." He's already looking at self-hosted models, open-source alternatives, anything that means he won't be entirely beholden to someone else's pricing page. I laughed when he first said it. Now I'm starting to think maybe he's right.

I built a test harness I wouldn't have approved as a manager, using a tool I was already using for the main build, in the same session, at no incremental cost. And my MTA is better for it.

That's what this opens up.

Kevin Phifer is the founder of Theoretically Impossible Solutions LLC, specializing in agentic AI development and consulting. You can reach him at kevin.phifer@theoreticallyimpossible.org.
