What I learned building an AI coding agent for a year

We thought we'd be the best within months. Here's what went wrong — and why I'm more excited than ever.

Jul 05, 2025

It’s been a full year of trying to build the best coding agent!

I didn’t know that my world was about to change last July 4th, at a hackathon where I first prototyped the CLI coding tool that became Codebuff. What a ride it’s been!

From leaving Manifold, to doing YC F24, to hiring, to competing with Claude Code, all the while averaging ~70 hours weekly by working most weekends — it’s been a lot!

We may not have won the first round, but I’m more fired up and excited for the future than ever.

Our bet

We got so many things right initially:

CLI first. Scoping down to just a command line tool helped us focus on the core of a coding agent.
Inject more context. Immediately reading a dozen files related to the user prompt gave a huge advantage over competitors.
No permissions checks. We were in full YOLO mode from the very beginning, which was positively heretical then.
Premium tool. It makes sense to spend more when developer salaries are the alternative.
Knowledge files. We came up with the idea of knowledge.md files that are checked in to your codebase. Codebuff would automatically update these files as it learned.

Most of these are standard or becoming standard in coding agents today!

What didn’t work out

For the first 10 months, we always thought we were weeks away from breaking out and growing exponentially. During YC, we even did grow exponentially, to $5k MRR.

People regularly told us it was the best coding agent. But it wasn’t always reliable.

Our file editing strategy was flaky for months, much worse than Cursor’s with its custom model to rewrite a file.

Even after we adopted Relace’s fast rewriter model, our product still had a long tail of issues that made ~5-10% of tasks fail. Some of these issues just take time to isolate and fix, but we could have prioritized better.

Without reliability, we could not have high retention. Without high retention, Codebuff could not grow.

What we should have done

Here’s what I’d do differently after an extensive retrospective.

Build end-to-end evals and run them nightly

This would get us regular quantified feedback on how Codebuff performs as a coding agent. It would help solve reliability issues AND allow us to test hypotheses on how to further improve our product.

Because we did not have this, we spent way too much time manually testing Codebuff after every change or when evaluating whether to switch models.
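To make this concrete, here is a minimal sketch of what an end-to-end eval harness could look like. It is not Codebuff's actual harness: `EvalTask`, `run_evals`, `pass_rate`, and the `toy_agent` stand-in are all hypothetical names, and a real setup would call the actual agent and be triggered by whatever nightly scheduler you already use.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# An "agent" here is any callable that takes a prompt and a working directory
# and edits files in place. This signature is an assumption for the sketch.
Agent = Callable[[str, Path], None]


@dataclass
class EvalTask:
    name: str
    prompt: str                    # what the agent is asked to do
    files: dict[str, str]          # starting contents of the scratch repo
    check: Callable[[Path], bool]  # returns True if the task was completed


def run_evals(tasks: list[EvalTask], agent: Agent) -> dict[str, bool]:
    """Run every task in a fresh temp directory and record pass/fail."""
    results: dict[str, bool] = {}
    for task in tasks:
        with tempfile.TemporaryDirectory() as tmp:
            workdir = Path(tmp)
            for rel_path, content in task.files.items():
                path = workdir / rel_path
                path.parent.mkdir(parents=True, exist_ok=True)
                path.write_text(content)
            try:
                agent(task.prompt, workdir)
                results[task.name] = task.check(workdir)
            except Exception:
                # A crash counts as a failed task, not a failed eval run.
                results[task.name] = False
    return results


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of tasks that passed (0.0 when there are no tasks)."""
    return sum(results.values()) / len(results) if results else 0.0


def toy_agent(prompt: str, workdir: Path) -> None:
    """Stand-in agent that knows exactly one trick, so one task fails."""
    if "greeting" in prompt:
        readme = workdir / "README.md"
        readme.write_text(readme.read_text() + "hello\n")
    else:
        raise RuntimeError("toy agent cannot handle this prompt")


TASKS = [
    EvalTask(
        name="add-greeting",
        prompt="Add a greeting to the README",
        files={"README.md": "# Demo\n"},
        check=lambda d: "hello" in (d / "README.md").read_text(),
    ),
    EvalTask(
        name="add-license",
        prompt="Create a LICENSE file",
        files={"README.md": "# Demo\n"},
        check=lambda d: (d / "LICENSE").exists(),
    ),
]

RESULTS = run_evals(TASKS, toy_agent)
print(RESULTS, pass_rate(RESULTS))
```

Tracking the pass rate from one night to the next is what turns "it feels flaky" into a number, and the same task list can be rerun when evaluating a model switch.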

Cut every feature that is not core

We thought we had scoped down a lot by sticking to the CLI, but we should have cut even more. Elon Musk was right when he said you must first “delete the part!”

Here are a few features we should have cut earlier:

Magic detection of whether the input is supposed to be a terminal command or prompt
Automatic knowledge file updates, which we tweaked for months before largely scrapping
A pseudo-terminal library (node-pty) for color output & aliases, which was recently named our biggest black-hole feature ever

Get the whole team improving the core product

I took on too much of the core system and left my cofounder to deal with other tasks which may not have been as impactful. It helps focus and morale to get all hands in the game.

Live in the future

Never stop thinking about how to disrupt your current product. What is the next thing? What experiments can we try today to make it work?

Monthly retrospectives

One bit of process that could have helped us achieve the above is monthly retrospective meetings. Schedule these on your calendar and set aside an hour for everyone to answer these questions and discuss them:

What should we double down on?
What should we cut?
What should we explore next?

Next steps for Codebuff

In the last couple of months, we’ve done more reflection and exploration as competitors such as Claude Code have entered the market with similar ideas.

(Incidentally, I believe Claude Code succeeded in part by having a more focused bet: client-side only, search-replace file editing only, agentic-RAG only.)
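For readers unfamiliar with search-replace editing, here is a minimal sketch of the general idea. It is not Claude Code's implementation; the function name `apply_search_replace` and the rule that the search text must match exactly once are assumptions made for illustration.

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Replace one exact occurrence of `search` in `source`.

    Refusing to edit when the match is missing or ambiguous makes failures
    loud instead of silently corrupting the file.
    """
    if not search:
        raise ValueError("search text must not be empty")
    count = source.count(search)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return source.replace(search, replace, 1)


ORIGINAL = "def greet():\n    print('hi')\n"
EDITED = apply_search_replace(ORIGINAL, "print('hi')", "print('hello')")
print(EDITED)
```

The appeal of this approach is that the model only emits the changed snippet, with no second model needed to rewrite the whole file.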

We’ve been dreaming of the next thing, and now I’m confident we know what it is.

Our new multi-agent product is live!

Our multi-agent framework, launched two days ago, is already improving our eval scores!

I’m happy to say that, as of two days ago, we’ve soft-launched our multi-agent architecture, where agents spawn other agents with different roles.
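As a rough illustration of "agents spawning other agents with different roles", here is a toy sketch. It is not Codebuff's framework: the `Agent` class, its `spawn` and `run` methods, and the `planner`, `editor`, and `reviewer` roles are hypothetical, and plain functions stand in for what would be model calls in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    role: str
    # The handler receives the agent itself so it can spawn children.
    handle: Callable[["Agent", str], str]
    children: list["Agent"] = field(default_factory=list)

    def spawn(self, role: str, handle: Callable[["Agent", str], str]) -> "Agent":
        """Create a child agent with its own role and remember it."""
        child = Agent(role, handle)
        self.children.append(child)
        return child

    def run(self, task: str) -> str:
        return self.handle(self, task)


def editor(agent: Agent, task: str) -> str:
    return f"[editor] edited: {task}"


def reviewer(agent: Agent, task: str) -> str:
    return f"[reviewer] checked: {task}"


def planner(agent: Agent, task: str) -> str:
    """Delegate the task to specialized children and combine their output."""
    outputs = []
    for role, handle in (("editor", editor), ("reviewer", reviewer)):
        child = agent.spawn(role, handle)
        outputs.append(child.run(task))
    return "\n".join(outputs)


root = Agent("planner", planner)
OUTPUT = root.run("rename foo to bar")
print(OUTPUT)
```

Because every handler gets a reference to its own agent, any child could spawn further children, which is where the tree of roles comes from.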

The reception so far has been overwhelmingly positive even though this is the very beginning. My cofounder says we’re just scratching the surface of what is possible in this framework: “it feels like an infinite world of possibilities,” he says.

I agree — check it out! And stay tuned for a bigger launch soon!

Predictions for the next year

Follow along on Manifold and place your bets!

If we got so many things right about what was coming for coding agents last year, can we do it again? I think so!

Here are my forecasts:

The multi-agent paradigm will win. Our experience is that it’s possible to rapidly improve capabilities by delegating tasks to specialized agents.

“Live learning” will be standard. Having the coding agent learn as it does tasks is extremely powerful.

Coding agents will flip the initiative. We’ll see a shift from the user always initiating prompts, to the coding agent more often coming up with tasks for the user, e.g. to review key decisions.

Coding agents will close the loop. Instead of just proposing code changes, they will also use the product itself to perform QA and evals, and commit the changes autonomously.

Recursively improving coding agents will start working. And all the top coding agents will be a flavor of this.

xAI will gain a sizable lead. The multi-polar era will recede as xAI gains a decisive lead in model quality and intelligence.

The best model will not matter as much as today. Instead, it will be the network of agents that distinguishes the best product.

It’s been a blast

Thanks for reading, and cheers to another year of:

Big ideas, grinding, new employees, office snacks, customers that want to acquire us, offsites in Tokyo, afternoon breaks for running or basketball, and late night coding sessions.

May the best coding agent win!

James

P.S. Come help us build the world’s best coding agent!

You can join as a founding engineer and possibly have a stake in the first 10 trillion dollar startup once agents rule the world. Email james@codebuff.com. We also offer referral bonuses!

Liberty
