Evolution of Agentic Programming: From Code Completion to Personal Orchestration Tools

Late last summer I went from using code completion in VS Code to using the same in Cursor. After a few weeks I tried its chat tool and was blown away; I'm sure many of you can relate. Within a few days the UI was annoying me, and I learned that Claude Code was a CLI tool, which naturally appealed to me, so I tried it out. I immediately stopped using Cursor's chat and kept Cursor only for code completion (which it was excellent at). Over time, I used the IDE less and less and less.

It wasn't long before I was working primarily in the terminal and only falling back to the IDE when I needed to look at code. This was in the Sonnet/Opus 3.5 days of yore. I went from using Claude as a single-threaded pair programmer to using it as a multi-threaded implementer, and that's where the real unlock happened.

To make that work, my best practices around context management had to evolve. One major pain point was that Claude starts every session with no memory, so I kept adding facts to CLAUDE.md. I realize now that this was not a good solution, because large always-loaded prompt files cause context bloat. They're sent on every. single. turn. of every single session, so unless every rule matters for every session and every turn, you're wasting context, which means you're burning tokens unnecessarily, which means you're wasting money.
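To make the cost concrete, here is a back-of-the-envelope sketch in Python. The numbers are hypothetical, not measurements from my setup, and the model assumes the always-loaded file is re-sent in full on every turn. Prompt caching can lower the price of repeated input tokens, but those tokens still occupy the context window.

```python
def wasted_tokens(file_tokens: int, relevant_fraction: float,
                  turns_per_session: int, sessions: int) -> int:
    """Tokens spent re-sending the parts of an always-loaded prompt file
    that are irrelevant to the work at hand.

    Simplifying assumption: the whole file is re-sent on every turn of
    every session, with no caching discount.
    """
    if not 0.0 <= relevant_fraction <= 1.0:
        raise ValueError("relevant_fraction must be between 0 and 1")
    irrelevant_per_turn = file_tokens * (1.0 - relevant_fraction)
    return round(irrelevant_per_turn * turns_per_session * sessions)


# Hypothetical: a 4,000-token CLAUDE.md where only a quarter of the rules
# matter for a typical session, 30 turns per session, 10 sessions.
print(wasted_tokens(4_000, 0.25, 30, 10))  # 900000
```

Even with modest made-up inputs, that's close to a million tokens of rules the model didn't need to see.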

This led me to try skill files, which work fairly well. Claude Code injects the description of every skill into every session, so the agent automatically knows which ones to read. That's a huge improvement because it means I can keep 5, 10, 20, even 50 core skill files that Claude reads only on demand.

I noticed two big problems with that. First, it slowed the context creep but didn't stop it, because the descriptions injected into every session add up too. Second, those files have to be maintained semi-manually and, worse, checked into git. That results in noisy PRs full of changes humans don't want to read, plus merge conflicts and so on. Not a huge deal, but significant all the same.
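A simplified model of that trade-off, in Python. This is not how Claude Code is implemented; it only mirrors the behavior described above, where every skill's short description rides along in every session while the full body is read on demand. The `Skill` class, the example skill names, and the four-characters-per-token estimate are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # short summary: paid for in every session
    body: str         # full instructions: read only when the agent asks


def session_preamble(skills: list[Skill]) -> str:
    """What every session carries up front: one line per skill."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)


def load_skill(skills: list[Skill], name: str) -> str:
    """On-demand read of a single skill's full body."""
    for s in skills:
        if s.name == name:
            return s.body
    raise KeyError(name)


def approx_tokens(text: str) -> int:
    """Crude estimate, assuming roughly four characters per token."""
    return max(1, len(text) // 4)


skills = [
    Skill("logging-server", "How to query the logging server",
          "Step-by-step runbook: credentials, query syntax, common filters."),
    Skill("scheduling", "How the scheduling system works",
          "Detailed walkthrough of job queues, retries, and cron triggers."),
]
print(approx_tokens(session_preamble(skills)))  # cost paid every session
print(load_skill(skills, "scheduling"))         # cost paid only when needed
```

The preamble grows linearly with the number of skills, which is exactly the creep that skill files slow down but don't eliminate.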

Around this time I saw a lot of people online talking about memory MCP servers, so I wrote one. It was a very fun project because I needed a vector DB for the first time and learned more about embedding models. I realized that what makes skill files powerful is that everyone on the team has access to them, so a verbose doc I write on how the scheduling system works is useful to the whole team. Easy solution: back the memory server with a vector DB hosted on our own infrastructure, and get everyone to install it by writing an install doc that they feed to their own Claude. Fun!
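Here is a toy version of the store-and-search half of that idea in Python. It swaps in a bag-of-words vector and a plain Python list for the real embedding model and hosted vector DB, and it leaves out the MCP server wiring entirely; `MemoryStore`, `embed`, and the example memories are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count.
    A real memory server would call an actual embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class MemoryStore:
    """Minimal in-process vector store; a shared team deployment would use
    a hosted vector database instead of a Python list."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, Counter]] = []

    def store(self, text: str) -> None:
        self._rows.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = MemoryStore()
memory.store("We use Dynamo as the primary database.")
memory.store("The scheduling system runs jobs from a cron worker.")
print(memory.search("what database do we use", k=1))
# ['We use Dynamo as the primary database.']
```

Swapping the toy `embed` for a real embedding model is what lets a search match on meaning rather than shared words.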

But remember that context management is an important consideration. Mempalace was released a few weeks ago with near-perfect memory recall, which is amazing, but I don't actually want Claude to remember every choice I made along the way. I don't want it to remember that we started with SQLite, switched to Postgres when the data got too big, hit major performance problems, moved to Mongo, which didn't solve them, and finally landed on Dynamo; I want it to know that we use Dynamo.

My workflow therefore doesn't remember every decision I made along the way; it stores memories as documentation, not as a record of the past.
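One way to get that behavior, sketched in Python under my own assumptions: key each memory by topic and overwrite on write, so the store answers "what do we use" rather than "what did we try". In a vector-backed server, a similar effect could come from searching for an existing memory on the topic and updating it in place; the `DecisionLog` class and the topic names below are hypothetical.

```python
from typing import Optional


class DecisionLog:
    """Memories as documentation: one current answer per topic.

    Storing a new decision on an existing topic replaces the old one
    instead of appending to a history.
    """

    def __init__(self) -> None:
        self._current: dict[str, str] = {}

    def remember(self, topic: str, decision: str) -> None:
        self._current[topic] = decision  # overwrite, don't append

    def recall(self, topic: str) -> Optional[str]:
        return self._current.get(topic)


log = DecisionLog()
for db in ["SQLite", "Postgres", "Mongo", "Dynamo"]:
    log.remember("datastore", f"We use {db}.")
print(log.recall("datastore"))  # We use Dynamo.
```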

The trick, then, is getting it to always record memories and to always search memory for the context it needs in a given session. The latter is solved with an instruction in CLAUDE.md, because that IS something I want in every single session. The former, getting it to always record architecture decisions, is solved through workflow.

Whenever you have a workflow that repeats the same steps every time, one tool to reach for is a custom slash command, which is simply a stored prompt that can optionally take arguments. I have two that I use regularly and a few that I use occasionally. The two relevant here are /m (search memories relevant to this conversation, update them as appropriate, and store new memories with lessons learned and architectural decisions made during this session) and /push (lint, compile, commit, push, but only once, then update memories). I use these in every single session unless it's truly a short throwaway conversation that isn't worth recalling. If I'm not sure, I store it.

Another slash command I use a lot, not relevant to this post but quite useful, is /ask: answer this question and do not continue working on anything until I tell you to. Claude is tuned to DO, and sometimes you need to stop and consider for a turn or three before steering in a new direction. Without /ask, I found that when I interrupted to ask a question, Claude would treat it as a hint to immediately change direction, which probably wasn't the intent of the question.
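In Claude Code, custom slash commands are Markdown files (for example under `.claude/commands/`), and a `$ARGUMENTS` placeholder in the file is replaced with whatever you type after the command. The Python below only mimics that expansion so the idea is concrete; the prompt wording is my paraphrase of the commands described above, not their actual text, and `COMMANDS` and `expand` are hypothetical names.

```python
from string import Template

# Paraphrased, hypothetical prompt bodies keyed by command name.
COMMANDS = {
    "m": Template(
        "Search memories relevant to this conversation, update any that are "
        "out of date, and store new memories for lessons learned and "
        "architectural decisions made in this session. $ARGUMENTS"
    ),
    "push": Template(
        "Lint, compile, commit, and push (only once), then update memories. "
        "$ARGUMENTS"
    ),
    "ask": Template(
        "Answer this question and do not continue working on anything "
        "until I tell you to: $ARGUMENTS"
    ),
}


def expand(line: str) -> str:
    """Turn '/name some args' into the stored prompt with args substituted."""
    if not line.startswith("/"):
        return line  # ordinary prompt, sent as-is
    name, _, args = line[1:].partition(" ")
    template = COMMANDS.get(name)
    if template is None:
        raise KeyError(f"unknown command: /{name}")
    return template.safe_substitute(ARGUMENTS=args).strip()


print(expand("/ask why does the scheduler retry twice?"))
```

The point of the design is that the repeated instructions live in one place, so every session gets exactly the same wording.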

By this point in my evolution of agentic programming I was fully multi-threaded, often running 3+ sessions in parallel. One small problem was that I needed more terminal columns, so I bought a 34-inch ultrawide monitor that lets me comfortably run 6-8 terminal columns side by side, and I got very comfortable with tmux sessions and git worktrees. The problem then became completely human: tracking what each session was about, which branch each worktree was created from, and the overall state of git became very error-prone. Tmux session titles helped a bit, but consistently updating them, killing them, and not losing them is a lot of overhead. The git problem is bigger. If you mess up, you end up merging code to the wrong place. For example, you accidentally create a hotfix worktree off dev, merge it to main, and end up doing a full release before QA has approved everything. Not great. It's certainly possible when working single-threaded, but rarer; with agents doing all the things, it's much easier to do accidentally if you don't really nail your workflow.
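One guard against that particular mistake is to make the base branch explicit and check it against a naming convention before creating the worktree. A sketch in Python; the `hotfix/` goes to `main`, everything else goes to `dev` convention and the function names are hypothetical, while `git worktree add -b <branch> <path> <start-point>` is standard git.

```python
import subprocess

# Hypothetical branching convention: hotfixes start from main,
# everything else starts from dev.
BASE_FOR_PREFIX = {"hotfix/": "main"}
DEFAULT_BASE = "dev"


def expected_base(branch: str) -> str:
    """Return the base branch the naming convention requires."""
    for prefix, base in BASE_FOR_PREFIX.items():
        if branch.startswith(prefix):
            return base
    return DEFAULT_BASE


def worktree_command(branch: str, path: str, base: str) -> list[str]:
    """Build the `git worktree add` call, refusing a mismatched base."""
    want = expected_base(branch)
    if base != want:
        raise ValueError(f"{branch} should branch from {want}, not {base}")
    # git worktree add -b <new-branch> <path> <start-point>
    return ["git", "worktree", "add", "-b", branch, path, base]


def create_worktree(branch: str, path: str, base: str) -> None:
    """Actually run git; raises CalledProcessError if git fails."""
    subprocess.run(worktree_command(branch, path, base), check=True)


print(worktree_command("hotfix/login-crash", "../wt-login-crash", "main"))
# ['git', 'worktree', 'add', '-b', 'hotfix/login-crash', '../wt-login-crash', 'main']
```

Failing loudly before git runs is cheaper than noticing after an agent has already merged into the wrong branch.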

At this point I was convinced I needed an orchestration tool of some sort, and I tried quite a few. Vibe Kanban was my favorite, but it had UX/DX issues, and then it moved to a freemium/nagware model that killed it for me. I tried conductor.build and claude-code-ui, and probably looked at many others, before I just said fuck it and built my own.

Thus arrived Orchestrel: a kanban board that ties an agent session to a card on the board, with as many columns as fit on screen at a 350px minimum width. It's open source, so you're welcome to try it out or, better, grab the parts that are useful to you and build your own. Getting it working well has been an interesting challenge. The basic architecture was very fun to build because it's a purely event-driven design, something I hadn't gotten to play with since building a realtime chat server for PureCloud. On the backend, a session manager makes API calls to a harness SDK and emits events to controllers, which in turn emit events to the client. I ended up splitting the session manager out into a daemon process so I can run the web code in dev mode with HMR without interrupting running sessions. I experimented with the Claude Code SDK, then OpenCode, then pi-ai, and landed back on Claude Code's Agent SDK, because a universal harness that most people in the community already use streamlines a lot. I do think Anthropic makes the best harness, and projects like claude-code-router let me use it with other models pretty easily.
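A stripped-down sketch of that event flow in Python, not Orchestrel's actual code: a session manager emits events, a controller subscribes and forwards them to the client, and the harness SDK is replaced by a stub. The event names, the classes, and the in-process bus are hypothetical; with the session manager running as a separate daemon, events would cross a process boundary rather than a function call.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Tiny synchronous pub/sub: handlers run in subscription order."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: dict) -> None:
        for handler in self._handlers[event]:
            handler(payload)


class SessionManager:
    """Drives agent sessions and emits events; the harness SDK is stubbed."""

    def __init__(self, bus: EventBus, harness: Callable[[str], list[str]]) -> None:
        self.bus = bus
        self.harness = harness  # hypothetical stub: prompt -> agent messages

    def run(self, card_id: str, prompt: str) -> None:
        self.bus.emit("session.started", {"card": card_id})
        for message in self.harness(prompt):
            self.bus.emit("session.message", {"card": card_id, "text": message})
        self.bus.emit("session.idle", {"card": card_id})


class CardController:
    """Translates session events into client updates."""

    def __init__(self, bus: EventBus, send_to_client: Callable[[dict], None]) -> None:
        for event in ("session.started", "session.message", "session.idle"):
            # Default argument pins the current event name for each lambda.
            bus.on(event, lambda payload, e=event: send_to_client({"type": e, **payload}))


bus = EventBus()
sent: list[dict] = []
CardController(bus, sent.append)
SessionManager(bus, lambda prompt: [f"looking into: {prompt}", "done"]).run("card-1", "fix bug")
print([e["type"] for e in sent])
# ['session.started', 'session.message', 'session.message', 'session.idle']
```

Because the session manager only emits events and never talks to the client directly, the web layer can restart (for example under HMR) without the sessions noticing.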

Having my own orchestration tool means I can build a workflow that fits exactly how my brain works, and I can customize everything to enforce best practices. For example, I built support for multiple columns for different sessions, deep support for multiple projects running in parallel with different providers and different API keys for billing purposes, built-in support for my memory server, and so on. I found that having 6 running columns was distracting, so I recently built a feature that automatically opens the next most important card when I submit a prompt. That lets me rotate through as many sessions as I want with impunity. New critical bug report just came in? Kick off a new card for that project in a worktree from main, run /jira to pull in all the context, lean on the skill files for accessing the logging server, and off it goes. Work on other things, and when the agent figures out the problem, that card will pop back up on the ferris wheel.
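The "open the next most important card" behavior is essentially a priority queue. A minimal Python sketch, assuming a numeric priority per card with ties broken by arrival order; `FerrisWheel`, its method names, and the priority scale are hypothetical, not Orchestrel's implementation.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class WaitingCard:
    priority: int                       # lower number = more important (hypothetical scale)
    seq: int                            # tie-breaker: earlier arrivals come first
    card_id: str = field(compare=False)


class FerrisWheel:
    """Cards whose agents are waiting on the human, most important first."""

    def __init__(self) -> None:
        self._heap: list[WaitingCard] = []
        self._arrivals = itertools.count()

    def needs_attention(self, card_id: str, priority: int) -> None:
        """Call when an agent finishes a turn and is waiting for input."""
        heapq.heappush(self._heap, WaitingCard(priority, next(self._arrivals), card_id))

    def on_prompt_submitted(self) -> Optional[str]:
        """After a prompt is sent on the current card, return the next card to open."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).card_id


wheel = FerrisWheel()
wheel.needs_attention("refactor-auth", priority=3)
wheel.needs_attention("prod-bug", priority=0)
wheel.needs_attention("docs", priority=3)
print(wheel.on_prompt_submitted())  # prod-bug
print(wheel.on_prompt_submitted())  # refactor-auth
```

Breaking ties by arrival order keeps equally important cards rotating fairly instead of letting one card starve.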