Using Devcontainers to Fix Coding Agents’ Foibles
I’m publishing the template I use for AI-coded projects here.
For the last two years, I’ve been forcing myself to use AI to write as much of my code as possible, because I want to understand the technology’s weaknesses and strengths. Things started to take off with Cursor’s agent mode and the release of Anthropic’s Claude 3.7 Sonnet model, and then Claude Code rocketed it to a place where it could write all of my code… with caveats.
Agents can
…be incredibly destructive - They once decided to upgrade my Linux kernel when I wasn’t looking
…be bad at managing processes - They might try to start a server to see logs, and hang because the command never returns
…forget everything on a new session - They need an out-of-band “long term memory”
…be confidently incorrect - They need test harnesses and verification mechanisms
…work much better with some languages - Static typing matters, as does training data size
Some of these require careful architecture and management over time to keep codebases in good shape, but some can be systematized:
Devcontainers
As I’ve developed projects over time, I’ve found myself pulling forward a particular set of code: my devcontainers. Devcontainers are a way to define a development environment, based on Docker with some additional configuration. Many code editors can connect directly to the container and work inside its environment.
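The core of a devcontainer is a devcontainer.json. Here’s a minimal sketch; the fields come from the devcontainer spec, but the name, image, and extension are placeholders, not my exact template:

```json
// .devcontainer/devcontainer.json: a minimal sketch
{
  "name": "my-project",
  "image": "mcr.microsoft.com/devcontainers/typescript-node:22",
  "workspaceFolder": "/workspace",
  "customizations": {
    "vscode": {
      "extensions": ["dbaeumer.vscode-eslint"]
    }
  }
}
```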
Devcontainers solve for sandboxing, repeatability, and process management. They also introduce problems.
Because devcontainers are Docker containers, anything written to a directory that isn’t mounted is lost when the container is rebuilt. Code is mounted in as a volume, so changes there are saved back. But if, for example, you log into Claude Code, it saves its session to the home directory; restart the container and that session information and interaction history are gone.
The solution is to mount the container user’s home directory to a persisted volume, which preserves writes to that directory across rebuilds.
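In an image-based devcontainer.json this is a one-line mounts entry; in a compose-based setup the equivalent is a named volume in the compose file. A sketch, assuming a base image whose user’s home is /home/node (adjust the target to your image):

```json
{
  "mounts": [
    "source=devcontainer-home,target=/home/node,type=volume"
  ]
}
```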
Process Management and Monitoring
Many times I’ve left Claude Code looping on a project and come back to a message like “Starting the server to check changes, running ‘go run main.go’…” It had gotten stuck because it wanted to run the app to check its changes, but its only tool is a synchronous bash command, so the call never finishes: the server never exits. Perhaps an ideal solution would be giving the agent a built-in process manager, but for now I’ve found the best option is to start all the processes independently of the agent and instruct Claude on how to access their logs. Every process is started as a separate container via the docker compose definition, which starts them up automatically and lets Docker manage the logs.
Docker takes care of restarting processes, and each service can run a watcher that live-reloads on code changes.
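A sketch of that compose file; devcontainer.json points at it via its dockerComposeFile and service fields, and the images and commands here are illustrative rather than my exact setup:

```yaml
# .devcontainer/docker-compose.yml: a sketch, not the exact template
services:
  dev:
    # the container the editor and agent attach to
    image: mcr.microsoft.com/devcontainers/typescript-node:22
    volumes:
      - ..:/workspace:cached
    command: sleep infinity  # keep the dev container alive

  web:
    # the app runs as its own container, with its own logs
    image: node:22
    working_dir: /workspace
    volumes:
      - ..:/workspace:cached
    command: npx tsx watch src/server.ts  # watcher live-reloads on changes
    restart: unless-stopped               # docker restarts it if it crashes
```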
Coding agents can then query “docker logs” on those containers to get the output. But I’ve had better luck draining all of my logs to the open source logging platform Loki with Alloy, and then telling the coding agent how to query them. I also set up Grafana in front of Loki, so I can check the logs easily myself.
Now, every time a change is made, the agent can just query Loki to see if there were any errors.
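The instruction to the agent boils down to a single command. A sketch using Loki’s logcli; the address, the container label, and the time window depend on your Alloy config, so treat them as placeholders:

```sh
# query the last 15 minutes of the web service's logs for errors
logcli --addr=http://loki:3100 query --since=15m '{container="web"} |= "error"'
```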
Planning and Long Term Memory
Planning is perhaps the most interesting meeting of agents and the coder. They’re surprisingly good at it, but the initial plan is often where things go wrong. Claude Code allows us to create custom commands, and I created one so I can write /plan followed by a description of the feature I’m working on. This produces a structured markdown file with considerations for all of the things I care about, and then I can curate that file, suggesting edits and adding requirements.
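Custom commands in Claude Code are markdown files under .claude/commands/, with $ARGUMENTS standing in for whatever follows the command. A trimmed sketch of the idea; the sections here are illustrative, not my full prompt:

```markdown
<!-- .claude/commands/plan.md -->
Create an implementation plan for the following feature: $ARGUMENTS

Write the plan to plans/<feature-name>.md as structured markdown, with sections
for requirements, affected files, data model changes, testing strategy, and a
task checklist. Do not write any code yet; stop once the plan file exists.
```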
When I start to actually implement, I have an /execute_plan command that directs the agent to work through the plan procedurally. The plan then becomes a cross-session “long term memory” that is updated as we finish each task, keeping the agent on track.
What Next
In some ways, it feels odd to spend so much time optimizing for the AI agent. It’s always a balance between working on developer efficiency and delivering features, but I think I’ve found a structure that works out of the box and solves most of the issues above. That time has allowed me to deliver features in days that used to take weeks. I’ve put together a basic full-stack TypeScript project here, fully instrumented with these practices, for anyone to use.
Next I’m thinking about an operator one level above this. I find myself spinning up many environments, and spending a lot of time doing repetitive work to drive the core flow (plan -> verify -> execute -> verify -> commit). There should be something that can orchestrate spinning up a devcontainered environment and interacting with these agents at a higher level. We’ll see what that is…