by codecracker3001
Is anyone using deep agents (LangChain or others like Claude Code, Codex) internally in their company or for real work that is non-coding?
I'm building one for a specific task, mainly to run through cron (but users can also chat/ask/provide feedback), and I'm trying to understand best practices.
Can anyone share any examples? I'd like to see something that is not a research agent, a coding agent, or a data/financial analyst agent, but something that does real work.
Hi all!
I've built a small tool to visualize how inefficient `docker pull` is, in preparation for standing up a new Docker registry + transport. It's bugged me for a while that updating one dependency with Docker drags along many other changes. It's a huge problem with Docker+robotics: with dozens or hundreds of dependencies, there's no "right" way to organize the layers that doesn't end up invalidating a bunch of them on a single dependency update, and that's ignoring things like compiled code, embedded ML weights, etc. Even worse, many robotics deployments are on terrible internet, either from being out in the boonies or from customer shenanigans. I've been up at 4AM supporting a field tech who needed to pull 100MB of mostly unchanged Docker layers to 8 robots over a 1Mbps connection. (And I don't think robotics is the only industry that runs into this; see the ollama example, that's a painful pull.)
What if Docker were smarter and knew which files were already on disk? How many copies of `python3.10` do I have floating around `/var/lib/docker`? For that matter, how many copies does DockerHub have? A registry that could address and deduplicate at the file level, rather than just the layer level, is surely cheaper to run.
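To make the file-level idea concrete, here's a minimal sketch (my own illustration, not the tool's actual code) of measuring how many bytes of a new image's filesystem already exist in an image you have, assuming both have been unpacked to local directories:

```python
import hashlib
import os

def file_hashes(root):
    """Map content hash -> size for every regular file under root.
    Duplicate content within one tree is counted once (the dedup view)."""
    hashes = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            hashes[h.hexdigest()] = os.path.getsize(path)
    return hashes

def dedup_savings(have_root, want_root):
    """Return (total bytes in the image being pulled,
    bytes of it already present on disk by content hash)."""
    have = file_hashes(have_root)
    want = file_hashes(want_root)
    shared = set(have) & set(want)
    return sum(want.values()), sum(want[h] for h in shared)
```

Real layers add compression, whiteouts, and metadata on top of this, but the core question is just a set intersection over content hashes.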
This tool:
- Given two Docker images, one you have and one you're pulling, shows how much data `docker pull` would transfer, as well as how much data is _actually_ required to pull
- Shows an estimate of how much time you'd save on various levels of cruddy internet
- There are a bunch of examples of situations where more intelligent pulls would help, but the two image names are free text; feel free to write your own values there and try it out (one at a time though, there's a work queue to analyze new image pairs)
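The time estimate is essentially bytes over bandwidth. A back-of-the-envelope sketch (mine, not the tool's actual model):

```python
def transfer_time_s(num_bytes, mbps):
    """Seconds to move num_bytes over a link of `mbps` megabits/second,
    ignoring latency, retries, and protocol overhead."""
    return num_bytes * 8 / (mbps * 1_000_000)

# The 100 MB field-tech pull from the story above:
layer_bytes = 100 * 1024 * 1024
for mbps in (1, 10, 100):
    mins = transfer_time_s(layer_bytes, mbps) / 60
    print(f"{mbps:>4} Mbps: {mins:.1f} min")  # about 14 min at 1 Mbps, per robot
```

Multiply by 8 robots and it's clear why pulling only changed bytes matters.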
The one thing I wish it had, but haven't gotten around to fitting into the UI, is a visualization of the files that _didn't_ change but are getting pulled anyway. It was written entirely in Claude Code, which is a new experience for me. I don't know nextjs at all, and I don't generally write frontends. I could have written the backend maybe a little slower than Claude, but the frontend would have taken me 4x as long and wouldn't have been as pretty. It helped that I knew what I wanted on the backend, I think.
The registry/transport/snapshotter(?) I'm building will allow sharing files across Docker layers both on your local machine and in the registry. There's a bit of prior art here, but only on the client side: the eStargz format allows splitting the metadata for a filesystem from its contents while remaining OCI compliant, but it does lazy pulls of the contents and has no deduplication. I think this could easily compete with other image providers both on cost (using less storage and bandwidth, everywhere) and on speed.
If you'd be interested, please reach out.
I built this because I was running 3-5 Claude Code instances on the same repo and burning out from constantly context switching between terminal windows: carefully making sure agents' work didn't overlap, preventing context windows from degrading, manually enforcing documentation/memory policies, and re-explaining decisions across sessions.
Stoneforge is the coordination layer I wanted. A Director agent breaks goals into tasks. A dispatch daemon assigns them to workers when available, and each task runs in its own git worktree. Stewards review completed tasks and squash-merge to main if everything passes inspection; otherwise they hand off the task with review comments to be picked up by a new agent. When a worker hits its context limit, it commits, writes handoff notes, and exits, so the next worker can pick up on the same branch with a fresh context window and important notes from the previous agent's work.
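As a sketch of the handoff step, a worker hitting its context limit could append a record like the one below. The field names are my own illustration, not Stoneforge's actual schema:

```python
import json
from datetime import datetime, timezone

def write_handoff(branch: str, commit: str, notes: str,
                  path: str = "handoff.jsonl") -> dict:
    """Append a handoff record so the next worker can resume the same
    branch with a fresh context window. Illustrative only: the schema
    here is an assumption, not Stoneforge's real format."""
    record = {
        "branch": branch,
        "commit": commit,  # HEAD the worker committed before exiting
        "notes": notes,    # what's done, what's left, gotchas
        "at": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Append-only JSONL fits this flow nicely: each exiting worker adds a line, and the next one reads the tail.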
Some design decisions that might be interesting to this crowd:
- Fully event-sourced with a complete audit log.
- Supports syncing tasks to GitHub or Linear, and documents to Notion, Obsidian, or a local folder. Custom providers are also supported.
- JSONL as source of truth, SQLite as a disposable cache. JSONL diffs, merges across branches, and survives corruption. SQLite gives you FTS5 and indexed queries. The SQLite .db can be rebuilt on different devices in seconds.
- No approval gates by default. If five agents each need confirmation for every file write, you won't be moving any faster. Review happens at the merge-steward level.
- Worktrees over containers. The conflict surface for coding agents is git and the file system; containers or remote instances are overkill. Worktrees create in milliseconds, share node_modules and build caches, and don't need Docker or separate servers.
- You can run multiple Claude Code / Codex plans simultaneously on the same codebase.
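To illustrate the JSONL-as-source-of-truth direction, here's a minimal rebuild of a disposable FTS5 cache from a JSONL file (schema invented for the example; the real one will differ):

```python
import json
import sqlite3

def rebuild_cache(jsonl_path, db_path=":memory:"):
    """Rebuild the disposable SQLite cache from the JSONL source of
    truth. Sketch only: assumes one task per line with id/title/body."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS tasks USING fts5(id, title, body)"
    )
    with open(jsonl_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines left by merges
            event = json.loads(line)
            db.execute(
                "INSERT INTO tasks VALUES (?, ?, ?)",
                (event["id"], event["title"], event["body"]),
            )
    db.commit()
    return db
```

Because the cache is derived, corruption or a device switch just means rerunning the rebuild; the JSONL stays the durable, diffable record.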
Works with Claude Code, OpenAI Codex, and OpenCode. Apache 2.0. GitHub: https://github.com/stoneforge-ai/stoneforge
Happy to discuss the architecture or any of the tradeoffs.