The promise is simple: stop context-switching between your editor and a chat window, let a loop run while you aren’t watching, and wake up to a finished feature. I spent a week testing five agentic tools—Claude Code, OpenCode, Factory Droid, Copilot CLI, and Pi—trying to find a loop I could actually trust.
| CLI | Strengths | Weaknesses | Verdict |
|---|---|---|---|
| Claude Code | Deep reasoning, multi-file edits | Very expensive, poor third-party/local model support | Only plays well with Anthropic models |
| OpenCode | Fast, easy onboarding | Buggy guardrails, easily bypassed | Promising but untrustworthy |
| Factory Droid | The “Lamborghini”: enterprise-grade, seamless IDE integration | Not optimized for third-party/local LLMs | Personally, I’m more of a Corolla guy |
| GitHub Copilot CLI | Safe, familiar | Horrendous local model support | It seems that Microsoft hasn’t learned from Internet Explorer |
| Pi | Raw agentic behavior, hybrid-friendly | Zero built-in safety | The one I’m using |
The Ralph Loop
Most agents are designed to exit after a single task. The Ralph Loop, a pattern coined by Geoffrey Huntley, inverts this. Instead of letting the agent decide when it’s done, you wrap it in a shell loop that keeps re-running it until an external check (like a test suite or a “Judge” model) passes. It’s named after Ralph Wiggum: stubborn, naive persistence in the face of danger.
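In its barest form, as Huntley describes it, the pattern is essentially `while :; do cat PROMPT.md | <agent>; done`. The sketch below adds the part that matters here: the loop only stops when an external check succeeds, with an iteration cap as a safety valve. `ralph_loop`, the cap, and the `my-agent` command in the usage line are hypothetical placeholders, not code from my ralph.sh.

```bash
# ralph_loop MAX_ITERS AGENT_CMD VERIFY_CMD
# Hypothetical sketch: re-run the agent until VERIFY_CMD succeeds.
# The agent's own opinion of whether it is "done" is ignored.
ralph_loop() {
  local max_iters=$1 agent_cmd=$2 verify_cmd=$3
  local i=1
  while [ "$i" -le "$max_iters" ]; do
    echo "[ralph] iteration $i: running agent"
    # A crashed or confused agent run is not fatal; only the check decides.
    sh -c "$agent_cmd" || echo "[ralph] agent exited non-zero" >&2
    if sh -c "$verify_cmd"; then
      echo "[ralph] verification passed after $i iteration(s)"
      return 0
    fi
    echo "[ralph] verification failed, looping again"
    i=$((i + 1))
  done
  # Cap reached: stop instead of burning tokens forever.
  echo "[ralph] giving up after $max_iters iterations" >&2
  return 1
}

# Usage (the agent command is a placeholder):
# ralph_loop 20 'cat PROMPT.md | my-agent' 'make test'
```

The iteration cap is a design choice, not part of the original pattern: without it, a task the model can never solve keeps the loop spinning indefinitely.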
The magic happens when you separate the Worker from the Judge. In my setup, a local Qwen-35b handles the repetitive coding tasks, while a SOTA cloud model (GLM-5.1) reviews every commit. If the Judge smells a hallucination, it pulls the fire alarm: the loop halts and I get a notification.
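One way to make the Worker/Judge split concrete is to have the Judge review only the latest commit’s diff and fail closed: unless it explicitly says PASS, the iteration counts as a failure. This is a minimal sketch, assuming the Judge is any command that reads a review request on stdin (for example, a CLI wrapper around the cloud model) and prints a line of the form `VERDICT: PASS` or `VERDICT: FAIL`. The `judge_gate` function, its arguments, and that verdict format are hypothetical.

```bash
# judge_gate JUDGE_CMD VERDICT_FILE
# Hypothetical sketch: reads a diff on stdin, asks the Judge to review it,
# saves the Judge's full reply to VERDICT_FILE, and fails closed.
judge_gate() {
  local judge_cmd=$1 verdict_file=$2
  {
    echo "You are reviewing a commit made by another model."
    echo "Flag hallucinated APIs, fake tests, or unrelated changes."
    echo "End with exactly one line: VERDICT: PASS or VERDICT: FAIL"
    echo "--- diff ---"
    cat   # the diff piped into judge_gate
  } | sh -c "$judge_cmd" > "$verdict_file" || return 1

  # Any explicit FAIL wins; otherwise an explicit PASS line is required.
  if grep -qx 'VERDICT: FAIL' "$verdict_file"; then
    return 1
  fi
  grep -qx 'VERDICT: PASS' "$verdict_file"
}

# Usage (the judge command is a placeholder):
# git diff HEAD~1 HEAD | judge_gate 'my-judge-cli' /tmp/verdict.txt
```

Failing closed matters because a Judge that rambles, truncates, or forgets the verdict line should stop the loop, not wave the commit through.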

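```bash
# A snippet from ralph.sh: the circuit breaker.
# _run_judge and _notify are helpers defined elsewhere (not shown).
if _run_judge "per-iteration" "$judge_verdict_file" "phase-${PHASE}-iter-${i}"; then
  echo "[ralph] judge passed iteration $i — continuing"
else
  # A rejected iteration stops the whole loop and sends a notification
  # instead of letting the Worker keep building on a bad commit.
  echo "[ralph] judge FAILED iteration $i — halting loop" >&2
  _notify "Judge ✗" "Phase $PHASE iter $i: judge review FAILED"
  exit 1
fi
```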
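The snippet calls `_notify` without showing it. Assuming a macOS host, a minimal hypothetical version could always log to stderr and additionally raise a desktop notification through `osascript` when that command exists. This is an illustration, not the helper from my ralph.sh.

```bash
# Hypothetical _notify TITLE BODY (not the real helper from ralph.sh).
_notify() {
  local title=$1 body=$2
  # Always leave a trace in the log, even when no desktop is available.
  printf '[notify] %s: %s\n' "$title" "$body" >&2
  # On macOS, also show a desktop notification. Title and body are passed
  # as argv so quotes inside the message can't break the AppleScript.
  if command -v osascript >/dev/null 2>&1; then
    osascript \
      -e 'on run argv' \
      -e 'display notification (item 2 of argv) with title (item 1 of argv)' \
      -e 'end run' \
      "$title" "$body" >/dev/null 2>&1 || true
  fi
}

# Usage:
# _notify "Judge ✗" "Phase 2 iter 7: judge review FAILED"
```

Passing the strings as arguments rather than splicing them into the AppleScript source is the safer choice when the message text comes from a model or a log.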
The “Lamborghini” Problem
After finding success with the Ralph Loop in OpenCode, I tried Factory Droid. Mario Zechner (the creator of Pi) famously called Droid the “Lamborghini” of CLI agents, and he’s right—it’s enterprise-grade, handles compliance, and has a beautiful VS Code integration.
But a Lamborghini is high-maintenance. While OpenCode played nicely with LM Studio, Droid struggled to parse Qwen’s “thoughts.” I eventually fixed the parsing by switching from LM Studio to oMLX, but even then, Qwen couldn’t grasp Droid’s more advanced tool calling. It would get stuck in infinite loops, fetching the same website over and over rather than admitting failure.
Why I’m sticking with Pi (for now)
I settled on Pi. It doesn’t have the “guardrail marketing” of OpenCode or the corporate polish of Droid; it’s a raw engine. By running Pi inside a Lume VM, I get the hard, VM-level isolation I need without the “buggy” permission layers that agents just bypass anyway.
The other reason is the wonderful rpiv-pi package. The project-manager part of my brain loves it! It wires in a skill pipeline (discover, research, design, plan, implement, validate) where each step produces an artifact the next one consumes. Specialist subagents fan out automatically at each stage, and hand-off skills freeze session state so the next session resumes from a document, not from memory.
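The hand-off idea is what makes resuming cheap: if every stage writes its output to a file, a new session only has to check which files already exist. Here is a rough sketch of that pattern, assuming one Markdown artifact per stage in a single directory. It is not how rpiv-pi is actually implemented; `run_pipeline`, `run_stage`, and the file layout are hypothetical, and `run_stage` is a stub where a real setup would invoke the agent with a stage-specific prompt.

```bash
# Hypothetical artifact-driven pipeline (NOT rpiv-pi's actual API).
STAGES="discover research design plan implement validate"

# run_stage STAGE INPUT_FILE OUTPUT_FILE
# Stub: a real version would call the agent with a stage-specific prompt
# and the previous artifact as context. Here it records what it consumed.
run_stage() {
  printf '%s <- %s\n' "$1" "${2:-none}" > "$3"
}

# run_pipeline ARTIFACT_DIR
# Each stage reads the previous stage's artifact and writes its own.
# Stages whose artifact already exists are skipped, so a fresh session
# resumes from the documents on disk rather than from chat memory.
run_pipeline() {
  local dir=$1 prev="" out stage
  mkdir -p "$dir"
  for stage in $STAGES; do
    out="$dir/$stage.md"
    if [ -s "$out" ]; then
      echo "[pipeline] $stage: artifact exists, skipping"
    else
      echo "[pipeline] $stage: running"
      run_stage "$stage" "$prev" "$out" || return 1
    fi
    prev=$out
  done
}

# Usage:
# run_pipeline ./artifacts
```

A side effect of keying on files is that you can hand-edit an artifact (say, the plan) and the later stages will consume your version on the next run.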
If these tools want to be the “Linux” of the AI space, security and guardrails should be first-class citizens, not optional add-ons. But for now, I’ll keep building my own boxes.