Offline Agentic Coding
Published 2026-04-27

You can use Ollama as the backend for Claude Code!
ollama launch claude --model qwen3-coder-next:q4_K_M
This lets you run Claude Code against local models (the model tag above is just one of the ones I tested below). I'm writing this from an airplane with no internet connection.
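
The only step that needs connectivity is downloading the weights, so pull the model before you go offline. A minimal prep sketch, using the same model tag as above:

# Download the model weights ahead of time; this is the only step that needs internet.
ollama pull qwen3-coder-next:q4_K_M

# Confirm the weights are available locally before boarding.
ollama list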
Overall model comparisons
Gemma4:e2b did not finish any tasks despite being blazing fast at over 100 tokens per second.
qwen3-coder-next:q4_K_M actually did reasonably well. It felt a bit worse than Haiku in quality, but was notably slower: it took around half an hour to fill up 75k tokens of context, which works out to about 40 tokens per second (see the quick arithmetic check after these comparisons), while using 50-60 GB of memory.
qwen3.6:35b was also fairly reasonable. It did an adequate job writing a small local data-processing job, but was also fairly slow.
Gemma4:31b felt the most 'Claude-like' in Claude Code, but it was also fairly slow and occasionally required some jostling and interruption.
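
A quick sanity check on that throughput figure, using the numbers from the qwen3-coder-next run above (plain arithmetic, nothing Ollama-specific):

# ~75k tokens of context filled in ~30 minutes
awk 'BEGIN { printf "%.1f tokens/sec\n", 75000 / (30 * 60) }'   # prints 41.7 tokens/sec

So the roughly 40 tokens per second estimate checks out.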
Overall
I don't seriously recommend local agentic coding with LLMs. You need serious hardware to run decent models, and even then it's slow. It's a nice capability to have locally, but it probably isn't better than coding by hand. Still, it's very cool to have a computer that can program itself, and amazing that a consumer device can locally run models and software that match the original GPT-3-era ChatGPT experience.