Will Angel's Blog

Offline Agentic Coding: OpenCode

Thu, 07 May 2026 00:00:00 +0000

Offline Agentic Coding part 2: OpenCode & Kilocode.

Published 2026-05-07

OpenCode:

Claude code with non-anthropic models feels limited. Conveniently, we can also use OpenCode!

ollama launch opencode

OpenCode is like Claude Code but bring your own model. Overall it's very comparable, but is less polished in some ways while feeling more solid in others.

Kilocode:

Kilocode (kilo.ai) is an agentic coding platform. It's now primarily powered by a fork of OpenCode. Their VScode/codium extension is very nice, and offers one of the my favorite views of the context window:

Hooking up a local model to the Kilocode vscode extension is straightforwad. Just give it your local server port and you're good to go.

Kilocode is also nice because they have support for a wide range of model providers as well as their own model provider platform, so it's seemless to switch between local, open, and proprietary models.

Overall

Overall local models for coding are still too slow to be practical on regular hardware, but it's nice to have as a capability if the internet goes down, and it's still magical to be able to tell me computer to program itself.

Offline Agentic Coding

Mon, 27 Apr 2026 00:00:00 +0000

Offline Agentic Coding

Published 2026-04-27

You can use ollama as the backend for claude code!

ollama launch claude --model

This allows you to use claude code with local models. I'm writing this from an airplane with no internet connection.

Overall model comparisons

Gemma4:e2b did not finish any tasks despite being blazing fast at over 100 tokens per second.

qwen3-coder-next:q4_K_M actually did reasonably well. Felt a bit worse than haiku quality but notably slower. Took around half an hour to fill up 75k of context, which is about 40 tokens per second while taking 50-60gb of memory.

qwen3.6:35b was also fairly reasonable. Did an adaquate job writing a small local data processing job, but was also fairly slow.

Gemma4:31b felt the most 'claude-like' in claude code, but was also fairly slow and occasionally required some jostling and interruption.

Overall

I don't seriously recommend local agentic coding with LLMs. You need some serious hardware to run decent models and it's still slow. It's a nice capability to have locally, but it probably isn't better than coding by hand. Still very cool to have a computer that can program itself though, and amazing that a consumer device can locally run models and software that matches the original gpt-3 era ChatGPT style experience.

Washington DC on track for most volatile temperature year since 1959

Sun, 19 Apr 2026 00:00:00 +0000

Washington DC on track for most volatile temperature year since 1959

An analysis of 85 years of daily weather data from Reagan National Airport.