<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Will Angel's Blog</title>
    <link>https://www.williamangel.net/blog/</link>
    <description>Posts from williamangel.net</description>
    <language>en-us</language>
    <lastBuildDate>Wed, 10 Jun 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://www.williamangel.net/rss.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title>Anthropic Fable</title>
      <link>https://www.williamangel.net/blog/2026/06/10/anthropic-fable.html</link>
      <guid isPermaLink="true">https://www.williamangel.net/blog/2026/06/10/anthropic-fable.html</guid>
      <pubDate>Wed, 10 Jun 2026 00:00:00 +0000</pubDate>
      <description>Anthropic's Fable 5 is Opus on a Good Day</description>
      <content:encoded><![CDATA[<h1>Anthropic&#x27;s Fable 5 is Opus on a Good Day</h1>
<h3>Published 2026-06-10</h3>
<p><img src="claude-code.png" alt="Claude Code with Fable 5"></p>
<p>After a couple of hours with Anthropic&#x27;s new Fable model in Claude Code, it feels like it consistently does what Opus does on a good day. Claude Code with Fable more consistently has The Magic ™️ that it had in November and December of last year. It does look like it&#x27;ll be notably more expensive than Opus, and it does seem to run longer without asking for clarification (for better or worse), but the rate at which it just makes software is pretty impressive.</p>
<p><img src="2026-06-10-anthropic-pricing.png" alt="Anthropic Release Note Pricing"></p>
<p>Once again interesting to see performance curves in the wild. Opus 4.8 at Max reasoning is worse than at Xhigh and more expensive...</p>]]></content:encoded>
    </item>
    <item>
      <title>The Stochastically K-Shaped Engineering Job Market</title>
      <link>https://www.williamangel.net/blog/2026/06/05/the-stochastically-k-shaped-engineering-job-market.html</link>
      <guid isPermaLink="true">https://www.williamangel.net/blog/2026/06/05/the-stochastically-k-shaped-engineering-job-market.html</guid>
      <pubDate>Fri, 05 Jun 2026 00:00:00 +0000</pubDate>
      <description>The job market is simultaneously hot and impossible.</description>
      <content:encoded><![CDATA[<h1>The Stochastically K-Shaped Job Market</h1>
<h3>Published 2026-06-05</h3>
<p><img src="2026-06-05-bls-employment-population-ratio.png" alt="BLS employment to population ratio"></p>
<h1>The job market has gone K-Shaped in a weird way.</h1>
<p>The job market seems to have gone stochastically K-Shaped. It is simultaneously hot and impossible, in that for some the odds of finding a job are very high, but for others it&#x27;s nearly impossible.</p>
<p>This is only stochastically true, so your experience may vary.</p>
<h2>Anecdotes</h2>
<p>I had someone in a group chat today say they&#x27;d take 5 months of severance if offered because the job market is hot enough. At a happy hour I met a new grad software engineer who told me that 1000 job applications hadn&#x27;t gotten them a single interview...</p>
<h2>Data</h2>
<p>The BLS has a lot of data which shows that things are not great, not terrible. The unemployment rate in May 2026 was 4.3%, down from 4.5% in November of last year.</p>
<p>https://www.bls.gov/charts/employment-situation/persons-not-in-the-labor-force-selected-indicators.htm</p>
<p>But the ratio of employment to people is decreasing:</p>
<p><img src="2026-06-05-bls-employment-population-ratio.png" alt="BLS employment to population ratio"></p>
<p>And the number of people not in the labor force is increasing:</p>
<p><img src="2026-06-05-bls-people-not-in-the-labor.png" alt="BLS people not in the labor market"></p>
<p>And the share of &#x27;long term unemployed&#x27; people, those who have been unemployed for 27 weeks or longer, was 27.5% of the overall unemployed population in May 2026, up from 19.2% in May 2023.</p>
<p>https://www.bls.gov/charts/employment-situation/unemployed-27-weeks-or-longer-as-a-percent-of-total-unemployed.htm</p>
<p>So 1.3 million more people are unemployed than 3 years ago. The number of people not in the labor force who would like a job is up 600,000 from May of 2024.</p>
<h2>Conclusion</h2>
<p>The job market is hot for some, impossible for others, and stochastically that makes me sad thinking about the people who got laid off 18 months ago and may never find a job again despite wanting one. Increase your luck by learning, practicing competence, and helping out your fellow humans, as networks are a significant factor in what makes a person hot outside or inside of a job. Plus, tautologically, the world is overall better when we&#x27;re good to each other.</p>]]></content:encoded>
    </item>
    <item>
      <title>Claude.AI Pro Plan quotas too small for deep research</title>
      <link>https://www.williamangel.net/blog/2026/05/20/ClaudeAI-Pro-Plan-Gives-Single-Deep-Research.html</link>
      <guid isPermaLink="true">https://www.williamangel.net/blog/2026/05/20/ClaudeAI-Pro-Plan-Gives-Single-Deep-Research.html</guid>
      <pubDate>Wed, 20 May 2026 00:00:00 +0000</pubDate>
      <description>The Claude.AI Pro Plan doesn't consistently give you a single research mode question!</description>
      <content:encoded><![CDATA[<h1>Claude.AI Pro Plan quotas too small for deep research</h1>
<h3>Published 2026-05-20</h3>
<p><img src="2026-05-20-usage-limit-reached.png" alt="Usage limits reached!"></p>
<h2>Usage limits are too small for deep research</h2>
<p>The $20/month Claude Pro plan gives me at most 1 successful research mode question per session before I hit the usage limits, and I have twice seen a single research question fail to finish because it burned the entire session usage, even when using Sonnet.</p>
<p>My suggestions if anyone at Anthropic sees this: Add some kind of hint to Claude to wrap things up as usage starts coming to an end, add some kind of UI hint to just use web search more often, and maybe allow soft overages for session usage?</p>
<h2>The setup:</h2>
<p>Dominion Energy and NextEra Energy announced that they plan to merge and form the largest energy company in the United States. Having written a book about energy technology and energy policy, I have some opinions on this space and wanted to learn a little bit more. So I asked Claude to do some research into this. I specifically chose the Sonnet 4.6 model because I had previously seen research mode on Opus fully burn all the usage in my $20 per month Pro plan.</p>
<p>A few minutes into this it exploded!<br>
<img src="2026-05-20-usage-limit-reached.png" alt="Usage limits reached!"></p>
<p>I had hit my limits before it could finish!</p>
<p>Well, 80% of my limits... Which is also weird because last time I checked even dumb local models will confirm that 80% is not 100%... There might be a bug?</p>
<p><img src="2026-05-20-usage-limit.png" alt="80% is 100%??"></p>
<h2>Take 2.</h2>
<p>So I tried again a few hours later with the same prompt. No other connectors, no excessive MCP servers or customizations to burn context. No other usage in that session.</p>
<p><img src="2026-05-20-take-2.png" alt="New research prompt"></p>
<p>It worked this time, but used 56% of my usage.</p>
<p>Just 236 sources with lots of fact checking and cross referencing. The average of the internet averaging this section of the internet.</p>
<h2>Claude.AI feels less quota efficient than Claude Code?</h2>
<p>Last week I had a similar experience where resuming a short Claude web session and asking for a little bit of spreadsheet modeling work with Opus 4.7 consumed a whole session of usage quota and failed before finishing. I suspect that Claude Code would have used significantly less quota for the same amount of actual work. The moral of the story is to never resume a session and to manage context ruthlessly.</p>
<p>Maybe the real moral of the story is that most tasks don&#x27;t need deep research, but it&#x27;s still surprising to feel so constrained to only have at most one research mode question per ~6 hours.</p>
<h2>Conclusions</h2>
<p>I&#x27;m increasingly coming to believe that features failing outright is a very bad user experience.</p>
<p>Agentic workflows should either reject tasks up front due to lack of projected budget, or degrade gracefully within their budget. Otherwise people won&#x27;t trust the feature itself, let alone trust it to use as many tokens as it can get.</p>]]></content:encoded>
    </item>
    <item>
      <title>Apple Silicon costs more than OpenRouter</title>
      <link>https://www.williamangel.net/blog/2026/05/17/offline-llm-energy-use.html</link>
      <guid isPermaLink="true">https://www.williamangel.net/blog/2026/05/17/offline-llm-energy-use.html</guid>
      <pubDate>Sun, 17 May 2026 00:00:00 +0000</pubDate>
      <description>Local LLMs can be very very cheap</description>
      <content:encoded><![CDATA[<h1>Offline Agentic Coding part 3: Apple Silicon costs more than OpenRouter.</h1>
<h3>Published 2026-05-17</h3>
<p><img src="2026-05-17-offline-llm-energy-use.png" alt="Apple silicon costs more than open router. Spreadsheet showing tokens per second and costs to show overall cost per million tokens."></p>
<p>Apple silicon costs more than OpenRouter.</p>
<p>At ~50-100 watts under load, and ~$0.20 per kWh, my M5 MacBook Pro will cost a few cents per hour in electricity. Accelerated depreciation (if any) from shortening the lifespan of the device will be more expensive than the electricity. At a few tens of tokens per second this works out to amortized costs of ~$1.50 per million tokens. OpenRouter for comparable models is 1/3rd the price and ~2x the speed.</p>
<h2>Electricity</h2>
<p>In Northern Virginia my last electricity bill worked out to $0.18 per kilowatt hour. Let&#x27;s round up to $0.20 per kWh.</p>
<p>EIA has average residential costs for 2025 at $0.1730 per kWh in the US.<br>
https://www.eia.gov/electricity/monthly/epm_table_grapher.php?t=table_5_03</p>
<p>At ~50-100 watts and $0.18/kWh that&#x27;s $0.009 to $0.018 per hour. Round that up to $0.02 per hour. <strong>That is $0.48 per day for the electricity to be running inference at 100%.</strong></p>
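<p>A minimal sketch of the electricity arithmetic above, assuming the 50-100 watt draw and the $0.18 per kWh rate quoted in this post. The function name is only illustrative.</p>

```python
def electricity_cost_per_hour(watts, usd_per_kwh):
    """Dollars per hour for a device drawing `watts` continuously."""
    return (watts / 1000) * usd_per_kwh


# 50 W and 100 W bracket the load range quoted in the post.
low = electricity_cost_per_hour(50, 0.18)
high = electricity_cost_per_hour(100, 0.18)
print(f"${low:.3f} to ${high:.3f} per hour")

# Daily cost at the exact 100 W figure, and at the rounded-up $0.02 per hour.
print(f"${high * 24:.2f} per day unrounded, ${0.02 * 24:.2f} per day rounded")
```

<p>Rounding up to $0.02 per hour slightly overstates the daily cost, which is fine for a conservative estimate.</p>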
<h2>Hardware</h2>
<p>A 14 inch MBP with M5 Max and 64 GB of RAM is currently listed at $4299 on the Apple website. 128 GB will cost you more, but 64 GB should run a model like Gemma 4 31b, which is almost at Anthropic Sonnet levels of performance.</p>
<p>For cost allocation, let&#x27;s consider that this hardware will last 3, 5, or 10 years.  The cost per year is $1433, $860, or $430 respectively.</p>
<p>Spread over every hour of the year (8760 hours), the hourly cost over 3, 5, and 10 years is thus:</p>
<ul>
<li>$0.16358</li>
<li>$0.09815</li>
<li>$0.04908</li>
</ul>
<p>Depending on useful lifespan, I think 5 years is a reasonable estimate for normal use. 7 or 10 is very plausible. At maxed out inference 3 years may be a reasonable estimate as well.</p>
<h2>Tokenomics</h2>
<p>The big question is how many tokens per hour can you get out of a local model. My M5 Max testing seems to be in the 10-40 tokens per second range for a serious model like Gemma4:31b. At 10 tokens per second that&#x27;s 36000 tokens per hour.</p>
<p>36000 tokens per hour across our 3-10 year lifespan, at $0.18 per kWh and the 50 watt end of the power range, gives a price per million tokens of $1.61 to $4.79 on the high end.</p>
<p>At 40 tokens per second that&#x27;s 144000 tokens per hour which gets you to $0.40 to $1.20 per million tokens.</p>
<p>For apple silicon, the hardware cost dominates.</p>
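<p>The tokenomics above can be sketched as a small calculation. This assumes the $4299 list price, round-the-clock use, $0.18 per kWh, and a 50 watt draw, which is the wattage that reproduces the figures in this section. The function and parameter names are illustrative.</p>

```python
HOURS_PER_YEAR = 24 * 365  # 8760, assumes inference runs around the clock


def cost_per_million_tokens(hardware_usd, lifespan_years, watts,
                            usd_per_kwh, tokens_per_second):
    """Amortized hardware plus electricity cost per million generated tokens."""
    hardware_per_hour = hardware_usd / (lifespan_years * HOURS_PER_YEAR)
    electricity_per_hour = (watts / 1000) * usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600
    return (hardware_per_hour + electricity_per_hour) / tokens_per_hour * 1_000_000


for tps in (10, 40):
    for years in (3, 5, 10):
        cost = cost_per_million_tokens(4299, years, 50, 0.18, tps)
        print(f"{tps} tok/s over {years} years: ${cost:.2f} per million tokens")
```

<p>In every scenario the hardware term is several times larger than the electricity term, which is why lifespan and tokens per second move the result far more than the power bill does.</p>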
<h2>OpenRouter</h2>
<p>OpenRouter has Gemma4 31b at ~38-50 cents per million tokens. This means that on the optimistic side (50 watts, 40 tokens per second, and 10 years) the MacBook Pro with M5 Max is as cheap as OpenRouter. On the pessimistic side (100 watts and 3 years at 10 tokens per second) it is 10x the cost. I think ~3x the cost per million tokens is likely the right number for local inference on this machine from an accounting perspective.</p>
<h2>Conclusion</h2>
<p>For most cases, though, speed of inference is the biggest factor here. Local inference is slower than cloud inference. Some of the Gemma 4 providers on OpenRouter get up to 60-70 tokens per second, which is 3-7 times faster than what I&#x27;m seeing with the M5 Max (~10-20 tokens per second). For a human employee with a work laptop, their salary costs are going to be ~1000x the cost of the tokens they can generate locally. Throwing money at Anthropic makes more sense in this context.</p>
<p>It&#x27;s still wild that a consumer device can run models that are close to Anthropic Sonnet levels of performance.</p>]]></content:encoded>
    </item>
    <item>
      <title>Jankmarking: Janky Benchmarking</title>
      <link>https://www.williamangel.net/blog/2026/05/08/jankmarking.html</link>
      <guid isPermaLink="true">https://www.williamangel.net/blog/2026/05/08/jankmarking.html</guid>
      <pubDate>Fri, 08 May 2026 00:00:00 +0000</pubDate>
      <description>Are janky benchmarks useful?</description>
      <content:encoded><![CDATA[<h3>Published 2026-05-08</h3>
<h1>Jankmarking: Janky Benchmarking</h1>
<p>Local LLM performance frontier: Ollama with M5 Pro<br>
<img src="2026-05-08-jankmarking.png" alt="my jankmark frontier using ollama"></p>
<h2>Are janky benchmarks useful?</h2>
<p>I put together some evals for local benchmarking. I&#x27;m mostly interested in approximately measuring performance versus speed, because I don&#x27;t like waiting while my MacBook imitates a jet turbine in noise and temperature, and I want to get good enough answers locally. For anything bigger the cloud is still the move. The problem is my benchmarks are --- to use a human em-dash and quote the kids these days &quot;hella jank&quot;.</p>
<p>my janky benchmark questions:</p>
<ul>
<li><strong>hello</strong>: Say hello in one sentence.</li>
<li><strong>haiku</strong>: Write a haiku about autumn leaves.</li>
<li><strong>reasoning</strong>: If a train travels 60 mph for 2.5 hours, how far does it go? Show your work.</li>
<li><strong>summarize</strong>: Summarize the following in two sentences: The Apollo program was a NASA spaceflight program that successfully landed humans on the Moon from 1969 to 1972, using the Saturn V rocket and Command/Service Module.</li>
<li><strong>code_simple</strong>: Write a Python function that returns the nth Fibonacci number.</li>
<li><strong>code_complex</strong>: Implement a thread-safe LRU cache in Python with O(1) get and put. Include type hints and a brief docstring.</li>
</ul>
<p>These aren&#x27;t very good. They&#x27;re half AI generated. The summarization one is a single sentence and is actually backwards, asking for a summary longer than the input sentence... (it used to be longer but I think it got lost at some point when I copied and pasted it. Never made it into version control... )</p>
<p>And yet, these are directionally correct. The better models cluster together scorewise.</p>
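<p>For illustration, here is a rough sketch of how prompts like these could be timed. It is not the harness behind the chart above: <code>generate</code> is a hypothetical stand-in for whatever calls your local model, and words per second is only a crude proxy for tokens per second.</p>

```python
import time

# Three of the prompts listed above, keyed by name.
PROMPTS = {
    "hello": "Say hello in one sentence.",
    "haiku": "Write a haiku about autumn leaves.",
    "code_simple": "Write a Python function that returns the nth Fibonacci number.",
}


def time_prompts(generate, prompts):
    """Return {name: (seconds, words_per_second)} for each prompt."""
    results = {}
    for name, prompt in prompts.items():
        start = time.perf_counter()
        reply = generate(prompt)
        elapsed = time.perf_counter() - start
        words = len(reply.split())
        results[name] = (elapsed, words / elapsed if elapsed > 0 else 0.0)
    return results


def fake_generate(prompt):
    # Hypothetical placeholder: swap in a real call to a local model here.
    return "hello there " * 5


results = time_prompts(fake_generate, PROMPTS)
for name, (seconds, rate) in results.items():
    print(f"{name}: {seconds:.6f} s, {rate:.0f} words/s")
```

<p>Scoring answer quality is the hard part and is left out here. This only covers the speed axis.</p>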
<p>For serious work, make sure your benchmarks aren&#x27;t jankmarks.</p>]]></content:encoded>
    </item>
    <item>
      <title>Offline Agentic Coding: OpenCode</title>
      <link>https://www.williamangel.net/blog/2026/05/07/offline-agentic-coding-2.html</link>
      <guid isPermaLink="true">https://www.williamangel.net/blog/2026/05/07/offline-agentic-coding-2.html</guid>
      <pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate>
      <description>Limitations of offline coding</description>
      <content:encoded><![CDATA[<h1>Offline Agentic Coding part 2: OpenCode &amp; Kilocode.</h1>
<h3>Published 2026-05-07</h3>
<p><img src="2026-05-07-offline_agentic_coding_2.png" alt="offline agentic coding: a handdrawn clock"></p>
<h2>OpenCode:</h2>
<p>Claude Code with non-Anthropic models feels limited. Conveniently, we can also use OpenCode!</p>
<pre><code>ollama launch opencode</code></pre>
<p>OpenCode is like Claude Code but bring your own model.  Overall it&#x27;s very comparable, but is less polished in some ways while feeling more solid in others.</p>
<h2>Kilocode:</h2>
<p>Kilocode (kilo.ai) is an agentic coding platform. It&#x27;s now primarily powered by a fork of OpenCode. Their VS Code/VSCodium extension is very nice, and offers one of my favorite views of the context window:</p>
<p><img src="2026-05-07-kilocode.png" alt="kilocode screenshot"></p>
<p>Hooking up a local model to the Kilocode VS Code extension is straightforward. Just give it your local server port and you&#x27;re good to go.</p>
<p>Kilocode is also nice because they have support for a wide range of model providers as well as their own model provider platform, so it&#x27;s seamless to switch between local, open, and proprietary models.</p>
<h2>Overall</h2>
<p>Overall local models for <strong>coding</strong> are still too slow to be practical on regular hardware, but it&#x27;s nice to have as a capability if the internet goes down, and it&#x27;s still magical to be able to tell my computer to program itself.</p>]]></content:encoded>
    </item>
    <item>
      <title>Offline Agentic Coding</title>
      <link>https://www.williamangel.net/blog/2026/04/27/offline-agentic-coding.html</link>
      <guid isPermaLink="true">https://www.williamangel.net/blog/2026/04/27/offline-agentic-coding.html</guid>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <description>Offline Agentic Coding: Ollama and Claude code</description>
      <content:encoded><![CDATA[<h1>Offline Agentic Coding</h1>
<h3>Published 2026-04-27</h3>
<p><img src="2026-04-27-offline_agentic_coding.png" alt="offline agentic coding: a handdrawn aeroplane"></p>
<p>You can use Ollama as the backend for Claude Code!</p>
<pre><code>ollama launch claude --model</code></pre>
<p>This allows you to use Claude Code with local models. I&#x27;m writing this from an airplane with no internet connection.</p>
<h2>Overall model comparisons</h2>
<p>Gemma4:e2b did not finish any tasks despite being blazing fast at over 100 tokens per second.</p>
<p>qwen3-coder-next:q4_K_M actually did reasonably well. It felt a bit worse than Haiku in quality and notably slower. It took around half an hour to fill up 75k of context, which is about 40 tokens per second, while taking 50-60 GB of memory.</p>
<p>qwen3.6:35b was also fairly reasonable. It did an adequate job writing a small local data processing job, but was also fairly slow.</p>
<p>Gemma4:31b felt the most &#x27;Claude-like&#x27; in Claude Code, but was also fairly slow and occasionally required some jostling and interruption.</p>
<h2>Overall</h2>
<p>I don&#x27;t seriously recommend local agentic coding with LLMs. You need some serious hardware to run decent models and it&#x27;s still slow. It&#x27;s a nice capability to have locally, but it probably isn&#x27;t better than coding by hand. Still very cool to have a computer that can program itself though, and amazing that a consumer device can locally run models and software that match the original GPT-3 era ChatGPT style experience.</p>]]></content:encoded>
    </item>
    <item>
      <title>Washington DC on track for most volatile temperature year since 1959</title>
      <link>https://www.williamangel.net/blog/2026/04/19/Washington_DC_On_Track_For_Stormy_2026.html</link>
      <guid isPermaLink="true">https://www.williamangel.net/blog/2026/04/19/Washington_DC_On_Track_For_Stormy_2026.html</guid>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <description>An analysis of 85 years of daily weather data from Reagan National Airport.</description>
      <content:encoded><![CDATA[<h1>Washington DC on track for most volatile temperature year since 1959</h1>
<p>An analysis of 85 years of daily weather data from Reagan National Airport.</p>
<p><img src="2026-04-19-Washington_DC_On_Track_For_Stormy_2026_figure2.png" alt="Temperature volatility chart"></p>]]></content:encoded>
    </item>
  </channel>
</rss>
