Posts tagged with "LLM"

Reviewing token usage

3 min read; 7 GFI

As I've written about a few times in the last week or so, the journey with AI-based coding tools has hit an interesting time when it comes to prices, quotas, usage, availability and all that. Having come into all of this via a place where it was a flat fee, and where I didn't really need to think about input tokens and output tokens and so on, I'm pretty ignorant of what that all means in terms of scale. If I'm looking at a new tool and I see prices and/or quotas for in/out tokens, it means nothing to me. I can't relate to it. I've never had to care about it.

While using Gemini CLI to quickly make a change to BlogMore this morning, I was reminded that at the end of a session it does tell me this:

Session usage

Seeing that got me thinking: is there a way to get the total usage for all of my sessions, or at least the sessions that have still been retained (I'm guessing they expire after a wee while)? After a little bit of searching I found ccusage. That looked exactly like the sort of thing I was after.

Now, this is only going to be good for Gemini (it says it supports Copilot too, but it seems to be failing to find any Copilot sessions), but it should give me a feel for what my token usage looks like.

I work on BlogMore on two different machines: the MacBook Air and also a Mac Mini I have in my office. Here's all of the available token usage data I can get out of the Air:

DateInputOutputCache ReadTotal TokensCost (USD)
2026-04-29235,23820,282773,6421,032,608$0.23
2026-05-01315,0013,181447,556768,532$0.20
2026-05-022,621,62852,29018,260,59720,955,447$2.44
2026-05-033,627,84630,53811,819,27915,509,213$5.74
2026-05-04869,82949,1632,721,0743,656,649$0.77
2026-05-092,287,76050,0819,973,76412,327,819$1.84
2026-05-101,019,55034,5568,061,8979,125,838$1.05
2026-05-111,112,12335,61010,523,34811,689,576$1.24
2026-05-131,506,51341,8027,561,1689,124,651$2.88
2026-05-15123,1613,155587,248716,813$0.11
2026-05-16111,33414,836519,161646,275$0.13
2026-05-17940,48536,1717,682,3148,706,034$1.41
2026-05-1867,0331,357205,921275,707$0.05
2026-05-2160,9041,182119,055184,117$0.05
Total14,898,405374,20479,256,02494,719,279$18.13

And also the same for the Mac Mini (which gets used less frequently for this sort of thing):

DateInputOutputCache ReadTotal TokensCost (USD)
2026-05-04212,17831,6312,128,0742,389,927$0.36
2026-05-051,108,90331,9976,222,8687,374,732$1.13
2026-05-0830,8991,19464,07498,146$0.03
2026-05-111,339,33327,3998,074,9049,459,253$1.21
2026-05-12952,05753,02312,751,53913,838,943$1.52
2026-05-18166,8754,774651,417827,746$0.22
2026-05-19449,08723,9763,236,3243,721,558$0.54
2026-05-22335,15110,0121,919,8152,272,553$0.32
Total4,594,483184,00635,049,01539,982,858$5.33

In both cases I've removed a couple of columns to make the tables fit better. The first was the model name (varying between gemini-3-flash-preview and gemini-3.1-pro-preview), the second was Cache Create (which was always 0 all the way down).

From what I can see, it would appear that these two tables do cover my increasing use of Gemini CLI for doing work on BlogMore (the first intensive use being back around the 5th of this month, if I recall correctly). So this would seem to be a reasonably informative way to view things.

All of which is to say, over a roughly three week period, while getting things done, I've used getting on for 20,000,000 input tokens, and around 600,000 output tokens (presumably I do also need to be keeping the 114,300,000 cache read tokens in mind too). With this in mind I might now be able to make more sense of the pricing I see for various tools.

Reviewing the cost of BlogMore

3 min read; 8 GFI

Now that we're near the end of the free or cheap GitHub Copilot party, I thought it might be interesting to look at how much BlogMore has "cost" me to build, and what it would have cost under the proposed new pricing structure that is coming in next month. While I've looked at the comparison for last month, I've not looked at the whole period I've been seriously using it.

So, for this review, I'm looking at all the data I can pull out of GitHub for the months of February, March, April and May of this year. Development of BlogMore started back in February and, while it hasn't been 100% the cause of my use of Copilot premium requests, it's been almost all of it. For the purposes of this review I'm just going to take the approach that all I worked on was BlogMore.

Remember that, even when I had free access, I had a maximum of 300 premium requests per month. Once I lost free access I had the same number of requests for $10 a month.

Here's how those months broke down:

MonthPaidPremium Requests%agePredicted Price
February$0.0024983%$21.67
March$10.0014047%$56.38
April$10.0013244%$53.77
May$10.003411%$53.69
Total:$30.0055546%$185.51

So, give or take, something that I've actually spent $30.00 on could have, at best, cost me $185.51. That's assuming that the "cost" of the models I was using stays the same. You can see that the costs have risen already in that the predicted price from February, where I used 83% of my premium requests, is a touch under half the cost for this month, where I've used just 11%. From what I can see in the raw data, it's down to some models suddenly being considered more expensive (perhaps I was doing something that just consumed more tokens, I'm not 100% sure if I'm honest, but I don't recall anything that seemed like harder work).

Who knows what the real costs will be come June.

Now, technically, the actual cost under the new regime could or should be $156, because it would be 4 lots of the $39.00/month plan, which would better cover that use1. Again though, that's assuming the actual cost of using whatever models remains pretty stable. It also assumes that I'd want to spend that much each month, and that I would be correctly anticipating that I'd need that much.

Also, this isn't even the total cost of getting this project done. As I've written recently: I've been using Gemini CLI more this month, and while the usage there is a flat cost, until now, that's changing too.

Now, of course, these aren't the only games in town. I could "go to the source" and just get a sub for Claude Code or something, and as Tim pointed out over in the Fediverse, something like Cursor does a lot of this and is just $20/month. Which all sounds fine, but what happens when those fleeing GitHub Copilot or Gemini CLI/Antigravity head over to something like Cursor? Is it sensible to expect the pricing to stay the same2?

I guess, at this point, I'm just mulling over the same issue time and again, but from different angles. It does seem clear to me, though, that in less than 4 months, in my experiment of "what happens if I use agents to develop a Free Software tool I want?", the market has gone from being entirely reasonable to pretty much unjustifiable from a price point of view.


  1. As I understand it, the $39 gets you almost twice that value in "AI credits", so the base allotment plus the flex allotment would cover what I've used. 

  2. That's not even the main reason to be concerned about a switch to Cursor

It's all so vague

3 min read; 10 GFI

The recent changes to pricing and usage, in relation to AI, aren't just about agents and coding. Not only have I seen GitHub Copilot and Gemini CLI hugely restrict their offerings for the same price, it's also come to at least one "general" tool I use too.

For a while now, as part of a Google One subscription I keep, I've had a Gemini AI Pro subscription. I've generally found this useful, mainly using the Gemini app on my iPhone to research things1, and also commonly using the web application to help proofread blog posts, and sometimes explore coding problems. Another way I use it is via NotebookLM. The subscription has meant that I can do all of this without ever having to worry about hitting any usage limits. While I'm sure they were there, I was never aware of them and never hit them.

In the last 48 hours, along with the changes to the coding agent offerings, Gemini itself has moved to a compute-based usage limit approach.

Gemini will move to compute-based usage limits that will refresh every 5 hours until you reach your weekly limit. Calculation of your usage will factor in the complexity of your prompt, the features you use, and the length of your chat. Paid users have higher limits than users without a Google AI subscription.

The thing that bothers me about this -- and I've seen this with other companies in this market too -- is just how vague the wording is. Look at this table that is supposed to inform you about your usage limits, depending on your plan:

PlanLimit
Without a planStandard limits
AI Plus2x higher than standard limits
AI Pro4x higher than standard limits
AI Ultra5x or 20x higher than AI Pro depending on your subscription

Okay, great, thanks to my Pro plan I get 4x the limits. Awesome. But... 4x what exactly? What exactly are the standard limits? How do I assess which plan is better for me? How do I compare Google's product against another offering?

I suspect, for the most part, I'll be fine where I am. So far today I've used Gemini to proofread the previous post I wrote, there was a bit of back and forth as I edited my post, and that cost me 1% of my five hour window.

My usage limits

What impact that has on my weekly usage, I don't know, but based on this it would appear to be almost nothing.

I can appreciate that it's been a bit of a free party for a while, and now each provider has to start to have this cost them less -- if not actually make them money -- before the whole thing collapses. Fair enough. But it's annoying as hell to not be able to gauge what I'm actually getting, or easily compare products.

That's not to say that I know how this can be communicated well. There's a flip-side to all of this. If I go and look at the Anthropic website and their detailed pricing information it seems to take it to the other extreme. There's so much you need to know and understand, and you'd need to know so much about how their models work and how your needs would interact with them... it feels like you need specialised training to comprehend any of it. While I can't find it back at the moment, I seem to remember a similar issue with trying to follow such information with GitHub Copilot.

If it doesn't exist already, I suspect there's a market here for a site that makes it incredibly simple to plug in your requirements and have a product recommendation be made.


  1. In the past six months I've found it's generally a far better method of finding things than simply using a search engine; no ads, cited sources, results that are easy to revisit, etc. 

NGMCP v0.2.0

1 min read; 10 GFI

The experiment with building an MCP server continues, with some hacking on it happening over a couple of hours while killing time in an Edinburgh coffee shop.

It is, of course, a solution looking for a problem, and I suspect I'm the only person who will ever use it, and even then only as a test, but building it is proving interesting.

The main changes in v0.2.0 are:

  • I turned the current search_guide tool into a line_search_guide tool (because that's what it was doing: a line-by-line search).
  • I added a body_search_guide tool that treats all the lines in an entry as a single block of text and then does the search in that (so searches over line breaks will work).
  • I added a read_entry_source tool that, rather than rendering the entry of a guide as plain text with all the markup removed, it instead delivers the underlying "source" for the entry; something that could be useful if you wanted to get an agent to convert it into another marked-up body of text.
  • I added a markup glossary resource, which technically tells an agent everything it needs to know about Norton Guide markup.

The latter one is interesting. I added it and did some experimenting locally and it seemed to be helpful and I could ask questions about markup and Copilot seemed to use it. Meanwhile, having installed v0.2.0 globally on my machine, and having enabled it, I'm finding that Copilot seems to have zero clue about the markup and instead is using the server to go off and read the guides to work out the markup1.

On the other hand, the new "get source" tool seems to work a treat.

Peeking at the source for an entry

So I suspect I still have some reading/experimenting to do when it comes to resources, so I can better understand why I'd want to provide them and what problem they solve.


  1. All credit to it: it did find CREATING.NG and read the markup out of that. 

Global and local MCP servers with Copilot CLI

1 min read; 11 GFI

This morning I'm tinkering some more with NGMCP. Having done a release yesterday and tested it out by globally installing it with:

uv tool install ngmcp

I was then left with the question: how do I easily test the version of the code I'm working on, when I now have it set up globally? Having done the global installation I had ~/.copilot/mcp-config.json looking like this:

{
  "mcpServers": {
    "ngmcp": {
      "command": "ngmcp",
      "args": [],
      "env": {
        "NGMCP_GUIDE_DIRS": "/Users/davep/Documents/Norton Guides"
      }
    }
  }
}

whereas before it looked like this:

{
  "mcpServers": {
    "ngmcp": {
      "command": "uv",
      "args": ["run", "ngmcp"],
      "env": {
        "NGMCP_GUIDE_DIRS": "/Users/davep/Documents/Norton Guides"
      }
    }
  }
}

But now I want both. Ideally I'd want to be able to set up an override for a specific server in a specific repository. I did some searching and reading of the documentation and, from what I can tell, there's no method of doing that right now1. So I've settled on this:

{
  "mcpServers": {
    "ngmcp-global": {
      "command": "ngmcp",
      "args": [],
      "env": {
        "NGMCP_GUIDE_DIRS": "/Users/davep/Documents/Norton Guides"
      }
    },
    "ngmcp-local": {
      "command": "uv",
      "args": ["run", "ngmcp"],
      "env": {
        "NGMCP_GUIDE_DIRS": "/Users/davep/Documents/Norton Guides"
      }
    }
  }
}

and then in Copilot CLI I just use the /mcp command to enable one and disable the other. It's kind of clunky, but it works.


  1. I did see the suggestion that you can write your MCP server so that it does a non-response depending on the context, but that seems horribly situation-specific and wouldn't really help in this case anyway because I want it to work in both contexts, depending on what I'm doing. 

NGMCP - An MCP experiment

2 min read; 12 GFI

Recently I've been thinking that it would be interesting to get to know a little about the Model Context Protocol and see what it's about and get a feel for how useful it might be, if at all, for anything I do.

As always happens when I want to try out something new, I reached for a problem I know well so I don't have to get bogged down in solving the problem itself. As almost always happens, I decided I should base it around Norton Guides.

Part of the point of MCP seems to be providing an interface over sources of data and actions, that an LLM might not otherwise be able to cope with, and so it sounded to me like providing a bridge to the content of Norton Guide files would be a perfect test. Of course, this isn't the first time I've bridged LLMs and NG files, but this is obviously intended to be a more generic solution than throwing a Markdown file at NotebookLM.

Earlier this afternoon I sat down and did some reading, and then decided to throw the problem at GitHub Copilot. I told it I wanted to use my NGDB library as the core of the tool, and that it should wrap it up with FastMCP. The initial result was... a bit of a mess. It sort of worked, sort of, but it also seemed to try and put together a project that mostly looked how my Python repos look, but with some bits just wrong.

I did some cleaning up, did some testing, did some tweaking, and eventually I had something working.

Asking what NGMCP can do

So far I've given the code a fairly quick read over, and I can see what it's doing and how it's going about this. This approach obviously has the disadvantage that I didn't hand-write it so there's still a lot to read to really appreciate what's going on; on the other hand, it does have the advantage that it's implemented a tool based on my library so I know what to expect it to be doing.

There will be more code reading happening, and I also intend to look to tidy up the code more and perhaps hand-add some more features.

Looking at the credits of a guide

I very much doubt that this particular MCP server is going to be any use to anyone, but as a proof of concept it works well for me. If I were in a position of needing to build something genuinely useful, I now have a start and a vague idea.

Reading some text from a guide

On the other hand: once again, as with other projects I've done related to Norton Guides, this is a tool that helps keep the content available and accessible; that alone is one reason for me to tidy this up and move it towards v1.0.0 and keep it maintained.

If you fancy having a play, some (currently Copilot-generated) documentation can be found on the server's dedicated site. When I get a bit more time I'm going to flesh this out.

obs2nlm v1.2.0

1 min read; 9 GFI

Three months back I released obs2nlm, a tool that takes an Obsidian vault and turns it into a single Markdown file so it can be used as a source for NotebookLM.

Since then I've been using it a lot and it's working out really well.

Meanwhile, one of my vaults has started to creep up towards the documented word limit for a single source in NotebookLM (500,000 words). Right now it's sitting at around 75% and is steadily creeping up.

So, with this in mind, I've made a change I've been planning from the start and have added a --split option. If used, if the generated file looks like it's going to hit the word limit, a second (or more) file will be created. The naming scheme is simple enough: if you ask obs2nlm to create an output file called dirt.md and it needs to run over, it'll then create dirt-2.md, dirt-3.md, and so on. The idea then is that, rather than upload that single Markdown file as a source, you upload all of the generated Markdown files.

Given you get up to 50 sources per notebook, this should see me right for any reasonable vault. As for if it will affect the quality of the results I get when I query the notebook... that's hard to say until I find myself in that situation. If Google are to be believed it shouldn't be an issue, and the alternative is to fall foul of the limit so this seems like the only sensible solution.

I've also added a --dry-run command line switch too; this should be handy for checking how big a vault is when compared to the word limit, without actually generating any files.

BlogMore v2.3.0

1 min read; 9 GFI

I've just pushed BlogMore v2.3.0 up to PyPI. This release has a couple of bug fixes and a couple of significant new features.

The first new feature, which came in as a request, is to add support for control over the themes used for code blocks, including independent control of the themes used for light and dark mode. With these you can specify any of the Pygments styles to use for code blocks. Personally, I prefer to have things blend in, but this now also gives you the chance to have them really contrast (use a light mode theme for dark-mode blog, or a dark mode theme for a light-mode blog).

The other big feature popped into my head earlier today and once I thought about it I had to have it. It's similar to something I had for the photography section of the older version of my website, consisting of a bunch of useless but fun stats and facts about the content.

Things like which hour of the day I tend to post during:

Hour of day

Or the day of the week:

Day of week

Or the month of the year1:

Month of year

This is designed to be turned off by default -- I can imagine most folk would not want this sort of thing on their blog -- but it can easily be turned on with with_stats. The location of the stats can also be controlled using stats_path.


  1. Unsurprisingly March is leaping ahead as of the time of writing. 

Copilot rate limits

1 min read; 12 GFI

Last night, while tinkering with another BlogMore feature request, I ran into the sudden rate limit issue again. As before, there was no warning, there was no indication I was getting close to any sort of limit, and the <duration> I was supposed to wait to let things cool down was given as <duration> rather than an actual value.

So this time I decided to actually drop a ticket on GitHub support. Around 12 hours later they got back to me:

Thanks for writing in to GitHub Support!

I completely understand your frustration with hitting rate limits.

As usage continues to grow on Copilot — particularly with our latest models — we've made deliberate adjustments to our rate limiting to protect platform stability and ensure a reliable experience for all users. As part of this work, we corrected an issue where rate limits were not being consistently enforced across all models. You may notice increased rate limiting as these changes take effect.

Our goal is always that Copilot remains a great experience, and you are not disrupted in your work. If you encounter a rate limit, we recommend switching to a different model, using Auto mode. We're also working on improvements that will give you better visibility into your usage so you're never caught off guard.

We appreciate your patience as we roll out these changes.

So, in other words: expect to be rate limited more on this product we're trying to get everyone hooked on and trying to get everyone to subscribe to.

Neat.

I especially like this part:

We're also working on improvements that will give you better visibility into your usage so you're never caught off guard.

You know, if it were me, if I wanted to build up and keep goodwill for my product, I'd probably do that part first and communicate about it earlier rather than later.

I guess this is why I don't hold the sort of position that gets to make those decisions.

BlogMore v2.2.0

1 min read; 7 GFI

I've just bumped BlogMore to v2.2.0. This release adds post counts to the archive page.

Post counts in the archive

The overall count appears at the top of the page, with further counts broken down for each year and each month. I've tried to ensure that the counts appear subtle enough, but still readable.

Also, serving the same purpose, but giving more information at a glance, I've added the same counts to the table of contents that appears to the right of the archive if you're on a suitably wide display.

Counts in the ToC

This means I can easily see that I've posted more times this month than I have in any other month since I started this blog. In fact, I've posted more times this month than I have in quite a few individual years in the past.

So... that's today's "I thought I'd added everything but oh look here's another thing to add" feature. Which goes some way to explain why there are so many posts this month, I guess.