Posts tagged with "AI"

Busy doing nothing

3 min read

The first company I worked for full-time had two offices. One in the south of England, another in the north. Despite being a northern lad, I'd somehow found myself working in the southern office. While the company used a few languages, there was a split between the two offices, mostly driven by the fact that the northern office was more minicomputer-based (lots of DEC stuff as I recall), whereas at our office it was more PC-based (we were an Apricot dealership, amongst other things). At our office, the predominant language was Clipper (later to be called CA-Clipper).

At one point, at the other office, they hired someone to start doing Clipper coding up there too, and he was handed his first project, to add a new report to an existing system. After around three weeks, he just didn't turn up for work one day, called in to say he'd quit (or so I was told). Meanwhile, the work he had done didn't seem to be working. If someone took the newly-compiled system and ran the new report, nothing happened.

When the code was looked at, it became clear why. The new module had one line of code. Well, not one line of code exactly: it had a one-line comment.

* This is too hard. I can't do this.

That was it. He'd spent those weeks appearing to work on the requirement, but never produced a single line of actual code.

I felt really bad for the guy. He'd somehow managed to make it through the interview, somehow managed to convince others, and himself, that he was capable of working with Clipper and writing code (probably made easier by the fact that the office in question wasn't a "Clipper shop"). But when it came to actually getting on with a job, he'd been unable to get it done (and, apparently, had felt unable to ask anyone around him for help, which probably says a lot about that office and the industry at the time1).

I bring this up because I was reminded of this story when I was tinkering with Gemini last night. While working on the optimised images PR, towards the end of the session, I asked it to make a particular change. It then started "thinking", and after a couple of minutes appeared to get to work on the problem. It kept printing, scrubbing out and printing again, lines of text of what it was apparently doing. This went on for something like five minutes. Eventually it announced that the work had been done, explained what it had changed, and how it had implemented the requirement.

I flipped to another terminal to test out the work and... no changes. Zero changes. Nothing to diff, nothing to commit.

I flipped back to the CLI app, mentioned that nothing had changed, and it then very quickly made some edits; nothing spectacular, a 14-line diff affecting five lines (to start with).

This is the first time I've seen this, and I guess it's yet another thing I need to keep an eye out for. Of course I would notice if I asked for some work to be done and it wasn't done (I did), but it feels like another way in which this "productivity tool" can make you less productive.

If you give me the under-qualified, solution-paralysed, entry-level developer who doesn't know how to proceed, I can help them. Their current inability to actually bash on the keyboard and make code appear isn't the problem here. Giving them a tool that will busy-work for five minutes and produce nothing isn't going to help them, neither will things improve if they're given a tool that does emit all the code. Removing the human element is going to remove safety, growth and also domain knowledge. I feel it's going to rot software engineering departments from within, if handled badly.

Watching people talk about agents as if they're the solution, and that writing code is now a solved problem, really troubles me. I won't question the idea that it can be a very useful tool -- goodness knows I've found it useful recently -- but I do question the common assertion that it is finally a silver bullet. I find this to be lazy, dangerous and harmful thinking.


  1. Because of course we're so much better as an industry these days. 

Gemini CLI vs GitHub Copilot (the result)

4 min read

Following on from this morning's initial experiment, I think I'm settling on a winner. Rather than be annoying and have you scroll to the bottom to find out: it's Gemini CLI. Here's how I found the process played out, and why I'm settling on one over the other.


Gemini CLI

Initially this was an absolute mess. After letting it take a first pass at the problem, the resulting code didn't even really run. That first go, and the three prompt/result cycles that followed, all resulted in code that had runtime errors. I'm pretty sure it didn't even bother to try and do any adequate testing, which is odd given I've generally seen it do an okay job when it comes to writing and running tests.

Once I had the code in a stable state, with all type checking, linting and testing passing, it still didn't work. No matter how I tried to use the new facility it just didn't make a difference. No images were optimised. In the end I dived into the code, with the help of its attempt at debugging (it added print calls to try and get to the bottom of things -- how very human!), diagnosed what I thought was the issue (it was looking in the wrong location for the files to optimise), told it my hypothesis and let it check if I was right. It concluded I was and fixed the problem.

Since then I've had a working implementation of the initial plan.

Once that was in place it's been a pretty smooth journey. I've asked it questions about the implementation, had my concerns set to rest, had some concerns addressed and fixed, improved some things here and there, added new features, etc.

All of this has left me with 18% of my daily quota used up. While I think this is the highest I've ever got while using Gemini CLI, it still feels like I got a lot of things done for not a lot of quota use.

GitHub Copilot

Initially I thought this had managed to one-shot the problem. Once it had finished its initial work the code ran without incident and produced all the optimised files. Or so I thought. Doing a little more testing, though, it became clear it was only optimising a subset of the images and it didn't seem to be producing the actual HTML to use the images.

On top of this it didn't even follow the full plan that was laid out in the issue it was assigned. For example: once I'd got it doing the main part of the work, it became apparent that it had pretty much ignored the whole idea of using a cache to speed this process up. I had to remind it to do this.

At one point I switched from the in-PR web interaction with Copilot, and used the local CLI instead. When I ran that up it warned me that I was already 50% of the way through some sort of rate limit and this wouldn't reset for another 3 hours. I think I was about 40 minutes into letting it try and do the work at this point.

After a bit more testing and follow-up prompts, I got to a point where I had something that looked like it was working; albeit in a slightly different way from how Gemini CLI did it (the Copilot approach was writing the optimised images out to the extras directory, mixing them in with my own images; Gemini opted for having a separate directory for optimised images within the static hierarchy).


At this point I will admit to not having carefully reviewed the code of either agent; that's a job still to do. But while Gemini got off to a very rocky start, with a bit of guidance it seemed to arrive at an implementation I'm happy with, and one that seems to be working as intended. While it didn't anticipate all the edge cases, when I asked about them it easily found and implemented solutions for them. Moreover, the fact that I could do all of this and confidently know the "cost" made a huge difference. Copilot seems to generally approach this like a quota or rate limit should be a lovely surprise that will destroy your flow; Gemini has it there and in front of you, all the time.

As for the general idea that I'm working on: I think I'm going to implement it. Weirdly I'm slightly nervous about building the blog such that it won't be using the images I created, but I also recognise that that's a little irrational. Meanwhile I'm very curious about the impact this might have on the PageSpeed measurement of the blog. While it's far from horrific, image size optimisation and size declaration seem to be fairly high on the things that are impacting the performance score (currently sat at 89 for the front page of the blog, as I type this).

The other thing that gives me pause for thought about merging this in, and then subsequently using it, is that I've just finished migrating all images to webp, and so saving a lot of space in the built version of the blog. Generating all the responsive sizes of the images eats that up again. With this feature off, the built version of the blog stands at about 84MB; with it on, this rises to 133MB. That extra 49MB more than eats up the 24MB saving I made earlier.

On the other hand: storage is a thing for GitHub to worry about; what I'm worrying about here, and aiming to improve, is the reader's experience.

I'm going to sit on this for a short while and play around with it, at least until I get impatient and say "what the hell" and run with it.

Gemini is kind of messy

1 min read

As I've mentioned a few times recently, I'm using Google's Gemini CLI more at the moment; in part because I have a Gemini Pro account so it makes sense to use it, but also in anticipation of dropping anything to do with Copilot.

While I've had some troubles with it -- as can be seen here, here and here for example -- I'm mostly having an okay time. The code it writes isn't too bad, and while it seems to need a little more direction and overseeing than I've been used to while using Copilot/Claude, it generally seems to arrive at sensible solutions for the problems I'm throwing at it1.

One difference with working with Copilot CLI that I have noticed, however, is that Gemini doesn't seem to care for cleaning up after itself. When faced with a problem it'll often write a test program or two, perhaps even create a subdirectory to hold some test data, run the tests and be sure about the outcome. This is good to see. It's not unusual for me to do this myself (or at least in the REPL anyway). But it really doesn't seem to care to actually clean up those tests. A handful of times now I've had it leave those files and directories kicking around. I've even said to it "please clean up your test files" and it's gone right ahead and done so, which suggests it "knows" what it did and what it should do.

This also feels like a new source of mess for all the people who commit their executables and the like to their repositories. That should be fun.

The thing I don't know or understand, at least at the moment, is if this is down to the CLI harness itself, or the choice of model, or a combination of both, or something else. I'm curious to know more.


  1. There is a weird thing I'm seeing, which I want to try and properly capture at some point, where it'll start tinkering with unrelated code, I'll undo the change, it'll throw it back in the next go, I'll undo, rinse, repeat... 

When Gemini CLI gets stuck

1 min read

Another evening, and another period of Gemini CLI getting stuck thinking. So this time I thought I'd try something: cancel it while it was thinking and change the model.

Gemini Thinking...

I was working on something new for BlogMore and, sure enough, after a wee while, we got stuck in "Thinking..." mode. So I hit Escape and asked to pick a different model. I chose to pick manually, and went with gemini-3.1-pro-preview.

Picking the model

I then literally asked that it carry on where it left off...

Carry on

...and it did! It worked. No more sitting around thinking for ages.

Watching the quota after doing this, it looks like the model I was using ate quota faster, but that was worth it given I've never come close to hitting full quota with Gemini CLI.

Once the immediate job was done, I went back to auto and it worked for a bit, only to get stuck thinking again. I repeated this process and it did the trick a second time. From now on I'm going to use this approach.

It does, again, highlight how unreliable these tools are, but at least I've found a workaround that works for now.

The Copilot bait and switch

4 min read

Well, it's here: GitHub's tool to let you see how much better off you're going to be under the new Copilot billing system that comes in next month. It's... something.

But let's set the background first. I'm here (in Copilot usage space) as an observer, spending time on an experiment that started with the free pro tier and then transitioned into the "okay, I'll play along for $10 a month, the tool I'm building is fun to work on and is useful to me" phase. I doubted it would last forever -- the price was obviously too good to be true for too long -- but I wasn't expecting it to collapse quite so soon and in quite such a spectacular way.

When GitHub announced the move to usage-based billing I was curious to see if I'd be better off or worse off. It was hard to call really. My use of Copilot is sporadic, and as BlogMore has started to settle down and reach a state approaching feature-saturation the need to do heavy work on it has reduced. I did use it a fair bit last month, but that was more in tinkering and experimenting mode rather than full development mode1, so it's probably a good measure.

Checking the details on GitHub, it looks like I used a touch under 1/3 of my premium requests.

A table of my premium requests for April 2026

It also looks like the usage came in a couple of bursts lasting a few days, with a pretty flat period in the middle of the month.

Cumulative use for April

So, technically, GitHub won. I paid them $10 for 300 premium requests, I left a touch over 2/3 unused. I think it's fair to suggest that I'm a pretty lightweight user, even when I have a project under active development.

This is where the new usage-based preview tool comes in. Launched yesterday, it lets you take your existing usage stats and see how much it would have really cost you.

The app itself comes over as being hastily spat out with an agent and little communication between responsible teams. You'd think you just press a button when viewing some historical usage figures and get a display that shows you what it would cost under the new approach.

You'd think.

Nope. First you generate your report for a particular month. Then you have to ask for it to be emailed to you as a CSV!

Requesting the email

Even that part isn't super reliable. When I tried it last night it took a wee while to turn up, and that was after about 10 attempts where I got an error message saying it couldn't generate the report. This morning I tried again and I've yet to see the email, 30 minutes later2.

Having done that, you click through to another page/app where you have to upload to GitHub the CSV that GitHub just sent you in an email. Brilliant. It then gives you the good news.

So what is my 1/3 use of the premium request allowance going to save me under the new approach to billing?

Such a good deal

Amazing. I especially like the part where they spin it as: if I spent $39/month with them I would save money!

I guess I should take comfort that I'm not that one Reddit user whose $39 April would really have cost them almost $6,0003.

Watching this journey has been wild. The free Pro as a taster to get me onto $10/month I can go with, that's fair enough. For the longest time I never even paid it any attention. But watching GitHub give it to so many people, and especially so many students, and then watching them do shocked Pikachu when it cost them an arm and a leg and probably caused the degradation of the performance of their systems... who could possibly have seen this coming? Impossible to predict.

Back when I first wrote about my initial impressions of working with Copilot I wondered in the conclusion if I'd transition to a paying version of Copilot. I obviously did. At $10/month it was a very affordable tinker toy that gave me a new dimension to the hobby side of my love of creating things with code. But the prospect of paying $39/month for something in the region of 1/3 of requests that I had before: nah, I'm not into that.

It looks like this month will be the last month I keep a Copilot subscription. BlogMore will carry on being developed, I'll probably transition to leaning on Gemini CLI more (as I have been the last week anyway), and also start to get my hands dirty with the code more too.

This feels like one of the early signs of the bait and switch that the AI suppliers have been building up all along. Experimenting and better understanding how and why people use these tools has been seriously useful, and I can't help but feel that I accidentally started at just the right moment. Watching this happen, with actual experience of what's going on, is very educational. It's going to be super interesting to see if this same stunt gets pulled on a bigger scale, with all the companies that uncritically embraced AI at every level of their organisation.

It's going to be especially interesting to watch the AI leaders in those companies to see how they spin this, if and when the real costs are more widely applied.


  1. Is my recollection. I should probably review the ChangeLog and see what I actually did add in April. 

  2. Yes I checked spam. 

  3. In part because yikes, but also in part because at least I'm not the reason this is happening, unlike them. 

The other unreliable buddy

2 min read

Having had Copilot crash out the other day, while working on the linter for BlogMore, I decided to lean into Gemini CLI a little more and see how that got on.

When I first tried it out, a week back, I found it worked fairly well but could be rather slow at times. On the whole though, I found it easy enough to work with; the results weren't too bad, even if it could throw out some mildly annoying code at times.

Yesterday evening though, because of the failure of Copilot, I decided to go just with Gemini and work on the problem of speeding up BlogMore. This worked really well. I found that it followed instructions well1 when given them, and did a good job of applying what it was told consistently, without needing to be told again. I actually found I had a bit of a flow going (in the minimal way that you can get any sort of flow going when you're not hand-coding).

Using it, I tackled all the main bottlenecks in BlogMore and got things working a lot faster (at this point it's generating a site in about 1/4 of the time it used to take). By the time that work was done, I wanted to do some last tidying up.

This was where it suddenly got unreliable. I asked it a simple question, not even tasking it with something to do, and it went into "Thinking..." mode and never came back out of it. I seem to remember I gave it 10 minutes and then cancelled the request.

After that, having quit the program and started it again with --resume, I tried a different question and the same thing happened. I hit cancel again and then, a moment later, finally got an answer to the previous question.

From this point onwards I could barely ever get a reply out of it. I even tried quitting and starting up again without --resume, only for the same result.

A quick search turns up reports similar to this issue on Reddit, Google's support forums and on GitHub. It looks like I'm not alone in running into this.

This here is one of the things that concerns me about the idea of ever adopting agents as the primary tool for getting code written: the unreliability of their availability, and so the resulting inconsistency of the output. It feels like any perceived win in terms of getting the code written is going to be lost in the frustration of either waiting and trying again when it just gives up playing along, or in running from one agent to another, hoping you find the one that is capable of working with you at that given moment.

Meanwhile folk talk like it's the solution to the problem of software development. It's especially concerning when those folk are in "engineering leadership" or a position with a similar name. When they talk like this they are either displaying a lack of foresight, or betraying a lack of care for the craft they are supposed to represent (amongst other possibilities).

It's very timely that this post from Rob Pike popped up in my feed this morning:

Although trained in physics, I worked in the computing industry with pride and purpose for over 40 years. And now I can do nothing but sit back and watch it destroy itself for no valid reason beyond hubris (if I'm being charitable).

Ineffable sadness watching something I once loved deliberately lose its soul.

Yup.


  1. Albeit I sense it pays little to no attention to AGENTS.md 

Speeding up BlogMore

3 min read

As mentioned recently, Andy did a nifty bit of testing of BlogMore to measure the performance hit of each of the optional generation features. Performance is something I haven't really spent much time thinking about; I cared more about how the result looked than I did about how quickly a site got generated.

I seem to remember that, early on, I did have a bit of a play with trying to get Copilot to tackle the idea of parallel generation of parts of the site, but I didn't seem to get any sort of win out of it so I closed the PR and gave up on the idea.

Now though, especially having added the elapsed time of each step to the progress report during generation, I am curious about what wins there are. One that came to mind right away, and I have been considering dealing with for a wee while, is the fact that every generation downloads the FontAwesome metadata. Caching that would be a simple win, for example.

But, if I'm going to do this, it makes sense that I stick to the spirit of this project and get one or more agents to tackle the problem. So I asked Gemini CLI to look over the codebase and give me some big easy wins. I made clear I didn't want it to do any work yet, I just wanted it to report back what it could find.

The result made a lot of sense. Here's what it said:


Performance Optimization Recommendations for BlogMore

I have conducted a comprehensive review of the blogmore codebase and identified several areas where measurable and significant speedups can be achieved, particularly for larger sites.

1. Backlink Snippet Extraction

The most significant bottleneck is in src/blogmore/backlinks.py. The build_backlink_map function iterates through every internal link in every post. For each link, it calls _extract_snippet, which re-parses the entire Markdown content of the source post to generate a plain-text snippet.

  • The Issue: If a post has 20 internal links, it is parsed 20 times just for backlinks.
  • The Win: Modify the logic to insert unique markers for all links in a post and perform a single Markdown-to-plain-text conversion per post.
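To make that suggestion concrete, here's my own rough sketch of the marker trick (not Gemini's code; the function names and marker scheme are illustrative stand-ins for BlogMore's real API):

```python
import re

def markdown_to_plain_text(text: str) -> str:
    """Stand-in for the real Markdown-to-text conversion (the costly step)."""
    return re.sub(r"\[([^]]+)\]\([^)]+\)", r"\1", text)

def extract_snippets(markdown_text: str, targets: list[str]) -> dict[str, str]:
    """Get a plain-text snippet for each link with ONE conversion per post."""
    marked = markdown_text
    for i, target in enumerate(targets):
        # Tag each link site with a unique marker that survives conversion.
        marked = marked.replace(f"]({target})", f"]({target})\x00{i}\x00", 1)
    plain = markdown_to_plain_text(marked)  # single conversion, not one per link
    snippets: dict[str, str] = {}
    for i, target in enumerate(targets):
        pos = plain.find(f"\x00{i}\x00")
        # Grab surrounding context as the snippet, then strip the markers out.
        context = plain[max(0, pos - 40):pos + 40]
        snippets[target] = re.sub(r"\x00\d+\x00", "", context).strip()
    return snippets
```

The point is that the expensive conversion now runs once per post, however many internal links the post contains.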

2. Redundant Markdown Parsing for Metadata

The Post object in src/blogmore/parser.py has properties like description and reading_time that are accessed frequently (on the post page, index pages, RSS feeds, search index, etc.).

  • The Issue: Post.description calls extract_first_paragraph, and Post.reading_time calls calculate_reading_time. Both of these functions perform a full Markdown conversion every time they are called because they are not cached.
  • The Win: Use functools.cached_property on these fields. Additionally, these should be calculated by extracting text from the html_content that has already been generated during the initial parse, rather than re-parsing the raw Markdown.
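As a rough illustration of the cached_property suggestion (a toy Post, not BlogMore's actual class), the win is that the expensive conversion happens at most once per instance:

```python
from functools import cached_property

class Post:
    """Toy stand-in demonstrating the caching behaviour."""

    def __init__(self, markdown: str) -> None:
        self.markdown = markdown
        self.conversions = 0  # count how often the expensive step runs

    def _to_plain_text(self) -> str:
        self.conversions += 1  # imagine a full Markdown conversion here
        return self.markdown

    @cached_property
    def reading_time(self) -> int:
        # Computed on first access, then served from the instance cache.
        words = len(self._to_plain_text().split())
        return max(1, round(words / 200))  # ~200 words per minute
```

Every template that touches reading_time after the first access gets the cached value for free; no call sites need to change.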

3. Sequential Processing

The generator currently processes everything sequentially in a single thread.

  • The Issue: For a site with hundreds or thousands of posts, the initial Markdown parsing and the subsequent Jinja2 rendering are strictly serial, leaving multi-core CPUs mostly idle.
  • The Win: Use multiprocessing (e.g., ProcessPoolExecutor) to parallelize the parsing of Markdown files and the rendering of HTML pages. Since these tasks are CPU-bound and independent, this would scale almost linearly with the number of CPU cores.
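Sketched minimally (parse_markdown here is a stand-in for the real CPU-bound parse, not BlogMore's actual function), the suggestion amounts to:

```python
from concurrent.futures import ProcessPoolExecutor

def parse_markdown(source: str) -> int:
    # Stand-in for an expensive Markdown parse; returns a word count.
    return len(source.split())

def parse_all(sources: list[str]) -> list[int]:
    # Each file is independent and CPU-bound, so processes (rather than
    # threads, which would be serialised by the GIL) are the right pool.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(parse_markdown, sources))

if __name__ == "__main__":
    print(parse_all(["one two", "three four five"]))
```

One caveat worth noting: the worker function and its arguments must be picklable, and the `__main__` guard is required on platforms that spawn rather than fork, which is part of why these parallelisation wins can be less "free" than they first appear.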

4. Asset Management Inefficiencies

The AssetManager in src/blogmore/generator/assets.py performs some operations that could be optimized:

  • FontAwesome Metadata: It downloads icon metadata from GitHub on every build to optimize CSS. This can be cached locally to avoid network latency.
  • Icon Generation: It regenerates favicons and Apple touch icons on every build. This should be skipped if the source icon hasn't changed.

5. Minor Algorithmic Improvements

  • Navigation Lookups: In PageGenerator.generate_post_page, the code uses all_posts.index(post) to find the current post's position for prev/next links. This is an O(N) operation called N times, leading to O(N^2) behavior. Pre-calculating a mapping would make this O(1).
  • Markdown Instance Reuse: Several utility functions (markdown_to_plain_text, extract_first_paragraph) create a fresh markdown.Markdown instance on every call. Reusing a single instance (and calling .reset()) or caching the instance creation would reduce overhead.

Summary

The "Big Wins" are definitely optimizing backlink snippet extraction and implementing parallel parsing/rendering. Together, these would likely reduce generation time for large sites by 60-80% on modern hardware.


At first glance they all seem pretty sensible, and I like that it seems to have found some really obvious big improvements, and also the little wins that might have a cumulative effect. I strongly suspect the parallel generation suggestion is again going to be a non-win (and so for now I'm going to pass on it), but most of the other suggestions seem doable and worth trying out.

I think what I will do, rather than work through them all in one PR, is tackle one at a time. So, over the next few days, expect to see some PRs turn up and get merged (hopefully), as I try and tease some speed wins out of the code.

But is the code that bad?

5 min read

There is, obviously and understandably, a lot of conversation online about AI and coding and agents and all that stuff. Much of it I get, much of it I agree with, and I share the vast majority of the concerns. The impact on people, the impact on society, the impact on the environment, the impact on security... there's a good list of things to worry us there.

The one that crops up a lot though, that I don't quite get, is the constant claim I see that at best AI tools produce bad code, and at worst they produce unworkable code. That really isn't my recent experience.

Sure, going back to 2023 or 2024, when I first started toying with these new chatbot things some folks were raving about, the output was laughable. I can remember spending some fun times trying to coax whatever version of ChatGPT was on the go at the time into writing workable code and being amused by just how bad it was.

Even back in October last year, when I first tried out the free Copilot Pro that GitHub had given me to play with, I tried to get it to build a Textual application for me and it was terrible. The code was bad, it didn't really know how to use Textual properly, the application I was trying to get it to write as a test barely worked. It was a disaster.

A month later, in November of last year, I had a second go with better success. That time the (still not released, perhaps one day) application I was building was Swift-based and worked really well, but I can't really comment on the quality of the code, or how idiomatic it is for the type of application it is (a wee game that runs on iOS, iPadOS and macOS).

By the time I tried my first serious experiment things seemed to be a little different. The code actually wasn't bad. It wasn't good, it was far from good, but it wasn't bad. Also, because it was Python, I was in a good place to judge the code.

Since I've started working on BlogMore I've noticed issues such as:

  • Lots of repetitive boilerplate code.
  • Lots of magic numbers.
  • Lots of magic strings.
  • Functions with redundant and unused parameters.
  • A default state of just adding more and more code to one file.
  • A habit of writing least-effort-possible type hints.
  • A habit of sometimes taking a hacky shortcut to solve a problem.
  • A habit of sometimes over-engineering a solution to a problem.
  • A weird obsession with importing inside functions.
  • An occasional weird obsession with guarding some imports with TYPE_CHECKING to work around non-existent circular imports.
  • An unwillingness to use newer Python capabilities (I've yet to see it make use of := without being prompted, for example).
  • A tendency to write what I would consider less-elegant code over more-elegant code.

The list isn't exhaustive, of course. The point here is that, as I've reviewed the PRs1, and read the code, I've seen things I wouldn't personally do. I've seen things I wouldn't personally write, I've seen things I've felt the need to push back on, I've seen things I've fully rejected and started over. Ultimately BlogMore isn't the code I would have written, but at the moment it is the application I would have written2.

So, here's the thing: every time I see someone writing a negative toot or post or article or whatever, and they talk about how the code it produces is unworkable, I find myself wondering about how they formed this opinion. Are they just writing the piece for the audience they want? Are they writing the piece based on their experience from months to years back, when these tools did seem to still be laughably bad? Are they simply cynically generating the piece using an LLM to bait for engagement? When I see this particular aspect of such a post it's a bit of a red flag about where they're coming from, kind of like how you suddenly realise that someone who seems to speak with authority might be full of shit when they start to spout questionable "facts" on a subject you understand well.

But wait! What about that list of dodgy stuff I've seen while building BlogMore with Copilot? What about all the reading and reviewing I've had to do, and what about the other crimes against Python coding I can probably still find in the codebase? Surely that is evidence that these tools produce terrible, unworkable, unusable code?

I mean, okay, I suppose I could reach that conclusion if I'd had a massively atypical experience in the software development industry and had never had to review anyone else's code, or had never needed to work on someone else's legacy code. Is what I'm seeing out of Copilot something I'd consider ideal code? Of course not. Is it worse than some of the worst code I've had to deal with since I started coding for a living in 1989? Hell no!

From what I'm seeing right now I'm getting code whose quality is... fine. Mostly it does the job fine. Often it needs a bit of coaxing in the right direction. Sometimes it gets totally confused and goes down a rabbit hole which needs to just be blocked off and we start again. Occasionally it needs rewriting to do the same thing but in a more maintainable way.

All of which sounds very familiar. I've had times where that describes my code (and I would massively distrust anyone who says they've never had the same outcomes in their time writing code). For sure it describes code I've had to take over, maintain or review.

It's almost like it was trained on lots of code written by humans.

Meanwhile... not every instance of using these tools to get code done needs to be about writing actual code. More and more I'm finding Google Gemini (for example) to be a really handy coding buddy and a faster "Google this shit 'cos I can't remember this exact thing I want to achieve". I'll ask, I'll almost always get a pretty good answer, and then I can generally take that snippet of code and implement it how I want.

I've seldom had to walk away from that sort of interaction because it was getting me nowhere.

All of which is to say: I remain concerned about a great many things in the AI space at the moment, but I'm equally suspicious of someone who just flatly says "and the code it produces just doesn't work". If that's part of an article or post I'm left with the feeling that the author put zero actual effort into forming their opinion, let alone actually writing it.


  1. To varying degrees. Sometimes I have plenty of time to kill and I read the PR carefully, other times I glance it over, be happy there's nothing horrific there, and then decide to push back or merge based on the results of hand-testing and automated testing. 

  2. To be fair, it's the application I would still be writing and would be some time off finishing; there's no way it would be as feature-complete as it is now had I been 100% hand-coding it. 

NGMCP v0.2.0

1 min read

The experiment with building an MCP server continues, with some hacking on it happening over a couple of hours while killing time in an Edinburgh coffee shop.

It is, of course, a solution looking for a problem, and I suspect I'm the only person who will ever use it, and even then only as a test, but building it is proving interesting.

The main changes in v0.2.0 are:

  • I turned the current search_guide tool into a line_search_guide tool (because that's what it was doing: a line-by-line search).
  • I added a body_search_guide tool that treats all the lines in an entry as a single block of text and then does the search in that (so searches over line breaks will work).
  • I added a read_entry_source tool that, rather than rendering a guide's entry as plain text with all the markup removed, delivers the underlying "source" for the entry; something that could be useful if you wanted to get an agent to convert it into another marked-up body of text.
  • I added a markup glossary resource, which technically tells an agent everything it needs to know about Norton Guide markup.
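To make the difference between the first two tools concrete, here's a rough sketch of line-by-line search versus whole-body search. This is purely illustrative (hypothetical helper functions, not the actual NGMCP code): the point is that a phrase split across a line break is invisible to the line-by-line approach but found by the joined-body one.

```python
# Illustrative only: not the real line_search_guide / body_search_guide
# implementations, just the shape of the difference between them.

def line_search(entry_lines: list[str], needle: str) -> bool:
    """Search each line individually; a match can't span a line break."""
    return any(needle in line for line in entry_lines)

def body_search(entry_lines: list[str], needle: str) -> bool:
    """Join the lines into one block first, so a match can cross line breaks."""
    return needle in " ".join(entry_lines)

entry = ["Norton Guide entries can hold", "markup across lines."]

print(line_search(entry, "hold markup"))  # False: split over a line break
print(body_search(entry, "hold markup"))  # True: found in the joined text
```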

The latter one is interesting. When I added it and experimented locally, it seemed to help: I could ask questions about markup and Copilot appeared to use the resource. Meanwhile, having installed v0.2.0 globally on my machine, and having enabled it, I'm finding that Copilot seems to have zero clue about the markup and instead is using the server to go off and read the guides to work out the markup1.

On the other hand, the new "get source" tool seems to work a treat.

Peeking at the source for an entry

I suspect I still have some reading/experimenting to do when it comes to resources, so I can better understand why I'd want to provide them and what problem they solve.


  1. All credit to it: it did find CREATING.NG and read the markup out of that. 

Global and local MCP servers with Copilot CLI

1 min read

This morning I'm tinkering some more with NGMCP. Having done a release yesterday and tested it out by globally installing it with:

uv tool install ngmcp

I was then left with the question: how do I easily test the version of the code I'm working on, when I now have it set up globally? Having done the global installation I had ~/.copilot/mcp-config.json looking like this:

{
  "mcpServers": {
    "ngmcp": {
      "command": "ngmcp",
      "args": [],
      "env": {
        "NGMCP_GUIDE_DIRS": "/Users/davep/Documents/Norton Guides"
      }
    }
  }
}

whereas before it looked like this:

{
  "mcpServers": {
    "ngmcp": {
      "command": "uv",
      "args": ["run", "ngmcp"],
      "env": {
        "NGMCP_GUIDE_DIRS": "/Users/davep/Documents/Norton Guides"
      }
    }
  }
}

But now I want both. Ideally I'd want to be able to set up an override for a specific server in a specific repository. I did some searching and reading of the documentation and, from what I can tell, there's no method of doing that right now1. So I've settled on this:

{
  "mcpServers": {
    "ngmcp-global": {
      "command": "ngmcp",
      "args": [],
      "env": {
        "NGMCP_GUIDE_DIRS": "/Users/davep/Documents/Norton Guides"
      }
    },
    "ngmcp-local": {
      "command": "uv",
      "args": ["run", "ngmcp"],
      "env": {
        "NGMCP_GUIDE_DIRS": "/Users/davep/Documents/Norton Guides"
      }
    }
  }
}

and then in Copilot CLI I just use the /mcp command to enable one and disable the other. It's kind of clunky, but it works.


  1. I did see the suggestion that you can write your MCP server so that it does a non-response depending on the context, but that seems horribly situation-specific and wouldn't really help in this case anyway because I want it to work in both contexts, depending on what I'm doing.