Posts tagged with "AI"

The other unreliable buddy

2 min read; 10 GFI

Having had Copilot crash out the other day, while working on the linter for BlogMore, I decided to lean into Gemini CLI a little more and see how that got on.

When I first tried it out, a week back, I found it worked fairly well but could be rather slow at times. On the whole though, I found it easy enough to work with; the results weren't too bad, even if it could throw out some mildly annoying code at times.

Yesterday evening though, because of the failure of Copilot, I decided to go just with Gemini and work on the problem of speeding up BlogMore. This worked really well. I found that it followed instructions well1 when given them, and also did a good job of applying what it was told, constantly, without needing to be told again. I actually found I had a bit of a flow going (in the minimal way that you can get any sort of flow going when you're not hand-coding).

Using it, I tackled all the main bottlenecks in BlogMore and got things working a lot faster (at this point it's generating a site in about 1/4 of the time it used to take). By the time that work was done, I wanted to do some last tidying up.

This was where it suddenly got unreliable. I asked it a simple question, not even tasking it with something to do, and it went into "Thinking..." mode and never came back out of it. I seem to remember I gave it 10 minutes and then cancelled the request.

After that I tried again with a different question, having quit the program and started it again with --resume. This time I asked it a different question and the same thing happened. I hit cancel again and then, a moment later, finally got an answer to the previous question.

From this point onwards I could barely ever get a reply out of it. I even tried quitting and starting up again without --resume, only for the same result.

A quick search turns up reports similar to this issue on Reddit, Google's support forums and on GitHub. It looks like I'm not alone in running into this.

This here is one of the things that concerns me about the idea of ever adopting agents as the primary tool for getting code written: the unreliability of their availability, and so the resulting inconsistency of the output. It feels like any perceived win in terms of getting the code written is going to be lost in the frustration of either waiting and trying again when it just gives up playing along, or in running from one agent to another, hoping you find the one that is capable of working with you at that given moment.

Meanwhile folk talk like it's the solution to the problem of software development. It's especially concerning when those folk are in "engineering leadership" or a position with a similar name. When they talk like this they are either displaying a lack of foresight, or betraying a lack of care for the craft they are supposed to represent (amongst other reasons).

It's very timely that this post from Rob Pike popped up in my feed this morning:

Although trained in physics, I worked in the computing industry with pride and purpose for over 40 years. And now I can do nothing but sit back and watch it destroy itself for no valid reason beyond hubris (if I'm being charitable).

Ineffable sadness watching something I once loved deliberately lose its soul.

Yup.


  1. Albeit I sense it pays little to no attention to AGENTS.md 

Speeding up BlogMore

3 min read; 11 GFI

As mentioned recently, Andy did a nifty bit of testing of BlogMore to measure the performance hit of each of the optional generation features. Performance is something I haven't really spent much time thinking about; I cared more about how the result looked than I did about how quickly a site got generated.

I seem to remember that, early on, I did have a bit of a play with trying to get Copilot to tackle the idea of parallel generation of parts of the site, but I didn't seem to get any sort of win out of it so I closed the PR and gave up on the idea.

Now though, especially having added the elapsed time of each step to the progress report during generation, I am curious about what wins there are. One that came to mind right away, and I have been considering dealing with for a wee while, is the fact that every generation downloads the FontAwesome metadata. Caching that would be a simple win, for example.

But, if I'm going to do this, it makes sense that I stick to the spirit of this project and get one or more agents to tackle the problem. So I asked Gemini CLI to look over the codebase and give me some big easy wins. I made clear I didn't want it to do any work yet, I just wanted it to report back what it could find.

The result made a lot of sense. Here's what it said:

Performance Optimization Recommendations for BlogMore

I have conducted a comprehensive review of the blogmore codebase and identified several areas where measurable and significant speedups can be achieved, particularly for larger sites.

The most significant bottleneck is in src/blogmore/backlinks.py. The build_backlink_map function iterates through every internal link in every post. For each link, it calls _extract_snippet, which re-parses the entire Markdown content of the source post to generate a plain-text snippet.

  • The Issue: If a post has 20 internal links, it is parsed 20 times just for backlinks.
  • The Win: Modify the logic to insert unique markers for all links in a post and perform a single Markdown-to-plain-text conversion per post.

2. Redundant Markdown Parsing for Metadata

The Post object in src/blogmore/parser.py has properties like description and reading_time that are accessed frequently (on the post page, index pages, RSS feeds, search index, etc.).

  • The Issue: Post.description calls extract_first_paragraph, and Post.reading_time calls calculate_reading_time. Both of these functions perform a full Markdown conversion every time they are called because they are not cached.
  • The Win: Use functools.cached_property on these fields. Additionally, these should be calculated by extracting text from the html_content that has already been generated during the initial parse, rather than re-parsing the raw Markdown.

3. Sequential Processing

The generator currently processes everything sequentially in a single thread.

  • The Issue: For a site with hundreds or thousands of posts, the initial Markdown parsing and the subsequent Jinja2 rendering are strictly serial, leaving multi-core CPUs mostly idle.
  • The Win: Use multiprocessing (e.g., ProcessPoolExecutor) to parallelize the parsing of Markdown files and the rendering of HTML pages. Since these tasks are CPU-bound and independent, this would scale almost linearly with the number of CPU cores.

4. Asset Management Inefficiencies

The AssetManager in src/blogmore/generator/assets.py performs some operations that could be optimized:

  • FontAwesome Metadata: It downloads icon metadata from GitHub on every build to optimize CSS. This can be cached locally to avoid network latency.
  • Icon Generation: It regenerates favicons and Apple touch icons on every build. This should be skipped if the source icon hasn't changed.

5. Minor Algorithmic Improvements

  • Navigation Lookups: In PageGenerator.generate_post_page, the code uses all_posts.index(post) to find the current post's position for prev/next links. This is an O(N) operation called N times, leading to O(N^2) behavior. Pre-calculating a mapping would make this O(1).
  • Markdown Instance Reuse: Several utility functions (markdown_to_plain_text, extract_first_paragraph) create a fresh markdown.Markdown instance on every call. Reusing a single instance (and calling .reset()) or caching the instance creation would reduce overhead.

Summary

The "Big Wins" are definitely optimizing backlink snippet extraction and implementing parallel parsing/rendering. Together, these would likely reduce generation time for large sites by 60-80% on modern hardware.

At first glance they all seem pretty sensible, and I like that it seems to have found some really obvious big improvements, and also the little wins that might have a cumulative effect. I strongly suspect the parallel generation suggestion is again going to be a non-win (and so for now I'm going to pass on it), but most of the other suggestions seem doable and worth trying out.

I think what I will do, rather than work through them all in one PR, is tackle one at a time. So, over the next few days, expect to see some PRs turn up and get merged (hopefully), as I try and tease some speed wins out of the code.

But is the code that bad?

5 min read; 10 GFI

There is, obviously and understandably, a lot of conversation online about AI and coding and agents and all that stuff. Much of it I get, much of it I agree with, I share the vast majority of the concerns. The impact on people, the impact on society, the impact on the environment, the impact on security... there's a good list of things to worry us there.

The one that crops up a lot though, that I don't quite get, is the constant claim I see that at best AI tools produce bad code, and at worst they produce unworkable code. That really isn't my recent experience.

Sure, going back to 2023 or 2024, when I first started toying with these new chatbot things some folks were raving about, the output was laughable. I can remember spending some fun times trying to coax whatever version of ChatGPT was on the go at the time into writing workable code and being amused by just how bad it was.

Even back in October last year, when I first tried out the free Copilot Pro that GitHub had given me to play with, I tried to get it to build a Textual application for me and it was terrible. The code was bad, it didn't really know how to use Textual properly, the application I was trying to get it to write as a test barely worked. It was a disaster.

A month later, in November of last year, I had a second go and better success. That time the (still not released, perhaps one day) application I was building was Swift-based and worked really well, but I can't really comment on the quality of the code or how idiomatically correct the code is in respect to the type of application it is (it's a wee game that runs on iOS, iPadOS, macOS).

By the time I tried my first serious experiment things seemed to be a little different. The code actually wasn't bad. It wasn't good, it was far from good, but it wasn't bad. Also, because it was Python, I was in a good place to judge the code.

Since I've started working on BlogMore I've noticed issues such as:

  • Lots of repetitive boilerplate code.
  • Lots of magic numbers.
  • Lots of magic strings.
  • Functions with redundant and unused parameters.
  • A default state of just adding more and more code to one file.
  • A habit of writing least-effort-possible type hints.
  • A habit of sometimes taking a hacky shortcut to solve a problem.
  • A habit of sometimes over-engineering a solution to a problem.
  • A weird obsession with importing inside functions.
  • An occasional weird obsession with guarding some imports with TYPE_CHECKING to work around non-existent circular imports.
  • An unwillingness to use newer Python capabilities (I've yet to see it make use of := without being prompted, for example).
  • A tendency to write what I would consider less-elegant code over more-elegant code.

The list isn't exhaustive, of course. The point here is that, as I've reviewed the PRs1, and read the code, I've seen things I wouldn't personally do. I've seen things I wouldn't personally write, I've seen things I've felt the need to push back on, I've seen things I've fully rejected and started over. Ultimately BlogMore isn't the code I would have written, but at the moment it is the application I would have written2.

So, here's the thing: every time I see someone writing a negative toot or post or article or whatever, and they talk about how the code it produces is unworkable, I find myself wondering about how they formed this opinion. Are they just writing the piece for the audience they want? Are they writing the piece based on their experience from months to years back, when these tools did seem to still be laughably bad? Are they simply cynically generating the piece using an LLM to bait for engagement? When I see this particular aspect of such a post it's a bit of a red flag about where they're coming from, kind of like how you suddenly realise that someone who seems to speak with authority might be full of shit when they start to spout questionable "facts" on a subject you understand well.

But wait! What about that list of dodgy stuff I've seen while building BlogMore with Copilot? What about all the reading and reviewing I've had to do, and what about the other crimes against Python coding I can probably still find in the codebase? Surely that is evidence that these tools produce terrible, unworkable, unusable code?

I mean, okay, I suppose I could reach that conclusion if I'd had a massively atypical experience in the software development industry and had never had to review anyone else's code, or had never needed to work on someone else's legacy code. Is what I'm seeing out of Copilot something I'd consider ideal code? Of course not. Is it worse than some of the worst code I've had to deal with since I started coding for a living in 1989? Hell no!

From what I'm seeing right now I'm getting code whose quality is... fine. Mostly it does the job fine. Often it needs a bit of coaxing in the right direction. Sometimes it gets totally confused and goes down a rabbit hole which needs to just be blocked off and we start again. Occasionally it needs rewriting to do the same thing but in a more maintainable way.

All of which sounds very familiar. I've had times where that describes my code (and I would massively distrust anyone who says they've never had the same outcomes in their time writing code). For sure it describes code I've had to take over, maintain or review.

It's almost like it was trained on lots of code written by humans.

Meanwhile... not every instance of using these tools to get code done needs to be about writing actual code. More and more I'm finding Google Gemini (for example) to be a really handy coding buddy and faster "Google this shit 'cos I can't remember this exact thing I want to achieve". I'll ask, I'll almost always get a pretty good answer, and then I can generally take that snippet of code and implement it how I want.

I've seldom had to walk away from that sort of interaction because it was getting me nowhere.

All of which is to say: I remain concerned about a great many things in the AI space at the moment, but I'm also as equally suspicious of someone who just flatly says "and the code it produces just doesn't work". If that's part of an article or post I'm left with the feeling that the author put zero actual effort into forming their opinion, let alone actually writing it.


  1. To varying degrees. Sometimes I have plenty of time to kill and I read the PR carefully, other times I glance it over, be happy there's nothing horrific there, and then decide to push back or merge based on the results of hand-testing and automated testing. 

  2. To be fair, it's the application I would still be writing and would be some time off finishing; there's no way it would be as feature-complete as it is now had I been 100% hand-coding it. 

NGMCP v0.2.0

1 min read; 10 GFI

The experiment with building an MCP server continues, with some hacking on it happening over a couple of hours while killing time in an Edinburgh coffee shop.

It is, of course, a solution looking for a problem, and I suspect I'm the only person who will ever use it, and even then only as a test, but building it is proving interesting.

The main changes in v0.2.0 are:

  • I turned the current search_guide tool into a line_search_guide tool (because that's what it was doing: a line-by-line search).
  • I added a body_search_guide tool that treats all the lines in an entry as a single block of text and then does the search in that (so searches over line breaks will work).
  • I added a read_entry_source tool that, rather than rendering the entry of a guide as plain text with all the markup removed, it instead delivers the underlying "source" for the entry; something that could be useful if you wanted to get an agent to convert it into another marked-up body of text.
  • I added a markup glossary resource, which technically tells an agent everything it needs to know about Norton Guide markup.

The latter one is interesting. I added it and did some experimenting locally and it seemed to be helpful and I could ask questions about markup and Copilot seemed to use it. Meanwhile, having installed v0.2.0 globally on my machine, and having enabled it, I'm finding that Copilot seems to have zero clue about the markup and instead is using the server to go off and read the guides to work out the markup1.

On the other hand, the new "get source" tool seems to work a treat.

Peeking at the source for an entry

So I suspect I still have some reading/experimenting to do when it comes to resources, so I can better understand why I'd want to provide them and what problem they solve.


  1. All credit to it: it did find CREATING.NG and read the markup out of that. 

Global and local MCP servers with Copilot CLI

1 min read; 11 GFI

This morning I'm tinkering some more with NGMCP. Having done a release yesterday and tested it out by globally installing it with:

uv tool install ngmcp

I was then left with the question: how do I easily test the version of the code I'm working on, when I now have it set up globally? Having done the global installation I had ~/.copilot/mcp-config.json looking like this:

{
  "mcpServers": {
    "ngmcp": {
      "command": "ngmcp",
      "args": [],
      "env": {
        "NGMCP_GUIDE_DIRS": "/Users/davep/Documents/Norton Guides"
      }
    }
  }
}

whereas before it looked like this:

{
  "mcpServers": {
    "ngmcp": {
      "command": "uv",
      "args": ["run", "ngmcp"],
      "env": {
        "NGMCP_GUIDE_DIRS": "/Users/davep/Documents/Norton Guides"
      }
    }
  }
}

But now I want both. Ideally I'd want to be able to set up an override for a specific server in a specific repository. I did some searching and reading of the documentation and, from what I can tell, there's no method of doing that right now1. So I've settled on this:

{
  "mcpServers": {
    "ngmcp-global": {
      "command": "ngmcp",
      "args": [],
      "env": {
        "NGMCP_GUIDE_DIRS": "/Users/davep/Documents/Norton Guides"
      }
    },
    "ngmcp-local": {
      "command": "uv",
      "args": ["run", "ngmcp"],
      "env": {
        "NGMCP_GUIDE_DIRS": "/Users/davep/Documents/Norton Guides"
      }
    }
  }
}

and then in Copilot CLI I just use the /mcp command to enable one and disable the other. It's kind of clunky, but it works.


  1. I did see the suggestion that you can write your MCP server so that it does a non-response depending on the context, but that seems horribly situation-specific and wouldn't really help in this case anyway because I want it to work in both contexts, depending on what I'm doing. 

NGMCP - An MCP experiment

2 min read; 12 GFI

Recently I've been thinking that it would be interesting to get to know a little about the Model Context Protocol and see what it's about and get a feel for how useful it might be, if at all, for anything I do.

As always happens when I want to try out something new, I reached for a problem I know well so I don't have to get bogged down in solving the problem itself. As almost always happens, I decided I should base it around Norton Guides.

Part of the point of MCP seems to be providing an interface over sources of data and actions, that an LLM might not otherwise be able to cope with, and so it sounded to me like providing a bridge to the content of Norton Guide files would be a perfect test. Of course, this isn't the first time I've bridged LLMs and NG files, but this is obviously intended to be a more generic solution than throwing a Markdown file at NotebookLM.

Earlier this afternoon I sat down and did some reading, and then decided to throw the problem at GitHub Copilot. I told it I wanted to use my NGDB library as the core of the tool, and that it should wrap it up with FastMCP. The initial result was... a bit of a mess. It sort of worked, sort of, but it also seemed to try and put together a project that mostly looked how my Python repos look, but with some bits just wrong.

I did some cleaning up, did some testing, did some tweaking, and eventually I had something working.

Asking what NGMCP can do

So far I've given the code a fairly quick read over, and I can see what it's doing and how it's going about this. This approach obviously has the disadvantage that I didn't hand-write it so there's still a lot to read to really appreciate what's going on; on the other hand, it does have the advantage that it's implemented a tool based on my library so I know what to expect it to be doing.

There will be more code reading happening, and I also intend to look to tidy up the code more and perhaps hand-add some more features.

Looking at the credits of a guide

I very much doubt that this particular MCP server is going to be any use to anyone, but as a proof of concept it works well for me. If I were in a position of needing to build something genuinely useful, I now have a start and a vague idea.

Reading some text from a guide

On the other hand: once again, as with other projects I've done related to Norton Guides, this is a tool that helps keep the content available and accessible; that alone is one reason for me to tidy this up and move it towards v1.0.0 and keep it maintained.

If you fancy having a play, some (currently Copilot-generated) documentation can be found on the server's dedicated site. When I get a bit more time I'm going to flesh this out.

obs2nlm v1.2.0

1 min read; 9 GFI

Three months back I released obs2nlm, a tool that takes an Obsidian vault and turns it into a single Markdown file so it can be used as a source for NotebookLM.

Since then I've been using it a lot and it's working out really well.

Meanwhile, one of my vaults has started to creep up towards the documented word limit for a single source in NotebookLM (500,000 words). Right now it's sitting at around 75% and is steadily creeping up.

So, with this in mind, I've made a change I've been planning from the start and have added a --split option. If used, if the generated file looks like it's going to hit the word limit, a second (or more) file will be created. The naming scheme is simple enough: if you ask obs2nlm to create an output file called dirt.md and it needs to run over, it'll then create dirt-2.md, dirt-3.md, and so on. The idea then is that, rather than upload that single Markdown file as a source, you upload all of the generated Markdown files.

Given you get up to 50 sources per notebook, this should see me right for any reasonable vault. As for if it will affect the quality of the results I get when I query the notebook... that's hard to say until I find myself in that situation. If Google are to be believed it shouldn't be an issue, and the alternative is to fall foul of the limit so this seems like the only sensible solution.

I've also added a --dry-run command line switch too; this should be handy for checking how big a vault is when compared to the word limit, without actually generating any files.

BlogMore v2.8.0

1 min read; 9 GFI

I've just published v2.8.0 of BlogMore to PyPI. This is a small update which addresses a bug that Andy reported.

The fix was simple enough, and is another little interesting thing to keep in mind given that BlogMore is an ongoing Copilot experiment. When I first kicked off BlogMore I let it decide which library to use to handle Markdown (I'm more used to markdown-it-py via Textual and so via Hike), and so also let it decide which extensions made most sense given the request. I've honestly never run into the idea of metadata before, only ever dealing with or caring about frontmatter1.

On the other hand, I will say this: I was cooking dinner when the report came in; I pointed Copilot at the issue and let it figure it out. After eating, clearing things away, and general post-dinner chilling, I dropped into the repo to see what it had made of it and... it had figured the issue out and fixed it.


  1. I guess technically they're the same thing, but here I mean I'm more used to the delimited YAML of frontmatter than whatever it is the meta plugin was dealing with. 

GitHub Copilot wants our interaction data

(Modified: 2026-03-26 10:17:43 UTC)
3 min read; 11 GFI

I guess it was inevitable1, but yesterday GitHub announced a new opt-out approach to learning from people's interactions with Copilot. I don't have anything novel or insightful to say on this switch, and I'm sure folk with better-informed opinions have already rushed out posts and articles about this, but I did want to jot down just how curious I am to see this roll out.

For starters: for me this feels like one of those things that will get a lot of backlash, and in a day or so GitHub will say they're pausing rolling this out while they reevaluate this approach2. Then, eventually, they roll it out anyway after a "period of consultation with the community". That sort of thing.

I've not read further this morning, but before going to bed last night it wasn't a happy time in the comments section of the FAQ. I can also see why some would be cynical about this change, given the tone of some of the questions and answers in that FAQ. I'll hand it to them: they're pretty candid and honest with the FAQs, but kinda yikes too.

A bad time in the FAQ

Here's the key thing I'm curious about, and which I'll be thinking about and watching for movement on in the next few days: all the talk here seems to be about protecting the privacy of the proprietary code of businesses3. That... is understandable, from a business point of view, from a commercial adoption point of view, from a "we want all software engineering departments to use Copilot" point of view. But how the heck are they really going to manage that?

In the comments in the FAQ this explanation stood out:

We do not train on the contents from any paid organization’s repos, regardless of whether a user is working in that repo with a Copilot Free, Pro, or Pro+ subscription. If a user’s GitHub account is a member of or outside collaborator with a paid organization, we exclude their interaction data from model training.

This seems somewhat unclear to me. Let's walk this one through for a moment: my GitHub account is a member of a "paid organisation". My account is also my account, for my personal code, I've had it a long time and it's filled with a lot of FOSS repos and I keep adding more. So which scenario is the right one here?

  1. Because I'm currently a member of at least one "paid organisation" I'm always opted-out of this training no matter how the opt in/out setting is set and no matter what code I work on?
  2. Because I'm currently a member of at least one "paid organisation" I can opt in when working on code that is from a repository which is mine, but I'm opted out when I'm working on code from a repository belonging to the paid organisation?

I think it reads like it's #1. But then that seems rather odd to me because, if I go and look at my settings right now, I can elect to opt in/out of this training system. If the correct reading is #1 why not just disable that setting altogether and say below it that I'm opted-out because I'm part of a paid organisation?

Which sort of suggests we should perhaps read it as #2? If that, that raises all sorts of questions. How would Copilot know I'm working on code from such a repository? Sure, it's not impossible to infer if I am working within the context of a given repository, doing some fun stuff to work out the origin and so on, but it feels messy. It also feels like a scenario that could end up being incredibly leaky. It really would not be difficult to run into a scenario where I'm working on some non-Free code but in an environment where the licence isn't clear, or where it appears that the licence4 would permit such training.

ℹ️ Note

Editing to add: there is even a comment where it is acknowledged that someone could be working in such a way that it's impossible to know the provenance of the code: "Copilot ... can even work when you are not connected to any repo."

Or... perhaps there's a #3, or a #4, or so on, that I've not even considered yet. The fact that software engineering departments suddenly have to start thinking about this issue (yes, I know, it's been a background issue for a while but this really drags it out into the open) is going to make for a few interesting weeks, assuming people care about where their code ends up.

Who knows. Perhaps, in some strange way, this is how all software ends up being Free.


  1. And I think a bit of me is surprised that they weren't just doing it anyway. 

  2. This isn't a prediction, I'm just saying it feels like that sort of announcement. 

  3. It's not that simple, but to save getting into the deep detail... 

  4. I'm using licence here as shorthand for a lot of things to consider relating to who should have access to the code and how. 

BlogMore v2.3.0

1 min read; 9 GFI

I've just pushed BlogMore v2.3.0 up to PyPI. This release has a couple of bug fixes and a couple of significant new features.

The first new feature, which came in as a request, is to add support for control over the themes used for code blocks, including independent control of the themes used for light and dark mode. With these you can specify any of the Pygments styles to use for code blocks. Personally, I prefer to have things blend in, but this now also gives you the chance to have them really contrast (use a light mode theme for dark-mode blog, or a dark mode theme for a light-mode blog).

The other big feature popped into my head earlier today and once I thought about it I had to have it. It's similar to something I had for the photography section of the older version of my website, consisting of a bunch of useless but fun stats and facts about the content.

Things like which hour of the day I tend to post during:

Hour of day

Or the day of the week:

Day of week

Or the month of the year1:

Month of year

This is designed to be turned off by default -- I can imagine most folk would not want this sort of thing on their blog -- but it can easily be turned on with with_stats. The location of the stats can also be controlled using stats_path.


  1. Unsurprisingly March is leaping ahead as of the time of writing.