Posts from April 2026

A different approach

4 min read

As mentioned in the previous post, I've been having a play around with Copilot/Claude vs Gemini when it comes to getting the agents to seek out "bad" code and improve it. In that first post on the subject, I highlighted how both tools noticed some real duplication of effort, both addressed it in more or less the same way, and neither of them took the clean-up to its logical conclusion (or, at the very least, neither cleaned it up in a way that I feel is acceptable).

The comparison of the two PRs (Gemini vs Claude via Copilot) is going to be a slow and occasional read, and if I notice something that catches my interest, I'll note it on this blog.

Initially, I was looking at which files were touched by both. With Gemini it was:

And with Copilot/Claude:

On the surface, it looks like Claude might have done a better job of finding untidy issues in the code. Of course a proper read/assessment of the outcome is needed to decide which is "better"; not to mention the application of a lot of personal taste.

So, with the initial/surface impression that "Claude went deeper", I took a look at the first file they had in common: content_path.py. This is documented as a module related to:

Shared path-resolution utilities for content output paths.

This module provides the generic building blocks used by page_path and post_path. Each content type supplies its own allowed-variable set and variable dict; this module handles the common validation, substitution, and safety checks.

There are three functions in there:

  • validate_path_template -- for validating a format string used in building a path.
  • resolve_path -- given a template and some values to populate variables in the template, create a path.
  • safe_output_path -- helper function for joining paths and ensuring they don't escape the output directory.

These seem like sensible functions to have in here, and I can imagine myself writing a similar set, given the problem they seek to solve.
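
As a rough illustration of the kind of thing safe_output_path is there for, here's a minimal sketch of such a helper (my own hypothetical code, not the module's actual implementation):

```python
from pathlib import Path


def safe_output_path(output_dir: Path, *parts: str) -> Path:
    """Join *parts* onto *output_dir*, refusing results that escape it.

    Hypothetical sketch: resolve the joined path and confirm it still
    lives under the (resolved) output directory.
    """
    base = output_dir.resolve()
    candidate = base.joinpath(*parts).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"{candidate} escapes the output directory {base}")
    return candidate
```

The is_relative_to check (Python 3.9+) is what stops a template containing something like ../../etc from writing outside the output tree.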

Both agents agreed on what needed some work: validate_path_template. Both also agreed that building knowledge of which variable is required into the function itself isn't terribly flexible; I feel this is a reasonable review of the situation. However, the two agents disagree on how it should be resolved.

Claude's take on this is that the function should grow an optional keyword argument called required_variable, which defaults to slug. It also adds an assert to test that the required variable exists in allowed_variables (okay, I could quibble about this, but given this is a code-check rather than a user-input check, eh, I can go with it). Finally, it does the check using the new variable and makes the error reporting a touch more generic too.

--- /Users/davep/content_path.py        2026-04-30 13:20:00.737955197 +0100
+++ src/blogmore/content_path.py        2026-04-30 13:20:04.560178727 +0100
@@ -17,13 +17,15 @@
     template: str,
     config_key: str,
     allowed_variables: frozenset[str],
-    item_name: str,
+    item_name: str = "",
+    *,
+    required_variable: str | None = "slug",
 ) -> None:
     """Validate a path format string for a content type.

     Checks that *template* is non-empty, well-formed, references only
-    variables from *allowed_variables*, and includes the mandatory
-    ``{slug}`` placeholder.
+    variables from *allowed_variables*, and (when *required_variable* is
+    not ``None``) includes the mandatory placeholder.

     Args:
         template: The path format string to validate.
@@ -33,11 +35,19 @@
             template.
         item_name: The human-readable name of the content type used in
             the uniqueness error message (e.g. ``"page"`` or ``"post"``).
+            Ignored when *required_variable* is ``None``.
+        required_variable: The variable name that must appear in the
+            template, or ``None`` if no variable is mandatory.  Defaults
+            to ``"slug"`` for backward compatibility.

     Raises:
         ValueError: If the template is empty, malformed, references an
-            unknown variable, or omits the ``{slug}`` placeholder.
+            unknown variable, or omits the required placeholder.
     """
+    assert required_variable is None or required_variable in allowed_variables, (
+        f"required_variable {required_variable!r} is not in allowed_variables"
+    )
+
     if not template:
         raise ValueError(f"{config_key} must not be empty")

@@ -61,9 +71,9 @@
             + f". Allowed variables are: {', '.join(sorted(allowed_variables))}"
         )

-    if "slug" not in field_names:
+    if required_variable is not None and required_variable not in field_names:
         raise ValueError(
-            f"{config_key} '{template}' must contain the {{slug}} variable so that "
+            f"{config_key} '{template}' must contain the {{{required_variable}}} variable so that "
             f"each {item_name} can be uniquely identified"
         )

Gemini, on the other hand, has a very similar idea but allows for the fact that the caller might want to specify multiple required variables. So in this case it adds required_variables (as a positional-or-keyword argument, rather than keyword-only as Claude did) and defaults it to a frozenset containing "slug". The rest of the change is about making the test for the required variables, and the reporting of the error, generic. It does nothing about checking that the required variables are within the allowed variables.

--- /Users/davep/content_path.py        2026-04-30 13:20:00.737955197 +0100
+++ src/blogmore/content_path.py        2026-04-30 14:47:41.607748447 +0100
@@ -18,12 +18,13 @@
     config_key: str,
     allowed_variables: frozenset[str],
     item_name: str,
+    required_variables: frozenset[str] = frozenset({"slug"}),
 ) -> None:
     """Validate a path format string for a content type.

     Checks that *template* is non-empty, well-formed, references only
-    variables from *allowed_variables*, and includes the mandatory
-    ``{slug}`` placeholder.
+    variables from *allowed_variables*, and includes the
+    *required_variables*.

     Args:
         template: The path format string to validate.
@@ -33,10 +34,13 @@
             template.
         item_name: The human-readable name of the content type used in
             the uniqueness error message (e.g. ``"page"`` or ``"post"``).
+        required_variables: The set of variable names that MUST appear
+            in the template to ensure uniqueness. Defaults to
+            ``{"slug"}``.

     Raises:
         ValueError: If the template is empty, malformed, references an
-            unknown variable, or omits the ``{slug}`` placeholder.
+            unknown variable, or omits a required variable.
     """
     if not template:
         raise ValueError(f"{config_key} must not be empty")
@@ -61,10 +65,12 @@
             + f". Allowed variables are: {', '.join(sorted(allowed_variables))}"
         )

-    if "slug" not in field_names:
+    missing = required_variables - set(field_names)
+    if missing:
         raise ValueError(
-            f"{config_key} '{template}' must contain the {{slug}} variable so that "
-            f"each {item_name} can be uniquely identified"
+            f"{config_key} '{template}' must contain the "
+            + ", ".join(f"{{{v}}}" for v in sorted(missing))
+            + f" variable(s) so that each {item_name} can be uniquely identified"
         )

For the most part I think I prefer what Gemini is trying to do, although Claude's sanity check that the required variable is one of the possible variables makes sense. I kind of feel like both of them missed the point when it came to handling the fact that "slug" is required: given that validate_path_template is otherwise built to be pretty generic, I think I would have defaulted to no required variables and simply left it up to the caller to be explicit that "slug" is required, because that matters in the context of the caller. This feels like a pretty obvious "business logic" vs "generic utility code" separation-of-concerns scenario.
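
To make that concrete, here's a sketch of the direction I mean (hypothetical code, not taken from either PR): the utility defaults to no required variables at all, and the caller spells out that "slug" matters:

```python
from string import Formatter


def validate_path_template(
    template: str,
    config_key: str,
    allowed_variables: frozenset[str],
    item_name: str,
    required_variables: frozenset[str] = frozenset(),  # no baked-in "slug"
) -> None:
    """Sketch of a fully generic validator: callers supply all the rules."""
    assert required_variables <= allowed_variables, (
        "required_variables must be a subset of allowed_variables"
    )
    if not template:
        raise ValueError(f"{config_key} must not be empty")
    # Extract the {variable} names used in the template.
    field_names = {
        name for _, name, _, _ in Formatter().parse(template) if name
    }
    unknown = field_names - allowed_variables
    if unknown:
        raise ValueError(
            f"{config_key} references unknown variables: "
            + ", ".join(sorted(unknown))
        )
    missing = required_variables - field_names
    if missing:
        raise ValueError(
            f"{config_key} '{template}' must contain "
            + ", ".join(f"{{{v}}}" for v in sorted(missing))
            + f" so that each {item_name} can be uniquely identified"
        )


# The caller, not the utility, knows that posts need a slug:
validate_path_template(
    "{year}/{month}/{slug}",
    config_key="post_path",
    allowed_variables=frozenset({"slug", "year", "month", "day"}),
    item_name="post",
    required_variables=frozenset({"slug"}),
)
```

This keeps the "uniqueness" rule where it belongs: in the code that actually knows what a post is.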

As mentioned in passing in another post, it's interesting to see that neither of them noticed the opportunity to turn this:

unknown = set(field_names) - allowed_variables
if unknown:
    ...

into this:

if unknown := (set(field_names) - allowed_variables):
    ...

I know at least one person who would be happy about this fact.

So where does this leave me? At the moment I'm not inclined to merge either PR, but that's mainly because I want to carry on reading them and perhaps writing some more notes about what I encounter. What this does illustrate for me is something we know well enough anyway, but which I wanted to experiment with and see for myself: the initial implementation of any working code written by an agent seems optimised for that particular function or method, perhaps a class if you're lucky. It will happily repeat the same code to solve similar problems, or even use very different approaches to solve the same problem. What it won't do well is recognise that the problem is already solved elsewhere, and so either call that other code, or modify it slightly to make it more generic and applicable in more situations.

On the other hand, it has shown that with a bit of prompting (and keep in mind that the prompt that arrived at this comparison was really quite vague) it is possible to get an agent to "consider" the problem of duplication and boilerplate and to try and address that.

Having seen the two solutions on offer here, it's hard not to conclude that the best solution would be for me to take the PRs as flags marking places in the code that could be cleaned up, and do the tidy myself.

At least I have, as of the time of writing, 1,380 tests to check that I've not broken anything when I do hand-clean the code. But, hmm, there's a question: can I actually trust those tests? It's not like I wrote them.

Guess that's a whole other thing to worry about at some point...

Duplication of effort

3 min read

While I don't, for a moment, think that the work on BlogMore is complete, I think it's fair to say that the rate of new feature additions has slowed down. Which is fine, there's only so much I need from a self-designed/directed static site generator; at a certain point there's a danger of adding features for the sake of it.

Around this point I think I want to start to pay proper attention to the code quality and maintainability of the ongoing experiment.

As I mentioned the other day, while working through this, I had noticed plenty of bad habits that Copilot (and in this case pretty much always Claude Sonnet 4.6) has. All were very human (obviously), but also the sort of thing you'd expect a human developer to educate themselves out of.

Yesterday evening, out of idle curiosity, I installed Gemini CLI because I wanted to see what would happen if I pointed it at the v2.18.0 codebase and asked it to look for things to clean up, and then what would happen if I did the same with Copilot CLI.

I've saved the results as a PR for what Gemini came up with and what Copilot came up with[1]. I've not given them a proper read over yet, but while having a quick glance at them, something leapt out at me: in the code before the request, there was this in utils.py:

def count_words(content: str) -> int:
    """Count the number of words in the given content.

    Strips common Markdown and HTML formatting before counting so that only
    prose words are included.  The same normalisation rules as
    :func:`calculate_reading_time` are applied.

    Args:
        content: The text content to analyse (may include Markdown/HTML).

    Returns:
        The number of words in the content.

    Examples:
        >>> count_words("Hello world")
        2
        >>> count_words("word " * 10)
        10
    """
    # Remove code blocks
    content = re.sub(r"```[\s\S]*?```", "", content)
    content = re.sub(r"`[^`]+`", "", content)

    # Remove markdown links but keep the text: [text](url) -> text
    content = re.sub(r"\[([^\]]+)\]\([^\)]+\)", r"\1", content)

    # Remove markdown images: ![alt](url) -> ""
    content = re.sub(r"!\[([^\]]*)\]\([^\)]+\)", "", content)

    # Remove HTML tags
    content = re.sub(r"<[^>]+>", "", content)

    # Remove markdown formatting characters
    content = re.sub(r"[*_~`#-]", " ", content)

    return len([word for word in content.split() if word])


def calculate_reading_time(content: str, words_per_minute: int = 200) -> int:
    """Calculate the estimated reading time for content in whole minutes.

    Uses the standard reading speed of 200 words per minute. Strips markdown
    formatting and counts only actual words to provide an accurate estimate.

    Args:
        content: The text content to analyze (can include markdown)
        words_per_minute: Average reading speed (default: 200 WPM)

    Returns:
        Estimated reading time in whole minutes (minimum 1 minute)

    Examples:
        >>> calculate_reading_time("Hello world")
        1
        >>> calculate_reading_time("word " * 400)
        2
    """
    # Remove code blocks (they typically take longer to read/understand)
    content = re.sub(r"```[\s\S]*?```", "", content)
    content = re.sub(r"`[^`]+`", "", content)

    # Remove markdown links but keep the text: [text](url) -> text
    content = re.sub(r"\[([^\]]+)\]\([^\)]+\)", r"\1", content)

    # Remove markdown images: ![alt](url) -> ""
    content = re.sub(r"!\[([^\]]*)\]\([^\)]+\)", "", content)

    # Remove HTML tags
    content = re.sub(r"<[^>]+>", "", content)

    # Remove markdown formatting characters
    content = re.sub(r"[*_~`#-]", " ", content)

    # Count words (split by whitespace and filter out empty strings)
    words = [word for word in content.split() if word]
    word_count = len(words)

    # Calculate minutes, rounding to the nearest minute with a minimum of 1
    minutes = max(1, round(word_count / words_per_minute))

    return minutes

I think this right here is a great example of why the code that these tools produce is generally kind of... meh. Let's just really appreciate for a moment the duplication of effort going on there. But it gets even more fun. Look at the docstring[2] for count_words: it says right there that the "same normalisation rules as calculate_reading_time are applied". It "knows" it copied the work that went into calculate_reading_time, but never once did it "think" to pull the common code out and have both of the functions call on that helper function.

Back to the parallel invitations to refactor, having asked:

please do a review of this codebase and see if there is any scope for refactoring so there's less duplication

Both Gemini and Claude noticed this and did something about it. Gemini came up with a:

def _strip_formatting(content: str) -> str:

with all the regex-based-markdown-stripping code in there and then rewrote count_words and calculate_reading_time to call on that. The Copilot/Claude cleanup did something very similar:

def _strip_markdown_formatting(content: str) -> str:

So it's a good thing that both of them "noticed" this duplication of effort and cleaned it up. What I do find interesting though is what the result was. Stripping docstrings and comments for a moment, here's what I was left with, by Gemini, for count_words and calculate_reading_time:

def count_words(content: str) -> int:
    content = _strip_formatting(content)
    return len([word for word in content.split() if word])

def calculate_reading_time(content: str, words_per_minute: int = 200) -> int:
    content = _strip_formatting(content)
    words = [word for word in content.split() if word]
    word_count = len(words)
    minutes = max(1, round(word_count / words_per_minute))
    return minutes

and here's what Copilot/Claude came up with:

def count_words(content: str) -> int:
    return len([word for word in _strip_markdown_formatting(content).split() if word])

def calculate_reading_time(content: str, words_per_minute: int = 200) -> int:
    words = [word for word in _strip_markdown_formatting(content).split() if word]
    return max(1, round(len(words) / words_per_minute))

In both cases calculate_reading_time is still doing the work of counting words when count_words is right there to be called! Don't even get me started on how the Gemini version of calculate_reading_time is so obsessed with assigning values to variables that only get used once in the next line[3]. Were I reviewing these PRs (oh, wait, I am reviewing these PRs!), I'd request the latter function be turned into:

def calculate_reading_time(content: str, words_per_minute: int = 200) -> int:
    return max(1, round(count_words(content) / words_per_minute))
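
Pulling all of that together, here's a sketch of the end state I'd be aiming for, with calculate_reading_time built on count_words (the fenced-code regex is written as `{3} here purely to keep it readable in this post):

```python
import re


def _strip_formatting(content: str) -> str:
    """Remove code blocks, links, images, HTML tags and Markdown markers."""
    content = re.sub(r"`{3}[\s\S]*?`{3}", "", content)  # fenced code blocks
    content = re.sub(r"`[^`]+`", "", content)  # inline code
    content = re.sub(r"\[([^\]]+)\]\([^\)]+\)", r"\1", content)  # links -> text
    content = re.sub(r"!\[([^\]]*)\]\([^\)]+\)", "", content)  # images
    content = re.sub(r"<[^>]+>", "", content)  # HTML tags
    return re.sub(r"[*_~`#-]", " ", content)  # formatting characters


def count_words(content: str) -> int:
    """Count the prose words in the given content."""
    return len(_strip_formatting(content).split())


def calculate_reading_time(content: str, words_per_minute: int = 200) -> int:
    """Estimate reading time in whole minutes, with a minimum of one."""
    return max(1, round(count_words(content) / words_per_minute))
```

As a bonus: str.split() with no arguments already discards empty strings, so the "if word" filter can go too.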

I would imagine that there's a lot more of this going on in the code, and under ideal conditions this sort of thing would not have made its way into the codebase in the first place. Part of the point of this experiment was to mostly get the agent to do its own thing, without me doing full-on reviews of every PR. Were I to use this sort of tool in a workplace, or even on a FOSS project that wasn't intended to be this exact experiment, I'd be far more inclined to carefully review the result and request changes.

Or, perhaps, hear me out... I have a third agent that I teach to be just like me and I get it to do the work of reviewing the PRs for me. What could possibly go wrong?


  1. Again, I guess I should stop referring to Copilot in this case and instead refer to Claude Sonnet. 

  2. Note to self: I need to educate the agents in how I prefer and always use the mkdocstrings style of cross-references. 

  3. Yes, I know, this is a favoured clean code kind of thing in some circles, but it can be taken to an unnecessary extreme. 

On GitHub

2 min read

It seems that dunking on GitHub is the flavour of the day. At the moment most of the social/news type things I tend to read are filled with the Ghostty news, as well as a small revival of posts and links to blog posts about all the recent outages. It's understandable. It does seem that something has shifted with GitHub in the last few months. While it hasn't been the site I used to enjoy for quite some time now, it just seems to be getting worse at the moment.

It's even pissing off the loyal AI enthusiasts with the Copilot changes.

As I read all of this I find myself mostly nodding along. For the most part I'm not finding that GitHub is getting in the way and stopping me from doing the things I want to do, and for the most part it does act as a vital tool that lets me get work done, and also lets me enjoy my longest-enjoyed hobby. On the other hand I couldn't help but sigh and think "yeah, I get why this is the time that people are done" when I opened up the PR page for this blog, just now, and saw this:

A warning about my PRs

It does get me thinking about my relationship with GitHub, and how long I've been using it. As I've written before, I created my account back in 2008; I was within the first 30,000 users. While my use of it was only very occasional for quite a long time, for the last decade I've been constantly interacting with it. It is somewhere I visit constantly, not just to do work on my own projects, but to read what other people are doing. One of the first things I do every morning, when I sit down at my desk, is open my GitHub dashboard page and have a scroll through the feed to see what people I follow have been up to.

It's generally been the most fulfilling feed I've read.

But I'm also getting that feeling I got when I hung on to my Twitter account far longer than I really should have; not just because of the general vibe of "it's falling apart", but also other types of questionable behaviour. The degrading performance, the troubling business relationships, the over-emphasis on all things AI... it adds up.

There is a sense that some time ago was the time to move elsewhere as my sociable forge (probably around the time that Microsoft took over), and that not having done that, now is the second best time. But the effort of making that move is non-trivial and, quite frankly, I'd want to see where folk start to land, if they started to move away in any numbers at all. For me the real utility of GitHub isn't the "it's somewhere to store my shit" thing, it's the socially coding thing.

Then there's the follow-up problem: if some other forge was to become the next flavour of the decade, it too would probably end up suffering the same fate as GitHub.

Perhaps now is the time for me to start looking into options for collaborative code forges that offer the same sort of solution that Mastodon does for Twitter-like nattering.

Considering a rescue

4 min read

Ever since I kicked off the work on BlogMore I've had a renewed interest in writing on this blog (as you can probably tell from the stats and the calendar). But not just writing: also tweaking it, tidying it up, thinking about maintaining it into the future, thinking about the links and the categories and so on.

In doing so, I've also been looking at other folk who persist in keeping a blog, and especially those who maintain blogs built with static site generation tools, and in some cases I'm mildly envious of how far back some of them stretch.

When it comes to the world of blogging I was kind of late to the party. The first version of my blog was just a section of my self-developed website, hosted on www.davep.org. Don't go looking there for it now, it was long ago removed. In fact my personal website is mostly just a placeholder for what once was. The Wayback Machine still has a copy though, so I can see that the first blog post I wrote for my site was dated 2003-03-31.

My first blog post

I maintained this for a while; the engine behind it all was some self-written PHP that could best be described as a dynamic static site (in other words, it generated everything on request from underlying text files and HTML snippets, because I had no wish to be faffing around with databases on a web host). Eventually, though, the blog side of this got to be too much trouble and I jumped over to Blogger.

I maintained that blog for quite a few years, with the first post being made in 2006 and the last in 2011. Sadly it's all quite broken now. I used to include a lot of images and, while some of them are embedded in the site itself, most were hosted on the older version of my website, as part of the photo gallery I also had there.

This all fell apart when I finally killed off the PHP version of my site and all the images were removed. Now the blog is a wasteland of broken image icons (not to mention a wasteland of broken external links -- so many of the sites I referred to back then have fallen off the net).

I hate this. I hate that thirty-something me was fired up enough to want to write stuff down and communicate to other people (and to future me) and it's all decayed. I especially dislike that the original version of my blog, now only stored on the Wayback Machine (and perhaps on a hard drive that I think is in a box somewhere in storage, perhaps also on some burnt-as-a-backup DVDs) is otherwise inaccessible. Much like I did with my original photoblog, I want to rescue this. I want to rescue all of this.

The technical challenges of teasing out the original posts from the Internet Archive and from Blogger aren't too great. Turning a bunch of HTML into Markdown isn't impossible either -- the library that I use in OldNews should do the job fine there. All that sort of work feels like a fun little challenge that will keep me amused for a few evenings.
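
To give a flavour of why the conversion itself doesn't scare me: even a stdlib-only toy gets part of the way there. This is a deliberately minimal sketch handling just links, emphasis and paragraphs (the real job would go to a proper library):

```python
from html.parser import HTMLParser


class ToyMarkdown(HTMLParser):
    """A toy HTML-to-Markdown converter handling a handful of tags."""

    def __init__(self) -> None:
        super().__init__()
        self.out: list[str] = []
        self._href: str | None = None
        self._link_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._link_text = []
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag in ("strong", "b"):
            self.out.append("**")

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.out.append(f"[{''.join(self._link_text)}]({self._href})")
            self._href = None
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag == "p":
            self.out.append("\n\n")

    def handle_data(self, data):
        # Text inside a link is buffered so it can land between the brackets.
        if self._href is not None:
            self._link_text.append(data)
        else:
            self.out.append(data)


def html_to_markdown(html: str) -> str:
    parser = ToyMarkdown()
    parser.feed(html)
    return "".join(parser.out).strip()
```

Scaling that up to decades-old Blogger markup is exactly the sort of evening-sized puzzle I mean.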

There are two main things that cause me to pause when thinking about doing this.

The first is that some of those very old posts, as I mentioned above, link to places that don't exist and haven't existed for a long time. It raises the question: do I even care to preserve things that have no context any more?

The second is that many of the posts in the Blogger blog, as I mention, relied on images hosted on my old site. Right now I'm not actually sure where those photos are! While I took a backup of all the code and other data for www.davep.org when I did the big reboot (storing it all up on GitHub), I seem to have stripped out all of the photos. This makes sense as there was a lot of data there. Making sure I had a backup of those files feels like something I would do -- I hang on to all sorts of data -- but at the moment I can't locate them[1].

To make this work, for this to stand any chance of working, I need to pull them all back out from somewhere.

Will I do this? I don't know yet. The seed is there, the itch is waiting to be scratched. I look at the age span of this blog, and the calendar page, and think it could be really cool to really back-fill it from my older blogs. The graph might end up looking really funky.

On the other hand: am I just trying to preserve irrelevant things as a way to make work for myself (albeit "work" that is fun; after all, coding is a hobby as well as a living)?

On the gripping hand: if I can get the images back, a wasteland of links to sites that don't exist any more does, at the very least, provide a history of what was and is no longer.


  1. I should point out that I have the original photos all backed up any number of ways and in multiple locations, but it's the specific jpeg files with their specific names as appeared in the photo library on my site that I need to make this work. 

BlogMore v2.18.0

1 min read

After releasing the graph view yesterday I got to thinking that it might be nice if the "tooltips" for the nodes in the graph were a little richer. Since we already know how many posts are within a category, or have a specific tag, it makes sense that those counts should be shown; posts themselves have descriptions available and some even have cover images that could be turned into thumbnails. Why not make use of all of that?

So I've made use of all of that. As mentioned, categories and tags simply show the count of posts related to them:

A tag tooltip in the graph

Posts will show the title, date, description and the cover image if available:

A post showing its tooltip

I'll admit that the transparency is a little distracting -- this comes from the library being used for the graph -- but I kind of like it. I'm going to roll with it now and see how I feel about it as time goes on. It's not like I expect a reader to read the post in the tooltip, it's an invitation to click through and read the actual post.

Another small change is something I've been meaning to address for a while. While BlogMore supports a modified time for a post it never shows it or uses it in any meaningful way. So now I've updated the way the time of a post is displayed so that, if there is a modified time, it's also shown:

Showing when a post was last modified

The final change came in as a request over on Mastodon. The wish was for an easy way of changing the title of the backlinks section on a post from "References & Mentions" to something else, without requiring the user to spin up their own copy of a template just to do it. That seemed fair, so I've introduced backlinks_title.

blogmore.el v4.3.0

1 min read

After adding the email comment invite facility to BlogMore it only made sense that I add some commands to blogmore.el to make it easier to edit the front matter that can help drive that feature.

So... I've released v4.3.0 of blogmore.el that adds two new commands:

  • blogmore-toggle-invite-comments -- toggles the comment invitation property
  • blogmore-invite-comments-to -- makes it easy to set, edit or remove the email address to use when making the invite

I've also added the two commands to the transient menu, using C-t for the former and C-a for the latter.

BlogMore v2.17.0

4 min read

I did some more tinkering with BlogMore yesterday, adding two new features. The first is one I've been considering adding for a wee while now.

For a large part of the lifetime of this blog I used Disqus to provide a comments section on every post. It was, as you'd imagine for a small personal blog, a pretty quiet thing; I'd get the odd comment from time to time but it wasn't significant. This worked well for the longest time, until Disqus decided that they were going to force adverts into your pages if you were using the free tier. Now, I'm fine with paying for tools I use, but I wasn't using Disqus enough to make the cost worth it. I'm also not opposed to a bit of subtle advertising to help cover costs either.

What Disqus did wasn't subtle. It was far from subtle. It was a horror show of the worst kind of sleazy advertising you can imagine.

So I removed it and called it a day on comments.

After the work on BlogMore was well under way I did start thinking about this problem again. Given how BlogMore is constructed, anyone using it could override a template and include whatever they want; with this in mind I looked at static-site-friendly comment options, but nothing really stood out. Every solution seemed to either rely heavily on a third-party service (see above for possible problems), require self-hosting such a service (spinning up hosts and web servers and databases and stuff is the antithesis of using a static site generator to get stuff done easily), or involve some hacky use of a social media platform or other discussion venue that would make the reader jump through hoops -- a setup that really says "go away, I don't want to hear from you".

So I concluded that it just wasn't worth the effort and I've done nothing with it.

Meanwhile: on occasion I have had people just email me about a post. Good old email, like in the good old days of the Internet. I kind of liked that. In fact I really liked that. So over the weekend, after receiving just such an email the other day, I decided I'd add a feature to BlogMore that provided just that: an invitation to send an email at the end of every post.

The configuration file now has two new properties that support this. The first is invite_comments, a boolean value that simply turns the feature on or off. The second is invite_comments_to, which should be set to an email address to which the reader will be invited to direct their comment or question.

I've made the latter a little smart, in that it's actually a template, so that you can control the email address used per-post. This could be great for filtering, etc. Examples could be:

  • blog-comment@example.com
  • blog-comment-{year}{month}{day}@example.com
  • {author}+comment@example.com

And so on. You get the idea.
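
Under the hood the expansion is nothing exotic; something along these lines (a sketch with assumed variable names, not BlogMore's actual code):

```python
def resolve_comment_address(template: str, post: dict[str, str]) -> str:
    """Expand an invite_comments_to template with a post's metadata.

    Hypothetical sketch: the variables actually available may differ.
    """
    return template.format(
        year=post.get("year", ""),
        month=post.get("month", ""),
        day=post.get("day", ""),
        author=post.get("author", ""),
    )
```

So "blog-comment-{year}{month}{day}@example.com" with a post dated 2026-04-30 would come out as blog-comment-20260430@example.com.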

Further to this, there are also post frontmatter properties of the same name. In this case the frontmatter setting always overrides the configuration file setting, for that single post. Also, the invite_comments_to frontmatter setting isn't a template -- it's set for a single post, so that didn't seem necessary. The point of the frontmatter is that it gives the flexibility to turn the invite off for an individual post (or indeed turn it on if the global setting is for it to be off).

The effect of all of this is that, if the invitation setting is on and if there is an email address available, this little box will appear at the bottom of a post:

An invitation to send me an email

When the reader clicks on the link it should open their MUA of choice and pre-fill the to address, and should also pre-fill the subject with the title of the post they're emailing from.
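
Building that link is the easy part; roughly this (a sketch, with the subject URL-encoded so spaces and punctuation survive the trip):

```python
from urllib.parse import quote


def mailto_link(address: str, subject: str) -> str:
    """Build a mailto: URL that pre-fills the recipient and subject."""
    return f"mailto:{address}?subject={quote(subject)}"
```

So mailto_link("me@example.com", "Hello world") gives mailto:me@example.com?subject=Hello%20world.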

The second addition is prompted by the final paragraph in the post announcing the previous release of BlogMore:

At some point in the future it might be interesting to take this even further and produce a map of interconnected posts; for now though I think this is enough.

Apparently "some time in the future" was the following day, because that also got added while I was hacking on the sofa. There's a new --with-graph command line option, and with_graph configuration file setting, that adds a Graph page to the top "menu" of the blog. The result looks something like this:

Initial graph view

Given the nature of the graph and that the viewer is naturally going to want to explore, it can be toggled into a "full screen" (well, "most of the page") mode too:

In full screen mode

The graph itself (built using force-graph) can be explored in the ways you'd reasonably expect, allowing zooming, panning around, dragging nodes around to get a better view of things, and so on.

Zoomed in on the graph

If you click on any of the nodes the graph will show you everything that's linked to it:

Highlighted links

and if you click the node again it will take you to the post, tag archive or category archive, depending on what it is you are clicking on.
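Under the hood, force-graph consumes a simple nodes-and-links structure. As a hypothetical sketch of how such data might be assembled from posts and their tags (this is my own illustration; the field names and the function are assumptions, not BlogMore's actual data model):

```python
def build_graph_data(posts):
    """Build a force-graph style {nodes, links} dict.

    `posts` is an iterable of (slug, tags) pairs. Each post and each
    distinct tag becomes a node; each post-tag pairing becomes a link.
    """
    nodes, links = [], []
    seen_tags = set()
    for slug, tags in posts:
        nodes.append({"id": slug, "group": "post"})
        for tag in tags:
            if tag not in seen_tags:
                seen_tags.add(tag)
                nodes.append({"id": f"tag:{tag}", "group": "tag"})
            links.append({"source": slug, "target": f"tag:{tag}"})
    return {"nodes": nodes, "links": links}

data = build_graph_data([
    ("hello-world", ["python"]),
    ("more-python", ["python", "emacs"]),
])
```

The resulting dict can be serialised to JSON and handed straight to the library's graphData call.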

So far I'm finding this is working really well as yet another method of discovering posts and themes, etc; it's already helped me find some "under-used" tags that deserved to be added to posts to better connect things. I suspect the feature will need refining over time, especially from a cosmetic point of view, but the result feels very usable as it stands.

But is the code that bad?

5 min read

There is, obviously and understandably, a lot of conversation online about AI and coding and agents and all that stuff. Much of it I get, much of it I agree with, and I share the vast majority of the concerns. The impact on people, the impact on society, the impact on the environment, the impact on security... there's a good list of things to worry us there.

The one that crops up a lot though, that I don't quite get, is the constant claim I see that at best AI tools produce bad code, and at worst they produce unworkable code. That really isn't my recent experience.

Sure, going back to 2023 or 2024, when I first started toying with these new chatbot things some folks were raving about, the output was laughable. I can remember spending some fun times trying to coax whatever version of ChatGPT was on the go at the time into writing workable code and being amused by just how bad it was.

Even back in October last year, when I first tried out the free Copilot Pro that GitHub had given me to play with, I tried to get it to build a Textual application for me and it was terrible. The code was bad, it didn't really know how to use Textual properly, the application I was trying to get it to write as a test barely worked. It was a disaster.

A month later, in November of last year, I had a second go with better success. That time the (still not released, perhaps one day) application I was building was Swift-based and worked really well, but I can't really comment on the quality of the code or how idiomatic the code is in respect to the type of application it is (it's a wee game that runs on iOS, iPadOS, macOS).

By the time I tried my first serious experiment things seemed to be a little different. The code actually wasn't bad. It wasn't good, it was far from good, but it wasn't bad. Also, because it was Python, I was in a good place to judge the code.

Since I've started working on BlogMore I've noticed issues such as:

  • Lots of repetitive boilerplate code.
  • Lots of magic numbers.
  • Lots of magic strings.
  • Functions with redundant and unused parameters.
  • A default state of just adding more and more code to one file.
  • A habit of writing least-effort-possible type hints.
  • A habit of sometimes taking a hacky shortcut to solve a problem.
  • A habit of sometimes over-engineering a solution to a problem.
  • A weird obsession with importing inside functions.
  • An occasional weird obsession with guarding some imports with TYPE_CHECKING to work around non-existent circular imports.
  • An unwillingness to use newer Python capabilities (I've yet to see it make use of := without being prompted, for example).
  • A tendency to write what I would consider less-elegant code over more-elegant code.
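To illustrate the point about newer Python capabilities: the := (walrus) operator, which has been around since Python 3.8, folds an assignment into a condition. A trivial example of my own (not taken from the BlogMore codebase):

```python
import re

line = "title: BlogMore v2.16.0"

# Without the walrus operator: assign, then test.
match = re.match(r"title:\s*(.+)", line)
if match:
    title = match.group(1)

# With it: assignment and test in a single expression.
if (match := re.match(r"title:\s*(.+)", line)):
    title = match.group(1)

print(title)  # BlogMore v2.16.0
```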

The list isn't exhaustive, of course. The point here is that, as I've reviewed the PRs1, and read the code, I've seen things I wouldn't personally do. I've seen things I wouldn't personally write, I've seen things I've felt the need to push back on, I've seen things I've fully rejected and started over. Ultimately BlogMore isn't the code I would have written, but at the moment it is the application I would have written2.

So, here's the thing: every time I see someone writing a negative toot or post or article or whatever, and they talk about how the code it produces is unworkable, I find myself wondering about how they formed this opinion. Are they just writing the piece for the audience they want? Are they writing the piece based on their experience from months to years back, when these tools did seem to still be laughably bad? Are they simply cynically generating the piece using an LLM to bait for engagement? When I see this particular aspect of such a post it's a bit of a red flag about where they're coming from, kind of like how you suddenly realise that someone who seems to speak with authority might be full of shit when they start to spout questionable "facts" on a subject you understand well.

But wait! What about that list of dodgy stuff I've seen while building BlogMore with Copilot? What about all the reading and reviewing I've had to do, and what about the other crimes against Python coding I can probably still find in the codebase? Surely that is evidence that these tools produce terrible, unworkable, unusable code?

I mean, okay, I suppose I could reach that conclusion if I'd had a massively atypical experience in the software development industry and had never had to review anyone else's code, or had never needed to work on someone else's legacy code. Is what I'm seeing out of Copilot something I'd consider ideal code? Of course not. Is it worse than some of the worst code I've had to deal with since I started coding for a living in 1989? Hell no!

From what I'm seeing right now I'm getting code whose quality is... fine. Mostly it does the job fine. Often it needs a bit of coaxing in the right direction. Sometimes it gets totally confused and goes down a rabbit hole which needs to just be blocked off and we start again. Occasionally it needs rewriting to do the same thing but in a more maintainable way.

All of which sounds very familiar. I've had times where that describes my code (and I would massively distrust anyone who says they've never had the same outcomes in their time writing code). For sure it describes code I've had to take over, maintain or review.

It's almost like it was trained on lots of code written by humans.

Meanwhile... not every instance of using these tools to get code done needs to be about writing actual code. More and more I'm finding Google Gemini (for example) to be a really handy coding buddy and faster "Google this shit 'cos I can't remember this exact thing I want to achieve". I'll ask, I'll almost always get a pretty good answer, and then I can generally take that snippet of code and implement it how I want.

I've seldom had to walk away from that sort of interaction because it was getting me nowhere.

All of which is to say: I remain concerned about a great many things in the AI space at the moment, but I'm equally suspicious of someone who just flatly says "and the code it produces just doesn't work". If that's part of an article or post I'm left with the feeling that the author put zero actual effort into forming their opinion, let alone actually writing it.


  1. To varying degrees. Sometimes I have plenty of time to kill and I read the PR carefully; other times I glance over it, satisfy myself there's nothing horrific there, and then decide to push back or merge based on the results of hand-testing and automated testing. 

  2. To be fair, it's the application I would still be writing and would be some time off finishing; there's no way it would be as feature-complete as it is now had I been 100% hand-coding it. 

BlogMore v2.16.0

1 min read

BlogMore has had a new release, bumping the version to v2.16.0. There are two main changes in this update, both coming from a single idea: internal back-links.

Where it makes sense, I always try and link posts in this blog to other related posts, but I've never really had a sense of how interconnected things are. So, the first new thing I added was a with_backlinks configuration option. This is off by default, but when turned on, will add a list of any referring posts to the bottom of a post.

A list of references to a post

Like some of the work I did in the stats page, this feels like another interesting method of discovering posts and related subjects within a blog.

Once this work was done, it seemed to make sense to use the link-gathering code to then get a sense of which posts are most often linked to within a blog, and so a table of most-linked posts has been added to the stats page.

Internal link stats

This particular table will only appear in the stats if with_backlinks is set to true.
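As a rough sketch of the kind of tally behind a "most linked-to" table (a hypothetical illustration of mine; BlogMore's actual link-gathering code will differ):

```python
from collections import Counter

def most_linked(backlinks: dict[str, list[str]], top: int = 10):
    """Rank posts by how many other posts link to them.

    `backlinks` maps a post slug to the slugs of the posts that
    link to it; the return value is a (slug, count) list, most
    linked-to first.
    """
    counts = Counter({target: len(sources) for target, sources in backlinks.items()})
    return counts.most_common(top)

print(most_linked({"a-post": ["b", "c"], "another": ["b"]}))
# [('a-post', 2), ('another', 1)]
```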

At some point in the future it might be interesting to take this even further and produce a map of interconnected posts; for now though I think this is enough.

kbdify.el v1.0.0

1 min read

When I'm writing documentation in Markdown I like, where possible, to mark up keys with the <kbd> tag. This was the reason for one of the updates to BlogMore: I'd not done any good default markup for <kbd> and the moment I realised, I knew I had to fix it.

Now that I'm writing more on this blog, and especially about coding, I'm mentioning keys pretty often (even more so given I'm doing a lot of tidying up of my Emacs Lisp packages). The thing is though: I find having to type out <kbd> and </kbd> kind of tedious, and it's something I mistype from time to time. I guess I could use some sort of HTML tag inserting tool or whatever, but I got to thinking that it would be handy if I could point an Emacs command at a particular sequence in a buffer and have it mark up the whole thing.

This resulted in a small bit of code I'm calling kbdify.el. It's pretty simple: if point is sat on some text that looks like this:

C-M-S-s-<up>

and I run kbdify I get this:

<kbd>C</kbd>-<kbd>M</kbd>-<kbd>S</kbd>-<kbd>s</kbd>-<kbd>&lt;up&gt;</kbd>

The result rendering as C-M-S-s-<up>.
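The transformation itself boils down to splitting on the dashes, HTML-escaping each part, and wrapping it in <kbd> tags. Here's that logic sketched in Python for illustration (the real thing is Emacs Lisp, so this is an approximation of the idea rather than the actual code):

```python
from html import escape

def kbdify(key_sequence: str) -> str:
    # Wrap each dash-separated element of an Emacs key sequence in
    # <kbd> tags, escaping HTML-special characters such as < and >.
    parts = key_sequence.split("-")
    return "-".join(f"<kbd>{escape(part)}</kbd>" for part in parts)

print(kbdify("C-M-S-s-<up>"))
# <kbd>C</kbd>-<kbd>M</kbd>-<kbd>S</kbd>-<kbd>s</kbd>-<kbd>&lt;up&gt;</kbd>
```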

I could probably take it a bit further, have it optionally work on a region and stuff like that, but even in its current simplistic form it's going to be loads quicker and a lot more accurate and will probably perfectly cover 99% of the times I need it. There is the issue that it's not going to handle something like M-x some-command RET in the way I might like, but then again some-command isn't a key. Like, does it make more sense to have:

M-x some-command RET

anyway? Personally I think this:

M-x some-command RET

probably makes more sense.

I think I'm good for now.