Posts in category "Coding"

BlogMore v2.25.0

2 min read

Following on from the previous release, which was all about trying to get a big PageSpeed Insights win through image optimisation, I'm chasing some more validation from that site by trying to squeeze just a little more performance out of the code that BlogMore generates.

BlogMore v2.25.0 has the following changes to allow tinkering in ways that might speed things up a touch, depending on the nature of the blog:

CSS bundling -- Every page generated by BlogMore pulls in at least these three CSS files: style.css, code.css and fontawesome.css (or their minified versions if minify_css is turned on). While this separation of concerns sits well with me and feels like the elegant way of doing things, it does mean 3 trips back to the server to get base styling for any given page[1].

So with this new version, if you set bundle_css to true, those three files are included and delivered as a single bundle.css (or bundle.min.css). This saves a couple of requests.
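As a rough illustration of what bundling amounts to (a minimal sketch, not BlogMore's actual code), the three stylesheets can simply be concatenated in a fixed order into one file. The helper name `bundle_css` and its parameters are hypothetical; only the file names come from the text above.

```python
from pathlib import Path


def bundle_css(css_dir: Path, names: list[str], bundle_name: str = "bundle.css") -> Path:
    """Concatenate the named stylesheets, in order, into a single file.

    Hypothetical helper for illustration; BlogMore's own implementation may differ.
    """
    parts = []
    for name in names:
        content = (css_dir / name).read_text(encoding="utf-8")
        # Label each section so the bundle stays readable when debugging.
        parts.append(f"/* {name} */\n{content}")
    bundle = css_dir / bundle_name
    bundle.write_text("\n".join(parts), encoding="utf-8")
    return bundle
```

Order matters here: later rules override earlier ones, so the bundle keeps the same order in which the separate files would have been linked.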

Theme helper inlining -- the lesser of the two main changes. There is some JavaScript that's part of each page that helps with theme switching and also provides the code to toggle the header display on mobile-sized screens. It's not a lot of code, but it is another file that has to be fetched. If inline_theme_js is set to true, this code will be included in the <head> of every single page generated for the site.

I suspect I'm going to leave this one off, but it's there if it's helpful to anyone (and also does let me experiment more with PageSpeed measurements).

Optimised logo -- one image that got left out of the work to optimise images was the site logo. While an optimised version of the image was created, no HTML was generated to make use of it. With this release, if optimise_images is true, <picture> will be used for this too.

With those shameless performance-measurement changes aside, there are a couple more changes in this release. The first is that the markup for the site title (that appears below the logo, if you have one) has been changed away from using a <h1> tag. The SEO gods frown on multiple <h1>s on a page and given the "main" title of any page is also a <h1>, this meant there were always 2 such tags. Now just the main title will be marked up this way; the site title becomes a <div> with appropriate styling to maintain the existing look.

Finally, this release fixes a small bug in the search index. It was being created with escaped HTML entities in any text that came out of fenced code blocks. From now on any text that goes into the search index is unescaped.
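To show the kind of fix involved (a sketch under the assumption that the index text is a plain string; `index_text` is a made-up name, not a BlogMore function), Python's standard `html.unescape` turns escaped entities back into the characters a reader would actually search for:

```python
import html


def index_text(text: str) -> str:
    """Return text for a search index with HTML entities decoded (illustrative only)."""
    return html.unescape(text)


escaped = "if a &lt; b &amp;&amp; c &gt; d:"
print(index_text(escaped))  # prints: if a < b && c > d:
```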

As always: if a blog-oriented static site generator that is all about Markdown sounds like your thing, check out the installation instructions and give it a go.


  1. Yes, of course the client-side cache makes this moot after the first page is loaded. All of this is about making that first load faster, and so appeasing the PageSpeed Insights gods. 

OldNews v1.4.2

1 min read

I've made a minor bump to OldNews, my terminal-based client for TheOldReader. There's no significant change in this release, but it does change the dependency on html-to-markdown.

Since I initially released the application, this library seems to have been through a couple of significant changes, and not every breaking change seems to have resulted in a major version bump. OldNews doesn't pin this dependency to a major version (I try not to, only ever setting lower-bounds for dependencies where possible), so it's fair that a change there can break things. I also think it's fair to hope that minor version changes won't cause trouble.

Recently, I've seen the library update with a minor version change and it's flat-out caused runtime errors, either because the API has changed, or because of an error being thrown by legitimate use of the API.

Most recently, such an error happened, and was fixed by the time I noticed it, but the release that was made never made it up to PyPI. This left OldNews stuck not working. Because of this I had to pin the library to an earlier version.

It's now been updated again and PyPI is correct, so I think it's safe to relax the pin.

Fingers crossed...

BlogMore v2.24.0

3 min read

Quite a few weeks ago now -- I think it was around the time I started work on blogmore.el and got the new MacBook Air -- I remember sitting in a cafe in Edinburgh and via Mastodon having a conversation with Andy about tweaking better results out of PageSpeed Insights. I seem to remember him correctly observing that one of the big hits on the performance score was the size of images, and also the format, and that some SSG engines would go to the trouble of converting to the likes of WebP and/or generating different sizes that are appropriate to different screens, that sort of thing.

I can't quite remember where we left it, but I think it was considered more work than was worth worrying about, and perhaps swapping all images on our blogs to WebP would solve most of the issues.

For a couple of different reasons, late last week, I decided it was time to play with the problem. For some reason I've been pretty cautious with this PR. I planned it out last Friday night, kicked off work on it on Saturday morning, and have then been tinkering and changing it and testing it and iterating over it all weekend. Something about the nature of the change made me want to go very slowly with this. I think it was an unease about messing with the images that would get served, the nature of the new tags that would get emitted, the fact that there would be even more HTML tinkering going on, the possible complexity of maintaining the cache... lots of things to consider and this is supposed to be a nice, simple, unfussy site generator.

Anyway, I've just released v2.24.0 with this feature added. It's off by default, and is turned on by setting optimise_images to true. Then, when you build your blog, each PNG, JPEG or WebP image will be converted into one or more WebP images stored below static/images/optimised. How many are made for each image will depend on how image_widths is set. The physical size of each image (and how the image looks) can be affected by image_quality.
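To make the relationship between image_widths and the generated files a little more concrete, here is a small sketch. It is not BlogMore's code: the "never upscale" rule, the file naming scheme and both function names are assumptions made purely for illustration; only the static/images/optimised location comes from the description above.

```python
def variant_widths(original_width: int, image_widths: list[int]) -> list[int]:
    """Pick the configured widths to generate for one image.

    Assumed rule: never create a variant wider than the original.
    """
    return sorted(w for w in set(image_widths) if w <= original_width)


def srcset(stem: str, widths: list[int], base: str = "/static/images/optimised") -> str:
    """Build a srcset value; the '<stem>-<width>.webp' naming is hypothetical."""
    return ", ".join(f"{base}/{stem}-{w}.webp {w}w" for w in widths)


widths = variant_widths(1000, [1200, 480, 800])
print(widths)  # prints: [480, 800]
print(srcset("cat", widths))
```

A browser given a srcset like this can pick the smallest variant that suits the screen, which is where the saving on mobile comes from.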

This does have two very obvious effects:

  1. It will result in your generated site being quite a bit bigger, if you have lots of images.
  2. It will result in the build time taking much longer.

The first issue is something I can't do anything about; it is what it is. The second issue, however, is something that can be dealt with. Given I've just made a release that speeds up build times, this would be a huge step backwards. So with this in mind, as the optimised images are created, a cache of them is also created in BlogMore's cache directory. This, again, does mean that more space is taken on your local storage to build your site, but it also means that repeated builds will remain fast.

If you run into problems or need space back, don't forget you can easily clear the cache.

So what's the result of all of this? Is it worth the effort? Well, to be sure, before I upgraded the version of BlogMore that I build this site with, I measured its performance.

[Image: Built with BlogMore v2.23.0]

After upgrading and rebuilding, here is how the same home page measures up.

[Image: Built with BlogMore v2.24.0]

I was genuinely surprised by the difference. The settings I used were:

optimise_images: true
image_quality: 95

and, of course, almost all the images on this site are now WebP anyway. I think I was expecting it to have a small impact, but even having those WebP images turned into stepped sizes seems to have a very measurable effect.

I'm going to be keeping a close eye on how this works for the next few days. As I say, I've tested this as much as possible and gone over the code as carefully as time has allowed. If this feature does break something I hadn't anticipated I can always just turn it off again anyway. Meanwhile though, the improvement on mobile does seem genuinely worth it.

Braindrop v1.1.0

1 min read

It's now well over a year since I released Braindrop and it's in constant use by me. I continue to find raindrop.io a really useful resource, and more often than not I use Braindrop to manage, edit, tag, and review what I save there, including deciding which bookmarks become public and which don't.

I've made a few small changes to the application in the past year and a bit, but not much. It's been stable and useful. But on the back of a recent change I made to OldNews, I felt I needed to make the same change here.

So with the release of v1.1.0 I've added three new commands to the application:

  • JumpToNavigation - Jump to the navigation panel; bound to 1 by default
  • JumpToRaindrops - Jump to the main raindrops list panel; bound to 2 by default
  • JumpToDetails - Jump to the details panel for the selected raindrop, if the panel is visible; bound to 3 by default

Now it's just a little easier and quicker to get around the UI.

If raindrop.io is your thing, and you want to work with your saved bookmarks in the terminal: Braindrop is licensed GPL-3.0 and available via GitHub and also via PyPI. It can also be installed using uv:

uv tool install braindrop

If you don't have uv installed you can use uvx.sh to perform the installation. For GNU/Linux or macOS or similar:

curl -LsSf uvx.sh/braindrop/install.sh | sh

or on Windows:

powershell -ExecutionPolicy ByPass -c "irm https://uvx.sh/braindrop/install.ps1 | iex"

If uv isn't your thing then it can also be installed with pipx:

pipx install braindrop

Once installed, run the braindrop command.

BlogMore v2.23.0

1 min read

I wasn't quite planning on making a new release of BlogMore so soon after the previous version, but I had a couple of ideas that I wanted to add, and then also got a nifty request too; so here we are: we have v2.23.0.

The first couple of changes relate to the cache. In the previous release I added a cache of the FontAwesome metadata, which in turn means that a cache directory is being created. I felt it would be fair and useful to provide a command that both tells the user where the cache lives and lets them remove it. So now BlogMore has a cache command with two sub-commands:

  • location: tells you where the cache directory is located
  • clear: removes the cache directory

Also, now that we have a cache directory, it makes sense to use it a bit more to squeeze even more time out of the build process. So starting with this release, per content directory, the various icons that are created for the site are cached. This means that if the source image doesn't change, for each subsequent build there's no conversion and resize for every variation. This saves a good fraction of a second, making the build of my blog feel noticeably quicker.
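The general shape of this kind of cache can be sketched as follows. This is an illustration only, not how BlogMore does it: keying on a SHA-256 hash of the source file, the flat cache layout and the name `cached_build` are all assumptions.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_build(source: Path, cache_dir: Path, build: Callable[[bytes], bytes]) -> bytes:
    """Run an expensive build step only when the source content has changed.

    Hypothetical helper: the cache key is a hash of the source bytes, so an
    unchanged source file always maps to the same cached result.
    """
    data = source.read_bytes()
    cached = cache_dir / hashlib.sha256(data).hexdigest()
    if cached.exists():
        # Cache hit: skip the conversion/resize work entirely.
        return cached.read_bytes()
    result = build(data)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(result)
    return result
```

Hashing the content rather than trusting file timestamps means a touched but unchanged file still hits the cache, at the cost of reading the source on every build.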

Finally, earlier today, Andy asked if it would be possible to have the BlogMore serve mode auto-reload any page being viewed in a browser, when the site is regenerated. It was something I'd considered myself a couple of times so that was a good reason to finally look into it. Not knowing how this could be achieved[1], I prompted Gemini for an idea, stressing I wanted a solution that didn't disturb a generated site; it came up with a convincing solution. I let it run at it and, along with a few changes of my own, it seems to be working a treat.

This, of course, now makes me want to squeeze even more time out of the build process.


  1. Web development has never been my primary area of knowledge. 

BlogMore v2.22.0

2 min read

As mentioned a couple of days ago, I've been toying with finding areas of improvement with respect to the performance of BlogMore. Until now, for good reasons, I've not really paid any attention to how fast (or slow) BlogMore is when it comes to generating my blog. While it's never been blindingly fast, it's always been fast enough and I was more keen on making it work right. So for a good while the focus has been on well-formed output, stuff that keeps the crawlers happy, that sort of thing.

But now that I'm in a place where new features aren't really so necessary, it does feel like a good point to find any easy wins in speeding up the code. I think it's gone well.

BlogMore v2.22.0 contains quite a few internal changes that speed up some core parts of site generation. Many of the things identified by Gemini, back when I first kicked this process off, have been done. The amount of Markdown->HTML conversion work has been vastly reduced, which has had a pretty big impact on all sorts of things. There's also caching of the FontAwesome metadata[1], which should save a fair bit of time on slower connections. I did avoid the whole business of parallel processing as I dabbled with this near the start of the project and I could not wrangle a win out of that at all; given how much of a win I've had with these changes, I doubt that would change (it could conceivably make things worse).

So, how much faster is it? Roughly, based on my tests, a site generates in about 1/4 of the time it did before. On my M2 Mac Mini my blog builds in under 3 seconds; with v2.21.0 it took around 13 seconds. In my case that's with all the optional features of BlogMore turned on.

Naturally this work has touched on a lot of internals of the code, and made significant changes to the generation pipelines of lots of different pages and features. I've done my absolute best to compare[2] the output of v2.21.0 and v2.22.0 and I can't see any significant differences[3]. When trying out v2.22.0 I would suggest paying just a little extra attention to the result, to be sure you're happy that nothing has changed.


  1. It lives in ~/.cache/blogmore on Unix and Unix-like systems, or %LOCALAPPDATA%\blogmore\cache on DOS/VMS-influenced systems. 

  2. Lots of diff -rq and then diffing an assorted sample of files that showed differences to inspect what was actually different. 

  3. Actually, there's a small difference in the context shown in backlinks, but this was a deliberate change and a very small cosmetic enhancement. 

Speeding up BlogMore

3 min read

As mentioned recently, Andy did a nifty bit of testing of BlogMore to measure the performance hit of each of the optional generation features. Performance is something I haven't really spent much time thinking about; I cared more about how the result looked than I did about how quickly a site got generated.

I seem to remember that, early on, I did have a bit of a play with trying to get Copilot to tackle the idea of parallel generation of parts of the site, but I didn't seem to get any sort of win out of it so I closed the PR and gave up on the idea.

Now though, especially having added the elapsed time of each step to the progress report during generation, I am curious about what wins there are. One that came to mind right away, and one I have been considering dealing with for a wee while, is the fact that every generation downloads the FontAwesome metadata. Caching that would be a simple win, for example.

But, if I'm going to do this, it makes sense that I stick to the spirit of this project and get one or more agents to tackle the problem. So I asked Gemini CLI to look over the codebase and give me some big easy wins. I made clear I didn't want it to do any work yet, I just wanted it to report back what it could find.

The result made a lot of sense. Here's what it said:


Performance Optimization Recommendations for BlogMore

I have conducted a comprehensive review of the blogmore codebase and identified several areas where measurable and significant speedups can be achieved, particularly for larger sites.

1. Backlink Snippet Extraction

The most significant bottleneck is in src/blogmore/backlinks.py. The build_backlink_map function iterates through every internal link in every post. For each link, it calls _extract_snippet, which re-parses the entire Markdown content of the source post to generate a plain-text snippet.

  • The Issue: If a post has 20 internal links, it is parsed 20 times just for backlinks.
  • The Win: Modify the logic to insert unique markers for all links in a post and perform a single Markdown-to-plain-text conversion per post.

2. Redundant Markdown Parsing for Metadata

The Post object in src/blogmore/parser.py has properties like description and reading_time that are accessed frequently (on the post page, index pages, RSS feeds, search index, etc.).

  • The Issue: Post.description calls extract_first_paragraph, and Post.reading_time calls calculate_reading_time. Both of these functions perform a full Markdown conversion every time they are called because they are not cached.
  • The Win: Use functools.cached_property on these fields. Additionally, these should be calculated by extracting text from the html_content that has already been generated during the initial parse, rather than re-parsing the raw Markdown.
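The caching half of that suggestion can be illustrated with a toy class. This is a minimal sketch and not the real Post from src/blogmore/parser.py: the class body, the `word_count` property and the stand-in conversion are all assumptions; only `functools.cached_property` itself is taken from the recommendation.

```python
from functools import cached_property


class Post:
    """Toy stand-in for a post object; not BlogMore's real class."""

    def __init__(self, markdown: str) -> None:
        self.markdown = markdown
        self.conversions = 0  # counts how often the expensive step runs

    def _to_plain_text(self) -> str:
        # Stand-in for an expensive Markdown conversion (assumption).
        self.conversions += 1
        return self.markdown.replace("*", "")

    @cached_property
    def word_count(self) -> int:
        # Computed on first access, then stored on the instance.
        return len(self._to_plain_text().split())


post = Post("one *two* three")
print(post.word_count, post.word_count, post.conversions)  # prints: 3 3 1
```

However many templates read the property, the conversion runs once per post.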

3. Sequential Processing

The generator currently processes everything sequentially in a single thread.

  • The Issue: For a site with hundreds or thousands of posts, the initial Markdown parsing and the subsequent Jinja2 rendering are strictly serial, leaving multi-core CPUs mostly idle.
  • The Win: Use multiprocessing (e.g., ProcessPoolExecutor) to parallelize the parsing of Markdown files and the rendering of HTML pages. Since these tasks are CPU-bound and independent, this would scale almost linearly with the number of CPU cores.

4. Asset Management Inefficiencies

The AssetManager in src/blogmore/generator/assets.py performs some operations that could be optimized:

  • FontAwesome Metadata: It downloads icon metadata from GitHub on every build to optimize CSS. This can be cached locally to avoid network latency.
  • Icon Generation: It regenerates favicons and Apple touch icons on every build. This should be skipped if the source icon hasn't changed.

5. Minor Algorithmic Improvements

  • Navigation Lookups: In PageGenerator.generate_post_page, the code uses all_posts.index(post) to find the current post's position for prev/next links. This is an O(N) operation called N times, leading to O(N^2) behavior. Pre-calculating a mapping would make this O(1).
  • Markdown Instance Reuse: Several utility functions (markdown_to_plain_text, extract_first_paragraph) create a fresh markdown.Markdown instance on every call. Reusing a single instance (and calling .reset()) or caching the instance creation would reduce overhead.
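The navigation lookup point can be sketched like this. It is an illustration only: posts are represented as plain strings and `neighbours` is a hypothetical name, not a function from the codebase.

```python
from typing import Dict, List, Optional, Tuple


def neighbours(posts: List[str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Precompute (previous, next) for every post in one O(N) pass.

    Each later lookup is then a dictionary access instead of a list.index()
    scan, which is what turns O(N^2) overall behaviour into O(N).
    """
    result: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for i, post in enumerate(posts):
        previous = posts[i - 1] if i > 0 else None
        following = posts[i + 1] if i + 1 < len(posts) else None
        result[post] = (previous, following)
    return result


nav = neighbours(["first", "second", "third"])
print(nav["second"])  # prints: ('first', 'third')
```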

Summary

The "Big Wins" are definitely optimizing backlink snippet extraction and implementing parallel parsing/rendering. Together, these would likely reduce generation time for large sites by 60-80% on modern hardware.


At first glance they all seem pretty sensible, and I like that it seems to have found some really obvious big improvements, and also the little wins that might have a cumulative effect. I strongly suspect the parallel generation suggestion is again going to be a non-win (and so for now I'm going to pass on it), but most of the other suggestions seem doable and worth trying out.

I think what I will do, rather than work through them all in one PR, is tackle one at a time. So, over the next few days, expect to see some PRs turn up and get merged (hopefully), as I try and tease some speed wins out of the code.

BlogMore v2.21.0

2 min read

After noticing a broken link in a post yesterday, I got to thinking that it would be useful to add a linter to BlogMore. So I've released v2.21.0 which adds linting support.

A number of things are checked and the results are broken down into things that are errors or warnings. Errors result from any of these checks:

  • Ensures all posts and pages have valid YAML frontmatter. If a file cannot be parsed, it is reported as an error.
  • Scans the generated HTML for links to non-existent internal paths (other posts, pages, categories, tags, archives, site features like search, or files in extras/).
  • Checks that all <img> sources resolve to valid internal paths or files in the extras/ directory.
  • Checks that the cover property in a post or page frontmatter points to a valid resource.
  • Verifies that all page slugs listed in sidebar_pages actually exist.
  • Checks that all internal-looking URLs in the links: and socials: configuration settings point to valid targets.

On the other hand, the following just result in a warning:

  • Flags if a post is missing a title, category, tags, or a date.
  • Reports if a post's date or modified date is set in the future.
  • Notes if a post's modified date is earlier than its original publication date.
  • Identifies if two or more posts share the exact same title.
  • Flags inline images missing an alt attribute, or those with an empty/whitespace-only alt attribute.
  • If clean_urls is enabled, warns if internal links point explicitly to index.html.
  • Reports internal links using the full site_url (e.g., https://example.com/path/) instead of a root-relative path (/path/).
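The last of those warnings is easy to picture with a small sketch. This is not the linter's actual code: `root_relative` is a hypothetical helper, and comparing scheme and host exactly is an assumption about what counts as "the full site_url".

```python
from typing import Optional
from urllib.parse import urlsplit


def root_relative(link: str, site_url: str) -> Optional[str]:
    """Return the root-relative form of an absolute internal link.

    Returns None when the link is external or already relative, meaning
    there is nothing to warn about.
    """
    site = urlsplit(site_url)
    target = urlsplit(link)
    if not target.netloc or (target.scheme, target.netloc) != (site.scheme, site.netloc):
        return None
    path = target.path or "/"
    if target.query:
        path += "?" + target.query
    if target.fragment:
        path += "#" + target.fragment
    return path


print(root_relative("https://example.com/path/", "https://example.com"))  # prints: /path/
```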

I feel like all of these cover most of the things that are low-cost to detect but have a positive impact on the state of the content of a blog.

One thing I've not done is any sort of checking of external links. This would be costly and could possibly have unintended consequences that I don't want to be messing with (perhaps a tool to export the list of external links for checking could be useful, at some point).

Having run this against this blog, I did find some things that needed cleaning up, mostly absolute links that could be turned into root-relative links (always good for making the content portable).

I'm going to make this a standard part of my "I'm ready to publish" check for this blog, and it should also be helpful as I carry on migrating the images in the blog over to WebP.

More syncing GitHub to GitLab and Codeberg

1 min read

Following on from my first post about this, I've tweaked the script I'm using to back up a repo to GitLab and Codeberg:

#!/bin/sh

# Check if the current directory is a Git repository
if ! git rev-parse --is-inside-work-tree > /dev/null 2>&1; then
    echo "Error: This directory is not a Git repository."
    exit 1
fi

REPO_NAME="$1"

# If no repository name was provided, try to get it from the origin remote
if [ -z "$REPO_NAME" ]; then
    ORIGIN_URL=$(git remote get-url origin 2>/dev/null)
    if [ -n "$ORIGIN_URL" ]; then
        REPO_NAME=$(basename -s .git "$ORIGIN_URL")
    else
        echo "Error: No repository name provided and no 'origin' remote found."
        echo "Usage: $0 <repo-name>"
        exit 1
    fi
fi

echo "Configuring multi-forge backup sync for: $REPO_NAME"

# Set up the remote called backups. Anchor it to Codeberg.
git remote remove backups > /dev/null 2>&1
git remote add backups "ssh://git@codeberg.org/davep/${REPO_NAME}.git"

# Set up the push URLs.
git remote set-url --push backups "ssh://git@codeberg.org/davep/${REPO_NAME}.git"
git remote set-url --add --push backups "git@gitlab.com:davep/${REPO_NAME}.git"

# Only ever backup main.
git config remote.backups.push refs/heads/main:refs/heads/main

# Also backup all tags.
git config --add remote.backups.push 'refs/tags/*:refs/tags/*'

echo "----------------------------------------------------"
echo "Backups configured:"
git remote -v
echo "----------------------------------------------------"
echo "To perform the initial sync, run: git push backups"

### setup-forge-sync ends here

The changes from last time include:

  • The repo name now defaults to whatever is used for GitHub, so I don't have to copy/paste it or type it out.
  • It now backs up all the tags too.

I've been running with this for a couple of days now and it's proving really useful. Well, when Codeberg is available to push anything to...

BlogMore v2.20.0

3 min read

I've just released BlogMore v2.20.0. There are five main changes in this release, and a lot of changes under the hood.

First, the under-the-hood stuff: while this isn't going to make a difference to anyone using BlogMore (at least it shouldn't make a difference -- if it does that's a bug that deserves reporting), the main site generation code has had a lot of work done on it to break it up. The motivation for this is to make the code easier to maintain, and to try and steer it in a direction closer to how I'd have laid things out had I written it by hand. The outcome of this is that, where the generator was over 2,000 lines of code in a single file, it's now a lot more modular and easier to follow.

Some other internals have been cleaned up too. Generally I've had a period of reviewing some of the code and reducing obvious duplication of effort, that sort of thing.

Now for the visible changes and enhancements in this release:

Improved word counts

Until now the word counting (and so the reading time calculation) was done by stripping most of the Markdown and HTML markup from the Markdown source. I wasn't too keen on this approach given that the codebase had a method of turning Markdown into plain text. So in this release the regex-based cleanup code is gone and word counts (and so reading times) use the same Markdown to plain text pipeline as anything else that needs to work on plain text.

Fixed a word count and reading time disparity

It was possible, in the stats page, to have one post appear to have the lowest or highest word count, but to not have the lowest or highest reading time. This was because reading times are always calculated to the minute and so there could be a disparity due to this rounding. The calculation of those stats now takes this into account.
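A quick worked example shows how the disparity can arise. The numbers here are assumptions for illustration: a reading speed of 200 words per minute and rounding up to the next whole minute are not taken from BlogMore, only the fact that reading times are whole minutes is.

```python
import math

WORDS_PER_MINUTE = 200  # assumed reading speed, for illustration only


def reading_time(words: int) -> int:
    """Reading time in whole minutes, rounded up (assumed rounding rule)."""
    return max(1, math.ceil(words / WORDS_PER_MINUTE))


# Two posts with clearly different word counts...
print(reading_time(410), reading_time(590))  # prints: 3 3
```

Both posts land on the same reading time, so picking "the longest read" by minutes alone can name a different post than picking by word count; any stat that ranks on the rounded value needs to break such ties consistently.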

Added an optional title to the socials

The socials setting in the configuration file has had an optional title property added for each entry. Until now the tooltip for an entry would be whatever the site was set to. Generally this works but if you have two or more accounts on the same site, or if you want to use a site value for something different, there was no way of making the tooltip more descriptive.

As an example, currently it's not possible to support Codeberg as a site. On the other hand git is available so it can be used as a substitute icon. The problem is, with this:

- site: git
  url: https://codeberg.org/davep

the tooltip will just say "git". With this update you can do this:

- site: git
  title: Codeberg
  url: https://codeberg.org/davep

and the tooltip will say "Codeberg".

As mentioned: this is optional. If there is no title the previous behaviour still applies.

Wall-clock time measurement

Yesterday, Andy posted about BlogMore's performance with respect to the different optional features. It's something I haven't really considered yet (possibly in part because this blog isn't anywhere near as big as his), but could be a good source of tinkering in the near future. His work to test the different parts of the tool did get me thinking though: it would be neat to know how long each part of the generation process takes.

So now, when a site is generated (either when using build or serve), the time of each step is printed, as is the overall generation time.

Markdown in HTML support

Yesterday I noticed that, on one of my posts, what had been written as a simple caption for an image wasn't rendering as it used to. The actual content of the Markdown source for the post contained this:

<center>
*(Yes, the tin was once mine and was once full; the early 90s were a
different time)*
</center>

While the text was centred, the raw Markdown was left in place (it should have been italic text). The reason for this is that BlogMore had never enabled Markdown-in-HTML support. So, as of this release, if the enclosing tag has markdown="1", any Markdown inside the tags will be parsed. This means the above becomes this:

<center markdown="1">
*(Yes, the tin was once mine and was once full; the early 90s were a
different time)*
</center>

I did think about doing something to turn it on by default (the fact that I didn't have such a "switch" in the post before suggests that Pelican did just always do this), but really I feel this approach is more flexible and less likely to result in unintended consequences.