Posts tagged with "Coding"

BlogMore v2.20.0

3 min read

I've just released BlogMore v2.20.0. There are five main changes in this release, and a lot of changes under the hood.

First, the under-the-hood stuff: while this isn't going to make a difference to anyone using BlogMore (at least it shouldn't make a difference -- if it does that's a bug that deserves reporting), the main site generation code has had a lot of work done on it to break it up. The motivation for this is to make the code easier to maintain, and to try and steer it in a direction closer to how I'd have laid things out had I written it by hand. The outcome of this is that, where the generator was over 2,000 lines of code in a single file, it's now a lot more modular and easier to follow.

Some other internals have been cleaned up too. Generally I've had a period of reviewing some of the code and reducing obvious duplication of effort, that sort of thing.

Now for the visible changes and enhancements in this release:

Improved word counts

Until now the word counting (and so the reading time calculations) were done by stripping most of the Markdown and HTML markup from the Markdown source. I wasn't too keen on this approach given that the codebase had a method of turning Markdown into plain text. So in this release the regex-based cleanup code is gone and word counts (and so reading times) use the same Markdown to plain text pipeline as anything else that needs to work on plain text.

Fixed a word count and reading time disparity

It was possible, in the stats page, to have one post appear to have the lowest or highest word count, but to not have the lowest or highest reading time. This was because reading times are always calculated to the minute and so there could be a disparity due to this rounding. The calculation of those stats now takes this into account.

Added an optional title to the socials

The socials setting in the configuration file has had an optional title property added for each entry. Until now the tooltip for an entry would be whatever the site was set to. Generally this works but if you have two or more accounts on the same site, or if you want to use a site value for something different, there was no way of making the tooltip more descriptive.

As an example, currently it's not possible to support Codeberg as a site. On the other hand git is available so it can be used as a substitute icon. The problem is, with this:

- site: git
  url: https://codeberg.org/davep

the tooltip will just say "git". With this update you can do this:

- site: git
  title: Codeberg
  url: https://codeberg.org/davep

and the tooltip will say "Codeberg".

As mentioned: this is optional. If there is no title the previous behaviour still applies.

Wall-clock time measurement

Yesterday, Andy posted about BlogMore's performance with respect to the different optional features. It's something I haven't really considered yet (possibly in part because this blog isn't anywhere near as big as his), but could be a good source of tinkering in the near future. His work to test the different parts of the tool did get me thinking though: it would be neat to know how long each part of the generation process takes.

So now, when a site is generated (either when using build or serve), the time of each step is printed, as is the overall generation time.

Markdown in HTML support

Yesterday I noticed that, on one of my posts, what had been written as a simple caption for an image wasn't rendering as it used to. The actual content of the Markdown source for the post contained this:

<center>
*(Yes, the tin was once mine and was once full; the early 90s were a
different time)*
</center>

While the text was centred, the raw Markdown was left in place (it should have been italic text). The reason for this is that BlogMore had never enabled Markdown-in-HTML support. So, as of this release, if the enclosing tag has markdown="1", any Markdown inside the tags will be parsed. This means the above becomes this:

<center markdown="1">
*(Yes, the tin was once mine and was once full; the early 90s were a
different time)*
</center>

I did think about doing something to turn it on by default (the fact that I didn't have such a "switch" in the post before suggests that Pelican did just always do this), but really I feel this approach is more flexible and less likely to result in unintended consequences.

blogmore.el v4.5.0

1 min read

Carrying on with the theme of being lazy while editing posts, I've released blogmore.el v4.5.0. This version adds blogmore-set-as-cover. With this, if you place point on a line that is an image and run the command, it is set as the cover for the post.

Sure, it's not like it's hard to copy, move, insert a new line, type cover: and then paste the text, but this is faster and more accurate.

And I'm lazy.

And I like hacking on Emacs Lisp that makes my workflow flow faster.

More Codeberg issues

1 min read

As I've said earlier, I'm not looking to move off GitHub any time soon, but I am curious about evaluating the options. So far, while trying out Codeberg, I am finding it to be very unstable.

A little earlier I was simply browsing some of the repositories I've been adding and got very slow load times, and then a 500.

Codeberg 500 error

As I've said elsewhere: I really wouldn't expect perfection. I doubt that Codeberg has the money behind it that GitHub does. But, again, there is that issue with moving off GitHub because of instability; from that point of view it would feel like swapping some occasional instability for what, at the moment, is feeling like regular instability to the point that I wouldn't get too much done.

Syncing GitHub to GitLab and Codeberg

2 min read

I've had a GitLab account since December 2017. This came about because of the new job I started in January 2018. They used a self-hosted internal instance of GitLab for all their code, so it made sense I get familiar with it (it wasn't hard; especially back in 2017 it was near enough a clone of GitHub in terms of what it did). Since then, though, I've never really done anything with it. I think I had a repo or two on there for a short while, but I must have nuked them at some point because the account has been empty for the longest time.

A Codeberg account, on the other hand, only got created the other day. Having created this, I got to thinking about how I might use it. In doing so I thought back to my GitLab account and then also got to thinking about where all my public code lives, and how "safe" it is.

Now, sure, the whole point of Git is that it's distributed. Forges are a useful thing to have and work with, but they shouldn't be the place where your code lives. On the other hand, I've had so many machines, and so many work environments, that it has become the case that my GitHub account has become the storage location for my code and projects.

Mostly this is fine. If GitHub were to disappear tomorrow I imagine we'd all have bigger things to be worrying about anyway. But the principle stands: why not distribute the load? Why not distribute the effort when it comes to sharing code I write?

So yesterday I finally decided on a plan: for the moment at least, I'm going to keep using GitHub as my "primary" location for working on stuff. It's where I'll have WiP branches, it's where I'll keep issues, it's where I'll encourage people to raise requests and stuff, it's where I'll host this blog. But I'm going to start syncing projects to both GitLab and Codeberg. I see this as having two benefits: anyone who doesn't want to interact with GitHub can now easily fork code, and if they wish they can raise issues and the like too. Meanwhile, in doing this, I'll also have the added benefit of my code being "backed up" in at least three different locations1.

The approach I've settled on, for the moment, is based around this little shell script:

#!/bin/sh

# Check if a repository name was provided
if [ -z "$1" ]; then
    echo "Error: No repository name provided."
    echo "Usage: $0 <repo-name>"
    exit 1
fi

REPO_NAME="$1"

# Check if the current directory is a Git repository
if ! git rev-parse --is-inside-work-tree > /dev/null 2>&1; then
    echo "Error: This directory is not a Git repository."
    exit 1
fi

echo "Configuring multi-forge backup sync for: $REPO_NAME"

# Set up the remote called backups. Anchor it to Codeberg.
git remote remove backups > /dev/null 2>&1
git remote add backups "ssh://git@codeberg.org/davep/${REPO_NAME}.git"

# Set up the push URLs.
git remote set-url --push backups "ssh://git@codeberg.org/davep/${REPO_NAME}.git"
git remote set-url --add --push backups "git@gitlab.com:davep/${REPO_NAME}.git"

# Only ever backup main.
git config remote.backups.push refs/heads/main:refs/heads/main

echo "----------------------------------------------------"
echo "Backups configured:"
git remote -v
echo "----------------------------------------------------"
echo "To perform the initial sync, run: git push backups main"

### setup-forge-sync ends here

I'm going to keep all repo names the same2. So when I use this script it'll set things up so I can git push backups and main will then get pushed up to both GitLab and Codeberg. I don't feel the need to be keeping any WiP branches in sync or kicking about, likewise any gh-pages branches.

While I'm sure I could have done something a little more automated, this feels like a neat and simple approach, and also allows me to curate what appears in the two other places over time (I suppose, eventually, I'll mirror everything that isn't a dead experimental repo, but meanwhile I'll prioritise projects that are either still very useful or which I'm actively developing and maintaining).


  1. Yes, I have other backups too, but they're always current-working-machine type backups. 

  2. Except, perhaps, for any repo whose name starts with .; I seem to recall that GitLab can't handle that, for some bizarre reason. Perhaps that's fixed now? 

blogmore.el v4.4.0

1 min read

I've released an update to blogmore.el, my Emacs package that helps me out when writing this blog. I've added two commands to this version which help me be lazier than ever.

The first is blogmore-become-like. When run, this prompts for another post and, once selected, it sets this post's category and tags to be the same as the other one. I added this because I'm often writing an occasional series of posts that are all about the same project, and so I always find myself copying and pasting those frontmatter properties from another post.

The second command I've added is blogmore-toggle-image-centre. Built into BlogMore is a little bit of styling that will ensure an image is placed in the centre of the page, if the URL for the image has #centre on the end. This means that, for most images I add, I have to go and edit the URL to add that. Now I can just run a single command when the cursor is on an image and it'll add (or remove, if it's already there) that styling hint.

In both cases, I've added the commands to the transient menu too.

Trying out Codeberg

2 min read

Following on from what I wrote about GitHub recently, I thought I'd check out Codeberg. While I'm aware of it (it is supported by Hike after all), I've never actually used it and have never had an account there.

I was delighted to find my preferred user name was actually available, which made me more inclined to create an account. The account creation process itself seemed a bit of a faff. I can't remember the details now, but it took a couple of goes and necessitated a password reset and recovery to actually get in (I recall there was some sort of error when I tried to use the activation link sent in the email, which took quite a while to turn up, but the rest is hazy, and I wasn't taking notes). It might be that it didn't help that I was signing up in a mobile browser.

That aside, though, I got up and going in the end.

At this point I'm not quite sure what I want to do with the account. While I'm nowhere near being willing or able to move my GitHub activity over there, it at least feels like it might be a viable backup location via which I can make code available. Also, and probably more importantly, it makes it easier for me to follow projects that use it as their primary (or only) forge.

This morning I found myself following a link to a project hosted there. It looked interesting, it looked like something I'd want to follow. It didn't go so well. I hit the star button and... busy circle.

Slow star

This lasted for a while with nothing happening. So I refreshed the page, no star on the repo, so I tried again. Same result. I tried a third time and got something different right away.

Codeberg error

At some point I ended up with a big fat 500 page.

Do I see this as an issue? Of course not. Could be I caught it on a bad day. Stuff happens. I've seen a 500 from things I'm responsible for before now. I'm not going to expect perfection. But it does remind me that leaping from one forge to another, to chase platform stability, and in doing so give up a long history, is a risky endeavour without a guaranteed payoff.

Me vs Claude (redux)

1 min read

It's a small thing, but here's round 2 of me vs Claude. This time I'm directing the agent to clean up the code that does word counts, getting it to use the Markdown to plain text code that exists in BlogMore, rather than the regex-based Markdown-stripper it was using. The approach it landed on made sense to me, adding another text extractor class, but one that ignores fenced codeblocks1. So, in addition to this code (I've removed all docstrings and comments for the sake of including here):

class _AllTextExtractor(HTMLParser):

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self._chunks.append(data)

    @property
    def text(self) -> str:
        return re.sub(r"\s+", " ", "".join(self._chunks)).strip()

it also added this:

class _TextWithoutCodeExtractor(HTMLParser):

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []
        self._pre_depth: int = 0

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "pre":
            self._pre_depth += 1

    def handle_endtag(self, tag: str) -> None:
        if tag == "pre" and self._pre_depth > 0:
            self._pre_depth -= 1

    def handle_data(self, data: str) -> None:
        if self._pre_depth == 0:
            self._chunks.append(data)

    @property
    def text(self) -> str:
        return re.sub(r"\s+", " ", "".join(self._chunks)).strip()

The function that converts Markdown to plain text then decides which extractor to use, based on if the caller asked for codeblocks to be included or not.

All pretty reasonable.

Only... that text property on both those classes is identical. The __init__ method is the same save for one extra line. Even handle_data is more or less the same except for that guarding if.

I can't. I can't let that stand. It's almost copy/paste. For me, this is the ideal time to use just a little bit of inheritance. Here's my take (with classes renamed too, the leading _ didn't feel necessary for one thing):

class TextExtractor(HTMLParser):

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self._chunks.append(data)

    @property
    def text(self) -> str:
        return re.sub(r"\s+", " ", "".join(self._chunks)).strip()


class TextSansCodeExtractor(TextExtractor):

    def __init__(self) -> None:
        super().__init__()
        self._pre_depth = 0

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "pre":
            self._pre_depth += 1

    def handle_endtag(self, tag: str) -> None:
        if tag == "pre" and self._pre_depth > 0:
            self._pre_depth -= 1

    def handle_data(self, data: str) -> None:
        if self._pre_depth == 0:
            super().handle_data(data)

Much better!

I was tempted to prompt Copilot/Claude about this and see what clean-up it would do, if it would arrive at similar code. But really it didn't seem like a good use of a premium request (perhaps I should have given Gemini a shot).

I see this kind of thing in the code quite a bit, and it speaks to what I've said before about what I'm seeing: the code it writes is... fine. It's okay. It does the job. The code runs. It's just not... to my taste, I guess.


  1. This is important for working out word counts and so read times. It doesn't make sense that embedded code counts towards those. 

And then there were three

2 min read

Given the concerns I wrote about yesterday, in regard to the core generation code in BlogMore, I've been thinking some more about how I would probably have the code look. First thing this morning, over breakfast and coffee, I concluded that I'd probably have gone with something that was a single orchestration function/method, into which would be composed some modular support code. Back when I started the process of breaking up the generator I seem to recall that Gemini sort of went along those lines, but the code it created seemed pretty messy and the main site generation class was still a lot bigger than I would have liked. This is why, at the time, I went with Copilot/Claude's mixin-based approach; it felt a bit more hacky but the code felt tidier.

With this all in mind, I popped to my desk, made a branch off the current Gemini attempt to clean up the typing issues with the mixin approach, fired up Gemini CLI, and wrote it a prompt explaining what I didn't like and what I wanted it to do. The key points being:

  • I wanted a similar separation of concerns as the mixin approach was aiming for.
  • I wanted to move away from mixins.
  • I wanted to favour something closer to composition.
  • I wanted to favour simple functions over classes where possible.

I then set it off working and left it to get on with things. Overall I think it took around an hour, with the need for me to approve things now and again (so probably could have been faster, I wasn't there to answer right away every time), but it got there in the end. This has resulted in a third PR to clean up the generator typing issues. In doing so I feel I've also addressed most of the unease I was feeling yesterday evening, and might actually have got closer to where I'd rather the code was.

Glancing over the result, I can still see things I'd want cleaned up, and done in a slightly different way, but overall I have a better feeling about this third approach. I sense this is a better place to move on from.

Three PRs

So that's three PRs I have lined up to address the code smell that's been bugging me for a couple of days. One fixes it with an ABC; one fixes it with a protocol; and now one fixes it by reworking the submodularisation of the generator to use a different approach entirely. On the one hand, this seems like a lot of work and a lot of faff (and, as I said yesterday, I wouldn't start here to get where I want to be), but on the other hand I do kind of understand the appeal of being able to get hours of work done in a relatively short period of time, so you can experiment with the results.

Would I recommend someone work this way? No, of course not. Does it make for an interesting side-quest when I'm in "it is still my hobby too" mode? Yeah, it does.