Recent Posts

Recreating my blog stats

4 min read; 13 GFI

Introduction

Having recently added the dump command to BlogMore I've been thinking I should try and learn a little more about jq. It's one of those tools that's been on my radar for ages, which I've used on very rare occasions to get something done quickly, but which I've never really used in anger.

So I thought it might be fun to see about recreating some of the stats from the stats page using jq alone. Well, I say "alone", I mean "from the JSON data that is produced by the BlogMore dump command", and of course that makes it easier given it dumps some of the key calculated values. In other words I won't be using jq to calculate the word count, or reading time, or GFI, etc.

Post count

To start with, working out the number of posts in my blog is simple enough:

jq '. | length'
371

Category count

Getting the list of categories would be:

jq '[.[] .safe_category] | unique'
[
  "ai",
  "coding",
  "creative",
  "emacs",
  "gaming",
  "life",
  "meta",
  "music",
  "python",
  "tech",
  "til"
]

and so getting the count of them is simple enough:

jq '[.[] .safe_category] | unique | length'
11

Tags count

Getting the count of tags takes a little more work, as safe_tags is a list too, so I start out with a list of lists, which I need to flatten first.

jq '[.[] .safe_tags] | flatten | unique | length'
224

This, right away, is an interesting finding. In my stats page, as of the time of writing, the number of tags is reported as 243, but here I'm getting 224. Given I'm using the safe_tags property, which ensures all similar tags end up with the same value (so Hello World, hello world, and all variations, become hello-world), that would suggest the stats page isn't taking that into account. That's an issue to address.
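
To make the difference between the two counts concrete, here's a minimal Python sketch of the same comparison. The `safe_tag` function is a hypothetical stand-in for whatever normalisation BlogMore really applies, and the sample posts are made up; the point is only that counting raw tags and counting normalised tags can give different answers.

```python
import re


def safe_tag(tag: str) -> str:
    """Hypothetical normaliser: lower-case, runs of other characters become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", tag.lower()).strip("-")


# Made-up sample posts; the real dump has many more properties.
posts = [
    {"tags": ["Hello World", "Python"]},
    {"tags": ["hello world", "python", "Emacs"]},
]

# Count distinct tags as written, then count them again after normalising.
raw_tags = {tag for post in posts for tag in post["tags"]}
safe_tags = {safe_tag(tag) for tag in raw_tags}

print(len(raw_tags), len(safe_tags))
```

With this sample the raw count is higher than the normalised count, which is the same shape of mismatch as 243 versus 224.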

A date/time interlude

Here's where things get a little interesting for a moment. In the output of the dump command from BlogMore, the dates of the posts are given in ISO 8601 format; specifically the date and time with offset format. From what I can tell, while jq does have some date/time parsing support, it can't handle that format specifically.

This means that if I try:

jq '.[0] .date | fromdate'

I just get:

jq: error (at <stdin>:27293): date "2015-06-18T14:53:00+01:00" does not match format "%Y-%m-%dT%H:%M:%SZ"

After some searching around it seems the only approach I can really take is to drop the timezone offset and pretend every time is a Z time:

jq '.[0] .date[:19] + "Z" | fromdate'
1434639180

From here I can then get a fully-parsed list of date/time values using gmtime:

jq '.[0] .date[:19] + "Z" | fromdate | gmtime'
[
  2015,
  5,
  18,
  14,
  53,
  0,
  4,
  168
]

This isn't ideal for what I'd like to do, as it's going to skew some of the time-related values, but it's close enough for experimenting.
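
For a sense of how much the "pretend it's Z" trick skews things, here's a small Python sketch that parses the same timestamp both ways. It only uses the standard library's `datetime.fromisoformat`, which does understand the offset form; the variable names are just for illustration.

```python
from datetime import datetime, timezone

stamp = "2015-06-18T14:53:00+01:00"

# Offset-aware parse: the +01:00 is honoured.
aware = datetime.fromisoformat(stamp)
true_epoch = int(aware.timestamp())

# What the jq workaround does: drop the offset and treat the time as UTC.
pretend_utc = datetime.fromisoformat(stamp[:19]).replace(tzinfo=timezone.utc)
pretend_epoch = int(pretend_utc.timestamp())

print(true_epoch, pretend_epoch, pretend_epoch - true_epoch)
```

The difference is exactly the size of the dropped offset (an hour for this post), so epoch values are off by that much, while the year, month, day and hour fields still reflect the local time the post was written.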

Posts per year

Now that I have a way of breaking the posting time into a workable array of values, getting the number of posts per year becomes:

jq -r '[.[] .date[:19] + "Z" | fromdate | gmtime[0]] | group_by(.) | .[] | "\(.[0]): \(length)"'
2015: 32
2016: 26
2017: 7
2018: 1
2019: 15
2020: 23
2022: 11
2023: 49
2024: 19
2025: 11
2026: 177

Although, to be fair to jq, that's kind of long-winded when I could just pull the year itself out of the posting time:

jq -r '[.[] .date[:4]] | group_by(.) | .[] | "\(.[0]): \(length)"'
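
The same slice-and-count idea is easy to sketch in Python too. This is only an illustration on three made-up date strings, assuming each post's date is in the format shown earlier; `collections.Counter` plays the part of `group_by` plus `length`.

```python
from collections import Counter

# Made-up sample of date strings in the format the dump uses.
dates = [
    "2015-06-18T14:53:00+01:00",
    "2015-06-23T09:10:00+01:00",
    "2026-05-29T14:14:10+01:00",
]

# The first four characters of each date are the year.
per_year = Counter(date[:4] for date in dates)

for year, count in sorted(per_year.items()):
    print(f"{year}: {count}")
```

Swapping the slice for `date[5:7]` or `date[11:13]` gives counts by month or by hour in the same way.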

Posts by month

At this point getting the posts by month of year seems obvious too:

jq -r '[.[] .date[5:7]] | group_by(.) | .[] | "\(.[0]): \(length)"'
01: 14
02: 12
03: 53
04: 57
05: 76
06: 33
07: 25
08: 25
09: 13
10: 29
11: 19
12: 15

Posts by weekday

For this, I need to go back to the more involved version of the posting date handling query, where I use gmtime to break down the time. It turns out that the penultimate value is the day of the week as a number. So, while it's not quite as readable in that I don't have day names, I can get the values:

jq -r '[.[] .date[:19] + "Z" | fromdate | gmtime[-2]] | group_by(.) | .[] | "\(.[0]): \(length)"'
0: 48
1: 54
2: 51
3: 48
4: 56
5: 56
6: 58

In this case Sunday is the first day (the 0 day here).
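
If day names are wanted, a Python sketch along these lines would do it. One thing to watch is that Python's `datetime.weekday()` counts from Monday as 0, so the value needs shifting to match the Sunday-as-0 numbering above; `sunday_based_weekday` is just an illustrative helper name.

```python
from datetime import datetime

DAY_NAMES = [
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday",
]


def sunday_based_weekday(stamp: str) -> int:
    """Day of the week with Sunday as 0, matching the numbering in the jq output."""
    # datetime.weekday() has Monday as 0, so shift by one and wrap around.
    return (datetime.fromisoformat(stamp).weekday() + 1) % 7


day = sunday_based_weekday("2015-06-18T14:53:00+01:00")
print(day, DAY_NAMES[day])
```

This uses the date as written in the timestamp (the local date of the post), which is also what the jq approach ends up doing once the offset has been dropped.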

Posts by hour

Getting the posts by the hour is really just a variation on the date-chopping query used for the posts by year and the posts by month; it's all there in the string version of the date.

jq -r '[.[] .date[11:13]] | group_by(.) | .[] | "\(.[0]): \(length)"'
00: 1
06: 1
07: 6
08: 51
09: 35
10: 32
11: 25
12: 14
13: 22
14: 24
15: 25
16: 24
17: 18
18: 9
19: 23
20: 33
21: 21
22: 6
23: 1

First and last posting dates

Getting the date of the first and latest post seems nice and easy:

jq -r '[.[] .date[0:10]] | {first: min, last: max}'
{
  "first": "2015-06-18",
  "last": "2026-06-01"
}

Although, from what I can tell, jq doesn't have anything that makes date arithmetic easy so working out the elapsed time between the two isn't so straightforward. It can be done, but it's not as easy as it might be with a bit of Python code, for example. The best I could come up with was:

jq '[ .[] | .date[:19] + "Z" | fromdate ] | ((max - min) / (365.25 * 24 * 60 * 60))'
10.95438841990519

For an approximate value of "year", of course.
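
For comparison, this is roughly what the "bit of Python code" version looks like, using the two dates from the output above. It works on whole days rather than on the full timestamps, so the result differs very slightly from the jq figure.

```python
from datetime import date

first = date.fromisoformat("2015-06-18")
last = date.fromisoformat("2026-06-01")

# Subtracting two dates gives a timedelta, which knows its length in days.
span = last - first
years = span.days / 365.25

print(span.days, years)
```

The same approximate definition of a year (365.25 days) is used here so the two approaches are comparable.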

Word counts

From here on in, many of the stats that can be pulled out of the JSON with jq become easier to handle. Each post has a word_count property, so I only need to do this:

jq -r '[.[] .word_count] | {least: min, most: max, average: (add / length)}'
{
  "least": 24,
  "most": 2792,
  "average": 475.0700808625337
}

Reading times

A post's reading time can be accessed by reading_time, so it's as easy to handle as the word counts:

jq '[.[] .reading_time] | {least: min, most: max, average: (add / length)}'
{
  "least": 1,
  "most": 11,
  "average": 1.8921832884097034
}

Gunning fog index

The Gunning fog index is available as the gfi property so there's no work to do to figure it out. It is, however, a floating point value and I want counts in each integer "bucket". That can be done with round.

jq -r '[.[] .gfi | round] | group_by(.) | .[] | "\(.[0]): \(length)"'
3: 1
4: 2
5: 3
6: 7
7: 30
8: 46
9: 67
10: 70
11: 75
12: 35
13: 18
14: 11
15: 1
16: 3
17: 2
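
One detail to be aware of if this bucketing is ever redone in Python: as far as I know jq's round rounds halves away from zero, whereas Python's built-in `round` rounds halves to the nearest even number. The sketch below, on made-up scores, uses a hypothetical `bucket` helper that rounds halves up so that non-negative values land in the same buckets.

```python
import math
from collections import Counter


def bucket(gfi: float) -> int:
    """Round half up; for non-negative values this matches round-half-away-from-zero."""
    return math.floor(gfi + 0.5)


scores = [7.5, 8.5, 8.2, 9.9, 10.4]  # made-up values

buckets = Counter(bucket(score) for score in scores)
for value, count in sorted(buckets.items()):
    print(f"{value}: {count}")

# Python's own round() sends 8.5 to 8 rather than 9.
print([round(score) for score in scores])
```

For real GFI values an exact .5 is probably rare, so the two approaches would usually agree anyway.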

As for working out the mean, median and mode... while I worked out the above queries by reading the docs, experimenting, and using Gemini on occasion to either help me understand an error message or to explain why an approach works the way it did, I'm going to have to leave this one 100% to Gemini. Here's its approach to using jq to work out those averages:

jq '
  [ .[] | .gfi | select(. != null) ] as $raw_gfi
  | [ $raw_gfi[] | round ] as $rounded_gfi
  | ($raw_gfi | length) as $count

  # 1. Mean Calculation
  | (($raw_gfi | add) / $count) as $mean

  # 2. Median Calculation
  | ($raw_gfi | sort) as $sorted_gfi
  | (if $count % 2 == 1 then
       $sorted_gfi[($count - 1) / 2]
     else
       ($sorted_gfi[($count / 2) - 1] + $sorted_gfi[$count / 2]) / 2
     end) as $median

  # 3. Mode Calculation (using the rounded values)
  | [ $rounded_gfi
      | group_by(.)
      | map({gfi: .[0], frequency: length})
      | sort_by(.frequency)
      | reverse
      | .[]
    ] as $frequencies
  | [ $frequencies[] | select(.frequency == $frequencies[0].frequency) | .gfi ] as $modes

  # Final Object Assembly
  | {
      count: $count,
      mean: $mean,
      median: $median,
      mode: $modes
    }
'
{
  "count": 371,
  "mean": 9.908842231503396,
  "median": 9.979198312236287,
  "mode": [
    11
  ]
}

As of the time of writing, that's bang on what I get in the stats. Honestly though, by this point, I think I'd be reaching for Python or something similar to do this sort of work. For sure, I can't say if this is a good jq query, if it's in any way idiomatic, or even if it's error-free. The numbers match what BlogMore says though.
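
For what it's worth, here is a minimal sketch of what the Python version might look like, using only the standard library's `statistics` module. The `gfi` values are made up for illustration; with the real data they would come from the gfi property of each post in the dump.

```python
import statistics

# Made-up GFI values; the real ones come from the dump.
gfi = [9.2, 10.6, 11.4, 7.8, 10.9, 11.1]

summary = {
    "count": len(gfi),
    "mean": statistics.fmean(gfi),
    "median": statistics.median(gfi),
    # As in the jq version, the mode is taken over the rounded values.
    # multimode returns every value that ties for most common.
    "mode": statistics.multimode(round(value) for value in gfi),
}

print(summary)
```

Like the jq query, this reports the mode as a list so that ties aren't silently dropped.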

Conclusion

This has been a useful exercise in getting to know a little more about jq, and I can see myself reaching for it to do quick little jobs now that I've finally taken some time to dive into it. As it turns out, it's also been a useful little audit of the content of the stats page because I've even found a bug that needs addressing; so that's a bonus.

New GitHub Copilot billing is popular

1 min read; 8 GFI

So today is the day, today is when GitHub Copilot swaps to its new billing system. Watching the relevant subreddit suggests this might not be popular.

Some folk think it isn't the smartest move.

Not a good choice

Some don't feel too friendly towards it any more.

Friendship ended

It looks like some of those friendships have lasted a while.

2021-2026

Some saw the opportunity to create content out of the situation.

A cancellation video

Some have figured out that the thing that costs money, costs money.

It is too expensive

Someone used up half their monthly allowance on just 8 requests.

Half used after 8 requests

Although, of course, there's always someone who has to do it better.

Half after 1 request

To be fair though, at least one person loves the new system.

Love the new system

As for my subscription, which came about after I initially experimented with free access to the tool, I've not actually cancelled yet, but I can't see me making use of it much more. I might try a couple of prompts with it, along the lines of what I was doing while working on BlogMore, just to get a feel for how different the usage is now.

Meanwhile, though, I've found that I'm getting on a lot better with Antigravity and getting the bits done I want to do. I suspect this is how I'll keep tinkering with BlogMore, until Google come to their senses anyway.

BlogMore v2.36.0

1 min read; 10 GFI

Another quick update to BlogMore: this time doing a little bit of tidying up in support of my use of it over on my photoblog.

The new items I've added to the stats page are working out really well, but over on the photoblog they were lacking a bit in a couple of areas. Because no post on the photoblog has any body text -- it's just title, category, tags and image -- things like the word count and yearly focus didn't really have any content. The minimum and maximum word counts were zero, and every single year of the span of the blog had no focus whatsoever.

To clean this up I've made it so that the word count section just doesn't show at all if the minimum and maximum word counts are both zero. I've also changed how the corpus of words for the yearly focus works too. If a particular post has no body text, the text that is used falls back to using the title of the post and the tags. With that change, the yearly focus list for the photoblog goes from being completely empty, to looking quite informative.

Year focus for the photoblog
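
The fallback described above can be sketched in a few lines of Python. This is not BlogMore's actual code; the `focus_text` name and the `body`, `title` and `tags` keys are assumptions made purely for illustration.

```python
def focus_text(post: dict) -> str:
    """Pick the text to analyse for a post (hypothetical field names)."""
    body = post.get("body", "").strip()
    if body:
        return body
    # No body text: fall back to the title plus the tags.
    return " ".join([post.get("title", ""), *post.get("tags", [])]).strip()


photo = {"title": "Whitby harbour", "tags": ["Whitby", "Sea"], "body": ""}
print(focus_text(photo))
```

A post with body text is unaffected; only posts with an empty body pick up the title and tags.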

One other change in v2.36.0 is a small fix to headings as they appear in user-supplied pages. While it was possible to grab an anchor for a heading so you could link to it, the character that should appear on hover, to help make this obvious, wasn't appearing on those pages. That's now fixed.

BlogMore v2.35.0

1 min read; 9 GFI

Well, I wasn't expecting to release a new version of BlogMore quite so soon, given I made a release earlier today. But an issue came in that the Bluesky icon wasn't showing in the socials section on someone's blog, despite the fact that the docs suggested it should work.

Sure enough, the icon wasn't there. A quick check, and the reason for the problem was clear: the code was using a version of FontAwesome that didn't have that icon. An easy fix.

It's interesting to note that Copilot/Claude (which is what did this work originally) produced one thing in the code, and then made an unsupportable claim in the documentation. Furthermore, it didn't construct the tests to at least match the documentation's claims. None of this is a surprise, of course, but it is a good illustration of the fact that agent-based coding is just fancy auto-complete with no real ability to maintain a coherent view of the whole project.

Anyway, v2.35.0 is now available and I too can have a Bluesky link in my socials. Oddly, I hadn't thought to add it before now.

A full month of blogging

1 min read; 11 GFI

I've just realised that, somehow, I've managed to post something on this blog, every single day this month.

May 2026

A large part of this is, of course, because I've been doing a lot of stuff on BlogMore, but there's no getting away from the fact that BlogMore exists and I feel compelled to make use of it, and also that blogmore.el helps make it a lot easier to kick off and edit a post.

Thinking back to other blogs I've maintained over the past couple of decades, I don't think I've ever come close to this. Last month did come close, broken only by the couple of days I was chilling in Whitby. The month before also came close, minus 2 days where I just didn't have something to write.

I don't imagine this will last; in fact, I know for sure that there will be fewer posts next month (I have a trip coming up). What I do know is that I feel more compelled to jot something down when an idea turns up, and I'm enjoying the habit of blogging more frequently. While I expect this run to calm down, I hope I don't fall back to leaving it months at a time before opening a fresh Emacs buffer and kicking off some new Markdown.

BlogMore v2.34.0

1 min read; 12 GFI

I've released BlogMore v2.34.0. This is a small update to make some changes to the newly-added dump command.

The first change is a small fix to the url_path property, which wasn't being populated; now it is.

The second change adds two new properties to the output which relate to links that can be found inside posts: internal_links and external_links. As the names suggest, the first gives you a list of all the internal links that can be found in a given post, with the values given being the same format as the id used for every post in the dump. For example:

"internal_links": [
  "posts/2026/05/2026-05-20-blogmore-v2-25-0.md",
  "posts/2026/05/2026-05-22-blogmore-v2-26-0.md"
]

This should give everything needed to write tools that do things similar to the back-links system in BlogMore itself.

The list of external links is, obviously, a list of all the links in the post that are external to the blog. It looks like this:

"external_links": [
  "https://blogmore.davep.dev/",
  "https://validator.w3.org",
  "https://json-ld.org/",
  "https://microformats.org/wiki/microformats2",
  "https://microformats.org/wiki/rel-me"
]

There is, of course, some overlap with the link dumping command, but given that the information is available it seemed to make sense to provide it here; it also means that it's available in a more structured form.

Also providing this sort of information in the JSON output means there's a lot of flexibility when it comes to analysing all the posts in my blog. For example, I can now easily satisfy my curiosity if I want to know exactly which posts in my blog have no links whatsoever.

blogmore dump | jq -r '.[] | select((.internal_links | length) == 0 and (.external_links | length) == 0) | .id'

posts/2015/2015-06-18-a-mild-chrome-annoyance.md
posts/2015/2015-06-23-and-now-for-some-ios.md
posts/2015/2015-07-01-odd-ipod-update.md
posts/2015/2015-08-03-best-update-ever.md
posts/2015/2015-09-04-unknown-promo.md
posts/2015/2015-10-19-microsoft-accounts.md
posts/2015/2015-11-11-voice-search-failing-on-nexus-6.md
posts/2015/2015-11-12-i-miss-until-next-alarm.md
posts/2016/2016-04-28-i-now-own-a-macbook.md
posts/2017/2017-12-12-on_to_something_new.md
posts/2020/2020-06-24-swift-til-3.md
posts/2020/2020-06-26-switch-til-5.md
posts/2020/2020-06-28-swift-til-7.md
posts/2020/2020-07-05-swift-til-10.md
posts/2022/2022-06-03-failed-successfully.md
posts/2023/2023-07-21-encouragement-i-guess.md
posts/2023/2023-07-29-home-pod-stuck-installing.md
posts/2023/2023-10-20-constant-siri-voice-loss.md
posts/2026/05/2026-05-04-my-new-favourite-game-on-steam.md
posts/2026/05/2026-05-11-steam-controller-is-close.md
posts/2026/05/2026-05-25-this-is-not-fun.md

Sure, I don't know what I'd do with this information, but at least I can easily ask the question.
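
The same question is just as easy to ask from Python, which is handy if the follow-up work is more involved than a one-liner. This sketch runs against a tiny made-up sample rather than the real dump; only the id, internal_links and external_links property names are taken from the output described above.

```python
import json

# A made-up, cut-down stand-in for the output of the dump command.
raw = """
[
  {"id": "posts/example-one.md", "internal_links": [], "external_links": []},
  {"id": "posts/example-two.md", "internal_links": ["posts/example-one.md"], "external_links": []},
  {"id": "posts/example-three.md", "internal_links": [], "external_links": ["https://example.com/"]}
]
"""

posts = json.loads(raw)

# Keep only the posts where both link lists are empty.
no_links = [
    post["id"]
    for post in posts
    if not post["internal_links"] and not post["external_links"]
]

print("\n".join(no_links))
```

In real use the JSON would be read from the command's standard output instead of a string literal.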

Tidying and spelling

2 min read; 11 GFI

Since kicking off the work on BlogMore and blogmore.el, I've absolutely found that I've reduced the friction involved when it comes to writing a quick (or not so quick) blog post. I've also found that I want to go back and tidy up lots of my old posts. Over the past few weeks I've gone and cleaned up the size and positioning of images; converted most images to WebP format; cleaned and consolidated the tags used; hunted down and fixed broken internal links; and a few other things besides.

Another thing I want to do is go back and hunt down, and clean up, typos and spelling mistakes, and the like. While I'm careful to try and not make any errors when typing out a post, and while I've always made a point of reading my posts back to try and catch problems, I've not always been successful. Sometimes I'm just blind to the errors, sometimes I'm just rushing. There's over a decade of mistakes on this blog.

So, with this in mind, I've added a couple of little tools to the build environment for this blog to help me go back and catch problems that might need addressing.

The main tool is a script for running aspell over all the Markdown and building a list of errors. This shows the names of the Markdown files that have errors, and lists the unknown words for them. For example:

=== content/posts/2019/2019-11-04-my-pylint-shame.md ===
flycheck
prepending
whitespace

=== content/posts/2020/2020-08-23-the-pep-8-hill-i-will-die-on.md ===
parens
whitespace

=== content/posts/2020/2020-06-22-swift-til-1.md ===
backticks

=== content/posts/2020/2020-06-14-my-journey-to-the-dark-side-is-complete.md ===
Macbook
Macbooks
scrollbars

=== content/posts/2020/2020-01-19-dnote-el.md ===
dnote
Dnote

=== content/posts/2020/2020-01-11-where-i-live-and-work.md ===
adwaita
eshell
powerline

This alone makes it nice and easy to go back and clean up some obvious issues. A problem I ran into though was that I was getting a lot of false reports for things in the front matter of the files (especially parts of the cover: file name) and also in the end-of-file comments I like to use. So, with a little help from Gemini (because it's been a while since I last wrote any awk in anger), I wrote a filter to "clean" the Markdown content before running it through aspell.
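
The real filter is written in awk and isn't shown here, but the idea can be sketched in Python. Two loud assumptions: that front matter is a block between `---` lines at the very top of the file, and that the comments take the `[//]: # (...)` form. Neither detail comes from the actual script, and `clean_markdown` is a made-up name.

```python
def clean_markdown(text: str) -> str:
    """Strip front matter and comment lines so only the prose is spell-checked."""
    lines = text.splitlines()

    # Assumption: front matter sits between two '---' lines at the top.
    if lines and lines[0].strip() == "---":
        for index, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                lines = lines[index + 1:]
                break

    # Assumption: comments look like '[//]: # (...)'. This drops every such
    # line, wherever it appears, not only the ones at the end of the file.
    body = [line for line in lines if not line.startswith("[//]: #")]
    return "\n".join(body).strip()


sample = (
    "---\n"
    "title: Test\n"
    "cover: /attachments/foo.webp\n"
    "---\n"
    "\n"
    "Some body text.\n"
    "\n"
    "[//]: # (end of post)\n"
)
print(clean_markdown(sample))
```

The cleaned text could then be piped to the spell checker in place of the raw file.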

Already, using this setup, I've caught a few things that deserved cleaning up, and because there will be a lot of words that are correct but particular to this blog and what I write about, I'm also building up a local ignore list.

While this setup isn't going to make the content of this blog error-free, it should give me everything I need to go back and slowly improve some of the older text, and to harmonise some of the spellings of some technical terms.

BlogMore v2.33.0

1 min read; 11 GFI

I've released v2.33.0 of BlogMore, which extends the stats page some more, and also adds a tool so a user can do all sorts of fun things with the raw data of their posts.

The addition to the stats page is a list of years along with the top five words that characterise the focused subject for those years. This is done using TF-IDF. While the results for my blog don't come as a surprise, I am pleased to see that it does turn out pretty much how I would have expected:

Blogging focus per year

This feels like another fun way to learn something about the post history for your own blog.
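
For anyone curious about the general shape of TF-IDF, here is a bare-bones Python sketch. It is not how BlogMore does it: the tokenising is just a whitespace split, the `top_words` name is made up, and the two "years" of text are invented. The idea is that a word scores highly for a year when it is frequent in that year but absent from most other years.

```python
import math
from collections import Counter


def top_words(docs: dict[str, str], count: int = 5) -> dict[str, list[str]]:
    """Rank each document's words by TF-IDF and keep the best few."""
    tokens = {name: text.lower().split() for name, text in docs.items()}

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for words in tokens.values() for word in set(words))

    result = {}
    for name, words in tokens.items():
        term_freq = Counter(words)
        scores = {
            word: (freq / len(words)) * math.log(len(docs) / doc_freq[word])
            for word, freq in term_freq.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        result[name] = ranked[:count]
    return result


# Invented text standing in for all the posts of each year.
years = {
    "2015": "android phone android update",
    "2026": "blogmore python blogmore update",
}
print(top_words(years, count=1))
```

A word like "update", which turns up in every year, gets a score of zero and so never characterises any one year.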

Which got me thinking: there's probably any number of other niche and bizarre things that could be done with the content of a blog to gain some insight as to its history, and I really shouldn't just keep adding more and more things to the stats page. But what if I wanted a way to run some code over all the posts? Wouldn't it be useful if I could get all of the parsed post data in JSON format so I can play with it?

With that idea in mind, I added the dump command. When run, it will print to stdout a full dump of all of your posts, as JSON, so you can then write your own nifty tool to read it back in and do any number of interesting checks, tests, reformats or manipulations. For example, if I wanted to use jq to pull out the metadata for this particular post I could:

blogmore dump | jq '.[] | select(.id == "posts/2026/05/2026-05-29-blogmore-v2-33-0.md") | .metadata'

which gives me this result:

{
  "title": "BlogMore v2.33.0",
  "date": "2026-05-29 14:14:10+0100",
  "category": "Coding",
  "tags": "BlogMore, Coding, PyPI, Python",
  "cover": "/attachments/2026/05/29/focus.webp"
}

I can see this being pretty useful to blogmore.el at some point, if I want to add some better querying tools or similar (not that I'd want to run a dump every time; it does require a full parse and render).

Hopefully this will be useful to someone else. I know I'll be toying with it to find out other things about my posting history.

BlogMore v2.32.0

1 min read; 11 GFI

Now that BlogMore is so obviously feature-complete and there's nothing else I could possibly need to add to it... here's v2.32.0 with a quick new feature.

The work to add the relatedness of posts got me wondering about other things that could be measured about the content of posts, and a bit of reading and looking for ideas resulted in my (re)discovering the Gunning fog index. That got me thinking it might be interesting to see what the highest and lowest values were for my blog, and perhaps what the average is, perhaps even get a feel for the distribution.
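
As a reminder of what the index measures, the usual formula is 0.4 times the sum of the average sentence length and the percentage of "complex" words (normally taken to be words of three or more syllables). Here it is as a small Python sketch that takes the counts as given; how BlogMore counts words, sentences and syllables isn't covered here, and `gunning_fog` is just an illustrative name.

```python
def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning fog index from pre-computed counts."""
    average_sentence_length = words / sentences
    percent_complex = 100 * (complex_words / words)
    return 0.4 * (average_sentence_length + percent_complex)


# 100 words in 5 sentences, 10 of them complex.
print(gunning_fog(words=100, sentences=5, complex_words=10))
```

For that example the average sentence length is 20 words and 10% of the words are complex, so the index comes out at about 12.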

So now BlogMore has a with_gfi configuration setting (and a matching command line switch). When enabled, it adds the score for a post next to the reading time (if that's enabled; if not, the score appears where the reading time would be). It also adds a load of stats to the stats page.

The GFI stats for my blog

As well as this change, I've also tweaked the stats page so it's easier to get a link to a particular section. As with headings in posts, when you hover over a section heading you'll get a little anchor you can click on that will result in you visiting that particular heading. That should make it easier to highlight a particular statistic, if you need to.

BlogMore v2.31.0

1 min read; 11 GFI

A quick update to BlogMore, with a small accessibility improvement, and also a whole new set of data for the graph.

The accessibility update follows on from a change published yesterday. In that release I did a little bit of work to disambiguate links to categories and tags that had the same link text; this time I've given the same aria-label treatment to the post dates where they link to the year, month, and day archives. While we're never really going to get ambiguous year links (unless there are also links in the body of a post that are four digit numbers), it's highly likely we're going to get the occasional month and day that are ambiguous.

So now those links make it clear, with aria-label, what's being linked to.

The other main change in v2.31.0 comes from a thought I had last night about the new "related posts" feature I added yesterday. Currently the graph shows the relationships the posts have based around their common categories and tags:

The graph based on links

But now I've got this code that can work out how posts might be related based on their text content. What might that look like? It might be cool to be able to switch the graph view between the two sets of data, if the blog owner is building with related posts turned on.

The graph based on related posts

While I'm not sure it really tells me anything, I like that I have yet another method of (re)discovering posts.