Posts from April 10, 2026

obs2nlm v1.2.0

1 min read

Three months back I released obs2nlm, a tool that takes an Obsidian vault and turns it into a single Markdown file so it can be used as a source for NotebookLM.

Since then I've been using it a lot and it's working out really well.

Meanwhile, one of my vaults has started to creep up towards the documented word limit for a single source in NotebookLM (500,000 words). Right now it's sitting at around 75% and is steadily creeping up.

So, with this in mind, I've made a change I've been planning from the start and have added a --split option. If used, if the generated file looks like it's going to hit the word limit, a second (or more) file will be created. The naming scheme is simple enough: if you ask obs2nlm to create an output file called dirt.md and it needs to run over, it'll then create dirt-2.md, dirt-3.md, and so on. The idea then is that, rather than upload that single Markdown file as a source, you upload all of the generated Markdown files.

Given you get up to 50 sources per notebook, this should see me right for any reasonable vault. As for if it will affect the quality of the results I get when I query the notebook... that's hard to say until I find myself in that situation. If Google are to be believed it shouldn't be an issue, and the alternative is to fall foul of the limit so this seems like the only sensible solution.

I've also added a --dry-run command line switch too; this should be handy for checking how big a vault is when compared to the word limit, without actually generating any files.

BlogMore v2.12.0

1 min read

Since kicking off building BlogMore and swapping this blog over to using it I've been playing with the Google Search Console. It's something I've not used in decades, but felt it was time to dip back in again and understand how it works these days.

There are two motivations for this: the first is that, when it comes to my day job, I have cause to interact with people who do use the search console a lot, and so it's worth understanding what they work with and why it matters to them. The second reason is it's a reasonable measure of how good a site BlogMore generates.

Page index inclusion progress

So far the results have been pretty good, and the console has helped me find oddities and things that need tidying up.

So this release of BlogMore includes a couple of changes that stem from looking at the latest updates in the console.

The first is that I've cleaned up how the sitemap.xml gets generated. I noticed that if I had any HTML inside my extras directory it was turning up in the sitemap; something I didn't intend and didn't want. So that's now fixed: only pages generated by BlogMore will appear in sitemap.xml1.

The second is that the stats page, despite being in the sitemap, had a noindex header for some reason. That's now been fixed. The only generated page I've intentionally set up so that it isn't indexed is the search page.

Finally, there's one change unrelated to the above: I realised that if you have with_read_time set to false, the reading time stats still appeared on the stats page; that seems unnecessary and unwanted on a site that doesn't show reading times. So, as of v2.12.0, that section of the stats won't show if reading times are turned off.


  1. Now I think about it, I suppose there might be occasion where someone wants extra HTML to appear in the sitemap. I might consider the idea of allowing extra entries to be declared via the configuration file.