Posts in category "AI"

But is the code that bad?

5 min read

There is, obviously and understandably, a lot of conversation online about AI and coding and agents and all that stuff. Much of it I get, much of it I agree with, I share the vast majority of the concerns. The impact on people, the impact on society, the impact on the environment, the impact on security... there's a good list of things to worry us there.

The one that crops up a lot though, that I don't quite get, is the constant claim I see that at best AI tools produce bad code, and at worst they produce unworkable code. That really isn't my recent experience.

Sure, going back to 2023 or 2024, when I first started toying with these new chatbot things some folks were raving about, the output was laughable. I can remember spending some fun times trying to coax whatever version of ChatGPT was on the go at the time into writing workable code and being amused by just how bad it was.

Even back in October last year, when I first tried out the free Copilot Pro that GitHub had given me to play with, I tried to get it to build a Textual application for me and it was terrible. The code was bad, it didn't really know how to use Textual properly, the application I was trying to get it to write as a test barely worked. It was a disaster.

A month later, in November of last year, I had a second go and better success. That time the (still not released, perhaps one day) application I was building was Swift-based and worked really well, but I can't really comment on the quality of the code or how idiomatically correct the code is in respect to the type of application it is (it's a wee game that runs on iOS, iPadOS, macOS).

By the time I tried my first serious experiment things seemed to be a little different. The code actually wasn't bad. It wasn't good, it was far from good, but it wasn't bad. Also, because it was Python, I was in a good place to judge the code.

Since I've started working on BlogMore I've noticed issues such as:

  • Lots of repetitive boilerplate code.
  • Lots of magic numbers.
  • Lots of magic strings.
  • Functions with redundant and unused parameters.
  • A default state of just adding more and more code to one file.
  • A habit of writing least-effort-possible type hints.
  • A habit of sometimes taking a hacky shortcut to solve a problem.
  • A habit of sometimes over-engineering a solution to a problem.
  • A weird obsession with importing inside functions.
  • An occasional weird obsession with guarding some imports with TYPE_CHECKING to work around non-existent circular imports.
  • An unwillingness to use newer Python capabilities (I've yet to see it make use of := without being prompted, for example).
  • A tendency to write what I would consider less-elegant code over more-elegant code.

The list isn't exhaustive, of course. The point here is that, as I've reviewed the PRs1, and read the code, I've seen things I wouldn't personally do. I've seen things I wouldn't personally write, I've seen things I've felt the need to push back on, I've seen things I've fully rejected and started over. Ultimately BlogMore isn't the code I would have written, but at the moment it is the application I would have written2.

So, here's the thing: every time I see someone writing a negative toot or post or article or whatever, and they talk about how the code it produces is unworkable, I find myself wondering about how they formed this opinion. Are they just writing the piece for the audience they want? Are they writing the piece based on their experience from months to years back, when these tools did seem to still be laughably bad? Are they simply cynically generating the piece using an LLM to bait for engagement? When I see this particular aspect of such a post it's a bit of a red flag about where they're coming from, kind of like how you suddenly realise that someone who seems to speak with authority might be full of shit when they start to spout questionable "facts" on a subject you understand well.

But wait! What about that list of dodgy stuff I've seen while building BlogMore with Copilot? What about all the reading and reviewing I've had to do, and what about the other crimes against Python coding I can probably still find in the codebase? Surely that is evidence that these tools produce terrible, unworkable, unusable code?

I mean, okay, I suppose I could reach that conclusion if I'd had a massively atypical experience in the software development industry and had never had to review anyone else's code, or had never needed to work on someone else's legacy code. Is what I'm seeing out of Copilot something I'd consider ideal code? Of course not. Is it worse than some of the worst code I've had to deal with since I started coding for a living in 1989? Hell no!

From what I'm seeing right now I'm getting code whose quality is... fine. Mostly it does the job fine. Often it needs a bit of coaxing in the right direction. Sometimes it gets totally confused and goes down a rabbit hole which needs to just be blocked off and we start again. Occasionally it needs rewriting to do the same thing but in a more maintainable way.

All of which sounds very familiar. I've had times where that describes my code (and I would massively distrust anyone who says they've never had the same outcomes in their time writing code). For sure it describes code I've had to take over, maintain or review.

It's almost like it was trained on lots of code written by humans.

Meanwhile... not every instance of using these tools to get code done needs to be about writing actual code. More and more I'm finding Google Gemini (for example) to be a really handy coding buddy and faster "Google this shit 'cos I can't remember this exact thing I want to achieve". I'll ask, I'll almost always get a pretty good answer, and then I can generally take that snippet of code and implement it how I want.

I've seldom had to walk away from that sort of interaction because it was getting me nowhere.

All of which is to say: I remain concerned about a great many things in the AI space at the moment, but I'm also as equally suspicious of someone who just flatly says "and the code it produces just doesn't work". If that's part of an article or post I'm left with the feeling that the author put zero actual effort into forming their opinion, let alone actually writing it.


  1. To varying degrees. Sometimes I have plenty of time to kill and I read the PR carefully, other times I glance it over, be happy there's nothing horrific there, and then decide to push back or merge based on the results of hand-testing and automated testing. 

  2. To be fair, it's the application I would still be writing and would be some time off finishing; there's no way it would be as feature-complete as it is now had I been 100% hand-coding it. 

NGMCP v0.2.0

1 min read

The experiment with building an MCP server continues, with some hacking on it happening over a couple of hours while killing time in an Edinburgh coffee shop.

It is, of course, a solution looking for a problem, and I suspect I'm the only person who will ever use it, and even then only as a test, but building it is proving interesting.

The main changes in v0.2.0 are:

  • I turned the current search_guide tool into a line_search_guide tool (because that's what it was doing: a line-by-line search).
  • I added a body_search_guide tool that treats all the lines in an entry as a single block of text and then does the search in that (so searches over line breaks will work).
  • I added a read_entry_source tool that, rather than rendering the entry of a guide as plain text with all the markup removed, it instead delivers the underlying "source" for the entry; something that could be useful if you wanted to get an agent to convert it into another marked-up body of text.
  • I added a markup glossary resource, which technically tells an agent everything it needs to know about Norton Guide markup.

The latter one is interesting. I added it and did some experimenting locally and it seemed to be helpful and I could ask questions about markup and Copilot seemed to use it. Meanwhile, having installed v0.2.0 globally on my machine, and having enabled it, I'm finding that Copilot seems to have zero clue about the markup and instead is using the server to go off and read the guides to work out the markup1.

On the other hand, the new "get source" tool seems to work a treat.

Peeking at the source for an entry

So I suspect I still have some reading/experimenting to do when it comes to resources, so I can better understand why I'd want to provide them and what problem they solve.


  1. All credit to it: it did find CREATING.NG and read the markup out of that. 

NGMCP - An MCP experiment

2 min read

Recently I've been thinking that it would be interesting to get to know a little about the Model Context Protocol and see what it's about and get a feel for how useful it might be, if at all, for anything I do.

As always happens when I want to try out something new, I reached for a problem I know well so I don't have to get bogged down in solving the problem itself. As almost always happens, I decided I should base it around Norton Guides.

Part of the point of MCP seems to be providing an interface over sources of data and actions, that an LLM might not otherwise be able to cope with, and so it sounded to me like providing a bridge to the content of Norton Guide files would be a perfect test. Of course, this isn't the first time I've bridged LLMs and NG files, but this is obviously intended to be a more generic solution than throwing a Markdown file at NotebookLM.

Earlier this afternoon I sat down and did some reading, and then decided to throw the problem at GitHub Copilot. I told it I wanted to use my NGDB library as the core of the tool, and that it should wrap it up with FastMCP. The initial result was... a bit of a mess. It sort of worked, sort of, but it also seemed to try and put together a project that mostly looked how my Python repos look, but with some bits just wrong.

I did some cleaning up, did some testing, did some tweaking, and eventually I had something working.

Asking what NGMCP can do

So far I've given the code a fairly quick read over, and I can see what it's doing and how it's going about this. This approach obviously has the disadvantage that I didn't hand-write it so there's still a lot to read to really appreciate what's going on; on the other hand, it does have the advantage that it's implemented a tool based on my library so I know what to expect it to be doing.

There will be more code reading happening, and I also intend to look to tidy up the code more and perhaps hand-add some more features.

Looking at the credits of a guide

I very much doubt that this particular MCP server is going to be any use to anyone, but as a proof of concept it works well for me. If I were in a position of needing to build something genuinely useful, I now have a start and a vague idea.

Reading some text from a guide

On the other hand: once again, as with other projects I've done related to Norton Guides, this is a tool that helps keep the content available and accessible; that alone is one reason for me to tidy this up and move it towards v1.0.0 and keep it maintained.

If you fancy having a play, some (currently Copilot-generated) documentation can be found on the server's dedicated site. When I get a bit more time I'm going to flesh this out.

GitHub Copilot wants our interaction data

3 min read

I guess it was inevitable1, but yesterday GitHub announced a new opt-out approach to learning from people's interactions with Copilot. I don't have anything novel or insightful to say on this switch, and I'm sure folk with better-informed opinions have already rushed out posts and articles about this, but I did want to jot down just how curious I am to see this roll out.

For starters: for me this feels like one of those things that will get a lot of backlash, and in a day or so GitHub will say they're pausing rolling this out while they reevaluate this approach2. Then, eventually, they roll it out anyway after a "period of consultation with the community". That sort of thing.

I've not read further this morning, but before going to bed last night it wasn't a happy time in the comments section of the FAQ. I can also see why some would be cynical about this change, given the tone of some of the questions and answers in that FAQ. I'll hand it to them: they're pretty candid and honest with the FAQs, but kinda yikes too.

A bad time in the FAQ

Here's the key thing I'm curious about, and which I'll be thinking about and watching for movement on in the next few days: all the talk here seems to be about protecting the privacy of the proprietary code of businesses3. That... is understandable, from a business point of view, from a commercial adoption point of view, from a "we want all software engineering departments to use Copilot" point of view. But how the heck are they really going to manage that?

In the comments in the FAQ this explanation stood out:

We do not train on the contents from any paid organization’s repos, regardless of whether a user is working in that repo with a Copilot Free, Pro, or Pro+ subscription. If a user’s GitHub account is a member of or outside collaborator with a paid organization, we exclude their interaction data from model training.

This seems somewhat unclear to me. Let's walk this one through for a moment: my GitHub account is a member of a "paid organisation". My account is also my account, for my personal code, I've had it a long time and it's filled with a lot of FOSS repos and I keep adding more. So which scenario is the right one here?

  1. Because I'm currently a member of at least one "paid organisation" I'm always opted-out of this training no matter how the opt in/out setting is set and no matter what code I work on?
  2. Because I'm currently a member of at least one "paid organisation" I can opt in when working on code that is from a repository which is mine, but I'm opted out when I'm working on code from a repository belonging to the paid organisation?

I think it reads like it's #1. But then that seems rather odd to me because, if I go and look at my settings right now, I can elect to opt in/out of this training system. If the correct reading is #1 why not just disable that setting altogether and say below it that I'm opted-out because I'm part of a paid organisation?

Which sort of suggests we should perhaps read it as #2? If that, that raises all sorts of questions. How would Copilot know I'm working on code from such a repository? Sure, it's not impossible to infer if I am working within the context of a given repository, doing some fun stuff to work out the origin and so on, but it feels messy. It also feels like a scenario that could end up being incredibly leaky. It really would not be difficult to run into a scenario where I'm working on some non-Free code but in an environment where the licence isn't clear, or where it appears that the licence4 would permit such training.

ℹ️ Note

Editing to add: there is even a comment where it is acknowledged that someone could be working in such a way that it's impossible to know the provenance of the code: "Copilot ... can even work when you are not connected to any repo."

Or... perhaps there's a #3, or a #4, or so on, that I've not even considered yet. The fact that software engineering departments suddenly have to start thinking about this issue (yes, I know, it's been a background issue for a while but this really drags it out into the open) is going to make for a few interesting weeks, assuming people care about where their code ends up.

Who knows. Perhaps, in some strange way, this is how all software ends up being Free.


  1. And I think a bit of me is surprised that they weren't just doing it anyway. 

  2. This isn't a prediction, I'm just saying it feels like that sort of announcement. 

  3. It's not that simple, but to save getting into the deep detail... 

  4. I'm using licence here as shorthand for a lot of things to consider relating to who should have access to the code and how. 

Copilot rate limits

1 min read

Last night, while tinkering with another BlogMore feature request, I ran into the sudden rate limit issue again. As before, there was no warning, there was no indication I was getting close to any sort of limit, and the <duration> I was supposed to wait to let things cool down was given as <duration> rather than an actual value.

So this time I decided to actually drop a ticket on GitHub support. Around 12 hours later they got back to me:

Thanks for writing in to GitHub Support!

I completely understand your frustration with hitting rate limits.

As usage continues to grow on Copilot — particularly with our latest models — we've made deliberate adjustments to our rate limiting to protect platform stability and ensure a reliable experience for all users. As part of this work, we corrected an issue where rate limits were not being consistently enforced across all models. You may notice increased rate limiting as these changes take effect.

Our goal is always that Copilot remains a great experience, and you are not disrupted in your work. If you encounter a rate limit, we recommend switching to a different model, using Auto mode. We're also working on improvements that will give you better visibility into your usage so you're never caught off guard.

We appreciate your patience as we roll out these changes.

So, in other words: expect to be rate limited more on this product we're trying to get everyone hooked on and trying to get everyone to subscribe to.

Neat.

I especially like this part:

We're also working on improvements that will give you better visibility into your usage so you're never caught off guard.

You know, if it were me, if I wanted to build up and keep goodwill for my product, I'd probably do that part first and communicate about it earlier rather than later.

I guess this is why I don't hold the sort of position that gets to make those decisions.

Too much work for Copilot?

1 min read

I don't know if it's just that GitHub Copilot is having a bad time at the moment, or if I've run into a genuine problem, but all isn't well today. After merging last night's result I kicked off another request related to a group of changes I want to make to BlogMore. It's a little involved, and it did take it a wee while to work on, but mostly it got the work done.

Again, as I said in the earlier post, I won't get into the detail of these changes yet, but they're fairly extensive and do include some breaking changes, so it's probably going to take a wee while to have it all come together. Claude's first shot at the latest change was almost there but with the glaring bug that it did all the work but didn't actually add the part that reads the configuration file settings and uses them (yeah, that's a good one, isn't it?).

So I asked it to fix it. It went off, worked on the issue, and then suddenly...

Denied

This surprised me. After the past few weeks I've had sessions where I've requested it do things way more frequently than this morning. I'm also nowhere near out of premium requests either:

Number of requests left

While the error, as shown, might be valid and might be down to my actions, it's massively unhelpful and doesn't really explain what I did to cause this or even how I can remedy it. This is made all the more frustrating by the fact that it seems to be saying I need to wait <duration> to try again. Yes, literally a placeholder of <duration>. O_o

One thing is for sure: this is another useful experiment in the current experiment. It's worth having the experience of having the tool screw with the flow. It doesn't come as a surprise, but it's a good reminder that using an agent that is hosted by someone else means you fully rely on their ability to keep it working, the whims of their API limits, and perhaps even your ability to pay.

When your model leaves you

4 min read

Yesterday evening, after dinner, but just before loading up a game released just a few hours earlier, I decided to kick off a change to BlogMore. Part of the ongoing experiment is this convenience aspect: where I can get some work going and then go and do something entirely different.

The nature of the change itself isn't important (I'll write about it when I release an update), but something that happened is. As usual, I did the prompt as an issue, and assigned it to Copilot. The first thing that was curious was that, after around 5 minutes, despite it having added the 👀 reaction (which it seems to use to indicate the agent has seen it has work to do), nothing happened. I didn't see an agent kick off doing the work. Eventually I gave up waiting and opened a Copilot session and asked it to set about dealing with the issue I'd raised.

It did as I requested, but apparently alongside another agent which had suddenly started doing the exact same work. Thinking it was a glitch (it's not like GitHub hasn't been having some trouble of late) I stopped the newest one and let the "original" go about its work.

A wee bit later, just before I started up my game and the stream, I checked in on how it was doing. It had finished, but in a quick test, I noticed a small bug. I prompted it to fix the problem, closed the tab, and went about the real business of killing bugs.

Fast forward a couple of hours, I was done with getting my arse kicked on Klendathu, I packed away my controller and headset and opened GitHub again to see where Copilot was at. It wasn't good:

@davep The model claude-sonnet-4.6 is not available for your account. This can happen if the model was disabled by your organization's policy or if your Copilot plan doesn't include access to it.

You can try again without specifying a model (just @copilot) to use the default, or choose a different model from the model picker.

If you want to contact GitHub about this error, please mention the following identifier so they can better serve you: 79b81d32-e26a-4898-bd41-4bba088d08f6

Wait... what? I've been using this for weeks now, as best as I can tell I've generally been making all the changes and improvements using Claude Sonnet 4.6; there's never been an issue. Then, suddenly, in the middle of a PR, I don't have it?

Quickly checking elsewhere, sure enough, I had access to almost no models. I can't remember what there was, but it wasn't much and all the Claude ones had disappeared. Even if I tested in the Copilot CLI1 I saw a very limited set of models.

Around this time I had two reactions: one was something like "cool, this is an important part of the experiment, knowing how it goes if your models are taken from you", the other was a curious "but Claude and I have an understanding about this codebase I can't trust some other model I've not been using".

As it was getting late into the evening and I still wanted to watch an episode of Stargate SG-1 (yes, I'm doing a full rewatch given it's on Netflix) I closed the MacBook and decided to check in on the issue in the morning.

Fast forward one SG1 episode later and, just before I headed to bed, I decided to do a quick search. While it could be a problem with my own account, it felt like it was more of a general issue. It was more of a general issue. At that point (around 23:20 UTC), checking in the GitHub app on my phone, I could see that some, if not all, of the models I was used to seeing, had come back.

As of this morning, as of time of writing, it's all looking back to how it was.

All the models back

All of which is a great reminder, and something useful in the experiment: what does happen when some third party takes away the models you're using to get your work done? In the time I've been building up BlogMore I've come to trust the quality of Claude Sonnet (in the sense that I know when and where I have to pay closer attention to what it's done, and where it'll very likely have done a pretty good job2), so finding that I'd possibly have to switch to some other model that I've had no experience with... I genuinely had a moment of concern about how and where I was going to take BlogMore.

Ultimately it's not actually a serious concern for me: while I aim to maintain it for a very long time to come (it is how I'm creating the site for this blog after all), BlogMore isn't that critical. Moreover, I know I could cease using Copilot to create and maintain it and I could tidy up the code and hand-update and hand-manage it. There's a reason why I decided to really dive into this using a language I'm extremely confident with.

But for those applications some might now be relying on, developed by someone keen but as yet unskilled, what way forward for them if such a situation were to happen and not be resolved?


  1. Which I'm not really using at the moment, but do have installed to experiment with. 

  2. So just like working with humans, oddly enough. 

You get what you pay for

2 min read

Just recently I saw a post go past on Mastodon, complaining about the author's perception that there was a breakdown of trust in the FOSS world, in respect to the use of AI to work on FOSS projects, or at least the willingness to accept AI-assisted contributions. The post also highlighted the author's reliance on FOSS projects and how they're driven by ethical and financial motivations (some emphasis was placed on how they had no money to spend on these things so it wasn't even necessarily a choice to FOSS up their environment).

This was, of course, in response to the current fuss about how Vim is being developed these days.

I don't want to comment on the Vim stuff -- I've got no dog in that fight -- but something about the post I mention above got me thinking, and troubled me.

Back when I first ran into the concept of Free Software, before the concept of Open Source had ever been thought of, I can remember reading stuff opposed to the idea that mostly worked along the lines of "you get what you pay for" -- the implication being that Free Software would be bad software. I think it's fair to say that history has now shown that this isn't the case.

But... I think it's fair to say that you do get what you pay for, but in a different sense.

If your computing environment is fully reliant on the time, money and effort of others; reliant on people who are willing to give all of that without the realistic expectation of any contribution back from you; I feel it's safe to say that you are getting a bloody good deal. To then question the motivations and abilities of those people, because they are exploring and embracing other methods of working, is at best a bad-faith argument and at worst betrays a deep sense of entitlement.

What I also found wild was, the post went on to document the author's concerns that they now have to worry about the ability of FOSS project maintainers to detect bad contributions. This for me suggests a lack of understanding of how non-trivial FOSS projects have worked ever since it was a thing.

I mean, sure, there are some projects that are incredibly useful and which have a solo developer working away (sometimes because nobody else wants to contribute, but also sometimes because that solo developer doesn't play well with others -- you pick which scenario you think is more healthy), but for the most part the "important" projects have multitudes working on them, with contributions coming from many people of varying levels of ability. The point here being that, all along, you've been relying on the discernment and mentoring abilities of those maintainers.

To suggest they're suddenly unworthy of your trust because they might be "using the AIs" is... well, it feels driven by dogma and it reads like a disingenuous take.

Don't get me wrong though: you are right to be suspicious, you are right to want to question the trust you place in those who donate so much to you; almost always this is made explicit in the licence they extended to you in the first place. But to suggest that suddenly they're unworthy of your trust because they're donating so much value to you in a way you don't approve of...

...well, perhaps it's time for you to pay it back?

An eager fix

1 min read

An eager fix

Yesterday, while looking at starting to post to my photoblog again, I noticed I'd missed a trick when I added the first and second sets of photos when I created seen-by.davep.dev: I'd left off any cover frontmatter from all of the posts!

While I doubt it's super important -- I can't imagine people will be sharing photos from the blog after all, especially not older ones -- it felt like I'd failed to use a useful feature that I'd made sure BlogMore had.

This was obviously easy to fix. I could just write a tool that would go through all of the Markdown files, find the image that is being displayed in the post, pull out the path to the file, and add a cover to the frontmatter. Not exactly the hardest thing to write, but kind of boring for a one-off fix.

So this felt like another good time to get Copilot to do the hard work. Liking this plan, I wrote an issue for the prompt and set it to work.

The result was unexpected, and in retrospect this is how I should have approached it in the first place; Copilot wrote the script, then ran it, and then submitted the PR including the script and all of the changes after running it! It's like it was super eager to do the fix so went ahead and just did it.

The resulting PR

This, for me, highlights a trick/approach I'm still not fully mindful of: where I might once have written some throwaway code to do a job, run the code, and then made use of the result, when it comes to using an agent I always have the option of saying what I want done to the content of the repository and just let it do it.

When I'm next faced with a problem like this, I think this is the approach I'll take: rather than ask it to write the tool to do the work, I'll just say what the work is I want done and let it go about it. This feels like where an agent should shine and where it's really useful.

Meanwhile I can get on with the fun stuff.

Documentation generation

1 min read

While I've written a lot of documentation in my life, it's not something I enjoy. I want documentation to read well, I want documentation to be useful, I want documentation to be accurate. I also want there to be documentation at all and sometimes the other parts mean it doesn't get done for FOSS projects1.

When I started the experiment that is BlogMore, I very quickly hashed out some ideas on how it might generate some documentation, and the result was okay. In fact, if anything, it was a bit too much and there was a lot of repeated information.

So, this morning, before I sat down for the day's work, I quickly wrote an issue that would act as a prompt to Copilot to rewrite the documentation. This time I tried to be very clear about what I wanted where, but also left it to work out all the details.

The result genuinely impressed me. While I'll admit I haven't read it all in detail (and because of this have left the same warning right at the start), on the surface it looks a lot clearer and feels like a better journey to learning how to use BlogMore.

blogmore.davep.dev hosts the result of this.

I have a plan to work through the documentation and be sure it's all correct and all makes sense, but if it's as correct and as useful as I think, I might have to consider the idea of taking this approach more often. Writing down the plan for the documentation and then letting it appear in the background while I get on with my actual day makes a lot of sense.

I fear I might be warming to these tools, a little bit. :-/


  1. Although I've made a point of doing it for almost every one of my recent Textual-based projects.