Posts tagged with "code review"

But is the code that bad?

5 min read

There is, obviously and understandably, a lot of conversation online about AI and coding and agents and all that stuff. Much of it I get, much of it I agree with; I share the vast majority of the concerns. The impact on people, the impact on society, the impact on the environment, the impact on security... there's a good list of things to worry us there.

The one that crops up a lot, though, and that I don't quite get, is the constant claim I see that at best AI tools produce bad code, and at worst they produce unworkable code. That really isn't my recent experience.

Sure, going back to 2023 or 2024, when I first started toying with these new chatbot things some folks were raving about, the output was laughable. I can remember spending some fun times trying to coax whatever version of ChatGPT was on the go at the time into writing workable code and being amused by just how bad it was.

Even back in October last year, when I first tried out the free Copilot Pro that GitHub had given me to play with, I tried to get it to build a Textual application for me and it was terrible. The code was bad, it didn't really know how to use Textual properly, the application I was trying to get it to write as a test barely worked. It was a disaster.

A month later, in November of last year, I had a second go with better success. That time the (still not released, perhaps one day) application I was building was Swift-based and worked really well, but I can't really comment on the quality of the code, or how idiomatic it is with respect to the type of application it is (it's a wee game that runs on iOS, iPadOS and macOS).

By the time I tried my first serious experiment things seemed to be a little different. The code actually wasn't bad. It wasn't good, it was far from good, but it wasn't bad. Also, because it was Python, I was in a good place to judge the code.

Since I've started working on BlogMore I've noticed issues such as:

  • Lots of repetitive boilerplate code.
  • Lots of magic numbers.
  • Lots of magic strings.
  • Functions with redundant and unused parameters.
  • A default state of just adding more and more code to one file.
  • A habit of writing least-effort-possible type hints.
  • A habit of sometimes taking a hacky shortcut to solve a problem.
  • A habit of sometimes over-engineering a solution to a problem.
  • A weird obsession with importing inside functions.
  • An occasional weird obsession with guarding some imports with TYPE_CHECKING to work around non-existent circular imports.
  • An unwillingness to use newer Python capabilities (I've yet to see it make use of := without being prompted, for example; there's a sketch of what I mean just after this list).
  • A tendency to write what I would consider less-elegant code over more-elegant code.
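
To make that := point concrete, here's a hypothetical before-and-after (the pattern and function names are made-up stand-ins for illustration, not code from BlogMore). The first function is the shape I see Copilot reliably produce; the second is the walrus version I'd normally write myself:

import re

# Hypothetical pattern, purely for illustration: grab a markdown H1 title.
PATTERN = re.compile(r"^#\s+(.*)")

# The shape Copilot tends to produce: bind, test, use.
def first_title_verbose(line: str) -> str | None:
    match = PATTERN.search(line)
    if match:
        return match.group(1)
    return None

# The := version, doing the same job in one step.
def first_title_walrus(line: str) -> str | None:
    if match := PATTERN.search(line):
        return match.group(1)
    return None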

The list isn't exhaustive, of course. The point here is that, as I've reviewed the PRs¹, and read the code, I've seen things I wouldn't personally do. I've seen things I wouldn't personally write, I've seen things I've felt the need to push back on, I've seen things I've fully rejected and started over. Ultimately BlogMore isn't the code I would have written, but at the moment it is the application I would have written².

So, here's the thing: every time I see someone writing a negative toot or post or article or whatever, talking about how the code these tools produce is unworkable, I find myself wondering how they formed this opinion. Are they just writing the piece for the audience they want? Are they writing the piece based on their experience from months or years back, when these tools did still seem laughably bad? Are they simply cynically generating the piece using an LLM to bait for engagement? When I see this particular aspect of such a post it's a bit of a red flag about where they're coming from, kind of like how you suddenly realise that someone who seems to speak with authority might be full of shit when they start to spout questionable "facts" on a subject you understand well.

But wait! What about that list of dodgy stuff I've seen while building BlogMore with Copilot? What about all the reading and reviewing I've had to do, and what about the other crimes against Python coding I can probably still find in the codebase? Surely that is evidence that these tools produce terrible, unworkable, unusable code?

I mean, okay, I suppose I could reach that conclusion if I'd had a massively atypical experience in the software development industry and had never had to review anyone else's code, or had never needed to work on someone else's legacy code. Is what I'm seeing out of Copilot something I'd consider ideal code? Of course not. Is it worse than some of the worst code I've had to deal with since I started coding for a living in 1989? Hell no!

From what I'm seeing right now I'm getting code whose quality is... fine. Mostly it does the job fine. Often it needs a bit of coaxing in the right direction. Sometimes it gets totally confused and goes down a rabbit hole which needs to just be blocked off and we start again. Occasionally it needs rewriting to do the same thing but in a more maintainable way.

All of which sounds very familiar. I've had times where that describes my code (and I would massively distrust anyone who says they've never had the same outcomes in their time writing code). For sure it describes code I've had to take over, maintain or review.

It's almost like it was trained on lots of code written by humans.

Meanwhile... not every instance of using these tools to get code done needs to be about writing actual code. More and more I'm finding Google Gemini (for example) to be a really handy coding buddy and a faster version of "Google this shit 'cos I can't remember the exact thing I want to achieve". I'll ask, I'll almost always get a pretty good answer, and then I can generally take that snippet of code and implement it how I want.

I've seldom had to walk away from that sort of interaction because it was getting me nowhere.

All of which is to say: I remain concerned about a great many things in the AI space at the moment, but I'm equally suspicious of anyone who just flatly says "and the code it produces just doesn't work". If that's part of an article or post I'm left with the feeling that the author put zero actual effort into forming their opinion, let alone actually writing it.


  1. To varying degrees. Sometimes I have plenty of time to kill and I read the PR carefully; other times I glance over it, satisfy myself there's nothing horrific there, and then decide whether to push back or merge based on the results of hand-testing and automated testing.

  2. To be fair, it's the application I would still be writing and would be some time off finishing; there's no way it would be as feature-complete as it is now had I been 100% hand-coding it. 

Not so elegant

1 min read

One thing I've been noticing with my current experiment with GitHub Copilot is that it seems to know Python well enough to write code that gets the job done, and sometimes well enough to write more modern, idiomatic Python, but even then it often writes the inelegant version.

It's hard to pin down exactly, and of course it's a matter of taste (my idea of elegant might burn someone else's eyes), but on occasion, as I review the code, I find things that make me go "ugh".

Here's an example: there's a function that Copilot wrote to extract the first non-markup paragraph of an article (so that it can be used as a page description). One thing it needs to do is skip any initial images, etc. It takes a pretty brute-force approach of looking at the start of each stripped line, but it gets the job done -- I can't really argue with that.

But here's how it does it:

# Skip markdown image syntax
if stripped.startswith("!["):
    continue

# Skip markdown linked image syntax ([![alt](img)](url))
if stripped.startswith("[!["):
    continue

# Skip HTML img tags
if stripped.startswith("<img"):
    continue

Now, this is good: it's using startswith. There are less-elegant approaches it could have used, so I'll give it bonus points for using that method. The thing is, though, it's testing each prefix one string at a time, pretty much rolling out a load of boilerplate code.

What bothers me here is that startswith will take a tuple of strings to test for. I find it curious that the generated code is idiomatic enough to know that startswith is a sensible option here, but at the same time it still writes the list of things to test out in a long-winded way.
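
For the record, the tuple version would collapse those three checks into one (the prefixes here are copied straight from the snippet above):

# One test instead of three: startswith accepts a tuple of prefixes.
if stripped.startswith(("![", "[![", "<img")):
    continue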

This is exactly the sort of thing I'd call out in a normal code review. Technically, if this wasn't mostly a "let's see how it goes about this with minimal input from me" experiment, I'd have called it out here too (as an experiment, I might go back and prompt it to "think" about this).

If I ever find myself using this sort of tool for generating code in a work setting, this is exactly the sort of thing I'll be watching for.

Solidarity, Empathy, and Patience -- thinking about code reviews

6 min read

So I saw this video...

A chiral crystal

Death Stranding, along with its sequel, is my absolute favourite video game ever, and probably one of my favourite pieces of fiction ever too; I've watched and read a lot about the game (not to mention I've played both releases a ton too). A good few months back I was watching a video about the making of the first game and during the video, around the 27:20 mark, the narrator says the phrase:

the game's core values of solidarity, empathy, and patience

This stood out to me. There was something about that phrase and what it meant given my experiences in Death Stranding and Death Stranding 2; it spoke to me enough that I jotted it down and kept coming back to it. It was a phrase I couldn't get out of my head.

Around the same time I was also doing a lot of thinking about, and note-writing about, code reviews. Although I've been working in software development¹ for a few decades now (I started in July 1989), I was quite late to the whole process of code review -- at least in the way we talk about it today. This mostly comes down to the fact that for a lot of my time I either worked in very small companies or I was the only developer around.

Given this, thinking about my own approach to reviews is something I've only really been doing for the past few years. I've made personal notes about it, read posts and articles about it, and had conversations about it; my thoughts and feelings have drifted a little but seem to have settled.

The idea of that phrase from the Death Stranding video crept into this thought process, as I felt it nicely summed up what a good code review would look and feel like. Weirdly I also realised that, perhaps, the things I like and value about Death Stranding are also the things I like, value, and want to embody when it comes to code reviews.

One of the big selling points for me, with Death Stranding, is the asynchronous multiplayer aspect of it; the reason Kojima calls it a "Strand type game". I have my game, I have my goal, but other people can indirectly affect it, either doing work in their game that leaks into mine in a beneficial way, or by leaking into mine in a way that I have to clean up. There's something really satisfying about this asynchronous collaboration. That feels similar to the collective effort of working in a single repository, each person working on their own branch or PR, sometimes in tandem, sometimes in series, sometimes to the benefit of each other and sometimes in ways that block each other.

But that's not the point here. There's a similarity if I think about it, but I don't want to get too carried away on that line of thought. It's the phrase from the video I care about; it's the approach of solidarity, empathy, and patience that I want to think about more.

Solidarity

In the mountains

This, for me, is all about where you position yourself when you approach reviewing code. I sense things only work well if you view the codebase as something with common ownership. I've worked on and with a codebase where the original author invited others in to collaborate but constantly acted as a gatekeeper, often one resistant to their own contributions being reviewed, and it was an exhausting experience.

I believe the key here is to work against a "your code" vs "my standard" approach, instead concentrating on an "our repository" view. That's not to say that there shouldn't be standards and that they shouldn't be maintained -- there should be and they should be -- but more to say that they should be agreed upon, mutually understood to be worthwhile, and that any callout of a standard not being applied is seen as a good and helpful thing.

The driving force here should be the shared intent, and how the different degrees of knowledge and experience can come together to express that intent. If a reviewer can see issues with a submission, with a proposed change or addition to the codebase, the ideal approach is to highlight them in such a way as to make it feel like we discovered them, not that I discovered them and you should sort it out. Depending on the degree of the proposed change, this might be expressed by (if you're using GitHub, for example) using the "Suggested Change" feature to feed directly back into the PR, or, for something a little more complex, by an offer to pair up and work on the solution.

Empathy

Sam and Fragile

As someone who has written a lot of code, and so written a lot of bugs and made a lot of bad design choices, I feel empathy is the easiest of the three words to get behind and understand, but possibly the hardest one to actually put into practice.

When you look at a PR, it's easy to see code, to see those design choices, and to approach the reading as if you were the one who had written it, assessing it through that lens. In my own experience, this is where I find myself writing and re-writing my comments during a review. As much as possible I try and ask the author why they've taken a particular approach. It could be, perhaps, that I've simply missed a different perspective they have. If that's the case I'll learn something about the code (and about them); if that isn't the case I've invited them to have a second read of their contribution. It seems to me that this benefits everyone.

I feel that where I land with this is the idea of striving to act less like a critic and more like a collaborator, and in doing so aiming to add to an atmosphere of psychological safety. Nobody should feel like there's a penalty to getting something "wrong" in a contribution; they should ideally feel like they've learnt a new "gotcha" to be mindful of in the future (both as an author and a reviewer). Done right the whole team, and the work, benefits.

Patience

Arriving at the plate gate

The patience aspect of this view of reviews, for me, covers a few things. There's the patience that should be applied when reading over the code; there's the patience that should be applied when walking someone through feedback and suggestions; and there's the patience needed by the whole team to not treat code reviews as a speed bump on the road to getting work over the line. While patience applies to other facets of a review too, I think these are the most important parts.

In a work environment I think it's the last point -- that of the team's collective patience -- that is the most difficult to embody and protect. Often we'll find ourselves in a wider environment that employs a myopic view of progress and getting things done, where the burn-down chart for the sprint is all that matters. In that sort of environment a code review can often be seen, by some, as a frustrating hurdle to moving that little card across that board. Cards over quality. Cards over sustainability.

It's my belief that this is one of those times where the phrase "slow down to speed up" really does apply. For me, review time is where the team gets to grow, to own the project, to own the code, to really apply the development and engineering principles they want to embody. Time spent on a review now will, in my experience, collectively save a lot more time later on, as the team becomes more cohesive and increasingly employs a shared intuition for what's right for the project.

This is not (entirely) a post about code reviews

Near the wind farm

The thing with code reviews, or any other team activities, is they don't exist in a vacuum. They take on the taste and smell of the culture in which they operate. It's my experience that it doesn't matter how much solidarity, empathy or patience you display during your day-to-day, if it's counter to the culture in which you work it's always going to be exhausting, it's always going to feel like a slog.

If leadership in technology, in software engineering, were to show more of these three basic qualities, they'd start to look like they realise that they're working with actual humans, not producers of code; and I think we need more of that now than at any time in the history of coding.

Since I first saw that video, and heard that phrase, and had it run around my head, I've come to feel that it's not just a good mantra for code reviews; I think it's a simple blueprint for what good tech leadership should look like. If there was more "Strand-type" leadership in my chosen industry I feel it would be more open, more accessible, would offer more psychological safety and ultimately would result in teams, and projects, that thrive.


  1. Or software engineering, if you prefer, but that's a whole other blog post I'll never get round to writing one day.