git2gantt -- Simple tool to visualise coding runs

Posted on 2019-12-08 13:44 +0000 in Python • Tagged with Python, documentation • 3 min read

At the start of this year, as part of a much bigger process to review the work that had taken place over the previous 12 months, I was asked (at work) to provide some information about how much time I'd spent on various projects. Now, for me, there's really only one project, but there are lots of different tools and libraries that I've written to support the main work I do. All of these are split into different repositories in the company-internal instance of GitLab. This meant that getting a rough idea of what I was working on and when would be easy enough -- it's all there in the commit history.

Given that this information would make up a couple of slides at most during a far bigger presentation, I wanted something that would be snappy and easy for non-developers to follow and understand. I spent a bit of time pondering some options and decided that (ab)using a gantt chart layout would make sense.

That choice was made all the easier given that GitLab supports the use of mermaid charts within its Markdown. This meant I could quickly write some code that took the git log of each repository and turned it into mermaid code, and then render it (by hand -- this was all about getting things done quickly) via GitLab.

This sounded like it could be a fun personal project. The result was some Python code called git2gantt.

As mentioned above, the output isn't anything too clever; it's just code that can be used to create a plot via mermaid. For example, here's the result of running git2gantt over itself:

gantt
  title git2gantt output
  dateFormat YYYY-MM-DD

  section git2gantt
  Development: devgit2gantt20190208, 2019-02-08, 2019-02-13
  Development: devgit2gantt20190214, 2019-02-14, 2019-02-15
  Development: devgit2gantt20190303, 2019-03-03, 2019-03-04
  Development: devgit2gantt20191203, 2019-12-03, 2019-12-04

Usage is pretty straightforward:

[Screenshot: git2gantt command-line usage]

As you can see, it can be run over multiple repos at once, and there's also an option to have it consider every branch within each repository. Another handy option is the ability to limit the output to just one author -- perhaps you just want to document what you've done on a repo, not the contributions of other people.

Also especially handy, if you don't want to bore people with too much detail, is the "fuzz" option. This lets you tell git2gantt how relaxed you want it to be when it tries to decide how long a run of work on a repo lasted. So, perhaps, you're working on and off on a library that supports some other system you're documenting, but you might only be making changes every other day or so. With the correct fuzz value you can make it clear you were working on the library for a couple of weeks, despite there only being a commit every other day.

The rendered output for a handful of projects looks something like this:

[Screenshot: rendered gantt chart covering a handful of projects]

This is one of those tools I knocked up quickly to get a job done, and haven't quite got round to finishing off fully. One thing I'd really like to do is add mermaid support directly within it, so that it actually has the option to emit plots, not just mermaid code (or, perhaps, drop the mermaid approach and use something else entirely).

Meanwhile though, if you're looking for something quick and dirty that will help you visualise what you've been working on, and when, over a good period of time... perhaps this will help.


Being phony, and Lispy regular expressions

Posted on 2019-12-01 16:42 +0000 in Emacs • Tagged with Emacs, Lisp, Emacs Lisp • 2 min read

While it does seem that they're a little out of fashion these days, in some circles anyway, I'm still an avid fan of make and Makefiles. Even in environments where I don't need a Makefile to actually build anything, I'll use one (or more) to help create handy shortcuts for getting stuff done.

Looking at the main Makefile for one of my major work projects, there are 45 targets that help fire off various jobs (all of them self-documenting, using a variation on an approach I read about a while back).

In most cases the targets aren't real targets. That is to say, they don't build the thing they're named after. They're phony targets. So, as makes sense, I make a point of marking them all as such. I follow the convention that has the .PHONY marker appear on the line before the target; this feels cleaner to me, and easier to follow and maintain.
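So a typical entry ends up looking something like this (an illustrative target, not one lifted from the real Makefile -- and note that the recipe line must be indented with a tab, as make demands):

# "lint" never names a real file; it's just a shortcut for running the linter.
.PHONY: lint
lint:
	pipenv run pylint .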

But... I'm lazy. And I use Emacs. Typing out .PHONY: foo all the time feels like far too much work. So, some time ago, I quickly threw together make-phony.el.

With this I could be really lazy. I could type out the Makefile target and then, with my cursor on it, press a key combination and have the .PHONY marker put in place.

Does it save much time? Yeah, probably not really. But it was a fun little exercise and an excuse to write a little bit of Emacs Lisp.

There's one thing I made a point of doing in the heart of this too: using rx. For anyone who doesn't know of it, think of it as a very Lispy way of writing regular expressions. I won't even try and explain it all here because others have done an excellent job already. What I will do is say this: if you're in the habit of writing some Emacs Lisp, or even tinkering with your configuration, and you find yourself writing a regular expression, consider looking at rx -- it's well worth the time to get to know it.

Slowly, as time goes on, I'm weeding out "vanilla" regular expressions from my config and code and moving over to using rx. I feel, quite rightly I think, that something like this:

(rx
 (or
  ;; Ignore hidden files.
  (group bol ".")
  ;; I never want to edit the desktop.
  (group "Desktop/" eol)
  ;; Ignore compiled files.
  (group "." (or "pyc" "elc") eol)
  (group ".egg-info/" eol)))

is much easier to write, read and maintain, than this:

"\\(^\\.\\)\\|\\(Desktop/$\\)\\|\\(\\.\\(?:\\(?:\\(?:el\\|py\\)c\\)\\)$\\)\\|\\(\\.egg-info/$\\)"

I mean, even if the regular expression above can be written in a more efficient way (and I imagine it can), as someone working in a Lisp environment, I'd much sooner write and work with the rx version.


EmacsConf 2019 videos are out

Posted on 2019-11-24 17:17 +0000 in Emacs • Tagged with Emacs • 1 min read

At the start of this month I spent a very enjoyable Saturday afternoon watching EmacsConf 2019. If you missed it, or even if you didn't, you might like to know that the announcement has gone out that almost all of the videos are now available for watching.

While I mean nothing negative about any other video, I think the two that stand out the most for me, in terms of things that grabbed my interest, were How a Completely Blind Manager/Dev Uses Emacs Every Day and Emacs as my Go To Script Language.

I seem to recall that both those talks were hit with technical issues (many talks on the day were -- part of the enjoyment for me was how the whole conference was an experiment in using Free Software to host it all), but they're well worth a watch despite this.

If you have any sort of interest in Emacs I'd highly recommend dipping into the whole collection.

I really hope there's going to be a 2020 EmacsConf.


Visual evolution of ~/.emacs.d

Posted on 2019-11-23 14:42 +0000 in Emacs • Tagged with Emacs, Lisp, Emacs Lisp • 2 min read

As detailed in a blog post I wrote back in 2016, I first got into using Emacs in the mid 1990s, starting with it on OS/2 and then moving over to GNU/Linux. It's been my often-used and much-loved development environment for most of those years (I even have a couple of packages that are part of Emacs itself).

For most of that time my configuration was a single ~/.emacs file, which was around 1,000 lines in length (including comments and whitespace). It'd grown over the years, having special configuration sections for versions of Emacs I didn't use any more, and operating systems I didn't work on any more (yes, really, there were things in there specific to MS-DOS, for example). On top of that I always hand-installed packages I used -- Emacs' package management system having turned up long after I first got into using Emacs.

Then, in early 2016, I decided to nuke the whole thing and start from scratch. As mentioned above, the start of this is detailed in an older post. Another big round of changes happened around a year later -- which included the birth of delpa to manage my personal packages. A couple or so months later there was one last big round of changes, mostly killing off my enthusiastic embracing of customize and instead going back to hand-set settings, only this time done via use-package.
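(If you've not met use-package: it gathers the loading and configuration of a package into a single declaration. As a flavour of what I mean by hand-set settings, here's a minimal sketch -- not a form lifted from my actual configuration:)

;; Install flycheck if needed and turn it on for programming modes.
(use-package flycheck
  :ensure t
  :hook (prog-mode . flycheck-mode))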

The full history of this can be found over on GitHub, starting with the first "throw everything away and start again" process and all the steps between then and where my Emacs configuration is now.

Which brings me to the fun part of this blog post. Earlier this week I stumbled on Gource. It's a tool that's primarily designed to visualise changes in repositories, although it can be used to visualise anything that has a tree structure and changes over time (this week I produced a video of the growth of my employer's electronic lab notebook by hooking up the Benchling API with Gource, for example). So I got curious. What did it look like as I reworked and tweaked and changed and tinkered with my Emacs configuration?

This is what it looked like:

[Embedded video: the evolution of my ~/.emacs.d visualised with Gource, on YouTube]
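If you fancy trying the same thing on a repository of your own, the basic invocation is pleasingly simple. These are standard Gource options, although the exact values here are just my taste:

# Play back the repository's history at one day per second.
gource -1280x720 --seconds-per-day 1 --auto-skip-seconds 0.5 ~/.emacs.d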


Getting started

Posted on 2019-11-17 11:36 +0000 in Coding • Tagged with Coding, learning • 2 min read

By coincidence, in a couple of different places over the last couple of weeks, the subject of "how do I progress in learning to program?" has cropped up. For me, I think the approaches and solutions tend to be the same as when I want to get my head around a new language: read good examples of idiomatic code, read other related materials, find a problem you care about and implement a solution (ideally something you'll directly benefit from, or at least others may benefit from). Hence the 5x5 puzzle and Norton Guide reader projects I mentioned in my previous post.

Of course, not everyone has problems that they need solving in a way that would work for this approach. So another approach I've recommended in the past is to go looking somewhere like GitHub and find projects that promote "low-hanging fruit" issues in a way that's designed to be friendly for those who are new to development, new to contributing, or new to the problem domain.

While looking for examples of this yesterday I stumbled on Awesome for Beginners. This looks like a great list and one I'm going to keep bookmarked for future reference. Now, this particular list does seem to have an emphasis on pulling in people who are new to contributing to a project rather than new to development, but it does strike me as a good place to start looking no matter where you're coming from.

I know I'm going to start having a wander around that list. It's always nice to contribute and I feel there's real personal benefit in actively solving a problem that someone else has and welcomes help with.


Going on a journey

Posted on 2019-11-10 14:32 +0000 in Python • Tagged with Python • 3 min read

It's hardly a revelation to say that learning a new programming language, or even learning software development at all, is even more difficult if you don't have an actual problem to solve. I know I'm not alone in having pet projects that, when faced with a new environment, I'll code up afresh as a way to get familiar with and understand a language's idioms while implementing something I know well.

Personally, my two favourites are a puzzle called 5x5 (here, here, here, here, here, here and here), and writing a library or even a full application to read Norton Guide database files (here, here, here, here, here, here, here and here). Both are fun to work on, have practical uses, and both have the benefit of being solved problems (for me) that let me concentrate on the "how do I do X in this language/toolkit/environment/framework/etc?".

Even with those two as my go-to projects, I'm always open to new small problems that might be fun to apply to languages I do know, or languages I want to get to know (internally at work we have a fun "league" of sorts, writing a particular Hamming distance calculation tool in different languages, for example).

A few days ago, via this repo on GitHub, I discovered this fun little problem. Right away I could see the benefit in it. As a "go away and code up a solution" interview question it strikes me as near-perfect. It's obviously not hard to solve, but it touches on some basic but important aspects of software development and so will allow the developer to show off how they approach things.

There are so many different approaches to it too. Even in a single language, I could imagine having some fun writing the smallest code to solve the problem, the most idiomatic code to solve the problem, the most supportable and well-documented code to solve the problem, etc. And then there's the thing I talk about above: knowing the solution, and knowing it's easy, you can use it to learn the idiomatic way of solving the problem in new languages.

Even better, the README of the original repo links to solutions others have written. Knowing the problem, and knowing the solution, you can then go and read other people's code and learn something about different styles and different languages.

Over the next few weeks, as I get free time, I think I might just do this. Take the "Journeys" problem and write versions in different languages I work with, or know, and also use it to get to know languages I've yet to know or use heavily (I'm especially keen to try a version in Julia -- a language I really like the look of and want to find a reason to use).

Meanwhile, yesterday, I had a quick go at a first version in Python (aimed at Python 3.8 or higher): https://github.com/davep/journeys.py

I set out to try and write something that was fairly idiomatic Python, which uses tools that I tend to employ when working on Python projects (pipenv, make, etc), and which also used something I've never quite found a need for so far in my usual coding, but which I can see being useful and helpful.

I even threw in a couple of uses of PEP 572!
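(For anyone who's not run into it yet, PEP 572 added assignment expressions -- the "walrus operator" -- which let you bind and test a value in one step. Just to give a flavour, here's an illustrative snippet rather than code from journeys.py itself:)

import re

line = "depart 09:15, arrive 11:30"
# Bind the match object and test it in a single expression.
if (match := re.search(r"\d\d:\d\d", line)):
    print(f"First time mentioned: {match.group(0)}")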

I can see me tinkering with this some more over the next few days. I can even see me writing a very different implementation in Python, just for the fun of it.

I think that's what I like about this little problem. It's a good way to get a bit of programming exercise; it's the programming equivalent of going for a short run.


My Pylint shame

Posted on 2019-11-04 20:39 +0000 in Python • Tagged with Python, fish • 3 min read

I first got into Python in the mid-to-late 1990s. It's so far back that I think the copy of Programming Python that I have (sadly in storage at the moment) might be a first edition. I probably fell out of the habit of using Python some time in the early 2000s (that was when I met Ruby). It was only 22 months ago that I started using Python a lot thanks to a change of employer.

As you might imagine, much had changed in the 15+ years since I'd last written a line of Python in anger. So, early on, I made a point of making Pylint part of my development process. All of my projects have a make lint target. All of my projects lint the code when I push to master in the company GitLab instance. These days I even use flycheck to keep me honest as I write my code; mostly gone are the days when I don't know of problems until I do a make lint.

Leaning on Pylint in the early days of my new position made for a great Python refresher for me. Now, I still lean on it to make sure I don't make daft mistakes.

But...

Pylint and I don't always agree. And that's fine. For example, I really can't stand Pylint's approach to whitespace, and that is a hill I'll happily die on. Ditto the obsession with lines being no more than 80 characters wide (120 should be fine thanks). As such any project's .pylintrc has, as a bare minimum, this:

[FORMAT]
max-line-length=120

[MESSAGES CONTROL]
disable=bad-whitespace

Beyond that though, aside from one or two extras that pertain to particular projects, I'm happy with what Pylint complains about.

There are exceptions though. There are times, simply due to the nature of the code involved, that Pylint's insistence on code purity isn't going to work. That's where I use its inline block disabling feature. It's handy and helps keep things clean (I won't deploy code that doesn't pass 10/10), but there is always this nagging doubt: if I've disabled a warning in the code, am I ever going to come back and revisit it?
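For example -- and this is an illustrative snippet, not code from one of my projects -- shutdown code is the sort of place where a broad except is reasonable, so I'll silence the warning right where it happens:

def safe_close(connection):
    """Close a connection, ignoring any error; we're shutting down anyway."""
    try:
        connection.close()
    except Exception:  # pylint:disable=broad-except
        pass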

To help me think about coming back to such disables now and again, I thought it might be interesting to write a tool that'll show which warnings I disable most. It resulted in this fish abbr:

abbr -g pylintshame "rg --no-messages \"pylint:disable=\" | awk 'BEGIN{FS=\"disable=\";}{print \$2}' | tr \",\" \"\n\" | sort | uniq -c | sort -hr"

The idea here being that it produces a "Pylint hall of shame", something like this:

  12 wildcard-import
  12 unused-wildcard-import
   8 no-member
   6 invalid-name
   5 no-self-use
   4 import-outside-toplevel
   4 bare-except
   2 unused-argument
   2 too-many-public-methods
   2 too-many-instance-attributes
   2 not-callable
   2 broad-except
   1 wrong-import-position
   1 wrong-import-order
   1 unused-variable
   1 unexpected-keyword-arg
   1 too-many-locals
   1 arguments-differ

To break the pipeline down:

rg --no-messages "pylint:disable="

First off, I use ripgrep (if you don't, you might want to have a good look at it -- I find it amazingly handy) to find every line, in and below the current directory, that contains a Pylint block disable. The --no-messages switch simply silences any file I/O errors that might result from permission issues -- they're not interesting here. And if you tend to format your disables differently, you'll need to tweak the regular expression, of course.

I then pipe it through awk:

awk 'BEGIN{FS="disable=";}{print $2}'

so I can lazily extract everything after the disable=.

Next up, because more than one thing can be disabled at once, I use tr:

tr "," "\n"

to turn any comma-separated list into multiple lines.

Having got to this point, I sort the list, run it through uniq while prepending a count (-c), and then sort the result again, in reverse, comparing the numbers as a human would read them (-hr).

sort | uniq -c | sort -hr

It's short, sweet and hacky, but does the job quite nicely. From now on, any time I get curious about which disables I'm leaning on too much, I can use this to take stock.


EmacsConf 2019

Posted on 2019-11-02 11:33 +0000 in Emacs • Tagged with Emacs • 1 min read

For anyone who doesn't know, today is the day for EmacsConf 2019. As you might gather from the name, it's a conference all about Emacs, the joys of Emacs, Emacs Lisp and other Emacs-related things.

Better still, it's an online conference (although there do appear to be a couple of related physical gatherings around the world).

I've got snacks and drinks in and no plans for my Saturday, and I hope to follow the whole thing from start to finish.

If Emacs is your thing, or knowing more about Emacs is your thing, be sure to check it out!


pydscheck -- A quick hack that keeps slowly growing

Posted on 2019-10-26 13:19 +0100 in Python • Tagged with Python, documentation • 3 min read

Something I always try to do when I'm coding is be consistent. I feel this is important. While people's coding standards may differ, I think different approaches are easier to handle if someone has been consistent with their style across all of their code.

This goes for documentation too.

In my current position, I do a lot of Python coding, and one of the things I like about Python (there are things I don't like too, but that's not for now) is that it has doc-strings (just like my favourite language). I use them extensively, ensuring every function and method has some form of documentation, and generally I use Sphinx to generate documentation from those doc-strings.

Early on I was bothered by the fact that, just by the simple act of making typos, I wasn't keeping the form of my doc-strings consistent. And in this case it was a really simple thing that was bugging me. Normally, if I'm writing a single-line doc-string, I'll write it like this:

def one_liner():
    """Here is a one-line doc-string."""

So far, so good. But, if the doc-string is a multi-liner, I prefer the ending quotes to be on a line of their own, like this:

def multi_liner():
    """Here is the first line.
    Here is another line.
    Here is the final line.
    """"

But, sometimes, by accident, I'd leave a doc-string like this:

def multi_liner():
    """Here is the first line.
    Here is another line.
    Here is the final line."""

While it's really not a big deal, it would bug me and every time I found one like this I'd "fix" it.

Eventually, it bugged me enough that I decided I was going to write a little tool to find all such instances in my code and report them. My first approach was to think "I could just do this with some regexp magic", which was really a bad idea. Then I thought: I know, I should use this as an excuse to play with Python's ast library.

That worked really well! I had the first version of the code up and running in no time. It was simple but did the job. It ran through Python code I threw at it and alerted me to both missing doc-strings, and doc-strings with the ending I didn't like.

That served me for a while, until one day I realised that it wasn't quite doing the job correctly; it was only really looking at top-level functions and top-level methods in classes. Sometimes, not often, but sometimes, I'll define functions within functions, and I feel they deserve documentation too. So then I modified the code to ensure it walked every part of the AST.
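A minimal sketch of that walk might look something like this -- to be clear, this is just the core idea, not pydscheck's actual code, and the names are mine rather than the tool's:

import ast
import sys

def check_docstring_endings(source, filename):
    """Report multi-line doc-strings whose closing quotes share the last line."""
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            # Ask for the raw doc-string so any trailing newline survives.
            doc = ast.get_docstring(node, clean=False)
            if doc and "\n" in doc and not doc.rstrip(" \t").endswith("\n"):
                print(f"{filename}: {getattr(node, 'name', '<module>')}: "
                      "closing quotes should be on their own line")

check_docstring_endings(open(sys.argv[1]).read(), sys.argv[1])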

Since then, when I've run into new things and had new ideas, pydscheck has grown and grown. I've added checks that all mentioned parameters have a type; I've added checks that any function/method that returns something actually documents the return value; I've added checks that any documentation of a returned value includes its type; I've added checks that any function or method that yields a value documents that fact; I've added checks that ensure that every parameter is documented in some way.
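To make that concrete, here's the shape of doc-string those checks steer me towards (an illustrative function in the Sphinx style I use, not one from a real project):

from math import sqrt

def hypotenuse(a, b):
    """Calculate the length of a right-angled triangle's hypotenuse.

    :param float a: The length of one side.
    :param float b: The length of the other side.
    :returns: The length of the hypotenuse.
    :rtype: float
    """
    return sqrt(a ** 2 + b ** 2)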

Each time I've done this it's helped uncover issues in my code's documentation that could be cleaner, and it's also given me a pet project to slowly better understand Python's AST.

It could be that there are better tools out there; I'd have thought that a good doc-string linting tool would be something someone had already written. But this time around I was happy to NIH it because I needed a fun learning exercise that would also have some benefits for my day-to-day work.

I'll caveat this with the fact that it's very particular to how I work and how I like my documentation to look, but if it sounds useful, here it is: https://github.com/davep/pydscheck.

There's still lots I could do with it. First off, I should really package it up properly so it can be installed as a command-line tool via pip. It would also be handy to allow some form of customisation of how it works. I'm sure there are other fun things I can do with it too.

That's part of the fun of having a pet project: you can tinker when you like and also get benefits from it as you use it.


Why I really like fish abbreviations

Posted on 2019-10-23 00:18 +0100 in Tech • Tagged with fish, shell • 4 min read

I'm filing this as a TIL because, while it wasn't T, I did L it very recently, and it was a new trick that impacted on around 25 years of prior working practice.

I think it must have been around 1991 when I first encountered 4DOS. While I'd used the odd Unix shell here and there previously, it'd only been in passing. It was 4DOS that first introduced me to the power of aliases on the command line. Many of the aliases I set up and used in 4DOS still remain with me to this day, on GNU/Linux and macOS, in some form or another.

I'm sure I don't need to tell anyone reading this why aliases are cool and handy and pretty much vital if you do lots of work on the command line.

And then, a couple or so weeks ago, as a very recent convert to fish, I discovered the abbr command. At first glance it didn't seem to make much sense. It was like alias, only it expanded what you typed rather than acting as a command in its own right.

I did a bit of digging and some of it started to make sense. One thing that really won me over -- even though it's something that doesn't directly impact on me -- was the argument that it allows for a far more transparent command history, especially if you're likely to use a transcript of a shell session in a place where people might not know, or have access to, your aliases.

Imagine being in a position where you have loads of handy and cool aliases, but you also need to record what you've done so other people can follow your work (does it show that I sit amongst people who maintain lab notebooks?); it seems like it would be a bit of a bother to record all of the aliases in your working environment up front. Without that information, few people will be able to make sense of the recorded commands; even with it, they'd still need to double-check what each command does.

So imagine an alias that, when used, expands in place. Then you'd get all of the benefit of aliases while also having a full and readable record of what you actually did.

Seems neat!

Here's a silly example. For a long time I've carried around an alias called greedy that runs something like this:

du -hs * | sort -rh

It's pretty straightforward: I'm using du to get a sense of which directories are using what space, and then using sort to make a worst-to-best-offender list out of it. So I could use an alias:

alias greedy="du -hs * | sort -rh"

The only downside to this is that, any time I run it, if I were to record the shell session and make it available for someone else to read, they'd just see:

~/develop$ greedy
1.1G    JavaScript
824M    C
699M    rust
 93M    python
 33M    fonts
 33M    elisp
3.4M    zsh
3.0M    misc
1.1M    bash
840K    ocaml
428K    C++
316K    lisp
172K    Swift
152K    git
132K    ruby
 28K    ObjC

Now, with an abbreviation rather than an alias, I'd type greedy but as soon as I hit Enter it'd get expanded to something anyone could read and follow:

~/develop$ du -hs * | sort -rh
1.1G    JavaScript
824M    C
699M    rust
 93M    python
 33M    fonts
 33M    elisp
3.4M    zsh
3.0M    misc
1.1M    bash
840K    ocaml
428K    C++
316K    lisp
172K    Swift
152K    git
132K    ruby
 28K    ObjC

This is far from the only benefit of abbreviations; for most people it probably isn't one of the most important ones, but I find it neat and compelling and this alone drove me to rework almost all of my aliases as abbreviations.
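For the greedy example above, that rework is a one-liner, using the same abbr -g form as the pylintshame example from an earlier post:

abbr -g greedy "du -hs * | sort -rh"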

Having done that, I get other benefits too. For example, fish (like other shells) has good support for argument completion for well-known commands. The problem is, if you alias such a command, you don't get that completion. With an abbreviation though you do! All you need to do is type the abbreviation, hit space and it'll expand to the underlying command and then the full range of completion can happen.

There's also one last reason why I like abbreviations over aliases, and it's kind of a silly one, but in a good way. It's actually fun to see what you type magically expand as you do things, it makes you look like you can type even faster than you normally can! ;-)

PS: If you've never tried fish before and you're curious, it's easy to try in your browser.