Reading 2bit files (for fun)

Posted on 2020-08-30 15:20 +0100 in Coding • Tagged with Bioinformatics • 6 min read

Introduction

I've written a bit before about the value of having simple but interesting "problems" that you know the solution to, as a way of exercising yourself in a new environment. Recently I've added another to the list I already have, and I used it as an excuse to get back into writing some Common Lisp; I then went on to use it as a reason to write yet another package for Emacs.

Having gone through the process of writing code to handle 2bit files twice in about a month, in two very similar but slightly different languages, I thought it might be interesting to use the experience to exercise my ability to write blog posts (something I always struggle with -- I find writing very hard on a number of levels), especially posts that explain a particular problem and how I implemented code relating to it.

Also, because the initial version of this post rambled on a bit too much and I lost the ability to finish it, I'm starting again and will be breaking it up into a number of posts spanning a number of days -- perhaps even weeks -- so that I don't feel too overwhelmed by the process of writing it. I will, of course, make sure every post links to the other posts.

Now, before I go on, I'll make the important point that everything here is written from the perspective of a software developer who happens to work as part of a bioinformatics team; I don't do bioinformatics, I don't claim to understand it, I just happen to sit with them (well, used to sit with them -- hopefully we'll all make it back to the office one day!), work with them, and develop software that supports their work. Anything you see in any of these posts that's wrong about that subject is just my ignorance showing through the lens of a software developer (all corrections are welcome).

So, with all those disclaimers aside, I'm going to go on a slow wander through what a 2bit file is, how you'd go about reading it, and related issues. This isn't designed as a tutorial or anything like that; this is simply me taking what I've learnt and writing it down. Perhaps someone else will benefit one day, but the purpose is simply to cement it in my own mind and to enjoy the process of putting it all in writing.

What is a "2bit file"?

So what's this new "problem" I've added to my list? It's code to read sequence data from 2bit format files. For anyone who doesn't know (bioinformatics people look away now; a software developer is going to explain one of your file formats), this is a file format that is intended to hold sequences in an efficient way. As I'm sure you know, DNA is made up of 4 bases, represented by the letters T, C, A and G. So, in the simplest case, we could just represent a genome using those four letters. Simple enough, right? Nice big text file with just those 4 letters in?

The thing is, something like the human genome is around 3 billion bases in length. That'd make for a pretty big file to have to store and move around. So why not compress it down a bit? That's where the 2bit format comes in.

Given this problem I'm sure most developers would quickly notice that, with only 4 different characters, you need just 2 bits to hold them all (two bits gets us 00, 01, 10 and 11, so four different states). This means that with a little bit of coding you can store 4 bases in a single byte. Just like that you've pretty much squished the whole thing down to 1/4 of the original size. And that's more or less what the 2bit format does. If you take a look at the actual data for the human genome you'll see that hg38.2bit is roughly 1/4 of 3 billion bytes, ish, give or take.
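Just to illustrate the idea (and only the idea -- the exact bit values 2bit assigns to each base, and the order it packs them in, are details for a later post), here's a quick Python sketch of squeezing four bases into one byte, using an arbitrary mapping of my own:

# A toy illustration of the 2-bits-per-base idea. The mapping used here is
# arbitrary and just for the example; it isn't necessarily the encoding
# that 2bit itself uses.
BASE_TO_BITS = { "T": 0b00, "C": 0b01, "A": 0b10, "G": 0b11 }
BITS_TO_BASE = { bits: base for base, bits in BASE_TO_BITS.items() }

def pack( bases ):
    """Pack four bases into a single byte."""
    byte = 0
    for base in bases:
        byte = ( byte << 2 ) | BASE_TO_BITS[ base ]
    return byte

def unpack( byte ):
    """Unpack a single byte back into four bases."""
    return "".join( BITS_TO_BASE[ ( byte >> shift ) & 0b11 ] for shift in ( 6, 4, 2, 0 ) )

print( pack( "TCAG" ) )   # 27
print( unpack( 27 ) )     # TCAG

Four characters in, one byte out, and back again -- that's the whole trick, more or less.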

There is a wrinkle, however. There are parts of a genome where you might not know what base is there. Generally an N is used for that. So, actually, we want to be able to store 5 different characters. But 5 isn't going to go into 2 bits... Damn! Well, it's okay, 2bit has a solution to that too, and I'll cover that later on.

How is a 2bit file formatted?

As you can see from the format information available online, a 2bit file is a binary file format that is split into 3 key parts:

  • A fixed size header with some key information
  • An index into the rest of the file
  • A series of records that contain actual sequence information

In this first post I'll cover the details of the header. Subsequent posts will cover the index and the actual sequence data records.

The header

The header of a 2bit file is fixed in size and contains some key information. It can be broken down as follows:

Content         Type     Size     Comments
Signature       Integer  4 bytes  See below for endian issues.
Version         Integer  4 bytes  Always 0.
Sequence count  Integer  4 bytes
Reserved        Integer  4 bytes  Always ignored.

The signature value is used to test whether what you're looking at is a 2bit file, but it also tells you some vital information about how to read the file -- see below for more on that. The version value is always 0 -- as such, another useful test would be to error out if you get a valid signature but a version other than 0. The sequence count is, as you'd guess, the number of sequences held within the file -- this is important when loading the index of the file (more on that in the next post).
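To make that a little more concrete, here's a rough Python sketch of pulling those four values out of the first 16 bytes, once the byte-order question (covered next) has been settled -- the function and parameter names here are just mine for illustration:

import struct

def read_header( header, byte_order ):
    """Unpack the 16-byte 2bit header; byte_order is "<" or ">"."""
    signature, version, sequence_count, reserved = struct.unpack( byte_order + "IIII", header[ 0:16 ] )
    return signature, version, sequence_count, reserved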

The signature, big and little endianness, and byte swapping

The header mentioned above is made up of 4 32-bit word values. The very first word is important to how you read the rest of the file. This is the signature for the 2bit file and it should always be 0x1A412743. And this is where it gets interesting and fun right away. The 2bit file format allows for the fact that the file can be built on either a little-endian or a big-endian machine, and the 32-bit word values can be binary-written to the file in the local architecture's byte order. The effect of this is that, from reading the very first value in the file, you need to decide if every other numeric value you read needs to be byte-swapped in some way. The early logic being (in no particular language) something like:

if signature == 0x1A412743 then
  must_swap = False
else if byte_swap( signature ) == 0x1A412743 then
  must_swap = True
else
  raise "This isn't a valid 2bit file"
end

Simply put, to read the rest of the file you will need a function that byte-swaps a 32bit numeric value, and a flag of some sort to mark that you need to do this every time you read such a value. Of course, depending on your language of choice, you could do it in a number of different ways. In a language like JavaScript or Scheme, where you can easily throw around functions, I'd probably just assign the appropriate 32bit-word-reading function to a global function name and call that regardless throughout the rest of the code. In other languages I'd probably just check the flag each time and call the swapping function if needed. In something like Python I'd likely just use the signature to decide on which format to pass to struct.unpack. For example, some variation on:

# Assuming that 'header' is the whole header of the file read as a binary buffer.

import struct

word_fmt = ""

for test_fmt in ( "<I", ">I" ):
    if struct.unpack( test_fmt, header[ 0:4 ] )[ 0 ] == 0x1A412743:
        word_fmt = test_fmt
        break

if not word_fmt:
    raise Exception( "This isn't a 2bit file" )

Now, the Python approach sort of hides the important detail here. With it we'd simply use struct.unpack's ability to handle different byte orders and not worry about the detail. Which isn't fun, right? So how might code to byte-swap a 32bit value look?

Assuming you've got the value as an actual numeric integer, it can be as simple as using a bit of bitwise anding and shifting. Here's the basic code I wrote in Common Lisp, for example:

(defun swap-long (value)
  "Swap the endianness of a long integer VALUE."
  (logior
   (logand (ash value -24) #xff)
   (logand (ash value -8) #xff00)
   (logand (ash value 8) #xff0000)
   (logand (ash value 24) #xff000000)))

JavaScript might be something like:

function swapLong( value ) {
    return ( ( value >> 24 ) & 0xff       ) |
           ( ( value >>  8 ) & 0xff00     ) |
           ( ( value <<  8 ) & 0xff0000   ) |
           ( ( value << 24 ) & 0xff000000 )
}

and other variations on that theme in different languages.
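One of those variations, in Python, might look something like this (just a sketch of the same bitwise approach, assuming the value is already sitting in a plain integer rather than still being bytes):

def swap_long( value ):
    """Swap the byte order of a 32-bit integer VALUE."""
    return ( ( value >> 24 ) & 0x000000ff ) | \
           ( ( value >>  8 ) & 0x0000ff00 ) | \
           ( ( value <<  8 ) & 0x00ff0000 ) | \
           ( ( value << 24 ) & 0xff000000 )

# swap_long( 0x1A412743 ) == 0x4327411A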

Up next

In the next post I'll write about how the sequence index is stored and how to load it, including some considerations about how to make the loading as efficient as possible.


The PEP 8 hill I will die on

Posted on 2020-08-23 16:54 +0100 in Python • Tagged with Python • 3 min read

I first learnt Python back in the mid-to-late 90s, used it in place of Perl once I was comfortable with it, and then we sort of drifted apart when I first met Ruby. It's only in the last couple of years that I've got back into it, and in a huge way, thanks to my (not-quite-so-) new job. Despite the quirks and oddness (as I perceive them), I actually quite like Python and it's one of those languages that just flows off my fingers. I'm sure you know the feeling: perhaps not with Python, but there will be languages that just flow for you, and those that take a bit more effort and concentration. Python... feels okay to me.

I also appreciate that there's been a long-standing style guide. I quite like PEP 8 as a read, and think there are a lot of good ideas in there; much of the content sits well with how I'd approach things if I were tasked with coming up with such a document. With this in mind, I'm a fairly heavy user of pylint, which in turn leans on PEP 8 (amongst other things), and I'm happy to accept most of its judgements. Not all of its judgements, but even when I disagree with it I try and keep track of how far I'm drifting.

But there is absolutely one hill I will happily die on when it comes to PEP 8: the concept of "extraneous whitespace" in lists and expressions. Just.... no! Oh gods no!

To borrow a line of code from the journey problem I dabbled with a while back, PEP 8 would have me write something like this:

def perform(commands: List[str], state: State) -> State:

Now, I'm sure plenty of people won't see a problem with this at all; but all I can see is an almost-claustrophobic parameter list. What's with the parameters being jammed up against the opening and closing parens? Why is the dinky little comma lost between two different things? Why make it look like one long stream of letters and punctuation? Why....

No.

Just no.

I can't.

Rightly or wrongly, I just need for the code to breathe a bit. When I type this:

def perform( commands: List[ str ], state: State ) -> State:

suddenly it feels like there's fresh air in the code, like it flows gently out of my head, off my fingers, through the keyboard and into the buffer.

In my head, and to my eyes, the code is.... relaxed.

Do I have a rational reason for this? Nope. Then again I don't see one for doing it the other way either; I can't think of one and I don't see one in the source document. So, that's a warning I always turn off with pylint and it's a style I carry through all my Python code; and I think that's the important point here: anyone reading and working with my code should see the same style all the way through. It might differ from PEP 8 on this point, but at least it's the same all the way.
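For anyone wondering, turning a check like that off is just a case of disabling it in the pylint configuration; something like this in a .pylintrc would do it (assuming a pylint version that still ships its whitespace check under the bad-whitespace name):

[MESSAGES CONTROL]
disable=bad-whitespace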

And, really, that's okay: PEP 8 is there to be ignored. ;-)

PS: This is a small part of another blog post I was meaning to write, and might still do, about my (still ongoing) experience of getting lsp-mode up and running in Emacs and having it play nice with Python projects. I have that working, but it was a bit of a learning curve and epic battle over a couple of days, and one that had me first encounter pycodestyle. I may still tell the tale...


Swift TIL 12

Posted on 2020-07-18 10:16 +0100 in TIL • Tagged with Swift, Apple • 3 min read

First a small aside: to be honest, the T part of TIL is getting to be less and less true with this series of posts, but the posts themselves serve a useful purpose for me. As I've been reading the book I'm working through, I've also been making notes on my iPad in the Apple Notes application. I'm not really convinced that that's the best final location for such notes, so early on I added an extra step to keeping track of what I'm doing and trying to reinforce what I'm learning: transferring the notes into Org-Mode documents in my notebook repository.

This repository contains lots of Org-Mode files, broken down into subject directories, that hang on to longer-form information I want to keep track of regarding software development and general operating system use. I'm sure you know the sort of thing, the things where you know you know them but you can't always retain all the detail -- so having the detail where you know you'll find it is useful.

So I've made that part of the process: read book, find thing worth noting, note in Apple Notes, a wee while later re-read and test out by writing sample code and transfer to the Org-Mode notebook. During that latter step I sometimes also write up something that I really liked or found interesting here to further reinforce the learning process.

The "TIL" I wanted to note quickly today is how happy I was to see that Swift has something that's a reasonably recent addition to Emacs Lisp: conditional binding. Skipping off into Emacs Lisp for a moment, it would be very common for me to find myself writing something like this (this actually happens in all languages really):

(let ((foo (get-something-from-elsewhere)))
  (when foo
    (do-something-with foo)))

Quite simply: I'd get a value from elsewhere, that value could possibly be nil to mark that there was a failure to get a value, but that failure wasn't in any way fatal or even a problem worthy of note: I just needed to skip along. But that binding followed by the test was always a little jarring. And then, back in 25.1, if-let and when-let got added (of course, this being Lisp, it would have been very simple to add them anyway), and it was easier to write the code so it looked just a little nicer:

(when-let ((foo (get-something-from-elsewhere)))
  (do-something-with foo))

It's a small difference, but I find it a pleasing one.

So of course I was pleased to find that Swift has something similar with if and Optional values, where you can write something like:

if let foo = getSomethingFromElsewhere() {
    // Do something with foo but only if it's not nil
}

Which means you can do things like this (not that I'd really do things like this, but it was a handy test on the command line):

func oddRand() -> Int? {

    let n = Int.random( in: 0...99 )

    if n % 2 == 0 {
        return nil
    }

    return n
}

for _ in 0...10 {
    if let n = oddRand() {
        print( n )
    } else {
        print( "Nope" )
    }
}

As usual... of course that's horrible code, it was just thrown together to test/experience the language feature on the command line.

I like it though. I figure all the best languages have conditional binding. ;-)


Swift TIL 11

Posted on 2020-07-11 21:34 +0100 in TIL • Tagged with Swift, Apple • 2 min read

This is one of those things, especially this little play to help appreciate the feature, that I'm filing under "kinda cool, but I am never doing this in production".

So Swift has operator overriding, and then some. Moreover, operators are, in effect, functions, just with some extra syntax support. None of this is new to me; I've worked in and with other languages that have this ability, except... I don't ever recall working in a language that, at the time I did, supported creating brand new operators (okay, fine, Lisp is a bit of an outlier here and there's all sorts of fun conversations to be had there -- but still, let's stick with more "conventional" syntax here). Support always seemed to be about extending the ability of an existing operator.

Swift though... yeah, you get to pick from a huge character space when it comes to creating operators.

Which got me thinking... How cool would it be to have a prefix operator that turns a floating point number into a currency-friendly number (you know, the sort of number type that can be used for currency-friendly calculations).

Swift has the Decimal type which, at first glance anyway, looks to be a sensible candidate; even if it isn't (and, really, how to actually sensibly handle currency is a whole series of blog posts in its own right, which I have no wish to write myself because such things are a previous working life for me, and other people have doubtless done a very comprehensive job elsewhere over the years) it will serve as a good stand-in for the little bit of horror I'm going to write next.

So... Let's say we want to use £ as a prefix operator to say "see this number? make it a decimal", we could do something as simple as this:

import Foundation

prefix operator £
prefix func £( n : Decimal ) -> Decimal { n }

print( £1.20 + £0.75 + £0.01 )

Horrifically and delightfully, it works:

$ swift run
[3/3] Linking opover
1.96

I know, I know, I feel bad for even trying. But it's also kinda cool that the language has this.

It gets better though...

While reading up on what characters can and can't be used as operators, one thing that stood out was the fact that, more or less, any character that isn't a valid identifier can be used as an operator. So... hang on, we can use "emoji" as identifiers?

Like this?

import Foundation

prefix operator £
prefix func £( n : Decimal ) -> Decimal { n }

let 💵 = £1.20 + £0.75 + £0.01

print( 💵 )

Why yes. Yes we can. 😈


When the man page fibs

Posted on 2020-07-10 20:58 +0100 in Coding • Tagged with homebrew, macOS, Unix, Python • 3 min read

Earlier this week something in my development environment, relating to Homebrew, Python, pyenv and pipenv, got updated and broke a handful of repositories. Not in a way that I couldn't recover from, just in a way that was annoying, got in the way of my workflow, and needed attention. (note to self: how I set up for Python/Django development on a machine might be a good post in the future)

Once I was sure what the fix was (pretty much: nuke the virtual environment and recreate it with pipenv, being very explicit about the version of Python to use) the next step was to figure out how many repositories were affected; not all were and there wasn't an obvious pattern to it. What was obvious was that the problem came down to python in the .venv directory pointing to a binary that didn't exist any more.

So... tracking down problematic repositories would be simple enough, just look for every instance of .venv/bin/python and be sure it points to something rather than nothing; if it points to nothing I need to remake the virtual environment.

I quickly knocked up a script that was based around looping over the results of a find, and initially decided to use file to perform the test on python. It seemed to make sense; as I wrote the script I checked the man page for file(1) on macOS and sure enough, this exists:

-E On filesystem errors (file not found etc), instead of handling the error as regular output as POSIX mandates and keep going, issue an error message and exit.

Given that file dereferences links by default, that should get me an error for a broken link, right? Bit hacky I guess, but it was the first thing that came to mind for a quick bit of scripting and would do the trick. Only...

$ file -E does-not-exist
file: invalid option -- E
Usage: file [bcCdEhikLlNnprsvzZ0] [-e test] [-f namefile] [-F separator] [-m magicfiles] [-M magicfiles] file...
       file -C -m magicfiles
Try `file --help' for more information.

Wat?!? But it's right there! It says so in the manual! -E is documented right in the manual page! And yet it's not in the valid switch list as put out by the command, and it's an invalid option. The hell?

So I go back and look at the man page again and then I notice it isn't in the list of switches in the synopsis.

SYNOPSIS
file [-bcdDhiIkLnNprsvz] [--extension] [--mime-encoding] [--mime-type] [-f namefile] [-m magicfiles] [-P name=value] [-M magicfiles] file
file -C [-m magicfiles]
file [--help]

I then did the obvious tests. Did I have file aliased in some way? No. Was some other thing that works and acts like file in my path? No. Was I absolutely 100% using /usr/bin/file? Yes.

Long story short: it seems the man page for file, on macOS, fibs about what switches it supports; it says that -E is a valid option, but it's not there.

What's even odder is the man page says it documents v5.04 of the command, but --version reports v5.37. Meanwhile, if I check on a GNU/Linux box I have access to, it does support -E, reports it in the switches, documents it in the man page (in both the synopsis and in the main body of the page) and it is v5.25 (and so is its man page).

So that was something like 20 minutes lost to a very small problem, for which there was no real solution, but it was time that had to be spent to get to the bottom of it.

In the end I went with what I probably should have gone with in the first place: stat -L.

for venv in $(find . -name .venv)
do
    if ! stat -L "$venv/bin/python" > /dev/null 2>&1
    then
        echo "$(dirname $venv)"
    fi
done

And now I have that script in my ~/bin directory, ready for the next time Homebrew and friends conspire to throw my day off for a while.


Helping myself change the default git branch

Posted on 2020-07-09 20:17 +0100 in Coding • Tagged with git • 2 min read

This is something I've been meaning to do for a couple or so years now, and unsurprisingly it's bubbled up again recently: the business of swapping the name of the master branch in git out for something better.

Because it's one of those jobs that's simultaneously simple and laborious, I kept putting it off. Changing up the local configuration so that main (or whatever name you prefer) is used "out of the box" is simple enough; the laborious part is updating all of the repositories that live in the "forge of choice". In my case, over on GitHub, I have getting on for 200 repositories -- 142 of which are public (as of the time of writing). At work we use GitLab as our internal forge and I've got a non-trivial number of repositories on there too.
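For anyone who's not run into it, the local configuration side really is just a one-liner with a recent enough git (the init.defaultBranch setting turned up around 2.28):

$ git config --global init.defaultBranch main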

The obvious first step to tackling this is to knock up a little tool to help find the repos that still need swapping. That was simple enough:

#!/bin/bash

# Quick and dirty tool to find repositories that still make use of a
# "master" branch. Helps with tracking down the ones that need
# updating/improving.

for repo in $(find . -name .git)
do
    (
        cd "$(dirname $repo)"

        if git branch | grep master > /dev/null 2>&1
        then
            echo "$(dirname $repo)"
        fi
    )
done

### git-archaic ends here

It's not meant to be clever, just something I can run when I'm in a "default branch swapping" mood to find a repository or two to tackle. The idea being that it uses find to pull out any instance of .git in or below the current directory, changes to the directory that holds it (inside a sub-process to ensure the PWD gets put back after the cd, before the next iteration of the loop), gets a list of the branches and, if master is one of them, prints the directory name.

Using this, I can now slowly work through my more active repositories and make the swap -- the idea being that if I currently have them cloned down to my machine, they're obviously some level of "active". At some point I imagine I could get more clever and use the APIs of the forges to look at all the repositories I own; that's another job for another day.

This gives me enough to be going on with. :-)


Swift TIL 10

Posted on 2020-07-05 15:27 +0100 in TIL • Tagged with Swift, Apple • 2 min read

My leisurely journey into getting to know Swift by reading and then making notes to myself in my blog continues, and this weekend I encountered defer. As I was reading about Swift I did keep wondering when something like try (it has try), catch (it has catch) and finally (it doesn't have finally, but...) might crop up. This weekend I hit the part of the book that covered this sort of thing.

Given Swift's apparent general reliance on not throwing errors but instead using Optional and nil to signify issues, it sort of came as no surprise that its approach to implementing something like try...finally is actually divorced from try. I'm not sure how I feel about it yet, but defer makes sense.

Here's an utterly useless bit of code that demonstrates how it works:

func add( _ n1 : Int, _ n2 : Int ) -> Int {

    defer {
        print( "Huh! We did some adding!" )
    }

    print( "About to do the add and then return" )
    return n1 + n2
}

print( add( 2, 2 ) )

When run, the output is this:

$ swift run
[3/3] Linking try-defer
About to do the add and then return
Huh! We did some adding!
4

A defer (and there can be multiple) is tied to the block that it's declared in, and is executed when the block exits. This is, of course, going to be really handy for things like resource-management where you don't want to be leaking something, although I can imagine a few other uses too (none of which are really going to be novel for someone who's coded in other languages with similar constructs).

What I find interesting about this is that one or more defer blocks can be declared at various locations within a block of code; this obviously makes sense in that you might not want to be handling the tidy-up of something you've not got around to creating yet. But there's also a part of me that feels uneasy about the "exit" code being declared further up the body of code, rather than down towards the end. On the other hand I think I do appreciate the idea of, up front, writing "look, any time there's a GTFO in the code that follows, this will happen" -- you're made aware pretty quickly of what to expect.

Anyway, it's good to know Swift has something similar to a finally.


Swift TIL 9

Posted on 2020-06-30 21:02 +0100 in TIL • Tagged with Swift, Apple • 2 min read

Some may be happy to know that the frequency of these little notes to myself may reduce real soon, as I'm sort of caught up with my notes, mostly, and I'm unlikely to pick up the book that I'm reading for a couple or so days now.

I'm used to the idea of structures and classes from other languages, going way back, but as I first started reading about Swift it wasn't obvious to me why it had them both. At first glance they appeared to be very similar. It was only as I got a little further into reading that I found out one huge difference (there are others that pretty much stem from this one): objects created from classes are reference types (which you'd expect, of course); structures, on the other hand, are value types.

A quick illustration of the difference, in my head anyway, can be found if I go back to using an observer. Take this daft bit of illustrative code for example:

class Person {
    var name : String = "No name"
}

var dave = Person() {
    didSet {
        print( "dave was set" )
    }
}

dave.name = "Dave"

If I run that, I get no output. That makes sense. The variable dave is initialised to an instance of a Person but is never subsequently set to anything else. The following assignment is to a property of the object. We're working on a reference to an existing object.

But simply change Person to being a struct:

struct Person {
    var name : String = "No name"
}

var dave = Person() {
    didSet {
        print( "dave was set" )
    }
}

dave.name = "Dave"

and the output looks like this:

$ swift run
[3/3] Linking ref-vs-val
dave was set

That's because a struct is a value type: assigning to dave.name mutates the value held in dave, and Swift treats that as setting dave to a new value, which is why didSet fires.

This, of course, is just the tip of the iceberg; there are all sorts of other things to keep in mind that follow on from this, generally relating to mutability (or the lack thereof). I also imagine this means that, when there's no obvious benefit either way, the choice of using class vs struct is something that could have performance implications. That's something for me to look into more another day; but this is a note here to myself that it's a thing to keep in mind.


Swift TIL 8

Posted on 2020-06-29 22:48 +0100 in TIL • Tagged with Swift, Apple • 3 min read

Although I read up on it a few days back, it was only this evening that I fired up Emacs (well of course I'm testing Swift with Emacs, what did you think I'd do?!?) and dabbled with some code to really get a feel for how casting works in Swift.

Swift seems to be one of those languages that does it with a keyword rather than a non-keyword approach. The two main keywords are as? and as!. I kind of like how there's a polite version and a bossy version. Having the two makes a lot of sense from what I read too.

To try and illustrate my understanding so far (and do keep in mind I'm not writing this with the express purpose of explaining it to anyone else -- I'm writing this to try and retain it all in my head by working through it), here's a bit of utterly pointless and highly-contrived code that defines three classes in a fairly ordinary hierarchy:

class Animal {}

class Dog : Animal {}

class Cat : Animal {
    func beAdorable() {
        print( "Purrrrrr!" )
    }
}

So, so far, so good: we have animals, we have dogs which are a kind of animal, and we have cats, which are also a kind of animal, but they have the special ability of actually being adorable. 😼

Now, for the purposes of just testing all of this out, here's a horrible function that makes little sense other than for testing:

func adore( _ a : Animal ) {
    ( a as! Cat ).beAdorable()
}

Given an animal, it forces it to be a cat (by casting it with as!), and then asks it to be adorable (because, of course, cats always do as they're asked).

So, if we then had:

adore( Cat() )

we'd get what we expect when run:

Purrrrrr!

So far so good. But what about:

adore( Dog() )

Yeah, that's not so good:

~/d/p/s/casting$ swift run
[3/3] Linking casting
Could not cast value of type 'casting.Dog' (0x10a1f8830) to 'casting.Cat' (0x10a1f88c0).
fish: 'swift run' terminated by signal SIGABRT (Abort)

One way around this would be to use as?, which has the effect of casting the result to an Optional. This means I can re-write the adore function like this:

func adore( _ a : Animal ) {
    ( a as? Cat )?.beAdorable()
}

Now, if a can be cast to a Cat, you get an optional that wraps the Cat, otherwise you get a nil optional (hence the second ? before attempting to call the beAdorable member function).

So now, if I run the problematic dog call above again:

~/d/p/s/casting$ swift run
[3/3] Linking casting

In other words, no output at all. Which is the idea here.

I think I like this, I think it makes sense, and I think I can see why both as! and as? exist. The latter also means, of course, that you can do something like:

func adore( _ a : Animal ) {
    let cat = a as? Cat
    if cat != nil {
        cat!.beAdorable()
    } else {
        print( "That totally wasn't a cat" )
    }
}

which, in the messy dog call again, now results in:

~/d/p/s/casting$ swift run
[3/3] Linking casting
That totally wasn't a cat

Or, of course, the same effect could be had with:

func adore( _ a : Animal ) {
    if a is Cat {
        ( a as! Cat ).beAdorable()
    } else {
        print( "That totally wasn't a cat" )
    }
}

It should be stressed, of course, that the example code is terrible design, so given the above I'd ensure I never end up in this sort of situation in the first place. But for the purposes of writing and compiling and running code and seeing what the different situations result in, it helped.


Swift TIL 7

Posted on 2020-06-28 16:25 +0100 in TIL • Tagged with Swift, Apple • 1 min read

This post is very much a case of me writing it down to try and get it all straight in my head, and to make sure it sticks. The other day I was reading about Swift's types and type-equality checks, and as I'd expect from plenty of other languages I've worked with, there's a way for checking that two types are the same, such that super/subclasses aren't taken into account, and a way where they are. So, given this silly code:

class Animal {}

class Cat : Animal {}

print( Cat.self == Animal.self )          // False
print( Cat.self is Animal.Type )          // True
print( type( of: Cat() ) is Animal.Type ) // True

it's made clear that == checks for strict equality and a super/subclass relationship isn't taken into account. On the other hand, is does take it into account.

Only... what's with this whole .self sometimes and .Type other times business? That took a little bit of writing code and playing to get comfortable with. Here's how I understand it now (and do feel free to correct me below if I'm way off):

Given the above code, Animal.Type is the type of a value that expresses the type of Animal. On the other hand, Animal.self is a value that is the type of an Animal. Yeah, I know, that still reads oddly. But written as code:

let feline : Cat.Type = Cat.self

I think it makes a lot more sense. And having got there I felt I better understood it. I'm not 100% sure I'm 100% with it, but I'm getting there.