08 May 2019

Map of Tubewhacks and Mackerel Facts

St John's Wood is the only London Underground station whose name contains none of the letters in the word "mackerel".

The above statement is one of the most famous nuggets of Tube knowledge. It the favourite trivia question of Victoria Coren Mitchell, host of Only Connect (a quiz show which, for context, once asked its contestants to find the fourth item in the sequence "Central = 1", "Circle = 2", "District = 3"). On TfL's "Art on the Underground: Labyrinth" entry for St John's Wood, the Mackerel Fact gets top billing, winning priority over the station's other claim to fame: the site of the Beatles' iconic road-crossing album cover.

The notoriety of the Mackerel Fact is driven, in large part, by its pointlessness. It's not a fact that will help you navigate the Tube. The word "mackerel" in itself doesn't have anything obvious to do with the London Underground.

And as many others have pointed out (see, for instance, this blog post) there isn't even anything particularly special about the link between St John's Wood and "mackerel". St John's Wood is the only station whose name lacks all of the letters "a", "e", "l" and "r", so you could get away with replacing "mackerel" with "real" – but "real" is a far less amusing word.

The Mackerel Fact has given rise to the practice of "Tubewhacking", finding words whose letters overlap with every Tube station except one, and many such words exist. Twitter's Tube Mackerel Bot has been tweeting examples of these words since 2017, working its way through the dictionary.

With a little computer assistance, I found "mackerel"-equivalents for 94 of the London Underground's 270 stations. These are mapped below:
Mackerel Map
Mackerel Tube Map
Needless to say, there were many stations with a large number of qualifying Mackerel-matches. For example, Woodford is the only station with none of the letters of "language", or of "intellectual", or of "physician". For these cases, I just picked one of the possible options.

Some stations get a Mackerel-match with other Tube stations. For example, Ruislip is the only Tube station which has none of the letters of "Camden Town". But Camden Town isn't the only station with none of the letters of "Ruislip". There is only one pair of stations which both match with each other: Perivale is the only station with none of the letters of "St John's Wood", and St John's Wood is the only station with none of the letters of "Perivale". So perhaps there is something special about St John's Wood after all!

(In case you were wondering, the fourth item in Only Connect's sequence of Tube lines is "Bakerloo = 4". Each line is numbered according to the points value of the snooker ball with the same colour as that line.)

03 May 2019

Shared Words Addendum

In the last post we looked at a Tube map connecting 173 of the stations of the London Underground according to the words shared between their names. There were 97 stations left over (with 95 names between them, because there are two "Hammersmith"s and two "Paddington"s): what should we do with those?

The straightforward solution would be to arrange all these stations on a "leftovers" line, in alphabetical order. But we can try something more elaborate: what if we arranged the stations according to how similar their names are?

Two things we need. First we need a procedure for working out how similar two stations' names are. Second we need a procedure for generating a made-up network from these similarity ratings.

Evaluating similarity

We could just look at every pair of stations and score how similar they sound, but this would be fairly subjective, as well as very time-consuming (with 95 station names there are 4465 pairs of names to rate). So instead I used SeatGeek's FuzzyWuzzy library, which algorithmically works out how closely two strings (like two station names) match.

Each pair of names is assigned a score between 0% and 100%. For example: FuzzyWuzzy assigns "Northfields" and "Southfields" an 82% match; "Kennington" and "Kenton" are assigned a 75% match; and "Bank" and "Whitechapel" are assigned a 13% match.

A different procedure would likely give us different results, and ultimately change the arrangement of stations on our map. But since this is all for fun, we need not fuss about that too much.

Generating a network

Now we need to turn these 4456 percentages into a network. We can use the following method:
  1. Start with one station in our network
  2. Find the closest similarity between a station inside our network and a station outside
  3. Create a link between the two stations found in step (2)
  4. Repeat steps (2) and (3) until every station is part of the network
For example, let's start with Balham in the network. According to FuzzyWuzzy, the most similar station name to "Balham" is "Amersham" with a 57% match, so we join Balham to Amersham.

Now the closest match to "Balham" is "Archway" (42%) and the closest match to "Amersham" is "Chesham" (67%). Since the "Amersham"-"Chesham" percentage is higher than "Balham"-"Archway", we join Amersham to Chesham.

We keep going until our last link is added: "Victoria" matches poorly with every other station, but its best match is 42% with "Blackfriars" (I guess because of the shared "c...ria"), so that's where it goes.

This procedure for growing the network is called Prim's algorithm. An interesting property of the algorithm is that the network ends up the same no matter which station we choose to start with.

The final map

Then all we have to do is draw the network with a little Tube map flair, and we end up with this: