etaoinshrdlucmfwypvbgkjqxz

My previous blog post must have led many readers of mine (hello to the both of you) into thinking I had gone mad. I hadn’t. It must have been maddening, what with self-linking posts and complete and utter gibberish. This is the story of how I dreamt up an encoding for the English language.

I had been lying in bed, late last night, thinking about my own perception of time and space. I have often convinced myself that I perceive time and space rather differently than most people. Last night was one of those nights. I had an imaginary conversation with people in my mind, explaining how I perceive time and space. I gave an example: imagine a spoon falls from a table. Rather than perceive the motion of the spoon falling, I would often perceive it as a solid block between the table and the floor. The solid block would be in the shape of the trajectory of the spoon falling.

Now of course this is all mental. It’s an additional layer of processing my brain does for some odd and undocumented reason, and only under certain mental circumstances (for example, when feeling extremely relaxed). But it’s a good way of explaining to people how I very occasionally perceive time and space.

I then began to think about writing. Our writing is very much based on a deterministic pattern. If I write the rune ‘A’, I would expect the rune to represent only one letter (‘A’). What if, however, a rune represented all possible letters? A good way to frame this would be to think: when you put a pen to paper, at that point an infinite number of possible runes could be drawn. As you draw more of your rune, the number of possibilities drops until you have a rune that represents one letter. Imagine, if you will, a person who is able to see a lot of the possibilities and draws a rune that represents all of those letters. Perhaps this clip from Fringe would help:

And as I thought about it, I also thought about ETAOIN SHRDLU. What if we could encode a rune based on its possibilities? ETAOIN SHRDLU provides a very good basis, given that it orders the letters by how frequently each is used in the English language. So, the question became: what if we could encode multiple letters into a rune and make a language out of it?

The thought stayed with me for the whole day, and after work today I decided to write a quick and dirty script to encode and decode into this language. To give it some extra mystique, I picked out 26 Unicode symbols to form the basis of this encoding. Although there are 26 symbols, they do not map one-to-one to the English alphabet, so statistical analysis will not yield much. I’ve actually had some difficulties in writing the decoder myself!

This blog post will explain how I did it.

Everything above this line is the exact contents of the previous blog post.

Encoding

Originally this was going to be a much longer article about encoding, byte order marks, runes, characters, letters and stuff, but I somehow lost steam writing it. So, here goes, how it works:

The main gist of how it works is that every letter in the English language (abcdefghijklmnopqrstuvwxyz) is replaced with one of two runes, as listed in the table below. The converse is also presented in the table: each rune represents one of two letters.

Letter   Pair of runes   Pair of letters   Rune
e        Ꮬ, ᕶ            (‘e’, ‘t’)        Ꮬ
t        Ꮬ, ⴾ            (‘t’, ‘a’)        ⴾ
a        ⴾ, ኡ            (‘a’, ‘o’)        ኡ
o        ኡ, Ꮡ            (‘o’, ‘i’)        Ꮡ
i        Ꮡ, Ꮊ            (‘i’, ‘n’)        Ꮊ
n        Ꮊ, Ǭ            (‘n’, ‘s’)        Ǭ
s        Ǭ, փ            (‘s’, ‘h’)        փ
h        փ, ⵓ            (‘h’, ‘r’)        ⵓ
r        ⵓ, ლ            (‘r’, ‘d’)        ლ
d        ლ, Ӽ            (‘d’, ‘l’)        Ӽ
l        Ӽ, Ꭴ            (‘l’, ‘u’)        Ꭴ
u        Ꭴ, Ⴃ            (‘u’, ‘c’)        Ⴃ
c        Ⴃ, ᕜ            (‘c’, ‘m’)        ᕜ
m        ᕜ, ᙢ            (‘m’, ‘f’)        ᙢ
f        ᙢ, ᗎ            (‘f’, ‘w’)        ᗎ
w        ᗎ, ɷ            (‘w’, ‘y’)        ɷ
y        ɷ, ዯ            (‘y’, ‘p’)        ዯ
p        ዯ, Փ            (‘p’, ‘v’)        Փ
v        Փ, ⵙ            (‘v’, ‘b’)        ⵙ
b        ⵙ, ※            (‘b’, ‘g’)        ※
g        ※, ⴿ            (‘g’, ‘k’)        ⴿ
k        ⴿ, ᖝ            (‘k’, ‘j’)        ᖝ
j        ᖝ, ֆ            (‘j’, ‘q’)        ֆ
q        ֆ, Ꮻ            (‘q’, ‘x’)        Ꮻ
x        Ꮻ, Ꭽ            (‘x’, ‘z’)        Ꭽ
z        Ꭽ, ᕶ            (‘z’, ‘e’)        ᕶ
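The table can be rebuilt from just the key and the 26 runes. Here is a minimal Python sketch of that lookup, assuming each rune is shared by a letter and the letter that follows it in the key (with z wrapping back to e). The names KEY, RUNES, runes_for and letters_for are made up for illustration and are not taken from the original script.

```python
KEY = "etaoinshrdlucmfwypvbgkjqxz"

# RUNES[i] is the rune shared by KEY[i] and the next letter in the key
# (the last rune is shared by 'z' and 'e').
RUNES = "Ꮬ ⴾ ኡ Ꮡ Ꮊ Ǭ փ ⵓ ლ Ӽ Ꭴ Ⴃ ᕜ ᙢ ᗎ ɷ ዯ Փ ⵙ ※ ⴿ ᖝ ֆ Ꮻ Ꭽ ᕶ".split()


def runes_for(letter):
    """Return the two runes a letter may become:
    (rune shared with the previous key letter, rune shared with the next)."""
    i = KEY.index(letter)
    return RUNES[i - 1], RUNES[i]  # i - 1 wraps to the last rune for 'e'


def letters_for(rune):
    """Return the two letters a rune may stand for."""
    i = RUNES.index(rune)
    return KEY[i], KEY[(i + 1) % len(KEY)]


print(runes_for("h"))    # the two runes for 'h'
print(letters_for("ⵓ"))  # the two letters behind one rune
```

Note that this sketch returns the pair for e in the opposite order from the table row, because e sits at the start of the key and borrows its "previous" rune from z.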

The rule for how a letter is replaced by a rune is simple: it depends on the distance, measured along a key, between the next letter in the string and the letters that share a rune with the current letter. If that sounds like gibberish, I’ll provide a more concrete example:

Consider the string “hello world” (for simplicity’s sake, everything is in lowercase). We’ll start by analyzing the letter h. h can be replaced by either փ or ⵓ. Which one is chosen depends on the next letter, e, and its distance to the other letters under consideration. Those other letters can be found by looking up which letters produce փ and ⵓ.

The letters that produce փ are h and s, while the letters that produce ⵓ are h and r. Since we’re already at h, the letters that remain to be considered are s and r.

So, we then look at the distance from e to s and from e to r. The distance key that I used is simply “etaoinshrdlucmfwypvbgkjqxz”, which is also the title of the blog post. I used a rather crude and simplified distance counter in my code: take the position of the next letter (‘e’) in the key, subtract it from the position of the letter under consideration (‘s’ or ‘r’), then take the absolute value.

So, for example, comparing e to s: the position of ‘e’ is 0 and the position of ‘s’ is 6, so the absolute value is 6. Comparing e to r: the position of ‘r’ is 8, so the absolute value is 8. Whichever letter gives the larger absolute value wins, and the rune associated with that letter is used. In this case r is chosen, and therefore the rune shared by h and r is used. That rune is ⵓ.
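The arithmetic for this one step can be checked with a few lines of Python. This is only a sketch of the distance counter as described in the text; the helper name distance is hypothetical.

```python
KEY = "etaoinshrdlucmfwypvbgkjqxz"


def distance(a, b):
    """Absolute difference between the key positions of two letters."""
    return abs(KEY.index(a) - KEY.index(b))


# Encoding 'h' when the next letter is 'e':
# 'h' shares one rune with 's' and the other with 'r'.
to_s = distance("e", "s")  # |0 - 6| = 6
to_r = distance("e", "r")  # |0 - 8| = 8

# The letter with the larger distance decides which rune is used.
chosen = "r" if to_r > to_s else "s"
print(to_s, to_r, chosen)
```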

Rinse and repeat. You will get ⵓᕶӼᎤኡ ɷኡⵓᎤӼ.

The program ignores spaces and basically replaces spaces with spaces. When it reaches the last character (that’s the technical word for it: character, usually 1 byte of the string), it loops back and uses the first character of the string as its next letter.
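Putting the whole procedure together, an encoder might look like the Python sketch below. It is a reconstruction from the description, not the original script, and it leans on two assumptions that the text does not spell out: the "next letter" skips over spaces, and a tie between the two distances falls back to the rune shared with the previous key letter. Both are consistent with the “hello world” example. The names KEY, RUNES and encode are hypothetical.

```python
KEY = "etaoinshrdlucmfwypvbgkjqxz"

# RUNES[i] is shared by KEY[i] and the next key letter ('z' wraps to 'e').
RUNES = "Ꮬ ⴾ ኡ Ꮡ Ꮊ Ǭ փ ⵓ ლ Ӽ Ꭴ Ⴃ ᕜ ᙢ ᗎ ɷ ዯ Փ ⵙ ※ ⴿ ᖝ ֆ Ꮻ Ꭽ ᕶ".split()


def encode(text):
    """Encode lowercase text; anything that is not a-z passes through."""
    letters = [c for c in text if c in KEY]
    out = []
    seen = 0  # how many letters have been consumed so far
    for c in text:
        if c not in KEY:
            out.append(c)  # spaces are "replaced with spaces"
            continue
        seen += 1
        # Assumption: next letter skips spaces; the last letter wraps to the first.
        nxt = KEY.index(letters[seen % len(letters)])
        i = KEY.index(c)
        d_prev = abs(nxt - (i - 1) % 26)  # distance to the previous key neighbour
        d_next = abs(nxt - (i + 1) % 26)  # distance to the next key neighbour
        # Assumption: on a tie, keep the rune shared with the previous neighbour.
        out.append(RUNES[i] if d_next > d_prev else RUNES[i - 1])
    return "".join(out)


print(encode("hello world"))
```

With these assumptions the sketch reproduces the string from the worked example, which suggests it is at least close to what the script does.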

Decoding

While originally this article was planned to be much longer, I lost steam halfway, so the decoder is also kinda crappy. It tries to reverse the exact function above, but even that isn’t complete. So if you want to fork the project, you can check out ETAOIN-SHRDLU at GitHub.
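For the curious, one way to reverse the function is plain brute force: every rune stands for one of two letters, so try every combination and keep the ones that encode back to the same runes. The Python sketch below does that. It is not the decoder from the project, it takes 2^n tries for n runes (so it only suits short strings), and it may return more than one candidate. The encode function repeats the encoding rule with the same assumptions as before (next letter skips spaces, ties go to the previous neighbour's rune); all names are hypothetical.

```python
from itertools import product

KEY = "etaoinshrdlucmfwypvbgkjqxz"
RUNES = "Ꮬ ⴾ ኡ Ꮡ Ꮊ Ǭ փ ⵓ ლ Ӽ Ꭴ Ⴃ ᕜ ᙢ ᗎ ɷ ዯ Փ ⵙ ※ ⴿ ᖝ ֆ Ꮻ Ꭽ ᕶ".split()


def encode(text):
    """Encoder as described in the post (tie-break and space handling assumed)."""
    letters = [c for c in text if c in KEY]
    out, seen = [], 0
    for c in text:
        if c not in KEY:
            out.append(c)
            continue
        seen += 1
        nxt = KEY.index(letters[seen % len(letters)])
        i = KEY.index(c)
        d_prev = abs(nxt - (i - 1) % 26)
        d_next = abs(nxt - (i + 1) % 26)
        out.append(RUNES[i] if d_next > d_prev else RUNES[i - 1])
    return "".join(out)


def decode_candidates(cipher):
    """Return every plaintext whose encoding equals the given rune string."""
    slots = []
    for ch in cipher:
        if ch in RUNES:
            i = RUNES.index(ch)
            slots.append((KEY[i], KEY[(i + 1) % 26]))  # the two possible letters
        else:
            slots.append((ch,))  # spaces and other characters stay put
    guesses = ("".join(p) for p in product(*slots))
    return [g for g in guesses if encode(g) == cipher]


candidates = decode_candidates("ⵓᕶӼᎤኡ ɷኡⵓᎤӼ")
print("hello world" in candidates, len(candidates))
```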
