I recently attended QFPL’s excellent Haskell course. Tony Morris was a little DRY*It's a joke. Tony kept mentioning Don't Repeat Yourself and being lazy but nonetheless was an excellent presenter *The course shook my confidence in my existing ability to reason in Haskell for a bit but it was for the better - I had some fundamentals that were broken and Tony explained some things in a way that fixed it... for now - I have no doubt some basics will be lost to the ether in the next few months. So for the rest of the week I was in a bit of a equational-reasoning mode.
Then my dad sent me a cute link to a calculator that calculate vocatives for Chinese relatives. Given English as my first language (hence not default mode of thinking), this kicked me off in to a chain of thoughts about languages and symbols (you’d find a high amount of correlation between my switching modes of thinking and blog posts - the last time this happened, I wrote about yes and no).
One of the difficult things that many people report with programming languages is that the decoupling of syntax and semantics. I’ve often wondered if we might be better off with a syntax that is based off symbols (rather like APL) - the initial hurdle might be higher, but once that’d done, syntax and semantics are completely decoupled. Then we’d not have flame wars on syntax, rather a more interesting flame war on semantics and pragmatics.
Another line of thinking I had was the hypothetical development of computing and logics in a parallel universe where Chinese was the dominant linguistic paradigms - it’s one that I’ve had since I visited China for the first time.
Combined, these trains of thoughts led to this blog post. So let’s learn some Chinese while learning some (really restricted) functional programming! Bear in mind it’s a very rough unrigorous version.
Before we start, we have to introduce the terms. Terms are just squiggly symbols that we use to represent our programs. We’ll start with something simple. Here are the terms:
T ::= Atom | Compound Atom ::= 我 | 父 Compound ::= T 的 Compound | 爸爸
What the declaration above says is a term can be made up of any basic term (which we call atoms) or compounds. Compounds are of the form (
T 的 Compound) or
Now understandably, this is just a bunch of squiggles to those who don’t read Chinese. Trust me it gets confusing if you read Chinese already. The alienness of the characters is meant to help - just remember the shapes. They could easily be replaced with triangles and circles and this blog post still would make sense.
But I have found it helps some people if you are able to sound things. Here’s a pronounciation guide:
Valid and Invalid Sentences
These are valid sentences:
Note that with the grammar definition above, there will be some sentences that are valid, such as
父的爸爸. If you can read Chinese, this will appear weird to you, but it’s still valid.
These are invalid sentences (if you know Chinese, please ignore that knowledge):
父我- impossible order:
我cannot be at the end because it’s not a Compound
Rules of the System
One view of programming is that programming is all about replacing terms. This is the root of the paradigm known as functional programming. The other, more common paradigm treats the computer as a tape with instructions, and the computer merely follows the tape and executes the instructions. All modern hardware*Barring a few exceptions like LISP machines, and the Mill CPU is implemented in the latter paradigm.
Every sentence must have a rule of reading. Here we’ll create a rule of reading - a program is read from left to right. We’ll start reading from the first character. If the first two characters are not
我的, we can assume them to be there. For example:
爸爸 can be read as
Every time we encounter something that can be replaced, we replace it.
There is only one replacement rule. Everytime we see something like this:
a的父, we replace it with
Here, we must be careful to note the specific replacement rule only happens when it’s in a pattern. If the encountered word is
父 alone, it cannot be replaced by
爸爸. For example, in this sentence
父 cannot be replaced with
爸爸. When something can be replaced, we call it a redex. When something cannot be replaced anymore, we call it normal.
Expressing the replacement rule is a bit painful to keep writing over and over again. So we can write it as
父 = 爸爸 in a的父. We read that in English as “let
父 be replaced by
爸爸 in any pattern that looks like
What then is
a? It’s simply anything that when replaced, creates a valid sentence. So you can replace
爸爸的爸爸 and it’s still valid.
Checking For Validity
We now have a rule for reading a Chinese sentence. And we have replacement rules too, so what may appear to be an incorrect term may be correct after some replacements.
我的父 is invalid. Remember a valid term is either an atom or a compound. And there are only two possible compounds:
T 的 Compound or
We can clearly see that
我的父 fits into the first pattern. So let’s try to see if it’s valid by breaking up the pattern.
我的父 == a 的 b; a must be a T, b must be a Compound
T? Yes it is, because it’s a Atom. Is
父 a Compound? No. It’s not. It’s an atom.
So as you can see, the sentence is invalid.
However… we did have a replacement rule - when we see something like
a的父, we replace
爸爸. And so we replace it.
Now the sentence is valid. It means “My Father”.
Now, we’re lazy people and don’t want to type a lot. But for now, we’ll replace two combinations
爸爸的爸爸 = 爷爷 in a的爸爸的爸爸 爷爷的爸爸 = 曾祖父 in a的爷爷的爸爸 曾祖父的爸爸 = 高祖父 in a的曾祖父的爸爸
Note that each replacement builds on another replacement, and they follow a pattern. At this point I find it rather instructive to replace words with pictures.
We’ll start with a simple replacement for the first case
我的爸爸的爸爸. Whilst there can be multiple representations of this sentence as a graph, we’ll say for now that this is one and only correct representation:
We can replace the box with the new vocabulary:
Now a more complicated sentence like
我的爸爸的爸爸的爸爸 would have multiple reduction possibilities. We’ll look at two:
Redex A has a valid replacement, and so does Redex B. However, only Redex A will reduce to the simplest form, given the rules we have.
In the first reduction, they reduce to:
Notice only Redex A has a candidate for replacement and simplification. Hence only Redex A can be reduced further:
Good Rules, Bad Rules
Now of course the rules are what we define.
We can equally define these as our rules :
|Original||New||Original (English)||New (English)|
||No change||Father’s Father = Grandfather||-|
||Grandfather’s Father = Great-grandfather||Father’s Grandfather = Great-grandfather|
||Great-grandfather’s father = Great-great-grandfather||Father’s Great-Grandfather = Great-great-grandfather|
For the sake of regularity, we’ll take this new set of rules to be the “correct” rules. What this means is Redex B is the one that reduces to 曾祖父 instead of Redex A.
Now we have one single reduction rule - graphically speaking, start reducing from the right most branch, before reducing the left branch.
Why do we need such a specific rule? Afterall, both Redex A and Redex B evaluates to
曾祖父, which means Great-grandfather.
Ah well, that’s because we’ve only considered one sex so far.
The Fairer Sex
So let’s add mothers.
We’ll introduce a new Atom (
母) and look at the new Compounds, some of which you’ve met:
T ::= Atom | Compound Atom ::= 我 | 父 | 母 Compound ::= T 的 Compound | 爸爸 | 妈妈 | 爷爷 | 曾祖父 | 高祖父
Here’s the optional pronounciation guide (new words marked are with asterisks)
The rule of replacement for
母 is similar to the rule for
母 = 妈妈 in a的母
With the introduction of new vocabularies, there are now more ways a sentence can be valid and not. For example, this is invalid
母妈. It also introduces a new layer of complexity and evaluation requirements.
For example both
爸爸的妈妈 are valid sentences that uses the same two terms. However they have completely different meanings - or in other words - they evaluate to different values.
If you are familiar with mathematical terms - the function
的 is not commutative.
As mentioned in the opening paragraphs, Chinese kin have complicated addressing schemes. The way you’d address a cousin changes depending on lineage, sex, age, generation, order and marriage ties. Yep, that’s a LOT of dimensions. Therefore it shouldn’t come as a surprise that
妈妈的爸爸, which looks similar to
爸爸的爸爸, will evaluate to different things.
These are the replacement rules for the new Atom we introduced:
妈妈的妈妈 = 外婆 in a的妈妈的妈妈 妈妈的爸爸 = 外公 in a的妈妈的爸爸 爸爸的妈妈 = 奶奶 in a的爸爸的妈妈
So this is now our grammar:
T ::= Atom | Compound Atom ::= 我 | 父 | 母 Compound ::= T 的 Compound | 爸爸 | 妈妈 | 爷爷 | 曾祖父 | 高祖父 | 外婆 | 外公 | 奶奶
At this point it seems rather overwheming.
3-Dimensional Peano Numbers…
At this point, some readers may realize I’ve just been kinda describing a weird version of Peano numbers. Almost. For that to happen we’d need to add two more Atoms, then I’ll explain what I mean. We’ll add two new Atoms (
T ::= Atom | Compound Atom ::= 我 | 父 | 母 | 子 | 女 Compound ::= T 的 Compound | 爸爸 | 妈妈 | 爷爷 | 曾祖父 | 高祖父 | 外婆 | 外公 | 奶奶 | 儿子 | 女儿
Now we can describe things. We’re all familiar with the number line:
...|----|----|----|----|... -2 -1 0 1 2
In the construction of Peano numerals, one would usually be exposed to the functions
One is defined as
Two is defined as
Succ(Succ(Zero)) and so on and so forth.
In fact in the preceding sections I’ve done just that. Here’s a simple table of comparison:
|Atom||Sorta Equivalent To|
Do note that they are merely rough analogues - they’re not necessarily direct analogues, just close enough for me to explain the next bits.
Standard Church Encoding
The standard definitions for Peano axioms in lambda calculus is defined as such:
0 = λf.λx.x 1 = λf.λx.fx 2 = λf.λx.ffx ... Succ = λn.λf.λx.fnfx Pred = λn.λf.λx.n(λg.λh.h(gf))(λu.x)(λu.u)
There are many ways of interpreting the Church encoding of numerals. But I found this to be the simplest intepretion of Church numerals: they’re basically a function that determines how many times the function
f gets applied to the input
x. All while being an encoding for natural numbers.
0. The number is magical - it’s a rule that’s δefined outside the system. And
(+1), also another rule δefined outside the system.
Pred then can be seen as crawling along a number line that exists ephemerally, starting at the index 0. The resulting evaluation of
Pred yields a function that when applied to a
x which represents a state in the number line, gives the correct answer.
If such is the definition of what
Pred does, then we can say
子 fulfil similar purposes. Instead of a number line, we’d be traversing a family tree.
父 walks one level up through a vertex that is
子 walks one level down through the vertex that is a
Abstract and Concrete Graphs
The Church encoding of natural numbers and
Pred work on an abstract number line that contains every possible natural number. But what if we were presented with a different number line that has the number 6 missing?
Succ simply stops at
Succ simply hangs the computer (because by definition you couldn’t have
3 since you’d be able to create
Here the notion of a finite concrete number line is a strange idea, seeing that we’re so used to the notion of an infinite number of natural numbers. But as science fiction can tell you, dreaming up of a different number line isn’t that strange an idea. Indeed throughout history, new types of numbers were invented (or discovered) out of thin air.
Natural numbers are a lot stricter than family trees. Indeed, the Peano axioms are viewed as a proof of natural numbers. On the other hand, family trees are highly variable.
Our language based on family trees would have to work on an abstract family tree where there are all possible familial relations. But they can also work with concrete implementations.
Here is an example tree. Starting from
我, the function
父 walks up the graph to the next circle.
子 walks down the graph to the next circle.
母 walks up to the next square.
女 walks down to the next square.
There is one additional rule: if a reduction were to have many choices within, and a star is amongst those choices, the result will be the star. In other words,
我 is sexless - it’s both female and male.
This also means both
我的爸爸的女儿 evaluates back to
Extending The Graph
The example above shows a limited, very concrete graph. The grammar of family relations is intended to work on an abstract notion of the graph. That abstract notion is that the graph extend infinitely upwards and downwards.
It can also extend infinitely sidewards -
我 can have infinitely multiple
女儿. And each
女儿 can have multiple of their own. Nonetheless the effect is the same as merely having one of each (prove it for yourself - given there are no notions of instances of each vertex).
The constraint for this language is that the upward extension is limited to one of
Also notice at this point there are several possible edges that are missing - siblings, and marital relationships. It’s a single dimensional traversal along two types (male and female). We can of course add horizontal traversal, by adding new words to our vocabulary.
Here are the new atoms we’ll be adding:
妹. The associated compounds are:
Here’s our vocabulary now:
T ::= Atom | Compound Atom ::= 我 | 父 | 母 | 子 | 女 | 兄 | 弟 | 姐 | 妹 Compound ::= T 的 Compound | 爸爸 | 妈妈 | 爷爷 | 曾祖父 | 高祖父 | 外婆 | 外公 | 奶奶 | 儿子 | 女儿 | 哥哥 | 弟弟 | 姐姐 | 妹妹
And now we can add horizontal edges to the graph (and suddenly the complexity of the graph explodes):
Note that there have been some vertices that have been elided - the children and grandchildren of
The mere adding of those 4 new atoms and their associated compounds have led the graph from being planar to being completely non-planar.
One More Bit Of Complexity
So far I’ve kept everything within the family - blood relations only. Now to make this a more basic nuclear family, we’ll add the missing parts: the non-blood relationships. And to keep things simple, we’ll keep to marriage based relationships. That is to say we’ll ignore the adoption, and various other non-blood relationships (like step-parents) that may happen.
We will add a
夫 with its associated compounds
老公. It’d be quite messy to visualize that, but it would be akin to adding a second and third copy of the current graph with
我 swapped out and the shapes replaced with the sex-appropriate shapes. Each copy would then be stacked on top of the current chart, with a single line connecting
You may have noticed something particularly odd about my use of language. Specifically I’m careful to specify that familial relationships are graphs, not trees. I’m of course aware of the famous cycles in family tree software issue.
As for a proper concrete data structure, I think a sheaf would be one of the better structures to model a family tree and its constraints. But that’s beyond the scope of an introductory article.
I’ve done some research on this, and it turns out this wouldn’t really be a problem for addressing. Addressing of relatives has precedence in Chinese culture, in this order:
- Blood relationships
- Fillial relationships (parent-children)
- Order (eldest - youngest relationship within blood relationship)
- Marriage relationship
- Kin relationship
- Clan relationship
I married a widow who had a grown daughter. My father, who often visited us, fell in love with my step-daughter and married her. As a result, my father became my son, and my daughter became my mother. Some time later, I gave my wife a son, who was the brother of my father, and my uncle. My father’s wife (who is also my daughter and my mother) got a son. As a result, I got a brother and a grandson in the same person. My wife is now my grandmother, because she is my mother’s mother. So I am the husband of my wife, and at the same time the step-grandson of my wife. In other words, I’m my own grandpa.
In the above annecdote, the resolution of addressing one’s relatives is aided by additional rules of precedence. So my step daughter would still be my step daughter, not my step mother, because of blood (between her and the mother) followed by marriage (me and the mother).
One of the first things I did was to do this:
There are a lot of corner cases that polyamory opens: in a polyandrous relationship, assuming
我 is a cishet male,
我的老婆的老公 would be have no name because our replacement rule states that it should evaluate back to
To be able to support polyamory, we’d have to change and significantly complexify the rules. Again, I leave that as an exercise to the reader.
Again, this basic graph doesn’t handle non-cishet relationships well. It handles it better than expected though - descendent and prescendents are handled by the precedence rules. Your younger brother’s husband is addressed the same way as you would your younger sister’s husband.
The prescendents are a little difficult. Your dad’s husband has no name - presumably because everyone is born of a mother.
Other Modern Families
This grammar does not work on other modern families either - remarriages, adoptions, etc. There are vocative words for these in Chinese, but it’d be too complicated to add here. Or in the words of any maths text book authors: left as an exercise to the reader.
When I say functional programming I mean programming languages predicated upon term rewriting at runtime. This means lisp/clojure, haskell and ocaml qualify. These languages do not necessarily perform term rewriting at runtime but the languages were built with term rewriting as the starting point (in contrast with C/Pascal languages where they were written with a tape machine in mind).
Yes I am aware the compilation process of C-like languages nowadays involve SSA, which can be viewed as a term-rewriting on its own. And yes I am aware that languages like Haskell are not pure term rewriting systems (Haskell started out in the early days as a graoh rewriting system though).
One key thing from the grammar above is that it lacks the ability to perform a very important form of semantic abstraction: the lambda abstraction. That is to say everything has to be defined in the grammar to be able to be expressed. If we were to port this to pure lambda calculus, everything would be a δ-rule.
Nonetheless it gives, I think an interesting view of programming. If we view the Peano axioms as crawling all over an abstract notion of natural numbers line, and the Chinese kinship language as crawling all over the abstract notion sheaves (or at least very complicated acyclic graphs, for those who do not want to enforce categorical constraints), then we can view general programming language to be crawling all over the space of possible programs.
On the Type Of The Terms
For some people, the only good calculus is a typed calculus. Here’s a list of simple types of the terms:
Here’s an exercise for the reader: There exists a refinement of the type of
的 that involves existential types. The latter type would be more useful in constraining programs.
Have fun figuring out (though it’s not really fair as I’ve not introduced type terms and typing rules, but I’m fairly certain you can derive it from standard typing rules).
This blog post is really nothing more than an exercise in a very restricted form of term rewriting (i.e. I covered only function applications here). I was tickled by the idea of
妻子的妻子 being a normal redex in the calculator. Then started working out a sorta grammar for expressing familial relationships, which then led to this.
The most important takeaway I think was written a few paragraphs above:
If we view the Peano axioms as crawling all over an abstract notion of natural numbers line, and the Chinese kinship language as crawling all over the abstract notion sheaves (or at least very complicated acyclic graphs, for those who do not want to enforce categorical constraints), then we can view general programming language to be crawling all over the space of possible programs.
Now your task should you choose to accept is the exercise of removing self-referencing in this blog post. ;P May your Y combinators ever emerge.
Last bit of trivia: I originally started with
母 as an example. But upon realizing
父(fù) can be read as “foo” and
爸(bà) can be read as “bar”, I was so tickled at that that I rewrote most of the examples. Unfortunately there are no chinese words that correspond to “baz”. Not even my cousin Barry could be called Bazza in Chinese.