Bigger On The Inside
Straight from my brain, mostly unfiltered.

Intuitions From The Price Equation
Mon, 17 Aug 2015 03:00:40 +0000

George Price was a rather interesting fellow. A few months ago, I read a piece about his life that I found on HN. If you follow my blog posts (hello to the two of you), you’ll note that altruism and cooperative games are among the things I like to blog about.

Following that article, I discovered the Price equation[1]. While grokking the equation, it suddenly occurred to me that kin selection and group selection were in fact the same thing. It was a gut feeling, and I couldn’t prove it.

So what I told you was true... from a certain point of view

I recently had a lot of time on hand[2], so I thought I’d sit down and try to make sense of my gut feel that kin selection and group selection were in fact the same thing. Bear in mind I’m neither a professional mathematician nor am I a professional biologist. I’m not even an academic and my interest in the Price equation came from an armchair economist/philosopher point of view. And so, while I grasp a lot of concepts, I may actually have understood them wrongly. In fact, just be forewarned that this entire post was a result of me stumbling around.

So, let’s recap what the Price equation looks like (per Wikipedia):

\Delta z = \frac{1}{w} cov(w_i, z_i) + \frac{1}{w} E(w_i \Delta z_i)

Simply put, \Delta z is the difference in phenotype between a parent population and the child population. And that difference is a function of two things:

  1. The covariance of fitness and phenotype – \frac{1}{w} cov(w_i, z_i), where w is the average fitness of the population, w_i is the fitness of individual (or group) i, and z_i is the phenotype of i.
  2. The expected value of fitness multiplied by the change in phenotype – \frac{1}{w} E(w_i \Delta z_i), where \Delta z_i is the difference between the phenotype of i’s children and the phenotype of i.

Deriving Intuition

Let’s make this a little more intuitive. Wiki says that \Delta z is “… the change in average characteristic…” from one generation to the next. We’ll just call this evolution. Now, evolution can be simplified into a relatively simple equation: evolution is the sum of the selection process and some kind of inheritance error. This is commonly phrased as “random mutation + natural selection = evolution”.

In fact, let’s start with that idea, and put it in context of the Price Equation. We’ll say that this is true:

Evolution = RandomMutation + NaturalSelection

From here, we can start modelling random mutation and natural selection.

Modelling Random Mutation

Let there be a population of individuals in a set, S. Now, we’ll say that z is the phenotype – say eye colour (and we’ll use RGB values to denote a numerical value of their eye colour), and \bar{z} is the average eye colour. Now, we randomly partition the set S into n mutually exclusive sets, S_i, where i=1 to n. We’ll say that \bar{z_i} is the average eye colour of the group i. We’ll also say that the proportion of each subset q_i is q_i = \frac{count(S_i)}{count(S)}. This gives us a second way to derive \bar{z}:

\bar{z} = \sum q_i \bar{z_i}

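Here’s a simple example in Python to check this (which I wrote to ensure I was sane):

S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
S_1 = {2, 5, 6}
S_2 = {1, 3, 4, 7, 8, 9, 10}
# Python 3 division; under Python 2, "/" on two ints truncates
avgZ = sum(S) / len(S)
avgZ_1 = sum(S_1) / len(S_1)
avgZ_2 = sum(S_2) / len(S_2)

# q_1 = 3/10 and q_2 = 7/10 are the proportions of each subset
print(0.3 * avgZ_1 + 0.7 * avgZ_2)  # weighted average of the subset means
print(avgZ)  # overall mean; the two agree up to floating-point rounding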

Ahh.. the joys of floating point math[3].

Anyway, let’s imagine the people of population S had plenty of sex, and they created a child population, which we shall call S'. We can do to the new population what we did above. We’ll denote everything with a ' as being part of the child population – i.e. \bar{z'_i} is the average eye colour of the child population in subset i.

So from here we’ll say \Delta \bar{z} = \bar{z'} - \bar{z} as the change of average eye colour (over time, which is implied). We can also say the same for each subset: \Delta \bar{z_i} = \bar{z'_i} - \bar{z_i}. Think of this as the change in average eye colour of a particular subset i. Modelling the difference between two generations would then require us to also look at the proportion of the child subset. Let’s say for subset i, q'_i = \frac{count(S'_i)}{count(S')}

We can then define random mutation over two generations as:

RandomMutation = \sum q'_i \Delta \bar{z_i}

The intuition behind this is quite clear – for each subset, the new (read: mutated) average eye colour is the child proportion multiplied by the change in average eye colour.

Modelling Natural Selection

Natural selection is harder to model. We have to think about things like fitness (because that is how selection happens). But we can figure out the progress of selection by simply looking at the changes in the proportions of a population. The change in proportion is easily defined as \Delta q_i = q'_i - q_i. Applying the change in proportion of a population to an averaged phenotype then simply becomes selection! It can be defined thusly:

Selection = \sum \Delta q_i \bar{z_i}

Now think about the intuition behind this equation. We can say an average eye colour is selected for if the subsets with that average eye colour make up a larger share of the child population than they did of the parent population.

Modelling Evolution (of a phenotype)

So, putting them together, we’ll replace the parts of this formula:

Evolution = RandomMutation + NaturalSelection

and it becomes this:

Evolution = \sum q'_i \Delta \bar{z_i} + \sum \Delta q_i \bar{z_i}

And since we’re mainly concerned with the evolution of one particular phenotype, we can say that Evolution is the change in average phenotype:

\Delta \bar{z} = \sum q'_i \Delta \bar{z_i} + \sum \Delta q_i \bar{z_i}
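To convince myself the two sums really do add up to the change in the average, here is a minimal sketch in Python. The subset sizes and eye colour averages are made-up numbers, and price_partition is just a name I picked for this illustration. It uses exact fractions so floating point doesn’t muddy the comparison.

```python
from fractions import Fraction

# Hypothetical toy data, one tuple per subset i:
# (count(S_i), count(S'_i), average z_i of parents, average z'_i of children)
subsets = [
    (3, 6, Fraction(4), Fraction(5)),
    (7, 4, Fraction(6), Fraction(7)),
]

def price_partition(subsets):
    """Return (change in average phenotype, mutation term, selection term)."""
    n_parent = sum(s[0] for s in subsets)
    n_child = sum(s[1] for s in subsets)
    mutation = Fraction(0)
    selection = Fraction(0)
    z_bar = Fraction(0)
    z_bar_child = Fraction(0)
    for count, count_child, z, z_child in subsets:
        q = Fraction(count, n_parent)            # q_i
        q_child = Fraction(count_child, n_child)  # q'_i
        mutation += q_child * (z_child - z)       # q'_i * delta z_i
        selection += (q_child - q) * z            # delta q_i * z_i
        z_bar += q * z
        z_bar_child += q_child * z_child
    return z_bar_child - z_bar, mutation, selection

delta_z, mutation, selection = price_partition(subsets)
print(delta_z, mutation, selection)  # 2/5 1 -3/5
```

In this toy case the mutation term pushes the average up by 1 while selection pulls it down by 3/5, and the net change of 2/5 is exactly their sum.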

Modelling Fitness

So far we’ve managed to define an equation for evolution[4], but we’ve not talked about fitness at all, whereas the Price equation has a term, w, that denotes fitness. How would we model fitness?

We first need to understand that fitness is a characteristic of an individual carrying a phenotype – specifically, it refers to the ability to propagate the phenotype. By now, something should have clicked in your head. We can simply say average fitness of a group is the number of descendants over the number of parents and define it as such:

\bar{w} = \frac{count(S')}{count(S)} is for the entire population, while \bar{w_i} = \frac{count(S'_i)}{count(S_i)} is for each subset.

To keep in similar notation from the above, I’ve added the bar to denote average fitness.
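One identity falls straight out of these definitions, and the derivation leans on it: \bar{w} q'_i = q_i \bar{w_i}. Here is the working, using nothing but the count-based definitions of q_i, q'_i, \bar{w} and \bar{w_i} (and assuming no subset is empty, so nothing divides by zero):

```latex
\begin{aligned}
\bar{w} q'_i &= \frac{count(S')}{count(S)} \cdot \frac{count(S'_i)}{count(S')} = \frac{count(S'_i)}{count(S)} \\
q_i \bar{w_i} &= \frac{count(S_i)}{count(S)} \cdot \frac{count(S'_i)}{count(S_i)} = \frac{count(S'_i)}{count(S)}
\end{aligned}
```

Both sides reduce to the same ratio: the children of subset i as a fraction of the parent population.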

And suddenly, we’re able to quickly derive Price’s equation by multiplying both sides of the equation with \bar{w}:

\bar{w} \Delta \bar{z} = \bar{w} \sum q'_i \Delta \bar{z_i} + \bar{w} \sum \Delta q_i \bar{z_i}

Because \bar{w}q'_i = q_i \bar{w_i}[5], we can break down each component of the equation:

For random mutation:

\bar{w} \sum q'_i \Delta \bar{z_i} = \sum q_i \bar{w_i} \Delta \bar{z_i}

And since q_i = \frac{count(S_i)}{count(S)}, the sum \sum q_i \bar{w_i} \Delta \bar{z_i} is a q_i-weighted average of w_i \Delta z_i, and the whole equation simplifies to become \overline{w_i \Delta z_i}[6], which is the expected value of w_i \Delta z_i, usually written as E(w_i \Delta z_i)

For natural selection:

Natural selection is a bit tricky… we’d have to break \Delta q_i up:

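\bar{w} \sum \Delta q_i \bar{z_i} = \bar{w} (\sum q'_i \bar{z_i} - \sum q_i \bar{z_i})

Using \bar{w}q'_i = q_i \bar{w_i} again, the first sum becomes \sum q_i \bar{w_i} \bar{z_i} = \overline{w_i z_i}, and the second is just \bar{w} \bar{z}. So we can reduce the whole thing to this: \overline{w_i z_i} - \bar{w} \bar{z}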

The above formula is simply the definition of covariance, and is usually written as cov(w_i, z_i)

Since cov(w_i, z_i) is the covariance of fitness and phenotype, the more they covary, the stronger the selection for z_i. Now that makes a lot of sense!
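As a sanity check on the whole derivation, here is a small sketch that computes both Price terms for a handful of individuals and compares them with the change in average phenotype computed directly. The parents list is invented data, and price_terms is a name made up for this example.

```python
from fractions import Fraction

# Hypothetical parents: (phenotype z_i, offspring count w_i, mean offspring phenotype z'_i)
parents = [
    (Fraction(1), 2, Fraction(1)),
    (Fraction(2), 1, Fraction(3)),
    (Fraction(4), 3, Fraction(4)),
    (Fraction(5), 0, Fraction(5)),  # no offspring, so z' never contributes
]

def mean(xs):
    xs = list(xs)
    return sum(xs, Fraction(0)) / len(xs)

def price_terms(parents):
    """Return (delta z-bar, cov(w, z) / w-bar, E(w * delta z) / w-bar)."""
    z = [p[0] for p in parents]
    w = [Fraction(p[1]) for p in parents]
    dz = [p[2] - p[0] for p in parents]
    w_bar = mean(w)
    cov_wz = mean(wi * zi for wi, zi in zip(w, z)) - w_bar * mean(z)
    e_w_dz = mean(wi * dzi for wi, dzi in zip(w, dz))
    # Average phenotype of the children: each parent counts once per offspring.
    z_bar_child = sum(wi * p[2] for wi, p in zip(w, parents)) / sum(w)
    return z_bar_child - mean(z), cov_wz / w_bar, e_w_dz / w_bar

delta_z, selection, transmission = price_terms(parents)
print(delta_z, selection, transmission)  # -1/6 -1/3 1/6
```

Here fitness and phenotype covary negatively (the selection term is -1/3), the one parent whose child differs from it contributes a transmission term of 1/6, and the two add up to the directly computed change of -1/6.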

So far what I have done is write a layman’s explanation of Steven Frank’s derivation of the Price equation, which frankly is quite a lot better than what I have here… so I don’t even know why I wrote the above. Well, I guess when I did it for myself, I got some intuition on how to think about certain things, so there’s that, and I hope that I’m able to convey the intuitions.

Hamilton’s Rule

George Price had a very interesting relationship with his friend, William Hamilton. He created the Price Equation when he was trying to re-derive Hamilton’s work on kin selection, commonly called Hamilton’s Rule. And yet, a few years later, Hamilton himself reworked his rule to be based off Price’s.

I found Twelve Misunderstandings of Kin Selection by Richard Dawkins and Jonathan Birch’s Hamilton’s Rule and its Discontents to be particularly helpful in understanding kin selection.

In particular, the two variants of Hamilton’s rule are a major cause of confusion. Most people, when talking about kin selection, usually talk about the commonly known one (the one in Wikipedia), often written as rB > C, where r is the relatedness of the actor to the recipient of an altruistic/spiteful act, B is the benefit conferred upon the recipient, and C is the reproductive cost of said altruistic act.

However, it’s the general version (which, following Birch’s notation, we shall refer to as HRG) that I am interested in. See Birch’s easy-to-follow paper for the derivation from the Price Equation:

\Delta_s \bar{z} > 0 \ \ iff \ \ rb-c > 0

The difference between the two versions of Hamilton’s rule is, of course, the definition of the relatedness factor r. In the HRG version, r is defined to be \frac{cov(\bar{z_i}, z_j)}{var(z_j)}[7]. It’s really a statistical tendency for the recipients of the altruistic act to be altruistic themselves, rather than straight-out genetic relatedness.
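To get a feel for r as a statistical tendency, here is a rough sketch that computes it as a plain covariance over variance. The function name relatedness and the data are my own inventions for illustration: own holds each individual’s phenotype z_j (1 for altruist, 0 otherwise), and partners holds the average phenotype of whoever that individual interacts with. This is my reading of the definition, not Birch’s working.

```python
def relatedness(own, partners):
    """Regression of partners' average phenotype on own phenotype: cov / var."""
    n = len(own)
    mean_own = sum(own) / n
    mean_partners = sum(partners) / n
    cov = sum((o - mean_own) * (p - mean_partners)
              for o, p in zip(own, partners)) / n
    var = sum((o - mean_own) ** 2 for o in own) / n
    return cov / var

# Hypothetical population: two altruists (1) and two non-altruists (0).
own = [1, 1, 0, 0]

# Perfect assortment: altruists only ever interact with altruists.
assorted = relatedness(own, [1, 1, 0, 0])

# Random mixing: everyone faces the same average partner.
mixed = relatedness(own, [0.5, 0.5, 0.5, 0.5])

print(assorted, mixed)  # 1.0 0.0
```

Nothing in the calculation asks who is related to whom by descent. All it measures is whether altruists tend to end up interacting with other altruists.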

Group Selection

Since 2012, group selection and kin selection have been accepted to be the same bloody thing – mainly due to the works of Grafen, Gardner and Marshall (and others). As such, after researching more into this, it would appear that this blog post is no longer necessary. But for shits and giggles, let’s just continue (because mainly I’ve sunk about 5 hours[8] and 100s of revisions typing in stupid equations that I doodled on a piece of paper)[9].

We’ll start with Price’s Equation:

\Delta \bar{z} = \frac{1}{w} cov(w_i, z_i) + \frac{1}{w} E(w_i \Delta z_i)

We’ll start by hypothetically looking at the individual level. We can do that by partitioning S into subsets that each contain only one individual, so we can say that S_i is j. This equation simplifies down to:

\Delta \bar{z} = \frac{1}{w} cov(w_j, z_j)

w_j can be thought of as the fitness of individual j. Think of it as the number of offspring, to make the more abstract idea concrete. You’ll note that the random mutation component has been eliminated. If you read a number of biology papers, they tend to do that, mainly for simplicity’s sake. The intuitive argument is that random mutation of a gene is too small to bother with on an individual level, so we’re just going to look at the selection based on the phenotypes[10].

Now this is on an individual level. If we were to put this individual into a group, we’ll work out the average random mutation (i.e. the expected value) to be something like E(cov(w_j, z_j)) [11]. So now the Price Equation looks something like this:

\Delta \bar{z} = \frac{1}{w} cov(w_i, z_i) + \frac{1}{w} E(cov(w_j, z_j))

Now we can rephrase the Price equation to think of it this way: The “natural selection” component of the equation can also be thought of as selection between groups (the terms are of i, which stand for groups). And the “random mutation” component can now be thought about as selection within a group (the terms are of j, which are individuals). So if you think about this in a roundabout way, you’ll get that for \Delta \bar{z} to be selected for, it needs to be > 0. Therefore

\Delta \bar{z} > 0 \ iff \  \frac{1}{w} cov(w_i, z_i) + \frac{1}{w} E(cov(w_j, z_j)) > 0

The intuition about this is relatively straightforward: if the between-group selection process (formerly the “natural selection” component) and the within-group selection process (formerly the “random mutation” component) are in agreement, then the phenotype z will be selected for. Note however, that this is an inequality. So if the within-group selection process turns out to be a negative number, the between-group selection component has to be greater in magnitude than that in order for a phenotype to be selected.
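Here is a toy example of that tug of war, as a sketch only. The two groups and their (fitness, phenotype) pairs are invented, with phenotype 1 meaning altruist. I keep the groups the same size so that the plain average of the within-group covariances is the right one to use; with unequal groups you would have to weight by group size.

```python
from fractions import Fraction

def mean(xs):
    xs = list(xs)
    return sum(xs, Fraction(0)) / len(xs)

def cov(ws, zs):
    ws, zs = list(ws), list(zs)
    return mean(w * z for w, z in zip(ws, zs)) - mean(ws) * mean(zs)

# Hypothetical equal-sized groups of (fitness w_j, phenotype z_j) pairs.
groups = [
    [(5, 1), (5, 1), (7, 0)],  # mostly altruists: all do well, the free-rider best
    [(0, 1), (1, 0), (1, 0)],  # mostly selfish: all do poorly, the altruist worst
]

# Between groups: covariance of group-average fitness and group-average phenotype.
group_w = [mean(w for w, _ in g) for g in groups]
group_z = [mean(z for _, z in g) for g in groups]
between = cov(group_w, group_z)

# Within groups: the average of each group's own covariance.
within = mean(cov([w for w, _ in g], [z for _, z in g]) for g in groups)

# Covariance over the whole population, ignoring group boundaries.
everyone = [pair for g in groups for pair in g]
total = cov([w for w, _ in everyone], [z for _, z in everyone])

print(between, within, total)  # 5/12 -1/3 1/12
```

Within every group the altruists lose out (the within term is negative), but the altruist-heavy group does so much better overall that the between term outweighs it, and the total comes out positive.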


We’ll cheat. We’ll start by saying group selection and kin selection are equivalent. After all, both equations look similar:

\Delta \bar{z} > 0 \ iff \  \frac{1}{w} cov(w_i, z_i) + \frac{1}{w} E(cov(w_j, z_j)) > 0

\Delta \bar{z} > 0 \ \ iff \ \ rb-c > 0

From an intuitive point of view, it makes sense too. In the above section, we can say that the selection between groups is similar, at least intuitively, to rb, except now we’ve defined a group to be a group of individuals who are genetically related (r). And the cost of an altruistic act? It’s exactly the same as having a negative selection within groups. In fact, you can think of the group selection equation as a more generalized version of kin selection, because the cost can apparently be… positive.

For me, the biggest lightbulb moment was realizing that the Price Equation can be read two ways – the latter being the whole in-group and between-group selection thing.


Where is this going? I have no idea. I started writing this blog post to help myself understand if kin selection and group selection were the same thing, based on the Price Equation. I did some research, and found that most people nowadays agree that they’re the same thing anyway. I almost stopped blogging at that point[12]. But then for shits and giggles I completed it. And I think they’re fairly similar, at least from an intuitive point of view. I had wanted to continue on to prove that each component means the same thing, but I think I am happy with it as it is now.

The reality is the Price equation and the derivatives – kin selection and group selection – are really basic algebra, but the key behind it is the intuition. I think I have understood it well, and wrote down some intuitions behind the simple equations. But you tell me. Do my intuitions make sense? I may have made many mistakes along the way with my intuition. If you spot one, please tell me in the comments section below.

Extra Reading Material

Here is a list of extra reading material that I think may be helpful – some of it I have read, some are references found strewn about that I will hopefully eventually find the time to read:

  1. [1] Funny story. I was quite surprised I hadn’t heard of the Price equation, so I hit the books. I found the equation being referenced very very very very briefly in Martin Nowak’s Evolutionary Dynamics, and that was all
  2. [2] Being laid off does that to you :)
  3. [3] Originally I had written an expression (inline code lost) which returned a different result (inline code lost). That nearly sent me into madness as I thought my basic math knowledge was broken
  4. [4] And you can actually reach the same conclusion the other way around. If we define evolution, \Delta \bar{z} as the change of proportions of phenotypes in a group, it’ll end up being something like this:

    \Delta \bar{z} = \sum q'_i z'_i - \sum q_i z_i

    And through factoring out you will end up with this:

    \Delta \bar{z} = \sum (q'_i - q_i) z_i + \sum (z'_i - z_i) q'_i

    Of which (q'_i - q_i) and (z'_i - z_i) can be rewritten as \Delta q_i and \Delta z_i respectively, leading to the same equation as above:

    \Delta \bar{z} = \sum q'_i \Delta \bar{z_i} + \sum \Delta q_i \bar{z_i}

  5. [5] Do the algebra yourself to confirm
  6. [6] I skipped a couple of steps, and you should check whether I’m right. Also I’m using \overline because \bar is not readable, but just assume that it’s a bar over.
  7. [7] where j is an individual, and i is the set S_i… the working is derivative of Birch’s paper and I’m not bothered to reproduce it here, so refer to that
  8. [8] Per RescueTime
  9. [9] Ladies and Gentlemen, I present to you the Sunk Cost Fallacy
  10. [10] Yea this is a little hard to swallow, but roll with it for a while
  11. [11] You should double check this, my notes at this point got very messy
  12. [12] you can tell that I ran out of steam, because I started by meticulously explaining the evolution equation, and then skipped a bunch of working for kin selection and group selection
The Skynet Argument Against Social Media
Thu, 13 Aug 2015 00:04:11 +0000

In The Terminator (1984), Skynet sends a T-800 to terminate Sarah Connor. And the Terminator had to look through a phone book to find three Sarah Connors, mainly because it didn’t know what Sarah Connor looked like or where she lived.

That made sense in 1984. If the records had been destroyed in the war (and records could be destroyed, because physical drives were expensive and didn’t have much capacity), Skynet wouldn’t have known what Sarah Connor looked like, or any of her other personal details. Rewatching The Terminator in 2015, this makes no sense. If Skynet were made today, it would simply scour the cloud for information about Sarah Connor. And she’d be cleanly terminated.

There you go, kids. Don’t use social media. Arnold Schwarzenegger and the T-1000 will come kill you.

Addendum/Errata for “Monads, In My Python?”
Fri, 07 Aug 2015 02:27:20 +0000

I gave a talk at PyConAU about monads. This blog post contains some thoughts about the talk, and some addenda/errata that I was not able to cover in the talk. But first, here’s the talk and associated slides.


Throughout the preparation for the talk I was exceedingly aware of the Monad Tutorial Fallacy, also commonly known as Crockford’s Law. Most examples out there use the null pointer exception (i.e. functions that possibly return None in Python), and my talk was going to do the same. It isn’t the most illustrative of the power of monads, but I ultimately chose that, along with the list monad, as examples because they were the most beginner-friendly (add to that, I actively use the maybe monad in my day-to-day work).

I tried not to use the wrap-in-a-context analogies (burritos, bucket brigades, space suits, boxes…) to explain monads, opting for a slightly more academic way of explaining them – via category theory. I’m not an expert in category theory, with only passing knowledge[1] of the subject. I think my revamped visualizations of categories will raise the hackles of most mathematicians out there.

When preparing for the talk, I used a few things as a benchmark as to whether I did a good job explaining what monads are:

  1. If the audience realized that there are different kinds of monads, and that there needs to be some “glue” to help the different monads play nice with each other.
  2. If the audience understood that monads exist separately as a concept and as an implementation.
  3. If the audience understood them, they’d realize that I have skipped through some sections (the monad laws).

These three benchmarks allow me to gauge my audience’s understanding of things, and they would raise questions if they understood it. I think I succeeded on #2, but not #1 and #3, as most of the questions were actually related to that.

Monad Transformers

On the topic of combining the different monads, I had actually prepared an example that combined the Maybe monad with the List monad in what Haskell users would call a MaybeT. I wrote it in a mostly object oriented way, with each of the monads inheriting from the Monad class.

Here’s the code. If you squint just right, you’ll note that the MaybeT is itself a monad. In particular look at the parent Monad object’s bind method:

The Missing Bits

In the talk, I skipped talking about the monad laws, but they were included in the appendix of the slides. Generally, the monad laws are pretty obvious and common-sense to someone who writes Python code, but in any case, the monad laws are written in pseudo-Python that should be easy to understand. A monad needs to fulfill those three laws in order to be considered a monad.
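For anyone who never opened the appendix, the three laws can be sketched as runnable checks. To be clear, this is not the code from the slides: Maybe, half and dec are names made up for this illustration, and a wrapped None stands for “nothing”, which is a simplification (it means you can’t meaningfully wrap None itself).

```python
class Maybe:
    """A minimal Maybe: wraps a value, or None to mean 'nothing'."""

    def __init__(self, value):
        self.value = value

    @classmethod
    def unit(cls, value):
        return cls(value)

    def bind(self, f):
        # Skip f entirely when there is nothing to apply it to.
        if self.value is None:
            return self
        return f(self.value)

    def __eq__(self, other):
        return isinstance(other, Maybe) and self.value == other.value

def half(x):
    """Halve even numbers; odd numbers produce 'nothing'."""
    return Maybe(x // 2) if x % 2 == 0 else Maybe(None)

def dec(x):
    return Maybe(x - 1)

m = Maybe.unit(8)

# Law 1, left identity: unit(a).bind(f) is the same as f(a)
left_identity = Maybe.unit(8).bind(half) == half(8)

# Law 2, right identity: m.bind(unit) is the same as m
right_identity = m.bind(Maybe.unit) == m

# Law 3, associativity: how the binds are nested does not matter
associativity = m.bind(half).bind(dec) == m.bind(lambda x: half(x).bind(dec))

# A failure partway through short-circuits the rest of the chain.
failed = Maybe.unit(7).bind(half).bind(dec)

print(left_identity, right_identity, associativity, failed.value)
```

Checking the laws on a few values doesn’t prove them, of course, but it does make it concrete what each law is asking for.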

I had hoped to also briefly cover what applicatives and functors were – and how many monads are also applicatives and/or functors – thus giving a broader overview of what monads are. I ended up not having enough time to do that.

On with context being Monadic

I fear that I wasn’t too clear in the talk as to what I meant when I said this code is monadic:

with open('blah.txt', 'r') as f:
    lines = f.readlines()

If you think of monads as a collection of things (values and functions – specifically two functions called bind() and unit()), then what the bind() function does is that it applies a function within a monadic value – making a function aware of the context that it’s performing its computations in.

Which is exactly what Python’s context manager syntax does. What I wanted to do was drive home the point that Python itself provides solutions for things where you need to deal with values within a specific context. I had also hoped to drive home the point that monads, the concept, can exist distinctly from monads, the implementation.
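To make the analogy a bit more concrete, here is a loose sketch. with_file is a made-up helper, not anything from the standard library, and I’m not claiming it satisfies the monad laws. It only shows the bind-like shape: take a function, and run it inside the “file is open” context so the function itself never has to care about opening or closing.

```python
import os
import tempfile

def with_file(path, f):
    """Apply f to an open file handle, inside the 'file is open' context."""
    with open(path, 'r') as handle:
        return f(handle)

# Hypothetical usage with a throwaway file, so the example is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('one\ntwo\n')

lines = with_file(path, lambda handle: handle.readlines())
os.remove(path)

print(lines)  # ['one\n', 'two\n']
```

The lambda here plays the role of the body of the with block: it is a computation that only makes sense while the context holds.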


When preparing for the talk I read through many many monad tutorials. Eventually I got sick of reading the word monad, so I wrote a quick and dirty Chrome extension to replace the word “Monad” with “Gonad”. Yes it was childish. It is highly derivative of the Cloud to Butt extension. But it’s funny. In a childish way. If you want the extension, get it here. It’ll make this blog post a lot funnier. A LOT. I also just put the source into version control, and you can get it here.

If you have any nasty words about how I fucked up the idea of monads, or I’m not purely functional enough, please comment below.

  1. [1] Before learning about how category theory fits with programming, I had one prior experience with category theory years ago when trying to think up new puzzles after I got bored with Rubik’s Cubes – but those were the early days of Wikipedia and I didn’t learn much, but I think it did help me click together the concepts
Algorithms Are Chaotic Neutral
Tue, 04 Aug 2015 00:35:35 +0000

Carina Zona gave the Sunday keynote for PyConAU 2015. It was a very interesting talk about the ethics of insight mining from data, and algorithms. She gave examples of data mining fails – situations where Target discovered a teenage girl was pregnant before her parents even knew, or where machine-learned Google search matches implied black people were more likely to be arrested. It was in her last few points that I got interested in the ethical dilemmas that may occur. And it is these last few points that I want to focus the discussion on.

One of the key points that I took away[1] was that the newer and more powerful machine learning algorithms out there inadvertently discriminate along the various power axes out there (think race, socioeconomic background, gender, sexual orientation etc). There was an implicit notion that we should be designing better algorithms to deal with these sorts of biases.

I have experience designing these things and I quite disagree with that notion. I noted on Twitter that in the examples, the machine learning algorithms were basically exposing/mirroring what was learned from the data.

Carina did point out that the data is biased – for example, film stock in the 1950s was tuned for fairer skin, and therefore photographic data for darker-skinned people was lacking[2].

But before we dive in deeper, I would like to bring up some caveats:

  • I very much agree with Carina that we have a problem. The point I disagree on is how we should go about fixing it
  • I’m not a professional ethicist, nor am I a philosopher. I’m really more of an armchair expert
  • I’m not an academic dealing with the topics – I consider myself fairly well read, but I am by no means an expert.
  • I am moderately interested in inequality, inequity and injustice, but I am absolutely uninterested in the squabbles of identity politics, and I only have a passing familiarity with the field.
  • I like to think of myself as fairly rational. It is from this point of view that I’m making my arguments. However, I have been told that this can come across as quite alienating/uncaring/insensitive.
  • I will bring my biases to this argument, and I will disclose my known biases wherever possible. However, I may have missed some, so please tell me if I have.

On Discriminating Machines

The point on which I quite disagree with Carina is her saying that we should fix machine learning algorithms that inadvertently discriminate along the various power axes out there. The example that she gave was based on Latanya Sweeney’s paper – Discrimination in Online Ad Delivery, which coincidentally I had read ages ago.

While the rest of the paper is generally uninspiring, Latanya’s paper’s abstract (and indeed her conclusion) was this:

…raising questions as to whether Google’s advertising technology exposes racial bias in society and how ad and search technology can develop to assure racial fairness.

Having once been in the advertising industry, I can quite confidently say that it is in fact the actual advertising system – a combination of the machine learning system (that figures out which ads would have the highest eCPM) and the advertisers whose job is to optimize for their own profit (after all, they could have chosen not to have that ad template) – that exposes the inherent racial bias of society.

A better way to frame Latanya’s question would be this: if we were to collect the first names of all the people who are arrested in the US, what is the proportion of black-sounding names vs white-sounding names?

We then take this prior information, and create a new posterior rate which we then can compare with the results Latanya acquired in her paper. Given the rate of black incarceration in the US[3], I’d wager with some amount of confidence that there is indeed a higher proportion of black-sounding names of people who are incarcerated[4].

Let’s consider the algorithm bits only for now. Let’s simplify the process and say that the algorithm optimizes towards showing the ads that will return the highest earnings for Adwords. And that the advertiser has provided two variant templates – arrestRecord and contactRecord. Adwords will at first randomly choose which template to fill in. Over time, Adwords learns that people are more likely to click on arrestRecord when paired with a black-sounding name, guaranteeing more revenue. So the obvious solution is then to show more of it (earn more!)
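The simplified process above can be sketched as a tiny simulation. Big caveats: I don’t know what Adwords actually runs, so this uses epsilon-greedy, a textbook explore/exploit strategy, as a stand-in. The function simulate and the click-through rates are invented purely for illustration and are not measurements of anything.

```python
import random

def simulate(click_rates, rounds=20000, epsilon=0.1, seed=0):
    """Epsilon-greedy choice between ad templates; returns how often each was shown."""
    rng = random.Random(seed)
    shown = {name: 0 for name in click_rates}
    clicks = {name: 0 for name in click_rates}
    for _ in range(rounds):
        if rng.random() < epsilon or not all(shown.values()):
            # Explore: pick a template at random.
            choice = rng.choice(sorted(click_rates))
        else:
            # Exploit: pick the template with the best observed click rate.
            choice = max(shown, key=lambda name: clicks[name] / shown[name])
        shown[choice] += 1
        # The simulated audience clicks with the template's (hidden) probability.
        if rng.random() < click_rates[choice]:
            clicks[choice] += 1
    return shown

# Hypothetical click-through rates for ads paired with one particular name.
shown = simulate({'arrestRecord': 0.10, 'contactRecord': 0.02})
print(shown)
```

Note that nothing in the loop knows anything about names, race, or arrests. The only signal it gets is which template the simulated audience clicks on more, and it ends up showing that template most of the time.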

Really then the question is who is at fault? The algorithm designers? Or the people who trained the algorithm by clicking more on arrestRecord when paired with a black-sounding name?

We can say that the training data (i.e. the live population actively clicking on ads) is biased. I consider the outcome of the system to be a mirror of society. And in my opinion, this is a Good Thing, and we shouldn’t be changing that part. Instead, I argue, we should be working harder to change the underlying data (i.e. the inherent mental bias of the population).

Garbage In, Gospel Out

Aurynn brings up a point in the conversation, saying that developers think data and algorithms are impartial:

Which exactly highlights my point – algorithms ARE impartial. They work on the given dataset – garbage in garbage out, as they say. It’s in fact one of the keystone principles in designing algorithms.

Related to the GIGO concept is the idea of the GIGO fallacy, also commonly known as Garbage In Gospel Out. It is the fallacy where “…the advocate treats conclusions leading from some flawed data, unsubstantiated evidence, unfounded assumption or baseless theory, as gospel.”

The GIGO fallacy is very common, and is the foundation of the anti-algorithm sentiments I got from the talk.

Throughout Carina’s keynote, she notes that there are consequences of these algorithms being inadvertently discriminative. In fact, most of these consequences can be attributed to the population in general being not critical enough of the data that is presented to them.

In the examples she gave – particularly the one where black people were tagged as gorillas and animals – it would be extremely easy to get offended. But if people realized that machines are flawed, there wouldn’t be as much outrage. When I wrote EyeMap, I myself wrote an algorithm that wouldn’t detect my own eyes when they did not form a crease at the eyelid, which happened only when I was tired. Did I get mad at the algorithm? No. I simply realized that there are some limits to the algorithm. I believe the general attitude to issues like these should be amusement, not outrage.

Of course this does not mean that the consequences are not real. Nor does it mean that the consequences do not hurt people, or that they do not trigger people.

Fixing algorithms to handle the general population’s logical fallacies does not fix the underlying problem – people are still not critically treating the information that comes out of their screens.

Take traumatic triggers for example – they’re real, and they have effects. It would be exceedingly terrible if a machine learning system output something that triggered a traumatic flashback. And the conventional wisdom on the Internet is to provide trigger warnings – to be sensitive, so to speak. But research has shown that they are useless and, depending on the study, could even be counterproductive to the healing process. In fact, Metin Basoglu, in his books on torture and trauma research, points out that exposure works better than avoidance (and CBT is in fact one of the best treatments available)[5].

While it would appear that traumatic triggers are not at all like a logical fallacy – and they aren’t – the same analogy can be applied. It is the receiving end that should be critical of the results. Fixing the algorithms would be exactly the same as plastering trigger warnings everywhere – useless and unproductive.

On Inequity

Then you say to me, “but Chewxy, surely you cannot expect to fix everyone’s individual issue!”, or “you’re teetering on the edge of victim blaming!”. Here, I shall try to convince you that fixing the algorithm would yield even more harmful consequences. We shall do this with a thought experiment, which is set up thus:

First, we suspend our own morality, enter a new realm, and adopt a new morality. In this realm, we have a morality that says that there are categories of people that should be approved for loans, and there are categories of people for whom it is immoral to approve loans. Now we set up the idea of inherent difference along an axis – say anatomical sex (as in it’s dependent on the physical genitals), for example. The reason why sex is chosen is because it’s pretty binary – you either have a male sex organ, or a female sex organ[6]. There are no other options.

Let’s say that for some reason[7], people with male genitals were more likely to default on loans than people with other genitals. And for the sake of drilling in further the idea of this new morality, anyone who defaults on a loan will suddenly suffer constant physical pain for a large majority of the person’s life. Hence, it is a moral imperative not to approve anyone for a loan if they do not qualify, or else you would be doing harm to them.

On the flip side, getting a loan would improve the lives and futures of the peoples immensely. Getting a loan means a person can acquire assets. By now you should be able to see that we have set up a pretty inequitable situation. Anyone with female genitals would have access to improve their lives tremendously, while anyone with male genitals would lag in access to loans.

In this scenario, think about what the appropriate empathetic and sensitive response would be.

Now, to make things a bit more difficult. Let’s say in this hypothetical situation, there is one other key factor that determines whether a loan might get defaulted – whether a person has assets or not[8]. A person with assets is far less likely to default on a loan than a person without assets. You run A_Firm, a firm that provides credit qualifying analysis. Your machine learning algorithm discovers this fact, and starts asking if applicants have assets.

The problem, of course, lies with what proportion of the male population have assets, vs the proportion of the female population that have assets. Because it’s easier for females to get loans (because they don’t default as much), it’s easier for females to acquire assets, which makes it easier for them to get loans. We’re in a classic “privileged” situation.

OK, so we’re pretty close to the situation that Carina mentioned in her talk – there will be a lot of people who are rejected for reasons that have nothing to do with their ability to pay, and everything to do with replicating privilege.

So the question becomes this: if we modify our algorithm to not take into account whether a person has assets – in the name of inclusivity – would we be causing more harm than good? Put another way, is the harm from modifying our algorithm greater than the harm from the lack of inclusivity?

My quick back of the envelope calculations indicate yes. But I’m tired so please do your own.
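If you do want to do your own, here is one minimal way to set the calculation up. It is only a sketch: the group sizes and default probabilities are made-up numbers, `outcomes` is a hypothetical helper, and how you weigh a default (a lifetime of pain, in this realm) against a repaid loan (a life improved) is left entirely up to you.

```python
def outcomes(groups, approve):
    """Return (expected defaults, expected repaid loans) for a loan policy.

    groups:  list of (applicant_count, has_assets, p_default) tuples
    approve: function taking has_assets and returning True to approve
    """
    defaults = 0.0
    repaid = 0.0
    for count, has_assets, p_default in groups:
        if approve(has_assets):
            defaults += count * p_default
            repaid += count * (1.0 - p_default)
    return defaults, repaid


# Hypothetical population of 1000 applicants. Every number is an assumption.
groups = [
    (400, True, 0.05),   # applicants with assets: rarely default
    (600, False, 0.30),  # applicants without assets: default far more often
]

# Policy 1: the algorithm asks about assets and only approves asset owners.
asset_aware = outcomes(groups, lambda has_assets: has_assets)

# Policy 2: the algorithm is modified to ignore assets and approves everyone.
asset_blind = outcomes(groups, lambda has_assets: True)

for name, (defaults, repaid) in [("asset-aware", asset_aware),
                                 ("asset-blind", asset_blind)]:
    print(name, "defaults:", round(defaults), "repaid:", round(repaid))
```

Changing the numbers changes the picture, which is rather the point of doing your own.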

Obviously the example above is just a model. It’s not meant to be an analogy of the real world, but rather, it guides us to think about how we deal with these things in the real world. Our real world definition of harm is a lot more subtle than that – bankruptcy may not be as bad as a lifetime full of pain, but I would definitely consider it under “harmful” as well – and real life morality is also more heavily coupled with profits (i.e. it may be moral to do certain things because it’d be profitable to do so), so it’s a bit more difficult to extricate pure moral intentions in Carina’s case.

Following that thought experiment, you will quickly realize that the best way to fix this issue would be to address the real life inequity – the proportion of male asset owners is far lower than the proportion of female asset owners. That is a far superior solution to modifying the algorithm. A far more complex one too, no doubt.

Either way, it should be food for thought about modifying algorithms in these sorts of scenarios. And that’s only a short term effect. What about long term effects?


A related point (and one that forms the pun of this blog’s title) is that machines learning, humans responding, and machines learning from humans’ responses together form a somewhat tight feedback loop. And we know what happens when things get into feedback loops and are extremely sensitive to initial conditions – there is an entire branch of math dedicated to it: Chaos Theory!

I had also wanted to go down this path, to see where it leads the argument, but I realize I don’t have any models in mind that could model our relationships with machines, and it was getting late, so I abandoned it for the time being.

The general gist of the idea is that we are not able to predict the long term effects because we don’t know the starting condition well enough. Modifying the algorithms could have really really really weird results in the long run.
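To make “sensitive to initial conditions” a bit more concrete, here is a minimal sketch using the logistic map, a textbook chaotic system. To be clear, this is not a model of humans and machines (I still don’t have one); the map, the parameter `r = 4` and the two starting values are all my own choices for illustration.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return every value visited."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two starting conditions that differ by one part in a billion.
a = logistic_trajectory(0.2, 60)
b = logistic_trajectory(0.2 + 1e-9, 60)

# Distance between the two trajectories at every step.
gaps = [abs(x - y) for x, y in zip(a, b)]

print("largest gap in the first 5 steps:", max(gaps[:6]))
print("largest gap over all 60 steps:  ", max(gaps))
```

The two runs are indistinguishable at first and then bear no resemblance to each other, which is the sense in which not knowing the starting condition well enough ruins long term prediction.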

This of course doesn’t mean that we should be paralyzed by our inability to predict the future and not take steps. I’m just pointing out that it may be difficult to figure out what’s happening in the long run[9].

AI! Teh Horr0rs!

The final point I want to make was one of my original points that I wrote to Betsy:

I will admit that the response wasn’t well thought out. In fact the whole line of reasoning wasn’t well thought out, and I was indeed caught up by the moment. The general gist of it is something like this:

Machines that figure out biases on their own would be superhuman – in a very literal sense. Human beings have problems enough dealing with their own biases. If machines can figure out biases of humans, that would make them more human than human. That would be the danger point.

Of course, we’re nowhere near that scenario, so we wouldn’t have to worry about it. Any form of debiasing would be inputs from humans for the time being, and that’s … just imparting a known set of human morality into machines, which we will then force upon a world that may or may not have a different set of ethics from ours. Totally not a problem at all.

Like I said, the AI angle of this is poorly thought out.


Throughout this whole post, it may appear that I am ragging on Carina’s talk. Au contraire, I’m actually supportive of her ideas of being more empathetic and sensitive developers. There was a very good talk she made, called Schemas for the Real World (thanks to Caleb Hattingh who pointed me to that video), which shows the depth of what Carina talks about.

I merely disagree with two very specific parts of her talk – specifically on how to deal with it. My opinion is that these inequities should not be handled at the software/reflection layer. They should be handled at the basic level: real life. We should really fix inequities in reality, and let the mirrors (i.e. machines) show us what we really are.

Towards the end of her talk, she did somewhat echo the sentiments I have above: after auditing your algorithms, if you find that they do indeed cause inequity, what do you do about it? In the sections above, I laid out a model of how to think about modifying algorithms to handle such issues. I didn’t give an answer on what we should do about it. Neither did Carina. I guess this is one of those Hard Things.

The final takeaway from Carina’s keynote, one I really really really agree with, is that we have to have diverse ways of anticipating how things would fuck up. This cannot be stressed enough.

TL;DR – I liked the Sunday Keynote for PyConAU2015. I disagree with the speaker in 2 out of her 10 or so points. I wrote a 3000 word rambling essay on those two examples, and why changing the software is worse than changing people. Lastly I agree with the rest of her steps to reducing these sorts of issues.

  1. [1] not necessarily the key points she was trying to communicate – it could just be I have shitty comprehension, hence rendering this entire blogpost moot
  2. [2] This NPR article seems to be the closest reference I have, which by the way is fascinating as hell.
  3. [3] which in my opinion is a fairly unjust problem on its own but is completely out of scope for this blog post
  4. [4] This back-of-the-envelope calculation could very well be wrong though
  5. [5] Side note: /r/scholar is a good place to ask for research papers and books you cannot afford
  6. [6] Yes, I am aware that intersex people exist, and that intersex bodies are highly varied on their own, ranging from ambiguous genitals to multiple genitals, but for the purposes of this thought experiment, we cannot be as inclusive, for brevity’s sake.
  7. [7] The reason doesn’t have to be known, and could or could not be correlated with the fact that it is due to having male genitals – it doesn’t matter
  8. [8] Interestingly such a strong predictor of credit default kinda already does exist in real life. If you defaulted once in the past, you are more likely to default in the future
  9. [9] By now you should have realized that I’m quite risk-averse, and this is that bias speaking
Operator Overloading With Right Associativity In Python
Thu, 23 Jul 2015 01:03:33 +0000

It’s actually quite fun that after years of using something, you still find a new way to do something. At the last Sydney Python meetup, there were demonstrations of how Python’s object interfaces work.

Consider this for example:

class Blah(object):
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # skips checks and stuff
        return self.value + other

>>> b = Blah(2)
>>> b + 2
4

However, it was pointed out by my friend Julian that the other way wouldn’t work – that the overloaded operator only applied when the object was the left-hand operand (what we loosely called “left associative”):

>>> b = Blah(2)
>>> 2 + b

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'Blah'

Last night as I was preparing my slides and code for my PyConAU talk, I accidentally found this. More specifically, I found out about the __radd__, __rmul__ etc methods.

So, if you implement both the __add__ and __radd__ interface methods, the object can sit on the right-hand side of the operator as well (“right associativity”, loosely speaking):

class Blah(object):
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # skips checks and stuff
        return self.value + other

    def __radd__(self, other):
        # called for `other + self` when other's own __add__ gives up
        return self.__add__(other)

>>> b = Blah(2)
>>> b + 2
4
>>> 2 + b
4
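The “skips checks and stuff” comment hides the part that makes the reflected methods work nicely with other types. Here is a minimal sketch of what those checks might look like, assuming (my choice, not something from the meetup) that addition should return a new Blah rather than a bare number. Returning `NotImplemented` tells Python to try the other operand, and to raise `TypeError` if nobody can handle it.

```python
class Blah(object):
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Blah):
            return Blah(self.value + other.value)
        if isinstance(other, (int, float)):
            return Blah(self.value + other)
        # Let Python try the other operand, or raise TypeError for us.
        return NotImplemented

    def __radd__(self, other):
        # Addition is commutative here, so reuse __add__.
        return self.__add__(other)


print((Blah(2) + 2).value)
print((2 + Blah(2)).value)

# sum() starts from the int 0, so it only works because __radd__ exists.
print(sum([Blah(1), Blah(2)]).value)
```

A side benefit of `__radd__` is that the built-in `sum()` starts working on a list of these objects, since it begins with `0 + first_item`.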


Here’s Julian’s proof of concept to show that ambiguities don’t matter:

class Multiplier(object):
    def __init__(self, description):
        self.description = description
    def __mul__(self, b):
        print ("__mul__ was called on {0}".format(self.description))
    def __rmul__(self, b):
        print ("__rmul__ was called on {0}".format(self.description))
    def __int__(self):
        return 43
a = Multiplier("a")
b = Multiplier("b")
# Confirm Chew's finding still works.

# Which gets priority in this ambiguous situation? Turns out __mul__ does.

# But, we can force it.

So, there you go… kinda cool, eh?

Writing… Again
Mon, 20 Jul 2015 07:41:24 +0000

This blog has been awfully silent the past year. I guess now that my job has been made redundant, I’m going to return to writing more.

Hah! Here’s to hoping!

Designing SquatCoach
Tue, 07 Jul 2015 04:06:23 +0000

A few months ago, I blogged about my frustrations with logarithmic progressions in weightlifting. I highly enjoy linear progressions – who doesn’t enjoy work that is easy? But I was wrong about one thing: I hadn’t hit the logarithmic progression part. In fact, as of the time of writing this blog post, I am still firmly in the linear progression phase.

So what went wrong? The answer is form. I was basically squatting with exceedingly poor form. I was using all kinds of stabilizer muscles in an unbalanced way that left me injured often. I took notes and noticed that I kept getting injured at around 55 to 60kg, and hence the weights I squatted lingered around there. There is an old saying that goes: “Practice Makes Perfect”. That is wrong. The phrase that should really be passed around is “Perfect Practice Makes Perfect”.

The breakthrough came when I got my partner to record me squatting for the first time. I had religiously read /r/fitness and /r/formcheck, so I had a fairly good idea of what good form is. I thought I had good form – I didn’t. One of the first things I noticed was that I wasn’t squatting anywhere near deep enough, despite the fact that I had all along thought that I was doing an ass-to-grass squat.

After years spent seated in front of the computer, I had no spatial awareness of how deep I was squatting. I had to learn what a deep squat was (learning the flexibility to do that is a tale on its own). I taught her how to check for correct form: the hip crease must go lower than the top of the kneecap to be counted as a good squat. And so she began to spot me. But this wasn’t fair to her, as it was eating into her training time. So after a couple of sessions, I went about developing an app that used computer vision to determine if I was squatting with good form.

The thing about computer vision is that while it’s easy to start, accuracy is a Difficult goal with a capital D. One can indeed spend a lot of effort to boost the accuracy by a minuscule amount. I cut down a lot of that by using various hacks like coloured sticker dots on the hip crease, knee and barbell tip to increase the accuracy of the app. By and large, I got it working, for me. But it wasn’t working for my partner, or for a colleague who had begun to be interested in the app (he had separately approached me about the feasibility of an idea similar to SmartSpot, whose idea I love). The killing blow, I think, was that I had irritated some fellow gym-goers by wrapping a gorillapod around their racks or bars in order to set up a static filming point.

And so it transpired I would need a new app. The app would have to do these things in order to teach me to have a better squat form:

  • Monitor my form as I squat
  • Inform me when I have hit a good form
  • Only one person involved – no interfering with anyone else in the gym

Introducing SquatCoach

After a bit of imaginative thinking, I wrote the prototype of what is released today as SquatCoach (Android | iOS). The idea was simple – strap a device full of sensors (read: any modern phone) to the legs, and use the sensors to detect whether I had good form or not. It was kinda obvious.

At first I wrote the app for myself. The app did one very simple thing – play a sound when I had hit good depth. Gradually, the app got a bit more complex – I started tracking if my knees were caving in, and whether or not I was putting excess force on the knees (easily tracked by checking if the knees extend forwards on the downward motion).

About a week after I built my first prototype app, I showed it to a few friends. I still recall the first question one of my friends asked after I told him about the app and how I am using it to improve my form: “Can I play music while using the app?”. The more I asked around, the more it transpired that the idea of having an app that teaches you how to squat properly was a popular one. And so I decided to commercialize the idea.
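To give a flavour of the idea, here is a minimal sketch of depth detection from a phone strapped to the thigh. This is not the actual SquatCoach code: the function names, the thresholds, and the assumption that the phone’s y axis points along the thigh from knee to hip (so a resting accelerometer reads roughly +9.81 on y when standing) are all hypothetical.

```python
import math


def thigh_angle_deg(ax, ay, az):
    """Angle of the thigh above horizontal, from one resting accelerometer
    reading. 90 is standing upright, 0 is parallel to the floor, and
    negative means the hip end is lower than the knee end."""
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    if magnitude == 0:
        raise ValueError("no gravity reading")
    # Clamp to guard against floating point drift outside [-1, 1].
    ratio = max(-1.0, min(1.0, ay / magnitude))
    return math.degrees(math.asin(ratio))


def count_good_reps(angles, depth_deg=0.0, standing_deg=60.0):
    """Count reps that reach depth and then return to (near) standing."""
    reps = 0
    hit_depth = False
    for angle in angles:
        if angle <= depth_deg:
            hit_depth = True  # this is where an app would play its sound
        elif angle >= standing_deg and hit_depth:
            reps += 1
            hit_depth = False
    return reps


# Three descents; the middle one stops short of depth and is not counted.
angles = [90, 45, -5, 40, 85, 90, 30, 10, 50, 88, 60, -2, 70]
print(count_good_reps(angles))
```

The two thresholds give a bit of hysteresis, so a wobble at the bottom of the squat does not get counted as several reps.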

From Personal To Production

The first version of the app looked something like this (which I used for myself):

Squat Coach v1

There was no UI, save for a button. I had tinkered around with a few other ways the app could work – I’m really a fan of UI-less applications. One version I had was just a background service that played a sound when the correct squat depth was reached. That wouldn’t fly. I went around asking if people would use the app, and after showing them, they’d ask, “where’s the app?”

I had to have a UI. At around the same time, I had a gym acquaintance who gave me this tip: keep your head neutral when squatting – pick a spot, and stare at it for the duration of the squat. The gym I train at has a mirror behind the power rack, so you could watch yourself squatting. I put two and two together, and decided to add a feature to count reps as you squat. This gave the app more “meat”. Alongside that I updated the app to vibrate instead of play a sound (so people who use their phones to play music while they work out can use it without distraction).

An app that just vibrates when a good depth is reached isn’t of much value. No. The app needs to actually track good form. And so we did some research and collected data. The app collected data from day 1. I mainly used the data to plot my form. But what is good form, really? There are a few videos on the internet explaining what good form is. I personally wasn’t able to reliably replicate that good form, as at that point in time I had yet to become flexible enough, so I had to enlist outside help.

I enlisted the help of several other more experienced squatters (who all squat with perfect form), and recorded their data. I aggregated the data from the 7 people I initially had testing, and formed the “model” of what a good squat is. It is this model against which the squat data are compared and scored. The setup for the gathering of data was itself a challenge and worthy of another story, perhaps another day. Currently the model is “baked in” as 20 or so constants and a few functions. Future versions will contain an online machine learning system. Our team is still debating if we should have an upload-data-to-server component.

The most important lesson learned from the process of gathering data was that both iOS and Android versions of the app had to be developed. I had discovered that many people were disdainful that the app would be exclusive to Android (as I am a daily Android user). So at Pressyo we serially-parallel developed both Android and iOS versions.

Designing the SquatCoach App

SquatCoach was a learning experience for us – while we had written many production quality web apps before, a production quality Android and iOS app was new for us (our experience with mobile apps had been mostly personal and internal apps, where polish wasn’t as important). We ran into a few interesting design challenges along the way.

Interaction Design

We designed this app to be as simple as possible. The idea was that you’re just supposed to strap the phone to your legs, and squat. The app is supposed to figure out the rest. This way, you can focus on performing the squat without having to worry or constantly check the app to see what is happening.

We had initially collected data on all sorts of different kinds of squats (well, 3 – back squats, front squats, and bodyweight squats). We discovered that having to specify which exercise you would be doing was involving too many steps for the user. Later on, we differentiated the back squats into high-bar and low-bar squats, which added to the user confusion.

It’s often easy to get carried away by adding features when writing programs. Over the course of iterating on this app, we went overboard with features. One feature we added was social media sharing.

It was buggy, as it was supposed to take a screenshot, crop it to an appropriate size, and then share it on Twitter. Granted, it was a bad idea to do it that way, and there are better ways of doing it. But a lot of time was spent dicking around with things that aren’t related to the core duty of the app: teaching users how to squat.

Some other features that we started to implement before they were ultimately canned were cloud-based storage of historical records (useful, but not really useful in the context of learning how to squat, other than to see progress) and tracking of various other activities like bench press. The project was running away from us.

Eventually we scaled back down and focused on the core thing: tracking squat form. We doubled down on figuring out better algorithms for detecting good squat form. We figured that reinforcement learning would be the best way to teach squat form. The vibration feedback when squatting was one way, but there would have to be a repeated reinforcement at the end of a set, to remind the user that he/she did well, and to show where to improve.

Score Design

The app itself tracks a lot of other things to determine if a squat form is good (limited of course, to what it is able to detect). We needed a way to display that information without cluttering the screen.

The first few ideas we had were to do with scoring. It was easy, and something we have a lot of experience doing: take a bunch of numbers, and merge them down to a single score. But one number is not informative enough. Eventually we settled for what is in the current version of SquatCoach.

The display of the score was inspired by the periodic table. The elements of the periodic table are frequently displayed as letters in a square box with numbers around them. Here’s an example of a chemical I cannot live without:


The periodic table box is a good way to show quite a bit of information in a small area without having to resort to a table. It’s information visualization by convention. By convention, every periodic table box would have the element symbol, the atomic mass and atomic number listed.

The beauty of the periodic table box is that, depending on how advanced the table is, you get more information per box. I’ve seen periodic tables where the valence shell numbers are shown, while some other periodic tables might have first ionization energies listed.

The downside to a visualization like this is that while it can pack quite a bit of information in a small space, it’s also quite contextless if you don’t know what the numbers stand for, and hence a learning curve is required. This will be addressed in upcoming versions of the app.

We debated using the Wilks Score instead of weight ratio, but that required us asking for more information about the user than what our test group said they were comfortable giving up – namely their gender. It also interfered with our idea that it should ask for as little information from the user as possible (both from a privacy perspective and from a usability perspective).
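For illustration, here is a minimal sketch of a weight ratio calculation from the two numbers the app asks for. I am assuming the ratio is simply weight squatted divided by body weight; `parse_kg` and `weight_ratio` are hypothetical names, not the app’s actual code.

```python
def parse_kg(text, minimum=0.0):
    """Parse user-entered text into kilograms, accepting ',' or '.' as the
    decimal separator. Rejects values at or below `minimum`, and NaN/inf."""
    value = float(text.strip().replace(",", "."))
    if not (minimum < value < float("inf")):
        raise ValueError("weight out of range: {0!r}".format(text))
    return value


def weight_ratio(lifted_text, body_weight_text):
    """Weight squatted divided by body weight."""
    lifted = parse_kg(lifted_text, minimum=-1e-9)  # 0 kg is allowed
    body_weight = parse_kg(body_weight_text)       # must be positive
    return lifted / body_weight


print(round(weight_ratio("60", "75"), 2))
print(round(weight_ratio("82,5", "75"), 2))
```

Unlike the Wilks Score, this needs nothing from the user beyond the two weights.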


By design, SquatCoach is exceedingly simple to use. However, in order to calculate the score, we had to have some information from the user that we wouldn’t have had prior: the weight the user was squatting and the user’s body weight. This required a form for data entry – this is simple on a web page, but an app has a lot more subtleties:

  • Do we put everything in a table?
  • Do we restrict input?
  • If we do, how is input entered? On a keypad? On a multi-part selector?
  • Can the field be seen once the onscreen keyboard pops up?

On iOS, restricting the keyboard to the decimal one came with its own interesting problem – the decimal keyboard does not have a dismiss or done button. Yep, that done button we’ve grown so accustomed to seeing is actually a customisation – not default. It made sense now why a lot of applications didn’t update to resize to the iPhone 6 and 6 Plus sizes. We chose the simplest solution – instead of having a fixed Done button above the keyboard, we had it dismiss when you tapped anywhere except the keyboard.

On Android, we didn’t have the above problems. However, after a few attempts at trying to have a uniform interface between both platforms, we decided that it would be wisest to follow each platform’s conventions. The result was that the user interface for each platform is slightly different. I will have to say though, the iOS form does indeed look better than the Android version. Upcoming versions aim to fix that.

For example, in the Android version, there is a big visible button to start and stop squats. This was a problem for the iOS version as buttons on iOS are small! In iOS 7 onwards, they only encompass the words on screen, unless you customise them. We grappled with this for a while, since we didn’t want to have gigantic text just to have a larger touch target. In the end we chose to make the button a little larger on the Weight screen; and we allowed the entire screen to press the button for us on the Squats screen. (After all, that’s when you want to be fumbling less with your phone.)

Interesting Things Learned

We had an iPhone 6 Plus user, so we were very keen on ensuring that our app scaled to all sizes. Storyboards have proven to be an amazing tool for ensuring everything was in the intended place. There is a slight learning curve – like realising that constraints weren’t automatically added for you, so we had some amusing moments trying to figure out why elements decided they belonged on an iPad-sized screen while running on an iPhone 5S.

A slightly interesting thing that we learned was that managing the parallel development of two apps on different platforms was easier than expected. It was also a lot more serial and linear than expected. Often, a feature would be developed on one platform first, and then the work would have to slow down until the other platform caught up to feature parity before we would continue.

Perhaps the most interesting thing about this exercise is that I couldn’t find any app on either app store that did what I wanted – an app that teaches you how to squat. The apps that are out there are variations of workout tracking. I seriously wonder why. This could either mean that this is such a terrible idea that this project is going to fail, or this is new ground that is being explored, opening a path for people to copy.

Another interesting thing that we learned was that app store deployments took a lot more paperwork than we expected. And many more questionnaires than we ever imagined.

Lastly, I was quite amused to find out how hacker-y my test group was. A good number of people in the test group couldn’t wear a phone band around their thighs (they were too developed), so they put the phones in their shorts (and the app still worked!). One guy even extended an existing arm band to make it longer (though to be fair, he is quite handy with a sewing machine, being a cosplay veteran) so that his thigh could fit. This also led to us focusing more on making the app work when the phone is in the pocket. Upcoming versions aim to fix a few issues on that end.

The experience of writing this app has been an interesting one. While we’ve written apps before, we’ve never written one that was meant to be out in an app store. For that, my experience has been to cut, cut and cut. Most features aren’t necessary, and I felt that we wasted a lot of time trying to add features that weren’t necessary. On the other hand, I’m quite glad for just launching the app. I think you should check it out.

The Bane of Communicating Succinctly
Thu, 23 Apr 2015 11:46:55 +0000

You may have noticed I have not blogged for a while. And if you do follow me on Twitter, you’ll note that my tweet rate has also dropped.

Ever increasingly, I find the need to share some ideas, but the ideas cannot be succinctly communicated in a pithy sentence or two. I have a lot of what I consider to be “dangerous” ideas (in the vein of the Festival of Dangerous Ideas), and I think it is imperative to be clear about the ideas.

And so I would sit down and write a blog post about it, only for it to derail into some mega long essay that at best reads like mindless rambling (for example, see my previous post). It is in these cases that I sometimes feel I’m better off not writing. But sometimes I get passionate about a topic, and start writing a lot.

Then midway through, I’d lose steam. Here are examples of titles that I have in my archive that went nowhere:

  • Why Do Ceramics Heat Up in Microwave Ovens (3997 words and I lost steam) – this article began life exactly a year ago today
  • Making Friends – A Rant (1301 words, and still incomplete, as I’m still gathering data, though I’m quite sure I’ll lose steam on that too)
  • Scrambled Eggs, The Guide (1514 words, lost steam already)
  • Logarithmic (A musing on non linear progression of things)
  • Track (A musing on being on track for a plan, and why sometimes it’s ok to let go)
  • Graveyard of Sideprojects (originally written when there was a craze over having side projects. I have 200+ side projects that I have not touched for years)
  • The Virtuous Molecule (a blog post about the fallacies of natural products)
  • Reviews: I have 3 book reviews, 2 movie reviews in my drafts, and they are nowhere

I have since concluded that it’s the length that makes me lose steam.

Yesterday I read Evan Miller’s Four Days of Go. The takeaway is that I wish I could write like him. I actually felt envious that he was able to get his point across straight, and still not be dry.

I have a problem with communicating succinctly. I look at all the work emails that I sent out – most explanation type emails have graphs, definitions and all sorts of background things. Even when I highlight the key takeaway points, written in normal English, sometimes they are missed.

…[P]erfection is attained not when there is nothing more to add, but when there is nothing more to remove

So says the oft-quoted Antoine de Saint-Exupéry.

The problem is I don’t know what to take away. I don’t know what to remove. The typical advice of how to improve writing is to “write more”. I’ve written this blog for more than 10 years in one form or another. I actually need to know how to improve, not just write more.


Just Fair
Tue, 10 Mar 2015 17:50:25 +0000

Preamble: I have not blogged in a while. I have quite a few things to say, and have started at least 7 blog posts but never found the steam to complete them. Last Friday, I was having a rather interesting conversation with my colleagues, and that was cut short by a prior dinner arrangement. Having left the conversation unfinished, I decided that it’s a good point to jump off and continue blogging.

I lean slightly left towards Marxism, and I made it clear what it is that appeals to me. What appeals to me about Marxism is that it is the most sci-fi in nature. “From each according to his ability, to each according to his need” is probably one of the most Star Trek-esque things you can say. Indeed, I dream about a future where society functions like this, and I am actively working towards making such a change in society.

Of course there are other bits of Marxism that I think are outdated – the concept of class warfare, and of the proletariat needing to seize control of Das Kapital[1], is in my opinion a very 19th century view. I do, however, note a similarity between today’s society and the society that Marx lived in: one on the verge of a technological revolution[2]. Just as the philosophy of Marx’s time held that people find the meaning of life through work, we are in a period where the same has happened. Think of how you would introduce yourself to other people – it’s your name, followed by what you do.

And there we were, seated at the table. J, P and I were discussing my Marxist leanings. J posited a very interesting question, which I have paraphrased to omit the obscenities that are wont to come about when groups of male humans speak:

Imagine if there were two students, A and B. A is super hardworking, and does all the work during the semester. A even does extra work to understand the subject deeply. B on the other hand is a party animal, preferring to skip classes and not study, and would rather spend his time partying.

Then comes exam time. Obviously A does better than B. But here’s the twist. The lecturer for whatever reason, approaches A and proposes that A averages out his grade with B.

If you were A would you do that?

This is obviously a variant of the legendary socialism classroom experiment story that has been floating around the Internet for some time now.

Putting aside obvious flaws with the analogy (which I will expand on in the later part of this post), I answered with an affirmative – that I would indeed be okay to average out the grades.

After cries of indignation and disbelief, and logical reasonings why my decision is abnormal, I laid out an alternative view:

Imagine if there were two students, A and B. Both A and B are extremely interested in the subject, and are motivated enough to study hard for the knowledge.

B however, has a string of rotten luck and poor health, which causes her to skip classes and study time as B has to go to the hospital quite a bit.

Come exam time, and the teacher makes the same offer. Would you average out your grades if you were A?

Both scenarios are the same, mechanically speaking, even if the motivations are different. Both J and P said that they would at least think about averaging the grades. Incidentally, during the dinner I adjourned to, I posited both situations, and the guests at the dinner had similar responses to J and P – that it would not be fair (and that they would not agree to average grades in the first scenario, but would consider it in the second).

This harks back to the concepts of justice, equity and fairness – concepts not commonly seen in academic economic literature[3].


But to further discuss this issue, let’s first define the concepts of justice, equity and fairness in the context that I wish to discuss. The main context is that the concepts are to be discussed in terms of distribution of resources, which economics is mainly concerned about.

Justice is the idea that the distribution of resources is done in such a way that the receiver deserves the goods received. This makes it an extremely prickly subject, because justice means different things to different people, and it comprises a wide range of ethics and philosophies, as I shall try to paint in broad strokes here. To religious nutbags, the words of the deity in question is just. If the deity says to allocate resources in a certain manner, it is just. Anything else is unjust and blasphemous. To capitalists[4], it’s only just that people get what they work for. Any other way of gaining resources is unjust. To legal theorists like John Rawls, justice is a form of fairness[5].

Equity and equality refer to the idea that all participants within a transaction should be treated the same (or similarly), without a party being unfairly preferred. Of the three concepts, this is the simplest. There is a subtle difference between the terms equality and equity. In recent years, equality has come to mean equality of opportunity – that all people participating within a certain social situation should be afforded the same opportunity. For the sake of this article, we’ll call this equity. Equality is concerned with the distribution of resources, not the distribution of opportunity[6]. If everyone had an equal share of the goods, then equality is achieved. Note that an equal share does not mean that it is fair, or by extension, just.

Fairness is a concept that is related to both the concepts above. Usually when discussing economic concepts, fairness is not well defined. It’s quite hard indeed to define fairness. For the sake of simplicity, let’s say that fairness is defined as the propensity to avert inequity and maximize justice. And I believe that this is innate in all humans (and chimps too). Given a game like the Dictator Game, it’s been shown in multiple experiments, across all cultures, that people playing the role of the dictator tend to split the earnings evenly (I believe a 50-50 split is the most common split). It’s still one of the most interesting game theory experiments to be run on the ideas of altruism, reciprocity and fairness.


Fairness, in my opinion, is a complex value[7] that has components in justice and equality. Obviously it has components in other values like reciprocity, compassion and indebtedness, but for the sake of simplicity in this article, we’ll just say it’s comprised of justice and equality. We spend a lot of time thinking things in life are either fair or unfair, creating the red line and the dotted borders above. However, if we think deeper, there are other quadrants which can be filled, as I will explain in an example below.

Coincidentally, in the field of psychology, fairness can also be defined in three components – sameness, deservedness and need – the definitions of which sound pretty close to the three concepts I have defined above. However, since I have no expertise in psychology, I shall say no more.

Back to last Friday. J, P and my dinner guests felt that it was unjust to average the grades in the first scenario, while it was at least somewhat just to do so in the second. In Situation 1, most people feel that averaging the grades would be unjust (Student A does NOT deserve it), though it is inarguably equal. Likewise in Situation 2, the averaging is somewhat just (student A may deserve it), and equal.

The talk of justice opens up all sorts of cans of worms. But the one I want to talk about is the arbitrator problem.

The Arbitrator Problem

The arbitrator problem can be illustrated with a third scenario:

A, B and C are students in a class. A is the aforementioned party dude in the class in Scenario 1. B is the aforementioned hardworking one in both Scenarios. And C is the aforementioned ill-lucked one.

You play the role of the teacher. Do you average marks between A and B? Or B and C? Or C and A? Or between all three?

In this scenario, the teacher/lecturer is playing the judge – the arbitrator who will solve the problem of justice and equality. The problem then is that the system is reliant upon the arbitrator to make good judgment calls. You’d be lucky if the arbiter in a given situation is Superman. But even Superman can fail. In a very clear cut scenario like the one above, it’s easy to make good judgment calls. If A is the nephew of the teacher, and the teacher decides to average the grades for the benefit of A, we can clearly see that it’s unjust. We call this corruption in real life. Real life isn’t clear cut like the above. Real life is murky, with twists and turns.

Which brings me to the main objection to the whole analogy to begin with. There exists an arbitrator in all of the scenarios above. It’s clearly meant to be the analogue of the government (or dictator), where the students are the citizenry of the country. The scenarios above are very shallow analogues of central governments (the teachers) distributing resources (grades) to the people (students). Except, reality doesn’t work like that. There have been experiments on central governments directly distributing resources, but none has worked well so far. The problem with this analogy is that it ignores the fact that the citizenry interact amongst themselves.

Of course there will come a day when the hypothetical optimal distributor is viable – say a supercomputer that knows everyone’s indifference and utility curves over a basket of every possible item. I’d be willing to let such a computer distribute resources and forgo the marketplace dynamics if such a supercomputer were possible. But that’s a story for another day, probably one involving star dates.


Let’s say we accept that the only form of information transfer is between the teacher and each student. How would a situation like the scenarios be represented as a game?

It turns out there is indeed a game theory experiment that is similar to the scenarios above – the Public Goods Game. The main difference between the scenarios above (let’s call it the Averaged Grades Game) and the Public Goods Game is that the Averaged Grades Game requires work as an input and some other form of utility as a reward.

This is an important distinction. Firstly, I believe the concept of justice can only be measured if the player has to work (i.e. be put through some minor discomfort) for it, and some form of equality has to be factored in as well. The usual design trick of giving participants some initial tokens that can be exchanged for money does not fully capture the injustice that is felt if real work were done. A variant of the Public Goods Game where the initial play money is “earned” (or with different levels of “income”) approximates this. Having work units blurs the connection between how much is invested and how much is returned.

Secondly, the Averaged Grades Game has no multiplier (or, has a multiplier set to 1). Again, this is a distinct difference. It’s been shown that without a multiplier, the Public Goods Game doesn’t work. I believe it will work for the Averaged Grades Game, because information is leaked to the player through the averaged grade, as opposed to told to the player.

Thirdly, the reward the participants get comes solely from the “exam results”. In the Public Goods Game, the reward the players get is the sum of what they kept for themselves and the multiplied returns from the public pool.

Lastly, the analogue for the Public Goods Game is a public good, which may or may not feature in a participant’s preference set, whereas the analogue for the Averaged Grades Game is the result of an exam, preference for which is a strictly increasing monotonic function[8].
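For comparison, here is a minimal sketch of the standard Public Goods Game payoff described above: what a player kept, plus an equal share of the multiplied pool. The function name and the sample numbers are my own illustration, not taken from any particular experiment.

```python
def public_goods_payoffs(endowments, contributions, multiplier):
    """Payoff per player: tokens kept plus an equal share of the multiplied pool."""
    n = len(endowments)
    pool_share = multiplier * sum(contributions) / n
    return [e - c + pool_share for e, c in zip(endowments, contributions)]


# Four players with 10 tokens each; contributions of 10, 5, 0 and 5.
print(public_goods_payoffs([10, 10, 10, 10], [10, 5, 0, 5], multiplier=2))
# [10.0, 15.0, 20.0, 15.0]

# Same contributions with the multiplier set to 1.
print(public_goods_payoffs([10, 10, 10, 10], [10, 5, 0, 5], multiplier=1))
# [5.0, 10.0, 15.0, 10.0]
```

With a multiplier of 1 the pool creates nothing new: the full contributor ends up with less than the 10 tokens they started with, and the free rider with more.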


Here’s a more formal description of the Averaged Grades Game:

Players are given a task to solve puzzles, where a puzzle is a work unit. The more puzzles a player solves, the more reward (a monetary unit) the player earns. However, at the end of the game, all the players’ rewards are pooled together, and the averaged reward is returned to each player. Each player is told at the end of the round how much he/she has earned, and what he/she will be getting.
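One round of the game as described can be sketched in a few lines. This is only an illustration: the names `RoundResult` and `play_round` and the flat reward per puzzle are assumptions of mine, not part of any existing experiment design.

```python
from dataclasses import dataclass


@dataclass
class RoundResult:
    earned: float    # what the player's own puzzles were worth
    received: float  # the pooled average that is actually paid out


def play_round(puzzles_solved, reward_per_puzzle=1.0):
    """Pool every player's reward and hand each player the average."""
    earned = [n * reward_per_puzzle for n in puzzles_solved]
    average = sum(earned) / len(earned)
    return [RoundResult(earned=e, received=average) for e in earned]


for result in play_round([8, 4, 0]):
    print(result)
```

Each player sees both numbers, so someone who solved 8 puzzles and is paid for 4 can infer that others did less, without being told who.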

Variations can include surprise rounds of not averaging the results, announced only after the round has ended. The amount of leaked information can also be used to test for justice. For example, at the end of every round, the proportion of students who score more than 50% is broadcast (this will make those who have done extra work feel “vengeful” and “punish” those who haven’t pulled their weight).


Depending on how the game is framed, there may be two Nash equilibria. The first Nash equilibrium of this game is the same as the Public Goods Game – zero work units will be expended. Not contributing any work is the only move a player can make in which other players’ decisions will not make the player worse off. However, just like real life experiments with the Public Goods Game, I expect reality to not converge towards 0. Instead, I believe it will converge on a level that is just above 0.
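The zero-work equilibrium can be checked by brute force under one extra assumption that the description above does not spell out: every work unit costs the player a fixed amount of utility. The sketch below uses a hypothetical `cost_per_unit` and a reward of 1 per work unit, and searches for the amount of work that maximises a player's own payoff given what everyone else does.

```python
def payoff(own_work, others_work, cost_per_unit):
    """Averaged reward minus the (assumed) linear cost of one's own effort."""
    n = 1 + len(others_work)
    averaged_reward = (own_work + sum(others_work)) / n
    return averaged_reward - cost_per_unit * own_work


def best_response(others_work, cost_per_unit, max_work=10):
    """Work level in 0..max_work that maximises the player's own payoff."""
    return max(range(max_work + 1),
               key=lambda w: payoff(w, others_work, cost_per_unit))


# Four players, effort costs 0.5 per unit: each unit of own work only
# returns 1/4 to the player, so zero work is best whatever the others do.
print(best_response([0, 0, 0], cost_per_unit=0.5))     # 0
print(best_response([10, 10, 10], cost_per_unit=0.5))  # 0
```

In this toy version the answer flips only when the assumed cost drops below 1/n, the share of one's own work that comes back through the average.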

The second Nash equilibrium is the maximum work units being produced. This would happen if the game were expressed as a modified Public Goods Game where the players only get rewarded after contributing to the pool, and don’t get to keep their initial tokens. I don’t see this happening in real life situations either. People tend to punish, and feel guilt, especially with successive rounds.


Given that I don’t have access to a behavioural economics lab, I decided I would replace human participants with some silicon ones, so I wrote some simulations[9], simulating a few things:

  • Each student has some baseline “intelligence”[10] that does not change.
  • Each student puts in work. The amount of work put in is a function of the intrinsic motivation of the student[11].
  • Results are a function of both intelligence and work
  • After each round, the student receives both a score, and an averaged score
  • The student chooses to make adjustments on the work put in for his/her next exam
  • The adjustment is subject to two other factors: motivation, and locus of identity of the student

The key metric we’d be looking at is the trend of the averaged scores over time, as well as the effort put in by each student over time. Here are the results of a hundred runs. Here’s the code in 200 lines of Python (skip to results and discussions):


And here’s the result:

100 runs of Average Results over time, overlaid with each other

Here we note that, as expected, the average result does indeed drop after an initial spike. This is what it looks like when it’s smoothed and averaged across all the runs:

100 runs smoothed

We’d also have to consider the distribution of scores for each time period. To do that, I have plotted a box plot of the scores of a particular run (Run 47).

run 47 boxplot scores


The simulation is obviously a very very very simple simulation. If you didn’t understand the code, here’s the general gist of what it does: in the first period, the study period is drawn from a normal distribution with a mean and standard deviation that are the same for everyone. When the second period starts, the motivations of the students affect the mean of the distribution from which the study period is drawn. The locus of identity of the student affects the standard deviation of that distribution.

The exam scores are a linear function of the study period and intelligence (which is only a small component), plus noise drawn from a normal distribution with constant mean and standard deviation across all time periods.
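For readers who want something to poke at, the gist above can be condensed into a much smaller sketch. To be clear, this is not the original 200-line program: every constant, the 0..1 scales for motivation and locus, and in particular the adjustment rule (students who out-score the class average cut back their study time, less so when motivated) are assumptions made purely for illustration.

```python
import random
from dataclasses import dataclass


def clamp(x, lo, hi):
    """Pin x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


@dataclass
class Student:
    intelligence: float      # fixed baseline, never changes
    motivation: float        # 0..1, hypothetical scale
    locus: float             # 0..1, hypothetical "locus of identity" scale
    study_mean: float = 5.0  # identical for everyone in the first period


def run_simulation(n_students=30, n_rounds=20, seed=0):
    """Return the class average for each round."""
    rng = random.Random(seed)
    students = [
        Student(rng.gauss(0.0, 1.0), rng.random(), rng.random())
        for _ in range(n_students)
    ]
    averages = []
    for round_no in range(n_rounds):
        scores = []
        for s in students:
            # First period: same spread for everyone. Later: locus sets it.
            sd = 1.0 if round_no == 0 else 0.5 + s.locus
            study = clamp(rng.gauss(s.study_mean, sd), 0.0, 10.0)
            noise = rng.gauss(0.0, 5.0)
            # Linear in study time, with intelligence as a small component.
            score = 8.0 * study + 2.0 * s.intelligence + noise
            scores.append(clamp(score, 0.0, 100.0))
        average = sum(scores) / len(scores)
        averages.append(average)
        for s, own in zip(students, scores):
            # Assumed rule: out-scoring the average erodes next round's effort.
            excess = max(own - average, 0.0) / 100.0
            s.study_mean = clamp(
                s.study_mean - 5.0 * (1.0 - s.motivation) * excess, 0.0, 10.0
            )
    return averages


print(run_simulation()[:5])
```

Fixing the seed makes a run repeatable, which helps when comparing different adjustment rules against each other.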

It should hence be clear why the average scores converge around the mean of the initial study period. This is a reasonable assumption. Nobody goes to university with a goal of not putting a single ounce of effort into studying.

Another part of the code that I don’t particularly like is the amount of clamp() that is used. This was necessary to clamp down on extreme values, but it may turn out that I don’t quite have a feel for drawing from normal distributions, so instead of tweaking the starting mean and standard deviations, I decided to just cheat by using clamp().
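One alternative to clamping, offered here only as a suggestion and not as what the original code does, is to redraw any value that falls outside the allowed range. Clamping piles the out-of-range draws onto the two boundary values, whereas redrawing keeps the bell shape inside the range. The helper names below are hypothetical.

```python
import random


def clamp(x, lo, hi):
    """Pin x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def clamped_normal(rng, mu, sigma, lo, hi):
    """Draw once and pin the result to the boundaries."""
    return clamp(rng.gauss(mu, sigma), lo, hi)


def resampled_normal(rng, mu, sigma, lo, hi, max_tries=1000):
    """Redraw until the value lands inside [lo, hi] (simple rejection sampling)."""
    for _ in range(max_tries):
        x = rng.gauss(mu, sigma)
        if lo <= x <= hi:
            return x
    return clamp(x, lo, hi)  # give up and clamp if the range is very unlikely


rng = random.Random(1)
clamped = [clamped_normal(rng, 5.0, 4.0, 0.0, 10.0) for _ in range(1000)]
resampled = [resampled_normal(rng, 5.0, 4.0, 0.0, 10.0) for _ in range(1000)]

# Count how many draws sit exactly on a boundary in each sample.
print(sum(x in (0.0, 10.0) for x in clamped))
print(sum(x in (0.0, 10.0) for x in resampled))
```

The trade-off is that redrawing changes the effective mean and spread of the distribution, so the starting parameters would still need some tuning.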

If anyone is willing to make a better program, please, do by all means fork the code and write me your results. I am keen to know.

On Socialism and Equality and Equity

I had mentioned briefly that I would say yes to the first scenario – that I would be willing to average my grades with the scumbag who doesn’t study. Furthermore, I also mentioned that the classroom example is a poor analogue for a central authority distributing resources to the public. Saying this appears to make me go from Oliver Queen to Lonnie Machin.

The issue is one of semantics. I’m not saying the government has no role in distributing resources, but rather, I think the government has a role as a player in the same game. In fact, I would say the government is an entity that is charged with the moral role of ensuring equity amongst other players by using the rules of the game.

What this means is that things like affirmative action, specifically the quota-based kind, are, in my opinion, not good, while things like a universal living wage are a good thing. The difference between them is that a universal living wage plays by the rules of the game – transactions, etc – while quota-based affirmative action justly creates equal opportunity, but does so by providing an unequal distribution.

Consider a scenario where a university has places for 100 students. Let’s say the university has a 20% quota for people of [your choice of disadvantaged demographic]. It just happens that there are 120 students who are equally qualified to enter. Of those 120 students, 20 are from [your choice of disadvantaged demographic] families, so they’re instantly accepted. This means that only 80% of the others would be accepted. While the intention is noble and just (I fully agree with equal opportunity), it is not difficult to reframe the statistics in such a way that it’s unfair. Observe:

100% of students of [insert disadvantaged demographic] get accepted to the university, while 80% of the students of higher socioeconomic status get accepted. Does that mean that if I’m a [insert disadvantaged demographic] student, I can still get into this university without good grades?
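The arithmetic behind that reframing is easy to check. The sketch below assumes, as in the example, that every applicant from the quota group is admitted first and the remaining places go to the others; the function name is made up for illustration.

```python
def acceptance_rates(places, quota_applicants, other_applicants):
    """Acceptance rate for the quota group and for everyone else."""
    quota_admitted = min(quota_applicants, places)
    other_admitted = min(other_applicants, places - quota_admitted)
    return (quota_admitted / quota_applicants,
            other_admitted / other_applicants)


# 100 places, 20 quota applicants, 100 other applicants.
print(acceptance_rates(100, 20, 100))  # (1.0, 0.8)

# With no quota and 120 equally qualified applicants, a pure lottery
# gives everyone the same chance.
print(100 / 120)  # roughly 0.83
```

So the gap that gets reframed as "100% versus 80%" is, for the non-quota group, a drop from about 83% to 80%.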

I grew up in a country where racial quotas play a role in everything, from education to buying houses. I can tell you from first-hand experience it is not fun. Even when you logically convince yourself that it’s for the betterment of society, you can’t help but feel that there is some level of unfairness at play. Obviously there is no just basis for having racial quotas, but even if the quotas were based on socioeconomic status (which I feel is most just), I’m quite sure that the feelings will remain.

And yes, I am aware that in the reframing above, I am also conflating entrances from quota with entrances from ability. Being accepted as part of a disadvantaged group doesn’t mean that the person lacks the ability to be accepted without it.

Quota-based affirmative action almost always guarantees that a proportion of people of higher skills will not be accepted. That’s rather the whole point of quota-based affirmative action: there exists a group that is disadvantaged in some way, and does not have the same opportunity as someone who is not in the disadvantaged group. Chances are high that the main opportunity gap is one of skill. That is to say, the average skill of the disadvantaged group is lower than that of the non-disadvantaged group, mainly due to pre-existing life conditions. If both groups are of the same average skill, then the opportunity gap must lie elsewhere. If the opportunity gap lies elsewhere, a quota-based system will not help.

Another issue is that the disadvantaged demographic definition is subject to the arbitrator problem. Someone has to decide that X, Y, Z features are features of the disadvantaged demographic. There is a distinction of feature quality too. For example, socioeconomic status is a much better feature than say, race or ethnicity. One is easily defined and verified, the other is too fuzzy – everyone knows about the ridiculous one drop rules, for example.

This in part is why I am more in favour of softer affirmative action (such as targeted advertising[12]). Even things like a universal living wage sound like fairly good ideas to me. These are actions in which an arbitrator does not have absolute power. Yes, I am aware that the free market is flawed, and needs to be somewhat reined in (by said arbitrator/government/state), but the power is not as absolute as in enforcing quotas. I have digressed too much though. I may actually spend another blog post writing about that.

And Yet…

And yet, if you recall earlier in the blog post, I said I would be willing to average the grades with the scumbag who doesn’t study. Considering it as a solely one-off event, I would likely say no. However, when the situation was posited to me, I instantly started thinking about other variations in which the ill-scoring student was subject to other conditions[13]. It is from this thinking that a sense of fairness, rooted in a higher weighting of equality, prompted me to say yes.

Further Work

This is just a brain dump that arose from Friday Night Drinks at work. However I have a feeling that this is actually a good econs experiment that can be carried out. If anyone would like to carry it out, please do, and also keep me informed. I’d be delighted to read your papers[14].


Congratulations on making it this far through my ramblings. I’m not sure what you gained from it. My views on political systems are abysmal and pessimistic at best, so please don’t follow what I say.

I posited a situation, based on a popular story online. I then defined the concepts of justice, equality and fairness. From there, I digressed slightly to a problem with using a classroom setting as an analogy for government, and tried to define the classroom interaction as a game. I pointed out the similarities with an already known game, and then discussed the possible equilibrium states. I then continued with a simple simulation of the game, and discussed the results. I then ended with a rant on socialism and equality and equity.

Also, I tried to hide multiple Justice League related puns and/or references. I think I failed in that aspect.

Reading Material

If you’re interested in this topic, here is some reading material:

  • Gary Bolton and Axel Ockenfels’ ERC: A Theory of Equity, Reciprocity, and Competition
  • Ernst Fehr and Klaus Schmidt’s A Theory of Fairness, Competition, and Cooperation
  • Matthew Rabin’s Incorporating Fairness into Game Theory and Economics
  • John Rawls’ A Theory of Justice[15]
  • Amartya Sen’s Collective Choice and Social Welfare
  • Ken Binmore’s Game Theory and the Social Contract
  • Most of Al Roth’s works
  • Most microeconomic literature
  1. [1] by that I mean, means of production. I think this is a very good pun
  2. [2] Das Kapital was published just as the dust of the Industrial Revolution was settling. Its observations of course, were made by Marx DURING what we call now the Industrial Revolution
  3. [3] OK, I lied. There is an entire field of economics dedicated to this topic – it’s just very new, fewer than 20 years old. Matthew Rabin invented a whole new way of looking at utility based on fairness as a preference in the utility curve. Fehr and Schmidt came up with ways to encode guilt and compassion when looking at games. The past 10 years have been pretty much replicating experiments across cultures, with not much new or groundbreaking
  4. [4] meant in the broadest sense of the word, i.e. if you are Ayn Rand
  5. [5] wait, what? circular reference detected
  6. [6] For the purposes of this article, let’s say that opportunity is distinct from being a resource
  7. [7] think of it as a vector/complex number if you will
  8. [8] Think about it, name me a sane person who would prefer lower grades to higher grades
  9. [9] Yes, I am aware that I can use matrices and linalg, but I wrote it this way to be clear about what is happening, but hey, Python 3.4 features are actually useful!
  10. [10] This is a prickly one too. I use “intelligence” as a term as a catchall. Think of it as confounding variables that are encapsulated into one, if you will.
  11. [11] Again, this encodes things that are not directly measurable
  12. [12] Of course bear in mind that my past work with advertising has coloured my view
  13. [13] I blame alcohol
  14. [14] HAH! As if anyone would be inspired by this blog post enough to write a paper…
  15. [15] It’s a book. A very hard and dry book to get through. I didn’t bother finishing it – just skimmed through
Logarithmic
Tue, 24 Feb 2015 10:17:29 +0000

I started lifting weights a few months ago after a bit of health awakening. At first, it was a lot of fuckaround. Eventually I got into a program, and a routine. I started seeing progress in my strength, and I kept a record of how much I can lift – I’ve got nice charts to show my strength progressions. It’s not much but I can bench press about 60% of my body mass now. Slowly but surely I’m getting there.

When you are a newbie to lifting weights, there is a phase you go through that is colloquially called ‘n00b gainz’ online. It’s where an untrained/novice lifter will gain strength faster than a trained lifter. In other words, you will see strength increase (as measured by the weight lifted) linearly as a function of time, until a certain point, where you no longer see that increase.

I’ve been riding the n00b gainz wave since I started, until the last couple of weeks, where I have stalled on my squats and bench press. The weights I can lift no longer increase linearly with time. And this is frustrating.

It’s mostly psychological, really. There is something nice about linearity. It’s easy work – put in X amount of work, get out Y amount of result. Conversely, we can also say that things get “harder” when the results are logarithmic in response to the effort put in – where you have to put in a lot more work for less result each successive time.

It is said that the length of the n00b gainz phase is determined by one’s genetic potential. The logarithmic progression that comes after is hard work. Some people are more genetically gifted in the strength department, and so spend a longer time in the n00b gainz phase. By the time they get out into the logarithmic progression bit, they are way ahead.

Thinking about this is kinda stressful. But then I think back on the things I did in life so far. Let’s say everything in life with some sort of progression will follow this form: linear until a certain point, then it becomes logarithmic. It can be studying, understanding of mathematics, or weightlifting. We’ll call these the “easy” and the “hard” parts.

All my life I have coasted on the “easy” parts. Exams? Didn’t have to study much for them, because a lot of things were intuitively understood. Startup? Writing the programs was the easy part. Initial marketing and press handling was the easy part. Then the going gets tough, and I bail, or abandon the project. It would appear that I have run from logarithmic progressions all my life.

This isn’t a good thing. How would one be able to persevere? I need to learn that.
