Intuitions From The Price Equation

George Price was a rather interesting fellow. A few months ago, I read a fascinating piece about his life via HN. If you follow my blog posts (hello to the two of you), you’ll note that altruism and cooperative games are among the things I like to blog about.

Following that article, I discovered the Price equation. (Funny story: I was quite surprised I hadn’t heard of the Price equation, so I hit the books. I found the equation referenced very, very briefly in Martin Nowak’s Evolutionary Dynamics, and that was all.) While grokking the equation, it suddenly occurred to me that kin selection and group selection were indeed the same thing. It was a gut feeling, and I couldn’t prove it.

So what I told you was true... from a certain point of view

I recently had a lot of time on hand (being laid off does that to you :) ), so I thought I’d sit down and try to make sense of my gut feel that kin selection and group selection were in fact the same thing. Bear in mind I’m neither a professional mathematician nor a professional biologist. I’m not even an academic, and my interest in the Price equation came from an armchair economist/philosopher point of view. And so, while I grasp a lot of concepts, I may actually have understood them wrongly. In fact, just be forewarned that this entire post was a result of me stumbling around.

So, let’s recap what the Price equation looks like (per Wikipedia):

$$ \Delta z = \frac{1}{w} cov(w_i, z_i) + \frac{1}{w} E(w_i \Delta z_i)$$

Simply put, $latex \Delta z$ is the difference in average phenotype between a parent population and the child population. That difference is a function of two things:

  1. The covariance of fitness and phenotype – $latex \frac{1}{w} cov(w_i, z_i) $ where $latex w $ is the average fitness of the population, $latex w_i $ is the fitness of $latex i $, and $latex z_i $ is the phenotype shared in the group.
  2. The expected value of fitness multiplied by the change in phenotype – $latex \frac{1}{w} E(w_i \Delta z_i) $ where $latex \Delta z_i $ is the difference between the phenotype of group $latex i $ and that of its children.

Deriving Intuition

Let’s make this a little more intuitive. Wiki says that $latex \Delta z$ is “… the change in average characteristic…” from one generation to the next. We’ll just call this evolution. Now, evolution can be simplified into a relatively simple equation: evolution is the sum of a selection process and some kind of inheritance errors. This is commonly phrased as “random mutation + natural selection = evolution”.

In fact, let’s start with that idea, and put it in context of the Price Equation. We’ll say that this is true:

$$ Evolution = RandomMutation + NaturalSelection$$

From here, we can start modelling random mutation and natural selection.

Modelling Random Mutation

Let there be a population of individuals in a set, $latex S$. Now, we’ll say that $latex z$ is the phenotype – say eye colour (and we’ll use RGB values to denote a numerical value of their eye colour), and $latex \bar{z}$ is the average eye colour. Now, we randomly partition the set $latex S$ into $latex n$ mutually exclusive sets, $latex S_i$, where $latex i=1$ to $latex n$. We’ll say that $latex \bar{z_i}$ is the average eye colour of the group $latex i$. We’ll also say that the proportion of each subset, $latex q_i$, is $latex q_i = \frac{count(S_i)}{count(S)}$. This gives us a second way to derive $latex \bar{z}$:

$$ \bar{z} = \sum q_i \bar{z_i}$$

Here’s a simple example in Python to check this (which I wrote to ensure I was sane):

S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
S_1 = {2, 5, 6}
S_2 = {1, 3, 4, 7, 8, 9, 10}
# float() keeps this a true division on Python 2 as well
avgZ = sum(S) / float(len(S))
avgZ_1 = sum(S_1) / float(len(S_1))
avgZ_2 = sum(S_2) / float(len(S_2))

# 0.3 and 0.7 are the proportions q_1 and q_2 of the two subsets
print(0.3 * avgZ_1 + 0.7 * avgZ_2)
print(avgZ)

Ahh.. the joys of floating point math (originally I had written it differently and got a result that nearly sent me into madness, as I thought my basic math knowledge was broken).

Anyway, let’s imagine the people of population $latex S$ had plenty of sex, and they created a children population, which we shall call $latex S'$. We can also do the same as we did above to the new population. We’ll denote everything with a $latex ' $ as being part of the child population – i.e. $latex \bar{z'_i}$ is the average eye colour of the child population at a subset $latex i$.

So from here we’ll say $latex \Delta \bar{z} = \bar{z'} - \bar{z}$ is the change of average eye colour (over time, which is implied). We can also say the same for each subset: $latex \Delta \bar{z_i} = \bar{z'_i} - \bar{z_i}$. Think of this as the change in average eye colour of a particular subset $latex i$. Modelling the difference between two generations would then require us to also look at the proportion of the child subset. Let’s say for subset $latex i$, $latex q'_i = \frac{count(S'_i)}{count(S')}$.

We can then define random mutation over two generations as:

$$ RandomMutation = \sum q'_i \Delta \bar{z_i}$$

The intuition behind this is quite clear – for each subset, the change in (read: mutation of) average eye colour is weighted by that subset’s share of the child population.

Modelling Natural Selection

Natural selection is harder to model. We have to think about things like fitness (because that is how selection happens). But we can figure out the progress of selection by simply looking at the changes in the proportions of a population. The change in proportion is easily defined as $latex \Delta q_i = q'_i - q_i$. Applying the change in proportion to an averaged phenotype then simply becomes selection! It can be defined thusly:

$$ Selection = \sum \Delta q_i \bar{z_i}$$

Now think about the intuition behind this equation. Given an average eye colour, we can say that it is selected for if the subsets with that average eye colour make up a larger share of the child population than of the parent population.

Modelling Evolution (of a phenotype)

So, putting them together, we’ll replace the parts of this formula:

$$ Evolution = RandomMutation + NaturalSelection$$

and it becomes this:

$$ Evolution = \sum q'_i \Delta \bar{z_i} + \sum \Delta q_i \bar{z_i}$$

And since we’re mainly concerned with the evolution of one particular phenotype, we can say that Evolution is the change in average phenotype:

$$ \Delta \bar{z} = \sum q'_i \Delta \bar{z_i} + \sum \Delta q_i \bar{z_i}$$
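A quick numeric sanity check of this decomposition, as a minimal sketch. The proportions and eye-colour values are made-up numbers, and the variable names are only for illustration:

```python
# Toy numbers, invented purely for illustration.
q = [0.3, 0.7]            # parent share of each subset, q_i
q_child = [0.4, 0.6]      # child share of each subset, q'_i
z = [100.0, 200.0]        # parent average eye colour per subset
z_child = [110.0, 190.0]  # child average eye colour per subset


def weighted_mean(weights, values):
    """Sum of weight * value, i.e. the population average."""
    return sum(w * v for w, v in zip(weights, values))


# Change in the population average, computed directly.
delta_z_direct = weighted_mean(q_child, z_child) - weighted_mean(q, z)

# The two components from the text.
random_mutation = sum(qc * (zc - zp) for qc, zp, zc in zip(q_child, z, z_child))
natural_selection = sum((qc - qp) * zp for qp, qc, zp in zip(q, q_child, z))

print(delta_z_direct)
print(random_mutation + natural_selection)
```

The two printed values should agree up to floating point rounding, whatever numbers you plug in, as long as each list of proportions sums to 1.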

Modelling Fitness

So far we’ve managed to define an equation for evolution[ref]And you can actually reach the same conclusion the other way around. If we define evolution, $latex \Delta \bar{z}$, as the change in the proportion-weighted average phenotype, it’ll end up being something like this:

$$ \Delta \bar{z} = \sum q'_i \bar{z'_i} - \sum q_i \bar{z_i} $$

Adding and subtracting $latex \sum q'_i \bar{z_i}$ and then factoring, you will end up with this:

$$ \Delta \bar{z} = \sum (q'_i - q_i) \bar{z_i} + \sum q'_i (\bar{z'_i} - \bar{z_i})$$

Of which $latex (q'_i - q_i)$ and $latex (\bar{z'_i} - \bar{z_i})$ can be rewritten as $latex \Delta q_i$ and $latex \Delta \bar{z_i}$ respectively, leading to the same equation as above:

$$ \Delta \bar{z} = \sum q'_i \Delta \bar{z_i} + \sum \Delta q_i \bar{z_i}$$

[/ref], but we’ve not talked about fitness at all, whereas the Price equation has a term, $latex w$, that denotes fitness. How would we model fitness?

We first need to understand that fitness is a characteristic of an individual carrying a phenotype – specifically, it refers to the ability to propagate the phenotype. By now, something should have clicked in your head. We can simply say average fitness of a group is the number of descendants over the number of parents and define it as such:

$latex \bar{w} = \frac{count(S')}{count(S)}$ is for the entire population, while $latex \bar{w_i} = \frac{count(S'_i)}{count(S_i)}$ is for each subset.

To keep the notation consistent with the above, I’ve added the bar to denote average fitness.
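As a minimal sketch of these definitions, here is the same bookkeeping in Python. The head counts are hypothetical and the dictionary names are just for illustration. It also checks numerically that $latex \bar{w} q'_i$ and $latex q_i \bar{w_i}$ agree, a relationship the derivation leans on shortly:

```python
# Hypothetical head counts, chosen only for illustration.
parents = {"S_1": 3, "S_2": 7}    # count(S_i)
children = {"S_1": 6, "S_2": 7}   # count(S'_i)

n_parents = sum(parents.values())
n_children = sum(children.values())

# Average fitness of the whole population and of each subset.
w_bar = n_children / float(n_parents)
w_bar_i = {g: children[g] / float(parents[g]) for g in parents}

# Parent and child proportions of each subset.
q = {g: parents[g] / float(n_parents) for g in parents}
q_child = {g: children[g] / float(n_children) for g in children}

for g in parents:
    # Both sides reduce to count(S'_i) / count(S).
    print(g, w_bar * q_child[g], q[g] * w_bar_i[g])
```

Each row should print the same number twice (up to rounding), since both products boil down to the subset's child count divided by the total parent count.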

And suddenly, we’re able to quickly derive Price’s equation by multiplying both sides of the equation with $latex \bar{w}$:

$$ \bar{w} \Delta \bar{z} = \bar{w} \sum q'_i \Delta \bar{z_i} + \bar{w} \sum \Delta q_i \bar{z_i}$$

Because $latex \bar{w}q'_i = q_i \bar{w_i}$ (do the algebra yourself to confirm), we can break down each component of the equation:

For random mutation:

$latex \bar{w} \sum q'_i \Delta \bar{z_i} = \sum q_i \bar{w_i} \Delta \bar{z_i}$

And since $latex q_i = \frac{count(S_i)}{count(S)}$, weighting each subset by $latex q_i$ and summing is the same as averaging over the parent population, so the whole equation simplifies to become $latex \overline{w_i \Delta z_i}$ (I skipped a couple of steps, and you should check whether I’m right; also, I’m using \overline because \bar is not readable, but just assume that it’s a bar over), which is the expected value of $latex w_i \Delta z_i$, usually written as $latex E(w_i \Delta z_i)$.
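For anyone who wants the skipped steps, here is one way to spell them out, assuming only the count-based definitions of $latex q_i$, $latex q'_i$, $latex \bar{w}$ and $latex \bar{w_i}$ given earlier. First the identity:

$$ \bar{w} q'_i = \frac{count(S')}{count(S)} \cdot \frac{count(S'_i)}{count(S')} = \frac{count(S'_i)}{count(S)} = \frac{count(S_i)}{count(S)} \cdot \frac{count(S'_i)}{count(S_i)} = q_i \bar{w_i} $$

Then the random mutation term:

$$ \bar{w} \sum q'_i \Delta \bar{z_i} = \sum q_i \bar{w_i} \Delta \bar{z_i} = E(w_i \Delta z_i) $$

The last step simply reads the $latex q_i$-weighted sum as an expectation over the parent population.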

For natural selection:

Natural selection is a bit tricky… we’d have to break $latex \Delta q_i$ up:

$$ \bar{w} \sum \Delta q_i \bar{z_i} = \bar{w} (\sum q'_i \bar{z_i} - \sum q_i \bar{z_i})$$

We can also simply reduce them into this: $latex \overline{w_i z_i} - \bar{w} \bar{z}$

The above formula is simply the definition of covariance, and is usually written as $latex cov(w_i, z_i)$
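A sketch of the reduction, writing the product of the two population averages as $latex \bar{w} \bar{z}$ and using $latex \bar{w} q'_i = q_i \bar{w_i}$, $latex \bar{z} = \sum q_i \bar{z_i}$ and $latex \bar{w} = \sum q_i \bar{w_i}$:

$$ \bar{w} \sum \Delta q_i \bar{z_i} = \sum \bar{w} q'_i \bar{z_i} - \bar{w} \sum q_i \bar{z_i} = \sum q_i \bar{w_i} \bar{z_i} - \bar{w} \bar{z} = E(w_i z_i) - E(w_i) E(z_i) = cov(w_i, z_i) $$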

Since $latex cov(w_i, z_i)$ is the covariance of fitness and phenotype, the more they covary, the stronger the selection for $latex z_i$. Now that makes a lot of sense!
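To see the whole thing hold together numerically, here is a minimal sketch with invented subset-level numbers (the proportions, fitnesses and phenotype values are all assumptions for illustration). It computes the change in average phenotype directly, and again via the covariance and expectation terms:

```python
# Invented subset-level data, for illustration only.
q = [0.3, 0.7]        # parent shares q_i
w = [2.0, 1.0]        # subset fitness w_i (children per parent)
z = [100.0, 200.0]    # parent average phenotype z_i
dz = [10.0, -10.0]    # change in each subset's average, delta z_i


def expect(values):
    """q-weighted average over subsets, i.e. E(.) in the text."""
    return sum(qi * v for qi, v in zip(q, values))


w_bar = expect(w)
z_bar = expect(z)

# The two terms of the Price equation (before dividing by w_bar).
cov_wz = expect([wi * zi for wi, zi in zip(w, z)]) - w_bar * z_bar
e_w_dz = expect([wi * dzi for wi, dzi in zip(w, dz)])

# Direct route: child shares follow from q'_i = q_i * w_i / w_bar.
q_child = [qi * wi / w_bar for qi, wi in zip(q, w)]
z_child = [zi + dzi for zi, dzi in zip(z, dz)]
delta_z_direct = sum(qc * zc for qc, zc in zip(q_child, z_child)) - z_bar

print(delta_z_direct)
print((cov_wz + e_w_dz) / w_bar)
```

Both lines should print the same value up to rounding. Here the covariance is negative because the fitter subset has the lower phenotype value, so selection pulls the average down.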

So far what I have done is write a layman’s explanation of Steven Frank’s derivation of the Price equation, which frankly is quite a lot better than what I have here… so I don’t even know why I wrote the above. Well, I guess when I did it for myself, I got some intuition on how to think about certain things, so there’s that, and I hope that I’m able to convey the intuitions.

Hamilton’s Rule

George Price had a very interesting relationship with his friend, William Hamilton. He created the Price Equation when he was trying to re-derive Hamilton’s work on kin selection, commonly called Hamilton’s Rule. And yet, a few years later, Hamilton himself reworked his rule to be based off Price’s.

I found Twelve Misunderstandings of Kin Selection by Richard Dawkins and Jonathan Birch’s Hamilton’s Rule and its Discontents to be particularly helpful in understanding kin selection.

In particular, the two variants of Hamilton’s rule are a major cause of confusion. Most people, when talking about kin selection, usually talk about the commonly known one (the one in Wikipedia), often written as $latex rB > C$, where $latex r$ is the relatedness of the actor to the recipient of an altruistic/spiteful act, $latex B$ is the benefit conferred upon the recipient, and $latex C$ is the reproductive cost of said altruistic act.

However it’s the general version (which, following Birch’s notation, we shall refer to as HRG) that I am interested in. See Birch’s easy-to-follow paper for the derivation from the Price Equation:

$$ \Delta_s \bar{z} > 0 \ \ iff \ \ rb-c > 0$$

The difference between the two versions of Hamilton’s rule is, of course, the definition of the relatedness factor $latex r$. In the HRG version, $latex r$ is defined to be $latex \frac{cov(\bar{z_i}, z_j)}{var(z_j)}$ (where $latex j$ is an individual, and $latex i$ is the set $latex S_i$; the working is derived from Birch’s paper and I’m not bothered to reproduce it here, so refer to that). It’s really a statistical tendency for the recipients of the altruistic act to be altruistic themselves, rather than straight-out genetic relatedness.
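To make that definition a bit more tangible, here is a rough sketch under one possible reading of it: $latex \bar{z_i}$ is taken to be the average phenotype of the group that individual $latex j$ lives in (including $latex j$ itself, which is a simplification). The groups and the 0/1 altruism coding are made up for illustration, and this is not Birch's own working:

```python
# Made-up groups; z_j = 1 marks an altruist, 0 a non-altruist.
groups = [
    [1, 1, 1, 0],   # mostly altruists
    [1, 0, 0, 0],   # mostly non-altruists
]

# One entry per individual: own phenotype and the average of its group.
own = [z for g in groups for z in g]
group_avg = [sum(g) / float(len(g)) for g in groups for _ in g]


def mean(xs):
    return sum(xs) / float(len(xs))


def cov(xs, ys):
    """Population covariance of two equally long lists."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


# Regression-style relatedness: cov(z-bar_i, z_j) / var(z_j).
r = cov(group_avg, own) / cov(own, own)
print(r)
```

With these numbers `r` comes out positive because altruists tend to sit in groups with other altruists. If both groups had the same mix, the group averages would be identical, the covariance would vanish, and `r` would be zero, even though nothing has been said about who is genetically related to whom.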

Group Selection

Since 2012, group selection and kin selection have been accepted to be the same bloody thing – mainly due to the works of Grafen, Gardner and Marshall (and others). As such, after researching more into this, it would appear that this blog post is no longer necessary. But for shits and giggles, let’s just continue, mainly because I’ve sunk about 5 hours (per RescueTime) and 100s of revisions into typing in stupid equations that I doodled on a piece of paper. (Ladies and Gentlemen, I present to you the Sunk Cost Fallacy.)

We’ll start with Price’s Equation:

$$ \Delta \bar{z} = \frac{1}{w} cov(w_i, z_i) + \frac{1}{w} E(w_i \Delta z_i)$$

We’ll start by hypothetically looking at the individual level. We can do that by partitioning $latex S$ into subsets that each contain only one individual, so we can say that $latex S_i$ is $latex j$. This equation simplifies down to:

$$ \Delta \bar{z} = \frac{1}{w} cov(w_j, z_j)$$

$latex w_j$ can be thought of as the fitness of individual $latex j$. To make the abstract idea concrete, think of it as the number of offspring. You’ll note that the random mutation component has been eliminated. If you read a number of biology papers, they tend to do that, mainly for simplicity’s sake. The intuitive argument is that random mutation of a gene is too small to bother with on an individual level, so we’re just going to look at the selection based on the phenotypes (yea, this is a little hard to swallow, but roll with it for a while).
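Here is a minimal sketch of the individual-level version, assuming every child copies its parent's phenotype exactly (which is what dropping the random mutation term amounts to). The phenotypes and offspring counts are invented:

```python
# Invented individuals, for illustration only.
z = [100.0, 150.0, 200.0]   # phenotype z_j of each parent
w = [3, 1, 0]               # number of offspring w_j of each parent

n = float(len(z))
w_bar = sum(w) / n
z_bar = sum(z) / n

# cov(w_j, z_j) = E(w z) - E(w) E(z), taken over the parents.
cov_wz = sum(wj * zj for wj, zj in zip(w, z)) / n - w_bar * z_bar
delta_z_price = cov_wz / w_bar

# Direct route: build the child population, each child copying its parent.
children = [zj for wj, zj in zip(w, z) for _ in range(wj)]
delta_z_direct = sum(children) / float(len(children)) - z_bar

print(delta_z_price)
print(delta_z_direct)
```

The two numbers should match: with perfect copying, the shift in the average is entirely explained by who had more children.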

Now this is on an individual level. If we were to put this individual into a group, we’ll work out the average random mutation (i.e. the expected value) to be something like $latex E(cov(w_j, z_j))$ (you should double check this, my notes at this point got very messy). So now the Price Equation looks something like this:

$$ \Delta \bar{z} = \frac{1}{w} cov(w_i, z_i) + \frac{1}{w} E(cov(w_j, z_j))$$

Now we can rephrase the Price equation to think of it this way: The “natural selection” component of the equation can also be thought of as selection between groups (the terms are of $latex i$, which stand for groups). And the “random mutation” component can now be thought about as selection within a group (the terms are of $latex j$, which are individuals). So if you think about this in a roundabout way, you’ll get that for the phenotype to be selected for, $latex \Delta \bar{z}$ needs to be $latex > 0$. Therefore:

$$ \Delta \bar{z} > 0 \ iff \ \frac{1}{w} cov(w_i, z_i) + \frac{1}{w} E(cov(w_j, z_j)) > 0$$

The intuition about this is relatively straightforward: if the between-group selection process (formerly the “natural selection” component) and the within-group selection process (formerly the “random mutation” component) are in agreement, then the phenotype $latex z$ will be selected for. Note however, that this is an inequality. So if the within-group selection process turns out to be a negative number, the between-group selection component has to be larger in magnitude in order for the phenotype to be selected.
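A small sketch of this tug of war, assuming equal-sized groups and children that copy their parents exactly. The groups, offspring counts and the 0/1 altruism coding are all made up for illustration:

```python
# Each individual is (offspring count w_j, phenotype z_j); z_j = 1 is an altruist.
groups = [
    [(2, 1), (2, 1), (2, 1), (4, 0)],   # altruist-rich group, does well overall
    [(0, 1), (2, 0), (2, 0), (2, 0)],   # altruist-poor group, does worse overall
]


def mean(xs):
    xs = list(xs)
    return sum(xs) / float(len(xs))


def cov(pairs):
    """Population covariance of a list of (w, z) pairs."""
    w_mean = mean(w for w, _ in pairs)
    z_mean = mean(z for _, z in pairs)
    return mean(w * z for w, z in pairs) - w_mean * z_mean


# Between-group term: covariance of group-average fitness and phenotype.
group_means = [(mean(w for w, _ in g), mean(z for _, z in g)) for g in groups]
between = cov(group_means)

# Within-group term: average of the covariances inside each group.
within = mean(cov(g) for g in groups)

everyone = [pair for g in groups for pair in g]
w_bar = mean(w for w, _ in everyone)
delta_z = (between + within) / w_bar

print(between, within, delta_z)
```

With these numbers the between-group term is positive (groups with more altruists leave more children) but the within-group term is negative and larger in magnitude (inside each group the altruists lose out), so the altruistic phenotype still declines overall. The two terms also add up to the plain covariance over all individuals, `cov(everyone)`.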


We’ll cheat. We’ll start by saying group selection and kin selection are equivalent. After all, both equations look similar:

$$ \Delta \bar{z} > 0 \ iff \ \frac{1}{w} cov(w_i, z_i) + \frac{1}{w} E(cov(w_j, z_j)) > 0$$
$$ \Delta \bar{z} > 0 \ \ iff \ \ rb-c > 0$$

From an intuitive point of view, it makes sense too. In the above section, we can say that the selection between groups is similar, at least intuitively to $latex rb$, except now we’ve defined a group to be a group of individuals who are genetically related ($latex r$). And the cost of an altruistic act? It’s exactly the same as having a negative selection within groups. In fact, you can think of the group selection equation to be a more generalized version of kin selection, because the cost can apparently be… positive.

The biggest lightbulb moment for me was realizing that the Price Equation can be read two ways – the latter being the whole in-group and between-group selection thing.


Where is this going? I have no idea. I started writing this blog post to help myself understand if kin selection and group selection were the same thing, based on the Price Equation. I did some research, and found that most people nowadays agree that they’re the same thing anyway. I almost stopped blogging at that point (you can tell that I ran out of steam, because I started by meticulously explaining the evolution equation, and then skipped a bunch of working for kin selection and group selection). But then for shits and giggles I completed it. And I think they’re fairly similar, at least from an intuitive point of view. I had wanted to continue on to proving that each component means the same thing, but I think I am happy with it as it is now.

The reality is the Price equation and the derivatives – kin selection and group selection – are really basic algebra, but the key behind it is the intuition. I think I have understood it well, and wrote down some intuitions behind the simple equations. But you tell me. Do my intuitions make sense? I may have made many mistakes along the way with my intuition. If you spot one, please tell me in the comments section below.

Extra Reading Material

Here is a list of extra reading material that I think may be helpful – some of it I have read, and some are references found strewn about that I will hopefully eventually find the time to read:
