Pointer Tagging in Go

Let’s say you’re an idiot. You didn’t know about the interface{} type in Go. You didn’t know about switch foo.(type) {} in Go. Now you want to do something really fancy, like interpreting integers as floats (why would anyone do that?). Maybe you just want to circumvent the very excellent type system in Go. Or maybe you want a much faster type switching/assertion in Go. Or maybe you’re just a bit nuts.

So if you happen to want to circumvent the type system in Go, what would you do? One thing you can do is pointer tagging. What exactly is pointer tagging? Let’s first have a look at what pointers are. Pointers essentially are memory addresses. A pointer points to an object (typically bytes) in memory by storing the address of that memory’s location. Go is a language with pointers (but no pointer arithmetic), and a good place to start is Russ Cox’s blog post on Go data structure. But if you are lazy to open a new page, a pointer is essentially an address to the memory location of whatever you want. Consider the following:


func main(){
    x := 1234
    xPointer := &x
    fmt.Printf("%#vn", xPointer)
}
>>> (*int)(0xc200000018)

The value of xPointer indicates that it is of type (*int), which means it’s a pointer to type int, and it points to address 0xc200000018. Note that the pointer is tightly coupled with the type at the address. No operations save indirection can be performed on it. However, there is a way to manipulate the pointer.

Unsafe Fun

Enter the unsafe package. It’s called unsafe for a good reason: it is unsafe. With the unsafe package you can do all sorts of weird things to the pointer. Any pointer can be converted into the unsafe.Pointer type. From there, it’s only a short step to converting an unsafe.Pointer type into uintptr, which can then be converted into any uint, where all the fun happens. But even without converting into uintptr, much fun can be had.

This for example, converts a (*int) into a (*float64):


func main() {
    x := 1234
    y := *(*float64)(unsafe.Pointer(&x))
    fmt.Printf("%#v %#v n", x, y)
}
>>> 1234 6.097e-321

You will note that of course, that x and y do not have the same value. If you want to convert 1234 into a float, then use float64(1234). No, this one is different because the underlying bits are interpreted differently.

1234 in int64 binary looks like this (Big Endian ordering; spaces delimit a byte):

00000000 00000000 00000000 00000000 00000000 00000000 00000100 11010010

Now, if this binary string were to be interpreted as float64 (which is by the way, IEEE-754), it will be interpreted as 6.097e-321. How?

The IEEE-754 float specification specifies a double precision floating point (i.e. a 64 bit float) as this (spaces delimit a byte for easy readability):

s
eeeeeee eeee
ffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
1
11
52

Encoding of a float is a very weird thing. The first bit, s is the sign bit. The next 11 bits, e are exponent bits. The following 52 bits, f are the mantissa of the number. If we were to interpret the above binary as a float it would look like this:

0
0000000 0000
0000 00000000 00000000 00000000 00000000 00000100 11010010
1
11
52

By contrast, this is what float64(1234) looks like.

0
1000000 1001
0011 01001000 00000000 00000000 00000000 00000000 00000000
1
11
52

They’re obviously very different, and the first one is basically 6.097e-321 encoded as a 64-bit floating point. P/S: A quick verification can be seen here with my gogogadget inspection package:


func main() {
    f := 6.097e-321
    fb := fmt.Sprintf(gogogadget.BinaryRepresentation(f))

    x := 1234
    y := *(*float64)(unsafe.Pointer(&x))
    yb := fmt.Sprintf(gogogadget.BinaryRepresentation(y))


    fmt.Printf("%sn%sn%tn", fb, yb, y==f)
}
> go run main.go
00000000 00000000 00000000 00000000 00000000 00000000 00000100 11010010
00000000 00000000 00000000 00000000 00000000 00000000 00000100 11010010
true

Pointers

As mentioned above, pointers in Go are typed pointers. The type of the object at the referred address is tightly coupled with the pointer. However, this can be subverted by converting the pointer into the uint type, which makes it really fun, because that’s where all the magic happens. Of course we wouldn’t be working directly on pointers, but this is close enough. Let’s jump in shall we (forgive the rather unidiomatic way of writing code, I want everything to be clear)?


func main() {
    var x int64 = 1234

    // xPointer is of the *int64 type, meaning it points to
    // an object in memory that is a int64
    var xPointer *int64 = &x

    // converts xPointer into an unsafe.Pointer, which is then converted
    // into uintptr, which is then converted into uint
    var xPointerUint uint = uint(uintptr(unsafe.Pointer(xPointer)))

    fmt.Println(gogogadget.BinaryRepresentation(xPointerUint))
}
> go run main.go
00000000 00000000 01111111 00110101 01100100 01010000 10011111 01110000

The nice bit about pointers being converted into uint is that the last 3 bits of the number is always 0. In other words, pointers in Go byte-aligned * Check it yourself: MARKDOWN_HASH723576c8231207750d0da6589d92ef5dMARKDOWN_HASH should return 8 . This means all pointers have to have sizes that are multiples of 8. This leaves the last 3 bits open for manipulation and fun!

3 bits of data means you can have up to 8 points of data * {0, 1, 2, 3, 4, 5, 6, 7} = {0b000, 0b001, 0b010, 0b011, 0b100, 0b101, 0b110, 0b111} . In the opening paragraphs of this blog post, I mentioned, if you didn’t know about the interface{} type, then these 8 bits would be quite helpful. For example, if the pointer is *int you could tag the last three bits as 000, and if the pointer is *string you could tag the last three bits as 001.

Bit Twiddling Fun

If you’re confused about what “tagging the last three bits” mean, here’s what I mean: Everything in the computer breaks down to binary. In the last example above, xPointerUint was represented as this:

00000000 00000000 01111111 00110101 01100100 01010000 10011111 01110
000
What's necessary for the pointer. Don't change these.
Manipulatable bits ⬈

The last three bits have been coloured red. Those are the bits we want to change.

Tagging the last three bits simply means changing the last three bits into whatever we want (out of the 8 possible combinations). As long as we do not touch the first 61 bits of the pointer, we should be safe, because we can always change the last three bits back to zero when we want to use the pointer (Recall that xPointerUint is actually a uint).

Get more posts like this

We tag the last three bits by bit twiddling. There is a fantastic book called Hacker’s Delight, which showcases all the fancy bit twiddling hacks. I highly recommend that book for even the casual hacker. However, to tag the last three bits of a uint, which is guaranteed to be always 0, one doesn’t even need to go so far to use the advanced techniques in the book. Simple & and | would suffice.

Here’s a simple exercise to show what I mean. But first a primer (there’s more in Wikipedia’s article on bitwise operation) :

16 is represented as 0b10000 in binary. 7 is represented as 0b00111. NOT 7 is represented as 0b11000* The NOT argument simply flips all the 1s and 0s. If a bit was 1, it's now 0 (its value in decimal is 24). If you were to perform 16 & 24, you would get in return… 16* Check it for yourself at Wolfram Alpha . Conversely, 16 | 24 will yield 24. These properties make 7 a very important number in terms of pointer tagging because it forms the limit at which a tag can be placed upon a pointer.

So, 1024 words later, here’s a pop quiz: what happens when 16 | 1 is performed? The answer is 17, which is represented as: 0b10001. What about 16 | 2? The answer is 18, which is represented as: 0b10010. And so on and so forth. Congratulations, you now know how to tag pointers.

Of course, it’s not just about tagging pointers with unsigned integers. What is the use of tagging a pointer and not being able to retrieve and use it?

Pointer Tagging

Here is the full pointer tagging code, done very quick and dirtily (for example, if you tag a uint greater than 7, it will fail spectacularly):

As you can see, there is a way to retrieve and restore the value. I had written a different return function, but I had in the end decided to use this in the example instead. The alternate one looks like this:


func ReturnPointer(ptr *uint) uint {
    tag := *ptr & uint(7)

    *ptr = (*ptr & ^uint(7))
    return tag
}

It basically replaces the object at the destination address with the original pointer uint ((*ptr & ^uint(7))). It is not very clear, and therefore I didn’t use it.

Usefulness

Is this much use? I have no clue. I had some ideas of what to do with pointer and indeed float tagging, but I ended up not using them. The original reason for it was because I had a circular import issue, which was quite easily solved with storing a uintptr instead of *circularImportPackage.Type . That solution spawned into a fantasy where I wrote far more performant code compared to simple .(type). Indeed, that is one use that I can think of: faster type switching and type assertion.

That never went anywhere. This blog post itself was written months ago, but only completed today. When I was completing the blog post, out of curiosity, I wondered about the performance versus type switching…

Against switch .(type)

So, in procrastination from doing my actual work, I spent this afternoon writing benchmarks (and this blog post). This is what I found (the code can be found here: pointerTaggingBench.go; pointerTaggingBench_test.go).


$ go test -bench . -benchmem
testing: warning: no tests to run
PASS
BenchmarkInitializingInterface     10000        128716 ns/op       45242 B/op       1016 allocs/op
BenchmarkInitializingTagging       50000         70424 ns/op       16567 B/op         11 allocs/op
BenchmarkRetrievingInterface       50000         53246 ns/op       36937 B/op        218 allocs/op
BenchmarkRetrievingTagging     50000         40906 ns/op       36939 B/op        221 allocs/op
ok      github.com/chewxy/pointerTagging    11.197s

$ go version
go version devel +bea6199b09ea Tue Apr 30 17:47:39 2013 -0700 linux/amd64 * Yah, shut it, I know I've not upgraded to Go 1.2. Piss off. 

It really surprised me. I know pointer tagging works in C because there are much fewer allocations on the heap. I had assumed this didn’t matter in Go, because the runtime would be taking care of the whole allocation thing. It was one of the reasons why I didn’t continue on that path (another reason was that the resulting code is buttfuck ugly, what with so many conversions). But having the -benchmem flag in there indicated that the Tagging functions have significantly fewer allocations.

I had also assumed that because Go copies variables (i.e. pass-by-value), all the copying would lead to a slowdown. In my code I tried to do as much zerocopy/in-place updating as possible, which could also be the reason why it’s fast.

So pointer tagging is about 1.8x faster when writing, and 1.3x faster when reading (i.e. type assertion/switching). That’s interesting.

Downsides; Discussions; Caveats

Okay, so we discovered that pointer tagging is slightly faster than traditional interface{}. Should you use it in production? I’d say probably not. Especially not my code. In the TagPointer() function, there is a significant lack of bounds checking for the tag (it should only accept uint between 0 and 7), which could possibly slow things down.

There is also the fact that it can only type-check/switch on 8 different types. That’s a severe limitation on most projects. Not mine though.

Another complain that I’d have is that the resulting code is ugly and verbose. That could be mainly due to the fact that a lot of the code was hacked up in less than an hour. I have however, given it a bit more thought, and it still wouldn’t be able to look as elegant as switch x := y.(type).

In short, do not use pointer tagging, unless you know what you’re doing (and indeed, the same goes for the unsafe package). If you’re new to Go looking for easier ways to do type switching, use the old method. Don’t use pointer or float tagging unless you know what you’re doing

Conclusion

I started this blog post a few months ago. I continued it a bit later, creating the gogogadget package, with the intention of using it as an example. Today I finished this post. Yay for procrastination. I hadn’t expected pointer tagging in Go to be faster than the traditional type switching (which was my use case initially for even thinking this solution up). I had expected that all the type conversion and copying (since Go passes by value, and is strongly typed) that it’d be slower.

I’m pleasantly surprised that it’s faster. I would not recommend using it in production, however, since for most use cases, it isn’t even necessary. The resulting code when this is used is messy and verbose. Plus, I may actually be wrong. If someone could do the benchmarks and share their results, that’d be awesome.

If you’re actually interested in these sorts of bit twiddling, I highly recommend giving Hacker’s Delight a read. It’s not easy to digest, but multiple sittings on the toilet bowl should help you get through it.

As for what I plan to do with this newfound knowledge about tagged pointers being faster, I might clean up my code and put it into my odd little toolbox, gogogadget. I may use it in some future project. Who knows?

Get more posts like this

comments powered by Disqus