On Thursday I decided to do some additional optimization to my Go code. This meant writing some assembly to get some of the AVX goodness into my program (I once gave a talk on the topic of deep learning in Go, where I touched on this issue). I am no stranger to writing assembly in Go, but it’s not something I touch very often, so sometimes things can take longer to remember how to do them. This is one of them. So this blog post is mainly to remind myself of that.
The values in Go slices are 16-byte aligned. They are not 32 byte aligned.