Tensor Refactor: A Go Experience Report

May Contain Some Thoughts on Generics in Go

I recently finished a major refactor of tensor, a package for generic multidimensional arrays in Go. In this blog post I will recount the refactoring process and why certain decisions were made. Furthermore, I will share some thoughts with regards to generics in Go while trying not to sound like a complete prat.

There have been major refactors done to the tensor subpackage in Gorgonia - a Go library for deep learning purposes (think of it as TensorFlow or PyTorch for Go). It’s part of a list of fairly major changes to the library as it matures. While I’ve used it for a number of production-ready projects, an informal survey found that the library was still a little difficult to use (plus, it’s not used by any famous papers, so people are generally more resistant to learning it than, say, TensorFlow or PyTorch).

Along the way in the process of refactoring this library, I ran into plenty of hairy issues (like creating channels of negative length), and I learned a lot more about building generic data structures than I needed to.

Background

A large part of the reason for the refactor was to improve the extensibility of the library. I had been working on optimizing the gradient descent optimization algorithms (Adagrad and Adam specifically) when I realised that there was a major flaw in the design. In TensorFlow, Theano and PyTorch, gradient descent is part of the computation graph that is built, while in Gorgonia it isn’t. I didn’t like that idea. Rather, the gradient descent algorithms were designed to take advantage of Go’s interface type, allowing for a “pluggable” optimizer. The idea was that I could easily swap out Adagrad for RMSProp or whatever when testing ideas. This led to the Solver interface: a gradient descent optimizer was essentially any type that satisfies that interface.

The problem, of course, comes when you realize why other packages simply implement gradient descent optimizers as part of the computation graphs. The weakness in Gorgonia’s design was that the gradient descent bits couldn’t use the same execution backend as the graph. To give a concrete example: while the forward and back propagation of the graph CAN take advantage of any CUDA kernels, when it came time to update the gradients, CUDA couldn’t be used. I could have redesigned the Solver types to use separate kernels, but the design would, in my opinion, have been quite inelegant. My gut feeling had been that a redesign of tensor.Tensor would alleviate the problem. So quite early on, I had the seeds of an idea to rework the design of the tensors.

I’m no stranger to redesigning a library. Over the years of writing and re-writing Gorgonia, I’ve tried out various different designs. I even gave a talk at GopherCon Singapore on the various designs. While typically I would post the video of my talks (and my slides), I hadn’t done so for GopherCon Singapore. One of the reasons was I had been mulling the changes since before GopherConSG. A little bit of context: about a week before GopherCon Singapore, I had accepted a new job - to start the Monday following GopherConSG.

As part of the interview process, I was asked to build a classifier/clusterer (the instructions were not specifically to build a clusterer, I misread the original instructions and built a hierarchical LSTM instead) over a weekend. Naturally I built it in Go (with Gorgonia) and presented the results (with Jupyter no less!). During the interview I learned that my employers were doing implicit matrix factorization for their recommender systems. I set myself the challenge to learn as much as I could about implicit matrix factorization, and tried to extend Gorgonia to handle sparse matrices.

Then came GopherCon Singapore and I had to write my slides. Perhaps what I hadn’t said in my talk was as important. I was really struggling with expressing my thoughts in the latter parts of my talk - especially with regards to generics in Go. As a result there are about 30 unused/unpublished slides. After all those months and working on the refactor of the tensor package, I think I now know how to express what I was thinking with regards to Go generics. That will form the latter part of this blog post.

TL;DR of Motivations

1. I wasn’t happy with the way Solver was designed. I wanted it to be able to take advantage of the different backends that may be used for the forward and backward propagation parts of the graph.
2. I wanted to extend the tensor package to also be able to handle sparse matrices.
3. After some exploratory work, I came to believe that a refactor of the tensor package would benefit both causes above. In fact it was necessary for additional extensibility.

Brief Introduction to tensor

The tensor package does what it says on the tin. It’s a package that handles multidimensional arrays of any type. It provides the data structures necessary to manage multidimensional arrays, and some functions for it.

You can do this and it’s valid:

type GeoCoord [2]float32
T1 := tensor.New(tensor.WithBacking([]int{1, 2, 3, 4}), tensor.WithShape(2, 2))
T2 := tensor.New(tensor.WithBacking([]float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}), tensor.WithShape(2, 3, 2))
T3 := tensor.New(tensor.Of(Byte), tensor.WithShape(6))
T4 := tensor.New(tensor.Of(GeoCoord), tensor.WithShape(2,3,4,5,6,10))


A lot of how it works has been covered by both the README of the tensor package as well as the slides I presented at GopherCon Singapore.

Those cover the basics. But in case you are too lazy to read those, here’s a quick recap: Internally, there was a pointer to a 3-word structure embedded in the *Dense structure that represented a Go slice (I had used reflect.SliceHeader for simplicity):

type Dense struct {
	// contains elided fields, among them the pointer
	// to the slice header that holds the backing data
	t Dtype
	v interface{}
}


This, combined with the AP structure, defines the multidimensional array. Note that the structure merely holds the pointer to the head of the slice, but no type information, so we need to record the type information in the struct. I used Dtype, which embeds a reflect.Type, because I wanted to apply Hindley-Milner style type inference to it in other packages (which, in hindsight, wasn’t that great of an idea in Gorgonia).

Besides these, I also wrote accessor methods - the *Dense structure is, after all, merely a glorified pointer to the underlying array with special rules on how to access the elements in the array. There has to be a way to retrieve a []float64 or a []int16 from a *Dense, so that we can loop through the elements.

Why Not interface{} ?

As a side note, here’s one of the more interesting FAQs about the data structure (“frequently” may be a bit biased, because I think this question is particularly interesting, hence I remember it more): why not just use interface{}? In this aside I’d like to chronicle my reasons.

One of the earliest decisions I made for this package was that *Dense should be a structure that held a slice of any type. The first implementation stored an interface{} and performed a type switch when necessary.

The result was unbearably slow, and within the day, I had swapped it out for *reflect.SliceHeader. As part of trying to explain the decisions behind the data structure, here’s a benchmark I ran on Go 1.9 (when I wrote the initial version with interface{} and type switching, I was using Go 1.5):

go test -bench=. -benchtime=10s -benchmem -cpuprofile=test.prof
BenchmarkTypecasting-8           	10000000000	         1.72 ns/op	       0 B/op	       0 allocs/op
BenchmarkTypeSwitchTypesOnly-8   	3000000000	         5.55 ns/op	       0 B/op	       0 allocs/op
BenchmarkTypeSwitchE2I2-8        	1000000000	        13.3 ns/op	       0 B/op	       0 allocs/op


One of the biggest takeaways I got from this is that type switching at runtime has a cost. Specifically, the more different kinds of things you type switch on (if you have mixed cases of concrete and interface types, for example), the greater the runtime penalty. I’ll come back to runtime type switching in a later bit.

Now, I list these here merely in the interest of explaining certain design decisions for *Dense. Please do not take this as advice to store all your Go values as pointers or unsafe structures! In fact, it is most certainly impractical to keep using type casting like this in ordinary code.

Why A Generic Multidim Array?

Another follow-up question that gets asked frequently is “why not just write a multidimensional float64 array?” The answer is slightly more complicated. While I agree that implementing a float64-only tensor would have made my life easier, my workflow actually involves multiple types.

The primitive types in Go are first class citizens. But some types are more first class than others, and for good reason too. Both int and float64 are prime examples. For instance, the untyped numeric constants that the compiler uses internally default to int and float64, unless otherwise specified. The math library works best with float64, not float32. And I can actually appreciate why this is done - a large part of it is architectural (you don’t want to waste space padding an int8 when you’re going to be passing a block of 8 bytes anyway). Plus it’s a lot of work to handle different types.

But this would cause an issue for me given my workflow.

A typical workflow for me would be to start with the default types blessed by Go. I’d build my neural networks on float64-based slices and tensors, just as a proof of concept. When I need to make it faster, I would typically switch down to float32-based tensors. I had, a long time ago, attempted to implement a float16 type, but it turned out to be much hairier than expected, so I’ve shelved that for now. And to make it even faster, I’d ship my computations off to the GPU. The standard rule is:

1. make it run
2. make it right (and the best way is through the use of the most correct of libraries - the stdlib, which favours float64)
3. make it fast (switch down to using smaller data types)
4. make it reaaaaally fast (switch to using GPUs)
5. make it reaaaaaaallly reaaaaaaallly reaaaally fast (switch to a cluster computing method, which I’ve tried multiple times but never got it to really work.)

Additionally, when I was working full time on my startup, I (ab/mis)used the data structures in the tensor package to hold 2- and 3-D slices of logical forms (think of them as fancy restrained s-expressions). The logical forms were also subject to many of the tensor methods (for example, elementwise addition and multiplication of the logical forms had semantic connotations). I personally think of that as abusing the data structures, but eh, what works works.

A further reason to have a generic multidimensional array is related to argmax and argmin. Argmax and Argmin return a tensor.Tensor of ints. Why ints? Well, I could return float32 or float64, but then when using the result as an index into a slice, I’d have to remember to convert it to int. Better to return a slice of ints.
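A scalar sketch of the point (argmax here is a simplified stand-in for the tensor version):

```go
package main

import "fmt"

// argmax returns the index of the largest element. Returning an
// int means the result can be used directly as a slice index,
// with no conversion from float64.
func argmax(a []float64) int {
	maxIdx := 0
	for i, v := range a {
		if v > a[maxIdx] {
			maxIdx = i
		}
	}
	return maxIdx
}

func main() {
	scores := []float64{0.1, 0.7, 0.2}
	labels := []string{"cat", "dog", "fish"}
	// No conversion needed: the index is already an int.
	fmt.Println(labels[argmax(scores)]) // dog
}
```

Had argmax returned a float64, every call site would need an int(...) conversion before indexing - an easy thing to forget.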

I hope I’ve made the case for why a generic multidimensional array data structure has been useful in my work so far. Primarily it’s part of a larger machine learning / deep learning workflow that requires switching between types.

Dense Tensors: A Recap

To recap, a *Dense tensor implements the tensor.Tensor interface. It’s a data structure that holds a 3-word representation of a Go slice, giving it the ability to be a container of any type.

Now I won’t lie - you could implement something like this and call it a day:

type Float64Dense struct {
	*AP
	data []float64
}
// implement the Tensor interface methods

type Float32Dense struct {
	*AP
	data []float32
}
// implement the Tensor interface methods


That was one of my earliest implementations. I really liked it because it was clean. I could even abstract the data structure to be like package sort. The main problem was that it was slow - mainly due to heavy use of runtime.convI2I, runtime.assertI2I2 and the like.

Additionally, having subpackages made things difficult to juggle - it was eventually hard for me to keep track of which packages had generated code (here’s where having Ada-style package-based generics would have been nice) - and that was just 4 subpackages! Lastly, the implementation was not forward-thinking, because it relied on the assumption that everything could be accessed with an AP.

Alternative implementations that I had tried can be found in the ALTERNATIVE DESIGNS document in the tensor package which you can peruse at your own leisure.

So I settled on a slightly more concrete data structure, because that meant faster access to the data. Additional accessor methods for different types were included. For example, the Float64s() method casts the *Dense’s data into a []float64. This was helpful for most of the other methods, like the arithmetic operations.

The Refactor

I started the refactor for the tensor package by attempting to implement a CSR/CSC sparse matrix (they’re structurally the same - so I coalesced them into a struct called CS). I quickly ran into the problem of having to re-implement all the accessor methods for it.

Implementing a Sparse Matrix

At first I considered generating those methods for the *CS structure. But that meant a LOT more code would be added, and compile times were already long enough due to the sheer size of the library. I noticed that both *CS and *Dense shared an underlying concept: having an array. So I did what most programmers would do: refactor the shared fields out into an array type. An array was simply:

type array struct {
	Ptr unsafe.Pointer
	Len int
	Cap int
	t   Dtype
}


Eventually of course, I came to group Ptr, Len and Cap under a different structure to ease the casting of a struct into a slice. The array definition became this:

// in package internal/storage
type Header struct {
	Ptr unsafe.Pointer
	L   int
	C   int
}

// in package gorgonia/tensor
type array struct {
	storage.Header
	t Dtype
	v interface{}
}


So now there is a shared implementation of a generic-ish array. This array type can be embedded in the *Dense and *CS types, representing a slice []T. Now we can implement methods that build on top of these data structures. After all, a multidimensional array, sparse or dense, is not useful without methods.
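A toy sketch of the embedding idea - the fields here are simplified stand-ins for the real ones, but the shape of the design is the same: both structures embed the shared array and inherit its behaviour for free:

```go
package main

import "fmt"

// array is a minimal stand-in for the shared backing-array type:
// it stores the payload without caring what the payload is.
type array struct {
	v interface{}
	l int
}

func (a array) Len() int { return a.l }

// Dense and CS both embed array, sharing its implementation.
type Dense struct{ array }

type CS struct {
	indices, indptr []int // sparse bookkeeping (simplified)
	array
}

func main() {
	d := Dense{array{v: []float64{1, 2, 3}, l: 3}}
	s := CS{
		indices: []int{0, 2},
		indptr:  []int{0, 1, 2},
		array:   array{v: []float64{4, 5}, l: 2},
	}
	fmt.Println(d.Len(), s.Len()) // 3 2
}
```

The accessor methods written once on array no longer have to be duplicated for every concrete tensor type.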

After a bit of trial and error, I came to realize that the *Dense data structure as designed, wasn’t a pure data structure. And that was a major roadblock in refactoring the tensor package.

Towards Pure Data Structures

What do I mean by a pure data structure?

It’s a term that’s completely made up by me, but it’s a useful concept, in my opinion. Think of a pure function: a function that takes an input and produces an output, without any state-changing side effects. Because of its purity, you can treat a pure function as a black box of sorts - you don’t need to know what’s going on inside.

Using that as an analogy, a pure data structure is a data structure that doesn’t need to care about its payload - it’s simply one that stores data in a structured manner. I hesitate to compare it with the plain-old-data-structure class in C++ or POJOs in Java, because those have deeper connotations of having no useful methods, while my idea of a pure data structure is slightly broader than that.

Let’s concrefy this idea with an example, using a simple linked list:

type list struct {
	data int
	next *list
}


There are only two fields in this structure, and they each serve a different purpose. data is a payload field: it stores the data of interest. next on the other hand is a field used for bookkeeping purposes. By and large, we can categorize the fields into either a payload field or a bookkeeping field.

There can be methods to this data structure of course. We could write accessor methods:

func (l *list) Car() int   { return l.data }
func (l *list) Cdr() *list { return l.next }
func (l *list) Last() *list {
	if l.next != nil {
		return l.next.Last()
	}
	return l
}


The *list is still a “pure” data structure, because it doesn’t care what data is being stored. The moment you write a method in which the data of the payload is introspected upon, the data structure ceases to be pure:

func (l *list) Sum() int {
	if l.next != nil {
		return l.next.Sum() + l.data
	}
	return l.data
}


By this point, you might be saying: hang on, Chewxy, you’re just describing generics. Hold that thought - I’ll get back to it in the later part of this post. For now, know that the original design wasn’t a pure data structure. The *Dense structure had several methods which introspected on the payload, ranging from Transpose and Inner to Add. The idea of a pure data structure is an important one in the work of separating data and algorithm.

Algorithms + Data Structures = Programs

In 1976, Niklaus Wirth wrote an influential book: Algorithms + Data Structures = Programs. In hindsight, it encapsulates the majority of the thoughts I was struggling to express in my talk. I also think my previous experience with OOP-style languages had blinded me to something I already knew, and it took a while for me to re-discover it.

I’m a fairly strong believer in the weaker form of the Sapir-Whorf hypothesis - in rough terms, that the language you speak influences the way you think. Indeed, we can see it at play here. The way I think about methods had been so bound to the OOP paradigm that I forgot, for the most part, that these two are equivalent:

func (a *Dense) Add(b *Dense) (retVal *Dense) {...}
func Add(a *Dense,  b *Dense) (retVal *Dense) {...}


The former is simply syntactic sugar for the latter. Of course, methods also play a part in determining whether a type satisfies an interface, but that’s beside the point. Had I written the methods in the latter manner, it would have been much clearer that Add was a function that took a *Dense as a parameter. It’s just a function call.
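This equivalence is directly observable in Go via method expressions. A runnable sketch (using a simplified Dense, not the real one):

```go
package main

import "fmt"

// Dense is a simplified stand-in for the real tensor type.
type Dense struct{ data []float64 }

// Add is written as a method, but it is just a function whose
// first parameter happens to be the receiver.
func (a *Dense) Add(b *Dense) *Dense {
	ret := &Dense{data: make([]float64, len(a.data))}
	for i := range a.data {
		ret.data[i] = a.data[i] + b.data[i]
	}
	return ret
}

func main() {
	a := &Dense{data: []float64{1, 2}}
	b := &Dense{data: []float64{3, 4}}

	// The method call and the method expression invoke the
	// very same function; the receiver is the first argument.
	fmt.Println(a.Add(b).data)           // [4 6]
	fmt.Println((*Dense).Add(a, b).data) // [4 6]
}
```

`(*Dense).Add` has type `func(*Dense, *Dense) *Dense` - exactly the "latter" form above.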

Once I realized this, it became quite quick for me to figure out how to separate the data structures from the execution of operations on them. Furthermore, it was a short hop from there to being able to swap out execution engines at runtime, so the tensors themselves can use different kernels to execute the same functions.

For example, if I wanted to add the values of two *Dense using CUDA, I’d write a function like this:

func AddWithCUDA(device CUDADevice, a, b *Dense) *Dense { ... }


If I were fine with adding a new function for each execution engine (CUDA, OpenCL, etc.), then the problem of dispatch would turn out to be very ugly. But it did yield one particular insight into refactoring.

The Problem of Dispatch

A dispatch function would look something like this:

func DispatchAdd(ctx Context, a, b *Dense) *Dense {
	switch ctx.(type) {
	case CUDAContext:
		// dispatch to the CUDA-specific implementation
	default:
		// fall back to the default CPU implementation
	}
}


In fact, such dispatch functions DID exist in the main Gorgonia library itself (the refactor for that library is next). They were used to dispatch different function calls based on different execution contexts (is this function going to be run on the GPU or the CPU?). Also, the Context was conditional on build tags: an ExternMetadata (a data structure a context would embed) would be different when built with the CUDA build tag versus other build tags.

It was ugly and messy. I was reluctant to re-write this dispatch system. So I thought of using Gorgonia itself to generate the new tensor dispatch functions.

Then something caught my eye.

Instead of having to pass in a Context all the time, why not make the Context part of the definition of the tensor.Tensor interface? After all, if the goal is to make pure data structures, why not go all the way?

And so the definition of the tensor.Tensor interface gained a new method: all Tensors must know what execution engine to use (I renamed Context to Engine to better represent its purpose: an execution engine). To concrefy, the *Dense structure now looks like this:

type Dense struct {
	// contains elided fields
	*AP
	e Engine
	array
}


This allowed for methods that dispatch implicitly. Instead of switching on the Engine type, I decided the check should be wider in scope than that: just check whether the Engine implements whichever interface I’m interested in (the final version has some minor additional optimizations that came about from profiling the code):

func (t *Dense) Add(other *Dense) (*Dense, error) {
	if adder, ok := t.e.(Adder); ok {
		return adder.Add(t, other)
	}
	return nil, errors.New("Engine does not support Add")
}


And so the remaining work has been to refactor the methods out of the data structure and into methods of a default execution engine, which is pretty comprehensive in its coverage of primitive types.

Further Cleanups

Along the way, of course, there was also additional work that unified the ideas of computation. In the past, I had written a code generator, gorgonia/tensor/genlib, to generate the methods. It generated code rather arbitrarily - a little undisciplined, if you ask me. At the end of my GopherCon talk, I mentioned I’d love to try my hand at a Cartesian closed category based approach to generating code. Unfortunately I didn’t get quite that far. I ended up with gorgonia/tensor/genlib2, which is still somewhat undisciplined in its code generation.

There are generally two ways that the data in a multidimensional array can be held:

• As one contiguous block of memory.
• As a block of memory in which the next element requires a jump.

The latter requires the use of an Iterator to get the index of the next value in order to access the values, while the former can take advantage of optimized access methods to touch the data. By definition, sparse matrices fall into the latter group because you don’t store all the data.

This definition is far from static, and is highly dependent on the operation, though. For example, let’s say you have A and B, both *Dense, but A is stored in row-major order while B is stored in column-major order. An addition operation would then require Iterators for both, as they each access the data differently.
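The layout dependence can be sketched with strides. Here, the same contiguous backing data is read through a transposed (column-major) view; the contiguous case could just range over the slice, while the view needs an index computation per element (this is an illustrative sketch, not the package’s Iterator):

```go
package main

import "fmt"

// flatIndex computes the linear index of a coordinate in a
// strided array: sum of coordinate * stride per dimension.
func flatIndex(coord, strides []int) int {
	idx := 0
	for i, c := range coord {
		idx += c * strides[i]
	}
	return idx
}

func main() {
	// A 2x3 row-major matrix stored contiguously.
	data := []float64{1, 2, 3, 4, 5, 6}

	// Transposed 3x2 view: same data, swapped strides, no copy.
	shape := []int{3, 2}
	strides := []int{1, 3}

	// Contiguous access could just range over data; the transposed
	// view must follow the strides element by element.
	for i := 0; i < shape[0]; i++ {
		for j := 0; j < shape[1]; j++ {
			fmt.Print(data[flatIndex([]int{i, j}, strides)], " ")
		}
		fmt.Println()
	}
}
```

Adding a row-major A to a column-major B means running two such index computations in lockstep, which is exactly why that case pays an Iterator penalty.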

This all led to a cleanup of the internals of gorgonia/tensor. Every operation contains a check to see if an Iterator is required; the code paths then diverge if it is. In the past this code had been generated rather arbitrarily, but now it’s a lot more uniform. This has led to both performance gains and losses, but in my opinion, the losses are quite worth it.

The Results

The refactor resulted in more functionality, and less code. And there were huge performance gains and some minimal losses. There were also some exported API changes, all noted below.

Performance change summary

Perhaps one of the most meaningful changes is that I have finally conceded that fused multiply-add should be a separate function. This was a point my startup cofounders and I used to discuss - whether there is a need to expose FMA, which for the most part calls out to a BLAS implementation of {d|s|z}axpy. After a lot of reasoned analysis, I finally relented and added it in. Before this, to perform an FMA operation you’d write Mul(a, x, WithIncr(y)). Now, with the FMA methods and functions exported, there is no longer a need to allocate a function object (WithIncr), leading to fewer allocations and faster results.
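The semantics of both the old Mul(a, x, WithIncr(y)) spelling and the new FMA boil down to an elementwise multiply-accumulate. A sketch of that semantics (not the package’s implementation, which dispatches to BLAS where it can):

```go
package main

import "fmt"

// fma computes y[i] += a[i] * x[i] elementwise: a fused
// multiply-add that accumulates into y without allocating an
// intermediate slice for the product.
func fma(a, x, y []float64) {
	for i := range a {
		y[i] += a[i] * x[i]
	}
}

func main() {
	a := []float64{1, 2, 3}
	x := []float64{4, 5, 6}
	y := []float64{1, 1, 1}
	fma(a, x, y)
	fmt.Println(y) // [5 11 19]
}
```

Exposing this as one call avoids both the intermediate product allocation and the WithIncr function-option allocation.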

Overall, the new data structure and its access rules and checks add about 600ns of overhead to arithmetic operations when compared to optimized native operations on slices (but they are about 5000ns faster than unoptimized native Go slice operations). Compared to the old data structure, they add about 100ns of overhead, owing largely to support for more features. These runtime checks are necessary (we have to check whether the memory is accessible by Go, whether the shapes are correct, etc.) and I’ll be working to optimize them further, especially with regards to runtime type checking (at this point it’s down to a fairly simple runtime.ifaceeq).

Another change was made to uniformize the APIs for the methods. This led to several performance changes:

• For contiguous slices, there has been an almost 100% improvement over the old data structure + algorithms.
• For non-contiguous slices, there has been a 70+% loss in performance over the old data structure + algorithms.
• Tensor-scalar operations have seen improvements of ~20% over the old data structure + algorithms.

What this means is if you want highly performant operations, you need to be aware of how the data is laid out. This is now made easier with new methods that allow you to inspect how the data is laid out.

Here’s a quick summary of the benchmarks that actually changed:

benchmark                                           old ns/op     new ns/op     delta
BenchmarkDense_Mul_Unsafe-8                         9910          9969          +0.60%
BenchmarkNative_Mul_Unsafe-8                        16816         16277         -3.21%
BenchmarkNative_Mul_Unsafe_vec-8                    9322          9117          -2.20%
BenchmarkAPI_Mul_Unsafe-8                           10041         9792          -2.48%
BenchmarkDense_ContiguousSliced_Mul_Unsafe-8        373286        9784          -97.38%
BenchmarkDense_NonContiguousSliced_Mul_Unsafe-8     377164        657557        +74.34%
BenchmarkAPI_MulScalar_Unsafe-8                     16015         11296         -29.47%
BenchmarkNative_MulScalar_Unsafe-8                  11082         10886         -1.77%

benchmark                                           old allocs     new allocs     delta
BenchmarkDense_Mul_Unsafe-8                         2              1              -50.00%
BenchmarkNative_Mul_Unsafe-8                        0              0              +0.00%
BenchmarkNative_Mul_Unsafe_vec-8                    0              0              +0.00%
BenchmarkAPI_Mul_Unsafe-8                           2              1              -50.00%
BenchmarkDense_ContiguousSliced_Mul_Unsafe-8        5              1              -80.00%
BenchmarkDense_NonContiguousSliced_Mul_Unsafe-8     5              5              +0.00%
BenchmarkAPI_MulScalar_Unsafe-8                     2              3              +50.00%
BenchmarkNative_MulScalar_Unsafe-8                  0              0              +0.00%

benchmark                                           old bytes     new bytes     delta
BenchmarkDense_Mul_Unsafe-8                         97            9             -90.72%
BenchmarkNative_Mul_Unsafe-8                        3             3             +0.00%
BenchmarkNative_Mul_Unsafe_vec-8                    1             1             +0.00%
BenchmarkAPI_Mul_Unsafe-8                           97            9             -90.72%
BenchmarkDense_ContiguousSliced_Mul_Unsafe-8        339           10            -97.05%
BenchmarkDense_NonContiguousSliced_Mul_Unsafe-8     339           479           +41.30%
BenchmarkAPI_MulScalar_Unsafe-8                     73            25            -65.75%
BenchmarkNative_MulScalar_Unsafe-8                  0             0             +0.00%


API changes summary

In an effort to standardize names, I got rid of the more correct method names in favour of standardized ones. The signatures have also changed. Here’s a table of the changes:

Old Method Name        New Method Name
TransInv/TransInvR     SubScalar
Scale                  MulScalar
ScaleInv/ScaleInvR     DivScalar
PowOf/PowOfR           PowScalar

Where in the past the -R suffix indicated that the scalar value was on the right, the new -Scalar methods take an argument indicating whether the tensor is the left operand. This standardization has been applied in gorgonia proper as well.

Additionally, comparison methods are now exposed and follow a similar naming scheme.

Vanity Statistics

Here are some vanity statistics which aren’t exactly useful, so I’ve put them in a table:

Statistic        Old        New
Diff             -          215 files changed, 83517 insertions(+), 117085 deletions(-)
API Surface      65         138
Test Coverage    72.7%      73.3-ish%

The diff was obtained with git diff origin/master --stat. The API surface was found with:

go list ./... | while read pkg; do
	echo $(go doc $pkg | grep -E '^(type|func|var|const)' | wc -l) $pkg
done | sort -rn | head


It’s not a particularly useful metric, other than giving an idea of how big the library is. The test coverage changes from run to run due to the nature of testing/quick, but the changes are generally minor. You can get Gorgonia v0.7.1 now.

Future

There are some types that function as solely utility types in the package. In the future, they will be unexported. Future work for Gorgonia will mainly involve improving packaging (with dep), removing as much redundant code as possible, and reorganization of the package structures. There’s quite a lot to be done.

Some Thoughts on Generics in Go

I do have some thoughts about generics in Go. I’ve alluded to some of them in previous blog posts, talks and meetup discussions, but I’ve never actually written them down. Part of the reason is that it’s a fuzzy idea (it STILL is), and it’d be quite difficult for me to fully express what I mean. In this section, however, I’d like to attempt to share some of the more well-formed thoughts.

Types and Representations

Earlier I mentioned a term I made up: “pure data structures”. In reality it’s just a rebrand of a concept that already exists - container types. I hesitate to use that term because, in my opinion, types and representations are separate things and should be kept separate in the mind. In the section about pure data structures I’m talking about the representation, not the type. Many discussions I read online about the lack of generics in Go conflate the ideas of representation and type. This is not helped by the type keyword, which, when used to define new types, is typically tied to the representation.

Types, in my mind, are labels/annotations for a representation of data. A []byte can represent anything; after all, everything on a digital computer is just bytes. Once we give a sequence of bytes a name/label, we impart semantics on that byte slice. The semantics of the representation (i.e. what the byte slice does) can be defined in the labels AND in the structure itself. If we name a byte slice tree, we can say that this tree byte slice contains references to at least two other byte slices somewhere that are also labelled tree.

Here I’d like to step away from standard nomenclature and think about labels for representations. They can be simple, like tree or list. Or they can encode more information about the behaviour of the representation - tree a is the type of a representation of a tree that has a payload of type a. For the most part in Go, types are simple (with the exception of maps, slices, arrays, chans and pointers, of course) - they’re just a simple label/name.

To really concrefy what I mean by separation of types and representation, I want you to think about a slice. Specifically when we write this:

var intslice []int


The abstracted representation of intslice is this (recall that everything is nothing more than a byte slice; the structure and meaning of the byte slice is given by our abstraction):

type intslice struct {
	Ptr uintptr
	Len int
	Cap int
}


The type of intslice is that representation in conjunction with its label. If you’re more familiar with go/types, you may note that types.Slice describes the type of intslice above; the elem field of types.Slice is the parameterized type. For the most part when programming in Go, when we use the keyword type, we’re really defining the representation rather than the type (or, more accurately, we’re defining the representation and labelling it with a simple label). The compiler derives the type from the representation we’ve defined.

A type system is therefore a system for describing, manipulating and transforming those labels/annotations. It knows about the behaviour of these labels, and can therefore do something with them (typically type inference, but you could also apply a type system for instantiation and generalization… but this topic is starting to get out of hand).

And to be clear, I love Go in spite of its minimal, almost non-existent type system. Specifically it’s the structural type system that I enjoy about Go - the use of interface{} to form composable programs. I think it’s a brilliant stroke of genius that sadly wasn’t taken to its maximum potential*I suspect I can actually empathize with the reasons for not going the whole way with interface{}es.

I personally find that having a mental model that ties representation and types together is useful for certain kinds of tasks (performance optimization for example), but not for other kinds (thinking about generics, or discussing programming languages as an abstract subject). So from this point forth in this post, I’d like to continue as though types and representations are different things.

Different Notions of Genericity

While writing Gorgonia and now refactoring the tensor subpackage, I’ve come to notice that there are differing notions of genericity. And I don’t just mean that different people mean different things when talking about genericity. I think that the concept of “generics” is itself overloaded in meaning - it comprises multiple forms of genericity, each of which has its own solution.

These are what I think to be differing notions of genericity - they’re somewhat arbitrarily classed, and they do overlap one another:

• Generic data structures
• Generic types
• Generic functions
• Generic computation

I’ll attempt to address them one by one.

Generic Data Structure

This is the most common notion of genericity that people talk about: generic data structures - data structures that can hold values of any type. The example from the “Pure Data Structures” section above is actually a generic data structure. Go does come with a few generic data structures: slices, maps, chans, arrays and pointers.

What people want of course, is the ability to create generic data structures - linked lists, trees etc. I can see the benefit of having the ability to create generic data structures. I also have my reservations over such things. I’m not sure if my experience has been a common thing, but in any case, here’s what it’d look like:

In my past experience, I would start out with a canonical-ish data structure as you would learn them in text books. They’re typically not generic though. For example, I would implement a separate AVL tree for say strings, and ints. Over time, after repeatedly profiling code, they’d end up being optimized to a point where it no longer looks like the canonical data structure that you find. Usually these optimizations take advantage of particular behaviours of the payload data type.

If I had started with a generic data structure from a library that uses interface{} to hold the payload, I’d still have to create my separate data structures when I optimize my code. Alternatively, imagine a scenario where Go provides the facilities to create templated data structures ala STL in C++. I would imagine the meta-program would get increasingly complicated given that Go’s code analyzer isn’t quite as powerful as GCC’s or LLVM’s yet - I’d have to insert additional clauses into the meta-program to generate specific optimizations for those types.

I’d be lying if I said I’m not tired of implementing copy-and-paste code for data structures in the beginning though. I use gen for a lot of things in combination with go:generate, and for the tensor library, I wrote my own code generator called genlib. But to be really honest, it’s a terrible solution. I’ll come back to code generation in a bit.

In Gorgonia

The main data structures in Gorgonia are *Dense and *CS. Both implement the Tensor interface. Both are built on top of a generic array structure. My solution has been ugly: store pointers and the type. This relegates type checking to run time. I also provide “specialized” execution engines for when you know that your Tensors won’t change types (which is the majority of use cases in deep learning work, but not so in general use).
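
The “store pointers and the type” design can be sketched roughly like this - the field and function names below are mine, not Gorgonia’s actual layout, but the shape of the idea is the same: the element type becomes a run-time value, so type checking necessarily happens at run time.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// array sketches the "pointer plus type tag" design.
type array struct {
	ptr unsafe.Pointer // raw pointer to the first element
	typ reflect.Type   // element type, carried as a run-time value
	n   int            // number of elements
}

func fromInts(xs []int) array {
	return array{unsafe.Pointer(&xs[0]), reflect.TypeOf(xs).Elem(), len(xs)}
}

// Ints recovers the []int - but only after a run-time check of the tag.
func (a array) Ints() ([]int, error) {
	if a.typ.Kind() != reflect.Int {
		return nil, fmt.Errorf("expected int, got %v", a.typ)
	}
	return unsafe.Slice((*int)(a.ptr), a.n), nil
}

func main() {
	a := fromInts([]int{10, 20, 30})
	xs, err := a.Ints()
	fmt.Println(xs, err) // [10 20 30] <nil>
}
```

Every access pays for that check (or trusts the caller and skips it, unsafely) - which is exactly the ugliness described above.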

A particularly ugly thing I had to do was try to mimic the Go runtime as much as possible. It sometimes felt like I created a “shadow” runtime subset. Eventually all the unsafe things were wound back, but that was a journey I felt was kinda unnecessary.

Generic Types

A corollary to generic data structures is generic types. One particular example is a set type. The set type and its uses are, in my experience, a very good canonical way of explaining the difference between type and representation. These are the various implementations of set data structures that I have used over the years:

• map[key]struct{}
• xtgo/set
• []key - slice of keys
• [N]key - array of N keys
• uint64 as a bit map

When my needs change, the underlying representation of the set changes too. For example, I may start with a set that is based on map[key]struct{}, but over time, through analysis and profiling of the application I’m working on, I may discover that there are only a grand total of 30 items that I’m interested in keeping in the set. So I’d switch over to using a slice as the basis of my set. Or if I want it to be an ordered set, I can simply implement the methods for xtgo/set on the slice.
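
That swap is painless precisely because the rest of the program depends on the set’s behaviour, not its representation. A sketch - the Set interface here is my own minimal one, not from any library:

```go
package main

import "fmt"

// Set is a hypothetical minimal set interface.
// Both representations below satisfy it.
type Set interface {
	Add(k int)
	Has(k int) bool
}

// mapSet: the map[key]struct{} representation.
type mapSet map[int]struct{}

func (s mapSet) Add(k int)      { s[k] = struct{}{} }
func (s mapSet) Has(k int) bool { _, ok := s[k]; return ok }

// sliceSet: the []key representation - linear scans,
// but compact and fast for a handful of items.
type sliceSet struct{ ks []int }

func (s *sliceSet) Add(k int) {
	if !s.Has(k) {
		s.ks = append(s.ks, k)
	}
}

func (s *sliceSet) Has(k int) bool {
	for _, v := range s.ks {
		if v == k {
			return true
		}
	}
	return false
}

func main() {
	for _, s := range []Set{mapSet{}, &sliceSet{}} {
		s.Add(3)
		fmt.Println(s.Has(3), s.Has(4)) // true false
	}
}
```

Same type, two representations - swapping one for the other touches nothing outside the set itself.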

Speaking of xtgo/set, I think it’s one of the more brilliant packages out there when it comes to generic types. The xtgo/set package is similar to the sort package that comes in the standard library of Go. In fact xtgo/set functions rely on types having implemented sort.Interface*side note: have you noticed how poor the Go authors are at naming things??? Why isn't this interface called Sorter?? Calling it Interface just confuses the shit out of everyone.. And this is only accomplished by understanding the fundamental nature of the type that’s being worked on.

In my opinion, xtgo/set is a true display of the understanding of an abstract data type, with emphasis on the word abstract.

In Gorgonia

Gorgonia, in its earliest days, started out with the intention of providing a generic tensor type in the same way as xtgo/set. Unfortunately I clearly did not have a solid enough foundation in my understanding of multidimensional tensors. In particular I ran into a lot of trouble with Transpose.

Specifically it was my insistence at having an in-place transpose that made it difficult to abstract in this way (for context, when I wrote the transpose functions I was consistently running huge networks and running out of memory was a big problem so I opted for an in-place algorithm). Having an in-place transpose requires multiple temporary variables.

If I were to define a Transposer interface the same way sort.Interface is defined, then I’d require implementors of the method to return interface{} and read values from interface{}. This imposed a huge penalty on performance. The alternative would be to require implementors to return unsafe.Pointer… but that’s actually a far worse decision. It’s my personal opinion that any use of unsafe should NEVER cross the package boundary.

And from there it snowballed: facing a bunch of performance-related issues caused the slow change of methods into a type that is more bound to the representation, as opposed to a more abstract data type.

Generic Functions

The next notion of genericity is generic functions. Let’s consider a function func max(a, b T) T, which finds the maximum of two values, a and b, each of type T, and returns a value of type T. Unsurprisingly you can’t write something like this directly in Go. The solution in Go is the same solution as in Java or C#: box it up.

What you’d do is to do something like this -

func max(a, b interface{}) interface{} {
	switch at := a.(type) { // unbox a
	case int:
		bt := b.(int) // unbox b
		if bt < at {
			return bt // box retVal
		}
		return at // box retVal
	case int32:
		// etc etc
	}
	panic("Unreachable")
}


When you call max(1, 2), the compiler will automatically wrap 1 in a box - an interface{}*For those not in the know, an interface is essentially 2 pointers: one pointer to the table of methods, which is static and known at compile time, and the other pointer to the value allocated on the heap - pass it into max, which then unpacks the value from the box, performs the operations, and packs it back into the box before returning the value.

Essentially the problem of generic functions boils down to the problem of dispatch. Having to write your own dispatch table is quite an unpleasant experience - and you cannot be certain that you have covered all the cases. It’s the latter part that bugs me quite a bit.

An alternative solution which I quite like is to assume a style of programming where interface types are abstract data types. This is done in the package sort. sort.Sort is a generic function that takes anything that implements sort.Interface.

People say that having generics in Go is going to obviate the need for things like sort.Interface… to which I only half-agree. You’re simply moving the abstraction to a different part of the program. Instead of having to implement sort.Interface, you could imagine a generic []T:Lesser, where T would have to implement a Less function.

I think the takeaway for generic functions is simply this: the abstraction HAS to live somewhere in the program. Here are three quick options I’ve gone through:

• Boxing activity (like putting things in an interface{} or to the extremes, monads).
• As part of the type signature (what people normally think of as generics).
• As part of your representation (what we have right now with sort.Sort).

In Gorgonia

In Gorgonia I noticed two general classes of generic functions:

1. Data-type agnostic functions
2. Data-type gnostic functions

No, that’s not a typo. The first simply means that the function doesn’t care about the semantics of the type: uint16 and int16 are treated the same way - as 2 bytes. The latter are functions which care about what the data types are, because the types carry semantic meaning over their representation. A uint32 and a float32 are both 4 bytes, but how you interpret those 4 bytes depends on the type.

Typically data-type agnostic functions can be characterized as memory-moving operations. In the tensor package there are a surprising number of these: Transpose, Stack, Hstack, Vstack and Concat are all data-type agnostic. As long as data can be copied from one place in memory to another correctly, that’s a job well done. For these the solution was simple - I wrote simple functions that would be optimized by the compiler (helped by me of course) into a bunch of runtime.typedmemmove calls. And because I have an inkling of affinities for particular sizes, I wrote optimizations for data of sizes 1, 2, 4 and 8 bytes. The optimizations weren’t strictly necessary.

Data-type gnostic functions are a little more tricky. I ended up making an executive decision on only supporting the primitive types provided by Go. To do so, I wrote a code generator to generate methods*Why? I'm the kind of guy who gets "COMPLETIONIST" badges in video games... I think that's why. But of course if you introspect on what I do, it's highly non-internally consistent. So the real answer is I don't know why.. But also due to the way the data structure is designed, it’s trivial to extend to any other types - simply use the Tensor with an Engine that supports said type. You can see more in the examples section.

There is a third class of generic functions, whose solution is pointer arithmetic. I won’t go much into the details because it’s largely irrelevant. In the case of Gorgonia, only one such function exists: the slicing of tensors.

Generic Computation

The solution to the last class of genericity was hinted in the section above. But the problem essentially lies with the fact that there may be a need to use a different computation engine to perform computations. By this I mean, using GPUs, FFIs or even a network device to perform computation. Practically it means using GPUs and cgo.

How does one even think about alternative modes of computation? Like all generics problems, the solution is generally to find the points of commonality between the things we want to be generic over, and abstract them out.

In Gorgonia

Considering only the very small subset of alternative computation engines, I found something in common: the concept of unsafe pointers. Having this shared notion of pointers allowed my code to freely swap out execution engines (with checks of course).

The problem of generics is essentially a problem of abstraction. Specifically, it’s a syntactic abstraction (as opposed to a semantic abstraction or a pragmatic abstraction which are whole different beasts). When thinking about abstractions it’s useful to think about invariants - what things don’t change. We can hence broadly categorize the kinds of abstractions available to us as one of these two:

• Behavioural abstraction - the invariants are the behaviours of the data structure or type. If data structures A and B can both do something (by name; we’re not going to consider actual pragmatics), we consider them to be an instantiation of the same abstract thing.
• Property-based abstraction - the invariants are the properties of the data structure or type. If data structures A and B both have the same property (which can take many values), we consider them to be an instantiation of the same abstract thing.

There are two things to consider for each syntactic abstraction when it comes to talking about generics - a runtime consideration and a compile-time consideration. What does a syntactic abstraction mean and do at compile time? What does it mean at runtime? For the most part with Go, the focus has largely been on the runtime consideration of abstraction. The majority of discussions around generics, on the other hand, busy themselves with the compile-time consideration of such abstractions.

Recall earlier I mentioned the conflation of types and representation. Types are essentially the compile-time consideration of abstraction, while data structures are the runtime consideration. In short, here’s a table showing typical examples of solutions to said considerations:

                           | Behavioural Abstraction | Property-based Abstraction
Compile-time consideration | Typeclasses             | OOP-style subclassing/checks
Run-time consideration     | Interfaces              | Run-time property checks (ala Python)

The question ultimately is how do we reconcile runtime and compile-time considerations?

Some Solutions for Generics in Go

So in the sections above I have laid out the different kinds of problems that I faced while writing and refactoring Gorgonia and dealing with generics. Naturally, being the chatterbox that I am, I too have some (mostly uneducated) opinions about the possible solutions. They’re ranked in what I think is best for the language. I will also list some of the trade-offs I’d consider if I were designing Go.

But before that it’d probably be wise to state my biases upfront. Here’s what I care about, in priority (1 and 2 are tied):

1. Cognitive crutch for my ability to think - I like to think of programming languages as a new kind of pen-and-paper. It helps you think about a problem. Some languages cause you to think in more disciplined ways than others, and Haskell is one of them. I will admit to using Haskell as a “thinking” language when trying to solve problems. Using Haskell as a language to think about ways of solving a problem, then writing the solutions down in Go or Python, has improved my code significantly. This is the one thing that Go lacks in the more abstract parts of problem solving which Haskell provides, aided by its superior type system*I can't wait for someone to tell me to use Agda or Coq..
2. Mechanical sympathy - Something that is lost in Haskell is the idea of mechanical sympathy. Because it’s such a high-level language, you just have to trust GHC to do the right thing and compile to the correct layout of bits that runs in the correct order. The user of the language is completely divorced from the running of the application. Indeed, many people have suggested this should be the way to go. Personally I’m a bit uncomfortable with that idea. I’m of course confident in SPJ and the GHC core team. But we still live in a world where Intel processors are built on the von Neumann architecture. Things are stateful almost all the time. Functional languages simply hide a lot of state in the runtime, making it inaccessible to the programmer. I’m personally not sure if that’s a good thing. I still want to know how my data is laid out, how it will be executed on the machine, and how it got there*I've been told many times I have control issues and I tend to be a control freak. Make of that what you will. I personally find this to be much more of a flaw than anything..
3. Speed of iteration - Go is an absolute killer at this. Its fast compilation to binary is probably one of the best things ever. I can switch mental modes from “experimental” mode to “production” mode in Go just like that, while in Python it would take a lot more instrumenting and refactoring of code to go into production mode. Because it’s so fast, I would be willing to take some tiny hits here if it means improving 1 and 2.
4. Ease of sharing code - One of the best things about Python is that it’s virtually pseudocode that runs. People can easily understand Python code. But we all know a guy who has been writing Python for the last decade or so and writes it in a strange way that is hard for people to understand*ahem, me. Go on the other hand takes the “only one way to do things” mantra of Python to the next level. Reading other people’s Go code is generally a pleasant thing.
5. Memory/Type safety - This is surprisingly low on the things I care about. I think memory and type safety are absolutely vital in modern day programs. However there are levels of memory and type unsafety that I am willing to accept and trade in for. For example, for a large number of programs I write, I’m not willing to have to think about a borrow checker although that would make my programs a lot more deterministic. Likewise, for many programs I shouldn’t have to think about types. The ideal programming language shouldn’t even expose the concept of types to the programmer.
6. Ease of sharing programs - Cross compiling on Go is so stupidly simple nowadays, for the majority of use cases (i.e. x86 and x86-64, for all three major OS families), this is a no-brainer. While I work on a primarily *nix based environment, I recently had to compile a program to be run on Windows and the process has been nothing but short, sweet and simple.

A Constrained Parametric Type System

The differing notions of genericity I listed in the sections above, save for generic computation, can indeed be unified. As noted above, the problem of generics is a restatement of the problem of abstraction. A type system - particularly something like a HM type system, or something more advanced like a forward-reverse PT type system - aids with the problems of abstraction. In fact, having a sound type system would unify all the subclasses of generic problems above (with the caveat that functions are treated as a type, but that is already the case in Go).

A type system serves many purposes. For one, it prevents people from making stupid mistakes. As part of the tooling improvements for Gorgonia, I once fooled around with building a type-checker for Gorgonia programs. I never finished it, but I suspect it would be easy for a large majority of cases, while the solution wouldn’t generalize to more pathological use cases. I suspect a full coverage solution would be monstrously complicated - after all, there are elements of a dependent type system in here. That’d be taking a step too far for me - I just want to build my deep neural networks!

A second benefit of a type system (at least for me) is that I find having a formal type system actually helps me think about a problem. In particular, I found typeclasses to be one of the best aids of thinking (it should be no surprise why I like Go so much because of interface{}, or traits in Rust… not so much concepts in C++). I found the main benefit of typeclasses and interfaces is that they act as constraints, albeit at different levels.

The differences are subtle and would probably take me a whole poorly worded blog post to poorly explain. But the general gist is that typeclasses do their work mainly in the compile phase, while interfaces act as a constraint at compile time (preventing wrongly typed values from being placed in variables) but as a passageway letting different values in at run time.

The way I think about it is thusly:

• Typeclasses enable meaningful parametricity of types*You could build a parametric type system with no typeclasses ... ahem like in Gorgonia ... but it wouldn't be particularly useful.
• Interfaces enable meaningful parametricity of structure*You can build a parametric data structure if you're willing to use unsafe pointers everywhere.

The compile-time work nature of typeclasses means that a compiler could generate code such that no checks are required at runtime. This comes with a cost of course - the cost of additional program analysis during the compile phase and also the cost of tree-shaking if the intent is to keep binaries small.

The good news of this idea is Go already has the ingredients it needs to actually implement a parametric type system - nothing much has to change, except the program analysis part, and perhaps building in more interfaces (the only one in the language is error). In fact this document by Ian Lance Taylor pretty much covers the majority of what I want to say. The only thing I want to go further and say is that a syntax change is in fact not even required!

If you squint very carefully, the restrictions that Ian talks about are pretty much typeclasses. The difference is that while he wants to make the typeclasses known only to the compiler, my suggestion is to simply make interfaces work as typeclasses in the compile phase and instantiate every named interface. The only language change would be if the typeclasses were made into public, built-in interfaces like error.

The third benefit, which I think is missed, is type inference. In the document above, Ian laboriously listed out all the rules for type inference. And one of the reasons why the proposal wasn’t implemented was because there are very many rules for inferring types. While I applaud Ian’s efforts in exhaustively listing the rules, I think the rules could do with a bit more discipline, and could pretty much be abstracted into 4 or so general rules. Of course, this is me speaking with a 1000-foot view of things. The reality of a proper programming language used in production is always a little more nuanced, and there are always more complications than expected.

Having a sound type system in Go would mean that type inference for return values would be fairly trivial, instantly solving the problems of generic functions. Instead of having to write func max(a, b interface{}) interface{}, I can now write func max(a, b Ord) Ord. In this case the type variable is implicit. An alternative syntax that includes a type variable would be func max(a, b T:Ord) T
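
You can approximate the boxed half of this today. The Ord below is a hypothetical interface of my own (not a stdlib type), and note that both arguments and the result still travel in interface boxes, with a run-time assertion inside Less:

```go
package main

import "fmt"

// Ord is a hypothetical ordering interface standing in for the
// typeclass-like constraint in func max(a, b Ord) Ord.
type Ord interface {
	Less(other Ord) bool
}

type myInt int

// Less must type-assert its argument - a run-time check the
// compile-time constrained version wouldn't need.
func (a myInt) Less(b Ord) bool { return a < b.(myInt) }

func max(a, b Ord) Ord {
	if a.Less(b) {
		return b
	}
	return a
}

func main() {
	fmt.Println(max(myInt(3), myInt(7))) // 7
}
```

With a constrained parametric system, the same signature would instead be instantiated at a concrete type - no boxes, no assertion.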

Concretely, if I were to plan a transition to a more sound type system, these are the rough steps I’d take*This is where you go.. phew thankfully chewxy's not on any Go committee!:

1. Move to instantiate all named interfaces (with one or two exceptions: error for example). This makes interfaces act like typeclasses in the compile phase. This has a huge downside of having much larger binaries because every data structure that has a named interface field will now have copies.
2. “Smarten” up the instantiation process - only instantiate data types of a particular size once.
3. Add a type inferencing system that works globally (this means given any statements/expressions and environment, a Go compiler would be able to infer the type).
4. Introduce any syntax change if necessary.

What I’m suggesting is simply adding a small language like GHC Core into Go that only runs at compile time (think of it as an IR that does all the controls of types before SSA). I personally would like to see more type inference work being done in Go because I think it’s a fantastic small language.

Code Generation

Code generation is a particularly interesting one. I actually wrote a program to generate the code necessary to handle generic functions. I’m not sure what I actually feel about it. On the one hand it was the singular most helpful tool I had while writing Gorgonia. On the other hand I think it’s the largest wart.

A friend of mine, Gary Miller, once said something to me at a meeting that left quite a deep and lasting impression: “there are only two kinds of code you generate: code that is destined to be edited by humans, and code you generate and don’t ever touch”. The implication for the latter case, of course, is that we might as well just generate binary straightaway (or assembler, or something humans would not like to touch). And if you phrase the question just right, all generics questions are indeed questions about the latter case: discussions of generics are all about generating object code that humans will not touch.

I think there is value in the former case as well. At least based on my experience - generating specific code for humans to touch and edit affords the opportunity to introspect and optimize the data structure to better fit the purpose of the application, rather than have a one-size-fits-all thing. Now, I note that that’s not necessarily the goal of everyone. Some people just want to get shit done, and I appreciate that. In fact a majority of career programmers do that - as a data scientist at work, I do that a lot too: I just want shit that works because everyone’s breathing down my neck about the latest machine learning thing, optimizing opportunities be damned.

Given that Gorgonia at this point has evolved into my passion project, I can actually spend some time to optimize and remove unnecessary data structures that were put in place earlier on.

I’ll give some thought to two kinds of code generation we generally see (and both are what I did for Gorgonia), and mainly talk about what I don’t like about them:

1. Templated code - template code, like in C++'s STL, contains logic. And it’s Turing complete. That’s a debugging nightmare right there, ladies and gentlemen. Yes, I am aware that it’s merely a choice to write convoluted logic in your templates - you don’t have to, but other people might. And you may have to read their code.
2. Macro expansion - what I don’t like about macro expansion is macros in macros in macros. It makes reading and understanding code very difficult. In C, you could even define circular macros (which apparently are so ripe for abuse that Section 6.10.3 of the C specs had to spell out how to handle them).

Within genlib and genlib2, I used the text/template package to generate Go code. Thankfully the package doesn’t allow users to do really complicated logic, which is why those logic bits had to be implemented in Go. In the original genlib library, my rule was to make the substitution bit as dumb as possible - that meant a lot of copying and pasting of templates. But when I wrote genlib2 I just went ahead with calling templates within templates… like you would in C macros. I’m not particularly proud of it because it makes my code difficult to read, but I wasn’t really feeling up for writing more copy-and-paste templates at that point.

Type Erasure-style Generics + Specialization

When people say type safety is one of the benefits of having generics, they usually mean it in the sense of type erasure. It’s simply another way to implement generics. I personally am not a fan of the idea. On the surface it may seem that instantiation within a type system, mentioned above, is similar to this, but there are subtle differences. Type instantiation simply means you create another label (and cloned representations, smartly), while type-erasure style generics usually imply that there are magic dispatch tables written behind the scenes.

Let’s imagine a variant of Go with type-erasure style generics. We declare this:

type list<T> struct {
	data T
	next *list
}

func main() {
	a := new(list<int>)
	a.data = 1       // works
	a.data = "hello" // compile error
}


What would happen if Go had Java-style type-erasure generics? Internally, list would be represented like this:

type list struct {
	data interface{}
	next *list
}


During the type checking phase, the compiler will recognize that a is a list of int. It checks that 1 is indeed an int, and allows that statement to pass. Then the type information is erased. At run time, the runtime has no way of knowing that a was a list<int>. Well, there are ways around that, but the simplest implementations (which, if I’m not mistaken, is still what Java uses) simply erase the type information.

When executing methods that require the use of the data field, a type switch is done internally. Another way to work around this is to have type specializations: basically the compiler will create a clone of list, one for each type. Calling its methods then essentially requires a dispatch table.

The problem with dispatch tables and type switches is - Go isn’t very good at optimizing them. If it were, this blog post wouldn’t exist. If runtime.getitab and the various interface switching functions work fast enough and don’t allocate memory on the heap, I wouldn’t have gotten on a 10000 word rant about generics. Adding another layer of dispatch tables would only serve to make things worse, in my opinion.

Additionally there is also the same issue of code bloat. Of course, the workaround is to specialize for types known at compile time and have everything else use a slow path. But that requires smarter program analysis. If the workaround is better program analysis, we might as well go with the first option and implement a proper type system for Go.

There isn’t really much else for me to say about type erasure style generics. I don’t think it is particularly a great solution. Even in C# and Java where generics have been there for yonks, it still feels like a tack-on solution. If Go goes down the path of type-erasure style generics, you can bet that all the solutions will have more tacked-on solutions which will have more tacked-on solutions. The more “wholesome” way to go in my opinion is to gradually introduce a sound type system.

JIT in the Runtime

Instead of having type specializations, the backend could change and a JIT compiler is used to generate code once a hot loop has been found. HAHAHAHAHAHAHAHAHA. No.

This won’t happen. At least not until AppEngine allows for such things. Go has a long standing policy of not having runtime code generation due to use in AppEngine.

Conclusion

This blog post is way too long. It originally started out as an outline of the refactors I did for Gorgonia’s tensor package and how it was designed to enable generic types. I then (over)shared some thoughts regarding generics in Go, and the complications that I went through while implementing the tensor package. Finally I gave some half-assed suggestions before running out of steam, and here we are.

Throughout the blog post I hadn’t really discussed trade-offs; I merely hinted at them. I think a much more in-depth analysis of the trade-offs needs to be done. Please do not take any of my suggestions as statements of fact because, once again, I’m not actually a professional in this - I don’t design languages for a living. To be honest, I’m not even sure if this qualifies as a good Go experience report; if it is, it’s extremely unfocused, and I am more than confident that it restates a lot of things other people have noticed.

Do I think Go needs a system to handle generics? It’s complicated. Outside of Gorgonia, I’ve never really needed to handle generic data types. Or if I did, it would usually be one or two functions where a type switch would suffice. The main gripe I have with the lack of generics in Go is quite simple: interface{} causes far too many allocations, which degrade performance. I miss the good old days before Go 1.4, when word-sized values could be stuffed directly into an interface without allocating. My concern is largely with performance. I have to jump through a lot of hoops to get something relatively performant out of Go, mostly because I don’t have access to the runtime internals. I guess that’s a reason as good as any for having generics - not having to dick around with runtime internals to get performance that should have been there in the first place.

Really, it’s not so much the lack of generics that frustrates me either. To me the most frustrating bit is that Go already has all the ingredients necessary to make a decent language with a sound type system, without many changes to the frontend (for example, I think the go/types library is fantastic). Again, I can definitely understand why this wasn’t implemented. You could implement a “shadow” type system in Go’s compiler right now (which is to say, without changing anything in the frontend), but the program analysis phase would take much longer - I’d assume almost as long as running pointer analysis with the Go oracle. The Go core team has clearly identified compile times as a main concern, and a longer analysis phase would be detrimental. That much I can understand - in fact, I removed typeclasses from hm precisely because constraint solving took too long for my program.

I doubt my experience writing Go programs matches that of the large majority of Go programmers out there. I don’t write a lot of network-related stuff, nor a lot of web stuff. And when I do, I have never needed generics - I tend to be quite conservative with my data structures and write really specific ones (for example, I often write specific structs for specific SQL results). Personally, I find this makes me think about the problem more clearly. I guess you could say this is the blub paradox speaking.

All in all, my conclusion on this matter is: it’s unclear. I almost feel like a lot of my problems stem not from the lack of generics but from elsewhere. Having generics would definitely make a large proportion of the problems Gorgonia solves go away, but thinking deeper about it tells me some solutions are better than others (a proper type system beats anything else), and some are far worse because they only cause more problems down the road (type-erasure style generics). The more in-depth I go, the more I lose sight of the problem. Now everything is a muddy grey.