This may come as a surprise to many people, but I do a large portion of my data science work in Go. I recently gave a talk on why I use Go for data science. The slides are here, and I’d also like to expand on a few more things after the jump:
[Read More]

Data Empathy, Data Sympathy
Today’s blog post will be a little on the light side as I explore the various things that come up in my experience working as a data scientist.
I’d like to think I have a fairly solid understanding of statistics (I think it’s accurate to say that I’m slightly above average in statistical understanding compared to the rest of the population). A very large part of my work can be classified as stakeholder management, which means interacting with other people who may not have as strong a statistical foundation as I do. I’m not very good at it, in the sense that people often think I’m being hostile when in fact all I’m doing is questioning assumptions (I get the feeling people don’t like it, but you can’t get around questioning assumptions).
Since the early days of my work, there’s been a feeling I’ve not been able to put into words when dealing with stakeholders. I think I finally have the words to express it. Specifically, it was the transfer of tacit knowledge that bugged me quite a bit.
Consider an example where the stakeholder is someone who has been in the field for quite some time. They don’t necessarily have the statistical know-how when it comes to dealing with data, much less the rigour that comes with statistical thinking. More often than not, decisions are driven by gut feel based on what the data tells them. I call these sorts of processes data-inspired (as opposed to data-driven decision making).
These gut feelings about the data can be right or wrong, and the stakeholders learn from them; the lessons accumulate into experiential knowledge - or what economists call tacit knowledge.
The bulk of the work is, of course, transitioning an organization from being data-inspired to being actually data-driven.
[Read More]

Sapir-Whorf on Programming Languages
Or: How I Got Blindsided By Syntax
Tensor Refactor: A Go Experience Report
May Contain Some Thoughts on Generics in Go
There have been major refactors done to the tensor subpackage in Gorgonia - a Go library for deep learning purposes (think of it as TensorFlow or PyTorch for Golang). It’s part of a list of fairly major changes to the library as it matures. While I’ve used it for a number of production-ready projects, an informal survey found that the library was still a little difficult to use (plus, it’s not used by any famous papers, so people are generally more resistant to learning it than, say, TensorFlow or PyTorch).
Along the way in the process of refactoring this library, there was plenty of hairy stuff (like creating channels of negative length), and I learned a lot more than I needed to about building generic data structures.
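For the curious, here is a minimal sketch of the kind of bug hinted at above: in Go, constructing a buffered channel with a negative capacity panics at runtime whenever the size is not a compile-time constant. The variable name and the recovery here are purely illustrative.

```go
package main

import "fmt"

func main() {
	defer func() {
		if r := recover(); r != nil {
			// The exact message may vary by Go version; it is along
			// the lines of "makechan: size out of range".
			fmt.Println("recovered from:", r)
		}
	}()

	n := -1                     // imagine a buffer size derived from a miscalculated shape
	ch := make(chan float64, n) // panics at runtime: negative buffer size
	_ = ch
}
```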
[Read More]

Garbage Collection is Also a Side Effect
21 Bits Ought to Be Enough for (Everyday) English
Or, Bitpacking Shenanigans
I was doing some additional work on lingo, preparing it to be moved to go-nlp. One of the things I was working on improving is the corpus package. The goal of package corpus is to provide a mapping of word to ID and vice versa. Along the way, package lingo also exposes a Corpus interface, as there may be other data structures which exhibit corpus-like behaviour (things like word embeddings come to mind).
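To make the word-to-ID idea concrete, here is a minimal sketch of what a corpus-like type might look like. The interface name, method names, and types below are illustrative assumptions, not the actual lingo/go-nlp API, which has more methods than this.

```go
package main

import "fmt"

// WordIDer is an illustrative stand-in for a corpus-like interface:
// anything that can map words to IDs and back.
type WordIDer interface {
	Id(word string) (id int, ok bool)
	Word(id int) (word string, ok bool)
}

// mapCorpus is the simplest possible implementation: a slice and a map.
type mapCorpus struct {
	words []string
	ids   map[string]int
}

func newMapCorpus() *mapCorpus { return &mapCorpus{ids: make(map[string]int)} }

// Add inserts a word if it is new and returns its ID either way.
func (c *mapCorpus) Add(word string) int {
	if id, ok := c.ids[word]; ok {
		return id
	}
	id := len(c.words)
	c.words = append(c.words, word)
	c.ids[word] = id
	return id
}

func (c *mapCorpus) Id(word string) (int, bool) {
	id, ok := c.ids[word]
	return id, ok
}

func (c *mapCorpus) Word(id int) (string, bool) {
	if id < 0 || id >= len(c.words) {
		return "", false
	}
	return c.words[id], true
}

func main() {
	c := newMapCorpus()
	for _, w := range []string{"the", "quick", "brown", "fox"} {
		c.Add(w)
	}
	var wi WordIDer = c
	id, _ := wi.Id("quick")
	w, _ := wi.Word(id)
	fmt.Println(id, w) // 1 quick
}
```

A word-embedding type could satisfy the same interface, which is roughly why having an interface alongside the concrete package is useful.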
When optimizing and possibly refactoring the package(s), I like to take stock of all the things the corpus package and the Corpus interface are useful for, as this clearly affects some of the type signatures. This practice usually involves me reaching back into the annals of history, looking at the programs and utilities I had written, and consolidating them.
One of the things that I would eventually have a use for again is n-grams. The current n-gram data structure that I have in my projects is not very nice and I wish to change that.
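As a taste of the bitpacking in the title: if word IDs fit in 21 bits (about 2.1 million distinct IDs, more than enough for everyday English vocabulary), then a trigram of IDs fits in a single uint64. The sketch below shows the idea; it is not necessarily the data structure the post ends up with.

```go
package main

import "fmt"

const idBits = 21                // 3 × 21 = 63 bits, which fits in a uint64
const idMask = (1 << idBits) - 1 // mask for the lower 21 bits

// packTrigram packs three word IDs (each assumed to be < 2^21) into one uint64.
func packTrigram(a, b, c uint32) uint64 {
	return uint64(a&idMask)<<(2*idBits) |
		uint64(b&idMask)<<idBits |
		uint64(c&idMask)
}

// unpackTrigram reverses packTrigram.
func unpackTrigram(t uint64) (a, b, c uint32) {
	a = uint32(t>>(2*idBits)) & idMask
	b = uint32(t>>idBits) & idMask
	c = uint32(t) & idMask
	return a, b, c
}

func main() {
	t := packTrigram(42, 1337, 2019)
	fmt.Println(unpackTrigram(t)) // 42 1337 2019
}
```

Packing n-grams this way means a trigram can be used directly as a map key or sorted as a plain integer, which is much nicer than a struct of three IDs or a concatenated string.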
[Read More]

The Double Blind Monty Hall Problem
The picture above is of my lunch today: three muffins baked with MyProtein’s muffin mix. Two of them contain raisins, and one of them contains chocolate chips. I had forgotten which is which. I personally prefer raisins, as the chocolate chips had sunk to the bottom of the pan during baking, making a gooey mess that sticks to the muffin papers. An initial thought I had concerned the probability of picking a raisin muffin after I had already eaten one. Naturally, in scenarios where there are three unknowns and one is revealed, my thoughts get pulled towards the Monty Hall problem.
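Without spoiling the post, one way to sanity-check a “blind” Monty Hall variant is by simulation: the reveal is random rather than informed, so trials in which the prize itself gets revealed are discarded. This is my reading of “double blind”, not necessarily the exact setup the post analyses; the door labels and trial count below are arbitrary.

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	const trials = 1_000_000
	var kept, switchWins, stayWins int

	for i := 0; i < trials; i++ {
		prize := rand.Intn(3) // where the preferred outcome is
		pick := rand.Intn(3)  // our initial choice

		// A blind reveal: one of the two remaining doors, chosen at random.
		var remaining []int
		for d := 0; d < 3; d++ {
			if d != pick {
				remaining = append(remaining, d)
			}
		}
		reveal := remaining[rand.Intn(2)]
		if reveal == prize {
			continue // the prize was accidentally revealed; discard this trial
		}
		kept++

		// The door we could switch to is neither picked nor revealed (labels sum to 3).
		other := 3 - pick - reveal
		if other == prize {
			switchWins++
		}
		if pick == prize {
			stayWins++
		}
	}
	fmt.Printf("switch: %.3f, stay: %.3f (of %d kept trials)\n",
		float64(switchWins)/float64(kept),
		float64(stayWins)/float64(kept), kept)
}
```

With a blind revealer, conditioning on the trials that survive pushes both strategies towards 1/2, unlike the classic 2/3-vs-1/3 split when the host knows where the prize is.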
[Read More]

The Handmaid's Tale
Or, How to Successfully Oppress Women
Hulu just released three episodes of The Handmaid’s Tale, an adaptation of Margaret Atwood’s book of the same name. I binge-watched them over the weekend, and I had some difficulty being immersed in it. There was something about the world that didn’t sit quite right with me in this adaptation, but I couldn’t quite put a finger on it. Eventually, of course, I figured it out.
The story is set after the fall of the United States into Gilead. Not much is known about the world at this point, and much is still up for speculation. However, we the audience are privy to these details:
- Pollution and toxicity are so bad that fertility rates have dropped.
- Congress was slaughtered, martial law was enacted.
- A theocracy hijacked the government under martial law.
- Communications between the populace and the government are reduced - it would appear that the Internet doesn’t exist.
- Swift changes were made to the laws of the land, leading up to the scenarios we see in the show.
So what was it that made me unable to pay attention to the world building of the show? The speed at which things fell into place - it all happened within a few years of Offred’s lifetime.
[Read More]

Deep Learning from Scratch in Go
Part 1: Equations Are Graphs
Summary: In this post I talk about the basics of computation, then go on to show the parallels between Go programs and mathematical equations. Lastly, I will attempt to persuade you that there is a point to using Go for deep learning.
Welcome to the first part of many about writing deep learning algorithms in Go. The goal of this series is to go from having no knowledge at all to implementing some of the latest developments in this area.
Deep learning is not new. In fact, the idea of deep learning was spawned in the early 1980s. What’s changed since then is our computers - they have gotten much, much more powerful. In this blog post we’ll start with something familiar and edge towards building a conceptual model of deep learning. We won’t define deep learning for the first few posts, so don’t worry so much about the term.
There are a few terms of clarification to be made before we begin proper. In this series, the word “graph” refers to the concept of graph as used in graph theory. For the other kind of “graph” which is usually used for data visualization, I’ll use the term “chart”.
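To preview where the series is headed, here is a tiny sketch of an equation represented as a graph in Go: nodes are either values or operations, and evaluating the expression is a walk over that graph. This is deliberately simplified and is not the Gorgonia API; the node layout and operator set are my own for illustration.

```go
package main

import "fmt"

// node is one vertex in an expression graph: either a constant value
// (op == "") or an operation applied to child nodes.
type node struct {
	op       string  // "" for leaves, "+" or "*" for operations
	value    float64 // used when op == ""
	children []*node
}

// eval walks the graph recursively and computes the value of the expression.
func eval(n *node) float64 {
	switch n.op {
	case "":
		return n.value
	case "+":
		return eval(n.children[0]) + eval(n.children[1])
	case "*":
		return eval(n.children[0]) * eval(n.children[1])
	}
	panic("unknown op " + n.op)
}

func main() {
	// The equation (x + y) * z as a graph, with x=1, y=2, z=3.
	x := &node{value: 1}
	y := &node{value: 2}
	z := &node{value: 3}
	sum := &node{op: "+", children: []*node{x, y}}
	prod := &node{op: "*", children: []*node{sum, z}}
	fmt.Println(eval(prod)) // 9
}
```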
[Read More]