“I like small languages,” said a friend of mine.
“Yeah, me too. Wait. What do you mean by small languages?” I replied.
That led to a series of discussions about what a small language is. We eventually enumerated a list of languages we both knew and could classify, writing the ones we mutually agreed were small in a small font and the ones we mutually agreed were large in a large font.
So what exactly is a “small language”, and why do people clamour for it? One metric discussed is the number of reserved keywords a language has. We quickly (and I must add, roughly) looked up and tabulated the results:
| Language | Reserved Words Count | Sources |
|----------|----------------------|---------|
| Scheme | 20 (R4RS); 0 (R5RS) | R4RS (PDF); R5RS |
| Lua | 21 | Lua Reference Manual (§2.1) |
| Python | 31 (2.7.8); 33 (3.4) | Python 2.7.8 reference; Python 3.4 reference (§2.3.1) |
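Incidentally, the Python figures in the table are easy to verify: the `keyword` module exposes the interpreter's own reserved-word list, so you can check the count on whatever version you are running (the exact number varies by version, as the table shows).

```python
# The keyword module exposes the interpreter's reserved-word list,
# so the count in the table can be checked directly. The number
# varies by version: 31 in 2.7, 33 in 3.4, more in recent releases.
import keyword

print(len(keyword.kwlist))
print(keyword.kwlist[:5])
```

The other counts in the table were tallied by hand from the linked specifications.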
A big problem with keyword count is that it’s not really representative of the language as a whole, though it is indicative of what we’re looking for. C# and Java have a large number of keywords (the words “enterprise” and “bureaucracy” also come to mind when these two languages are mentioned), and both are often thought of as large languages.
The use of keyword counts also shed some light on the question of “what is a small language?”. For such a short phrase, it’s surprisingly ambiguous. Small in terms of its built-in functions? Small in terms of the standard library that ships with the language? Neither seems to be what people most often mean when they talk about small languages. A small grammar, though, could conceivably be it.
So I thought to myself: why not compare grammars?
Fortunately, most programming languages have a well-defined grammar that can usually be expressed in (Extended) Backus-Naur Form. And fortunately too, most language specifications do in fact specify the grammar of the language in some vaguely EBNF-like form.
So the solution is to count the number of production rules in the official language specifications. I’m going to ignore C# (if there are 79 keywords, imagine how long the EBNF is going to be; here’s where to find the C# 4.0 EBNF) and Java (the Java 8 specification has 17 pages’ worth of EBNFs).
Some grammar specifications were embedded in long web pages, from which I extracted them. Most of these grammars are not proper EBNF, but they are close enough in that they follow the pattern: Production Rule [separator] Nonterminals (some specs mix terminals in as well). I also transformed each grammar so that one line contained exactly one production rule, which reduces counting production rules to counting the lines in a file.
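Once each grammar is normalised to one rule per line, the counting step is trivial. A minimal sketch (the `::=` separator is an assumption here; each spec uses its own notation, so you’d pass in whatever separator that spec uses):

```python
def count_rules(path, separator="::="):
    # After normalisation there is one production rule per line, so
    # counting rules is just counting the lines that contain the rule
    # separator (which also skips blank lines and stray comments).
    with open(path) as f:
        return sum(1 for line in f if separator in line)
```

For specs that use `:` or `=` as the separator, the same function works with a different `separator` argument.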
Here’s what the languages I considered looked like:
| Language | Production Rules | Source |
|----------|------------------|--------|
| Lisp* that you built in a day | < 10 | |
| Lua | 22 | Lua 5.1 Reference Manual |
| Python | 82 | Python 3.4.1 Complete Grammar |
| Go | 153 | Go specification; Extraction Script |
| Haskell | 78 | Haskell 2010 report |
So there you have it: a comparison of grammars based on how many rules there are. These are the “lightweight” languages. Needless to say, C# and Java will have a lot more rules.
As always (most of my blog posts have one of these), there is a caveat: for each language there can be multiple EBNFs, depending on how you want to specify the grammar. The EBNFs I used are mostly the official grammars from the language specifications themselves (some, like C’s, are printed on dead-tree paper, and I’m not going to type that out, so I took a shortcut and used someone else’s).
Perhaps another way to think of “small languages” is in terms of what kind of parser could be built for them. If an LL(1) parser can be built for the grammar, that implies it’s a simple grammar, and that means it’s a small language (yes, I am aware that I am conflating simplicity with size). By that measure, of the list above, only Python could be parsed that way (and even then only with a lot of crutches).
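To make “LL(1)” concrete: a grammar is LL(1) when a top-down parser can always decide which rule to apply by peeking at a single token. A toy recursive-descent sketch for the made-up grammar `expr ::= NUM ('+' NUM)*`:

```python
def parse_expr(tokens):
    """LL(1) recursive-descent parser for: expr ::= NUM ('+' NUM)*
    One token of lookahead (peek) is always enough to decide what to do.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect_num():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected number, got {tok!r}")
        pos += 1
        return int(tok)

    value = expect_num()
    while peek() == "+":   # one-token lookahead decides: continue or stop
        pos += 1
        value += expect_num()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return value
```

Here `parse_expr(["1", "+", "2", "+", "3"])` never has to backtrack; grammars that force the parser to look further ahead, or to guess and undo, are the ones that need crutches.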
Small Languages, A Conclusion
I can definitely see the appeal of small languages. A small grammar implies a smaller cognitive load, which leaves more headroom for the structure of the program you’re actually writing. Of course, small languages can be complex too: I often had to look up where to put the `*` during the course of whatever little C programming I have done (I can’t remember the spiral rule for the life of me).
I think this also highlights how important the stdlib of a language is. Of these languages, I love and use Go and Python the most, simply because they have fantastic stdlibs. A small core with a great stdlib: that’s what I look for in a language.
But surely this is not the only appeal of small languages? Tell me why you like small languages below.