I recently visited China (my writeup was in three parts: Part I, Part II, Part III). An incident of particular note was in a taxi in Nanjing. Picture this: The driver is on the left side of the vehicle. On left edge of the windshield, a Samsung Galaxy Note sits on windscreen mount, connected to the cigar lighter on his right. The cigarette lighter also powers another smaller feature phone which sits on top of his dashboard. Next to the air conditioner vent of the front panel, a walkie talkie sits on its cradle.
We were on a fairly long journey (about 20km ish), and the driver was talking to us, trying to upsell us his services for the whole day. We talked about the local sights, the museums and what nots. Then CRRSSSHHH, an incoming message from the walkie talkie – it was something traffic related. The driver pressed the transmit button on the walkie-talkie, acknowledging the message. Then came a different TCHSSHH sound. A woman’s voice came into hearing. She asked about lunch. The driver leaned forwards, picked up the feature phone, and pressed a button and talked into it, explaining that he was with passengers and his general direction. Upon finishing that conversation, he continued our conversation, picking up from where he left off.
This continued to happen throughout the journey – the driver would be switching between different modes of verbal communications – real life, push to talk, walkie talkie and even his mobile phone. The driver was dealing with 4 different networks at the same time (walkie talkie – some kind of trunked system, since most of it were traffic related; the push-to-talk feature phone – which I assume to be some kind of PTT powered by cellular tech; mobile phone – full duplex radio; and talking with the passengers of the car). That sparked an idea.
Here’s a bit more background. I had developed an interest in trunked radio networks and half-duplex communications when my way-more-accomplished-than-me partner was working as an E&E engineer for the telecomms industry* I think she’s more accomplished than I am, given that she’s now working for a certain search engine company while I have tried and failed at least 5 times with that same company, twice within the last 7 months . So I had some good ideas on how CB and trunk radio networks worked.
At the same time I was having a bit of trouble with the VPN the previous night. The solution was simple – I ended up rolling my own VPN on AWS, swapping elastic IPs for the EC2 instance every few hours and updating encryption key everyday. In short, it was a mess.
So the idea was born: what if you could have an ad-hoc (read: P2P) chat network that was private (read: encrypted), and you could juggle different networks at the same time? After a few rounds of refinement of the idea, I started working on the prototype application that night.
 
How It Got Here
Why P2P
The idea that the program had to be P2P was there since the earliest idea. You see, the way the Internet is structured isn’t quite like the way radio waves work. Radio waves work by broadcasting. A message is sent from the source, and anyone who can receive the radio wave will be able to receive the message. Of course there are stuff like encryption, legally distributed frequencies, and similar things in play to ensure that not everyone gets the message.
The Internet doesn’t work like this. The Internet works by having valid connections between two endpoints. Yes, there are connectionless protocols (IP is one such protocol as is UDP), but they don’t work like radio. A radio transmitter can continuously transmitting even if there were no receivers. Heck, pulsars are nature’s own radio transmitters – they’re essentially transmitting to nobody. UDP comes closest to radio, and is high level enough that I am able to quickly read up and roughly understand it.
I could of course simulate a broadcast system. I had written a number of chat applications before (one of my finest was basically a MUD, but with Second-Life-esque twist to it – this was in the late 90s/early 00s). Most of the chat applications I have ever implemented relied on the fact that there is a need for a server and client. A broadcast system would be easy to simulate with a server and client. I didn’t want that, however. It meant that if I were to productize this idea, it’d be a lot of work for me. I was on holiday and didn’t want to deal with that.
The obvious solution was then a P2P system. Indeed, VOIP systems like Skype use a special form of P2P (with supernodes and all that). I thought I could deal with P2P. In fact, the simplest chat systems are almost always P2P. Of course the simple chat examples that float out there on the Internet almost always only worked with two participants. Add any more than that, and complexity grows (or one of the participant becomes a server).
I clearly didn’t want that either. But I wanted the system to be P2P and distributed – meaning no central server. In the early days of my startup, I wrote a Kademlia based deployment system* It was my second time writing a Kademlia based system. The first time I had help from none other than the late aaronsw himself. with Python and 0MQ so that we could deploy it onto AWS or any cloud based system and it would autoconfigure everything by itself. I am sorta familiar with DHTs. So I thought, why not make a DHT-based chat, and to keep the walkie-talkie idea around, turn the idea of client and server on its head?
Why Go
I considered three languages for this project: Python, Go and JavaScript. A day prior to my trip to China, I met with Sylvia and Damon of rtc.io. They are pretty cool people (with technical abilities that far outstrip mine). Normally, writing something like that in JavaScript will not even cross my mind, since you know, I’m pretty vocal about my dislike for JavaScript (and yet, I wrote a book on JavaScript!). But their project was pretty cool, so I thought I’d give it a hack.
Python was considered because it’s the language I like the second most (Go has taken over as my favourite language since a couple of years ago). Also I have been writing Python for close to 10 years already, so I’m exceedingly familiar with Python. Plus I had written a Kademlia-based system twice in Python. That was also a con, since I had implemented it in Python twice, I didn’t want to re-implementing it in Python again.
Ultimately, I chose Go because well, I had all the packages and documentation (thank you godoc). An internet related issue (aka my self-rolled VPN fucked up somewhere along the way) made me unable to access anything from rtc.io, so that was out of the window. I didn’t have some of the required python packages in my laptop (notably 0MQ), so it couldn’t work either.
Why Kademlia?
There were two main reasons for choosing Kademlia: 1. I am familiar with it – and given enough concentration I could write it in a hotel room without Internet access. 2. Broadcast over Kademlia is difficult. When I worked on my Kademlia based deployment system, I tried to make broadcast work. I felt I was close, and I wanted a challenge. After a few years of sitting at the back of my head, I think I figured out how to broadcast over Kademlia.
Broadcast on Kademlia is hard because it’s slow and it’s subject to the classic CAP-theorem-choose-two dilemma. A brute-force approach (also known as flooding) – i.e. going through every single node in the address book and sending a message will also take a long time. Any distributed method of broadcast will almost certainly yield replication. My idea was this: since the address space is partitioned already, instead of sending a message to each remote node individually, send a message to the top two nodes of each bucket (i.e. the closest two nodes to the current node). These two nodes are then individually responsible for sending the message out to their individual buckets. Minimal replication will occur, and is used to signal the end of a process.
Now, that didn’t turn out as expected. Once again, I feel it was close, but other design concerns came forwards and the broadcast-across-Kademlia thing had to take a step back.
There are also other DHT-based systems that could be used. In particular, I considered the use of the Pastry DHT, which would be a good fit for what I have originally had in mind, since broadcasting is a piece of cake* Geddit? Pastry, piece of cake? If I have to explain the joke, it’s no longer funny . However, I wasn’t too familiar with Pastry. I was far more familiar with Kademlia to be able to implement it without internet access.
On Chatrooms
One feature I knew I wanted was multiple channels* Also, am I mad or did Go once have something like channels over networking topology? I somewhat recall this because I was doing a lot of 0MQ stuff then . This was indeed inspired by the taxi driver’s crazy ridiculously awesome way of handling multiple conversations at the same time.
The initial design was simple: Connect to a Kademlia network. When broadcasting, broadcast the channel ID along with the message. If the receiver has the channels on the listen list, then accept the message.
This works pretty much like radio. It also has the same flaws radio has: anyone with access to the channel ID will be able to listen in to communications. So the logical answer is encryption, right? Do advanced key-exchange behind the comms channel, and encrypt all messages. Only people with the correct public keys can decrypt the message and read it. Problem solved, right?
Not really. A typical person has about 30 people that are in constant communication. That means for each message, each client will have to attempt to decrypt 30 times. This would be a massive waste of computing power. My back-of-the-envelop calculations also informed me that it would take minutes for the message to propagate to a large-enough network. And this is with taking the best case scenarios of the fallacies of distributed computing in mind. In fact, the whole simulating-radio broadcast idea was terrible. Clearly not good enough.
I also tinkered around with subnets – the idea is that since the Kademlia network automatically rearranges nodes based on their last-contacted recency, channels would be mostly at the top of the buckets. The Kademlia network can hence be partitioned into subnets, and priority would be placed to transmit over the clique. That idea led somewhere but I didn’t have the time to fully implement it. I ended up being too lost in figuring out graph colouring algorithms for another project I had.
Yet another idea I had was multiple DHTs – a client can handle multiple DHTs. This turns out to be a silly idea. If a chatroom contains 5 participants, and only two are online, a third person would have to know either one of their IPs. The solution to this would be to have some sort of authenticating server, which I didn’t want to have. I wanted it entirely P2P.
As always in computing, the solution can be found in caching (leading to cache invalidation as one of two hard things in computer science. The other is naming things, and off-by-one errors). Instead of broadcasting the message across the entire network, the idea evolved into a chatroom like idea instead of a channel-like idea.
The idea for chatrooms is this: Everyone who’s allowed onto a chatroom is given a public key. Using the FIND_VALUE command of the network, the new node searches for machines within the network that has access to the chatroom. Upon success, the new node will contact the node with the same public key, and some sort of challenge question involving crypto is issued. If the new node passes the challenge, then the node with the public key shares information about the room’s participants to the new node.
This solves the problem of having to broadcast and propagate a message throughout an entire Kademlia network, and probably save a few computing cycles of different machines.
Audio??
I started off with this project with great deal of hope of actually simulating a trunked radio network on the Internet using P2P methods. I also like push-to-talk type of thing (half duplex systems in general) – in one of my Python 0MQ talks (this one is the closest to the one I actually gave), I demonstrated live how to build a push-to-talk system with 0MQ.
Unfortunately I spent more time than expected wrangling with PortAudio on OS X. I’ve never been good at writing multimedia software* heh, I am not that good a programmer either . But I couldn’t quite figure out how to serialize the PortAudio chunks into []byte and get it to be playable at the receiving end. Also, I didn’t have enough resources to delve into the murky world of mp3 and AIFF binary formats. That, and UDP based communications were a hassle. In the end I settled with a text based system.
Maybe one day I’ll actually add the voice thing in.
Overview
So, how does the system work?
- Start client
- Start UDP listener for incoming chat messages
- Bootstrap or connect to a Kademlia network
- Start New Chatroom
- Register on kademlia that the node is on the chatroom (so that other people can find the chatroom)
 
- Request A Chatroom
- Issue FIND_VALUEto Kademlia
- Kademlia replies with a remote node that has the chatroom
- Request permission to remote node for chatroom
- Remote node issues a challenge
- Reply challenge with encrypted digest
- Remote node verifies encrypted message == challenge message
- If verified, remote node sends chatroom information to requester
- Accept chatroom information, send hello message to everyone
 
- Issue 
- Once access is given to a chatroom, messages are sent and received freely
If I can summarize in a picture, this is how it’d look like:
 
The three blue nodes belong to a chatroom. The grey nodes are just part of the network which enables the finding of peers easier. The blue nodes can find one another via the network. Once they have found each other and performed the correct authentication, the picture now looks something like this:
 
The blue lines are communications outside the Kademlia network. They’re direct P2P connections with one another. Messages are sent directly to each other, bypassing the Kademlia network altogether. Note that the blue lines now form a complete subgraph. That’s what chatrooms essentially are: complete subgraphs. The Kademlia network exists merely to assist in building this complete subgraph.
The Project
The project can be found on Github: NanjingTaxi
Caveats
There are a few caveats. Since I’m not that great a programmer, please do not assume that because RSA is used, and key exchanges are done, it’s secure. In fact, you should treat it as far from secure. My knowledge in distributed systems and network programming and P2P stuff isn’t as great as say Henry Robinson, Peter Maymounkov or Beej. I probably made a lot of errors in my reasoning for a lot of things, and even more in my implementations.
I’ve only ever tested this in a local area network. In fact I would even go so far to say that it’d only work if there are no port forwarding or NAT type of thing happening. I have a multi-router set up at home. When my laptop is connected to the NAT’d router, nothing gets through. I’m quite sure there is something that can be done with regards to the greater internet (UDP hole punching, NAT handling and shit like that), but I haven’t yet implemented them. Perhaps one day.
My code isn’t the most beautiful either. Please forgive the ugliness of my code. I left in a lot of hacky stuff, like panic() everywhere, instead of properly handling the error. I’ll fix that one day. I often imagine myself going through my own code as a third person, and making remarks to my own code. If I were my own code reviewer, I’d have thrown this code back to myself to rewrite.
Using this application is cumbersome. I should probably find some way to make it easier. Currently it’s not even a proper CLI. It’s a super hacked up version of a CLI, with preset commands to connect, and introspect.
Afterthoughts and Lessons Learned
I decided to open source this because I couldn’t figure out a way to productize Nanjing Taxi. I mean, builidng a proper UI and feedback loop is probably quite easy. But to properly bring this to market requires a lot more work. I have to juggle quite a lot more things which requires a lot of my time – applying for jobs* Since I apparently suck so hard at trying to raise funds for Pressyo , working on eyemap.io, or even actually updating my books to the latest version.
It took me quite a while to get this onto a new github repo since I pretty much spent the last week and most of this week trying to salvage yet another drive failure (for the 2nd time in a 12 month span). I also had to fix some of the problems I had found.
I should also have probably used the net/rpc package instead of trying to recreate everything on a low level. Although, it was indeed quite fun to figure these things out on a lower level of abstraction than what I am used to.
I relearned a lot of things with regards to basic network programming – things that I had forgotten. I also learned a lot of new stuff. For example, one of the examples I tinkered with was with UDP multicasting. That didn’t work but I learned quite a bit about multicasting. I also finally got around to playing with PortAudio, but alas no result came of it.
Lastly I discovered that it wasn’t until Go 1.2 that newlines in OS X’s terminals were fixed when fmt.Scanf (or anything that dealt with os.Stdin) was used.
Conclusion
This was a little fun project I threw up in a couple of bored nights in Shanghai and Nanjing. I can see a lot of advantages to having a P2P chat system that is somewhat secure. Alas, I have no resources to properly productize this idea, so I’m releasing it to the open source world. It’s not the best code, nor is it the best system, but it’s a good learning point for me.
I hope you have fun with the project too.