same-o-matic



Why Erlang Matters

By Peter Hizalev

Peter is the co-founder of Sameroom and its chief technical officer. This is a re-post of a 2012 paper.

I still vividly remember my first encounter with an SMP computer. This was around 1996 and it was a dual-socket Pentium Pro monster. It blew my mind.

Growing up with the x86 [r]evolution, I was always curious about the inner workings of computer hardware. My first 8088 board was simple enough that I could understand every signal line in it. Now, this Pentium Pro machine before me had two processors and two sets of L1 and L2 caches, with RAM shared between them. And I wondered: how was it possible to keep caches and RAM in sync? How would someone actually write software for this incredible beast?

Fast-forward to 2012. Two things happened: the internet and the power wall.

The internet ignited the renaissance of distributed computing. We often cite operations of a scale unprecedented in pre-internet information systems. But this is actually true for just a handful of companies. What really undid traditional information systems is how quickly the capacity requirements grow once a product hits the hockey stick. This made old systems terribly uneconomical. Seemingly overnight, the scale-up approach of expensive, proprietary systems was replaced by scaling-out generic, consumer-level kits.

The power wall killed the great hope that we can keep scaling single-threaded performance forever. While superscalar improvements were still there, CPU designers finally took the SMP revolution to the streets in the mid-1990s, when we witnessed the emergence of multi-socket PC servers. By the 2010s, multi-core CPUs had become commonplace, all the way from laptops to smartphones.

A seemingly independent development was the idea of connecting multiple computers with a network and no shared memory. It was years in the making: core theoretical concepts behind distributed computing were developed in the 1970s, with continued research and some commercial applications emerging in the 1980s. In the 1990s, the quickly-evolving Ethernet made local computer networks ubiquitous and "client-server architecture" became the buzzword of the day.

In a distributed system, we have two or more processes talking to each other over a network. They do this by sending each other messages.

A message from process A to process B is copied from the memory of process A, travels over the network wire, and lands in the memory of process B.
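As a rough illustration of these copy semantics, here is a minimal TypeScript sketch. The `Mailbox` class, the `send` function, and the `Reading` type are hypothetical names, and the JSON round-trip merely stands in for the trip over the network wire.

```typescript
// Hypothetical model: each "process" owns a private mailbox; nothing is shared.
class Mailbox<T> {
  private queue: T[] = [];

  // Called by the "network" to deliver a copy of a message.
  deliver(message: T): void {
    this.queue.push(message);
  }

  // Called by the owning process; returns undefined when the mailbox is empty.
  receive(): T | undefined {
    return this.queue.shift();
  }
}

// Serializing and parsing stands in for the wire: the receiver gets its own copy.
function send<T>(target: Mailbox<T>, message: T): void {
  const wireData = JSON.stringify(message);
  target.deliver(JSON.parse(wireData) as T);
}

interface Reading {
  sensor: string;
  values: number[];
}

const mailboxB = new Mailbox<Reading>();
const original: Reading = { sensor: "s1", values: [1, 2, 3] };

send(mailboxB, original);            // process A sends
original.values.push(4);             // A keeps mutating its own memory
const received = mailboxB.receive(); // B sees the state as of the send
```

Because B holds a copy, A's later mutation is invisible to B. That is the property the rest of the argument relies on.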

Now, it seems that in an SMP computer we don’t have the problem of copying data between the memories of processes running on different CPUs, since we can share a memory block between threads. Wrong! In an SMP system, when process B reads a memory block written by process A, there is still actual copying going on: from process A’s CPU cache over the interconnect, to process B’s CPU cache.

Cache coherence is the under-the-hood copying mechanism that tries to pretend the copying is not happening, and that memory is really shared.

This copying can get pretty involved if you're running a NUMA system, where a message from process A to B travels over multiple hops in the interconnect, much like a network packet would in a switched data center. The more CPU cores we place on the interconnect, the less it looks like shared memory and the more it looks like separate networked computers. Cache coherence does not scale.

So here we are, with these awesome systems built out of different flavors of processes talking over the network and pretending to share memory. Hardware threads are used when communication is more frequent or the amount of shared data is substantial. Networked processes are used when scaling requirements are less predictable, or fault tolerance is required.

Some applications exhibit embarrassing parallelism, where we can nicely package computation to saturate a given SMP computer and scale out—and speed up—by adding more computers.

Unfortunately, such applications are rare. Most interactive applications can be represented as graphs of interdependent processes waiting for I/O to complete. This makes it really hard to nicely saturate today’s entry-level SMP computer.

Enter another renaissance: virtualization.

With economic efficiency in mind, we would love to co-locate multiple parts of an application on the same physical computer. The caveat is that these applications may have different, or even conflicting, dependencies within an operating system. Virtualization offers full isolation by running many instances of an operating system on a single physical computer. I say renaissance because the utility of virtualization was well known long before today. Mainframes used virtualization extensively for decades. They simply became way too powerful to run anyone’s single application, and it made sense economically to start virtualizing.

Mainframes still exist, but they were largely disrupted by minicomputers in the 1970s and then minicomputers were further disrupted by PC servers in the 1990s.

We can argue that PC servers are becoming too powerful to run distributed systems economically. Much like at the advent of the PC server era, when the economy of scale in personal computers made them superior to the previous generation of computers, today’s smartphones will do the same for the ARM-based servers of tomorrow. At the same time, the speed of local networking will be approaching the speed of CPU interconnect, while the complexity of CPU interconnect will be approaching that of a switched packet network. In the end, this new type of computer will bring about data-centers-on-a-chip or high-density computing, whichever way we look.

What abstractions do we have today to deal with this type of computer? Processes over a TCP/IP network and threads over shared memory seem like two very different ways of dealing with one problem: communication between concurrent workflows. Both are arguably antiquated for this new computer. TCP/IP was designed for unreliable global networks and carries significant overhead for fast local interconnects. Shared memory does not scale to a large number of CPU cores. But all we really want is to send a message from workflow A to workflow B.

Enter Erlang.

Erlang was designed in the 1980s by a group of engineers at Ericsson. They were specifically trying to address the shortcomings of existing languages with respect to handling highly-concurrent telephony applications with extreme reliability requirements. Concurrency meant handling millions of small processes that would occasionally communicate with each other. Reliability meant guarding against hardware failure and, more importantly, against bugs in the program. Erlang designers made a great tradeoff from the outset: immutability and single assignment (very uncommon in conventional programming languages), enforced by the VM.

On top of immutability, the Erlang VM defines lightweight processes that can send and receive messages. This provides a level of isolation that allows much nicer garbage collection: processes can be garbage collected independently, eliminating the need for a stop-the-world model. Further, Erlang encourages minimal error handling in favor of letting processes crash quickly. To address error recovery, Erlang defines the concept of a supervision tree, a hierarchy of watchdog processes whose only responsibility is to restart failed worker processes. Another key built-in construct is the ability to send and receive messages between processes on VMs running on connected computers, and yet another is the ability to hot-load code without stopping the VM or any running process.
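The crash-and-restart idea can be loosely modeled outside Erlang. The TypeScript sketch below is only an analogy under stated assumptions: `Supervisor`, `Task`, and `startFlakyTask` are hypothetical names, the "process" is a plain function, and the supervisor runs in the same thread. It is not how the Erlang VM or its supervisors are implemented.

```typescript
// A task does its work with no defensive error handling: it either finishes or throws.
type Task = () => void;

class Supervisor {
  restarts = 0;
  private maxRestarts: number;

  constructor(maxRestarts: number) {
    this.maxRestarts = maxRestarts;
  }

  // Starts the task; on a crash, starts a fresh instance until it
  // completes or the restart budget is spent. Returns true on success.
  supervise(startTask: () => Task): boolean {
    for (;;) {
      const task = startTask(); // clean state on every (re)start
      try {
        task();
        return true;
      } catch {
        if (this.restarts >= this.maxRestarts) {
          return false; // give up; a parent supervisor would take over here
        }
        this.restarts += 1;
      }
    }
  }
}

// Simulated bug: the task crashes on its first two runs.
let taskAttempts = 0;
const startFlakyTask = (): Task => () => {
  taskAttempts += 1;
  if (taskAttempts < 3) {
    throw new Error("simulated bug");
  }
};

const supervisor = new Supervisor(5);
const taskFinished = supervisor.supervise(startFlakyTask);
```

The task itself contains no recovery logic. All of the recovery policy lives in the supervisor, which mirrors the division of labor described above.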

I would argue that the Erlang execution model is very well suited to run directly on a future data-center-on-a-chip computer without an operating system and virtualization. Immutability removes the need for cache coherence (although signaling inside the VM may need one, or some sort of specialized hardware). The Erlang VM already maps one process scheduler to one hardware thread. Processes are load-balanced to evenly saturate the available CPU cores and maintain cache locality. Messages are precisely what they are—immutable memory blocks shuttled between caches over a switched interconnect. Supervision trees can span a hardware topology and handle both hardware failures and software bugs for great fault tolerance.

It would likely take a very evolved Erlang to run on this computer of the future. It may well be the Erlang VM that hosts other, non-Prolog-based languages on top of it, while enforcing all the important semantics (Elixir is a good candidate). Or, it may be some new software altogether.

Erlang matters today because it demonstrates how these semantics can be elegantly packaged in one language, execution model, and virtual machine.


Type-Safe Flux Architecture Using TypeScript

By @jlarky

At Sameroom, we use TypeScript (a strict superset of JavaScript which adds static typing) as a way to address the tendency of complex user interfaces to become rigid, error-prone, and slow to evolve. We use TypeScript with Flux, which gives us both type safety and loosely-coupled components. In this blog post, I'll share our approach.

The Backstory: Using TypeScript to Control Runaway Code at Kato

It all started when we were working on our company’s first product, Kato. Our product was a Slack-like application (launched eight months before Slack's private beta came out) that we built with some of the best technologies of that time: Knockout.js, jQuery and Bootstrap on the front-end, and Erlang (with Cowboy) and Postgres on the backend. We liked Knockout because of how quickly we could build with it, and how easy it was compared to many other frameworks (I’m looking at you, Angular).

But after working on Kato for over a year and adding more and more developers to the project, the complexity got to be too much. We were adding new features, handling special cases, and spending way too much time chasing bugs and plugging holes. Crashes and infamous errors like “undefined is not a function” or “script error” were starting to crop up in our error collection system. We were losing control over our application and could no longer estimate how much time adding a feature or fixing a bug would take.

The Kato server was written in Erlang and the iOS and Android applications were written in C# (with Xamarin). Erlang—a dynamically-typed language, like JavaScript—was working great for us because the server codebase was relatively small and evolved along well-understood dimensions. But C# made us realize once again (we dabbled in MFC and early .NET for a number of years in past lives) the benefits of a truly object-oriented, strongly-typed language for building complex user interfaces.

That’s how we got started with TypeScript—by hoping to harness some of the sanity we’d found with C# and our mobile apps, which were no less complex than the web frontend, but so much easier to write and maintain.

We used TypeScript to rewrite the core of our webapp: the bits dealing with networking, messages, users, connections, sessions, and the "roster"—our list of users and rooms. We were excited to move away from having observables and callbacks all over the place to unidirectional flow (action -> dispatcher -> store -> view) and type safety, since stores were written as plain TypeScript classes with just two callbacks: dispatcher.register and this.emit.
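To make the unidirectional flow concrete, here is a minimal sketch of action -> dispatcher -> store -> view. It is not the Kato code: `SimpleDispatcher`, `MessageStore`, `NewMessage`, and the string "view" are hypothetical stand-ins, and only the `register`/`emit` pairing is taken from the description above.

```typescript
type ChangeListener = () => void;

// Dispatcher: fans every action out to all registered callbacks.
class SimpleDispatcher<A> {
  private callbacks: Array<(action: A) => void> = [];

  register(callback: (action: A) => void): void {
    this.callbacks.push(callback);
  }

  dispatch(action: A): void {
    this.callbacks.forEach((callback) => callback(action));
  }
}

interface NewMessage {
  text: string;
}

// Store: a plain class that listens to the dispatcher and emits change events.
class MessageStore {
  private messages: string[] = [];
  private listeners: ChangeListener[] = [];

  constructor(dispatcher: SimpleDispatcher<NewMessage>) {
    dispatcher.register((action) => {
      this.messages.push(action.text);
      this.emit();
    });
  }

  subscribe(listener: ChangeListener): void {
    this.listeners.push(listener);
  }

  getMessages(): string[] {
    return this.messages.slice(); // hand out a copy so views cannot mutate state
  }

  private emit(): void {
    this.listeners.forEach((listener) => listener());
  }
}

// "View": re-renders from the store whenever the store emits a change.
const chatDispatcher = new SimpleDispatcher<NewMessage>();
const messageStore = new MessageStore(chatDispatcher);
let renderedView = "";
messageStore.subscribe(() => {
  renderedView = messageStore.getMessages().join(", ");
});

chatDispatcher.dispatch({ text: "hello" });
chatDispatcher.dispatch({ text: "world" });
```

Data only ever moves one way: the view never writes to the store directly, it can only cause another action to be dispatched.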

The rest of the Kato UI remained in JavaScript and Knockout forever—Slack kicked our ass and we pivoted to Sameroom. But we’d already gotten our taste of the future, and it was good.

Looking at some of our earlier TypeScript code, it’s obvious that we were making things unnecessarily difficult for ourselves. Extra type specifications everywhere, full ceremonious declarations—it was more like Java than JavaScript. It was for a noble cause, though: we were doing it to regain control over runaway code.

When I examined how others used TypeScript, I noticed two distinct approaches. Some used it very loosely, to the point where it wouldn’t even catch typos (essentially using it as a poor man’s Babel to get some ES6/ES7 features), while others were overdoing it, just like us. Eventually we found an approach to using TypeScript that strikes a balance between precision and comfort (both in writing and reading).

The Solution: TypeScript and Flux

As a starting point, I chose es6-babel-react-flux-karma from the TypeScript Samples repo. If you are using TypeScript and Flux, you probably have something very similar, and I believe it follows the JavaScript implementation of Facebook’s Flux pretty well (maybe even too well!).

This code provides almost no type safety. As you can see, most of the GreetingStore code uses the implicit any. For example, here state is being initialized:

And now let’s change the code as shown below to see what happens:

Even with this change, it will compile without any warnings. To be fair, the unit tests will fail:

[Image: failing unit test output]

I don’t know about you, but if I was cool with writing tests for this stuff, I'd just continue with JavaScript :)

So let’s start with changing noImplicitAny to true. Now we see the error we were looking for:

[Image: compiler error reported with noImplicitAny enabled]
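To illustrate the kind of mistake at stake, here is a minimal sketch rather than the sample's actual code. `LooseGreetingStore`, `TypedGreetingStore`, and `GreetingState` are hypothetical names. The `any` annotation is spelled out so the sketch compiles under any compiler settings; with noImplicitAny turned off, an unannotated value ends up behaving the same way.

```typescript
// State typed as `any`: the compiler cannot check property names.
class LooseGreetingStore {
  private state: any = { greetings: [] };

  addGreeting(text: string): number {
    // Typo ("greeting" instead of "greetings") compiles fine and fails only at runtime.
    this.state.greeting.push(text);
    return this.state.greeting.length;
  }
}

interface GreetingState {
  greetings: string[];
}

// State with a declared shape: the same typo would be rejected at compile time.
class TypedGreetingStore {
  private state: GreetingState = { greetings: [] };

  addGreeting(text: string): number {
    // this.state.greeting.push(text); // compile-time error: no such property
    this.state.greetings.push(text);
    return this.state.greetings.length;
  }
}
```

With the loose store, the only safety net is a test that happens to exercise `addGreeting`. With the typed one, the mistake never gets as far as a test run.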

If you look at the change we made in the store, we didn’t need to change much. It was actually FluxStore.ts that was implemented lazily and was losing type information. If you examine this commit, you will notice that we made two things explicitly any:

[Images: the two explicit any annotations in FluxStore.ts]

This makes sense, because our dispatcher can send any data as the action. But that’s not actually true. Conventionally, we’d use the type property of an object to handle actions in the store, but right now we don’t enforce any checks in either the dispatch method or _onDispatch.

We can introduce the Event type defined as {type: string}.
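A minimal sketch of what such a constraint might look like follows. The type is named `ActionEvent` here (an assumption) so that it does not collide with the DOM's built-in `Event`, and `TypedDispatcher` is a hypothetical stand-in for the sample's dispatcher.

```typescript
// Every action must carry at least a string `type`.
interface ActionEvent {
  type: string;
}

class TypedDispatcher {
  private callbacks: Array<(action: ActionEvent) => void> = [];

  register(callback: (action: ActionEvent) => void): void {
    this.callbacks.push(callback);
  }

  // Only objects with a string `type` are accepted here.
  dispatch(action: ActionEvent): void {
    this.callbacks.forEach((callback) => callback(action));
  }
}

const seenTypes: string[] = [];
const typedDispatcher = new TypedDispatcher();
typedDispatcher.register((action) => {
  seenTypes.push(action.type);
});

typedDispatcher.dispatch({ type: "ADD_GREETING" });
// typedDispatcher.dispatch({ name: "ADD_GREETING" }); // rejected: `type` is missing
```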

This will ensure that we are sending type in dispatch, but checking for this in _onDispatch is still far from ideal. To put even more restrictions on the action object, let’s use {type: string; payload: any}.
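As a sketch of this stricter shape, assuming a hypothetical `PayloadEvent` type, a `BaseStore` base class, and a `CounterStore` example (none of which are from the sample repo): an abstract `onDispatch` forces every store to implement the handler, while `payload` remains untyped.

```typescript
interface PayloadEvent {
  type: string;
  payload: any;
}

abstract class BaseStore {
  // Subclasses must implement this; leaving it out is a compile-time error.
  protected abstract onDispatch(action: PayloadEvent): void;

  // Entry point that a dispatcher would call.
  handle(action: PayloadEvent): void {
    this.onDispatch(action);
  }
}

class CounterStore extends BaseStore {
  count = 0;

  protected onDispatch(action: PayloadEvent): void {
    switch (action.type) {
      case "INCREMENT":
        // payload is `any`, so a string here would compile just as happily.
        this.count += action.payload;
        break;
      default:
        break; // unknown action types are ignored
    }
  }
}

const counterStore = new CounterStore();
counterStore.handle({ type: "INCREMENT", payload: 2 });
counterStore.handle({ type: "UNKNOWN", payload: null });
```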

Now we’ve ensured that we always check the type property and that we don’t forget to implement onDispatch in our stores. But is there a way to somehow annotate types for payload? Here is a simplified version of what we used in Kato:
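As a rough sketch of that pattern (a reconstruction under assumptions, not the Kato code itself; the payload interfaces, action names, and `reduceGreetings` are all hypothetical):

```typescript
interface AddGreetingPayload {
  text: string;
}

interface RemoveGreetingPayload {
  index: number;
}

interface KatoStyleEvent {
  type: string;
  payload: any;
}

const ADD_GREETING = "ADD_GREETING";
const REMOVE_GREETING = "REMOVE_GREETING";

function reduceGreetings(greetings: string[], action: KatoStyleEvent): string[] {
  switch (action.type) {
    case ADD_GREETING:
      // The cast is a promise to the compiler that nothing verifies.
      // All cases share one scope, hence the numbered variable names.
      const payload1 = action.payload as AddGreetingPayload;
      return greetings.concat(payload1.text);
    case REMOVE_GREETING:
      const payload2 = action.payload as RemoveGreetingPayload;
      return greetings.filter((_, i) => i !== payload2.index);
    default:
      return greetings;
  }
}

const afterAdd = reduceGreetings([], { type: ADD_GREETING, payload: { text: "hi" } });
const afterRemove = reduceGreetings(["a", "b"], { type: REMOVE_GREETING, payload: { index: 0 } });
```

The pairing between each string constant and its payload interface exists only in the developer's head. Mismatching them compiles without complaint.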

While this code may not be very inspiring, it seems to be the best we can do while implementing Flux as recommended by Facebook. The generated JavaScript will be as idiomatic and performant as Facebook advertised it to be.

Step into Type-Safe Flux

One thing stressed in discussions about Flux performance is that since actions are simple, data-only objects, and since action.type is just a string you compare, dispatching actions should be very fast. But as you have seen above, string checks make the compiler helpless in determining object types. You have a choice between not checking types at all (by using any) or manually keeping track of all events and names.

When we started work on Sameroom and the codebase was still very small, we had some flexibility in testing new approaches.

One was to get rid of all type declarations in actions and stores and remove the type attribute in action events. No, I’m not talking about reverting back to any, I’m talking about instanceof. My understanding is that while instanceof may be a few times slower, its type safety is far greater. That brings us to our last commit and our onDispatch in GreetingStore is:
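A minimal sketch of an instanceof-based handler, under stated assumptions: the action classes `AddGreeting` and `RemoveGreeting` and the store name `InstanceofGreetingStore` are hypothetical, and this is not the code from the commit.

```typescript
// Actions are classes; the class itself replaces the string `type` tag.
class AddGreeting {
  readonly text: string;
  constructor(text: string) {
    this.text = text;
  }
}

class RemoveGreeting {
  readonly index: number;
  constructor(index: number) {
    this.index = index;
  }
}

class InstanceofGreetingStore {
  private greetings: string[] = [];

  onDispatch(action: object): void {
    if (action instanceof AddGreeting) {
      const payload = action; // narrowed to AddGreeting by the compiler
      this.greetings.push(payload.text);
    } else if (action instanceof RemoveGreeting) {
      const payload = action; // narrowed to RemoveGreeting; its own block scope
      this.greetings.splice(payload.index, 1);
    }
    // Any other object is ignored.
  }

  getGreetings(): string[] {
    return this.greetings.slice();
  }
}

const instanceofStore = new InstanceofGreetingStore();
instanceofStore.onDispatch(new AddGreeting("hi"));
instanceofStore.onDispatch(new AddGreeting("yo"));
instanceofStore.onDispatch(new RemoveGreeting(0));
```

No casts are needed: inside each branch, accessing a property that the action class does not declare is a compile-time error.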

In this approach, TypeScript has the precise type information for action and payload. Also, since we moved to using if in favor of switch, we now get separate scopes for each branch (by using let or const instead of var). This means that the types and values of payload defined in each scope are separate from each other, while in the previous version we had to manage this ourselves (payload1/payload2).

Conclusion

Adding type information to actions is just the first step. Next, we'll want to keep all this information inside the stores, and then connect those stores to React components.

I’m fairly certain that the use of instanceof may be seen as controversial, and I’m a bit afraid of stepping on the toes of such giants as Immutable, Redux, Om, and Relay, and their apostles.

However, I will try to cover those topics in my future articles, where I will try to use the full might of TypeScript, and avoid the creation of Java all over again ;)
