Sunday, August 23, 2009

Intermission: Erlang production nirvana

Well, almost. The XMPP Web demo project, despite using some of the best tools available today, is still a typical software project, with bugs popping up here and there. Since it went live a couple of days ago, one of the server-side components has kept going down after a few hours of uptime. The Erlang node it runs on suddenly ceases to exist without a sound, and without leaving even a crash dump anywhere. The cause is not yet completely clear, and it is going to take some time to figure out and fix.
This would probably be a show-stopper for any other production system I have ever encountered. Not for a system built with Erlang! While I'm working on the bug, I'd like to keep my demo up and running without having to watch over that failing component, so here's how:
  • The node that runs the component in question is started from the master node using slave:start;
  • Once the node is up, the component application is launched on it using rpc:call;
  • The node is monitored with erlang:monitor_node; if the monitoring process detects that the node has gone down, we repeat the whole sequence.
The code is only a few lines long. It works with a packaged application, but can easily be adapted to the Module:Function form. Note that there is also a stop_node/2 function, in case you want to stop the application without exiting your Erlang shell.
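A minimal sketch of the three steps above might look like this. It is not the original listing: the module name node_keeper, the start_node/3 entry point, the (Host, Name) arguments of stop_node/2, the keeper_ registered-name scheme and the 5-second retry delay are all assumptions made for illustration. It also assumes that the master node is running distributed (for example, started with -sname), that Host and Name are atoms, and that the application is already on the new node's code path. Note that recent OTP releases deprecate the slave module in favour of peer.

```erlang
-module(node_keeper).
-export([start_node/3, stop_node/2]).

%% Spawn a keeper process that starts node Name@Host, launches
%% application App on it, and repeats both whenever the node goes down.
start_node(Host, Name, App) ->
    Keeper = spawn(fun() -> run(Host, Name, App) end),
    true = register(keeper_name(Host, Name), Keeper),
    {ok, Keeper}.

%% Ask the keeper to stop the node for good (no restart afterwards).
stop_node(Host, Name) ->
    case whereis(keeper_name(Host, Name)) of
        undefined -> {error, not_running};
        Keeper -> Keeper ! stop, ok
    end.

run(Host, Name, App) ->
    case slave:start(Host, Name) of
        {ok, Node} ->
            %% Monitor first, so a crash during application start is not missed.
            erlang:monitor_node(Node, true),
            Result = rpc:call(Node, application, start, [App]),
            error_logger:info_msg("start of ~p on ~p: ~p~n", [App, Node, Result]),
            %% If the application failed to start but the node is still up,
            %% this sketch only logs the result and keeps watching the node.
            watch(Host, Name, App, Node);
        {error, Reason} ->
            error_logger:error_msg("cannot start ~p@~p: ~p~n", [Name, Host, Reason]),
            retry_later(Host, Name, App)
    end.

%% Block until the node dies (then start over) or until we are told to stop.
watch(Host, Name, App, Node) ->
    receive
        {nodedown, Node} ->
            run(Host, Name, App);
        stop ->
            erlang:monitor_node(Node, false),
            slave:stop(Node)
    end.

%% Wait a little before retrying, but still honour a stop request.
retry_later(Host, Name, App) ->
    receive
        stop -> ok
    after 5000 ->
        run(Host, Name, App)
    end.

%% Hypothetical naming scheme for the registered keeper process.
keeper_name(Host, Name) ->
    list_to_atom("keeper_" ++ atom_to_list(Name) ++ "@" ++ atom_to_list(Host)).
```

From a master shell this would be used as node_keeper:start_node(Host, worker, my_component) and later node_keeper:stop_node(Host, worker), where worker and my_component are placeholder names and Host is normally the host part of the master's own node name.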
"Normal" crashes (the ones that don't bring the node down) are already taken care of by an Erlang supervisor. So nothing can now stop you from trying the demo at any time of day :-)

Update: I should have mentioned that another option for keeping your node up is the -heart switch of the erl command. The coded solution allows more control, though: for instance, you may want to run some checks before restarting the node, whereas -heart restarts it unconditionally.
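As one example of such a check, the monitoring process could refuse to restart a node that keeps dying in quick succession. The sketch below is only an illustration of that idea: the module name restart_policy, the function allow_restart/4 and the limits passed to it are hypothetical, and times are plain integers in seconds.

```erlang
-module(restart_policy).
-export([allow_restart/4]).

%% Decide whether a restart at time Now is allowed.
%% History is the list of earlier restart times; at most MaxRestarts
%% restarts are permitted within any Window seconds.
allow_restart(Now, History, MaxRestarts, Window) ->
    %% Keep only the restarts that still fall inside the window.
    Recent = [T || T <- History, Now - T < Window],
    case length(Recent) < MaxRestarts of
        true  -> {ok, [Now | Recent]};   % go ahead, and remember this restart
        false -> {stop, Recent}          % too many restarts, give up
    end.
```

The monitoring process would call this when it sees the node go down, restart on {ok, NewHistory}, and give up (or alert someone) on {stop, _}.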


  1. Hi - could I recommend a Javascript based Erlang Syntax Highlighter for your site?

  2. Thanks a lot Jean-Lou, that's a life-saver, I can now stop spending hours on formatting and still having it all wrong. Impressive!