CLOWN - CLuster Of Working Nodes
Christian Kirsch, Florian Lohoff, André von Raison
It had to be a power of two, so the organizers decided on 512 computers connected over 100 MBit Ethernet. Many of the computers and the complete infrastructure (switches and so on) came from sponsors (the box 'Cluster Sponsors' contains a list); private individuals could also take part with their own machines. It was especially exciting that the network was not homogeneous: besides the majority of Intel machines, about 60 Alphas participated as well. The project was named CLOWN: CLuster Of Working Nodes.
A lot of hardware takes up plenty of space, but it does not look very impressive on television. Software was therefore needed that could demonstrate the performance of a cluster. To keep these special programs from interfering with the software already installed, Florian Lohoff first created a special distribution that could be installed as a single file on the hard disk (see 'What ran on the Linux systems').
PVM (Parallel Virtual Machine) coordinated the distribution of tasks among the computers. Apart from the modifications to the distribution itself, two user accounts (pvm34 and povray) are needed to run the applications. The pvm34 account is meant for experimenting with the current developer version 3.4beta7 of PVM; its home directory contains the PVM daemon pvmd3 and the demo client mtile for the distributed calculation of Mandelbrot fractals. The current version of povray (3.02) resides under the povray account. It has been extended with an isosurface patch that is necessary for computing one of the films supplied by FH Konstanz. Here Povray demonstrated its performance while rendering various films.
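As a rough illustration of how such PVM applications are structured, here is a hypothetical master in C that enrols in the virtual machine, spawns workers and farms out work packets, in the spirit of mtile but not the actual demo code; the worker name tile_worker, the tile count and the message tags are all invented.

/* Hypothetical PVM master-worker sketch (not the mtile sources).
 * Build against libpvm3; tile_worker, NTILES and the tags are made up. */
#include <pvm3.h>

#define NWORKERS 4
#define NTILES   64
#define TAG_WORK 1        /* master -> worker: here is your next tile */
#define TAG_DONE 2        /* worker -> master: tile finished */

int main(void)
{
    int tids[NWORKERS], sent = 0, done = 0;

    pvm_mytid();                                   /* enrol in the virtual machine */
    pvm_spawn("tile_worker", NULL, PvmTaskDefault, "", NWORKERS, tids);

    /* give every worker an initial tile number */
    for (int i = 0; i < NWORKERS && sent < NTILES; i++, sent++) {
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&sent, 1, 1);
        pvm_send(tids[i], TAG_WORK);
    }

    /* collect finished tiles and keep the workers busy */
    while (done < NTILES) {
        int bufid = pvm_recv(-1, TAG_DONE);        /* from any worker */
        int bytes, tag, worker, tile;
        pvm_bufinfo(bufid, &bytes, &tag, &worker); /* who sent the message */
        pvm_upkint(&tile, 1, 1);                   /* which tile is finished */
        done++;
        if (sent < NTILES) {                       /* hand out the next one */
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&sent, 1, 1);
            pvm_send(worker, TAG_WORK);
            sent++;
        }
    }
    pvm_exit();                                    /* a real master would also
                                                      tell the workers to quit */
    return 0;
}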
To compute the films in parallel at the frame level, povray is called once for every single picture. A program based on RPC, written by Roger Butenuth (Uni Paderborn), organizes the load distribution. RPC was chosen because of its robustness and its connectionless transport (UDP); with TCP the load distribution server would have to keep 512 connections open.
The program can be used for any other task that follows the master-worker scheme. Its robustness is helped by the fact that it can be restarted after a crash with the information from a state file. The server re-sends any task that a client does not deliver in time to a different computer. An awk script on the server generates the command lines: given the number of a work packet, it only has to reply with the command line for that packet. After the television night the load distributor will be published under the GPL.
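The distributor's source was not yet released at the time of writing, so the following C sketch only illustrates the scheme described above: a worker repeatedly asks the server for a work packet, receives a ready-made command line (generated on the server side by the awk script), runs it and reports completion. The port number, the message format and the use of plain UDP sockets instead of Sun RPC stubs are assumptions made for brevity.

/* Sketch of a worker in the frame-distribution scheme described above.
 * The real distributor uses Sun RPC over UDP; this sketch talks raw UDP,
 * and the port, server address and messages ("GET", "<id> <command>",
 * "DONE <id>", "QUIT") are all invented. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define SERVER_PORT 9000                  /* hypothetical */

int main(int argc, char **argv)
{
    struct sockaddr_in srv = { 0 };
    char buf[1100], cmd[1024], ack[32];
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    srv.sin_family = AF_INET;
    srv.sin_port = htons(SERVER_PORT);
    inet_pton(AF_INET, argc > 1 ? argv[1] : "127.0.0.1", &srv.sin_addr);

    for (;;) {
        /* ask the server for the next work packet (e.g. one povray frame) */
        sendto(s, "GET", 3, 0, (struct sockaddr *)&srv, sizeof srv);
        ssize_t n = recvfrom(s, buf, sizeof buf - 1, 0, NULL, NULL);
        if (n <= 0)
            break;
        buf[n] = '\0';
        if (strncmp(buf, "QUIT", 4) == 0)
            break;

        /* reply format: packet number followed by the complete command line,
           which on the server comes from the awk script */
        int id;
        if (sscanf(buf, "%d %1023[^\n]", &id, cmd) != 2)
            continue;
        if (system(cmd) == 0) {
            int len = snprintf(ack, sizeof ack, "DONE %d", id);
            sendto(s, ack, len, 0, (struct sockaddr *)&srv, sizeof srv);
        }
        /* if the worker crashes or stays silent, the server's timeout
           reassigns the packet to another machine */
    }
    close(s);
    return 0;
}

Because UDP is connectionless, the server only has to remember which packets are outstanding rather than maintain 512 open connections; lost datagrams are covered by the same timeout that handles crashed clients.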
The program Cactus (http://cactus.aei-potsdam.mpg.de), developed by the Albert Einstein Institute in Potsdam, took care of matters on a cosmic scale. It solves the Einstein equations, ten non-linear, coupled hyperbolic-elliptic partial differential equations that describe gravitational waves, black holes, neutron stars et cetera and are among the most complicated equations of mathematical physics. Cactus is modular and runs on several supercomputers and Unix systems. Due to network problems Cactus did not scale as well as expected on the cluster - with 20 nodes, for example, the scaling factor was only 0.78. During the Computer Night Cactus calculated linear gravitational waves on 32 nodes while IDL visualized the results.
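The article does not spell out how the scaling factor is defined; read as parallel efficiency, i.e. the measured speedup divided by the number of nodes, the figure means:

    E(N) = T(1) / (N · T(N)),   so E(20) = 0.78 corresponds to a speedup of roughly 0.78 · 20 ≈ 15.6 instead of the ideal 20.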
Besides getting all the hardware and software together, a few complications arose from the sheer number of machines: the power supply had to be secured, the heat produced by the devices had to be removed, and the network needed an infrastructure with sufficient performance. In the end the Paderborn fire brigade helped control the heat: they laid large plastic pipes that blew cold winter air into the building, keeping the temperature in a tolerable range of 30 to 32 degrees Celsius (see picture).
The cluster event is definitely a candidate for an entry in the Guinness Book of Records: in total, up to 570 nodes were active in the cluster, with up to 520 of them computing together on films or mathematical problems as one virtual Linux supercomputer. The chances of an entry in the Top 500 list of supercomputers are also quite good. After some adaptations the necessary Linpack benchmark ran on an Alpha subcluster with 48 nodes. The results still need to be verified, but they indicate an entry around rank 250. iX will report the details in the next issue. By the way: the Computer Night was the second largest WDR production of the year after the soccer World Cup - a suitable scale for the cluster event ...