Skittles

An iMac Beowulf cluster running Yellow Dog Linux





They said it couldn't be done. Well, actually they said, "Why would you want to?" The answer is, of course, "Because they're there, and it's the geek thing to do."

The idea of skittles was first planted when the sysadmins of the Space Studies Department retired a group of generation 1 (or 'Revision A', as Apple calls them) iMacs. When I heard of this, I half-jokingly told them they should build a Beowulf cluster. They agreed, and informed me that the old computers would run a variant of Linux known as Yellow Dog. We also found an old Synoptics 10 megabit ethernet hub, which would hook all the iMacs together. The only sticking point was that the iMacs each had only one ethernet adapter, and the head node of a Beowulf usually needs two.

Time went by, one sysadmin graduated, and the other got a job offer in another city. Before he left, I figured we should make the effort to actually get skittles built and running. The dual-ethernet-for-the-head-node problem was solved by borrowing a D-Link NAT router and configuring it to forward incoming ssh connections to the head node. The iMacs had only the one built-in adapter, and adding more would have been a problem. We could have bought a USB network adapter, but that would have violated the spirit of skittles, which was to use only old, borrowed, or obsolete hardware and thus not spend any money. Besides, the NAT router would also serve as the firewall protecting the cluster, so the head node would not need to run a firewall itself.
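
For the curious, the forwarding the router does is conceptually the same as a Linux destination-NAT rule like the one below. This is purely illustrative: the real configuration lives in the D-Link's web interface, and the address used here is the head node's private IP from the cluster's /etc/hosts shown further down.
iptables -t nat -A PREROUTING -p tcp --dport 22 -j DNAT --to-destination 10.0.0.1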



After discovering that Yellow Dog Linux version 4 didn't recognize the iMacs' ethernet adapters, I installed version 3 on all the nodes. I had to keep the installs fairly lean since the iMacs had limited hard drive space (around 6 GB). The head node got compilers, editors, and a few other services, while the slave nodes were stripped down to little more than what they needed to run jobs. None got X Windows or any other memory and CPU hogs.

The cluster nodes were assigned RFC 1918 IP addresses and attached to the Synoptics hub. The D-Link router was assigned the outside IP address, and the ssh port was forwarded to the head node. Each hard drive was divvied up into an Apple bootstrap partition (required to boot the iMac), a /boot partition, and the rest in a single filesystem mounted as /. I set up NFS to export the /usr/home directory on the head node and mounted it on the slaves. I activated the NTP server on the head node and pointed the NTP clients on the slaves at it to keep all the clocks in sync; the head node's NTP daemon was in turn synchronized to a local stratum 2 NTP server. I also generated an ssh keypair and copied the public key into the authorized_keys file to allow remote job execution without a password.
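
In rough outline, the head-node side of that setup boils down to a few lines. Here is a sketch; the exact export options and key type are approximations rather than a record of what's on the machines:
# /etc/exports on purple: share home directories with the slaves
/usr/home  10.0.0.0/255.255.255.0(rw,sync)
# passwordless ssh for remote job startup; because /usr/home is NFS-mounted
# on the slaves, the same authorized_keys file ends up visible on every node
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# one-shot clock sync on a slave before starting its ntpd
ntpdate 10.0.0.1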

The message-passing software I installed was MPICH 1.2.6. I chose this version since I had more experience installing and configuring it than version 2. I adjusted the machines.LINUX file in the MPICH share directory to reflect the node names of the cluster machines (a sketch of that file appears after the results below), and I exported the mpich directory from the head to the slave nodes via NFS. After some fits and starts, I finally got the slave nodes to run jobs by including the -nolocal flag on mpirun; this is the same thing I had to do with Miniwulf, my first Beowulf cluster. I then ran the modified Pi calculation benchmark I've used on other clusters. Here are the results:

Test program results on April 18, 2005:

[purple]$ mpirun -np 1 -nolocal ./flop
Process 0 of 1 on purple
pi is approximately 3.1415926535899708, Error is 0.0000000000001776
wall clock time = 247.101850
Estimated MFLOPs = 24.281486

[purple]$ mpirun -np 2 -nolocal ./flop
Process 0 of 2 on purple
pi is approximately 3.1415926535900072, Error is 0.0000000000002141
wall clock time = 154.564619
Estimated MFLOPs = 38.818716
Process 1 of 2 on green

[purple]$ mpirun -np 3 -nolocal ./flop
Process 0 of 3 on purple
pi is approximately 3.1415926535899148, Error is 0.0000000000001217
wall clock time = 103.156190
Estimated MFLOPs = 58.164226
Process 1 of 3 on green
Process 2 of 3 on red

[purple]$ mpirun -np 4 -nolocal ./flop
Process 0 of 4 on purple
pi is approximately 3.1415926535897682, Error is 0.0000000000000249
wall clock time = 76.938856
Estimated MFLOPs = 77.984003
Process 3 of 4 on cyan
Process 2 of 4 on red
Process 1 of 4 on green

This compares well with the results for Miniwulf, a 5-node cluster based on FreeBSD running on Intel Pentium 133 CPUs. Mac enthusiasts can take heart that skittles surpassed that cluster with only 4 active nodes :).
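
For reference, the machines file mentioned above is nothing exotic: just the node names, one per line (the order here is assumed), sitting in MPICH's share directory as machines.LINUX:
purple
green
red
cyan
With that file in place, jobs are launched from the head node exactly as in the transcript above, e.g. mpirun -np 4 -nolocal ./flop.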


Skittles, circa March, 2006

Monday, April 11, 2005: Skittles operational.

Update: April 29, 2005
I've run the Pallas benchmark on skittles. This benchmark, unlike the one above that mostly tests CPU power, tests internode network bandwidth. Not surprisingly, the results are less than impressive, mostly due to the use of a 10 megabit hub for the interconnect.
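
For the record, the run amounted to building the suite against MPICH and launching its MPI-1 binary across all four nodes, something like the line below; the binary name PMB-MPI1 is taken from the stock Pallas build and is an assumption here.
mpirun -np 4 -nolocal ./PMB-MPI1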

Update: June 14, 2005
Some people have asked what skittles is good for. Skittles is a training cluster. It has enough functionality for people to learn the ins and outs of MPI programming under C and FORTRAN (using the GNU compilers) without needing access to more powerful production clusters. While skittles was initially set up partly as a joke, it has since become a tool for teaching parallel programming.
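
The day-to-day workflow for students is nothing more than MPICH's compiler wrappers plus mpirun. For example (hello.c and hello.f are stand-ins for whatever the student is writing):
mpicc -o hello hello.c     # C, with gcc underneath
mpif77 -o hello hello.f    # or FORTRAN, with g77 underneath
mpirun -np 4 -nolocal ./hello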

Update: January 9, 2006
Switched back to the D-Link broadband router and removed the 10 Mb/s hub. This was not so much to boost the performance of the cluster as it was because I needed the IPCop firewall I'd been testing as a front end elsewhere. I hadn't planned on leaving the cluster up and running this long. We'd had our joke, most slashdotters got it, and a few were clueless (pretty typical). But now not only have some students been using it to learn MPI, one is even developing code for his thesis on the cluster. While that wasn't the intent, it's gratifying to see the obsolete hardware being put to good use. I guess the joke's on us :).

Update: January 11, 2006
Ran the Pallas benchmark again. The 100 megabit switch built into the router has increased the inter-node communication speed from about 1 megabyte per second to over 8 megabytes per second (averaging around 60 megabits per second). Not bad for generation 1 iMacs.

Update: March 21, 2006
Scavenged the 64 MB RAM module from DOA node Cyan 2 and installed it in the green node. Now all 4 nodes have 96 MB of RAM.

Update: March 27, 2006
10 second power outage on March 26 brought skittles down. All nodes restarted the next day with no problems. Just two weeks shy of one solid year of uptime on the master node (argh!). Some of our production clusters aren't that stable! A UPS would have prevented the shutdown, but since skittles is made from retasked hardware, and there were no old/retasked/spare UPSs available, skittles doesn't have one.

Update: March 29, 2006
Skittles had to be shut down again yesterday for another power outage. This one was actually planned: the power company needed to make repairs on the grid near Clifford Hall. I took this opportunity to reconfigure the power strip and cable setup, and also measured the power drawn by the cluster. Total current draw is 2.4 amps at 120 VAC, which works out to about 290 W assuming a power factor of 1.0. Pretty low for four computers and a switch.

Update: March 30, 2006
Reran the Pi benchmark to test the effects of the upgraded RAM in the green node. Cluster now scales differently, but still not in a strictly linear fashion (there is a step-discontinuity at 3 nodes):
[purple pi]$ mpirun -nolocal -np 1 ./flop
Process 0 of 1 on purple
pi is approximately 3.1415926535899708, Error is 0.0000000000001776
wall clock time = 248.873376
Estimated MFLOPs = 24.108646

[purple pi]$ mpirun -nolocal -np 2 ./flop
Process 0 of 2 on purple
pi is approximately 3.1415926535900072, Error is 0.0000000000002141
wall clock time = 123.200154
Estimated MFLOPs = 48.701238
Process 1 of 2 on green

[purple pi]$ mpirun -nolocal -np 3 ./flop
Process 0 of 3 on purple
pi is approximately 3.1415926535899148, Error is 0.0000000000001217
wall clock time = 102.581632
Estimated MFLOPs = 58.490003
Process 1 of 3 on green
Process 2 of 3 on red

[purple pi]$ mpirun -nolocal -np 4 ./flop
Process 0 of 4 on purple
pi is approximately 3.1415926535897682, Error is 0.0000000000000249
wall clock time = 76.939021
Estimated MFLOPs = 77.983836
Process 1 of 4 on green
Process 2 of 4 on red
Process 3 of 4 on cyan1
Graphical version here.

Update: April 11, 2006
It's official: skittles is 1 year old today! Would have had the master node running solid all that time as well if we hadn't had a power outage a couple weeks ago. Oh well. The cluster's still chugging along.

Update: May 22, 2006
Skittles is back up and running. On May 19, 2006, a lightning arrestor on a power pole west of the UND campus failed, causing a short that tripped substation circuit breakers and blacked out the west end of campus and surrounding areas (and apparently caused a rather spectacular explosion, according to those nearby). The cluster went down hard, and though power was restored an hour later, the system was left off over the weekend. An uninterruptible power supply with possibly dubious batteries was installed to power the cluster for test purposes. This, plus running the cluster from a standby-generator-backed power circuit, should avoid future blackouts, assuming the batteries can keep the cluster powered for the 10 seconds the generator requires to start and transfer the load.

Update: June 20, 2006
More power glitches caused skittles to go down again, despite the UPS. Further testing revealed that the UPS, although rated at 650 VA and containing a fresh battery, would not hold the load of the entire cluster, and would shut down instantly when the plug was pulled. It would, however, hold the master node and the broadband router. The power layout was reconfigured so that the master and router run from the UPS, while the slaves all run from a surge-suppressing strip hooked directly to the mains. The slaves will still go down hard if the power glitches, but the main filesystem on the master should be protected and still accessible. Not an ideal solution, but with no funds available for a proper UPS, it's all that could be done.

Update: June 28, 2006
Solved the '-nolocal' requirement issue with mpirun. This is something I'd seen on a few of my other clusters over the years, but never figured out. The problem was a misconfigured /etc/hosts file: Linux puts the node name on the loopback line (127.0.0.1), along with the 'localhost' aliases, and it turns out MPICH doesn't like that. I removed the node name from that line and all was well: each node name is specified along with its IP later in the /etc/hosts file, so the system still knows where to find things.

In other words, in the master's /etc/hosts I changed this line:
127.0.0.1	purple localhost.localdomain localhost
to this:
127.0.0.1	localhost.localdomain localhost
The rest of the file details the cluster IPs:
10.0.0.1	purple
10.0.0.2	green
10.0.0.3	red
10.0.0.4	cyan
10.0.0.10	gateway
I found this in Kurt Swendson's Beowulf HOWTO. Thanks Kurt!

Update: September 10, 2006
The UPS showed a battery alarm light, despite the battery being less than 8 months old. The battery was removed for testing, which it passed. In the meantime the UPS went into a test cycle, and with no battery connected it instantly dumped the power to the master node and switch. The UPS was removed from the circuit and the cluster is again entirely powered by a surge-suppressed power strip.

Update: December 29, 2006
The cluster ran fine sans UPS until the morning of Thursday, December 28, when a 4-second power outage shut it down. A different, old, and rather weak UPS was located that would run the master node and switch. This was installed on December 29, the cluster was brought back up, and a minor NFS configuration issue on two of the slave nodes was corrected.

Update: February 27, 2007
Since the start and end dates for Daylight Saving Time changed here in the US, all computers need to be patched (joy). Skittles was used as a test case for a modified patching technique, which consisted of copying a new timezone file over /etc/localtime on each node and rebooting. When this proved successful on skittles, it was applied to other Linux systems with less concern about breaking them. Other than that, and occasional use by some Computer Science Department professors experimenting with MPI, the cluster has seen little use since 2007 began.
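
The patch itself is about as simple as patches get. On each node it was essentially the two lines below; the zone name (we're on US Central time) and the path where the updated zoneinfo files were unpacked are assumptions:
cp /tmp/zoneinfo/America/Chicago /etc/localtime
reboot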

Update: March 8, 2007
Cleaned up the Pi calculation benchmark. Here is the new version, which allows input of the loop depth. The program was run including all nodes for various loop values. Textual outputs are here. Graphs were also constructed for Depth vs. execution time, Depth vs. MFLOPs, and Depth vs. error.

Update: March 17, 2007
Installed the GNU Multiple Precision Arithmetic Library (GMP).

Update: April 11, 2007
Skittles is two years old.

Update: May 7, 2007
A misconfiguration of the slave ntp.conf files was allowing their clocks to run unsynchronized. The problem stemmed from changes to the /etc/hosts files on the slaves. The ntp.conf files were changed to refer to the master node by numerical IP address, and the clocks were all synced using ntpdate before ntpd was restarted. All clocks seem to be running properly now.
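
The fix on each slave boiled down to one line in ntp.conf plus a forced resync before restarting the daemon, roughly as follows (Red Hat-style service commands, as used by Yellow Dog 3):
# /etc/ntp.conf on each slave now points at the master by IP:
#   server 10.0.0.1
service ntpd stop
ntpdate 10.0.0.1
service ntpd start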

Skittles is about 15 days into a run of the MPI version of John the Ripper, a password recovery tool. This run is to benchmark skittles on cracking an 8-character mixed letter and number password in MD5 format, running in brute-force ('incremental') mode. JtR can also scan lists of common passwords for possibly faster checking; in fact it found this test password that way in 23 seconds, since it was on a list of commonly used passwords. The incremental run is cracking the same password by brute force for comparison.
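
A wordlist pass like the one that found the password in 23 seconds looks something like the following; the invocation here is an assumption, with password.lst being the list of common passwords that ships with JtR and experiment_1/passwords.txt the same hash file used for the incremental run:
mpirun -np 4 ./john --wordlist=password.lst experiment_1/passwords.txt &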

Update: May 24, 2007
Skittles has been chewing on the JtR password problem for 31 days now. Brute-forcing an 8-character alphanumeric password is obviously not a trivial task for the cluster. Some have asked for specs on the system; here they are:
Nodes: 4 Revision A iMacs (1 master node, 3 processor nodes), each containing:
	233 MHz PowerPC 750 CPU
	66 MHz bus
	96 MB RAM
	4 GB hard drive
	24x CD-ROM drive
	10/100 base-T ethernet
	more details here

OS: Yellow Dog Linux version 3.
Interconnect/Firewall: D-Link DI-604 broadband router and 10/100 base-T ethernet switch.
Message Passing: MPICH 1.2.6.

Update: June 11, 2007
The John the Ripper job was stopped and the three compute nodes shut down to install a spare Tripp Lite UPS on the compute nodes (the master and router/switch still have their own, separate UPS). This was in preparation for a planned power outage. The building generator kept the cluster running, but UPSs were required to prevent node reboots. JtR is written to save its state automatically every 10 minutes, so it can restart an interrupted job without beginning from scratch. In each node's log file the following message appeared, seeming to indicate this function worked:
49:17:00:00 Continuing an interrupted session
49:17:00:00 Loaded a total of 1 password hash
49:17:00:00 Remaining 1 password hash
49:17:00:00 - Hash type: FreeBSD MD5 (lengths up to 15)
49:17:00:00 - Algorithm: 32/32 X2
49:17:00:00 - Candidate passwords will be buffered and tried in chunks of 2
49:17:00:00 Proceeding with "incremental" mode: All
49:17:00:00 - Lengths 0 to 8, up to 95 different characters
All the log files contained this message, all synchronized to 49 days, 17 hours into the project.

The command to restart all the jobs was:
mpirun -np 4 ./john --restore &
As opposed to the original start command:
mpirun -np 4 ./john --incremental experiment_1/passwords.txt &


Update: October 26, 2007
Another day, another power outage. This one was planned, and I was on hand to make sure the UPSs were plugged into the proper outlets. Murphy's law was in full effect, and in the 30 seconds it took me to move the UPS plugs for two critical satellite receivers, the master node UPS for skittles went down and so did the node. Ugh! The John the Ripper logs showed 116 days of crunching on the test password, with still no solution found. That figure is incorrect: the program has actually been running for just over six months, but the counter reset to 49 days during the last restart due to a program bug somewhere.

I tried digging through the code to find numbers for things like the total passwords tried, but came up with some results I'm not convinced are correct. If they are, the cluster has only churned through about 0.3% of the search space, running roughly 850 guesses per second per node, or 3403 passwords per second for the whole cluster (at that rate, six months of runtime covers only some tens of billions of candidates, so exhausting the whole space really would take on the order of 167 years). Even with their 233 MHz CPUs I have trouble believing the cluster is that slow. However, output from later tests does seem to confirm a rate of between 650 and 850 crypts per second per node. If the numbers are correct, I don't fancy spending the next 167 years waiting for the system to run through all the possibilities, so I've decided to terminate this version of the experiment at this time.

Update: October 29, 2007
To test the cluster in a reasonable amount of time, I ran a different experiment with trivial passwords. I created four username/password pairs, and ran JtR on the file containing all of them, in parallel, and using incremental mode. Here are the results:

Password   Time cracked   Node #
a          0:09           2
ab         1:44           0
abc        0:07           0
abcd       1:08           2


The "Time cracked" is in minutes:seconds, starting from the begining of the experiment. "Node" is which node of the cluster actually cracked the given password (0-3).

When the passwords were all cracked, I still had to stop the program manually (it apparently doesn't shut down on its own). At that point each process reported its runtime, the number of passwords cracked, and its crypts/second rate. Here are the crypts/second figures for the cluster nodes:

Node   C/S
0      726
1      817
2      652
3      820


This is pretty poor performance, considering that even a Pentium I should be cranking out crypts/second in the tens of thousands: Link.

Intrigued that the program cracked the passwords in a different order than I thought, I performed a second experiment configured such that the system would only work on a single password at a time. I also expanded the total number of passwords to 8. Most passwords were run multiple times to verify that the system found them in the same amount of time and on the same node each time. The results:

Password   Time to crack (seconds)   Node #
a          3                         2
ab         35                        0
abc        2                         1
abcd       23                        2
abcde      15                        3
abcdef     12                        2
abcdefg    1183                      2
abcdefgh   (run terminated after 12 days)


Yes, that's not a typo: "abcdefg" took over 19 minutes to crunch. The next entry, "abcdefgh", is still being crunched (over 12 hours as of this writing). There's definitely an upper bound on the password length this system can handle.

Update: November 9, 2007
After 12 days of cracking on the 8-character password and still no solution, I stopped the experiment. It seems JtR running on this particular cluster has real trouble with anything over 6 characters, and hits the wall at 8 characters, even for trivial passwords.

I'll need to repeat the experiment on a different cluster with the same software to determine where the weakness lies.

Update: November 14, 2007
Of course, when wondering how a program performs on a given platform, it's always a good idea to RTFM. JtR already has a benchmarking mode available. Here are the results for Skittles' master node:
[purple run]$ mpirun -np 1 ./john --test
Benchmarking: Traditional DES [32/32 BS]... DONE
Many salts:     46355.00 c/s real, 46355.00 c/s virtual
Only one salt:  44454.00 c/s real, 44454.00 c/s virtual

Benchmarking: BSDI DES (x725) [32/32 BS]... DONE
Many salts:     1548.00 c/s real, 1548.00 c/s virtual
Only one salt:  1536.00 c/s real, 1536.00 c/s virtual

Benchmarking: FreeBSD MD5 [32/32 X2]... DONE
Raw:    832.00 c/s real, 832.00 c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/32]... DONE
Raw:    74.00 c/s real, 74.00 c/s virtual

Benchmarking: Kerberos AFS DES [24/32 4K]... DONE
Short:  25395.00 c/s real, 25395.00 c/s virtual
Long:   69785.00 c/s real, 69785.00 c/s virtual

Benchmarking: NT LM DES [32/32 BS]... DONE
Raw:    696832.00 c/s real, 696832.00 c/s virtual

Benchmarking: Apache MD5 [32/32 X2]... DONE
Raw:    833.00 c/s real, 833.00 c/s virtual

Benchmarking: mysql [mysql]... DONE
Raw:    221352.00 c/s real, 221352.00 c/s virtual

Benchmarking: Netscape LDAP SHA [SHA1]... DONE
Raw:    357451.00 c/s real, 357451.00 c/s virtual

Benchmarking: NT MD4 [TridgeMD4]... DONE
Raw:    357587.00 c/s real, 357587.00 c/s virtual

Benchmarking: Lotus5 [Lotus v5 Proprietary]... DONE
Raw:    33742.00 c/s real, 33742.00 c/s virtual

Benchmarking: M$ Cache Hash [mscash]... DONE
Raw:    204781.00 c/s real, 204781.00 c/s virtual

Benchmarking: Raw MD5 [raw-md5]... DONE
Raw:    329676.00 c/s real, 329676.00 c/s virtual

Benchmarking: Eggdrop [blowfish]... DONE
Raw:    3977.00 c/s real, 3977.00 c/s virtual

Benchmarking: Raw SHA1 [raw-sha1]... DONE
Raw:    354075.00 c/s real, 354075.00 c/s virtual

Benchmarking: MS-SQL [ms-sql]... FAILED (get_hash[0])

Benchmarking: HMAC MD5 [hmac-md5]... DONE
Raw:    100068.00 c/s real, 100068.00 c/s virtual

Benchmarking: WPA PSK [wpa-psk]... DONE
Raw:    12.02 c/s real, 12.02 c/s virtual

Benchmarking: Netscape LDAP SSHA [salted SHA1]... DONE
Raw:    364525.00 c/s real, 364525.00 c/s virtual

This particular experiment uses the FreeBSD MD5 password hash, so we're not playing to Skittles' strengths as a password cracker (though it could be a lot worse).

The test mode also works when run across the whole cluster:
[purple run]$ mpirun -np 4 ./john --test
Benchmarking: Traditional DES [32/32 BS]... DONE
Many salts:     176120.00 c/s real, 176120.00 c/s virtual
Only one salt:  168881.00 c/s real, 168881.00 c/s virtual

Benchmarking: BSDI DES (x725) [32/32 BS]... DONE
Many salts:     5883.00 c/s real, 5883.00 c/s virtual
Only one salt:  5836.00 c/s real, 5836.00 c/s virtual

Benchmarking: FreeBSD MD5 [32/32 X2]... DONE
Raw:    3162.00 c/s real, 3162.00 c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/32]... DONE
Raw:    281.10 c/s real, 281.10 c/s virtual

Benchmarking: Kerberos AFS DES [24/32 4K]... DONE
Short:  96521.00 c/s real, 96521.00 c/s virtual
Long:   265368.00 c/s real, 265368.00 c/s virtual

Benchmarking: NT LM DES [32/32 BS]... DONE
Raw:    2647493.00 c/s real, 2647493.00 c/s virtual

Benchmarking: Apache MD5 [32/32 X2]... DONE
Raw:    3162.00 c/s real, 3162.00 c/s virtual

Benchmarking: mysql [mysql]... DONE
Raw:    841118.00 c/s real, 841118.00 c/s virtual

Benchmarking: Netscape LDAP SHA [SHA1]... DONE
Raw:    1358339.00 c/s real, 1358339.00 c/s virtual

Benchmarking: NT MD4 [TridgeMD4]... DONE
Raw:    1367507.00 c/s real, 1367507.00 c/s virtual

Benchmarking: Lotus5 [Lotus v5 Proprietary]... DONE
Raw:    128262.00 c/s real, 128262.00 c/s virtual

Benchmarking: M$ Cache Hash [mscash]... DONE
Raw:    798270.00 c/s real, 798270.00 c/s virtual

Benchmarking: Raw MD5 [raw-md5]... DONE
Raw:    1252500.00 c/s real, 1252500.00 c/s virtual

Benchmarking: Eggdrop [blowfish]... DONE
Raw:    15103.00 c/s real, 15103.00 c/s virtual

Benchmarking: Raw SHA1 [raw-sha1]... DONE
Raw:    1345390.00 c/s real, 1345390.00 c/s virtual

Benchmarking: MS-SQL [ms-sql]... FAILED (get_hash[0])

Benchmarking: HMAC MD5 [hmac-md5]... DONE
Raw:    382865.00 c/s real, 382865.00 c/s virtual

Benchmarking: WPA PSK [wpa-psk]... DONE
Raw:    45.13 c/s real, 45.13 c/s virtual

Benchmarking: Netscape LDAP SSHA [salted SHA1]... DONE
Raw:    1385954.00 c/s real, 1385954.00 c/s virtual

The increase is fairly linear, though not quite a 4x speed-up. Cluster communications overhead and variations between processors are likely to blame.

Update: December 17, 2007
Sometime between November 14 and December 17, the 'red' node failed. The node would power on, but at the point where the screen would normally come on there was a nasty electrical crackling sound, and the power supply would shut down.

With no spares available, the cluster is now down to three nodes.

Update: July 8, 2008
It was fun, but now it's done. The space the cluster took up was needed for another project, so skittles was taken down and disassembled, likely for good.