Skittles
An iMac Beowulf cluster running Yellow Dog Linux

They said it couldn't be done. Well, actually they said, "Why would you want to?"
The answer is, of course, "Because they're there, and it's the geek thing to do."
The idea of skittles was first planted when the sysadmins of the
Space Studies Department retired a group of generation 1 (or
'Revision A' as Apple calls them) iMacs.
When I heard of this, I half-jokingly told them they should build a Beowulf
cluster. They agreed, and informed me that the old computers would run a
variant of Linux known as Yellow Dog. We also found an old SynOptics
10-megabit ethernet hub, which would hook all the iMacs together. The only
sticking point was that the iMacs each had only one ethernet adapter, and
the head node of a Beowulf usually needs two.
Time went by, one sysadmin graduated, and the other got a job offer
in another city. Before he left I figured we should make the effort
to actually get skittles built and running. The dual-ethernet-for-the-
head-node problem was solved by borrowing a D-Link NAT router and
configuring it to forward incoming ssh connections to the head node.
The iMacs' lone adapter was built in, and adding another would have been
awkward. We could have bought a USB network adapter, but that would
have violated the spirit of skittles, which was to use only old/borrowed/obsolete
hardware and thus not spend any money. Besides, the NAT router would also serve
as the firewall to protect the cluster, so the head node would not need
to run a firewall itself.
After discovering that Yellow Dog Linux version 4 didn't recognize the
iMacs' ethernet adapters, I installed version 3 on all the nodes. I had
to keep the installs fairly clean, since the iMacs had only limited hard
drive space (4 GB each). The head node got compilers and editors
as well as some other services, while the slave nodes were left as
stripped-down compute nodes. None got X or any other memory and
CPU hogs.
The cluster nodes were assigned
RFC 1918 IP addresses, and attached to
the SynOptics hub. The D-Link router was assigned the outside IP address,
and the ssh port was forwarded to the head node. Each node's hard drive was
divvied up into an Apple Bootstrap partition (required to boot the iMac),
a /boot partition, and the rest in a single filesystem mounted as /.
I set up NFS to export the /usr/home
directory on the head node, and mounted it on the slaves. I
activated the NTP server on the head and pointed the NTP clients on
the slaves at it to keep the clocks all in sync. The head node's NTP
daemon was synchronized to a local stratum 2 NTP server.
I also generated an
ssh keypair and copied the public key to the authorized_keys file to
allow remote job execution without a password.
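None of this was exotic. From memory (so treat the exact options as illustrative rather than gospel), the setup boiled down to a few configuration lines and commands:
# /etc/exports on the head node (purple)
/usr/home 10.0.0.0/255.255.255.0(rw,sync)
# /etc/fstab entry on each slave
purple:/usr/home  /usr/home  nfs  defaults  0 0
# /etc/ntp.conf on each slave (by hostname at first; see the May 2007 update)
server purple
# passwordless ssh for remote job execution: generate a keypair on the head node,
# then append the public key to authorized_keys in the NFS-shared home directory
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys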
The message-passing software I installed was
MPICH 1.2.6. I chose this version since I had more experience installing
and configuring it than version 2. I adjusted the machines.LINUX
file in the MPICH share directory to reflect the node names of the
cluster machines. I also exported the mpich directory from the head
to the slave nodes via NFS.
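The machines file itself is nothing fancy, just one hostname per line (MPICH also accepts a host:N form for multi-CPU machines); for skittles it amounted to:
purple
green
red
cyan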
After some fits and starts, I finally got the
slave nodes to run jobs by including the -nolocal flag on mpirun. This
is the same thing I had to do with
Miniwulf, my first Beowulf cluster.
I ran the modified Pi calculation benchmark I've used on other clusters on
skittles. Here are the results:
Test program results on April 18, 2005:
[purple]$ mpirun -np 1 -nolocal ./flop
Process 0 of 1 on purple
pi is approximately 3.1415926535899708, Error is 0.0000000000001776
wall clock time = 247.101850
Estimated MFLOPs = 24.281486
[purple]$ mpirun -np 2 -nolocal ./flop
Process 0 of 2 on purple
pi is approximately 3.1415926535900072, Error is 0.0000000000002141
wall clock time = 154.564619
Estimated MFLOPs = 38.818716
Process 1 of 2 on green
[purple]$ mpirun -np 3 -nolocal ./flop
Process 0 of 3 on purple
pi is approximately 3.1415926535899148, Error is 0.0000000000001217
wall clock time = 103.156190
Estimated MFLOPs = 58.164226
Process 1 of 3 on green
Process 2 of 3 on red
[purple]$ mpirun -np 4 -nolocal ./flop
Process 0 of 4 on purple
pi is approximately 3.1415926535897682, Error is 0.0000000000000249
wall clock time = 76.938856
Estimated MFLOPs = 77.984003
Process 3 of 4 on cyan
Process 2 of 4 on red
Process 1 of 4 on green
This compares well with the results for
Miniwulf, which was a 5-node FreeBSD cluster running on Intel Pentium 133 CPUs. Mac
enthusiasts can take heart that skittles surpassed that cluster with only 4 active nodes :).

Skittles, circa March, 2006
Monday, April 11, 2005: Skittles operational.
Update: April 29, 2005
I've run the Pallas benchmark on skittles. This benchmark, unlike the one above that mostly tests
CPU power, tests internode network bandwidth. Not surprisingly, the results
are less than impressive, mostly due to the use of a 10 megabit hub for the interconnect.
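For reference, the Pallas suite (PMB) builds a binary normally named PMB-MPI1, and the run on skittles amounted to something like this (the exact binary name and options depend on the PMB version, so treat it as a sketch):
mpirun -np 4 -nolocal ./PMB-MPI1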
Update: June 14, 2005
Some people have asked what skittles is good for. Skittles is a training cluster. It has enough
functionality for people to learn the ins and outs of MPI programming under C and FORTRAN (using
the GNU compilers) without needing access to more powerful production clusters. While skittles was
initially set up partly as a joke, it has since become a tool for teaching parallel programming.
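A typical training session is about as simple as it gets; assuming a source file hello.c (or its FORTRAN cousin, built with mpif77) sitting in the NFS-shared home directory, it boils down to compiling with MPICH's wrapper script and launching across the nodes:
mpicc -o hello hello.c
mpirun -np 4 -nolocal ./hello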
Update: January 9, 2006
Switched back to the D-Link broadband router and removed the 10 Mb/s hub, not so much to boost the
performance of the cluster as because I needed the IPCop
firewall I'd been testing as a front end elsewhere.
I hadn't planned on leaving the cluster up and running this long.
We'd had our joke,
most slashdotters got it, and a few were clueless (pretty typical). But now not only
have some students been using it to learn MPI, but one is even developing code
for his thesis on the cluster. While that wasn't the intent, it's gratifying to
see the obsolete hardware being put to good use. I guess the joke's on us :).
Update: January 11, 2006
Ran the Pallas benchmark again. The 100 megabit switch built into the router has increased the
inter-node communication speed from 1 to over 8 megabytes per second (averaging around
60 megabits per second). Not bad for generation 1 iMacs.
Update: March 21, 2006
Scavenged the 64 MB RAM module from DOA node cyan 2, and installed it in the green node. Now all 4
nodes have 96 MB of RAM.
Update: March 27, 2006
A 10-second power outage on March 26 brought skittles down. All nodes restarted the next day with no problems.
Just two weeks shy of one solid year of uptime on the master node (argh!). Some of our production
clusters aren't that stable! A UPS would have prevented the shutdown, but since skittles is made from
retasked hardware, and there were no old/retasked/spare UPSs available, skittles doesn't have one.
Update: March 29, 2006
Skittles had to be shut down again yesterday for another power outage. This one was actually planned:
the power company needed to make repairs on the grid near Clifford Hall. I took this opportunity
to reconfigure the power strip and cable setup, and also measured the power drawn by the cluster.
Total current draw is 2.4 amps at 120 VAC, which works out to about 290W assuming a power factor
of 1.0. Pretty low for four computers and a switch.
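(That's just 2.4 A x 120 V x 1.0 = 288 W, rounded up.)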
Update: March 30, 2006
Reran the Pi benchmark to test the effects of the upgraded RAM in the green node. Cluster now scales
differently, but still not in a strictly linear fashion (there is a step-discontinuity at 3 nodes):
[purple pi]$ mpirun -nolocal -np 1 ./flop
Process 0 of 1 on purple
pi is approximately 3.1415926535899708, Error is 0.0000000000001776
wall clock time = 248.873376
Estimated MFLOPs = 24.108646
[purple pi]$ mpirun -nolocal -np 2 ./flop
Process 0 of 2 on purple
pi is approximately 3.1415926535900072, Error is 0.0000000000002141
wall clock time = 123.200154
Estimated MFLOPs = 48.701238
Process 1 of 2 on green
[purple pi]$ mpirun -nolocal -np 3 ./flop
Process 0 of 3 on purple
pi is approximately 3.1415926535899148, Error is 0.0000000000001217
wall clock time = 102.581632
Estimated MFLOPs = 58.490003
Process 1 of 3 on green
Process 2 of 3 on red
[purple pi]$ mpirun -nolocal -np 4 ./flop
Process 0 of 4 on purple
pi is approximately 3.1415926535897682, Error is 0.0000000000000249
wall clock time = 76.939021
Estimated MFLOPs = 77.983836
Process 1 of 4 on green
Process 2 of 4 on red
Process 3 of 4 on cyan1
Graphical version here.
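In numbers: going from one node to two is essentially perfect scaling (24.1 to 48.7 MFLOPs), but the third node only brings the total to 58.5, and the fourth brings it to 78.0; the per-node gain dips at three nodes and recovers at four.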
Update: April 11, 2006
It's official: skittles is 1 year old today! Would have had the master node running solid all that time
as well if we hadn't had a power outage a couple weeks ago. Oh well. The cluster's still chugging along.
Update: May 22, 2006
Skittles is back up and running. On May 19, 2006, a lightning arrestor on a power pole west of the UND
campus failed, causing a short which tripped substation circuit breakers and blacked out the west
end of campus and surrounding areas (and apparently caused a rather spectacular explosion, according
to those nearby). The cluster went down hard, and though power was restored an hour later, the system
was left off over the weekend. An uninterruptible power supply with possibly dubious batteries was
installed to power the cluster for test purposes. This, plus running the cluster from a standby-
generator-backed power circuit, should avoid future blackouts, assuming the batteries can hold the
cluster up for the 10 seconds the generator requires to start and transfer the load.
Update: June 20, 2006
More power glitches caused skittles to go down again, despite the UPS. Further testing revealed
that the UPS, although rated at 650 VA and containing a fresh battery, would not hold the load of the entire
cluster, and would shut down instantly when the plug was pulled. The UPS would hold the master node and the
broadband router though. The power layout was reconfigured such that the master and router were on the UPS,
but the slaves were all run from a surge-suppressing strip hooked directly to the mains. The slaves will still
go down hard if the power glitches, but the main filesystem on the master should be protected and still
accessible. Not an ideal solution, but with no funds available for a proper UPS, it's all that could be done.
Update: June 28, 2006
Solved the '-nolocal' requirement issue with mpirun. This is something I'd seen on a few of my other
clusters over the years, but never figured out. The problem was a misconfigured /etc/hosts file: Linux
puts the node name in the loopback line (127.0.0.1), along with the 'localhost' aliases. It turns out
MPICH doesn't like that. I removed the node name from that line and all was well: each node name is
specified along with its IP later in the /etc/hosts file, so the system still knows where to find
things.
In other words, in the master's /etc/hosts I changed this line:
127.0.0.1 purple localhost.localdomain localhost
to this:
127.0.0.1 localhost.localdomain localhost
The rest of the file details the cluster IPs:
10.0.0.1 purple
10.0.0.2 green
10.0.0.3 red
10.0.0.4 cyan
10.0.0.10 gateway
I found this in Kurt Swendson's
Beowulf HOWTO. Thanks Kurt!
Update: September 10, 2006
The UPS showed a battery alarm light, despite the battery being less than 8 months old. The battery
was removed for testing, which it passed. In the meantime the UPS went into a test cycle, and
with no battery connected it instantly dumped power to the master node and switch. The UPS was
removed from the circuit and the cluster is again entirely powered by a surge-suppressed power strip.
Update: December 29, 2006
The cluster ran fine sans UPS until the morning of Thursday, December 28, when a 4-second power
outage shut it down. A different, old, and rather weak UPS was located that would run the
master node and switch. This was installed on December 29, the cluster was brought back up,
and a minor NFS configuration issue on two of the slave nodes was corrected.
Update: February 27, 2007
Since the start and end dates for Daylight Saving Time changed here in the US, all computers need
to be patched (joy). Skittles was used as a test case for a modified patching technique, which
consisted of copying a new timezone file over /etc/localtime on each node and rebooting. When
this proved successful on skittles, it was applied to other Linux systems with less concern about
breaking them. Other than that and occasional use by some
Computer Science Department professors
experimenting with MPI, the cluster has seen little use since 2007 began.
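As for the patch itself, it was essentially a one-line copy per node. The updated zone file has to come from a machine that already carries current tzdata, so 'patched-host' below is just a placeholder (our zone is US Central):
scp patched-host:/usr/share/zoneinfo/America/Chicago /etc/localtime
reboot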
Update: March 8, 2007
Cleaned up the Pi calculation benchmark. Here is the new version, which
allows input of the loop depth. The program was run including all nodes for various loop
values. Textual outputs are here. Graphs were also constructed for
Depth vs. execution time,
Depth vs. MFLOPs, and
Depth vs. error.
Update: March 17, 2007
Installed the
GNU Multiple Precision Arithmetic Library (GMP).
Update: April 11, 2007
Skittles is two years old.
Update: May 7, 2007
A misconfiguration of the slave ntp.conf files was allowing their clocks to run
unsynchronized. The problem stemmed from changes to the /etc/hosts files on the slaves. The
ntp.conf files were changed to refer to the master node by numerical IP address, and the
clocks were all sync'd using ntpdate before ntpd was restarted. All clocks seem to be
running properly now.
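In other words, each slave's ntp.conf now contains 'server 10.0.0.1' rather than the master's hostname, and the per-slave recovery was roughly:
ntpdate 10.0.0.1
service ntpd restart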
Skittles is about 15 days into a run of the
MPI version of John the Ripper, a password
recovery tool. This run is to benchmark skittles on cracking an 8-character mixed letter
and number password in MD5 format, running in brute-force mode. JtR can also scan databases
of common passwords for possibly faster checking. For example, JtR found the test password
in this case in 23 seconds, since it was in a list of commonly-used passwords. The same
password is being cracked in incremental mode for comparison.
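That quick hit presumably came from JtR's wordlist mode; with the stock password.lst that ships with JtR, such a run would look something like this (compare the --incremental command shown in the June 11 entry below):
mpirun -np 4 ./john --wordlist=password.lst experiment_1/passwords.txt &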
Update: May 24, 2007
Skittles has been chewing on the JtR password problem for 31 days now. Brute-forcing an 8 character
alpha-numeric password is obviously not a trivial task for the cluster. Some have asked for
specs on the system; here they are:
Nodes: 4 Revision A iMacs: 1 master node, 3 processor nodes, each containing:
233 MHz PowerPC 750 CPU
66 MHz bus
96 MB RAM
4 GB hard drive
24x CD-ROM drive
10/100 base-T ethernet
more details here
OS: Yellow Dog Linux version 3.
Interconnect/Firewall: D-Link DI-604 broadband router and 10/100 base-T ethernet switch.
Message Passing: MPICH 1.2.6.
Update: June 11, 2007
The John the Ripper job was stopped and the three compute nodes shut down to
install a spare Tripp Lite UPS on the compute nodes (the master and router/switch still have
their own, separate UPS). This was in preparation for a planned power outage. The building
generator kept the cluster running, but UPSs were required to prevent node reboots.
JtR is written to save its state automatically every 10 minutes, so it can restart an
interrupted job without beginning from scratch. In each node's log file the following message
appeared, seeming to indicate this function worked:
49:17:00:00 Continuing an interrupted session
49:17:00:00 Loaded a total of 1 password hash
49:17:00:00 Remaining 1 password hash
49:17:00:00 - Hash type: FreeBSD MD5 (lengths up to 15)
49:17:00:00 - Algorithm: 32/32 X2
49:17:00:00 - Candidate passwords will be buffered and tried in chunks of 2
49:17:00:00 Proceeding with "incremental" mode: All
49:17:00:00 - Lengths 0 to 8, up to 95 different characters
All the log files contained this message, all synchronized to 49 days, 17 hours into the project.
The command to restart all the jobs was:
mpirun -np 4 ./john --restore &
As opposed to the original start command:
mpirun -np 4 ./john --incremental experiment_1/passwords.txt &
Update: October 26, 2007
Another day, another power outage. This one was planned, and I was on hand to make sure the
UPSs were plugged into the proper outlets. Murphy's law was in full effect, and in the 30
seconds it took me to move the UPS plugs for two critical satellite receivers, the master
node UPS for skittles went down and so did the node. Ugh! The John the Ripper logs
showed 116 days of crunching on the test password, with still no solution found.
That figure is incorrect: the program has actually been running for just over six months, but
the counter reset to 49 days during the last restart due to a program bug somewhere.
I tried digging through the code to find the numbers for things like total passwords tried,
but came up with some results I'm not convinced are correct. If they are right, the cluster
has only churned through ~0.3% of the search space, running 850 guesses per second per node,
or 3403 passwords per second for the whole cluster. Even with their 233 MHz CPUs I have
trouble believing the cluster is that slow. However, output from later tests does seem
to confirm a rate of between 650 and 850 crypts per second.
If the numbers are correct, ~0.3% of the space in six months works out to roughly 2,000 months,
or about 167 years, to cover it all. I don't fancy waiting that long, so I've
decided to terminate this version of the experiment.
Update: October 29, 2007
To test the cluster in a reasonable amount of time, I ran a different experiment with
trivial passwords. I created four username/password pairs, and ran JtR on the file
containing all of them, in parallel, and using incremental mode. Here
are the results:
Password   Time cracked   Node #
a          0:09           2
ab         1:44           0
abc        0:07           0
abcd       1:08           2
The "Time cracked" is in minutes:seconds, starting from the begining of the experiment.
"Node" is which node of the cluster actually cracked the given password (0-3).
When the passwords were all cracked, I still had to stop the program manually (it
apparently doesn't shut down on its own). At that time each program reported runtime,
the number of passwords cracked, and the crypts/second.
Here are the crypts/second of the cluster nodes:
Node   C/S
0      726
1      817
2      652
3      820
This is pretty poor performance, considering that even a Pentium I should be
cranking out crypts/second in the tens of thousands:
Link.
Intrigued that the program cracked the passwords in a different order than I expected,
I performed a second experiment configured such that the system would only work on a
single password at a time. I also expanded the total number of passwords to 8. Most
passwords were run multiple times to verify that the system found them in the same
amount of time and on the same node each time. The results:
Password   Time to crack (seconds)   Node #
a          3                         2
ab         35                        0
abc        2                         1
abcd       23                        2
abcde      15                        3
abcdef     12                        2
abcdefg    1183                      2
abcdefgh   Run terminated after 12 days
Yes, that's not a typo: "abcdefg" took over 19 minutes to crunch. The next entry,
"abcdefgh", is still being crunched (over 12 hours as of this writing). There's
definitely an upper bound on the password length this system can handle.
Update: November 9, 2007
After 12 days of cracking on the 8-character password and still no solution, I stopped
the experiment. It seems JtR running on this particular cluster has real trouble with
anything over 6 characters, and hits the wall at 8 characters, even for trivial passwords.
I'll need to repeat the experiment on a different cluster with the same software to determine
where the weakness lies.
Update: November 14, 2007
Of course, when wondering about how a program performs on a given platform, it's always a good idea to
RTFM. JtR already has a benchmarking mode available. Here are the specs
on Skittles' master node:
[purple run]$ mpirun -np 1 ./john --test
Benchmarking: Traditional DES [32/32 BS]... DONE
Many salts: 46355.00 c/s real, 46355.00 c/s virtual
Only one salt: 44454.00 c/s real, 44454.00 c/s virtual
Benchmarking: BSDI DES (x725) [32/32 BS]... DONE
Many salts: 1548.00 c/s real, 1548.00 c/s virtual
Only one salt: 1536.00 c/s real, 1536.00 c/s virtual
Benchmarking: FreeBSD MD5 [32/32 X2]... DONE
Raw: 832.00 c/s real, 832.00 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/32]... DONE
Raw: 74.00 c/s real, 74.00 c/s virtual
Benchmarking: Kerberos AFS DES [24/32 4K]... DONE
Short: 25395.00 c/s real, 25395.00 c/s virtual
Long: 69785.00 c/s real, 69785.00 c/s virtual
Benchmarking: NT LM DES [32/32 BS]... DONE
Raw: 696832.00 c/s real, 696832.00 c/s virtual
Benchmarking: Apache MD5 [32/32 X2]... DONE
Raw: 833.00 c/s real, 833.00 c/s virtual
Benchmarking: mysql [mysql]... DONE
Raw: 221352.00 c/s real, 221352.00 c/s virtual
Benchmarking: Netscape LDAP SHA [SHA1]... DONE
Raw: 357451.00 c/s real, 357451.00 c/s virtual
Benchmarking: NT MD4 [TridgeMD4]... DONE
Raw: 357587.00 c/s real, 357587.00 c/s virtual
Benchmarking: Lotus5 [Lotus v5 Proprietary]... DONE
Raw: 33742.00 c/s real, 33742.00 c/s virtual
Benchmarking: M$ Cache Hash [mscash]... DONE
Raw: 204781.00 c/s real, 204781.00 c/s virtual
Benchmarking: Raw MD5 [raw-md5]... DONE
Raw: 329676.00 c/s real, 329676.00 c/s virtual
Benchmarking: Eggdrop [blowfish]... DONE
Raw: 3977.00 c/s real, 3977.00 c/s virtual
Benchmarking: Raw SHA1 [raw-sha1]... DONE
Raw: 354075.00 c/s real, 354075.00 c/s virtual
Benchmarking: MS-SQL [ms-sql]... FAILED (get_hash[0])
Benchmarking: HMAC MD5 [hmac-md5]... DONE
Raw: 100068.00 c/s real, 100068.00 c/s virtual
Benchmarking: WPA PSK [wpa-psk]... DONE
Raw: 12.02 c/s real, 12.02 c/s virtual
Benchmarking: Netscape LDAP SSHA [salted SHA1]... DONE
Raw: 364525.00 c/s real, 364525.00 c/s virtual
This particular experiment uses the FreeBSD MD5 password hash, so we're not playing
to Skittles' strengths as a password cracker (though it could be a lot worse).
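For this hash the master manages only 832 c/s, versus roughly 46,000 c/s for Traditional DES (and a painful 74 c/s for OpenBSD Blowfish), so an MD5 target sits toward the slow end of what JtR handles here.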
The test mode also works when run across the whole cluster:
[purple run]$ mpirun -np 4 ./john --test
Benchmarking: Traditional DES [32/32 BS]... DONE
Many salts: 176120.00 c/s real, 176120.00 c/s virtual
Only one salt: 168881.00 c/s real, 168881.00 c/s virtual
Benchmarking: BSDI DES (x725) [32/32 BS]... DONE
Many salts: 5883.00 c/s real, 5883.00 c/s virtual
Only one salt: 5836.00 c/s real, 5836.00 c/s virtual
Benchmarking: FreeBSD MD5 [32/32 X2]... DONE
Raw: 3162.00 c/s real, 3162.00 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/32]... DONE
Raw: 281.10 c/s real, 281.10 c/s virtual
Benchmarking: Kerberos AFS DES [24/32 4K]... DONE
Short: 96521.00 c/s real, 96521.00 c/s virtual
Long: 265368.00 c/s real, 265368.00 c/s virtual
Benchmarking: NT LM DES [32/32 BS]... DONE
Raw: 2647493.00 c/s real, 2647493.00 c/s virtual
Benchmarking: Apache MD5 [32/32 X2]... DONE
Raw: 3162.00 c/s real, 3162.00 c/s virtual
Benchmarking: mysql [mysql]... DONE
Raw: 841118.00 c/s real, 841118.00 c/s virtual
Benchmarking: Netscape LDAP SHA [SHA1]... DONE
Raw: 1358339.00 c/s real, 1358339.00 c/s virtual
Benchmarking: NT MD4 [TridgeMD4]... DONE
Raw: 1367507.00 c/s real, 1367507.00 c/s virtual
Benchmarking: Lotus5 [Lotus v5 Proprietary]... DONE
Raw: 128262.00 c/s real, 128262.00 c/s virtual
Benchmarking: M$ Cache Hash [mscash]... DONE
Raw: 798270.00 c/s real, 798270.00 c/s virtual
Benchmarking: Raw MD5 [raw-md5]... DONE
Raw: 1252500.00 c/s real, 1252500.00 c/s virtual
Benchmarking: Eggdrop [blowfish]... DONE
Raw: 15103.00 c/s real, 15103.00 c/s virtual
Benchmarking: Raw SHA1 [raw-sha1]... DONE
Raw: 1345390.00 c/s real, 1345390.00 c/s virtual
Benchmarking: MS-SQL [ms-sql]... FAILED (get_hash[0])
Benchmarking: HMAC MD5 [hmac-md5]... DONE
Raw: 382865.00 c/s real, 382865.00 c/s virtual
Benchmarking: WPA PSK [wpa-psk]... DONE
Raw: 45.13 c/s real, 45.13 c/s virtual
Benchmarking: Netscape LDAP SSHA [salted SHA1]... DONE
Raw: 1385954.00 c/s real, 1385954.00 c/s virtual
The increase is fairly linear, though not quite a 4x speed-up. Cluster communications
overhead and variations between processors are likely to blame.
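For example, the FreeBSD MD5 rate goes from 832 c/s on the master alone to 3162 c/s across all four nodes, about a 3.8x speedup.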
Update: December 17, 2007
Sometime between November 14 and December 17 the 'red' node failed. The node would
power on, but at the point where the screen would normally come on there was a nasty
electrical crackling sound, and the power supply would shut down.
With no spares available, the cluster is now down to three nodes.
Update: July 8, 2008
It was fun, but now it's done. The space the cluster took up was needed for
another project, so skittles was taken down and disassembled, likely for good.