Introduction
| host | IP | CPU (MHz) | RAM (MB) | functions |
| ajax | 192.168.1.1 | 133 | 16 | NFS server, default route, timed server, location of master passwords |
| rosebud | 192.168.1.2 | 133 | 32 | |
| epic | 192.168.1.3 | 133 | 32 | |
| fusion | 192.168.1.4 | 200 | 128 | |
| optimus | 192.168.1.5 | 75 | 16 | |
| void | 192.168.1.6 | 75 | 32 | |
| magnum | 192.168.1.7 | 133 | 64 | X workstation |
| octane | 192.168.1.8 | 166 | 64 | |
| spectre | 192.168.1.9 | 166 | 48 | X workstation |
#!/bin/csh
# if /etc/uploads doesn't exist, create it
if ( ! -d /etc/uploads ) then
    mkdir /etc/uploads
endif
chown root /etc/uploads
chmod 700 /etc/uploads
mount ajax:/etc/uploads /etc/uploads
# copy over publicly readable files
foreach f ( hosts passwd group pwd.db motd )
    cp /etc/uploads/$f /etc/
    chmod 644 /etc/$f
    chown root /etc/$f
    chgrp wheel /etc/$f
end
# copy over private files
foreach f ( master.passwd spwd.db hourly_upload )
    cp /etc/uploads/$f /etc/
    chmod 600 /etc/$f
    chown root /etc/$f
    chgrp wheel /etc/$f
end
# copy over new crontab and install it as root's crontab
cp /etc/uploads/root /var/cron/tabs/
crontab -u root /var/cron/tabs/root
umount /etc/uploads
In addition to various configuration files, the script also copies itself out, so global changes can be made easily. The other network services added were rwho and timed. Rwho was added so one can quickly determine the load on each of the compute nodes. Timed was added so each node has a consistent clock. Clock consistency is important because the NFS server updates its uploads at five minutes to the hour, and the slaves fetch these updates on the hour. Enough clock skew could let a slave fetch a file that was only partially copied over. Timed resolved this before it became an issue.
Lastly, ssh needed to be configured to fall back to rsh so that pvm could spawn itself on each of the slave nodes. In /etc/sshd.conf, the line FallbackToRsh = Yes was uncommented.

Some scripts were created to facilitate configuring each node. The first script was downloaded via ftp from the NFS server. Running the script starts the necessary RPC services for NFS, then mounts /home. Next, it appends changes to /etc/rc.conf so that the NFS client is started on boot, and appends the NFS mount to /etc/fstab. The script then determines the hostname by looking up the machine's IP address in the hosts file, and writes it to /etc/myname so the hostname is set on the next boot. Finally, the machine was rebooted to make sure all the changes persisted.
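#!/bin/csh
# gets this node onto the network
# start the RPC services needed for NFS, then mount /home
rpcbind -l
nfsd
mount 192.168.1.1:/home /home
# make the NFS client and the /home mount persist across reboots
cat /home/conf/rc.conf >> /etc/rc.conf
cat /home/conf/fstab >> /etc/fstab
cp /home/conf/hosts /etc/hosts
# find this machine's IP address, look up its name in the hosts file,
# and record it so the hostname is set on the next boot
set addr = `cat /etc/ifconfig.* | awk '{print $2}'`
set name = `grep $addr /etc/hosts | awk '{print $3}'`
echo $name > /etc/myname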
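The hostname lookup done by the last three lines of the script can be sketched in Python as follows. This is only an illustration: the function name `lookup_hostname` and the sample hosts text (including the `.cluster` domain) are hypothetical, and the sketch assumes, as the awk `$3` suggests, that each hosts line has the form `IP full-name short-name`. Unlike the grep in the script, it compares the whole address, so it cannot match a longer address that merely contains the one being looked up.

```python
def lookup_hostname(hosts_text, addr, field=2):
    """Return column `field` (0-based) of the hosts line whose first
    column equals `addr`, or None if no line matches."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]          # drop trailing comments
        cols = line.split()
        if len(cols) > field and cols[0] == addr:
            return cols[field]
    return None


# Hypothetical hosts file contents; the real file is not shown in the text.
SAMPLE_HOSTS = """\
127.0.0.1    localhost.cluster  localhost
192.168.1.1  ajax.cluster       ajax
192.168.1.2  rosebud.cluster    rosebud
"""
```

For example, `lookup_hostname(SAMPLE_HOSTS, "192.168.1.2")` returns the short name on the matching line, which is what the script writes to /etc/myname.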
Even with only 8 machines, pvm will spawn more than one process per host if necessary, so we were able to vary two parameters (typically the number of hosts and either the number of processes or the packet size) to determine the optimal settings. On the graphs that follow, time is shown as a function of the number of hosts and either the number of processes (iteration by lines) or the packet size (iteration by blocks).
Master:
    spawn n slaves
    tell each slave the size and coordinates of the image
    send out n pieces of initial work
    for( j=0; j<height; j++ )
        for( i=0; i<width; i++ )
            receive a message from any slave
            tell that slave to find point (i, j)
    tell all slaves to halt
Slave:
    receive initial information
    while( 1 ) {
        get a point from the master
        if it's a -1, then halt
        determine if the point is in the set, and send the result back
    }
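The slave's per-point step could look like the sketch below. The text only says "the set", so this assumes, purely for illustration, that it is the Mandelbrot set and that the result sent back is an iteration count. The function name, the iteration limit of 256, and the escape radius of 2 are likewise assumptions, not taken from the original program.

```python
def escape_count(cx, cy, max_iter=256):
    """Iterate z = z*z + c from z = 0 and return how many iterations ran
    before |z| exceeded 2, or max_iter if it never did (the point is then
    treated as belonging to the set)."""
    zx = zy = 0.0
    for n in range(max_iter):
        if zx * zx + zy * zy > 4.0:           # |z|^2 > 4, i.e. |z| > 2
            return n
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    return max_iter
```

A single call is cheap compared with a network round trip, which is why the per-point message exchange dominates the run time in this scheme.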
The point-by-point algorithm is decidedly network-bound. Because the message size (8 bytes) and the response size (4 bytes) were so small, and each task (computing one point) was relatively quick, most of the time was spent sending and receiving messages. As the histogram below illustrates, the work was distributed almost evenly (the master machine accounts for slightly more work because it talks over loopback instead of the network interface). The histogram shows a static-like pattern, with a near-even distribution of the colors representing each host. The collision light on the hub was lit almost as often as the activity lights for the other hosts. The overhead involved in this algorithm is staggering: on a typical system, 5% of CPU time was devoted to interrupt handling, 40% to system (kernel) processes, 35% to the pvm daemon, and only the remaining 20% to the actual slave process. All these signs support the conclusion that most of the run time went to waiting on lower-level protocols (i.e., Ethernet collisions) rather than to computation.
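To put the message overhead in numbers, the sketch below counts the messages and payload bytes the point-by-point scheme exchanges. It assumes the same test image as the line-by-line runs (800 lines of 600 pixels), which the text does not state for this algorithm. The 8-byte request and 4-byte response come from the description above; the function name is illustrative, and PVM and Ethernet header overhead is ignored.

```python
def point_by_point_traffic(width, height, request_bytes=8, response_bytes=4):
    """Return (total messages, total payload bytes) when every point costs
    one request from the master and one response from a slave."""
    points = width * height
    messages = 2 * points                     # one request + one response
    payload = points * (request_bytes + response_bytes)
    return messages, payload


# Assumed test image size: 800 by 600 (see the hedge in the lead-in).
messages, payload = point_by_point_traffic(800, 600)
```

Under these assumptions the run exchanges 960,000 messages to move 5,760,000 bytes of payload, an average of 6 bytes per message, so nearly all of the cost is per-message overhead rather than data.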
Iteration by lines
Master:
    spawn N slaves
    tell each slave the size and coordinates of the image
    for( i=0; i<N; i++ )
        tell the slave to process line i
    for( j=N; j<width; j++ )
        receive an output from a slave
        put the output into a data structure
        tell the slave to work on the next line
    for( i=0; i<N; i++ )
        receive an output from a slave
        kill the slave that just returned something
Slave:
    receive initial information about the set
    while( 1 )
        get a line from the master
        determine the output of the line
        store the output in an output structure
        send the entire output structure to the master

The line-by-line algorithm is not network-bound the way the previous algorithm is. The message size is exactly the number of pixels in a line plus one, which for the test set is 601 integers. The number of messages is also far smaller than in the point-by-point algorithm, because only width messages are sent and width messages are received (800 in our test case). The histogram shows a pattern implying that the faster machines complete more lines. The distribution of colors in the histogram changes with the complexity of the region being generated: in a region of great complexity, the hosts' colors are decidedly not evenly distributed, as they are in an easy region. A typical run of this algorithm yielded nearly 0% interrupt handling and a similar value for system processes. The pvm daemon's share was also very low. Most of the computing time is spent on the actual slave process. This makes the algorithm more efficient than the point-by-point method, because there is less communication between the pvmd and the master. The large messages also mean fewer transmissions and hence fewer interrupts on all the systems.
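The observation that faster machines complete more lines follows from the master handing out a new line whenever a slave reports back. The small simulation below, which is not the PVM program itself, illustrates the effect. It assumes every line costs the same amount of work, that a host's speed is proportional to its clock rate, and that communication takes no time; `simulate_lines` is a hypothetical name, and the four hosts are an arbitrary subset of the table in the introduction.

```python
import heapq


def simulate_lines(speeds, num_lines, work_per_line=1.0):
    """Hand out `num_lines` equal-cost lines on demand.

    `speeds` maps host name -> relative speed. Returns a dict mapping
    host name -> number of lines that host was given.
    """
    given = {host: 0 for host in speeds}
    # Priority queue of (time at which the host is next free, host name).
    free_at = [(0.0, host) for host in sorted(speeds)]
    heapq.heapify(free_at)
    for _ in range(num_lines):
        t, host = heapq.heappop(free_at)      # whichever host is free first
        given[host] += 1
        heapq.heappush(free_at, (t + work_per_line / speeds[host], host))
    return given


# Clock rates in MHz from the host table (illustrative subset of hosts).
CLOCK_MHZ = {"fusion": 200, "octane": 166, "ajax": 133, "optimus": 75}
shares = simulate_lines(CLOCK_MHZ, 800)
```

In this model each host's share of the 800 lines is roughly proportional to its clock rate. Real lines differ in cost, which is consistent with the histogram's color distribution changing in complex regions.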

Iteration by blocks
Master:
    spawn n slaves
    tell each slave the size, coordinates of the image, and block size
    for( j=0; j<height; j+=yskip )
        for( i=0; i<width; i+=xskip )
            receive a message from any slave
            tell that slave to find block yskip by xskip at (i, j)
    tell all slaves to halt
Slave:
    receive initial information
    while( 1 ) {
        get a point from the master
        if it's a -1, then halt
        compute the block starting at that point
        send the results back to the master
    }
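The master's double loop over block origins can be sketched as a generator. This is an illustration rather than the original code: the name `block_origins` is hypothetical, and clipping the last block in each row and column to the image edge is an added assumption, since the pseudocode does not say what happens when the block size does not divide the image size evenly.

```python
def block_origins(width, height, xskip, yskip):
    """Yield (i, j, w, h) for every block, row by row. (i, j) is the
    block's starting point and (w, h) its size, clipped to the image."""
    for j in range(0, height, yskip):
        for i in range(0, width, xskip):
            yield i, j, min(xskip, width - i), min(yskip, height - j)


# Example: an 800 by 600 image cut into 100 by 100 blocks.
blocks = list(block_origins(800, 600, 100, 100))
```

Each tuple corresponds to one message from the master, so the block size directly sets the number of messages: larger blocks mean fewer, bigger messages, at the cost of coarser load balancing between fast and slow hosts.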
8 Processor timing results:

