After two straight all-nighters spent working on his Beta design, Ben Bitdiddle rushes off to his final 6

Problem 2: Designing a Parallel Machine

In this problem we will design a parallel machine for sorting

numbers.

Each processor has N input connections and N output connections.

Each processor runs a single function called run() once per
communication clock cycle. (Note: the communication clock cycle is
the frequency at which messages can be sent or received. It is
significantly slower than the processor clock cycle.) For each
communications cycle run is called with an array of inputs,
containing one value for each of the input connections. As the
function runs it can send one message out along each of its N output
connections. (Note: The outputs messages are collected and sent out
all at once at the beginning of the next communication cycle. In
other words, all communication is synchronous.)

When the parallel machine is turned on, the value for the variable
CURRENT is initialized to an integer. (Note: Every processor
has its own independent value of CURRENT.) On the first communication
cycle the function first_cyle is run. From then on the function
run is called for each subsequent cycle.

You are given the following definitions:

int CURRENT;

run(int *inputs)
{
int i;
int *sorted;

sorted = sort_ints(inputs);

CURRENT = sorted[N/2];

for (i = 0; i<N; i++)
send(i, sorted[i]);
}

first_cycle(int *inputs)
{
    int i;
    for (i = 0; i<N; i++)
       send(i, CURRENT);
}

Note: the variable N is set to the number of connections. The
expression N/2 yields the largest integer less than or equal to
N/2.

You are told that these functions are useful for building a parallel
machine that can sort integers quickly. (Note: sort_ints takes
an array of ints and returns an array which is sorted smallest to
largest.)

A) You are given 5 processors labelled A through E, each with 3 inputs
and 3 outputs. Based on the definitions above, draw a connection
diagram so that the parallel system will eventually sort the numbers
initially stored in CURRENT (the smallest value should appear in
A, the largest value in E).

B) Place the values 3 in A, 4 in B, 2 in C, 1 in D, and 5 in E. Show
values for CURRENT and the messages on the second cycle. Show the
values for CURRENT and the messsage on the 10th cycle.

C) In the general case where there are M processors and M integers
arranged in this fashion, how many communication cycles are required
to sort the numbers? Please use "theta" notation.

E) Ben Bitdiddle shows up and suggests that instead we build our
machine with processors that have M+1 connections. "That would make the
machine really fast!", he says. Could the programs given above be
used to wire up a machine with M processors, M+1 inputs connections, and
M+1 output connections? Describe how this would work. How many
communication cyles are required to sort the numbers?

F) Does Ben's idea make any sense? Is this really going to save
anything over a serial implementation.

G) (Extra Credit) Alyssa shows up and suggests that it might be better
to build a machine with 4 input connections and 4 outputs. Can you
propose a wiring scheme for such a machine? How would it work? Would
it work better than the original 2 input design?