Each processor has N input connections and N output connections.
Each processor runs a single function called run()
once per
communication clock cycle. (Note: the communication
clock cycle is
the frequency at which messages can be sent or received.
It is
significantly slower than the processor clock cycle.)
For each
communications cycle run is called with an array
of inputs,
containing one value for each of the input connections.
As the
function runs it can send one message out along each
of its N output
connections. (Note: The outputs messages are collected
and sent out
all at once at the beginning of the next communication
cycle. In
other words, all communication is synchronous.)
When the parallel machine is turned on, the value for
the variable
CURRENT is initialized to an integer. (Note:
Every processor
has its own independent value of CURRENT.)
On the first communication
cycle the function first_cyle is run.
From then on the function
run is called for each subsequent cycle.
You are given the following definitions:
int CURRENT;
run(int *inputs)
{
int i;
int *sorted;
sorted = sort_ints(inputs);
CURRENT = sorted[N/2];
for (i = 0; i<N; i++)
send(i, sorted[i]);
}
first_cycle(int *inputs)
{
int i;
for (i = 0; i<N; i++)
send(i, CURRENT);
}
Note: the variable N is set to the number of connections.
The
expression N/2 yields the largest integer less than or
equal to
N/2.
You are told that these functions are useful for building
a parallel
machine that can sort integers quickly. (Note:
sort_ints
takes
an array of ints and returns an array which is sorted
smallest to
largest.)
A) You are given 5 processors labelled A through E, each
with 3 inputs
and 3 outputs. Based on the definitions above,
draw a connection
diagram so that the parallel system will eventually sort
the numbers
initially stored in CURRENT (the smallest value
should appear in
A, the largest value in E).
B) Place the values 3 in A, 4 in B, 2 in C, 1 in D, and
5 in E. Show
values for CURRENT and the messages on the second
cycle. Show the
values for CURRENT and the messsage on the 10th
cycle.
C) In the general case where there are M processors and
M integers
arranged in this fashion, how many communication cycles
are required
to sort the numbers? Please use "theta" notation.
E) Ben Bitdiddle shows up and suggests that instead we
build our
machine with processors that have M+1 connections.
"That would make the
machine really fast!", he says. Could the programs
given above be
used to wire up a machine with M processors, M+1 inputs
connections, and
M+1 output connections? Describe how this would work.
How many
communication cyles are required to sort the numbers?
F) Does Ben's idea make any sense? Is this really
going to save
anything over a serial implementation.
G) (Extra Credit) Alyssa shows up and suggests that it
might be better
to build a machine with 4 input connections and 4 outputs.
Can you
propose a wiring scheme for such a machine? How
would it work? Would
it work better than the original 2 input design?