Problem 2: Designing a Parallel Machine

    In this problem we will design a parallel machine for sorting
    numbers.

    Each processor has N input connections and N output connections.

    Each processor runs a single function called run() once per
    communication clock cycle.  (Note: the communication clock cycle is
    the frequency at which messages can be sent or received.  It is
    significantly slower than the processor clock cycle.)  For each
    communications cycle run is called with an array of inputs,
    containing one value for each of the input connections.  As the
    function runs it can send one message out along each of its N output
    connections.  (Note: The outputs messages are collected and sent out
    all at once at the beginning of the next communication cycle.  In
    other words, all communication is synchronous.)

    When the parallel machine is turned on, the value for the variable
    CURRENT is initialized to an integer.  (Note: Every processor
    has its own independent value of CURRENT.)  On the first communication
    cycle the function  first_cyle is run.  From then on the function
    run is called for each subsequent cycle.

    You are given the following definitions:

    int CURRENT;

    run(int *inputs)
    {
        int i;
        int *sorted;

        sorted = sort_ints(inputs);

        CURRENT = sorted[N/2];

        for (i = 0; i<N; i++)
           send(i, sorted[i]);
    }
     

    first_cycle(int *inputs)
    {
        int i;
        for (i = 0; i<N; i++)
           send(i, CURRENT);
    }

    Note: the variable N is set to the number of connections.  The
    expression N/2 yields the largest integer less than or equal to
    N/2.

    You are told that these functions are useful for building a parallel
    machine that can sort integers quickly.  (Note: sort_ints takes
    an array of ints and returns an array which is sorted smallest to
    largest.)

    A) You are given 5 processors labelled A through E, each with 3 inputs
    and 3 outputs.  Based on the definitions above, draw a connection
    diagram so that the parallel system will eventually sort the numbers
    initially stored in CURRENT (the smallest value should appear in
    A, the largest value in E).

    B) Place the values 3 in A, 4 in B, 2 in C, 1 in D, and 5 in E.  Show
    values for CURRENT and the messages on the second cycle.  Show the
    values for CURRENT and the messsage on the 10th cycle.

    C) In the general case where there are M processors and M integers
    arranged in this fashion, how many communication cycles are required
    to sort the numbers? Please use "theta" notation.

    E) Ben Bitdiddle shows up and suggests that instead we build our
    machine with processors that have M+1 connections.  "That would make the
    machine really fast!", he says.  Could the programs given above be
    used to wire up a machine with M processors, M+1 inputs connections, and
    M+1 output connections?  Describe how this would work.  How many
    communication cyles are required to sort the numbers?

    F) Does Ben's idea make any sense?  Is this really going to save
    anything over a serial implementation.

    G) (Extra Credit) Alyssa shows up and suggests that it might be better
    to build a machine with 4 input connections and 4 outputs.  Can you
    propose a wiring scheme for such a machine?  How would it work?  Would
    it work better than the original 2 input design?