P is a Perl library that can speed up Unix scripts
that process line-oriented files by using
an array of inexpensive machines with associated
disks.
The idea is to take a normal shell script and add
parallel directives that let it run its functions
on several machines, each working on a piece of the
input files to produce a piece of the output
file. The pieces of the output are merged back
together if need be. No new programming
needs to be done: the directives
speed up existing programs by running
them in parallel.
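The split, process, and merge flow described above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not P's directive syntax, which is not shown here. The function names (split_lines, process_piece, run_parallel) are hypothetical, the per-line step is an arbitrary grep-like filter, and local worker threads stand in for the separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_lines(lines, n_pieces):
    """Split a list of lines into n_pieces contiguous, near-equal chunks."""
    size, extra = divmod(len(lines), n_pieces)
    pieces, start = [], 0
    for i in range(n_pieces):
        # The first `extra` pieces each take one additional line.
        end = start + size + (1 if i < extra else 0)
        pieces.append(lines[start:end])
        start = end
    return pieces


def process_piece(piece):
    """Hypothetical per-line step: keep lines containing 'ERROR'."""
    return [line for line in piece if "ERROR" in line]


def run_parallel(lines, n_workers=4):
    """Process each piece in its own worker, then merge in input order."""
    pieces = split_lines(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map returns results in the order of the input pieces.
        results = list(pool.map(process_piece, pieces))
    # Merge step: concatenate the output pieces back into one output.
    return [line for part in results for line in part]
```

Because the step handles each line independently and the pieces are merged in their original order, the parallel result matches what a single sequential pass over the whole input would produce.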
P is meant to let non-computer scientists do first-class
data mining on the largest datasets using inexpensive
hardware. Here are some topics that explain the idea
behind P: