--------------------------------------------------------------------------------
- Required CGI arguments:

identifier=<identifier>

from_url=<url>:
    Must start with 'http://', 'ftp://' or 'rsync://'.

    For 'http://' and 'ftp://' urls, from_url is to be a single file that
    will be transferred into an EXISTING item's directory, overwriting
    if the single file already exists.

    For 'ftp://' urls that END with '/', we will treat the url as a directory
    and recursively copy the contents into the item.
    If the directory represented by the url contains additional subdirs, those
    subdirs will be preserved.  For example, if these files exist:
      ftp://moo.org/tmp/stairs/a.txt
      ftp://moo.org/tmp/stairs/b.txt
    and you use 'from_url=ftp://moo.org/tmp/', the item will end up with
    new subdir 'stairs' with 'a.txt' and 'b.txt' in it.
    Do not use '/./' or '/../' strings with your url when remote url has subdirs!

    For 'rsync://' urls, from_url may be a single file (like above)
    by using 'update_mode=1' argument.  If 'update_mode=1' is NOT present,
    the rsync url is an entire directory that will entirely REPLACE an existing
    item's directory with the contents of the rsync url's directory, or create
    the item if it does not exist (this is typically ONLY used for shuffling an
    existing item from one server to another).

    The 'basename' of the from_url will be used for the name of destination
    file unless optional argument 'filename' is used.
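
A minimal sketch of the from_url rules above, for checking a submission on the
client side before sending it.  The function name 'destination_name' is
hypothetical, and taking the basename from the url path only (ignoring any
'?query' part) is an assumption, not confirmed server behavior.

```python
import posixpath
from urllib.parse import urlsplit

# Prefixes listed in the from_url description above.
ALLOWED_PREFIXES = ("http://", "ftp://", "rsync://")


def destination_name(from_url, filename=None):
    """Return the destination file name, or None when the url names a directory."""
    if not from_url.startswith(ALLOWED_PREFIXES):
        raise ValueError("from_url must start with http://, ftp:// or rsync://")
    if filename:
        # The optional 'filename' argument overrides the basename.
        return filename
    path = urlsplit(from_url).path
    if path.endswith("/"):
        # A trailing '/' is treated as a directory, so there is no single name.
        return None
    return posixpath.basename(path)
```

For example, 'ftp://moo.org/tmp/stairs/a.txt' gives 'a.txt', while
'ftp://moo.org/tmp/' gives None because it is a directory url.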


--------------------------------------------------------------------------------
- Optional CGI arguments:

update_mode=1:
    Only makes sense when 'from_url' argument is using 'rsync://'.
    This indicates rsync source is NOT a directory and the 'from_url' is
    an individual file instead.

submitter=<email or other identifying string>
    Used to help us determine which person submitted this request.

comment=<free form string>
    Allows submitter to add some extra information to the submission.

nolo=1:
    NO LOcate.  This can *only* be used for *new items*.  It makes our system
    perform a bit faster and defers verifying that the identifier does *not*
    locate until the task actually starts up.  You should be fairly confident
    the identifier you are using is not in use already, or else your task will
    red row until an admin can sort the item out, likely ending in the need for
    you to resubmit with a different identifier.

checksum=0:
    For items being entirely replaced via rsync source or items adding files
    via rsync source; if task is being run first time (ie: not a rerun)
    do *not* add the " -c " param to rsync to do full checksumming of
    source files vs. dest files, during the transfer.

--------------------------------------------------------------------------------
- Optional (less common) CGI arguments:

filename=<destination name>:
    Used to override the basename of the 'from_url' argument.
    For example, if you want to create a '_meta.xml' file with a CGI script,
    on some other server, you can set this to '<identifier>_meta.xml'.
    This can NOT be used when: the 'from_url' is using 'rsync://' AND
    the 'update_mode=1' param is NOT set (and thus param will be ignored).


md5=<checksum>:
    Checksum for the 'from_url' file.
    When used, after copying the from_url file, the script will take this MD5
    and verify that the md5 of the destination file is the same as this value.
    When set to special checksum '0', that means:
       "compute the md5 for me and insert it into the '_files.xml'".
       (Typically this is used to produce a verifiable '_files.xml'.)
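
If you want to pass a real checksum rather than '0', the value can be computed
locally before submitting.  This is a sketch using Python's standard hashlib;
the helper name 'file_md5' is hypothetical and not part of this service.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at 'path', read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned 32-character hex string is what would go in the 'md5' argument.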

verify=1:
    Uses the verifiable '_files.xml' file (with full MD5 checksums for each
    and every file in the item) after a transfer to ensure that all files in
    the item have the expected MD5s.

shuffle=1:
    Indicates we are shuffling an item from one machine/drive to another
    machine/drive.

delete_from_source=2:
    Means the source directory of the from_url should be deleted after
    successful transfer.  (Typically only used during shuffle.)

item_size=<number>:
    When known, total size of item in KB.  (Typically only used during shuffle.)

filesize=<number>:
    When known, size of file(s) in KB.

priority=<number>:
    [Value may be -127..127]
    HIGHER value indicates HIGHER priority (run sooner).
    Task priority defaults to 0 unless specified otherwise.
    PLEASE use -2 or lower when submitting bulk tasks (like S3) to us!

imagecount=<number>:
    Some tasks can use this number for graphing purposes.  It's the number
    of pages in a book or the number of seconds in a video.

to=<destination url>:
    Used for shuffling a single item to the hostname and subdir indicated.

to_host_type=SOLO:
    Places an item being newly created on a SOLO node (no backup copy), instead
    of the usual paired storage.

next_cmd=<work task>:
    Can be used to cause another task (eg: 'derive') to queue once this task
    has finished copying the file(s) to the item.  You may pass arguments to
    that task via urlencoding.
    Simple example: 'next_cmd=derive'.
    Complex example: 'next_cmd=book_op%3Fcmd%3Dstage2%26next_cmd%3Dderive'.
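
The complex value above is 'book_op?cmd=stage2&next_cmd=derive' with the '?',
'=' and '&' characters percent-encoded.  A sketch of producing such a value
with Python's standard library follows; the helper name 'encode_next_cmd' is
hypothetical.

```python
from urllib.parse import quote, urlencode


def encode_next_cmd(task, **task_args):
    """Build a next_cmd value: a task name plus optional urlencoded arguments."""
    if not task_args:
        # Simple case: just the task name, eg 'derive'.
        return task
    inner = f"{task}?{urlencode(task_args)}"
    # safe="" makes quote() encode '?', '=' and '&' as well.
    return quote(inner, safe="")
```

Note the returned string is already encoded, so it should be appended to the
request as-is rather than being encoded a second time.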

new_item_http=1:
    Allows a new item to start out with a 'from_url' that starts with 'http://'

rsync_by_file=1:
    Allows a new item to start out with an rsync URL that refers to a single
    file rather than to a directory.

no_derive=1:
    Indicates that after the file(s) have been copied, no derive task should be
    kicked off, no matter what.  (Certain situations normally trigger a derive.)

replacing=1:
    Indicates that the invoker is using an 'rsync://' url in 'from_url' and
    is specifically overwriting (or updating) the entire contents of an
    item that already exists.

exclude=<pattern>:
    Typically used when invoker is using an 'rsync://' url in 'from_url' and
    wants to exclude files on source from being transferred, like:
     *_cr2.tar

stub_meta=1:
    Typically used when invoker is using an 'rsync://' url in 'from_url' and
    would like the barest of _meta.xml files created for the item (if none
    exists after the contents in rsync url have been copied into item).

stub_files=1:
    Typically used when invoker is using an 'rsync://' url in 'from_url' and
    would like the barest of _files.xml files created for the item (if none
    exists after the contents in rsync url have been copied into item).
    By default, this will also trigger <format> picking for each file
    and derive queuing for new items.

scribe=<number>
    Indicates to the catalog that the operation is related to the scribe
    scanning process. Valid arguments are 1, 2, or 3, where 1 indicates
    a Windows Scribe station, 2 indicates a Linux Scribe station, and
    3 indicates a Windows Microfilm station.

tester=<name>
    Indicates a 'tester' arg to be passed to the catalog; catalog will run the
    version of archive.php in /home/<name>/petabox and pass the 'tester' arg
    along to any succeeding commands.

delete_single_file=<filename>
    Delete a single file from an item.  Be careful with this!

s3_keep_old_version=(0|1)
    If set to 1, deletes and updates via the s3-put mechanism will
    create emacs-style version files.

rush=1
    If the task is known to have very small updates (eg: XML changes to item)
    this can be set to cause quicker consideration when evaluating resource
    limits at runtime.

complete_multipart_upload_id=<id>
    Stitch and complete an s3 multipart upload already in the item.
    Requires complete_multipart_upload_filename to specify the
    file the upload is assembled into.

complete_multipart_upload_filename=<filename>
    Filename to put the multipart upload into.

delete_multipart_upload_id=<id>
    Remove an incomplete s3 multipart upload (and all its component parts)
    from an item.

overwrite=1
    Allow upload task in mode scribe=2 (books from scribes) to ignore check for
    existing image stack and scandata.

from_SEC_allowed=1
    If the from_url host is an archive.org PRIMARY datanode, allow the file to
    be copied from the SECONDARY instead when all of the following apply:
      - the PRI isn't in the "site" (datacenter) we're running in
      - that host's SEC *is* in the datacenter we're running in
      - the mtime of the itemdir on the PRI is no later than that on the SEC
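
The three conditions above combine as a plain AND.  As a sketch only: the
function and parameter names below are hypothetical, and how the real task
looks up sites and mtimes is not described here.

```python
def may_copy_from_secondary(local_site, pri_site, sec_site, pri_mtime, sec_mtime):
    """Return True when all three from_SEC_allowed conditions hold."""
    return (
        pri_site != local_site        # the PRI is not in our datacenter
        and sec_site == local_site    # the SEC is in our datacenter
        and pri_mtime <= sec_mtime    # the PRI itemdir is no newer than the SEC
    )
```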

--------------------------------------------------------------------------------
- Way to upload multiple files to EXISTING item:

  You may use extra CGI arguments to indicate 2nd, 3rd, ... files to transfer
  via 'from_url2', 'from_url3', ...
  Files will be copied in order from_url, from_url2, from_url3, etc...
  You may pair an 'md5', 'filename', and/or 'filesize' argument with a given
  file by using arguments like:
    'filename2=my2ndfile.jpg'
    'md52=8c99426b825e6fff0bf3ad00fd0f7c07'
    'md53=83ff480e9ceba842ea358eea4bdda729'
    'filename4=my4thfile.jpg'
    'filesize2=11912'
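
A sketch of generating the numbered arguments described above from a list of
files: the first file uses the plain names, and the Nth file (N >= 2) gets N
appended to each name.  The function name 'numbered_args' and the dict layout
are hypothetical.

```python
PER_FILE_KEYS = ("from_url", "filename", "md5", "filesize")


def numbered_args(files):
    """Map a list of per-file dicts to from_url, from_url2, md52, ... arguments."""
    args = {}
    for position, spec in enumerate(files, start=1):
        if "from_url" not in spec:
            raise ValueError(f"file {position} has no from_url")
        # The first file is unsuffixed; later files carry their position.
        suffix = "" if position == 1 else str(position)
        for key in PER_FILE_KEYS:
            if key in spec:
                args[key + suffix] = spec[key]
    return args
```

The resulting dict can be merged with 'identifier' and the other arguments
before the request is encoded.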


--------------------------------------------------------------------------------
- Complex item example (to add three files into existing item; will verify one file's MD5):
identifier=XXX
priority=-3
submitter=XXX@archive.org
update_mode=1

from_url=[url dir]/XXX.mov
filesize=2445667

from_url2=[url dir]/XXX.avi
md52=XXX
filesize2=122912

from_url3=http://archive.org/services/arc_meta.php?identifier=XXX
md53=0
filename3=XXX_meta.xml


--------------------------------------------------------------------------------
PLEASE check the from_url to verify the FTP, HTTP, or rsync (file or directory)
works first when doing bulk submissions!  8-)