--------------------------------------------------------------------------------
- Required CGI arguments:

  identifier=

  from_url=:
    Must start with 'http://', 'ftp://' or 'rsync://'.

    For 'http://' and 'ftp://' urls, from_url is to be a single file that
    will be transferred into an EXISTING item's directory, overwriting the
    file if it already exists.

    For 'ftp://' urls that END with '/', we will treat the url as a
    directory and recursively copy the contents into the item. If the
    directory represented by the url contains additional subdirs, those
    subdirs will be preserved. For example, if these files exist:
      ftp://moo.org/tmp/stairs/a.txt
      ftp://moo.org/tmp/stairs/b.txt
    and you use 'from_url=ftp://moo.org/tmp/', the item will end up with a
    new subdir 'stairs' with 'a.txt' and 'b.txt' in it.
    Do not use '/./' or '/../' strings with your url when the remote url
    has subdirs!

    For 'rsync://' urls, from_url may be a single file (like above) by
    using the 'update_mode=1' argument. If 'update_mode=1' is NOT present,
    the rsync url is an entire directory that will entirely REPLACE an
    existing item's directory with the contents of the rsync url's
    directory, or create it if it does not exist (this is typically ONLY
    used for shuffling an existing item from one server to another).

    The 'basename' of the from_url will be used for the name of the
    destination file unless the optional argument 'filename' is used.

--------------------------------------------------------------------------------
- Optional CGI arguments:

  update_mode=1:
    Only makes sense when the 'from_url' argument is using 'rsync://'.
    This indicates the rsync source is NOT a directory and the 'from_url'
    is an individual file instead.

  submitter=
    Used to help us determine which person submitted this request.

  comment=
    Allows the submitter to add some extra information to the submission.

  nolo=1:
    NO LOcate. This can *only* be used for *new items*. It makes our
    system perform a bit faster and verifies that the identifier does
    *not* locate later, when the task actually starts up.
    You should be fairly confident the identifier you are using is not in
    use already, or else your task will red row until an admin can sort the
    item out, likely ending in the need for you to resubmit with a
    different identifier.

  checksum=0:
    For items being entirely replaced via rsync source, or items adding
    files via rsync source: if the task is being run for the first time
    (ie: not a rerun), do *not* add the " -c " param to rsync to do full
    checksumming of source files vs. dest files during the transfer.

--------------------------------------------------------------------------------
- Optional (less common) CGI arguments:

  filename=:
    Used to override the basename of the 'from_url' argument. For example,
    if you want to create a '_meta.xml' file with a CGI script on some
    other server, you can set this to '_meta.xml'.
    This can NOT be used when the 'from_url' is using 'rsync://' AND the
    'update_mode=1' param is NOT set (in that case this param will be
    ignored).

  md5=:
    Checksum for the 'from_url' file. When used, after copying the from_url
    file, the script will verify that the MD5 of the destination file is
    the same as this value. When set to the special checksum '0', that
    means: "compute the md5 for me and insert it into the '_files.xml'".
    (Typically this is used to produce a verifiable '_files.xml'.)

  verify=1:
    Will use the verifiable '_files.xml' file (with full MD5 checksums for
    each and every file in the item) after a transfer to ensure that all
    files in the item have the expected MD5s.

  shuffle=1:
    Indicates we are shuffling an item from one machine/drive to another
    machine/drive.

  delete_from_source=2:
    Means the source directory of the from_url should be deleted after
    successful transfer. (Typically only used during shuffle.)

  item_size=:
    When known, total size of item in KB. (Typically only used during
    shuffle.)

  filesize=:
    When known, size of file(s) in KB.

  priority=:
    [Value may be -127..127] HIGHER value indicates HIGHER priority (run
    sooner).
    Task priority defaults to 0 unless specified otherwise.
    PLEASE use -2 or lower when submitting bulk tasks (like S3) to us!

  imagecount=:
    Some tasks can use this number for graphing purposes. It's the number
    of pages in a book or the number of seconds in a video.

  to=:
    Used for shuffling a single item to the hostname and subdir indicated.

  to_host_type=SOLO:
    Places an item being newly created on a SOLO node (no backup copy),
    instead of the usual paired storage.

  next_cmd=:
    Can be used to cause another task (eg: 'derive') to queue after this
    'task' to copy the file(s) to the item is done. You may pass arguments
    to the task via urlencoding.
    Simple example:  'next_cmd=derive'.
    Complex example: 'next_cmd=book_op%3Fcmd%3Dstage2%26next_cmd%3Dderive'.

  new_item_http=1:
    Allows a new item to start out with a 'from_url' that starts with
    'http://'.

  rsync_by_file=1:
    Allows a new item to start out with an rsync URL that refers to a
    single file rather than to a directory.

  no_derive=1:
    Indicates that after the file(s) have been copied, no matter what, do
    not kick off a subsequent derive task. (Certain situations trigger a
    derive normally.)

  replacing=1:
    Indicates that the invoker is using an 'rsync://' url in 'from_url' and
    is specifically overwriting (or updating) the entire contents of an
    item that already exists.

  exclude=:
    Typically used when the invoker is using an 'rsync://' url in
    'from_url' and wants to exclude files on the source from being
    transferred, like: *_cr2.tar

  stub_meta=1:
    Typically used when the invoker is using an 'rsync://' url in
    'from_url' and would like the barest of _meta.xml files created for the
    item (if none exists after the contents in the rsync url have been
    copied into the item).

  stub_files=1:
    Typically used when the invoker is using an 'rsync://' url in
    'from_url' and would like the barest of _files.xml files created for
    the item (if none exists after the contents in the rsync url have been
    copied into the item).
    By default, this will also trigger <format> picking for each file and
    derive queuing for new items.

  scribe=
    Indicates to the catalog that the operation is related to the scribe
    scanning process. Valid arguments are 1, 2, or 3, where 1 indicates a
    Windows Scribe station, 2 indicates a Linux Scribe station, and 3
    indicates a Windows Microfilm station.

  tester=
    Indicates a 'tester' arg to be passed to the catalog; catalog will run
    the version of archive.php in /home//petabox and pass the 'tester' arg
    along to any succeeding commands.

  delete_single_file=
    Delete a single file from an item, careful with this!

  s3_keep_old_version=(0|1)
    If set to 1 will cause deletes and updates via the s3-put mechanism to
    create emacs style version files.

  rush=1
    If the task is known to have very small updates (eg: XML changes to
    item) this can be set to cause quicker consideration when evaluating
    resource limits at runtime.

  complete_multipart_upload_id=
    Stitch and complete an s3 multipart upload already in the item.
    Requires complete_multipart_upload_filename to specify the file the
    upload is assembled into.

  complete_multipart_upload_filename=
    Filename to put the multipart upload into.

  delete_multipart_upload_id=
    Remove an incomplete s3 multipart upload (and all its component parts)
    from an item.

  overwrite=1
    Allow upload task in mode scribe=2 (books from scribes) to ignore check
    for existing image stack and scandata.

  from_SEC_allowed=1
    If the from_url host is an archive.org PRIMARY datanode, allow the file
    to be copied from the SECONDARY instead when all of the following
    apply:
      - the PRI isn't in the "site" (datacenter) we're running in
      - that host's SEC *is* in the datacenter we're running in
      - the mtime of the itemdir on the PRI is no later than that on the SEC

--------------------------------------------------------------------------------
- Way to upload multiple files to EXISTING item:

  You may use extra CGI arguments to indicate 2nd, 3rd, ...
  files to transfer via 'from_url2', 'from_url3', ...
  Files will be copied in order: from_url, from_url2, from_url3, etc...
  You may pair an 'md5', 'filename', and/or 'filesize' argument to the file
  with arguments like:
    'filename2=my2ndfile.jpg'
    'md52=8c99426b825e6fff0bf3ad00fd0f7c07'
    'md53=83ff480e9ceba842ea358eea4bdda729'
    'filename4=my4thfile.jpg'
    'filesize2=11912'

--------------------------------------------------------------------------------
- Complex item example
  (to add three files into existing item; will verify one file's MD5):

    identifier=XXX
    priority=-3
    submitter=XXX@archive.org
    update_mode=1
    from_url=[url dir]/XXX.mov
    filesize=2445667
    from_url1=[url dir]/XXX.avi
    md51=XXX
    filesize1=122912
    from_url2=http://archive.org/services/arc_meta.php?identifier=XXX
    md52=0
    filename2=XXX_meta.xml

--------------------------------------------------------------------------------
PLEASE check the from_url to verify the FTP, HTTP, or rsync (file or directory)
works first when doing bulk submissions! 8-)
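
--------------------------------------------------------------------------------
- Sketch: building the CGI argument string

  The following is a minimal Python sketch of how a submitter might assemble
  and url-encode the arguments described above, including a nested 'next_cmd'
  and numbered per-file arguments. It only builds the query string; where the
  string is submitted is not covered here. The helper names 'numbered' and
  'build_query', the identifier 'example-item', and the moo.org urls are
  hypothetical, chosen for illustration only. The numeric suffix is passed
  explicitly, so use whichever numbering your submission requires.

```python
from urllib.parse import urlencode


def numbered(suffix, **args):
    """Append a numeric suffix to each argument name, e.g. md5 -> md52."""
    return [(f"{name}{suffix}", str(value)) for name, value in args.items()]


def build_query(pairs):
    """URL-encode (name, value) pairs into a query string, keeping their order."""
    return urlencode(pairs)


# Placeholder values (assumptions): 'example-item' and the moo.org urls.
pairs = [
    ("identifier", "example-item"),
    ("priority", "-3"),  # -2 or lower is requested for bulk submissions
    ("from_url", "ftp://moo.org/tmp/a.txt"),
    # A next_cmd that carries its own arguments is url-encoded as a whole.
    ("next_cmd", "book_op?cmd=stage2&next_cmd=derive"),
]
# Second file: md5=0 asks for the MD5 to be computed and put in '_files.xml'.
pairs += numbered(2, from_url="ftp://moo.org/tmp/b.txt", md5="0",
                  filename="b_meta.xml")

query = build_query(pairs)
print(query)
```

  Keeping the arguments as an ordered list of pairs, rather than a dict, makes
  the file order explicit and lets the same helper produce any numbered
  argument ('from_url', 'md5', 'filename', 'filesize').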