This is extracted from an email from Jaime Frey on a Globus mailing list: http://www-unix.globus.org/mail_archive/developers/2002/04/msg00007.html

New Job State
-------------
GLOBUS_GRAM_PROTOCOL_JOB_STATE_UNSUBMITTED
Indicates that the job has not been submitted. This state will never be
seen by old clients. It is returned only in response to client
calls (status, register, signal, etc.) made before the job has been
submitted.

New RSL Attributes
------------------
(save_state=yes)
Causes the jobmanager to save job state/information to a persistent file
on disk. If the jobmanager crashes, the client can later start up a new
jobmanager that can take over watching of the job.

(two_phase=<int>)
Implement a two-phase commit for job submission and completion. The
jobmanager will respond to the initial job request with a
WAITING_FOR_COMMIT error. It will then wait for a signal from the client
before doing the actual job submission. The integer supplied is the number
of seconds the jobmanager should wait before timing out. If the jobmanager
times out before receiving a commit signal (or the client issues a
cancel), the jobmanager will clean up the job's files and exit (after
sending a FAILED callback). After the jobmanager sends a DONE or FAILED
callback (the final callback), it will wait for a commit signal from the
client. If it receives one, it cleans up and exits as usual. If it times
out and save_state was enabled, it will leave all the job's files in-place
and exit (assuming the client is down and will attempt a job restart
later). The timeout value can be extended via a signal.
When one of the following errors occurs, the jobmanager does not delete
the job state file when it exits. Since it can be restarted in these
cases, it doesn't wait for the commit signal after sending the FAILED
callback:
GLOBUS_GRAM_PROTOCOL_ERROR_COMMIT_TIMED_OUT
GLOBUS_GRAM_PROTOCOL_ERROR_TTL_EXPIRED
GLOBUS_GRAM_PROTOCOL_ERROR_JM_STOPPED
GLOBUS_GRAM_PROTOCOL_ERROR_USER_PROXY_EXPIRED
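
From the client's side, the two-phase exchange described above comes down
to deciding which commit signal, if any, is due after each message from
the jobmanager. The Python sketch below illustrates that decision. It is
not taken from any GRAM client library: the function names and the use of
plain strings for job states are hypothetical. The numeric error codes are
the ones listed under New Errors, and the signal names are the ones listed
under New GRAM Signals.

```python
from typing import Optional

# Error codes as listed under "New Errors" in this document.
ERROR_WAITING_FOR_COMMIT = 110
ERROR_COMMIT_TIMED_OUT = 111
ERROR_TTL_EXPIRED = 125
ERROR_JM_STOPPED = 130
ERROR_USER_PROXY_EXPIRED = 131

# Signal names as listed under "New GRAM Signals" (kept as strings here).
COMMIT_REQUEST = "GLOBUS_GRAM_PROTOCOL_JOB_SIGNAL_COMMIT_REQUEST"
COMMIT_END = "GLOBUS_GRAM_PROTOCOL_JOB_SIGNAL_COMMIT_END"

# Failures after which the jobmanager keeps the state file and exits
# without waiting for a final commit signal.
NO_FINAL_COMMIT = frozenset({
    ERROR_COMMIT_TIMED_OUT,
    ERROR_TTL_EXPIRED,
    ERROR_JM_STOPPED,
    ERROR_USER_PROXY_EXPIRED,
})


def signal_after_request(error_code: int) -> Optional[str]:
    """Signal due after the reply to the initial job request (hypothetical helper)."""
    if error_code == ERROR_WAITING_FOR_COMMIT:
        # The jobmanager is holding the job until the client commits.
        return COMMIT_REQUEST
    return None


def signal_after_callback(state: str, error_code: int = 0) -> Optional[str]:
    """Signal due after a callback for a two_phase job (hypothetical helper)."""
    if state == "DONE":
        return COMMIT_END
    if state == "FAILED" and error_code not in NO_FINAL_COMMIT:
        return COMMIT_END
    # Non-final states, and failures the jobmanager does not wait on.
    return None
```

For example, signal_after_callback("FAILED", ERROR_JM_STOPPED) returns
None, because the jobmanager has already exited and left the state file in
place for a later restart.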

(restart=<old jm contact>)
Start a new jobmanager but instead of submitting a new job, start watching
over an existing job. The jobmanager will search for the job state file
created by the original jobmanager (requires that save_state was enabled
in the original submission). If it finds the file and successfully reads
it, it will become the new watcher of the job, sending callbacks on status
and streaming stdout/err if appropriate. It can fail if it detects that
the old jobmanager is still alive (via a timestamp in the state file). If
stdout/err was being streamed over the network, new stdout and stderr
attributes can be specified in the restart RSL and the jobmanager will
stream to the new locations (useful when output is going to a GASS server
started by the client that's listening on a dynamic port, and the client
was restarted). The new jobmanager will return a new contact string that
should be used to communicate with it. If a jobmanager is restarted
multiple times, any of the previous contact strings can be given for the
restart attribute.

(stdout_position=<int>)
(stderr_position=<int>)
Can be specified as part of a job restart RSL. Specifies the position in
the file from which streaming should resume for streamed output.
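
A restart RSL that combines the restart attribute with new output
locations and positions could be assembled as in the Python sketch below.
The helper name, the example host names, and the choice to leave values
unquoted are assumptions made for illustration. Values containing
characters that are special in RSL may need quoting, which this sketch
does not attempt.

```python
from typing import Optional


def build_restart_rsl(
    old_contact: str,
    stdout: Optional[str] = None,
    stderr: Optional[str] = None,
    stdout_position: Optional[int] = None,
    stderr_position: Optional[int] = None,
) -> str:
    """Build a restart RSL string (hypothetical helper, values left unquoted)."""
    relations = [("restart", old_contact)]
    optional = [
        ("stdout", stdout),
        ("stderr", stderr),
        ("stdout_position", stdout_position),
        ("stderr_position", stderr_position),
    ]
    # Only include the attributes the caller actually supplied.
    relations += [(name, value) for name, value in optional if value is not None]
    # "&" introduces a conjunction of (attribute=value) relations in RSL.
    return "&" + "".join(f"({name}={value})" for name, value in relations)


# Hypothetical contact string and GASS URL, for illustration only.
rsl = build_restart_rsl(
    "https://gatekeeper.example.org:40001/1/2/",
    stdout="https://client.example.org:5000/out",
    stdout_position=1024,
)
```

The stdout and stderr attributes are only worth supplying when the
original job streamed that output, since streaming cannot be switched on
or off by a restart.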

(remote_io_url=<url base>)
Writes the given url to a file and puts GLOBUS_REMOTE_IO_URL=<path to file>
in the job's environment. If specified as part of a job restart RSL,
updates the contents of the file. This is intended for jobs that want to
access files from the client via GASS, but the port the GASS server is
listening on can change if the client crashes and recovers.
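
On the job side, this means the URL has to be read from the file each
time it is needed rather than cached at startup, because a restart RSL may
have rewritten the file in the meantime. A minimal Python sketch, assuming
the file holds only the URL (possibly followed by a newline); the function
name is hypothetical:

```python
import os
from typing import Mapping


def current_remote_io_url(environ: Mapping[str, str] = os.environ) -> str:
    """Return the URL currently stored in the remote_io_url file."""
    # GLOBUS_REMOTE_IO_URL holds the path to the file, not the URL itself.
    path = environ["GLOBUS_REMOTE_IO_URL"]
    with open(path, encoding="utf-8") as url_file:
        # Stripping whitespace is an assumption about the file's layout.
        return url_file.read().strip()
```

Calling current_remote_io_url() immediately before each GASS access lets
the job pick up a URL that was updated after the client recovered.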

New GRAM Signals
----------------
GLOBUS_GRAM_PROTOCOL_JOB_SIGNAL_COMMIT_REQUEST
Tells the jobmanager to do a job submission if two_phase is enabled.

GLOBUS_GRAM_PROTOCOL_JOB_SIGNAL_COMMIT_END
Tells the jobmanager to do a job cleanup if two_phase is enabled.

GLOBUS_GRAM_PROTOCOL_JOB_SIGNAL_COMMIT_EXTEND
If the jobmanager is currently waiting for a commit signal, tells it to
wait an additional n seconds (where n is the string argument).

GLOBUS_GRAM_PROTOCOL_JOB_SIGNAL_STDIO_UPDATE
Allows the client to submit an RSL that changes some I/O attributes of the job:
stdout (if stdout is streamed)
stderr (if stderr is streamed)
stdout_position (if stdout is streamed)
stderr_position (if stderr is streamed)
remote_io_url
This allows updating of I/O going to a dynamic port when the port changes
due to a crash, etc.

GLOBUS_GRAM_PROTOCOL_JOB_SIGNAL_STDIO_SIZE
Allows the client to verify that streamed I/O has been completely
received. The signal string should contain the number of bytes of stdout
and stderr received, separated by a space. Success is returned if these
match the amounts sent. Otherwise, an error of STDIO_SIZE is returned.
If stdout/err is merged, only one number should be sent. A size of -1
means not to check the size of that stream (useful when only one of
stdout and stderr is streamed).
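
The signal string for STDIO_SIZE can be put together as in the Python
sketch below. The helper name and its parameters are hypothetical. The
format it produces (stdout count, a space, stderr count, with -1 for an
unchecked stream and a single number for merged output) follows the
description above.

```python
from typing import Optional

UNCHECKED = -1  # tells the jobmanager not to check that stream


def stdio_size_argument(
    stdout_bytes: Optional[int],
    stderr_bytes: Optional[int] = None,
    merged: bool = False,
) -> str:
    """Build the STDIO_SIZE signal string (hypothetical helper).

    Pass None for a stream that is not streamed and should not be checked.
    """
    if merged:
        # Merged stdout/err: only one number is sent.
        return str(UNCHECKED if stdout_bytes is None else stdout_bytes)
    sizes = [UNCHECKED if n is None else n for n in (stdout_bytes, stderr_bytes)]
    # stdout first, then stderr, separated by a single space.
    return " ".join(str(n) for n in sizes)
```

With only stdout streamed and 1024 bytes received,
stdio_size_argument(1024, None) yields "1024 -1".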

GLOBUS_GRAM_PROTOCOL_JOB_SIGNAL_STOP_MANAGER
Tells the jobmanager to exit, but leave the job running. One final
callback with state FAILED and error JM_STOPPED will be sent.

Some limitations concerning streamed I/O:
Whether stdout/err is streamed or not cannot be changed as part of a job
restart or STDIO_UPDATE (once streamed, always streamed; once local,
always local).
Whether stdout/err is going to a single file or not cannot be changed as
part of a job restart or STDIO_UPDATE (i.e. if stdout/err is merged
initially, it can't be split later on, and vice versa).

New Errors
----------
GLOBUS_GRAM_PROTOCOL_ERROR_WAITING_FOR_COMMIT 110
Error returned for a job request by the jobmanager when it's waiting for a
commit signal.

GLOBUS_GRAM_PROTOCOL_ERROR_COMMIT_TIMED_OUT 111
The jobmanager timed out waiting for a commit signal and is aborting.

GLOBUS_GRAM_PROTOCOL_ERROR_RSL_SAVE_STATE 112
Invalid save_state attr.

GLOBUS_GRAM_PROTOCOL_ERROR_RSL_RESTART 113
Invalid restart attr.

GLOBUS_GRAM_PROTOCOL_ERROR_RSL_TWO_PHASE_COMMIT 114
Invalid two_phase attr.

GLOBUS_GRAM_PROTOCOL_ERROR_INVALID_TWO_PHASE_COMMIT 115
Invalid two_phase attr. value

GLOBUS_GRAM_PROTOCOL_ERROR_RSL_STDOUT_POSITION 116
Invalid stdout_position attr.

GLOBUS_GRAM_PROTOCOL_ERROR_INVALID_STDOUT_POSITION 117
Invalid stdout_position attr. value

GLOBUS_GRAM_PROTOCOL_ERROR_RSL_STDERR_POSITION 118
Invalid stderr_position attr.

GLOBUS_GRAM_PROTOCOL_ERROR_INVALID_STDERR_POSITION 119
Invalid stderr_position attr. value

GLOBUS_GRAM_PROTOCOL_ERROR_RESTART_FAILED 120
Job restart attempt failed

GLOBUS_GRAM_PROTOCOL_ERROR_NO_STATE_FILE 121
The appropriate job state file could not be found

GLOBUS_GRAM_PROTOCOL_ERROR_READING_STATE_FILE 122
The job state file could not be read

GLOBUS_GRAM_PROTOCOL_ERROR_WRITING_STATE_FILE 123
The job state file could not be written

GLOBUS_GRAM_PROTOCOL_ERROR_OLD_JM_ALIVE 124
A job manager for this job is still alive

GLOBUS_GRAM_PROTOCOL_ERROR_TTL_EXPIRED 125
Failed to update state file to indicate jobmanager is still alive

GLOBUS_GRAM_PROTOCOL_ERROR_SUBMIT_UNKNOWN 126
Can't determine if the job to be restarted was submitted

GLOBUS_GRAM_PROTOCOL_ERROR_RSL_REMOTE_IO_URL 127
Invalid remote_io_url attr.

GLOBUS_GRAM_PROTOCOL_ERROR_WRITING_REMOTE_IO_URL 128
Couldn't write the remote_io_url file

GLOBUS_GRAM_PROTOCOL_ERROR_STDIO_SIZE 129
The stdout/err sizes don't match the amount sent

GLOBUS_GRAM_PROTOCOL_ERROR_JM_STOPPED 130
The jobmanager received a STOP_MANAGER signal

GLOBUS_GRAM_PROTOCOL_ERROR_USER_PROXY_EXPIRED 131
The user proxy expired

GLOBUS_GRAM_PROTOCOL_ERROR_JOB_UNSUBMITTED 132
The job was not submitted by the original jobmanager

GLOBUS_GRAM_PROTOCOL_ERROR_INVALID_COMMIT 133
The jobmanager isn't waiting for that commit signal

-- MarcoMambelli - 01 May 2008
Topic revision: r1 - 01 May 2008, MarcoMambelli