.nr % 1
.OH ''PBS ERS'Batch Server'
.EH 'Batch Server'PBS ERS''
.P1
.so ers_setup.ms
.Rv $Revision: 2.3 $
.nr H1 2
.NH 1
.Tc "\f3\s+2Batch Server Functions\s-2\fP"
.OF 'Chapt \*(rV''\n(H1-%'
.EF '\n(H1-%''Chapt \*(rV'
.\"         Portable Batch System (PBS) Software License
.\" 
.\" Copyright (c) 1999, MRJ Technology Solutions.
.\" All rights reserved.
.\" 
.\" Acknowledgment: The Portable Batch System Software was originally developed
.\" as a joint project between the Numerical Aerospace Simulation (NAS) Systems
.\" Division of NASA Ames Research Center and the National Energy Research
.\" Supercomputer Center (NERSC) of Lawrence Livermore National Laboratory.
.\" 
.\" Redistribution of the Portable Batch System Software and use in source
.\" and binary forms, with or without modification, are permitted provided
.\" that the following conditions are met:
.\" 
.\" - Redistributions of source code must retain the above copyright and
.\"   acknowledgment notices, this list of conditions and the following
.\"   disclaimer.
.\" 
.\" - Redistributions in binary form must reproduce the above copyright and 
.\"   acknowledgment notices, this list of conditions and the following
.\"   disclaimer in the documentation and/or other materials provided with the
.\"   distribution.
.\" 
.\" - All advertising materials mentioning features or use of this software must
.\"   display the following acknowledgment:
.\" 
.\"   This product includes software developed by NASA Ames Research Center,
.\"   Lawrence Livermore National Laboratory, and MRJ Technology Solutions.
.\" 
.\"         DISCLAIMER OF WARRANTY
.\" 
.\" THIS SOFTWARE IS PROVIDED BY MRJ TECHNOLOGY SOLUTIONS ("MRJ") "AS IS" 
.\" WITHOUT WARRANTY OF ANY KIND, AND ANY EXPRESS OR IMPLIED WARRANTIES, 
.\" INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, 
.\" FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT ARE EXPRESSLY
.\" DISCLAIMED.
.\"
.\" IN NO EVENT, UNLESS REQUIRED BY APPLICABLE LAW, SHALL MRJ, NASA, NOR
.\" THE U.S. GOVERNMENT BE LIABLE FOR ANY DIRECT DAMAGES WHATSOEVER,
.\" NOR ANY INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\" 
.\" This license will be governed by the laws of the Commonwealth of Virginia,
.\" without reference to its choice of law rules.
.LP
A batch server provides services in one of two ways:
.IP - 3
The server provides a service at the request of a client.
.IP -
The server provides a 
.I "deferred service"
as a result of a change in conditions monitored by the server.
.LP
The server also performs a number of internal bookkeeping functions
that are described in this major section.
.NH 2
.Tc "\f3Client Service Requests\fP"
.LP
By definition, clients are processes that make requests of a batch server.
The requests may ask for an action to be performed on one or more jobs,
one or more queues, or the server itself.
.LP
The server is required to respond to all requests it receives.  Requests
that cannot be successfully completed are
.I rejected .
The reason for the rejection is returned in the reply to the client.
.LP
The following subsections describe the services provided by a batch server
in response to a request from a client.
The requests are grouped in the following subsections by the type
of object affected by the request: server, queue, job, or resource.
.NH 3
.Tc Server Management
.LP
The batch requests described in this section control the functioning of the
batch server.  The control is either direct as in the Shut Down request,
or indirect as when server attributes are modified.
.NH 4
Manage Request
.LP
The
.I Manage
request supports the 
.B qmgr (8)
command and several of the operator commands.
The request directs the server to create, alter, or delete
an object managed by the server or one of its attributes.  For more
information, see the qmgr command.
.NH 4
Server Status Request
.LP
The status of the server may be requested with a
.I "Server Status"
request.
.LP
The batch server will reject the request if any of the following
conditions are true:
.IP - 3
The user of the client is not authorized to query the status of the server.
.LP
If the request is accepted, the server will return a
.I "Server Status Reply" .
See the
.B qstat
command and the Data Exchange Format description for details of which
server attributes are returned to the client.
.NH 4
Start Up
.LP
A batch request to start a server
cannot be sent to a server since the server is not running.
Therefore a batch server must be started by a process
local to the host on which the server is to run.
.LP
The server is started by a
.B pbs_server
command.  The server recovers the state of managed objects, such as queues
and jobs, from the information last recorded by the server.  The treatment
of jobs which were in the 
.B running
state when the server previously shut down is dictated by the start up
mode; see the description of the pbs_server(8) command.
.NH 4
Shut Down
.LP
The batch server is "shut down" when it no longer responds to requests from
clients and does not perform deferred services.
The batch server is requested to shut down by sending it a
.I "Server Shutdown"
request.
.LP
The server will reject the request from a client not authorized to shut down
the server.
When the server accepts a shut down request, it will terminate in the manner
described under the
.B qterm
command.
.LP
When shutting down, the server must record the state of all managed
objects (jobs, queues, etc.) in non-volatile memory.
Jobs which were running will be marked in the secondary state field for
possible special treatment when the server is restarted.
If checkpoint is supported, any job running at the time of the shut down
request whose 
.At Checkpoint
attribute is not
.Av n ,
will be checkpointed.  This includes jobs whose 
.At Checkpoint
attribute value is \*Qunspecified\*U, a value of
.Av u .
.LP
If the server receives either a
.B SIGTERM
or a
.B SIGSHUTDN
signal, the server will act as if it had received a shut down immediate
request.
.NH 3
.Tc Queue Management
.LP
The following client requests affect one or more queues managed by the server.
These requests require a privilege level generally assigned to operators and
administrators.
.NH 4
Queue Status Request
.LP
The status of a queue at the server may be requested with a
.I "Queue Status"
request.
.LP
The batch server will reject the request if any of the following
conditions are true:
.IP - 3
The user of the client is not authorized to query the status of the designated
queue.
.IP -
The designated queue does not exist on the server.
.LP
If the request does not specify a queue, status of all the queues at the
server will be returned.
.LP
When the request is accepted, the server will return a 
.I "Queue Status Reply" .
See the
.B qstat
command and the Data Exchange Format description for details of which
queue attributes are returned to the client.
.NH 3
.Tc Job Management
.LP
The following client requests affect one or more jobs managed by the server.
These requests do not require any special privilege except when the job for
which the request is issued is not owned by the user making the request.
.NH 4
Abort Request
.LP
.\" The
.\" .I Abort
.\" sub-request is sent to direct the receiving server to abort any multiple
.\" part request.  Currently, the only multiple part, or complex, request is
.\" the Queue Job request.
[The Abort Request has been aborted (deleted).]
.NH 4
Commit Request
.LP
The
.I Commit
sub-request is part of the Queue Job request.
The Commit notifies the receiving server that all parts of the 
job have been transferred and the receiving server should now assume
ownership of the job.  Prior to sending the Commit, the sending client,
whether a command or another server, is the owner.
.NH 4
Delete Job Request
.LP
A
.I "Delete Job"
request asks a server to remove a job from the queue in which it exists
and not place it elsewhere.
.LP
The batch server will reject a Delete Job Request if any of the
following conditions are true:
.IP - 3
The user of the client is not authorized to delete the designated job.
.IP -
The designated job is not owned by the server.
.IP -
The designated job is not in an eligible state.
Eligible states are
.B queued ,
.B held ,
.B waiting ,
.B running ,
and
.B transiting .
.LP
If the job is in the 
.B running
state, the server will first send a
.B SIGTERM
signal to the job process group.  After a delay specified by the delete
request or, if not specified, by the
.At kill_delay
queue attribute, the server will send a
.B SIGKILL
signal to the job process group.
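.LP
The signal sequence for a running job can be sketched in Python as follows.
This is an illustration only, not PBS code: the function name and its
parameters are hypothetical, and the signal and sleep operations are passed
in as callables (for example os.killpg and time.sleep) so that the ordering
can be checked without real processes.
.DS
```python
import signal


def delete_running_job(pgid, send_signal, sleep, kill_delay, request_delay=None):
    """Terminate a job process group: SIGTERM, wait, then SIGKILL.

    kill_delay stands for the queue attribute of the same name;
    request_delay is the optional delay carried by the delete request.
    Returns the state the job enters afterwards.
    """
    # The delay in the request wins; otherwise use the queue attribute.
    delay = request_delay if request_delay is not None else kill_delay
    send_signal(pgid, signal.SIGTERM)
    sleep(delay)
    send_signal(pgid, signal.SIGKILL)
    return "exiting"
```
.DE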
.LP
The job is then placed into the 
.B exiting
state.
.NH 4
Hold Job Request
.LP
A client can request that one or more holds be applied to a job.
.LP
The batch server will reject a
.I "Hold Job"
request if any of the following conditions are true:
.IP - 3
The user of the client is not authorized to add any of the specified holds.
.IP -
The batch server does not manage the specified job.
.LP
When the server accepts the Hold Job Request, it will add each
type of hold listed which is not already present to the value of the
.At Hold_Types
attribute of the job.
.LP
If the job is in the
.B queued
or
.B waiting
state, it is placed in the
.B held
state.
.LP
If the job is in the
.B running
state, then the following additional actions are taken:
If checkpoint\^/\^restart is supported by the host system, placing a hold
on a running job will cause (1) the job to be checkpointed, (2) the
resources assigned to the job to be released, and (3) the job to be
placed in the
.B held
state in the execution queue.
If the
.I "expedite modifier"
to the hold type is specified, the
.At sched_hint
job attribute will be set to one (1) to indicate to the scheduler that this job
should be placed into execution as soon as possible.
If the
.I "expedite modifier"
is not specified, the job will be rescheduled for execution as if
it had been re-submitted.
.LP
If checkpoint\^/\^restart is not supported, the server will only set
the requested hold attribute.  This will have no effect unless the job
is rerun or restarted.
.NH 4
Queue Job Request
.LP
The
.I "Queue Job"
sub-request is part of the Queue Job complex request.
This sub-request identifies the destination for the job and, in
effect, asks the receiving server to fork off a child to process
the remaining sub-requests.
It also passes the public and private job attributes to the 
receiving server.
.NH 4
Job Credential Request
.LP
The
.I "Job Credential"
sub-request is part of the Queue Job complex request.
This sub-request transfers a copy of the credential provided by the
authentication facility explained in section 10.2.
.NH 4
Job Script Request
.LP
The
.I "Job Script"
sub-request is part of the Queue Job complex request.
This sub-request passes a block of the job script file to the receiving server.
The script is broken into 8 kilobyte blocks to prevent having to hold the
entire script in memory.  One or more Job Script sub-requests may be 
required to transfer the script file.
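.LP
A minimal Python sketch of the blocking scheme is shown here.
The function name is hypothetical and the sketch only produces the blocks;
how each block is wrapped in a Job Script sub-request is not shown.
.DS
```python
import io

BLOCK_SIZE = 8 * 1024  # 8 kilobytes per Job Script sub-request


def script_blocks(stream, block_size=BLOCK_SIZE):
    """Yield successive blocks of a binary stream, one block at a time."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield block


# A 20000 byte script needs three sub-requests.
sizes = [len(b) for b in script_blocks(io.BytesIO(b"x" * 20000))]
```
.DE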
.NH 4
Locate Job Request
.LP
A client may ask a server to respond with the location of a job
that was created or is owned by the server.  When the server accepts the
.I "Locate Job"
request, it returns a 
.I "Locate Reply" .
.LP
The request will be rejected if any of the following conditions are true:
.IP - 3
The server does not own (manage) the job, and
.IP -
The server did not create the job.
.IP -
The server is not maintaining a record of the current location of the job.
.LP
.NH 4
Message Job Request
.LP
A batch server can be requested to write a string of characters to one or both
output streams of an executing job.
This request is primarily used by an operator to record a message for the user.
.LP 
The batch server will reject a 
.I "Message Job"
request if any of the following conditions are true:
.IP - 3
The designated job is not in the running state.
.IP -
The user of the client is not authorized to post a message to the designated
job.
.IP -
The designated job is not owned by the server.
.LP
When the server accepts the
.I "Message Job"
request, it will append the message string, followed by a newline
character, to the file or files indicated.  If no file is indicated,
the message will be written to the standard error of the job.
.NH 4
Modify Job Request
.LP
A batch client makes a
.I "Modify Job"
request to the server to alter the attributes of a job.
The batch server will reject a Modify Job Request if any of the
following conditions are true:
.IP - 3
The user of the client is not authorized to make the requested modification
to the job.
.IP -
The designated job is not owned by the server.
.IP -
The requested modification is inconsistent with the state of the job.
This is detailed later in this subsection.
.IP -
A requested resource change would exceed the limits of the queue
or server.
.IP -
An unrecognized resource is requested for a job in an execution queue.
.LP
When the batch server accepts a Modify Job Request, it will modify all
the specified attributes of the job.  When the batch server rejects a Modify
Job Request, it will modify none of the attributes of the job.
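.LP
The all-or-nothing behavior can be sketched in Python by validating the
whole request before changing anything.
The names below are hypothetical; in particular is_allowed is a stand-in
for the authorization, state, and limit checks listed above.
.DS
```python
def modify_job(job_attrs, changes, is_allowed):
    """Apply every requested change or none of them.

    job_attrs:  dict of current attributes (changed only on success).
    changes:    dict of attribute name -> new value.
    is_allowed: predicate (name, value) -> bool.
    Returns True if the request was accepted, False if rejected.
    """
    # Check the complete request first so that a late failure cannot
    # leave the job partially modified.
    for name, value in changes.items():
        if not is_allowed(name, value):
            return False
    job_attrs.update(changes)
    return True
```
.DE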
.LP
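The following table indicates which attributes are alterable in each state.
The column headings abbreviate the job states: q (queued), h (held),
w (waiting), r (running), t (transiting), and e (exiting).
An x marks an alterable attribute; a number refers to the notes below the table.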
.LP
.DS
.TS
box tab(/) ;
l c c c c c c .

Attribute/q/h/w/r/t/e
Account_Name/x/x/x/ / / 
Checkpoint/x/x/x/1/ / 
_
depend/x/x/x/ / / 
Error_Path/x/x/x/1/ / 
Execution_Time/x/x/x/1/ / 
_
group_list/x/x/x/ / / 
Hold_Types/x/x/x/1/ / 
Job_Name/x/x/x/x/ / 
_
Join_Path/x/x/x/ / / 
Keep_Files/x/x/x/ / / 
Mail_Points/x/x/x/x/ / 
_
Mail_Users/x/x/x/x/ / 
Output_Path/x/x/x/ / / 
Priority/x/x/x/ / / 
_
Rerunable/x/x/x/x/ / 
Resource_List/x/x/x/2/ /  
Shell_Path_List/x/x/x/1/ /  
_
stagein/x/x/x/ / / 
stageout/x/x/x/ / / 
User_List/x/x/x/ / / 
Variable_List
.TE
Notes:
.br
.IP 1. 3
May be altered with qalter, but changes will not take effect
until the job is requeued.  Use of qhold produces special processing.
.IP 2. 3
Only certain resource limits may be changed.
Those resource limits may be lowered by the job owner.
Increasing those limits requires special privilege.
.DE
.NH 4
Move Job Request
.LP
A client can request a server to move a job to a new destination.
.LP
The batch server will reject a 
.I "Move Job"
Request if any of the following conditions are true:
.IP - 3
The user of the client is not authorized to remove the designated job
from the queue in which the job resides.
.IP -
The user of the client is not authorized to submit a job to the new
destination.
.IP -
The designated job is not owned by the server.
.IP -
The designated job is not in the 
.B queued ,
.B held ,
or
.B waiting
state.
.IP -
The new destination is disabled.
.IP -
The new destination is inaccessible.
.LP
When the server accepts a Move Job request, it will
.IP - 3
Queue the designated job at the new destination.
.IP -
Remove the job from the current queue.
.LP
If the destination exists at a different server, the current server will
transfer the job to the new server by sending a
.I "Queue Job"
request sequence to the target server.
.LP
The server will ensure that a job is neither lost nor duplicated.
.NH 4
Queue Job Request
.LP
A
.I "Queue Job"
request is a complex request consisting of several sub-requests:
Initiate Job Transfer, Job Data, Job Script, and Commit.
The end result of a successful 
.I "Queue Job"
request is an additional job being managed by the server.  The job may have
been created by the request or it may have been moved from another server.
.LP
The job resides in a queue managed by the server.
When a queue is not specified in the request, the job is placed in a queue
selected by the server.  This queue is known as the
.I "default queue" .
The default queue is an attribute of the server that is settable by the
administrator.  The queue, whether specified or defaulted, is called the target
queue.
.LP
The batch server will reject a Queue Job Request if any of the
following conditions are true:
.IP - 3
The client is not authorized to create a job in the target queue.
.IP -
The target queue does not exist at the server.
.IP -
The target queue is not enabled.
.IP -
The target queue is an execution queue and a resource requirement of the job
exceeds the limits set upon the queue.
.IP -
The target queue is an execution queue and an unrecognized resource is
requested by the job.
.IP -
The target queue is an execution queue,
the batch server does not support checkpoint, and the value of the
.At Checkpoint
attribute of the job is neither the single character "n" nor the single
character "u".
.IP -
The job requires access to a user identifier that the client is not authorized
to access.
.LP
When a job is placed in an execution queue, it is placed in the
.B queued
state unless one of the following conditions applies:
.IP - 3
The job has an
.At execution_time
attribute that specifies a time in the future and the
.At Hold_Types
attribute has a value of
.Sc NONE .
Then the job is placed in the
.B waiting
state.
.IP -
The job has a
.At Hold_Types
attribute with a value other than
.Sc NONE .
The job is placed in the
.B held
state.
.LP
When a job is placed in a routing queue, it is placed in the
.B queued
state unless one of the following conditions applies:
.IP - 3
The job has an
.At execution_time
attribute that specifies a time in the future, the 
.At Hold_Types
attribute has a value of
.Sc NONE ,
and the
.At route_waiting_jobs
queue attribute is
.Sc FALSE .
Then the job is placed in the 
.B waiting
state.
.IP -
The job has a
.At Hold_Types
attribute with a value other than
.Sc NONE
and the
.At route_held_jobs
queue attribute is
.Sc FALSE .
Then the job is placed in the
.B held
state.
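.LP
The state selection rules for both queue types can be summarized in a
short Python sketch.
The function name, the string encodings of queue type and state, and the
boolean exec_time_in_future are assumptions made for this illustration.
.DS
```python
def initial_state(queue_type, hold_types, exec_time_in_future,
                  route_waiting_jobs=False, route_held_jobs=False):
    """Return the state in which a newly queued job is placed.

    queue_type is "execution" or "routing"; hold_types is "NONE" or
    any other hold value.
    """
    held = hold_types != "NONE"
    if queue_type == "execution":
        if held:
            return "held"
        return "waiting" if exec_time_in_future else "queued"
    # Routing queue: the queue attributes may leave held or waiting
    # jobs in the queued state.
    if held:
        return "queued" if route_held_jobs else "held"
    if exec_time_in_future and not route_waiting_jobs:
        return "waiting"
    return "queued"
```
.DE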
.LP
A batch server that accepts a Queue Job Request for a 
.I new
job will add the
.B PBS_O_QUEUE
variable to the
.At Variable_List
attribute of the job and set the value to the name of the target queue.
.LP
A batch server that accepts a Queue Job Request for a
.I new
job will add the
.B PBS_JOBID
variable to the
.At Variable_List
attribute of the job and set the value to the job identifier assigned to
the job.
.LP
A batch server that accepts a Queue Job Request for a
.I new
job will add the
.B PBS_JOBNAME
variable to the
.At Variable_List
attribute of the job and set the value to the value of the 
.At Job_Name
attribute of the job.
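.LP
Taken together, the three additions for a new job amount to the following
Python sketch.
The function name and the dictionary representation of the Variable_List
attribute are assumptions, and the sample identifier in the usage line is
invented.
.DS
```python
def add_server_variables(variable_list, target_queue, job_id, job_name):
    """Return a copy of a new job's Variable_List with the server entries."""
    updated = dict(variable_list)
    updated["PBS_O_QUEUE"] = target_queue
    updated["PBS_JOBID"] = job_id
    updated["PBS_JOBNAME"] = job_name
    return updated


# Example with made-up values.
env = add_server_variables({"PBS_O_HOME": "/u/jane"}, "workq", "17.svr", "sim")
```
.DE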
.LP
When the server accepts a Queue Job request for an existing job,
the server will send a
.I "Track Job"
request to the server which created the job.
.NH 4
Release Job Request
.LP
A client can request that one or more holds be removed from a job.
.LP
A batch server rejects a
.I "Release Job"
request if any of the following conditions are true:
.IP - 3
The user of the client is not authorized to remove any of the
specified holds.
.IP -
The batch server does not manage the specified job.
.LP
When the server accepts the Release Job Request, it will remove each
type of hold listed from the value of the
.At Hold_Types
attribute of the job.
.LP
If the job is in the
.B held
state and all holds have been removed, the job is placed in the
.B waiting 
state if the 
.At Execution_Time
attribute specifies a time in the future.  Otherwise the job is placed
in the
.B queued
state.
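.LP
A Python sketch of the release logic follows.
The function name is hypothetical, holds are represented as a set of
hold-type letters, and an empty set plays the role of NONE; these
encodings are assumptions of the sketch.
.DS
```python
def release_holds(state, hold_types, released, exec_time_in_future):
    """Remove holds from a job and return (new_state, remaining_holds)."""
    remaining = set(hold_types) - set(released)
    # Only a held job with no holds left changes state.
    if state == "held" and not remaining:
        state = "waiting" if exec_time_in_future else "queued"
    return state, remaining
```
.DE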
.NH 4
Rerun Job Request
.LP
To rerun a job is to kill the members of the session (process) group
of the job and leave the job in the execution queue.  If the
.At Hold_Types
attribute is
.Sc NONE ,
the job is eligible to be re-scheduled for execution.
.LP
The server will reject the
.I "Rerun Job"
request if any of the following conditions are true:
.IP - 3
The user of the client is not authorized to rerun the designated job.
.IP -
The
.At Rerunable
attribute of the job has the value
.Sc FALSE .
.IP -
The job is not in the running state.
.IP -
The server does not own the job.
.LP
When the server accepts the Rerun Job request, it performs the following
actions:
.IP - 3
Send a 
.B SIGKILL
signal to the session (process) group of the job.
.IP -
Requeue the job in the execution queue in which it was executing.
If the
.At Hold_Types
attribute is not
.Sc NONE ,
the job will be placed in the
.B held
state.  If the
.At execution_time
attribute is a future time, the job will be placed in the
.B waiting
state.  Otherwise, the job is placed in the
.B queued
state.
.NH 4
Run Job
.LP
The
.I "Run Job"
request directs the server to place the specified job into immediate execution.
The request is issued by a
.B qrun
operator command and by the PBS Job Scheduler.
.NH 4
Select Jobs Request
.LP
A client is able to request from the server a list of jobs owned
by that server that match a list of selection criteria.
The request is a
.I "Select Jobs"
request.  All the jobs owned by the server and which the user is
authorized to query are initially eligible for selection.
Job attribute and resource relations listed in the request restrict
the selection of jobs.  Only jobs which have attributes and resources that meet
the specified relations will be selected.
.LP
The server will reject the request if any of the following conditions
are true:
.IP - 3
The queue portion of a specified destination does not exist on the server.
.LP
When the request is accepted, the server will return a
.I "Select Reply" 
containing a list of zero or more jobs that met the selection criteria.
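.LP
The selection step can be pictured with the Python sketch below.
The relation names, the data layout, and the function name are
illustrative assumptions; the actual encoding of selection criteria is
given in the Data Exchange Format description.
.DS
```python
import operator

# Illustrative relation names mapped to comparison functions.
RELATIONS = {"eq": operator.eq, "ne": operator.ne, "ge": operator.ge,
             "gt": operator.gt, "le": operator.le, "lt": operator.lt}


def select_jobs(jobs, criteria):
    """Return identifiers of jobs that satisfy every criterion.

    jobs:     dict of job identifier -> dict of attribute -> value.
    criteria: list of (attribute, relation, value) tuples.
    """
    selected = []
    for job_id, attrs in jobs.items():
        if all(name in attrs and RELATIONS[rel](attrs[name], value)
               for name, rel, value in criteria):
            selected.append(job_id)
    return selected
```
.DE
.LP
An empty criteria list selects every eligible job, and a list that no job
satisfies yields an empty reply rather than a rejection.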
.NH 4
Signal Job Request
.LP
A batch client is able to request that the server signal the session
(process) group of a job.  Such a request is called a 
.I "Signal Job"
request.
.LP
The batch server will reject a Signal Job Request if any of the following
conditions are true:
.IP - 3
The user of the client is not authorized to signal the job.
.IP -
The job is not in the running state.
.IP -
The server does not own the designated job.
.IP -
The requested signal is not supported by the host operating system.
(The killpg system call returns 
.Er EINVAL .)
.LP
When the server accepts a request to signal a job, it will
send the signal requested by the client to the session (process) group
of the job.
.NH 4
Status Job Request
.LP
The status of a job or set of jobs at a destination may be requested with a
.I "Status Job"
request.
.LP
The batch server will reject a Status Job Request if any of the following
conditions are true:
.IP - 3
The user of the client is not authorized to query the status of the
designated job.
.IP -
The designated job is not owned by the server.
.LP
When the server accepts the request, it will return a Job Status Message to
the client.  See the 
.B qstat
command and the Data Exchange Format description for details of which
job attributes are returned to the client.
.LP
If the request specifies a job identifier, status will be returned only for
that job.  If the request specifies a destination identifier, status will be
returned for all jobs residing within the specified queue that the user
is authorized to query.
.NH 2
.Tc "\f3Server to Server Requests\fP"
.LP 
Server to Server requests are a special category of client requests.
They are only issued to a server by another server.
.NH 3
.Tc Track Job Request
.LP
A client that wishes to
request an action be performed on a job must send a batch request to the
server that currently manages the job.  As jobs are routed or moved through
the batch network, finding the location of the job can be difficult without
a tracking service.  The
.I "Track Job"
request forms the basis for this service.
.LP
A server that queues a job sends a track job request to the server
which created the job.  Additional backup location servers may be defined.
.LP
A server that receives a track job request records the information
contained therein.  This information is made available in response to a
.I "Locate Job"
request.
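.LP
The bookkeeping kept by the creating server can be sketched in Python as
a simple table.
The class and method names are hypothetical, and returning None for an
unknown job is a simplification of the rejection described under the
Locate Job request.
.DS
```python
class JobTracker:
    """Location records kept by the server that created each job."""

    def __init__(self):
        self._location = {}

    def track_job(self, job_id, server):
        # A later Track Job request for the same job replaces the record.
        self._location[job_id] = server

    def locate_job(self, job_id):
        # None means that no record exists for the job.
        return self._location.get(job_id)
```
.DE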
.NH 3
.Tc Synchronize Job Starts
.LP
.B PBS
provides for synchronizing the initiation of jobs across hosts.
This is done to support distributed processing.
.QP
Author note:
.br
There are several approaches that could be taken to solve this
requirement, none of them simple and straightforward.  The best
approach for synchronization of jobs would be a single job
scheduler for all hosts on which jobs could be concurrently started.
However, this approach greatly complicates the already complicated
scheduling problem.  Because the number of concurrent starts will be
small compared to the total number of jobs, the semaphore approach
was selected.
.QP
It is the intent of the developers that PBS will be expanded to encompass
the concept of a single job whose execution is distributed among multiple
hosts.
.LP
Job start synchronization is requested through a special dependency attribute.
The first job in the set, the \*Qmaster\*U, specifies the dependency attribute
as:
.br
.Ty \ \ \ \ -W\ synccount=count
.br
where
.Ty count
is an integer which is the number of other jobs to be synchronized with this
job.  This job is the master only in the sense that it defines the
rendezvous point for the semaphore messages and that it must be submitted
first so the identifier is known for the other jobs in the set.
.LP
The other jobs in the sync set specify the dependency attribute as:
.br
.Ty \ \ \ \ -W\ syncwith=job_identifier
.br
where 
.Ty job_identifier
is the job identifier assigned to the job which contained the
.B synccount
resource, the master job.
.LP
When the server queues a job in an execution queue and the job
is a member of a sync set, including the \*Qmaster\*U,
the server places a system hold on the job.
The secondary state is set to indicate the system hold is for sync.
A server managing a non-master job will register that job with
the server managing the master by sending a
.I "Register Dependent"
request with a \*QRegister\*U operation.
.LP
When all jobs have registered, as determined by the count on the master,
the server managing the master job will send a
.I "Register Dependent"
request, with a \*QRelease\*U operation,
to each job in the set in turn to remove the system hold.
The released job may now vie for resources.
The jobs are released in order of the \*Qcheapest\*U resources first; the
concept of \*QResource Costs\*U will be explained shortly.
.LP
When the resources required by a released job are available,
as determined by the Scheduler, a Run Job request will be issued for that job.
The server which manages the job will send a 
.I "Register Dependent"
request with a \*QReady\*U operation to the server that owns the
master job.
This request indicates that the dependent job is ready and the job with the
next cheapest resources can be released.
.LP
The server calculates the 
.I "Resource Cost"
of a job by summing the product of the amount of each
resource multiplied by an assigned cost of the resource.
A general system surcharge may also be assigned and added to the above sum.
Resources with a \*Qsize\*U unit are converted to megabytes before the
multiplication to keep the number from becoming too large.
See the server attributes
.At resources_cost
and
.At system_cost .
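.LP
The cost calculation and the resulting release order can be sketched in
Python as follows.
The function names, the resource names in the example, and the choice of
1024 * 1024 bytes per megabyte are assumptions; resources with no
assigned cost are treated here as contributing nothing.
.DS
```python
BYTES_PER_MB = 1024 * 1024


def resource_cost(amounts, resources_cost, system_cost=0, size_resources=()):
    """Sum amount * assigned cost over the resources, plus a surcharge.

    amounts:        dict of resource -> requested amount (bytes for
                    the resources named in size_resources).
    resources_cost: dict of resource -> assigned cost per unit.
    """
    total = system_cost
    for name, amount in amounts.items():
        if name in size_resources:
            amount = amount / BYTES_PER_MB  # size values are costed per megabyte
        total += amount * resources_cost.get(name, 0)
    return total


def release_order(job_costs):
    """Return job identifiers with the cheapest job first."""
    return sorted(job_costs, key=job_costs.get)


cost = resource_cost({"ncpus": 4, "mem": 512 * BYTES_PER_MB},
                     {"ncpus": 10, "mem": 0.5},
                     system_cost=3, size_resources=("mem",))
```
.DE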
.LP
If the master of a sync set is aborted before all jobs in the set
begin execution, an
.I "Abort Job"
request is sent to all jobs in the set.  This is done because the synchronous
feature is intended for a set of jobs which need to communicate among themselves
during execution.  If the master is gone, (1) the rendezvous point for server
messages is lost, and (2) the job set is unlikely to be able to establish
the inter-job communications required.
.NH 3
.Tc Job Dependency
.LP
.B PBS
provides support for job dependency.  A job, the child, can be declared to
be dependent on one or more jobs, the parents.
A parent may have any number of children.
The dependency is specified as an attribute on the qsub command with
the -W option.
The general specification is of the form:
.br
.Ty "    -W type=argument[,type=argument,...]
.br
See the 
.B qalter (1B)
or
.B qsub (1B)
man pages for the complete specification of the dependency list.
.LP
When a server queues a job with a dependency type of
.Av syncwith ,
.Av after ,
.Av afterok ,
.Av afternotok ,
or
.Av afterany
in an execution queue,
the server will send a 
.I "Register Dependent Job"
request to the server managing the job specified by the associated
job_identifier.  The request will specify that the server is to
.I register
the dependency.  This actually creates a corresponding 
.Av before...
type dependency attribute entry on the parent.
If the request is rejected because the parent job does not exist, the child
job is aborted.  If the request is accepted, a system hold is placed on
the child job.
.LP
When a parent job, with any of the
.Av before...
types of dependency, reaches the required state, started or terminated,
the server executing the parent job sends a 
.I "Register Dependent Job"
request to the server managing the child job directing it to
.I release
the child job.  If there are no other dependencies on other jobs, the system
hold on the child job is removed.
.LP
When a child job is submitted with an
.Av on
dependency and the parent is submitted with any of the
.Av before...
types of dependencies, the parent will register with the child.
This causes the
.Av on
dependency count to be reduced and a corresponding
.Av after...
dependency to be created for the child job.  
.LP
The result is a pairing between corresponding 
.Av before...
and
.Av after...
dependency types.
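.LP
The pairing can be written down as a small table, sketched here in
Python.
The function name and the dictionary used to hold the parent's dependency
entries are hypothetical; only the type names come from the text above.
.DS
```python
# Each after... type registered by a child creates the matching
# before... entry on the parent.
PAIRED_TYPE = {
    "after": "before",
    "afterok": "beforeok",
    "afternotok": "beforenotok",
    "afterany": "beforeany",
}


def register_dependency(parent_deps, dep_type, child_id):
    """Record on the parent the before... entry paired with dep_type.

    parent_deps maps a before... type to a list of child identifiers.
    Returns the before... type that was recorded.
    """
    paired = PAIRED_TYPE[dep_type]
    parent_deps.setdefault(paired, []).append(child_id)
    return paired
```
.DE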
.LP
If the parent job terminates in such a manner that the child is not released,
it is up to the user to correct the situation by either deleting the child
job or by correcting the problem with the parent job and resubmitting it.
If the parent job is resubmitted, it must have a dependency type of 
.Av before ,
.Av beforeok ,
.Av beforenotok ,
or
.Av beforeany
specified to connect it to the waiting child job.
.NH 2
.Tc "\f3Deferred Services\fP"
.LP
This section describes the deferred services performed by batch servers:
file staging,
job selection, job initiation, job routing, job exit, job abort,
and the rerunning of jobs after a restart of the server.
.LP
The following rules apply to deferred services on behalf of jobs:
.IP - 3
If the server
.I cannot
complete a deferred service for a reason which is permanent, then
the job is aborted.
.IP - 3
If the service cannot be completed
at the current time but may be later, the service is retried a finite
number of times.
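.LP
The two rules can be expressed as a retry loop, sketched here in Python.
The exception classes and function name are hypothetical.
The text does not say what happens when the retries are exhausted; this
sketch assumes the job is then aborted.
.DS
```python
class PermanentFailure(Exception):
    """The service can never succeed for this job."""


class TemporaryFailure(Exception):
    """The service failed now but may succeed later."""


def run_deferred_service(service, max_attempts):
    """Call service() up to max_attempts times; return "done" or "aborted"."""
    for _ in range(max_attempts):
        try:
            service()
            return "done"
        except PermanentFailure:
            return "aborted"
        except TemporaryFailure:
            continue
    # Assumption: running out of retries is treated as permanent.
    return "aborted"
```
.DE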
.LP
.NH 3
.Tc Job Scheduling
.LP
If the server attribute
.At scheduling
is set true, the server will immediately request a scheduling cycle of the
PBS Job Scheduler.  While it remains true, the Scheduler will be cycled when
any of four events occurs:
.RS
.IP \(bu
Enqueuing of a job in an execution queue or the change of state of a job
in an execution queue to 
.Ty Queued
from
.Ty Waiting
or
.Ty Held .
.IP \(bu
Termination of a running job.  The termination may be normal execution
completion, or because the job was deleted by request.
.IP \(bu
Elapse of a specified cycle time as established by the administrator.
.IP \(bu
The completion of a scheduling cycle in which one and only one job was
scheduled for execution.  This provides for the implementation of scheduling
scripts that must see the impact of the new job on system resources before
picking a second job. 
.RE
.LP
The Scheduler is then treated as a privileged client and may make any
request of the Server, including Run Job, Delete Job, Hold Job, or Modify
Job/Queue/Server.
.LP
While a request for a scheduling cycle is outstanding, that is, while the
connection to the Scheduler is open, the Server will not make another
request of the Scheduler.
If the server attribute
.At scheduling
is set false, the server will not contact the scheduler.  This condition
is indicated by the 
.At server_state
attribute as
.Ty Idle .
.LP
.NH 3
.Tc File Staging
.LP
Two types of file staging services exist, in-staging before execution and
out-staging after execution.  These services are requested by an attribute
(via the -W option) which specifies the files to be staged:
.sp
.nf
.Ty
-W stagein=local_file@host:remote_path[,local_file@host:remote_path,...]
-W stageout=local_file@host:remote_path[,local_file@host:remote_path,...]
.ft 1
.fi
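.LP
As an illustration only, the following Python sketch splits a staging value
of the form shown above into its parts.  The function name is hypothetical
and is not part of PBS; the sketch assumes that file names and host names
contain neither commas nor the at sign.
.nf
```python
def parse_stage_list(value):
    """Split a stagein/stageout value into (local_file, host, remote_path) tuples.

    Hypothetical helper: assumes no commas or at signs inside names.
    """
    entries = []
    for item in value.split(","):
        # Everything before the first "@" is the local file name.
        local_file, at_sign, remote = item.partition("@")
        # The first ":" separates the host from the remote path.
        host, colon, remote_path = remote.partition(":")
        if not (local_file and at_sign and host and colon and remote_path):
            raise ValueError("malformed staging entry: %r" % item)
        entries.append((local_file, host, remote_path))
    return entries
```
.fi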
.LP
A request to 
.I "stage in"
a file directs the server to direct MOM to copy a file from a
remote host to the local host.  
The user must have authority to access the file under the same user name
under which the job will be run.
The remote file is not modified or 
destroyed.  The file will be available before the job is initiated.
If a file cannot be staged in for any reason, any files which were staged-in
are deleted and the job is placed into wait state and mail is sent to the
job owner.
.LP
A request to stage out a file directs the server to direct MOM to move a file
from the
local host to a remote host.  This service is performed after the job
has completed execution and regardless of its exit status.  If a file
cannot be moved, mail is sent to the job owner.  If a file is successfully
staged out, the local file is deleted.
.LP
A version of the BSD 4.4-Lite system utility, 
.B rcp (1),
will be used to move files over the network.
This version of rcp has been modified to always return a non-zero exit status
on any failure.
.NH 3
.Tc Job Initiation
.LP
Job initiation places a job into execution.  The server creates a
session leader that runs the shell program indicated by the
.At Shell_Path_List
attribute of the job.  The pathname of the script and any script arguments
are passed as parameters to the shell.
If the path name of the shell is a relative name, the server will search
its execution path, $PATH, for the shell.
If the path name of the shell is omitted or is the null string,
the server uses the login shell for the user under whose name
the job is to be run.
.LP
The server will determine the user name under which the job is to be run by the
following rules:
.IP 1. 4
Select the user identifier from the
.At User_List
job attribute which has a host name that matches the execution host.
.IP 2. 4
Select the user identifier from the
.At User_List
job attribute which has no associated host name.
.IP 3. 4
Use the user name from the 
.At job_owner
attribute of the job.
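.LP
The three rules above can be sketched in Python as follows.  This is a
minimal illustration, not the server code.  It assumes that each User_List
entry is a string of the form user or user@host and that the job_owner
value has the form user@host; the function name is hypothetical.
.nf
```python
def select_user(user_list, exec_host, job_owner):
    """Pick the user name under which a job would run (illustrative only)."""
    # Rule 1: an entry whose host name matches the execution host.
    for entry in user_list:
        user, at_sign, host = entry.partition("@")
        if at_sign and host == exec_host:
            return user
    # Rule 2: an entry with no associated host name.
    for entry in user_list:
        if "@" not in entry:
            return entry
    # Rule 3: fall back to the user part of the job owner (assumed user@host).
    return job_owner.partition("@")[0]
```
.fi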
.LP
The server will place the job into
.B running
state.
.LP
The server will  create, in the environment of
the session leader of the job, the environment variables named:
.RS .25i
.B PBS_ENVIRONMENT
- the value of which is the string
.Av PBS_BATCH \^.
.br
.B PBS_QUEUE
- the value of which is the name of the execution queue.
.RE
.LP
The server will also place in the environment of the session leader of the
job, all of the variables and their corresponding values found in the
.At variables
attribute of the job.
.LP
The server will place the required limits on the resources for which the
host system supports resource limits.
.LP
If the job had been run before and is now being 
.I rerun ,
the server will ensure that the standard output and standard error streams
of the job are appended to the prior streams, if any.
.LP
If the server and host system support accounting, the server will use
the value of the
.At Account_Name
job attribute as required by the host system.
.LP
If the server and host system support checkpoint, the server will set up
checkpointing of the job according to the value of the
.At Checkpoint
job attribute.
If checkpoint is supported and the
.At Checkpoint
attribute requests checkpointing at the minimum interval or an interval
less than the minimum interval for the queue, then checkpoint will be
set for an interval given by the queue attribute
.At minimum_interval .
.LP
The server will set up the standard output stream and the standard error
stream of the job according to the following rules:
.IP \(bu
The stream will be located either (1) in a temporary file in the server's
spool directory, or (2) in a file in the user's home directory.  The choice
is determined by a server build time configuration parameter.
.IP \(bu
If the job attribute
.At Join_Path
has the value
.Av eo 
or
the value
.Av oe ,
the server connects the standard error stream of the job to the same file
as the standard output stream.
.LP
If the value of the job attribute
.At Mail_Points
contains the value
.Sc beginning ,
the server will send mail to each mail  address specified in the job attribute
.At Mail_Users .
.NH 3
.Tc Job Routing
.LP
Job routing is moving a job from a routing queue to one of the
destinations associated with the queue.
.LP
If the 
.At started
queue attribute is
.Sc TRUE ,
the server will route all eligible jobs which reside in the queue.  All
jobs in the
.B queued
state are eligible.  If the queue attribute
.At route_held_jobs
is
.Sc TRUE ,
jobs in the
.B held
state are eligible for routing.
If the queue attribute
.At route_waiting_jobs
is
.Sc TRUE ,
jobs in the
.B waiting
state are eligible.
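.LP
The eligibility rules above amount to a small decision function.  The
Python sketch below is illustrative only; the function name and the
lower case state strings are assumptions made for the example.
.nf
```python
def eligible_for_routing(job_state, started,
                         route_held_jobs=False, route_waiting_jobs=False):
    """Return True if a job in a routing queue may be routed (illustrative)."""
    if not started:
        # Nothing is routed unless the queue is started.
        return False
    if job_state == "queued":
        return True
    if job_state == "held":
        return route_held_jobs
    if job_state == "waiting":
        return route_waiting_jobs
    return False
```
.fi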
.LP
The server will execute the function specified by the queue attribute
.At route_function
to select a destination for the job.  Possible destinations are listed in
the queue attribute
.At route_destinations .
.LP
If the destination to which the job is to be routed is at another server,
the current server will use a
.I "Queue Job"
request sequence to move the job to the new destination.
.LP
If the server is unable to route a job to a chosen destination, the
server will select another destination from the list and retry the route.
If the server is unable to route a job to any destination because of a
temporary condition, such as being unable to connect with the server at
the destination, the server will retry the route after a delay specified by
the queue attribute
.At route_retry_time .  
The server will proceed to route other jobs in the queue.
The server will retry the route up to the number of tries in the queue
attribute 
.At number_retries .
If the server is unable to route a job to any destination and all
failures are permanent (non-temporary), the server will abort the job.
.LP
.NH 3
.Tc Job Exit
.LP
When the session leader of a batch job exits, the server will perform the
following actions in the order listed.
.LP
Place the job in the
.B exiting
state.
.LP
\*QFree\*U the resources allocated to the job.  The actual releasing
of resources assigned to the processes of the job is performed by the
kernel.  
.B PBS
will free the resources which it \*Qreserved\*U for the job by
decrementing the
.At resources_used
generic data item for the queue and server.
.LP
Return the standard output and standard error streams of the job to the user.
If the 
.At Keep_Files 
attribute of the job contains 
.Sc KEEP_OUTPUT ,
the server copies the spooled file holding the standard output
stream of the job to the home directory of the user under whose name
the job executed.  The file name for the output is
.br
\ \ \ \ \f5job_name.\f6o\f5seq_number\f1
.br
See the qsub(1B) command description.
If the
.At Keep_Files
attribute of the job contains
.Sc KEEP_ERROR 
and the
.At Join_Path
attribute does not contain
.Av 'e' ,
the server copies the spooled file holding the standard error stream of
the job to the home directory of the user under whose name the job executed.
The file name for the error file is
.br
\ \ \ \ \f5job_name.\f6e\f5seq_number\f1
.br
If the files are not to be kept on the execution host as described above,
the temporary file holding the standard output is copied or renamed
to the host and path name specified by the job attribute
.At Output_Path .
If the path name is relative, the file will be located relative to
the home directory of the user on the receiving host.
.LP
If the
.At Join_Path
attribute does not contain the value
.Av e ,
the standard error of the job is delivered according to the same rules as
the standard output described above.
.LP
If either output file cannot be copied to its specified destination,
the server will send mail to the job owner specifying the current location
of the output.
.LP
If the
.At Mail_Points
job attribute contains the value
.Sc EXIT ,
the server will send mail to the users listed in the job attribute
.At Mail_Users .
.LP
If out staging of files is supported, the files listed in the outfile
resource will be copied to the specified destination.
.LP
The job will be removed from the execution queue.
.NH 3
.Tc Job Aborts
.LP
If the server aborts a job and the 
.At Mail_Points
job attribute contains the value
.Sc ABORT ,
the server will send mail to the users listed in the job attribute
.At Mail_Users .
The mail message will contain the reason the job was aborted.
.LP
The job is removed from the queue.
.NH 3
.Tc Timed Events
.LP
The server performs certain events at a specified time or after a specified
time delay.
.LP
A job may have an
.At execution_time
attribute set to a time in the future.  When that time is reached, the
job state is updated.
.LP
If the server is unable to connect to another server, it will
retry after a time specified either by the routing queue attribute
.At route_retry_time ,
or the general server attribute
.At network_retry_time .
.NH 3
.Tc Event Logging
.LP
The various daemons including the
.B PBS
server will maintain a log file of events.  This file is available to
the batch administrator for analysis of past events.
.LP
The file will be maintained under the path name
.Av "{PBS_SERVER_HOME}/server_log/date" ,
where 
.Av date
is the date in the form yyyymmdd when the log file started (see the -L option
in pbs_server(8B)).
.LP
The events recorded by the server in the file are specified by the
server attribute
.At log_events
which is a bit string with each bit determining if a type of event is logged:
.RS
.IP 1
Internal PBS errors.
.IP 2
System (OS) errors such as malloc failed.
.IP 4
Administrator related events, such as changing server or queue attributes.
.IP 8
Job related events: submitted, ran, deleted, ...
.IP "16 (0x010)"
Job resource usage, this duplicates the accounting information in the log.
.IP "32 (0x020)"
Security related events, such as attempts to connect from an unknown host.
.IP "64 (0x040)"
When the scheduler was called and why.
.IP "128 (0x080)"
First level, common, debug messages.
.IP "256 (0x100)"
Second level, more rare, debug messages.
.RE
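.LP
As a sketch of how the bit values listed above combine, the following
Python fragment decodes a log_events value into the event types it
enables.  The table and function names are hypothetical; only the bit
values and their meanings come from the list above.
.nf
```python
# Bit values and meanings taken from the list above.
LOG_EVENT_BITS = [
    (0x001, "internal PBS errors"),
    (0x002, "system (OS) errors"),
    (0x004, "administrator related events"),
    (0x008, "job related events"),
    (0x010, "job resource usage"),
    (0x020, "security related events"),
    (0x040, "scheduler calls"),
    (0x080, "first level debug messages"),
    (0x100, "second level debug messages"),
]


def decode_log_events(mask):
    """Return the names of the event types enabled in a log_events value."""
    return [name for bit, name in LOG_EVENT_BITS if mask & bit]
```
.fi
For example, a value of 11 (1 + 2 + 8) enables internal PBS errors,
system errors, and job related events.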
.LP
The log file is a text file with each entry terminated by a new line.
The format of an entry is:
.sp
.Ty "date time;event_code;server_name;object_type;object_name;message_text"
.sp
The 
.Ty "date time"
field is a date and time stamp in the format:
.Ty "mm/dd/yyyy hh:mm:ss" .
The
.Ty event_code
is the type of event which triggered the event logging.  It
corresponds to the bit position, 0 to n, in the
.At log_events
server attribute.  The 
.Ty server_name
is the name of the server which logged the message.  This is recorded in case
a site wishes to merge and sort the various logs in a single file.  The
.Ty object_type
is the type of object which the message is about, 
.Ty Svr
for server, 
.Ty Que
for queue, 
.Ty Job
for job, 
.Ty Req
for request, or 
.Ty Fil
for file.  The
.Ty object_name
is the name of the specific object.  The
.Ty message_text
field is the text of the log message.
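.LP
A log entry in the format above can be split on its first five
semicolons, leaving any semicolons in the message text intact.  The
Python sketch below is illustrative; the function name is hypothetical,
and the event code is kept as a string because its exact encoding is not
stated here.
.nf
```python
from datetime import datetime


def parse_log_entry(line):
    """Split one log line into its six fields (illustrative sketch)."""
    # Split at most five times so the message text may contain semicolons.
    fields = line.strip().split(";", 5)
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    stamp, event_code, server_name, object_type, object_name, text = fields
    return {
        "time": datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"),
        "event_code": event_code,  # left unparsed: encoding is an open detail
        "server_name": server_name,
        "object_type": object_type,
        "object_name": object_name,
        "message_text": text,
    }
```
.fi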
.NH 3
.Tc Accounting
.LP
The PBS server maintains an accounting file.  
The file will be maintained under the path name
.Av "{PBS_SERVER_HOME}/server_priv/accounting/day" ,
where
.Av day
is the date in the form yyyymmdd when the accounting file started (see the
-A option in pbs_server(8B)).
.LP
The account file is a text file with each entry terminated by a new line.
The format of an entry is:
.sp
.Ty "date time;record_type;job_id;message_text"
.sp
The 
.Ty "date time"
field is a date and time stamp in the format:
.Ty "mm/dd/yyyy hh:mm:ss" .
The 
.Ty job_id
is the job identifier.  The 
.Ty message_text
is ASCII text.  The content depends on the record type.  The message
text format is blank separated keyword=value fields.
The
.Ty record_type
is a single character indicating the type of record.  The types are:
.RS
.IP A 3
Job was aborted by the server.
.IP D
Job was deleted by request.  The message_text will contain
.Ty requestor= user@host
to identify who deleted the job.
.IP E
Job ended (terminated execution).  The message_text field contains:
.br
.Ty user= username
- the user name under which the job executed.
.br
.Ty group= groupname
- the group name under which the job executed.
.br
.Ty jobname= job_name
- the name of the job.
.br
.Ty queue= queue_name
- the name of the queue from which the job is executed.
.br
.Ty ctime= time
- time in seconds when job was created (first submitted).
.br
.Ty qtime= time
- time in seconds when job was queued into current queue.
.br
.Ty etime= time
- time in seconds when job became eligible to run; no holds, etc.
.br
.Ty start= time
- time in seconds when job execution started.
.br
.Ty exec_host= host
- name of host on which the job is being executed.
.br
.Ty Resource_List. resource=limit
- list of the specified resource limits.
.br
.Ty session= sesid
- session number of job.
.br
.Ty alt_id= id
- Optional alternate job identifier.  Will be included only for certain
systems:
.in +0.5i
Irix 6.x with Array Services \- 
The alternate id is the Array Session Handle (ASH) assigned to the job.
.in -0.5i
.Ty end= time
- time in seconds when job ended execution.
.br
.Ty Exit_status= value
- the exit status of the job.  If the value is less than 10000 (decimal) it
is the exit value of the top level process of the job, typically the shell.
If the value is greater than 10000, the top process exited on a signal whose
number is given by subtracting 10000 from the exit value.
.br
.Ty Resources_used. resource=usage
- list of the resources used by the job.
.IP
For Resource_List and Resources_used, there is one entry per resource.
.IP C
Job was checkpointed and held.
.IP Q
Job entered a queue.  The message_text contains
.Ty queue= name
identifying the queue into which the job was placed.
There will be a new Q record each time the job is routed or moved to a new
(or the same) queue.
.IP R
Job was rerun.
.IP S
Job execution started.  The message_text field contains:
.br
.Ty user= username
- the user name under which the job executed.
.br
.Ty group= groupname
- the group name under which the job executed.
.br
.Ty jobname= job_name
- the name of the job.
.br
.Ty queue= queue_name
- the name of the queue from which the job is executed.
.br
.Ty ctime= time
- time in seconds when job was created (first submitted).
.br
.Ty qtime= time
- time in seconds when job was queued into current queue.
.br
.Ty etime= time
- time in seconds when job became eligible to run; no holds, etc.
.br
.Ty start= time
- time in seconds when job execution started.
.br
.Ty exec_host= host
- name of host on which the job is being executed.
.br
.Ty Resource_List. resource=limit
- list of the specified resource limits.
.br
.Ty session= sesid
- session number of job.
.IP T
Job was restarted from a checkpoint file.
.RE
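.LP
The accounting record format and the Exit_status rule described above can
be sketched in Python as follows.  The function names are hypothetical,
and the sketch assumes that keyword values contain no blanks.  The text
above does not say how a value of exactly 10000 is interpreted, so the
sketch reports it as unspecified.
.nf
```python
def parse_accounting_record(line):
    """Split one accounting line into its fields (illustrative sketch)."""
    # Raises ValueError if fewer than four fields are present.
    stamp, record_type, job_id, text = line.strip().split(";", 3)
    fields = {}
    # The message text is a list of blank separated keyword=value items.
    for token in text.split():
        key, equals, value = token.partition("=")
        if equals:
            fields[key] = value
    return {"time": stamp, "type": record_type,
            "job_id": job_id, "fields": fields}


def decode_exit_status(value):
    """Classify an Exit_status value as a normal exit or a signal."""
    if value < 10000:
        return ("exit", value)
    if value > 10000:
        return ("signal", value - 10000)
    return ("unspecified", value)  # exactly 10000: not defined in the text
```
.fi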
.LP
.NH 2
.Tc \f3Resource Management\fP
.LP
.B PBS
performs resource allocation at job initiation in two ways depending on
the support provided by the host system.
Resources are either reservable or non reservable.
.NH 3
.Tc Non Reservable Resources
.LP
Most Unix systems do not provide for resource reservation, only for
limits.  Resources, like memory, disk space, and cpu time, are
handed out by the kernel on a first come first served basis.  When
a request exceeds the user's limits, the request is denied or the
job is signaled.  To add resource reservation to a system is generally
a major undertaking.  One example is the Session Reservable File System, SRFS,
extension to Unicos\(rg developed at NAS.  This extension required several 
additions to the kernel.
.LP
For resources which are not reservable,
.B PBS
manages resource allocation based on the amount \*Qallocated\*U to
.B PBS 
by the administrator.
This available amount of each resource is maintained in the server attribute
.At resources_available .
The share of the resource "distributed" to each queue
is maintained in an attribute of that queue.  This attribute
limits the aggregate total of the resources used by the jobs running
under that queue.
This allocation is made by the batch system administrator.  Most host
systems do not provide support for dynamically adjusting the allocation
to the batch system.  A site may build in a procedure for adjusting
the values of certain resources based on system load.
.LP
An example of non reservable resources is memory.  The host operating
system kernel manages memory, dynamically assigning physical memory to
running processes.  A limit can be set which the process cannot exceed,
but memory cannot be reserved for a particular process in advance.
.LP
The
.At resources_max
attribute for the server and queue declares the maximum amount
of the resource that a single job may be allocated.  The server's resources_max
is examined if there is not a resources_max value for the type of resource
defined at the queue level.
.LP
.B PBS
ensures that the resources requested by a job fall within the two groups
of limits before the job is initiated.
.NH 3
.Tc Reservable Resources
.LP
On some hosts, certain resource types may be requested and the amount
guaranteed to the process.  Session Reservable File System, SRFS, is such a
resource.  For these types of resources, 
.B PBS
will attempt to reserve the requested amount before scheduling the job
for execution.
.LP
When the request to reserve resources is denied by the system, the job
selection function may assist or 
.I expedite
the job.  In this case resources allocated to the job are not "released".
.B PBS 
will attempt to acquire the remaining resources until it is successful
and the job can be initiated, or until a time limit, specified by the
queue attribute
.At reserved_expedite ,
is reached.  At that point all the resources are released.
If a job is being expedited, other jobs whose resources do not conflict 
with the needs of the expedited job may be scheduled for execution.
.LP
When the reservable resources have been allocated to the job and 
the non reservable resources fit into what
.B PBS
has available, the job will be placed into execution.
.NH 3
.Tc Resource Limits
.LP
When submitting a job, a user may specify the hard limit of usage for resources
known to the system on which the job will run.  If the executing job's usage
of a resource exceeds the specified limit, the job is aborted.
.LP
If the user does not specify a limit for a resource type, the limit may be
set to a default established by the PBS administrator.  The default limit
is taken from the first of the following attributes which is set:
.RS
.IP 1. 4
The current queue's attribute
.At resources_default .
.IP 2. 
The server's attribute 
.At resources_default .
.IP 3.
The current queue's attribute
.At resources_max .
.IP 4.
The server's attribute
.At resources_max .
.RE
.LP
If the user does not specify a limit for a resource and a default is not
established via one of the above attributes, the usage of the resource is
unlimited.
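.LP
The lookup order above can be sketched in Python as follows.  This is an
illustration only; the function name is hypothetical, and each attribute
is modeled as a dictionary that maps a resource name to its value.
.nf
```python
def default_limit(resource, queue_default, server_default,
                  queue_max, server_max):
    """Return the default limit for a resource, or None if unlimited."""
    # The tables are consulted in the order given in the list above.
    for table in (queue_default, server_default, queue_max, server_max):
        if resource in table:
            return table[resource]
    # No default is established: usage of the resource is unlimited.
    return None
```
.fi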
.NH 3
.Tc Types of Resources
.LP
The following table lists the names recommended for various resources.
Not all types are supported on any single server, and some are not yet
implemented on any system.
The following subsections list the resources supported by each system.
.DS B
.TS
box tab(/) ;
c c c
l l l .
Keyword/Units/Definition
_
cput/time/job cpu time
pcput/time/process cpu time
mem/size/job memory size
pmem/size/process memory size
_
pf/size/Amount of file systems block for the job
ppf/size/Amount of file systems block for any process in job
file/size/Amount of space for any single file
filsys/string:size/Amount of space on a file system
fileexist/string/file exists and is readable
srfs/size/Session Reservable File System space
_
walltime/time/wall clock time running
memt/size*time/T{
Maximum job memory * time
.br
(byte_seconds)
T}
ncpus/unitary/Number of cpus
typecpu/string/type of cpu
cpugroup/string/set of cpus 
_
9trk/unitary/number of 9 track tape drives
3480/unitary/number of 18 track tape drives
3490/unitary/number of 36 track tape drives
8mm/unitary/number of 8mm tape drives
.TE
.DE
.LP
The attribute values take the following units:
.IP time 10
specifies a maximum time period the resource can be used.  Time is expressed
in seconds as an integer, or in the form:
.br
.Ty [[hours:]minutes:]seconds[.milliseconds]
.br
If specified, milliseconds are rounded to the nearest second.
.KS
.IP size 10
specifies the maximum amount in terms of bytes or words.  It is expressed
in the form
.Ty integer[suffix] .
The suffix is a multiplier defined in the following table,
\*Qb\*U means bytes (the default) and \*Qw\*U means words.
The size of a word is calculated on the execution server as its word size.
.DS B
.TS
box tab(/) ;
c s | c
c | c | r.
Suffix/Multiplier
_
b/w/1
kb/kw/1024
mb/mw/1,048,576
gb/gw/1,073,741,824
tb/tw/1,099,511,627,776
.TE
.DE
.IP string 10
is a string of characters which must be interpreted by the execution
server.  It is
frequently a path name.
.KE
.IP unitary 10
The maximum amount of a resource which is expressed as a simple integer.
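.LP
The time and size forms described above can be converted to seconds and
bytes as in the following Python sketch.  The function names are
hypothetical.  The sketch assumes that a fraction of one half second or
more rounds up, and it takes the word size as a parameter with an assumed
default of 8 bytes, since the real value depends on the execution server.
.nf
```python
MULTIPLIERS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_time(text):
    """Convert [[hours:]minutes:]seconds[.milliseconds] to whole seconds."""
    whole, dot, fraction = text.partition(".")
    parts = [int(p) for p in whole.split(":")]
    if len(parts) > 3:
        raise ValueError("too many time fields: %r" % text)
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + part
    # Round to the nearest second (assumed: one half rounds up).
    if dot and int((fraction + "000")[:3]) >= 500:
        seconds += 1
    return seconds


def parse_size(text, word_size=8):
    """Convert integer[suffix] to bytes; word_size is an assumed default."""
    i = 0
    while i < len(text) and text[i].isdigit():
        i += 1
    number, suffix = text[:i], text[i:]
    if not number:
        raise ValueError("missing integer: %r" % text)
    unit = 1  # no suffix means bytes
    if suffix:
        if suffix[-1] == "w":
            unit = word_size
        elif suffix[-1] != "b":
            raise ValueError("unknown suffix: %r" % suffix)
        suffix = suffix[:-1]
    if suffix not in MULTIPLIERS:
        raise ValueError("unknown multiplier: %r" % suffix)
    return int(number) * MULTIPLIERS[suffix] * unit
```
.fi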
.LP
.so ../man1/pbs_resources_aix4.7B
.so ../man1/pbs_resources_digitalunix.7B
.so ../man1/pbs_resources_irix5.7B
.so ../man1/pbs_resources_irix6.7B
.so ../man1/pbs_resources_linux.7B
.so ../man1/pbs_resources_sp2.7B
.so ../man1/pbs_resources_sunos4.7B
.so ../man1/pbs_resources_solaris5.7B
.so ../man1/pbs_resources_unicos8.7B
.so ../man1/pbs_resources_unicosmk2.7B
.NH 3
.Tc Interactive Session Management
.QP
Author note:
.br
This section is very incomplete.  It is the beginnings of an idea
based on the need to know, and perhaps control, the resource
utilization by interactive sessions.  In most implementations based
on NQS, this is not done.  A few implementations have been extended to
periodically monitor the proc table in the kernel.  The disadvantage
of this method is the makeup of the proc table varies greatly for each
kernel implementation.
.LP
To improve its ability to schedule jobs and manage resources, PBS
must be aware of the load of the system produced by interactive
jobs.  It would be an advantage to have the capability to control the
activities of interactive sessions.
.LP
One approach is to provide a communication capability between the
login process and
.B PBS .
The number of login sessions and the amount of resources \*Qassigned\*U to 
each session, based on the user limits, would be communicated to PBS.
PBS would then be able to adjust the amount of resources available to 
batch jobs.  
.LP
If a site wished to restrict interactive sessions based on the availability
of resources under control of PBS, this capability would be extended such
that PBS could direct the login process to disallow the user login attempt.
.LP
The sum of the resources (limits) used by all current interactive sessions
would be treated as those assigned to a job.
.\" force next chapter to odd page
.bp
.if e \{
\&
.sp 10
.DS C
[Page intentionally left blank.]
.DE
.bp
\}
