.de Cs
.\" The following pair of macros are used to bracket sections of code
.in +0.5i
.br
.ft 5
.nf
..
.de Ce
.br
.fi
.ft 1
.in -0.5i
..
.de Ty          \" Type-ins and examples (typewriter)
.ft 5
.if \\n(.$>0 \&\\$1\fP\\$2
..
.de Al
.IP "\&\f5\s-1\\$1\s+1\fP"
..
.de At
.ft 3
.if \\n(.$>0 \&\\$1\fP\\$2
..
.\" following macro cuts the date provided by RCS so only yy/mm/dd shows
.de Cd
\&\\$2
..
.de Rv
.ds rV Revision: \\$2
..
.de Tc
\&\\$1 \\$2 \\$3 \\$4 \\$5
.if \\nI=2 \{\
.       tm .XA \\n(PN
.       tm \\*(SN \\$1 \\$2 \\$3 \\$4 \\$5
.\}
..

.\" the Fi numerical register is used to number figures
.nr Fi 0 1
.Rv $Revision: 2.42 $
.sp 3
.in +0.7i
.sp 0.22i
.vs 11
.ps +22
P
.br
\h'11p'B\ \ 
.ps -4
\v'-4p'Portable Batch System\v'4p'
.br
.ps +4
\h'21p'S
.br
.ps -22
.vs 12
\l'4.9i'
.br
.in -0.8i
.sp 3
.TL
Administrator Guide
.AU
Albeaus Bayucan
Robert L. Henderson
Lonhyn T. Jasinskyj
Casimir Lesiak
Bhroam Mann
Tom Proett
Dave Tweten \(dg
.FS \(dg
Numerical Aerospace Simulation Systems Division,
NASA Ames Research Center, Moffett Field, CA
.FE
.sp 1
.AI
.B "MRJ Technology Solutions"
2672 Bayshore Parkway
Suite 810
Mountain View,  CA 94043
http://pbs.mrj.com
.sp
.so ../ers/release.ms
.br
Printed: \*(DY
.LP
.OH 'PBS Administrator Guide''Preface'
.EH 'Preface''PBS Administrator Guide'
.bp
\ 
.bp
.OF '''-i-'
.EF '-ii-'''
.DS C
\s+2\f3Portable Batch System (PBS) Software License\fP\s-2
.sp
Copyright \(co 1999, MRJ Technology Solutions.
.br
All rights reserved.
.DE
.LP
Acknowledgment: The Portable Batch System Software was originally developed
as a joint project between the Numerical Aerospace Simulation (NAS) Systems
Division of NASA Ames Research Center and the National Energy Research
Supercomputer Center (NERSC) of Lawrence Livermore National Laboratory.
.LP
Redistribution of the Portable Batch System Software and use in source
and binary forms, with or without modification, are permitted provided
that the following conditions are met:
.IP -
Redistributions of source code must retain the above copyright and
acknowledgment notices, this list of conditions and the following disclaimer.
.IP -
Redistributions in binary form must reproduce the above copyright and
acknowledgment notices, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the distribution.
.IP -
All advertising materials mentioning features or use of this software must
display the following acknowledgment:
.RS
.QP
This product includes software developed by NASA Ames Research
Center, Lawrence Livermore National Laboratory, and MRJ Technology Solutions.
.RE
.sp
.LP
.ce
DISCLAIMER OF WARRANTY
.QP
THIS SOFTWARE IS PROVIDED BY MRJ TECHNOLOGY SOLUTIONS ("MRJ") "AS IS" WITHOUT
WARRANTY OF ANY KIND,  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING,
BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY,  FITNESS
FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT ARE EXPRESSLY  DISCLAIMED.
.QP
IN NO EVENT, UNLESS REQUIRED BY APPLICABLE LAW, SHALL MRJ, NASA, NOR
THE U.S. GOVERNMENT  BE LIABLE FOR ANY DIRECT DAMAGES WHATSOEVER,
NOR ANY  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.LP
This license will be governed by the laws of the Commonwealth of Virginia,
without reference to its choice of law rules.
.LP
This product includes software developed by the NetBSD
Foundation, Inc. and its contributors.
.LP
.bp
.DS C
PBS Revision History
.DE
.so ../ers/rel_history.ms
.\"
.\" The TOC follows
.bp
.OF '''-iii-'
.EF '-iv-'''
.XS i
PBS License Agreement
.XE
.XS ii
Revision History
.so toc.input
.XE
.LG
Table of Contents
.NL
.sp
.PX no
.OH 'PBS Administrator Guide''Introduction'
.EH 'Introduction''PBS Administrator Guide'
.bp
.OF 'Document \*(rV''%'
.EF '%''Document \*(rV'
.nr % 1
.P1
.NH 1
.Tc \f3\s+2Introduction\s-2\fP
.LP
This document provides the system administrator with the information required
to build, install, configure, and manage the Portable Batch System.
Some useful information has almost certainly been left out: no document of
this sort can ever be complete, and until it has been updated by several
administrators at different sites, it is sure to be lacking.
.LP
You are strongly encouraged to read the PBS External Reference Specification
(ERS), included with the release as pbs_ers.ps in the src/doc directory.
.NH 2
.Tc \f3What is PBS?\fP
.LP
The Portable Batch System (PBS) is a batch job and computer system resource
management package.  It was developed to conform to the POSIX 1003.2d Batch
Environment Standard.  As such, it accepts batch jobs (a shell script plus
control attributes), preserves and protects each job until it is run, runs
the job, and delivers the output back to the submitter.
.LP
PBS may be installed and configured to support jobs run on a single system,
or on many systems grouped together.  PBS is flexible enough that the systems
may be grouped in many ways.
.NH 2
.Tc \f3Components of PBS\fP
.LP
PBS consists of four major components: the commands, the job Server, the job
executor, and the job Scheduler.  A brief description of each is given
here to help you make decisions during the installation process.
.IP Commands
PBS supplies both POSIX 1003.2d conforming command-line commands and
a graphical interface.  These are used to submit, monitor, modify, and
delete jobs.  The commands can be installed on any system type supported by
PBS and do not require the local presence of any of the other components of PBS.
There are three classes of commands: user commands, which any authorized
user can use; operator commands; and manager (or administrator) commands.
Operator and manager commands require different access privileges.
.IP "Job Server"
The Job Server is the central focus for PBS.  Within this document, it is
generally referred to as 
.I "the Server"
or by the execution name
.I pbs_server .
All commands and the other daemons communicate with the Server via an IP
network.
The Server's main function is to provide the basic batch services such as
receiving/creating a batch job, modifying the job, protecting the job against
system crashes, and running the job (placing it into execution).
.IP "Job Executor"
The job executor is the daemon which actually places the job into execution.
This daemon,
.I pbs_mom ,
is informally called 
.I Mom
as it is the mother of all executing jobs.
Mom places a job into execution when it receives a copy of the job from a
Server.  Mom creates a new session that is as nearly identical to a user login
session as possible.  For example, if the user's login shell is csh, then
Mom creates a session in which .login is run as well as .cshrc.
Mom also has the responsibility for returning the job's output to the user
when directed to do so by the Server.
.IP "Job Scheduler"
The Job Scheduler is another daemon which contains the site's policy controlling
which job is run and where and when it is run.  Because each site has its own
ideas about what is a good or effective policy, PBS allows each site to 
create its own Scheduler.  When run, the Scheduler can communicate with the
various Moms
to learn about the state of system resources and with the Server to learn
about the availability of jobs to execute.   The interface to the Server
is through the same API as the commands.   In fact, the Scheduler just
appears as a batch Manager to the Server.  
.LP
In addition to the above major pieces, PBS also provides an Application Program
Interface (API), which is used by the commands to communicate with the Server.
This API is described in the section 3 man pages furnished with PBS.  A site
may use the API to implement new commands if desired.
.NH 2
.Tc \f3Release Information\fP
.LP
This information applies to the 2.1 release of PBS from MRJ Technology
Solutions.
.NH 3
Tar File
.LP
PBS is provided as a single tar file.  The tar file contains:
.IP -
This document in both PostScript and text form.
.IP -
A \*Qconfigure\*U script, all source code, header files, and make files
required to build and install PBS.
.IP -
A full set of documentation sources.  These are troff input files.
The documentation may also be obtained by registered sites from the
PBS web site:  http://pbs.mrj.com
.LP
Extracting the tar file creates a top-level directory containing the items
above.  The directory is named for the release version and patch level;
for example, the directory for release 2.1 patch level 13 is named
.Ty pbs_v2.1p13 .
.LP
We recommend extracting the files with the -p option to tar to
preserve permission bits.
.LP
.NH 3
Additional Requirements
.LP
PBS uses a configure script generated by GNU autoconf to produce makefiles.
If you have a POSIX make program then the makefiles generated by configure
will try to take advantage of POSIX make features.  If your make is unable to
process the makefiles while building you may have a broken make.
Should make fail during the build, try using GNU make.
.LP
If the Tcl based GUI (xpbs and xpbsmon) or the Tcl based Scheduler is used,
the Tcl header file and library are required.
The official site for Tcl is:
.Cs
http://www.scriptics.com/
ftp://ftp.scriptics.com/pub/tcl/tcl8_0
.Ce
.LP
Versions of Tcl prior to 8.0 can no longer be used with PBS.  Tcl
and Tk version 8.0 or greater must be used.
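.LP
The following is a minimal sketch, assuming a POSIX shell, of checking that an
installed Tcl meets the 8.0 minimum stated above.
The function name tcl_version_ok is hypothetical and is not part of PBS;
the version is obtained by asking tclsh for its patch level.
Tk is not checked here.
.Cs
```shell
#!/bin/sh
# Hypothetical helper, not shipped with PBS: succeeds when a Tcl
# version string such as "8.0.5" is 8.0 or newer.
tcl_version_ok() {
    major=${1%%.*}                 # component before the first dot
    case $major in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: reject
    esac
    [ "$major" -ge 8 ]
}

# If a tclsh is on the PATH, ask it for its own version and report.
if command -v tclsh >/dev/null 2>&1; then
    ver=$(echo 'puts [info patchlevel]' | tclsh)
    if tcl_version_ok "$ver"; then
        echo "Tcl $ver is new enough for PBS"
    else
        echo "Tcl $ver is older than 8.0 and cannot be used with PBS"
    fi
fi
```
.Ce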
.LP
If the BaSL Scheduler is used,
yacc and lex (or GNU bison and flex) are required.
Possible sites for bison and flex are:
.Cs
http://www.gnu.org/software/software.html
prep.ai.mit.edu:/pub/gnu
.Ce
.LP
To format the documentation included with this release, we strongly recommend
the use of the GNU groff package.
At the time of writing, the latest version of groff is 1.11.1; it can be found at:
.Cs
http://www.gnu.org/software/groff/groff.html
.Ce

.OH 'PBS Administrator Guide''Installation'
.EH 'Installation''PBS Administrator Guide'
.bp
.NH 1
.Tc \f3\s+2Installation\s-2\fP
.LP
This section explains the steps to build and install PBS.
PBS installation is accomplished via the GNU autoconf process.
This installation procedure requires more manual configuration than is
\*Qtypical\*U for many packages, because a number of options involve site
policy and therefore cannot be determined automatically.
.LP
If PBS is to be run on Red Hat Linux on Intel x86, an RPM package is
available for installation.  Please see section 2.4.9 for installation
instructions.
.LP
To reach a usable PBS installation, the following steps are required:
.IP 1.
Read this guide and plan a general configuration of hosts and PBS.
See sections 1.2 and 3.0 through 3.2.
.IP 2.
Decide where the PBS source and objects are to go.
See section 2.2.
.IP 3.
Untar the distribution file into the source tree. 
See section 2.2.
.IP 4.
Select \*Qconfigure\*U options and run configure from the top of the
object tree.
See sections 2.2 through 2.4.
.IP 5.
Compile the PBS modules by typing \*Qmake\*U at the top of the object tree.
See sections 2.2 and 2.3.
.IP 6.
Install the PBS modules by typing \*Qmake install\*U at the top of the
object tree.  Root privilege is required.
See section 2.2.
.IP 7.
.mc |
Create a node description file if PBS is managing a complex of nodes or a
parallel system like the IBM SP.
See Chapter 
.B "3. Batch System Configuration" .
Nodes may be added after the Server is up via the qmgr command, even if a node
file is not created at this point.
.mc
.IP 8.
Bring up and configure the Server.
See sections 3.1 and 3.5.
.IP 9.
Configure and bring up the Moms.
See section 3.6.
.IP 10.
Test by scheduling a few jobs by hand.
See the qrun(8B) man page.
.IP 11.
Configure and start a Scheduler program.  Set the Server to active by enabling
scheduling.
See Chapter 4.
.NH 2
.Tc \f3Planning\fP
.LP
PBS is able to support a wide range of configurations.  It may be installed and
used to control jobs on a single (large) system.  It may be used to load
balance jobs on a number of systems.  It may be used to allocate nodes of a
cluster or parallel system to parallel and serial jobs.  Or it can deal with
a mix of the above.
.LP
Before going any further, we need to define a few terms.  PBS uses some of
these terms differently than you may expect.
.IP Node\ \ \ 
A computer system with a single Operating System image, a unified virtual 
memory image, one or more cpus and one or more IP addresses.
Frequently, the term 
.I "execution host"
is used for node.
A box like the SGI Origin 2000,
which contains multiple processing units running under a single OS copy, is
one node to PBS
.I regardless
of SGI's terminology.
A box like the IBM SP which contains many units, each with their own copy
of the OS, is a collection of many nodes.
.mc |
.IP
New in PBS release 2.2, a 
.I cluster
node is declared to consist of one or more 
.I "virtual processors" .
The term virtual is used because the number of virtual processors
declared may be equal to, greater than, or less than the number of real
processors in the physical node.  It is now these virtual processors that are
allocated, rather than the entire physical node.  The virtual processors (VPs)
of a cluster node may be allocated
.I exclusively
or
.I "temporarily shared" .
Timeshared nodes are not considered to consist of virtual processors; these
nodes are used by, but not allocated to, jobs.
.mc
.IP Complex
A collection of hosts managed by one batch system.  A complex may be made up
of nodes that are allocated to only one job at a time or of nodes that have
many jobs executing on each at once or a combination of both.
.IP Cluster
A complex made up of cluster nodes.
.IP "Cluster Node"
A node whose virtual processors are allocated specifically to one job at a time
(see 
.I "exclusive node" ),
or a few jobs (see 
.I "temporarily-shared nodes" ).
This type of node may also be called
.I "space shared" .
.mc |
If a cluster node has more than one virtual processor, the VPs may be assigned
to different jobs or used to satisfy the requirements of a single job.
However, all VPs on a single node will be allocated in the same manner, i.e.
all will be allocated exclusively or all will be allocated temporarily-shared.
.mc
Hosts that are timeshared among many jobs are called \*Qtimeshared.\*U
.IP "Exclusive Nodes"
An exclusive node is one that is used by one and only one job at a time.
A set of nodes is assigned exclusively to a job for the duration of that job.
This is typically done to improve the performance of message passing programs.
.IP "Temporarily-shared Nodes"
A
.I "temporarily-shared node"
is one whose VPs are temporarily shared by multiple jobs.
If several jobs request
multiple temporarily-shared nodes, some VPs may be allocated
to more than one of the jobs and some may be unique to one job.
When a VP is allocated on a temporarily-shared basis, it remains so until
all jobs using it have terminated.  Then
the VP may be allocated again for temporarily-shared use or for
exclusive use.
.IP Timeshared
In our context, to timeshare is to always allow multiple jobs to run
concurrently on an execution host or node.  A 
.I "timeshared node"
is a node on which jobs are timeshared.  Often the term 
.I host
rather than node is used in conjunction with timeshared, as in 
.I "timeshared host" . 
If the term node is used without the timeshared prefix, the node is a cluster
node which is allocated either exclusively or temporarily-shared.
.IP
If a host, or node, is indicated to be timeshared, it will never be allocated
(by the Server) exclusively or temporarily-shared.
.IP "Load Balance"
A policy wherein jobs are distributed across multiple timeshared hosts
to even out the work load on each host.  Being a policy, the distribution
of jobs across execution hosts is solely a function of the Job Scheduler.
.mc |
.IP "Node Attribute"
As with jobs, queues, and the Server, nodes have attributes associated with them
which provide control information.  The attributes defined for nodes are:
state, type (ntype), number of virtual processors (np), the list of jobs to which
the node is allocated, and properties.
.mc
.IP "Node Property"
In order to have a means of grouping nodes for allocation, a set of zero or
more node properties may be given to each node.   The property is nothing
more than a string of alphanumeric characters (first character must be
alphabetic) without meaning to PBS.   
You, as the PBS administrator, may choose whatever property names you wish.
Your choices for property names should be relayed to the users.
.IP "Batch System"
A PBS Batch System consists of one Job Server (pbs_server), one or more Job
Schedulers (pbs_sched), and one or more execution servers (pbs_mom).
With prior versions of PBS, a Batch System could be set up to support only
a cluster of exclusive nodes 
.B or
to support one or more timeshared hosts.
There was no support for temporarily-shared nodes.
With this release, a PBS Batch System
may be set up to feed work to one large timeshared system, multiple timeshared
systems, a cluster of nodes to be used exclusively or temporarily-shared,
or any combination of the preceding.
.IP "Batch Complex"
See Batch System.
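.LP
A site may wish to check candidate node property names against the rule given
under Node Property (alphanumeric characters only, first character alphabetic)
before relaying them to users.
A minimal sketch of such a check, assuming a POSIX shell; the function
valid_property is hypothetical and not supplied with PBS:
.Cs
```shell
#!/bin/sh
# Hypothetical check, not part of PBS: succeeds when a proposed node
# property name is alphanumeric and starts with a letter.
valid_property() {
    case $1 in
        ''|[!A-Za-z]*|*[!A-Za-z0-9]*)
            return 1 ;;    # empty, bad first character, or bad character
    esac
    return 0
}
```
.Ce
.LP
For example, valid_property bigmem succeeds, while valid_property 2cpu and
valid_property sp-node fail.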
.LP
If PBS is to be installed on one time sharing system, all three daemons may
reside on that system; or you may place the Server (pbs_server) and/or the
Scheduler (pbs_sched) on a \*Qfront end\*U system.  Mom (pbs_mom) must run on
every system where jobs are to be executed.
.LP
If PBS is to be installed on a collection of time sharing systems, a Mom must be
on each and the Server and Scheduler may be installed on one of the systems or
on a front end.  If you are using the default supplied Scheduler program, you
will need to set up a
.I node
file for the Server that names each of the time sharing systems.  You
will need to append
.Ty :ts
to each host name to identify it as time sharing.
.LP
The same arrangement applies to a cluster except that the node names in the
node file do not have the appended
.Ty :ts .
.NH 2
.Tc \f3Installation Overview\fP
.LP
The normal PBS build procedure is to separate the source from the target.
This allows the placement of a single copy of the source on a shared file
system from which multiple different target systems can be built.  Also, the
source can be protected from accidental destruction or modification by making
the source read-only.  However, if you choose, objects may be made within
the source tree.
.LP
In the following descriptions, the 
.I "source tree"
is the result of extracting the tar file into a directory (and subdirectories).
A diagram of the source tree is shown in figure \n(H1\-\n+(Fi.
.KF
.so sourcetree.pic
.sp
.ce
\f3Figure \n(H1\-\n(Fi: Source Tree Structure\f1
.sp
.KE
.KS
The 
.I "target tree"
is a set of parallel directories in which the object modules are actually
compiled.  This tree may (and generally should) be separate from the source
tree.
.KE
.LP
An overview of the \*Qconfigure\*U, compile, installation, and batch system
configuration steps is listed here.
Detailed explanations follow.
We recommend that you read completely
through these instructions before beginning the installation.  To install PBS:
.IP 1.
Place the tar file on the system where you would like to maintain the source.
.IP 2.
Untar the tar file.
.Cs
tar xpf \f2file\fP
.Ce
It will untar in the current directory
producing a single directory named for the current release and patch number.
Under that directory will be
several files and subdirectories.
This directory and the subdirectories make up the
.I "source tree" .   
You may write-protect the source tree at this point should you so choose.
.IP
In the top directory are two files, named "Release_Notes" and "INSTALL".
The Release_Notes file contains
information about the release contents, changes since the last release and 
points to this guide for installation instructions.
The "INSTALL" file consists of standard notes about the use of GNU's configure.
.IP 3.
If you choose, as recommended, to have separate build (target) and source trees,
then create the top level directory of what will become the
.I "target tree"
at this time.
The target tree must reside on a file system mounted
on the same architecture as the target system for which you are generating
the PBS binaries.
This may well be the same system as holds the source or it may not.
Change directories to the top of the target tree.
.IP 4.
Make a job Scheduler choice.
A unique feature of PBS is its external Scheduler module.   This allows a
site to implement any policy of its choice.   To provide even more freedom
in implementing policy, PBS provides three scheduler frameworks.   Schedulers
may be developed in the C language, the Tcl scripting language, or PBS's
very own C language extensions, the \f3Ba\fPtch \f3S\fPcheduling
\f3L\fPanguage, or BaSL.
.IP
As distributed, 
.I configure
will default to a C language based scheduler known as 
.I fifo .
This Scheduler can be configured for several common, simple scheduling policies,
not just first in \- first out as the name suggests.
When this Scheduler
is installed, certain configuration files are installed in
{PBS_HOME}/sched_priv/.
\f3You will need to modify these files for your site.\fP
These files are discussed in sections 
.B "4.5 QC based Sample Scheduler" 
and in the section 
.B "4.5.1 FIFO Scheduler" .  
.IP
To change the selected Scheduler, see the configure options
.At --set-sched
and
.At --set-sched-code
in the Features and Package Options section of this chapter.
Additional information on the types of schedulers and how to configure fifo
can be found in the Scheduling Policies chapter later in this guide.
.IP 5.
Read section 2.3, then from within the top of the target tree created in step 3,
type the following command
.Cs
{source_tree}/configure [options]
.Ce
Where
.Ty {source_tree}
is the relative or absolute path to the top of the source tree,
where the configure script resides.  If you are building in the source tree, type
.Ty "./configure [options]"
at the top level of the source tree where the configure script is found.
.IP
This will generate the complete target tree starting with the current working
directory and a set of header files and make files used to build PBS.
You need to rerun the configure script only if you change the options
specified on the configure command line.
See section
.B "2.3 Build Details"
for information on the configure options. 
.LP
No options are strictly required, but if the vendor's C compiler is ANSI
conforming, we suggest that you use the
.Ty --set-cc
option so that it is used instead of gcc.
If you wish to build the GUI to PBS, and the Tcl libraries are not in the
normal place, /usr/local/lib, then you will need to specify
.Ty --with-tcl=\f1directory\fP ,
giving the path to the Tcl libraries.
.LP
Running configure without any (other) options will
produce a working PBS system with the following defaults:
.RS
.IP -
User commands are installed in /usr/local/bin.
.IP -
The daemons and administrative commands are installed in /usr/local/sbin.
.IP -
The working directory (PBS_HOME) for the daemons is /usr/spool/pbs.
.IP -
The Scheduler will be the C based scheduler \*Qfifo\*U.
.RE
.IP
Because you may select many options and each option is
verbose, you may wish to create a shell script consisting of the configure command
and the selected options.
.IP
The documentation is not generated by default.
You may make it by specifying the
.Ty "--enable-docs"
option to configure or by changing into the
.I doc
subdirectory in the target tree and typing make.
.IP
In order to build and print PostScript copies of the documentation from the
included source, you will need the GNU groff formatting package including the
\*Qms\*U formatting macro package.
You may choose to print using
different font sets.   In the source tree is a file \*Qdoc/doc_fonts\*U
which may be edited.  Please read the comments in that file.  Note that
font position 4 is left with the symbol font mounted.
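.IP
As a sketch of the wrapper script suggested earlier in this step, the following
keeps the chosen configure options in one place.
It assumes a POSIX shell; the source tree location and the Tcl directory are
example values to replace with your own, and the function names are
hypothetical.  Configure is run only if it is found under SRC_TREE.
.Cs
```shell
#!/bin/sh
# Hypothetical site wrapper, not shipped with PBS: keeps the chosen
# configure options in one place so the build can be repeated exactly.

# Assumed location of the source tree; change it for your site.
SRC_TREE=${SRC_TREE:-/usr/local/src/pbs_v2.1p13}

# Options discussed in this guide.  The Tcl directory is an example
# only; --with-tcl is needed only when Tcl is not in /usr/local/lib.
PBS_OPTIONS="--set-cc --enable-docs"
PBS_OPTIONS="$PBS_OPTIONS --with-tcl=/opt/tcl8.0/lib"

# Print the configure command line, e.g. for a build log.
pbs_configure_cmd() {
    echo "$SRC_TREE/configure $PBS_OPTIONS"
}

# Run configure in the current directory, which should be the top of
# the target tree.  PBS_OPTIONS is deliberately unquoted so that each
# option becomes a separate argument.
run_pbs_configure() {
    "$SRC_TREE/configure" $PBS_OPTIONS
}

if [ -x "$SRC_TREE/configure" ]; then
    pbs_configure_cmd
    run_pbs_configure
else
    echo "no configure script under $SRC_TREE; set SRC_TREE first" >&2
fi
```
.Ce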
.IP 6.
After running the configure script, the next step is to compile PBS by typing
.Cs
make
.Ce
from the top of the target tree.
.IP 7.
To install PBS you must be running with root privileges.  As root, type 
.Cs
make install
.Ce
from the top of the object tree.
This generates the working directory structures required for running PBS
and installs the programs in the proper executable directories.
.IP
When the working directories are made, they are also checked to see that
they have
been set up with the correct ownership and permissions.  This check ensures
that files are not tampered with and the security of PBS compromised.
Part of the check is to ensure that all parent directories and all files are:
.RS
.IP - 3
owned by root (bin, sys, or any uid < 10), 
.B EPERM 
returned if not;
.IP -
that group ownership is by a gid < 10, 
.B EPERM 
returned if not;
.IP -
that the directories are not world writable, or where required to be
world writable that the sticky bit is set,
.B EACCES
returned if not; and
.IP -
that the file or directory is indeed a file or directory,
.B ENOTDIR
returned if not.
.RE
.IP
The various PBS daemons will also
perform similar checks when they are started.
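.IP
The rules above can be approximated by hand, for example to find out why a
daemon refuses to start.  The following is a rough sketch only, assuming a
POSIX shell and the traditional ls -ldn output (mode, link count, numeric
owner, numeric group); it is not the check PBS itself performs, and the
function names pbs_path_ok and pbs_tree_ok are hypothetical.
.Cs
```shell
#!/bin/sh
# Rough approximation of the ownership and permission rules above, for
# troubleshooting only; PBS performs its own checks internally.
# pbs_path_ok PATH: print the first broken rule and return 1, or return 0.
pbs_path_ok() {
    if [ ! -e "$1" ]; then
        echo "$1: does not exist"
        return 1
    fi
    # ls -ldn fields: mode, link count, numeric uid, numeric gid, ...
    set -- "$1" $(ls -ldn "$1")
    mode=$2 uid=$4 gid=$5
    if [ "$uid" -ge 10 ]; then
        echo "$1: owner uid $uid is not below 10"
        return 1
    fi
    if [ "$gid" -ge 10 ]; then
        echo "$1: group gid $gid is not below 10"
        return 1
    fi
    # Mode character 9 is "w" when others may write; character 10 is
    # "t" or "T" when the sticky bit is set as well.
    case $mode in
        ????????w[tT]*) ;;
        ????????w*)
            echo "$1: world writable without the sticky bit"
            return 1 ;;
    esac
    if [ ! -f "$1" ] && [ ! -d "$1" ]; then
        echo "$1: neither a regular file nor a directory"
        return 1
    fi
    return 0
}

# pbs_tree_ok PATH: apply pbs_path_ok to PATH and to each parent directory.
pbs_tree_ok() {
    p=$1
    while :; do
        pbs_path_ok "$p" || return 1
        case $p in
            /|.) return 0 ;;
        esac
        p=$(dirname "$p")
    done
}
```
.Ce
.IP
For example, pbs_tree_ok /usr/spool/pbs/server_priv checks that directory and
each of its parents, printing the first rule that is broken.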
.IP 8.
If you have more than one host in your PBS cluster, you need to create a
node file for the Server.  Create the file
.Ty {PBS_HOME}/server_priv/nodes .
It should contain one line per node on which a Mom is to be run.  The line
should consist of the short host name, without the domain name parts.
For example, if you have three nodes, larry.stooge.com, curley.stooge.com, and
moe.stooge.com, then the node file should contain
.Cs
larry
curley
moe
.Ce
If the nodes are timesharing nodes which will be load balanced, append
.Ty :ts
to the name of each node, as in
.Cs
larry:ts
curley:ts
moe:ts
.Ce
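.IP
If the host names are already listed somewhere, the node file can be generated
rather than typed.  A minimal sketch, assuming a POSIX shell; make_node_lines
is a hypothetical helper, not part of PBS:
.Cs
```shell
#!/bin/sh
# Hypothetical helper, not part of PBS: read fully qualified host names,
# one per line, and print PBS node file entries.  Give "ts" as the first
# argument to mark every node as timeshared.
make_node_lines() {
    suffix=""
    if [ "${1:-}" = ts ]; then
        suffix=":ts"
    fi
    while read -r fqdn; do
        [ -n "$fqdn" ] || continue      # skip blank lines
        echo "${fqdn%%.*}$suffix"       # keep only the short host name
    done
}

# The three example hosts from this step, marked as timeshared.
for host in larry.stooge.com curley.stooge.com moe.stooge.com; do
    echo "$host"
done | make_node_lines ts
```
.Ce
.IP
This prints larry:ts, curley:ts, and moe:ts, one per line; once the output
looks right, redirect it into {PBS_HOME}/server_priv/nodes.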
.IP 9.
The three daemons, pbs_server, pbs_sched and pbs_mom
must be run by root in order to function.  Typically in a production system,
they are started at system boot time out of the boot /etc/rc* files.
This first time, you will start the daemons by hand.
It does not matter what the current working directory is when a daemon is
started.   The daemon will place itself in its own directory
{PBS_HOME}/*_priv, where * is server, mom, or sched.
.IP
Note that the three daemons need not, and often should not, all be present
on every system.
In the case of a large system, all three may be present.   In the case of a
cluster of workstations, you may have the Server (pbs_server) and the
Scheduler (pbs_sched) on one system only and a copy of Mom (pbs_mom) on each
node where jobs may be executed.  At this point, it is assumed that you plan
to have all three daemons running on one system.
.IP
To have a fully functional system, each of the daemons will require certain
configuration information.
Except for the node file, the Server's configuration information is provided
via the qmgr command after the Server is running.
.mc |
The node information may be entered by editing the node file before bringing up
the Server, or via the qmgr interface after the Server is up.
.mc
The configuration information
for Mom and the Scheduler is provided by editing a config file located in 
{PBS_HOME}/mom_priv or {PBS_HOME}/sched_priv.
This is explained in detail in this guide in
.B "Chapter 3. Batch System Configuration" .
.RS
.IP A.
Before starting the execution server(s), Mom(s), on each execution host,
you will need to create her config file.  To get started, the following
lines are sufficient:
.Cs
$logevent 0x1ff
$clienthost \f2server-host\fP
.Ce
where \f2server-host\fP is the name of the host on which the Server is running.
This is not required if the Server and this Mom are on the same host.
Create the file {PBS_HOME}/mom_priv/config and copy the above lines into it.
See the pbs_mom(8) man page and section
.B "3.6 Configuring the Execution Server"
for more information on the config file.
.IP
Start the execution server, pbs_mom,
.Cs
{sbindir}/pbs_mom
.Ce
No options or arguments are required.  See the pbs_mom(8) man page.
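.IP
The same file can be created from a script, for instance when several execution
hosts are set up at once.  A minimal sketch, assuming a POSIX shell; the
function write_mom_config and the default values of PBS_HOME and SERVER_HOST
are assumptions to adjust for your site.  As noted above, the $clienthost line
may be omitted when the Server runs on the same host.
.Cs
```shell
#!/bin/sh
# Sketch: write the starter Mom config file shown above.
# PBS_HOME and SERVER_HOST are assumptions; set them for your site.
PBS_HOME=${PBS_HOME:-/usr/spool/pbs}
SERVER_HOST=${SERVER_HOST:-server-host}

write_mom_config() {
    # $1 is the file to write.  Single quotes keep the shell from
    # expanding the literal $logevent and $clienthost keywords.
    {
        echo '$logevent 0x1ff'
        echo '$clienthost' "$SERVER_HOST"
    } > "$1"
}

# Write the real file only if the Mom directory exists and is writable.
if [ -d "$PBS_HOME/mom_priv" ] && [ -w "$PBS_HOME/mom_priv" ]; then
    write_mom_config "$PBS_HOME/mom_priv/config"
fi
```
.Ce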
.IP B.
\f3The first time only\fP, start pbs_server with
the "-t create" option,
.Cs
{sbindir}/pbs_server -t create
.Ce
See the ERS for command details.  This option
causes the Server to initialize various files.  This option will not be
required after the first time unless you wish to clear the Server database
and start over.  See the pbs_server(8) man page for more information.
A copy of the section 8 man pages can be found in the External Reference Spec,
ERS, Chapter 6.
.IP C.
Start the selected job Scheduler, pbs_sched.
.RS
.IP i.
For C language based schedulers, such as the default fifo Scheduler,
options are generally not required.  To run the Scheduler, type
.Cs
{sbindir}/pbs_sched
.Ce
See the man page pbs_sched_cc(8) for more detail.  
.IP ii.
For the BaSL Scheduler, the scheduling policy is written in a
specialized batch scheduling language that is similar to C. The scheduling
code, containing BaSL constructs, must first be converted into C using the
.B basl2c
utility.  This is done by setting the configure option 
.At --set-sched-code=file
where 
.I file
is the relative (to src/scheduler.basl/samples) or absolute path of a basl
source file.  The file name should end in 
.Ty .basl .
A good sample program is "fifo_byqueue.basl" that can schedule jobs on a
single-server, single-execution host environment, or a single-server,
multiple-node hosts environment. Read the header of this sample Scheduler
for more information about the algorithm used.
.IP
The Scheduler configuration file is an important entity in BaSL because it is
where the list of servers and host resources reside.
Execute the basl based Scheduler by typing:
.Cs
{sbindir}/pbs_sched -c config_file
.Ce
The Scheduler searches for the config file in
{PBS_HOME}\f2/sched_priv\fP by default.
More information can be found in the man page pbs_sched_basl(8).
.IP iii.
The Tcl Scheduler
requires the Tcl code policy module.
Samples of Tcl scripts may be found in
.I src/scheduler.tcl/sample_scripts
.IP
For the Tcl based Scheduler, the Tcl body script should be placed in 
{PBS_HOME}\f5/sched_priv/\f2some_file\f1 and the Scheduler run via
.Cs
{sbindir}/pbs_sched -b {PBS_HOME}/sched_priv/some_file
.Ce
More information can be found in the man page pbs_sched_tcl(8).
.RE
.RE
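.IP
For a single host running all three daemons, the start-up order of steps A
through C can be collected into one shell function.  This is a sketch under
assumptions: a POSIX shell, SBINDIR standing for {sbindir} with the configure
default /usr/local/sbin, and the default fifo Scheduler started without options
as in step C.i.  The function name start_pbs is hypothetical.
.Cs
```shell
#!/bin/sh
# Sketch of the first start-up on one host, in the order of steps A to C.
# SBINDIR stands for {sbindir}; /usr/local/sbin is the configure default.
# Set FIRST_TIME=yes only for the very first start ("-t create").
SBINDIR=${SBINDIR:-/usr/local/sbin}
FIRST_TIME=${FIRST_TIME:-no}

start_pbs() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "start_pbs: the PBS daemons must be started by root" >&2
        return 1
    fi
    # A. the execution server (Mom); her config file must already exist
    "$SBINDIR/pbs_mom" || return 1
    # B. the Server, initializing its files only on the first start
    if [ "$FIRST_TIME" = yes ]; then
        "$SBINDIR/pbs_server" -t create || return 1
    else
        "$SBINDIR/pbs_server" || return 1
    fi
    # C. the default fifo Scheduler, started without options
    "$SBINDIR/pbs_sched"
}
```
.Ce
.IP
Define the function, set FIRST_TIME=yes only for the very first start, and
then run start_pbs as root.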
.IP 10.
Log onto the system as root and define yourself to pbs_server as a manager
by typing:
.Cs
\f1#\fP qmgr
\f1Qmgr:\fP set server managers=\f2your_name\fP@\f2your_host\fP
.Ce
Information on qmgr can be found in the qmgr(8) man page and on-line help
is available by typing help within qmgr.
.IP
From this point, you no longer need root privilege.  Note that \f2your_host\fP
can be any host on which PBS' qmgr command is installed.
You can now configure and manage a remote batch system from the comfort of
your own workstation.
.IP
Now you need to define at least one queue.  Typically it will be an
execution queue unless you are using this Server purely as a gateway.
You may choose to establish queue minimum, maximum, and/or default
resource limits for some resources.  For example, to establish a
minimum of 1 second, a maximum of 12 cpu hours, and a default of 30 cpu
minutes on a queue named \*Qdque\*U; issue the following commands inside
of qmgr:
.Cs
\f1Qmgr:\fP create queue dque queue_type=e
\f1Qmgr:\fP s q dque resources_min.cput=1,resources_max.cput=12:00:00
\f1Qmgr:\fP s q dque resources_default.cput=30:00
\f1Qmgr:\fP s q dque enabled=true, started=true
.Ce
.IP
You may also wish to increase system security by restricting which hosts
may contact the Server.  To restrict service to hosts in your domain, give
the following qmgr directives:
.Cs
\f1Qmgr:\fP set server acl_hosts=*.\f2your_domain\fP
\f1Qmgr:\fP set server acl_host_enable=true
.Ce
.IP
Finally, activate the Server\-Scheduler interaction (that is, the scheduling
of jobs by pbs_sched) by issuing:
.Cs
\f1Qmgr:\fP s s scheduling=true
.Ce
.sp
When the attribute
.At scheduling
is set to true, the Server calls the job Scheduler; when it is false, the
Scheduler is not called.  The value of
.At scheduling
may also be specified on the pbs_server command line with the \-a option.
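.IP
The queue limits in the qmgr example above appear to follow the form
\f5[[HH:]MM:]SS\fP, so \f512:00:00\fP is 12 hours and \f530:00\fP is
30 minutes.  As a sketch only, assuming plain integer fields, the following
Python fragment converts such values to seconds and checks that the
minimum, default, and maximum are consistent.  The function name is
hypothetical and is not part of PBS.
.Cs
```python
def cput_to_seconds(value):
    """Convert a time string of the form [[HH:]MM:]SS to seconds.

    Hypothetical helper; assumes integer fields only.
    """
    fields = [int(f) for f in value.split(":")]
    if not 1 <= len(fields) <= 3:
        raise ValueError("expected [[HH:]MM:]SS, got %r" % value)
    seconds = 0
    for field in fields:
        # Each colon-separated field is worth 60 of the field after it.
        seconds = seconds * 60 + field
    return seconds


# The limits used for queue dque in the qmgr example above.
limits = {"min": "1", "default": "30:00", "max": "12:00:00"}
secs = {name: cput_to_seconds(v) for name, v in limits.items()}
assert secs["min"] <= secs["default"] <= secs["max"]
print(secs)
```
.Ce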
.NH 2
.Tc \f3Build Details\fP
.LP
While the overview gives sufficient information to build a basic PBS system,
many more options are available, and some custom tailoring should be done.
.NH 3
.Tc "Configure Options"
.LP
The following is detailed information on the options to the configure script.
.NH 4
Generic Configure Options
.LP
The following are generic configure options that do not affect the
functionality of PBS.
.IP --cache-file=\f5file\fP
Cache the system configuration test results in \f5file\fP.
.br
Default: config.cache
.IP --help\ \ 
Prints out information on the available options.
.IP --no-create
Do not create output files.
.IP "--quiet, --silent"
Do not print \*Qchecking\*U messages.
.IP --version
Print the version of autoconf that created configure.
.IP --enable-depend-cache
This turns on configure's ability to cache
.I makedepend
information across runs of configure.
This can cause problems if certain configuration changes are made when
configure is rerun, but it can save time in the hands of experienced
developers.
.br
Default: disabled
.LP
.NH 4
Directory and File Names
.LP
These options specify where PBS objects will be placed.
.IP --prefix=\f5PREFIX\fP
Install files in subdirectories of \f5PREFIX\fP directory.
.br
Default:
.B /usr/local
.IP --exec-prefix=\f5EPREFIX\fP
Install architecture dependent files in subdirectories of \f5EPREFIX\fP.
.br
Default: see PREFIX
.IP --bindir=\f5DIR\fP
Install user executables (commands) in subdirectory \f5DIR\fP.
.br
Default: EPREFIX/bin   (\f3/usr/local/bin\fP)
.IP --sbindir=\f5DIR\fP
Install System Administrator executables in subdirectory \f5DIR\fP.
This includes certain administrative commands and the daemons.
.br
Default: EPREFIX/sbin   (\f3/usr/local/sbin\fP)
.IP --libdir=\f5DIR\fP
Object code libraries are placed in \f5DIR\fP.  This includes the PBS API
library, libpbs.a.
.br
Default: PREFIX/lib   (\f3/usr/local/lib\fP)
.IP --includedir=\f5DIR\fP
C language header files are installed in \f5DIR\fP.
.br
Default: PREFIX/include   (\f3/usr/local/include\fP)
.IP --mandir=\f5DIR\fP
Install man pages in \f5DIR\fP.
.br
Default: PREFIX/man   (\f3/usr/local/man\fP)
.IP --srcdir=\f5SOURCE_TREE\fP
PBS sources can be found in directory \f5SOURCE_TREE\fP.
.br
Default: location of the
.I configure
script.
.IP --x-includes=\f5DIR\fP
X11 header files are in directory \f5DIR\fP.
.br
Default: attempts to autolocate the header files
.IP --x-libraries=\f5DIR\fP
X11 libraries are in directory \f5DIR\fP.
.br
Default: attempts to autolocate the libraries
.LP
.NH 4
Features and Package Options
.LP
In general, these options take the following forms:
.TS
tab(;) ;
l l.
\f5--disable-\fP\s-2FEATURE\s+2;Do not compile for \s-2FEATURE\s+2, same as \f5--enable-\fP\s-2FEATURE\s+2\f5=no\fP
\f5--enable-\fP\s-2FEATURE\s+2;Compile for \s-2FEATURE\s+2
\f5--with-\fP\s-2PACKAGE\s+2;Compile to include \s-2PACKAGE\s+2
\f5--without-\fP\s-2PACKAGE\s+2;Do not compile to include \s-2PACKAGE\s+2, same as \f5--with-\fP\s-2PACKAGE\s+2\f5=no\fP
\f5--set-\fP\s-2OPTION\s+2;Set the value of \s-2OPTION\s+2
.TE
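.LP
The equivalences in the table above can be sketched in Python.  This
illustrates the naming rules only and is not the logic inside the configure
script; the function name is hypothetical.  A bare
\f5--set-\fP\s-2OPTION\s+2 is recorded as \f5None\fP because its meaning
depends on the option (compare --set-cc and --set-cflags in the list of
recognized options).
.Cs
```python
def normalize_configure_args(args):
    """Map --disable/--without forms onto their --enable/--with equivalents."""
    settings = {}
    for arg in args:
        if not arg.startswith("--"):
            raise ValueError("not an option: %r" % arg)
        name, has_value, value = arg[2:].partition("=")
        if name.startswith("disable-"):
            # --disable-FEATURE is the same as --enable-FEATURE=no
            settings["enable-" + name[len("disable-"):]] = "no"
        elif name.startswith("without-"):
            # --without-PACKAGE is the same as --with-PACKAGE=no
            settings["with-" + name[len("without-"):]] = "no"
        elif name.startswith(("enable-", "with-")):
            # A bare --enable-FEATURE or --with-PACKAGE means "yes"
            settings[name] = value if has_value else "yes"
        else:
            # --set-OPTION and anything else keep the value as given;
            # None marks a bare option whose default is option specific
            settings[name] = value if has_value else None
    return settings


print(normalize_configure_args(
    ["--disable-gui", "--with-tcl=/opt/tcl", "--enable-debug", "--set-sched=basl"]))
```
.Ce
.LP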
For PBS, the recognized --enable/disable, --with/without, and --set options are:
.IP --enable-docs
Build the PBS documentation.
This requires the GNU utilities groff, gtbl, and gpic.
The man pages are installed whether or not this option is set.
.br
Default: disabled
.IP --enable-server
Build the PBS job server, pbs_server; use --disable-server to omit it.
Normally all components (Commands, Server, Mom, and Scheduler) are built.
.br
Default: enabled
.IP --enable-mom
Build the PBS job execution daemon, pbs_mom; use --disable-mom to omit it.
.br
Default: enabled
.IP --enable-clients
Build the PBS commands; use --disable-clients to omit them.
.br
Default: enabled
.IP --with-tcl=\f5DIR_PREFIX\fP
Use this option if you want the Tcl-based PBS features compiled and the Tcl
libraries are not in /usr/local/lib.  These Tcl-based features
include the GUI, xpbs.  If the following option, --with-tclx, is
set, use this option only if the Tcl libraries are not co-located with the
Tclx libraries.  When set, \f5DIR_PREFIX\fP must specify the absolute path
of the directory containing the Tcl libraries.
.br
Default: if --enable-gui is in effect, --with-tcl is assumed and the Tcl
utilities are built; otherwise, the Tcl utilities are not built.
.IP --with-tclx=\f5DIR_PREFIX\fP
Use this option if you wish the Tcl based PBS features to be based on Tclx.
This option implies --with-tcl.
.br
Default: Tclx is not used.
.IP --enable-gui
Build the xpbs GUI.
Only valid if --with-tcl is set.
.br
Default: enabled
.IP --set-cc[=\f5ccprog\fP]
Specify which C compiler should be used.  This will override the CC environment
setting.  If only --set-cc is specified, then CC will be set to \f5cc\fP.
.br
Default: gcc (after all, configure is from GNU also)
.IP --set-cflags[=\f5FLAGS\fP]
Set the compiler flags.  This is used to set the CFLAGS variable.
If only --set-cflags is specified, then CFLAGS is set to \*Q\*U.
This must be set to \f5-64\fP to build 64-bit objects under Irix 6, e.g.
.Ty "--set-cflags=-64" .
Note that multiple flags, such as -g and -64, must be enclosed in quotes, e.g.
.Ty "--set-cflags='-g\ -64'" .
.br
Default: CFLAGS is set to a best guess for the system type.
.IP --enable-debug
Builds PBS with debug features enabled.   This allows the daemons to remain
attached to standard output and produce vast quantities of messages.
.br
Default: disabled
.IP --set-tmpdir=\f5DIR\fP
Set the tmp directory in which pbs_mom will create temporary scratch
directories for jobs.  Used on Cray systems only.
.br
Default:
.B /tmp
.IP --set-server-home=\f5DIR\fP
Sets the top level directory name for the PBS working directories,
PBS_HOME.
This directory \f3MUST reside on a file system which is local to the host\fP on
which any of the daemons are running.  That means you must have a local file
system on any system where a pbs_mom is running as well as where pbs_server
and/or pbs_sched is running.
PBS uses synchronous writes to files to maintain its state.
We recommend that the file system have the same mount point and path on each
host; this enables you to copy the daemons from one system to another rather
than having to build on each system.
.br
Default:
.B /usr/spool/pbs
.IP --set-server-name-file=\f5FILE\fP
Set the file name which will contain the name of the default Server.
This file is used by the commands to determine which Server to contact.
If \f5FILE\fP is not an absolute path, it will be evaluated relative to
the value of --set-server-home, PBS_HOME.  
.br
Default:
.Ty server_name
.IP --set-default-server=\f5HOSTNAME\fP
Set the name of the host that clients will contact when not otherwise specified
in the command invocation.  It must be the primary network name of the host.
.br
Default: the name of the host on which PBS is being compiled.
.IP --set-environ=\f5PATH\fP
Set the path name of the file containing the environment variables used by
the daemons and placed in the environment of the jobs.
For AIX based systems, we suggest setting this option to \f5/etc/environment\fP.
Relative path names are interpreted relative to the value of --set-server-home,
PBS_HOME.
.br
Default: the file \f5pbs_environment\fP in the directory PBS_HOME.
.IP
For a discussion of this file and the environment, see section
.B "6.1.1. Internal Security" .
You may edit this file to modify the path or add other environment variables.
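.IP
The relative-path rule shared by --set-server-name-file, --set-environ, and
--set-qstatrc-file can be sketched as follows.  This is a minimal Python
illustration that assumes the default PBS_HOME of /usr/spool/pbs; the
function name is hypothetical.
.Cs
```python
import posixpath


def resolve_under_pbs_home(path, pbs_home="/usr/spool/pbs"):
    """Return path unchanged if it is absolute, else join it onto PBS_HOME."""
    if posixpath.isabs(path):
        return path
    return posixpath.join(pbs_home, path)


# The default server name file and the default environment file
print(resolve_under_pbs_home("server_name"))
print(resolve_under_pbs_home("pbs_environment"))
# The AIX suggestion is already absolute, so it is kept as is
print(resolve_under_pbs_home("/etc/environment"))
```
.Ce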
.IP --enable-plock-daemons=\f5WHICH\fP
Enable daemons to lock themselves into memory to improve performance.
The argument \f5WHICH\fP is the bitwise OR of 1 for pbs_server, 2 for
pbs_sched, and 4 for pbs_mom (7 selects all three daemons).
This option is recommended for Unicos systems.   It must \f3not\fP be used for
AIX systems.
.br
Default: disabled.
.IP
Note that this feature uses the plock() system call, which is not available on
Linux and BSD-derived systems.  Before using this feature, check that plock(3)
is available on the system.
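.IP
Because \f5WHICH\fP is a bit mask, the value for a given set of daemons is
the bitwise OR of their codes.  A small Python sketch of that arithmetic
follows; the dictionary and function names are hypothetical.
.Cs
```python
PLOCK_CODES = {"pbs_server": 1, "pbs_sched": 2, "pbs_mom": 4}


def plock_which(*daemons):
    """Combine daemon codes into the WHICH argument."""
    mask = 0
    for daemon in daemons:
        mask |= PLOCK_CODES[daemon]
    return mask


def plock_daemons(which):
    """List the daemons selected by a WHICH value."""
    return [d for d, code in PLOCK_CODES.items() if which & code]


which = plock_which("pbs_server", "pbs_mom")
print("--enable-plock-daemons=%d" % which)
print(plock_daemons(7))
```
.Ce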
.IP --enable-syslog
Enable the use of syslog for error reporting.   This is in addition to the
normal PBS logs.
.br
Default: disabled.
.IP --set-sched=\f5TYPE\fP
Set the Scheduler (language) type.
If set to \f5c\fP, a C based Scheduler will be compiled.
If set to \f5tcl\fP, a Tcl based Scheduler will be used.
If set to \f5basl\fP, a BAtch Scheduler Language Scheduler will be generated.
If set to \f5no\fP, no Scheduler will be compiled; jobs will then have to be
run by hand.
.br
Default: c
.IP --set-sched-code=\f5PATH\fP
Sets the name of the file or directory containing the source for the Scheduler.
This is only used for C and BaSL Schedulers, where --set-sched is set to
either \f5c\fP or \f5basl\fP.
For C Schedulers, this should be a directory name.   For BaSL Schedulers,
it should be a file name ending in \f5.basl\fP.  If the path is not absolute,
it will be interpreted relative to
SOURCE_TREE/src/schedulers.SCHED_TYPE/samples.  For example, with
--set-sched=basl you might set --set-sched-code to \f5fifo_byqueue.basl\fP.
.br
Default: fifo  (C based Scheduler)
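.IP
The lookup rule for a relative \f5PATH\fP can be sketched in Python.  The
function is hypothetical and simply applies the rule exactly as it is
written above, including the samples directory named there; it is not
taken from the configure script.
.Cs
```python
import posixpath


def sched_code_path(source_tree, sched_type, code):
    """Locate the Scheduler source named by --set-sched-code.

    Only the c and basl Scheduler types use this option.
    """
    if sched_type not in ("c", "basl"):
        raise ValueError("--set-sched-code applies only to c or basl")
    if sched_type == "basl" and not code.endswith(".basl"):
        raise ValueError("a BaSL Scheduler needs a file ending in .basl")
    if posixpath.isabs(code):
        return code
    samples = posixpath.join(source_tree, "src", "schedulers." + sched_type, "samples")
    return posixpath.join(samples, code)


print(sched_code_path("/src/pbs", "basl", "fifo_byqueue.basl"))
```
.Ce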
.IP --enable-tcl-qstat
Builds qstat with the Tcl interpreter extensions.  This allows site and
user customizations.  Only valid if --with-tcl is already present.
.br
Default: disabled
.IP  --set-tclatrsep=\f5CHAR\fP
Set the character to be used as the separator character between attribute
and resource names in Tcl/Tclx scripts.
.br
Default: "."
.IP --set-mansuffix=\f5CHAR\fP
Set the character to be used as the man page section suffix letter.  
For example, the qsub man page is installed as man1/qsub.1B.
To install without a suffix, use --set-mansuffix="".
.br
Default: "B"
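.IP
As a small illustration of how the suffix combines with the section number,
the following hypothetical Python helper builds the installed man page path
relative to \f5mandir\fP.
.Cs
```python
import posixpath


def man_page_path(name, section, suffix="B"):
    """Build the path of an installed man page, relative to mandir."""
    directory = "man%d" % section
    return posixpath.join(directory, "%s.%d%s" % (name, section, suffix))


print(man_page_path("qsub", 1))              # default suffix
print(man_page_path("qsub", 1, suffix=""))   # as with --set-mansuffix=""
```
.Ce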
.IP --set-qstatrc-file=\f5FILE\fP
Set the name of the file that qstat will use if there is no 
.I .qstatrc
file in the user's home directory.  
This option is only valid when  --enable-tcl-qstat is set.
If \f5FILE\fP is a relative path, it will be evaluated relative to
the PBS Home directory, see --set-server-home.
.br
Default:
.Ty PBS_HOME/qstatrc
.IP  --with-scp
Directs PBS to attempt to use the
.I "Secure Copy Program" ,
.I scp ,
when copying files to or from a remote host.  This applies to the delivery of
output files and to the stage-in/stage-out of files.  If scp is to be used and
the attempt fails, PBS will then attempt the copy using rcp, in case scp
does not exist on the remote host.
.IP
For local delivery, \*Q/bin/cp -r\*U is always used.   For remote delivery, a
variant of rcp is required.   The program must always return a non-zero exit
status on any failure to deliver files.   This is not true of all rcp
implementations, hence a copy of a known good rcp is included in the source;
see mom_rcp.
More information can be found in section 
.B "7.5 Delivery of Output Files" .
.br
Default: sbindir/pbs_rcp  (from the mom_rcp source directory) is used,
where sbindir is the value from --sbindir.
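.IP
The order of attempts described above can be sketched in Python.  This is
not Mom's code: the function names are hypothetical, the rcp variant is
assumed to be the pbs_rcp installed in sbindir, the \f5scp\fP and
\f5rcp\fP flags are assumptions chosen for a recursive, non-interactive
copy, and \f5run\fP stands in for whatever executes a command and returns
its exit status.
.Cs
```python
import subprocess


def delivery_attempts(src, dest_path, dest_host=None, with_scp=True, rcp="pbs_rcp"):
    """Commands to try, in order, to deliver src to dest_path on dest_host."""
    if dest_host is None:
        # Local delivery always uses /bin/cp -r
        return [["/bin/cp", "-r", src, dest_path]]
    target = "%s:%s" % (dest_host, dest_path)
    attempts = []
    if with_scp:
        # Assumed flags: -B batch mode, -r recursive, -p preserve times
        attempts.append(["scp", "-Brp", src, target])
    # The rcp variant is the fallback, or the only choice without --with-scp
    attempts.append([rcp, "-rp", src, target])
    return attempts


def deliver(src, dest_path, dest_host=None, with_scp=True, rcp="pbs_rcp",
            run=subprocess.call):
    """Return the command that succeeded, or None if every attempt failed.

    This relies on each program returning a non-zero status on failure.
    """
    for argv in delivery_attempts(src, dest_path, dest_host, with_scp, rcp):
        if run(argv) == 0:
            return argv
    return None


def fake_run(argv):
    # Simulate a remote host without scp: scp fails, the rcp fallback works
    return 1 if argv[0] == "scp" else 0


print(deliver("job.o", "/home/user/job.o", "hostb", run=fake_run))
```
.Ce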
.IP --enable-shell-pipe
When enabled, pbs_mom passes the name of the job script to the top level
shell via a pipe.  If disabled, the script file is the shell's standard
input file. See section 
.B "7.3 Shell Invocation"
for more information.
.br
Default: enabled
.IP --enable-rpp
Use the Reliable Packet Protocol, RPP, over UDP for resource queries sent
by the Scheduler to Mom.  If disabled, TCP is used instead.
.br
Default: enabled
.IP --enable-sp2
Turn on special features for the IBM SP.
This option is only valid when the PBS machine type is
aix4.  The PBS machine type is automatically determined by the configure
script.
.br
Default: disabled
.IP
With PSSP software before release 3.1, access to two IBM-supplied libraries,
libjm_client.a and libSDR.a, is required.
These libraries are installed when the ssp.clients fileset is installed,
and PBS will expect to find them in the normal places for libraries.
.IP
With PSSP 3.1 and later, libjm_client.a and libSDR.a are not required;
instead, libswitchtbl.a is used to load and unload the switch.
See the discussion under the sub-section
.B "IBM SP"
in the section
.B "2.4 Machine Dependent Build Instructions" .
.IP --enable-nodemask
Build PBS with support for SGI Origin2000 nodemask.  Requires Irix 6.x.
.br
Default: disabled
.IP --enable-pemask
Build PBS on the Cray T3E with support for Scheduler-controlled, PE-specific
job placement.  Requires Unicos/MK2.
.br
Default: disabled
.IP --enable-srfs
This option enables support for Session Reservable File Systems.
It is only valid on
Cray systems with the NASA modifications to support Session Reservable
File System, SRFS.
.br
Default: disabled
.IP --enable-array
Setting this under Irix 6.x forces the use of SGI Array Session tracking.
Enabling this feature is recommended if MPI jobs use the Array Services
Daemon.  The PBS machine type is then set to irix6array.
Disabling this option forces the use of POSIX session ids.
See section
.B "2.4.5 SGI Systems Running IRIX 6" .
.br
Default: Autodetected by existence and content of /etc/config/array.
.LP
.NH 3
.Tc "Make File Targets"
.LP
The following target names are applicable for make:
.IP all 11
The default target; it compiles everything.
.IP build 11
Same as all.
.IP depend 11
Builds the header file dependency rules.
.IP install\ 
Installs everything.
.IP clean 
Removes all object and executable program files in the current subtree.
.IP distclean
Leaves the object tree very clean.  It will remove all files that were
created during a build.
.LP
These targets exist at most levels within the object tree.  Therefore, it is
possible to compile or install a piece, such as Mom, by changing to the
appropriate subdirectory and typing \*Qmake\*U or \*Qmake install\*U.
.NH 2
.Tc "\f3Machine Dependent Build Instructions\fP"
.LP
There are a number of possible variables that are only used for a
particular type of machine.  If you are not building for one of
the following types, you may ignore this section.
.NH 3
.Tc "Cray Systems"
.LP
.NH 4
Cray C90, J90, and T90 Systems
.LP
On the traditional Cray systems such as the C90, PBS supports
Unicos versions 8, 9 and 10.
.LP
Because of the fairly standard usage of the symbol 
.B TARGET
within the PBS makefiles, when building under Unicos you cannot have the
environment variable TARGET defined.   Otherwise, it is changed by Unicos's make
to match the makefile value, which confuses the compiler.  If set, type
.Ty "unsetenv TARGET"
before making PBS.
.LP
If your system supports the Session Reservable File System enhancement by NASA,
run configure with the 
.Ty --enable-srfs
option.
If enabled, the Server and Mom will be compiled to have the resource names
.I srfs_tmp ,
.I srfs_big ,
.I srfs_fast ,
and
.I srfs_wrk .
These may be used from
.B qsub
to request SRFS allocations.  The file
.Ty /etc/tmpdir.conf
is the configuration file for this.  An example file is:
.Cs
# Shell environ var     Filesystem
TMPDIR
BIGDIR                  /big/nqs
FASTDIR                 /fast/nqs
WRKDIR                  /big/nqs
.Ce
The directory for TMPDIR will default to that defined by JTMPDIR in
Unicos's /usr/include/tmpdir.h.
.LP
Without the SRFS modifications, Mom under Unicos will create a temporary job
scratch directory.   By default, this is placed in /tmp.   The location can
be changed via
.Ty --set-tmpdir=DIR .
.LP
.NH 4
Unicos 10 with MLS
.LP
If you are running Unicos MLS (required in Unicos 10.0 and later), the following
action is needed after the system is built and installed.  Mom updates
.At ue_batchhost
and
.At ue_batchtime
in the UDB for the user.  In an MLS system, Mom must have the security
capability to write the protected UDB.  To grant this capability,
change directory to wherever pbs_mom has been installed and type:
.Cs
spset -i 16 -j daemon -k exec pbs_mom
.Ce
You, the administrator, must have capabilities
.B secadm
and 
.B "class 16"
to issue this command.  You use the setucat and setucls commands to get to these
levels if you are authorized to do so.  The UDB 
.At reclsfy
permission bit gives a user the proper authorization to use the spset command.
.QP
.B WARNING
.QP
There has been only limited testing in the weakest of MLS environments;
problems may appear because of differences in your environment.
.NH 4
Cray T3E Systems
.LP
On the Cray T3E MPP systems, PBS supports the microkernel-based
Unicos/MK version 2.  On this system PBS "cooperates" with the T3E
Global Resource Manager (GRM) in order to run jobs on the system.
This is needed primarily because jobs on the T3E must be run on
physically contiguous processing elements (PEs).
.LP
The above discussions (see section 2.4.1.1) of the environment variable
.B TARGET,
support for Session Reservable File System, and changing
.B TMPDIR
are also applicable to the Cray T3E.
.LP
.NH 3
.Tc Digital UNIX
.LP
The recommended value for CFLAGS when compiling PBS under
Digital UNIX 4.0D is
.Ty --set-cflags="-std0" ,
that is, s-t-d-zero.
.LP
.NH 3
.Tc "HP-UX"
.LP
The recommended value for CFLAGS when compiling PBS under
HP-UX is
.Ty --set-cflags="-Ae" .
.LP
.NH 3
.Tc "IBM Workstations"
.LP
PBS supports IBM workstations running AIX 4.x.
When man pages are installed in 
.I mandir ,
the default man page file name suffix,
\*Q\f3B\fP\*U,  must be removed.   Currently, this must be done by hand.
For example, change man3/qsub.3B to man3/qsub.3.
.LP
Do not use the configure option \f5--enable-plock-daemons\fP.
It will crash the system by using up all of its memory.
.NH 3
.Tc IBM SP
.LP
Everything in the IBM Workstations section above also applies to the IBM SP.
Be sure to read the section
.B "3.2 Multiple Execution Systems"
before configuring the Server.
.mc |
.QP
.B "Important Notes"
.QP
The PBS_HOME directory, see --set-server-home,
used by the pbs_moms located on each node, 
.B must
be on local storage and must
have an identical path on each node.
If the directory is set up in a different path,
then Mom will not be able to initialize the SP switch correctly.
.QP
The node names provided to the server 
.B must
match the node names shown by the st_status command.
This should be the \*Qreliable\*U node name.
.mc
.LP
Set the special SP-2 option,
.Ty --enable-sp2 ,
to compile the code that deals with the SP high speed switch.
.LP
If the library 
.I libswitchtbl.a
is not detected, it is assumed that you are running with
PSSP software prior to 3.1.   In this case, the IBM poe command sets up
the high speed switch directly and PBS interfaces with the IBM Resource
(Job) Manager to track which nodes jobs are using.
PBS requires two libraries, libjm_client.a and libSDR.a, installed with the
ssp.clients fileset.
.LP
If the library libswitchtbl.a is detected, it is assumed you are running with
PSSP 3.1 or later software.  
PBS takes on the responsibility of loading the high speed switch tables to
provide node connectivity.
.QP
.mc |
.B "Important Note"
.br
Regardless of the number of real processors per node, the number of virtual
processors that may be declared to the Server is limited to the number of
switch windows supported by the PSSP software.   At present, this is
four (4).   Therefore only 4 virtual processors may be declared per node.
.LP
.mc
With PSSP 3.1, two additional items of information must be passed to the job:
the switch window id (via a file whose name is passed) and a
.I "job key"
which authorizes a process to use the switch.   As poe does not pass this
information to the processes it creates,
an indirect method had to be devised to present them to the job.
Two new programs are compiled and installed into the 
.Ty bindir
directory,
.I pbspoe
and 
.I pbspd .
.LP
.B pbspoe
is a wrapper around the real poe command.  pbspoe must be used by the user in
place of the real poe.   pbspoe modifies the command arguments and invokes the
real poe, which is assumed to be in /usr/lpp/ppe.poe/bin.  If a user specifies:
.br
.Ty "pbspoe a.out args"
.br
that command is converted to the effective command:
.br
.Cs
/usr/lpp/ppe.poe/bin/poe pbspd job_key winid_file a.out args \\ 
-hfile $PBS_NODEFILE
.Ce
.LP
PBS_NODEFILE contains the nodes allocated by PBS.  The pbs_moms on
those nodes have loaded the switch table with the user's uid, the job key,
and a window id of zero.
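.LP
The argument rewriting performed by pbspoe, as described above, can be
sketched in Python.  This is an illustration only: the function name is
hypothetical, and the job key and window id file name are simply passed
in, since how pbspoe obtains them is not described here.
.Cs
```python
import os

REAL_POE = "/usr/lpp/ppe.poe/bin/poe"   # where pbspoe expects the real poe


def pbspoe_argv(user_args, job_key, winid_file, nodefile):
    """Build the effective poe command line for a pbspoe invocation."""
    return ([REAL_POE, "pbspd", job_key, winid_file]
            + list(user_args)
            + ["-hfile", nodefile])


# Hypothetical values; inside a real job PBS_NODEFILE is set by PBS.
argv = pbspoe_argv(["a.out", "args"], "job_key", "winid_file",
                   os.environ.get("PBS_NODEFILE", "nodefile"))
print(" ".join(argv))
```
.Ce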
.LP
.B pbspd
places the job key into the environment as 
.B MP_PARTITION ,
and the window id as
.B MP_MPI_NETWORK .
pbspd then execs a.out with the remaining arguments.
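.LP
A minimal Python sketch of what pbspd does follows.  It assumes, for
illustration only, that pbspd is called as
\f5pbspd job_key winid_file program [args]\fP
(matching the effective poe command shown above) and that the window id
file holds just the window id as text; neither detail of the real program
is documented here, and the function names are hypothetical.
.Cs
```python
import os
import sys
import tempfile


def pbspd_environment(job_key, winid_file, base_env):
    """Return a copy of base_env with the switch variables added."""
    with open(winid_file) as f:
        window_id = f.read().strip()
    env = dict(base_env)
    env["MP_PARTITION"] = job_key        # the job key
    env["MP_MPI_NETWORK"] = window_id    # the switch window id
    return env


def pbspd_main(argv):
    """Hypothetical entry point: set the variables, then exec the program."""
    if len(argv) < 4:
        sys.exit("usage: pbspd job_key winid_file program [args]")
    job_key, winid_file, program = argv[1], argv[2], argv[3:]
    env = pbspd_environment(job_key, winid_file, os.environ)
    # Replace this process with the user's program, as pbspd does with a.out
    os.execvpe(program[0], program, env)


# Demonstration without exec: a window id file holding window 0
with tempfile.NamedTemporaryFile("w", suffix=".winid", delete=False) as tmp:
    tmp.write("0")
env = pbspd_environment("KEY", tmp.name, {"PATH": "/usr/bin"})
os.remove(tmp.name)
print(env["MP_PARTITION"], env["MP_MPI_NETWORK"])
```
.Ce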
.LP
If the user specified a command file to pbspoe with 
.I "-cmdfile file" ,
then pbspoe prefixes each line of the command file with 
.B "pbspd job_key"
and copies it into a temporary file.
The temporary file is passed to poe instead of the user's file.
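.LP
The command file rewriting can be sketched in Python in the same spirit.
The function name is hypothetical, and the sketch leaves the temporary
file in place because poe still has to read it.
.Cs
```python
import tempfile


def rewrite_cmdfile(user_cmdfile, job_key):
    """Copy a poe command file, prefixing each line with 'pbspd job_key'.

    Returns the name of the temporary file to hand to poe instead.
    """
    with open(user_cmdfile) as src, tempfile.NamedTemporaryFile(
            "w", prefix="pbspoe", suffix=".cmd", delete=False) as dst:
        for line in src:
            dst.write("pbspd %s %s" % (job_key, line))
        return dst.name


# Demonstration with a two-line command file
with tempfile.NamedTemporaryFile("w", suffix=".cmd", delete=False) as f:
    print("a.out 1", file=f)
    print("a.out 2", file=f)
new_name = rewrite_cmdfile(f.name, "KEY")
with open(new_name) as g:
    print(g.read(), end="")
```
.Ce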
.LP
pbspoe also works with /usr/lpp/ppe.poe/bin/pdbx and /usr/lpp/ppe.poe/bin/xpdbx.
This substitution is done to make the changes as transparent to the
user as possible.
.QP
.B Note
.br
Not all poe arguments or capabilities are supported.  For example, poe job
steps are not supported.
.LP
For transparent usage, it is
.B necessary
that you perform these additional steps after PBS is installed:
.IP 1.
Remove IBM's poe, pdbx, and xpdbx from /usr/bin or any other directory in
the user's normal path.   Be sure to leave the commands in /usr/lpp/ppe.poe/bin;
that directory should not be in the user's path or, if it is, it must come
after /usr/bin.
.IP 2.
Create a link named /usr/bin/poe pointing to
.Ty {bindir}/pbspoe .
Also make links /usr/bin/pdbx and /usr/bin/xpdbx which point to
.Ty {bindir}/pbspoe .
.IP 3.
Be sure that pbspd is installed in a directory in the user's normal path on
each and every node.
.LP
.NH 3
.Tc "SGI Workstations Running IRIX 5"
.LP
If, and only if, your system is running Irix 5.3, you will need to add
.Ty -D_KMEMUSER
to
.B CFLAGS
because of a quirk in the Irix header files.
.LP
.NH 3
.Tc "SGI Systems Running IRIX 6"
.LP
If built for Irix 6.x, pbs_mom will track which processes are part of a PBS job
in one of two ways depending on the existence of the Array Services Daemon,
arrayd, as determined by /etc/config/array.
If the daemon is not configured to run, pbs_mom will
use POSIX session numbers.  This method is fine for workstations and
multiprocessor boxes not 
using SGI's mpirun command.  The PBS machine type (PBS_MACH) is set to 
.Ty irix6 .
This mode can also be forced by setting
.Ty --disable-array .
.LP
Where arrayd and mpirun are being used, the tasks
of a parallel job are started through requests to arrayd and hence are not
part of the job's POSIX session.  In order to relate processes to the job, 
the SGI Array Session Handle (ASH) must be used.
This feature is enabled when /etc/config/array contains
.Ty on
or may be forced by setting the configure option
.Ty --enable-array .
The PBS machine type (PBS_MACH) is set to
.Ty irix6array .
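.LP
The choice between the two modes can be summarized with a short Python
sketch.  The function name is hypothetical; it assumes that a first word of
\f5on\fP in /etc/config/array means array services are enabled, and it lets
an explicit --enable-array or --disable-array override the file, as
described above.
.Cs
```python
def irix6_pbs_mach(array_config="/etc/config/array", force_array=None):
    """Return the PBS machine type used for an Irix 6.x build.

    force_array is True for --enable-array, False for --disable-array,
    and None to autodetect from the array configuration file.
    """
    if force_array is not None:
        return "irix6array" if force_array else "irix6"
    try:
        with open(array_config) as f:
            words = f.read().split()
    except OSError:
        # No array configuration: fall back to POSIX session tracking
        return "irix6"
    return "irix6array" if words[:1] == ["on"] else "irix6"


print(irix6_pbs_mach())
```
.Ce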
.LP
IRIX 6 supports both 32 and 64 bit objects.
In prior versions, PBS was typically built as a 32 bit object.
Irix 6.4 introduced system supported checkpoint/restart;
PBS will include support for checkpoint/restart if the file
.I /usr/lib64/libcpr.so
is detected during the build process.  To interface with the SGI
checkpoint/restart library, PBS must be built as a 64 bit object.  Add 
.Ty "-64"
to the 
.B CFLAGS .
This can be done via the configure option
.Ty --set-cflags=-64 .
.QP
.B WARNING
.QP
Because of changes in structure size, PBS will not be able to recover any
server, queue, or job information recorded by a PBS built with 32 bit objects,
or vice versa.
Please read section 6.5,
.I "Installing an Updated Batch System" ,
for instructions on dealing with this incompatibility.
.LP
If libcpr.so is not present, PBS may be built as either a 32 bit or
a 64 bit object.  To build as 32 bit, add
.Ty "-n32"
instead of -64 to
.B CFLAGS .
.LP
.NH 3
.Tc "FreeBSD and NetBSD"
.LP
There is a problem with FreeBSD up to at least version 2.2.6.  It is possible
to lose track of which session a set of processes belongs to if the
session leader exits.  This means that if the top shell of a job leaves
processes running in the background and then exits, Mom will not be
able to find them when the job is deleted.  This should be fixed in
a future version.
.LP
.NH 3
.Tc "Linux"
.LP
.mc |
Red Hat Linux versions 4.x through 6.x are supported on Intel x86.
.LP
There are two RPM packages for Red Hat Linux.  The first contains the entire PBS
distribution and is meant for the front end node.  The second is a Mom/client
distribution, meant for cluster compute nodes.
.LP
The entire PBS distribution package should install and run out of the box.
If you are installing a single timeshared host, then you are done.  If you are
installing a cluster of compute nodes, then install the Mom package on each
of the compute nodes.  A small amount of configuration must then be
done on the compute nodes.
.LP
.B "You must edit these two files:"
.IP 1.
/usr/spool/pbs/mom_priv/config
.IP 2.
/usr/spool/pbs/default_server
.LP
You must replace
.Ty "<Server hostname>"
with the fully qualified domain name
of the machine that is running the PBS Server.
.LP
NOTE: If you remove a PBS package (pbs or pbs-mom), some files will remain in
/usr/spool/pbs.  These can be safely removed if PBS is no longer needed.
.mc
.LP
.NH 3
.Tc "SUN Running SunOS"
.LP
The native SunOS C compiler is not ANSI and cannot be used to build PBS.
GNU gcc is recommended.

.LP
.OH 'PBS Administrator Guide''Configuration'
.EH 'Configuration''PBS Administrator Guide'
.bp
.NH 1
.Tc "\f3\s+2Batch System Configuration\s-2\fP"
.LP
Now that the system has been built and installed, the work has just begun.
The Server and Moms must be configured and the
scheduling policy must be implemented.
These items are closely coupled.   Managing which jobs, and how many, are
scheduled into execution can be done in several ways.  Each method has
an impact on the implementation of the scheduling policy and server attributes.
An example is the decision to schedule jobs out of a single pool (queue) or
divide jobs into one of multiple queues, each of which is managed differently.
This type of decision is discussed further in Chapter
.B "4. Scheduling Policies" .
.LP
.NH 2
.Tc \f3Single Execution System\fP
.LP
If you are installing PBS on a single system, you are ready to configure
the daemons and start worrying about your scheduling policy.
We still suggest that you read section 
.B "3.2.3 Where Jobs May Be Run"
and then continue with section
.B "3.3 Network Addresses" .
No nodes file is needed.
.LP
If you wish, the PBS Server and Scheduler, pbs_server and pbs_sched,
can run on one system and jobs execute on another.
This is a trivial case of the multiple execution systems discussed in the next
section.   We suggest that you read it.
If you are running the default Scheduler, fifo, you will need a nodes file with
one entry: the name of the host with Mom on it, appended with
.Ty :ts .
If you write your own Scheduler, it can be told by means other than the nodes
file on which host jobs should be run.
.NH 2
.Tc \f3Multiple Execution Systems\fP
.LP
If you are running on more than a single computer, you will need
to install the execution daemon (pbs_mom) on each system where jobs are
expected to execute.
If you are running the default Scheduler, fifo, you will need a nodes file with
one entry for each execution host.  The entry is the name of the host with
Mom on it, appended with
.Ty :ts .
Again, if you write your own Scheduler, it can be told by means other than the
Server's nodes file on which hosts jobs may be run.
.NH 3
.Tc "Installing Multiple Moms"
.LP
There are four ways in which a Mom may be installed on each of the various
execution hosts.
.IP 1.
The first method is to do a full install of PBS on each host.   While this 
works, it is a bit wasteful.
.IP 2.
The second way is to rerun configure with the following options:
.Ty "--disable-server --set-sched=no" .
You may also choose to 
.Ty --disable-clients ,
but users often use the PBS commands within a job script so you will likely
want to build the commands.
You will then need to recompile and then do an install on each execution host.
.IP 3.
The third way is to install just Mom (and perhaps the commands) on
each system.   If the system can run the same binaries as the host on which
PBS was compiled, cd down to src/mom and type
.Ty "make install"
as root.   To install the commands, cd ../cmds and again type
.Ty "make install" .
If the system requires recompiling, do so at the top level to recompile the
libraries and then proceed as above.
.IP 4.
The fourth way requires that the system be able to execute the existing
binaries and that the directories
.I sbindir
and
.I bindir ,
in which the PBS daemons and commands were installed during the initial full
build, be available on each host.   These directories, unlike the PBS_HOME
directory, can reside on a network file system.
.IP
.mc |
If the target tree is accessible on the host, as root execute the following
commands on each execution host:
.ft 5
.nf
sh {target_tree}/buildutils/pbs_mkdirs [-d new_directory] mom
sh {target_tree}/buildutils/pbs_mkdirs [-d new_directory] aux
sh {target_tree}/buildutils/pbs_mkdirs [-d new_directory] default
.br
.fi
.ft 1
This will build the required portion of PBS_HOME on each host.
Use the -d option if you wish to place PBS_HOME in a different place on
the node.   This directory must be on local storage on the node, not on a
shared file system.  If you use a different path for PBS_HOME than was
specified when configure was run,  you must also start pbs_mom with the 
corresponding -d option so she knows where PBS_HOME is located.
.mc
.IP
If the target tree is not accessible, copy the pbs_mkdirs shell script to 
each execution host and again as root, execute it with the above operands.
.LP
You will now need to declare the names of the execution hosts to the pbs_server
daemon as explained in the next section.
.NH 3
.Tc "Declaring Nodes"
.LP
.mc |
In PBS, allocation of cluster nodes (actually the virtual processors, VPs, of
the nodes) to a job is handled by the Server.
Each node must have its own copy of Mom running
.mc
on it.  If only timeshared hosts are to be served by the PBS batch system,
the Job Scheduler must direct where the job should be run.
If unspecified, the Server will execute the job on the host where it is running.
See the next section for full details.
.LP
If the virtual processors of the nodes are to be allocated
.I exclusively
or
.I temporarily-shared ,
a list of the nodes must be specified to the Server.   This list may also
contain timeshared nodes.   Nodes marked as timeshared will be listed
by the
Server in a node status report along with the other nodes.  However, the
Server will \f3not attempt to allocate them\fP to jobs.
The presence of timeshared nodes in
the list is solely as a convenience to the Job Scheduler and other programs,
such as xpbsmon.
.LP
The node list is given to the Server in a file named
.Ty nodes
in the Server's home directory
.Ty PBS_HOME/server_priv .
.mc |
This is a simple text file with the specification of a single node per line
in the file.  The format of each line in the file is:
.Cs
node_name[:ts] [property ...] [np=\f2NUMBER\fP]
.Ce
.IP -
The node name is the network name of the node (host name).
It does not have to be fully qualified; in fact, it is best if it is as short as possible.
The optional
.Ty :ts
appended to the name indicates that the node is a timeshared node.
.IP -
Zero or more properties may be specified.
The property is nothing more than a string of alphanumeric characters
(first character must be alphabetic) without meaning to PBS.
.IP -
The expression \f5np=\f2NUMBER\f1 may be added to declare the number of
virtual processors (VP) on the node.   \f2NUMBER\f1 is a numeric string,
for example np=4.  This expression allows the node to be allocated up
to \f2NUMBER\f1 times, to one job or to several jobs.
If np=# is not specified for a cluster node, it is assumed to have one VP.
np=# may be declared on a time-shared node without a warning, but
it is meaningless there.
.IP -
Each item on the line must be separated by white space.
The items may be listed in any order, except that the host name must always
be first.
.IP -
Comment lines may be included if the first
non-white space character is the pound sign '#'.
.mc
.LP
The following is an example of a possible nodes file:
.Cs
# The first set of nodes are cluster nodes.
# Note that the properties are provided to group 
# certain nodes together.
curly stooge odd
moe   stooge even
larry stooge even
harpo marx odd np=2
groucho marx odd np=3
chico marx even
# And for fun we throw in one timeshared node.
chaplin:ts 
.Ce
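.LP
As an illustration of the line format described above, the following is a
minimal sketch, in Python, of how a site-written tool might read a nodes file.
It is not part of PBS: the names \f5parse_nodes_file\fP and \f5NodeSpec\fP
are hypothetical, and the sketch assumes the file has already been read into a
string.  It skips comment lines, strips an optional \f5:ts\fP suffix, takes
\f5np=\fP as the VP count (default one), and treats every other item as a
property.

```python
from typing import NamedTuple


class NodeSpec(NamedTuple):
    """One node declaration from a PBS nodes file."""
    name: str
    timeshared: bool
    properties: tuple
    np: int


def parse_nodes_file(text):
    """Parse nodes-file text into a list of NodeSpec records.

    Line format: node_name[:ts] [property ...] [np=NUMBER]
    """
    nodes = []
    for line in text.splitlines():
        items = line.split()  # items are separated by white space
        if not items or items[0].startswith("#"):
            continue  # blank line or comment line
        name, timeshared = items[0], False
        if name.endswith(":ts"):
            name, timeshared = name[:-3], True
        np = 1  # a cluster node without np= has one VP
        properties = []
        for item in items[1:]:
            if item.startswith("np="):
                np = int(item[3:])
                if np < 1:
                    raise ValueError("np must be greater than zero: " + line)
            elif item[:1].isalpha() and item.isalnum():
                properties.append(item)
            else:
                raise ValueError("bad property " + repr(item) + " in: " + line)
        # np is kept for timeshared nodes too, although PBS ignores it there.
        nodes.append(NodeSpec(name, timeshared, tuple(properties), np))
    return nodes
```

Applied to the example file above, it returns seven records; \f5harpo\fP and
\f5groucho\fP have two and three VPs, and \f5chaplin\fP is marked as
timeshared.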
.LP
After the pbs_server is started, the list of nodes may be entered or altered
via the qmgr command.
.IP "Add nodes:"
\f3Qmgr: \f5create node\ \f2node_name\f5\ [attributes=values]\f1
.br
.mc |
where the attributes and their possible values are:
.TS
box;
c | c
l | l.
Attribute	Value
_
state	\f5free\fP, \f5down\fP, \f5offline\fP
properties	any alphanumeric string or comma separated set of strings
ntype	\f5cluster\fP, \f5time-shared\fP
np	a number of virtual processors greater than zero
.TE
In addition to the states listed above which can be set by the administrator,
there are certain other states that are only set internally.
.RS
.IP \f5busy\fP\ 
state is set by the execution daemon, pbs_mom, when a
load-average threshold is reached on the node.   See 
.I max_load
in Mom's config file [section 3.6].
.IP "\f5job-exclusive\fP and \f5job-sharing\fP"
states are set when jobs are running on the node.
.RE
.IP
Please note that any comma-separated list of strings must be enclosed in quotes.
.br
Example:
.Cs
create node box1 np=2,ntype=cluster,properties="green,blue"
.Ce
.IP "Modify nodes:"
\f3Qmgr: \f5set\ node\ \f2node_name\f5\ [attributes[+|-]=values]\f1
.br
where the attributes are the same as for create.
.br
Examples:
.Cs
set node box1 properties+=red
set node box1 properties-=green
set node box1 properties=purple
.Ce
.IP "Delete nodes:"
\f3Qmgr: \f5delete node \f2node_name\f1
.br
Example:
.Cs
delete node box1
.Ce
.mc
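.LP
The \f5+=\fP, \f5-=\fP and \f5=\fP forms of \f5set node\fP shown above can be
modelled as operations on the node's list of properties.  The following Python
sketch only illustrates that behaviour and is not qmgr's implementation; the
function name \f5apply_property_op\fP is hypothetical, and it assumes that
\f5+=\fP does not add a property the node already has.

```python
def apply_property_op(current, op, value):
    """Return a node's property list after 'set node ... properties<op>value'.

    current: list of property strings already on the node
    op:      "=", "+=" or "-="
    value:   one property or a comma-separated list such as "green,blue"
    """
    given = [p for p in value.split(",") if p]
    if op == "=":
        return given  # replace the whole list
    if op == "+=":
        # Assumption: a property already present is not added twice.
        return current + [p for p in given if p not in current]
    if op == "-=":
        return [p for p in current if p not in given]
    raise ValueError("unsupported operator: " + op)
```

Starting from \f5green,blue\fP, the three example commands would leave the
node with \f5green,blue,red\fP, then \f5blue,red\fP, and finally \f5purple\fP.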
.LP
.NH 3
.Tc "Where Jobs May Be Run"
.LP
Where jobs may be or will be run is determined by an interaction between
the Scheduler and the Server.  This interaction is affected by whether a
.I nodes
file exists.
.NH 4
No Node File
.LP
If a nodes file does not exist, the Server only directly knows about its own
host.   It assumes that jobs may be executed on it.  When told to run
a job without a specific execution host named, it will default to its own host.
Otherwise, it will attempt to execute the job where directed in the Run Job
request.  Typically the job Scheduler will know about other hosts because it
was written that way at your site.  The Scheduler will direct the Server where
to run the job.
.LP
The default fifo Scheduler depends on the existence of a nodes file if more than
one host is to be scheduled.   Any or all of the nodes contained in the file may
be timeshared hosts with the appended \*Q:ts\*U.
.NH 4
Node File Exists
.LP
If a nodes file exists, then the following rules come into play:
.RS
.IP 1. 4
If a specific host is named in the Run Job request and
the host is specified in the nodes file as a 
.I timeshared
host, the Server will attempt to run the job on that host.
.IP 2.
.mc |
If a specific host is named in the Run Job request and
the named node is not in the nodes file as a timeshared host, or if there
are multiple nodes named in the Run Job request, then the Server attempts to
allocate one (or more, as requested) virtual processor on the named
.I cluster
node or nodes to the job.
.mc
All of the named nodes must appear in the Server's
nodes file.  If the allocation succeeds, the job [shell script]
is run directly on the first of the nodes allocated.
.IP 3.
.mc |
If no location was specified on the Run Job request, but
the job requests nodes, then virtual processor(s) on cluster nodes which match
the request are allocated
if possible.  If the allocation succeeds, the job is run on the node allocated
to match the first specification in the node request.
Note that the Scheduler may modify the
job's original node request; see the job attribute
.At neednodes .
.IP
For SMP nodes, where multiple virtual processors have been declared, the
order of allocation of processors is controlled by the setting of the Server
attribute
.At node_pack :
.RS
.IP - 3
If set true, VPs will first be taken from nodes with the fewest free VPs.
This
.I packs
jobs into the fewest possible nodes, leaving nodes with many free VPs
available for those jobs that need many VPs on a node.
.IP - 3
If node_pack is set false, VPs are allocated from nodes with the most free VPs.
This scatters jobs across the nodes to minimize conflict between jobs.
.IP - 3
If node_pack is not set to either true or false, i.e. 
.I unset ,
then the VPs are allocated in the order that the nodes are declared in the
Server's nodes file.
.IP
Be aware that if node_pack is set, the internal order of nodes is changed.
If node_pack is later unset, the order will no longer be changed, but it will
not revert to the order originally established in the nodes file.
.RE
A user may request multiple virtual processors per node by adding the term
\f5ppn=#\fP (for processors per node) to each node expression.   For example,
to request 2 VPs on each of 3 nodes and 4 VPs on 2 more nodes, the user can
request
.Cs
-l nodes=3:ppn=2+2:ppn=4
.Ce
.mc
.IP 4.
If the server attribute
.At default_node
is set, its value is used.   If this matches the name of a time-shared
node, the job is run on that node.   If the value of default_node can be mapped
to a set of one or more free cluster nodes, they are allocated to the job.
.IP 5.
If default_node is not set, and at least one time-shared node is defined,
that node is used.  If more than one is defined, one of them is selected for
the job, but which one is not predictable.
.IP 6.
The last choice is to act as if the job has requested 
.Ty 1#shared .
The job is allocated any existing job-shared VP or, if none exists,
a free VP is allocated to it as job-shared.
.RE
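.LP
The effect of the \f5node_pack\fP setting described in rule 3 can be sketched
in Python as an ordering of candidate nodes.  This is an illustration under
stated assumptions, not the Server's code: \f5vp_allocation_order\fP is a
hypothetical name, each node is given as a (name, free VPs) pair in nodes-file
order, nodes with no free VPs are assumed to be skipped, and the permanent
reordering of the Server's internal list noted above is not modelled.

```python
def vp_allocation_order(nodes, node_pack=None):
    """Return node names in the order VPs would be taken from them.

    nodes:     list of (name, free_vps) pairs in nodes-file order
    node_pack: True (pack), False (scatter) or None (unset)
    """
    if node_pack is None:
        ordered = list(nodes)  # unset: declaration order
    elif node_pack:
        ordered = sorted(nodes, key=lambda n: n[1])  # fewest free VPs first
    else:
        ordered = sorted(nodes, key=lambda n: n[1], reverse=True)  # most first
    # Assumption: nodes without a free VP are not candidates.
    return [name for name, free in ordered if free > 0]
```

Python's sort is stable, so nodes with equal free VP counts keep their
declaration order in all three cases.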
.LP
What all the above means can be boiled down into the following set of
guidelines:
.RS
.IP \- 4
If the batch system consists of a single timeshared host on which the Server
and Mom are running, no problem \- all 
the jobs run there.  The Scheduler only needs to say which job it wants run.
.IP \-
If you are running a timeshared complex with \f2one\fP or more back-end hosts,
where Mom is on a different host than is the Server, then
load balancing jobs across the various hosts
is a matter of the Scheduler determining on which host to place the 
selected job.
This is done by querying the resource monitor side of Mom using the resource
monitor API - the addreq() and getreq() calls.   The Scheduler tells the Server
where to run each job.
.IP \-
If your cluster is made up of cluster nodes and you are
running distributed (multiple node) jobs, as well as serial jobs, the
Scheduler typically uses the
.I "Query Resource"
or
.I Avail
request to the Server for each queued job under consideration.   The Scheduler
then selects one of the jobs that the Server replied could run, and directs
.mc |
that the job should be run.  The Server will then allocate one or more
virtual processors on one or more nodes as required to the job.
.mc
By setting the Server attribute 
.At default_node
to one temporarily-shared node,
.Ty 1#shared ,
jobs which do not request nodes will be placed together on a few 
temporarily-shared nodes.
.IP \-
If you have a batch system supporting both cluster nodes and one timeshared
node, the situation is like the above, only you may wish to change
.At default_node
to point to the timeshared host.  Jobs that do not ask for nodes will end
up running on the timeshared host.
.IP \-
If you have a batch system supporting both cluster nodes and multiple
timeshared hosts, you have a complex system which requires a smart Scheduler.
The Scheduler must recognize which jobs request nodes and use the
.I Avail 
request to the Server. It must also recognize which jobs are to be load
balanced among the timeshared hosts, and provide the host name
to the Server when directing that the job be run.  The supplied 
.I fifo
Scheduler has this capability.
.RE
.LP
.NH 2
.Tc \f3Network Addresses and Ports\fP
.LP
PBS makes use of fully qualified host names for identifying the jobs and
their location.  A PBS batch system is known by the host name on which the
Server, pbs_server, is running.
The name used by the daemons, and used to authenticate messages, is the
.B canonical
host name.  This name is taken from the primary name field,
.Ty h_name ,
in the structure returned by the library call gethostbyaddr().
According to our understanding of the IETF RFCs, this name must be fully
qualified and consistent for any IP address assigned to that host.
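.LP
To see which canonical name a host will present, the following Python sketch
performs the same kind of reverse lookup.  Python's
\f5socket.gethostbyaddr()\fP returns a (hostname, aliases, addresses) tuple
whose first element is the primary name.  The helpers \f5canonical_name\fP
and \f5names_consistent\fP are hypothetical, and the second one checks only
the IPv4 addresses returned by \f5socket.gethostbyname_ex()\fP.

```python
import socket


def canonical_name(ip_address):
    """Return the primary (h_name) host name for ip_address, or None."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None
    return hostname


def names_consistent(hostname):
    """True if every IPv4 address of hostname maps back to one canonical name."""
    try:
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    names = {canonical_name(address) for address in addresses}
    return len(names) == 1 and None not in names
```

A host for which \f5names_consistent\fP returns false is likely to cause
authentication problems between the PBS daemons.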
.LP
The three daemons and the commands will attempt to use /etc/services to identify
the standard port numbers to use for communication.  The port numbers need not
be below the magic 1024 number.  The service names that should be added to
/etc/services are
.Cs
pbs           15001/tcp           # pbs server (pbs_server)
pbs_mom       15002/tcp           # mom to/from server
pbs_resmom    15003/tcp           # mom resource management requests
pbs_resmom    15003/udp           # mom resource management requests
pbs_sched     15004/tcp           # scheduler
.Ce
The numbers listed are the default numbers used by this version of PBS.  If
you change them, be careful to use the same numbers on all systems.
Note, the name
.I pbs_resmom
is a carry-over from early versions of PBS, when there were separate daemons for job
execution (pbs_mom) and resource monitoring (pbs_resmon).  The two functions
were later combined into pbs_mom, although the term \*Qresmom\*U may still be
found referring to the combined functions.
.LP
If the services cannot be found in /etc/services, the PBS components will
default to the above listed numbers.
.LP
If the Server is started with a non-standard
port number (see the -p option in the pbs_server(8) man page), the Server \*Qname\*U
becomes 
.I host_name.domain:port , 
where port is the numeric port number being used.  See the discussion of 
.B "Alternate Test Systems" ,
section 6.4.
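.LP
The port selection and Server naming rules above can be summarized in a short
Python sketch.  It is an illustration, not PBS code: \f5service_port\fP and
\f5split_server_name\fP are hypothetical helpers, the defaults table simply
restates the numbers listed above, and \f5socket.getservbyname()\fP is used as
the standard-library way of looking a service up in /etc/services.

```python
import socket

# Default port numbers for this version of PBS, as listed above.
DEFAULT_PORTS = {
    ("pbs", "tcp"): 15001,         # pbs_server
    ("pbs_mom", "tcp"): 15002,     # mom to/from server
    ("pbs_resmom", "tcp"): 15003,  # mom resource management requests
    ("pbs_resmom", "udp"): 15003,
    ("pbs_sched", "tcp"): 15004,   # scheduler
}


def service_port(service, proto="tcp"):
    """Use the /etc/services entry if there is one, else the default."""
    try:
        return socket.getservbyname(service, proto)
    except OSError:
        return DEFAULT_PORTS[(service, proto)]


def split_server_name(server_name):
    """Split a Server name of the form host.domain[:port] into (host, port)."""
    host, sep, port = server_name.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return server_name, service_port("pbs", "tcp")
```

For example, the name big.foo.bar.com:13001 would give the host
big.foo.bar.com and port 13001, while a name without a port falls back to
the \f5pbs\fP service port.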
.LP
.NH 2
.Tc \f3Starting Daemons\fP
.LP
All three of the daemon processes, Server, Scheduler and Mom,
must run with the real and effective uid of root.
Typically, the daemons are started from the system's boot files,
e.g. /etc/rc.local.
However, it is recommended that the Server be brought up \*Qby hand\*U the
first time and configured before being run at boot time.
.NH 3
Starting Mom
.LP
Mom should be started at boot time.  Typically there are no required options.
It works best if Mom is started before the Server so she will be ready to 
respond to the Server's \*Qare you there?\*U ping.  Start Mom with the line
.Cs
{sbindir}/pbs_mom
.Ce
in the /etc/rc2 or equivalent boot file.
.LP
If Mom is taken down and the
host system continues to run, Mom should be restarted with either of the
following options:
.IP -p
This directs Mom to let running jobs continue to run.  Because Mom is no
longer the parent of the jobs, she will not be notified (SIGCHLD) when they
die and so must poll to determine when the jobs complete.   The resource
usage information therefore may not be completely accurate.
.IP -r
This directs Mom to kill off any jobs which were left running.  See
the ERS for a full explanation.
.IP
Without either the -p or the -r option, Mom will assume the jobs' processes
no longer exist because of a system restart (a cold start).
She will not attempt to kill the processes and will request that any jobs which
were running before the system restart be requeued.
.LP
By default, Mom will only accept connections from a privileged port on her own
system, that is, from \*Qlocalhost\*U or from the host name returned
by gethostname(2).    If the Server or Scheduler is
running on a different host, the host
name(s) must be specified in Mom's configuration file.
See the -c option in the
pbs_mom(8B) man page and section
.B "3.6 Configuring the Execution Server, pbs_mom"
for more information on the configuration file.
.LP
Should you wish to make use of the prologue and/or epilogue script features,
please see section 6.2, \*QJob Prologue/Epilogue Scripts\*U.
.NH 3
Starting the Server
.LP
The initial run of the Server or any first time run after recreating the home
directory must be with the
.Ty "-t create"
option.   This option directs the Server to create a new server database.
This is best done by hand.
If a database is already present, it is discarded after receiving a positive
validation response.  At this point it is necessary to configure the
Server.  See the section
.B "3.5 Server Configuration".
The create option leaves the Server in an \*Qidle\*U state.  In this state
the Server will not contact the Scheduler and jobs are not run, except manually via the qrun(1B) command.
Once the Server is up, it can be placed in the \*Qactive\*U state by setting
the Server attribute \f5scheduling\fP to a value of true:
.Cs
qmgr -c "set server scheduling=true"
.Ce
The value of \f5scheduling\fP is retained across Server terminations/starts.
.LP
After the Server is configured it may be placed into service.   Normally it is
started in the system boot file via a line such as:
.Cs 
{sbindir}/pbs_server
.Ce
The 
.Ty "-t start_type"
option may be specified where
.Ty start_type
is one of the options specified in the ERS (and the pbs_server man page).
The default is
.Ty warm . 
Another useful option is the 
.Ty "-a true|false"
option.  This turns invocation of the PBS job Scheduler on or off.
.NH 3
Starting the Scheduler
.LP
The Scheduler should also be started at boot time.
Start it with an entry in the /etc/rc2 or equivalent file:
.Cs
{sbindir}/pbs_sched [options]
.Ce
There are no required options for the default fifo scheduler.
Typically the only required option for the BaSL based Scheduler is the
.Ty "-c config_file"
option specifying the configuration file.
For the Tcl based Scheduler, the option
is used to specify the Tcl script to be called.
.NH 2
.Tc \f3Configuring the Job Server, pbs_server\fP
.LP
Server management consists of configuring the Server attributes and establishing
queues and their attributes.
Unlike Mom and the Job Scheduler, the Job Server (pbs_server) is configured
while it is running, except for the nodes file.
Configuring server and queue attributes and creating queues is done with the
qmgr(1B) command.  This must be done either as root or as a user who has been
granted PBS Manager privilege as shown in the last step in the
.B "Build Overview"
section of this guide.
Exactly what needs to be set depends on your
scheduling policy and how you choose to implement it.  The system needs at
least one queue established and certain server attributes initialized.
.LP
The Server attributes are discussed in section 2.4 of the ERS.  The following
are the \*Qminimum required\*U server attributes and the recommended attributes.
For the sake of examples, we will assume that your site is a sub-domain of a
large network and all hosts at your site have names of the form:
.sp
.nf
\ \ \ \f2host\f3.foo.bar.com\f1
.fi
.sp
and the batch system consists of a single large machine named
.B big.foo.bar.com .
.NH 3
.Tc Server Configuration
.LP
The following attributes are required or recommended.
They are set via the 
.I "set server"
(s s) subcommand to the
.I qmgr (1B)
command.
.LP
Not all of the Server attributes are discussed here, only what is needed
to get a reasonable system up and running.   See the
.I pbs_server_attributes
man page for a complete list of server attributes.
.NH 4
Required Server Attributes
.LP
.Al default_queue 
Declares the default queue to which jobs are submitted if a queue is not
specified on the qsub(1B) command.  The queue must be created first.
Example:
.Cs
\f3Qmgr: \fPc q dque queue_type=execution
\f3Qmgr: \fPs s default_queue=dque
.Ce
.NH 4
Recommended Server Attributes
.LP
.Al acl_hosts
A list of hosts from which jobs may be submitted.  For example, if you wish
to allow all the systems on your sub-domain plus one other host, boss,
at headquarters to submit jobs, then set:
.Cs
\f3Qmgr: \fPs s acl_hosts=*.foo.bar.com,boss.hq.bar.com
.Ce
.Al acl_host_enable
Enables the Server's host access control list, see above.
.Cs
\f3Qmgr: \fPs s acl_host_enable=true
.Ce
.Al default_node
Defines the node on which jobs are run if not otherwise directed.  Please see
section 3.2.3 
.B "Where Jobs May be Run"
for a discussion of how to set this attribute depending on your system.
The default value (also the value assumed if the attribute is unset) is
.Ty 1#shared .
.Cs
\f3Qmgr: \fPs s default_node=big
.Ce
Note, the value may be specified as either 
.Ty big
or
.Ty big.foo.bar.com .
If there is a nodes file, the value must exactly match the name specified
in the nodes file, i.e. big in both places or big.foo.bar.com in both places.
.Al managers
Defines which users, at a specified host, are granted batch system
administrator privilege.  For example, to grant privilege to \*Qme\*U at
all systems on the sub-domain and \*Qsam\*U only from this system, big, then:
.Cs
\f3Qmgr: \fPs s managers=me@*.foo.bar.com,sam@big.foo.bar.com
.Ce
.mc |
.Al node_pack
Defines the order in which multiple-CPU cluster nodes are allocated to jobs.
See the discussion in section 3.2.3 Where Jobs May Be Run.
If set, the internal node list is sorted based on the number of free VPs.
If set
.B true ,
jobs are packed into the fewest possible nodes.  If set 
.B false ,
jobs are scattered across the most possible nodes.  If left
.B unset ,
jobs will be placed across nodes in the order that the nodes are
declared to the Server.
.mc
.Al operators
Defines which users, at a specified host, are granted batch system
operator privilege.  Specified in the same way as managers.
.mc |
.Al query_other_jobs
This attribute determines whether a user may query the status (qstat) of jobs
that belong to other users.   If it is not set, or if set to False, a user will not
be able to query the status of any job not belonging to himself or herself.
Most sites will wish to set this attribute to True:
.Cs
\f3Qmgr: \fPs s query_other_jobs=true
.Ce
.mc
.Al resources_default
This attribute establishes the resource limits assigned to jobs that were
submitted without a limit and for which there are no queue limits.  It is 
important that a default value be assigned for any resource requirement used
in the scheduling policy.
See the 
.I pbs_resources_*
man page for your system type (* is irix6, linux, solaris5, ...).
.Cs
\f3Qmgr: \fPs s resources_default.cput=5:00
\f3Qmgr: \fPs s resources_default.mem=4mb
.Ce
.Al resources_max
This attribute sets the maximum amount of resources which can be used by a
job entering any queue on the Server.   This limit is checked only if there
is not a queue specific resources_max attribute defined for the specific 
resource.
.LP
.NH 3
.Tc Queue Configuration
.LP
There are two types of queues defined by PBS, routing and execution.
A routing queue is a queue used to move jobs to other queues which may even
exist on different PBS Servers.  Routing queues are similar to the old NQS pipe
queues.  A job must reside in an execution queue to be eligible to run.  The 
job remains in the execution queue during the time it is running.  
.LP
A Server may have multiple queues of either or both types.
A Server must have at least one queue defined.
Typically it will be an execution queue;
jobs cannot be executed while residing in a routing queue.
.LP
Queue attributes fall into three groups: those which are applicable to both 
types of queues, those applicable only to execution queues, and those applicable
only to routing queues.  If an \*Qexecution queue only\*U attribute is set
for a routing queue, or vice versa, it is simply ignored by the system.
However, as this situation might indicate the administrator made a mistake,
the Server will issue a warning message about the conflict.
The same message will be issued if the queue type is changed and there are
attributes that do not apply to the new type.
.LP
Not all of the Queue Attributes are discussed here, only what is needed
to get a reasonable system up and running.   See the
.I pbs_queue_attributes
man page for a complete list of queue attributes.
.NH 4
Required Attributes for All Queues
.LP
.Al queue_type
Must be set to either 
.Ty execution
or
.Ty routing 
(e or r will do).  The queue type must be set before the queue can be enabled.
If the type conflicts with certain attributes which are valid only for the
other queue type,  the set request will be rejected by the Server.
.Cs
\f3Qmgr: \fPs q dque queue_type=execution
.Ce
.Al enabled
If set to true, jobs may be enqueued into the queue.  If false, jobs will not
be accepted.
.Cs
\f3Qmgr: \fPs q dque enabled=true
.Ce
.Al started
If set to true, jobs in the queue will be processed, either routed by the
Server if the queue is a routing queue or scheduled by the job Scheduler
if an execution queue.
.Cs
\f3Qmgr: \fPs q dque started=true
.Ce
.LP
.NH 4
Required Attributes for Routing Queues
.LP
.Al route_destinations
Lists the local queues or queues at other Servers to which jobs in this routing
queue may be sent.  For example:
.Cs
\f3Qmgr: \fPs q routem route_destinations=dque,overthere@another.foo.bar.com
.Ce
.LP
.NH 4
Recommended Attributes for All Queues
.LP
.Al resources_max
If you choose to have more than one execution queue based on the size or type
of job, you may wish to establish maximum and minimum values for various
resource limits.  This will restrict which jobs may enter the queue.  A routing
queue can be established to \*Qfeed\*U the execution queues and jobs will
be distributed by those limits automatically.
.IP
A resources_max value defined for a specific resource at the queue level will
override the Server-level resources_max for that same resource.
Therefore, it is possible to define a higher as well as a lower value for a
queue limit than the Server's corresponding limit.
If there is no maximum value declared for a resource type, there is no 
restriction on that resource.  For example:
.Cs
\f3Qmgr: \fPs q dque resources_max.cput=2:00:00
.Ce
places a restriction that no job requesting more than 2 hours of cpu time will
be allowed in the queue.   There is no restriction on the memory,
.At mem ,
limit a job may request.
.Al resources_min
Defines the minimum value of a resource limit that a job must specify
to be accepted into the queue.
If not set, there is no minimum restriction.
.LP
.NH 4
Recommended Attributes for Execution Queues
.LP
.Al resources_default
Defines a set of default values for jobs entering the queue that did not
specify certain resource limits.  There is a corresponding Server attribute
which sets a default for all jobs.
.IP
The limit for a specific resource usage is established by checking various
job, queue, and server attributes.  The following list shows the attributes
and their order of precedence:
.RS
.IP 1. 4
The job attribute Resource_List, i.e. what was requested by the user.
.IP 2.
The queue attribute resources_default.
.IP 3.
The Server attribute resources_default.
.IP 4.
The queue attribute resources_max.
.IP 5.
The Server attribute resources_max.
.IP *
Under Unicos, a user supplied value must be within the system's User Data
Base, UDB, limit for the user.
If the user does not supply a value, the lower of the defaulted value from the
above list and the UDB limit is used.
.RE
.IP
\f3Please note, an \f2unset\f3 resource limit for a job is treated as an 
\f2infinite\f3 limit\f1.
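.LP
The order of precedence above amounts to \*Qthe first defined value wins\*U.
A minimal Python sketch might look like the following; the function
\f5effective_limit\fP is hypothetical, each attribute is represented as a
dictionary mapping resource names to values, the Unicos UDB check is not
modelled, and \f5None\fP stands for an unset, and therefore infinite, limit.

```python
def effective_limit(resource, job_list, queue_default, server_default,
                    queue_max, server_max):
    """Return the limit that applies to resource, or None if it is unset.

    Each dict maps resource names to values.  The dicts are checked in
    the order of precedence listed above and the first one that defines
    the resource wins.  None means unset, which PBS treats as infinite.
    """
    for source in (job_list, queue_default, server_default,
                   queue_max, server_max):
        if resource in source:
            return source[resource]
    return None
```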
.LP
.NH 4
Selective Routing of Jobs into Queues
.LP
Often it is desirable to route jobs to various queues on a Server, or even
between Servers, based on the resource requirements of the jobs.  
The queue 
.I resources_min
and
.I resources_max
attributes discussed above make this selective routing possible.
As an example, let us assume you wish to establish two execution queues,
one for short jobs of less than 1 minute cpu time, and the other for long
running jobs of 1 minute or longer.  Call them 
.B short
and 
.B long .
Apply the resources_min and resources_max attributes as follows:
.Cs
\f3Qmgr: \fPset queue short resources_max.cput=59
\f3Qmgr: \fPset queue long  resources_min.cput=60
.Ce
.LP
When a job is being enqueued, its requested resource list is tested against
the queue limits:
.Ty "resources_min <= job_requirement <= resources_max" .
If the resource test fails, the job is not accepted into the queue.
Hence, a job asking for 20 seconds of cpu time would be accepted into queue
.B short
but not into queue
.B long .
Note, if the min and max limits are equal, only that
exact value will pass the test.
.LP
You may wish to set up a routing queue to feed jobs into the queues with
resource limits.
For example:
.Cs
\f3Qmgr: \fPcreate queue feed queue_type=routing
\f3Qmgr: \fPset queue feed route_destinations="short,long"
\f3Qmgr: \fPset server default_queue=feed
.Ce
A job will end up in either 
.B short
or 
.B long
depending on its cpu time request.
.LP
You should always list the destination queues with the most restrictive
first, as the first queue which meets the job's requirements will be its
destination (assuming that queue is enabled).  Extending the above example to
three queues:
.Cs
\f3Qmgr: \fPset queue short resources_max.cput=59
\f3Qmgr: \fPset queue long  resources_min.cput=1:00,resources_max.cput=1:00:00
\f3Qmgr: \fPcreate queue verylong queue_type=execution
\f3Qmgr: \fPset queue feed route_destinations="short,long,verylong"
.Ce
A job asking for 20 minutes (20:00) of cpu time will be placed into queue
.B long .
A job asking for 1 hour and 10 minutes (1:10:00) will end up in queue
.B verylong
by default.
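.LP
The queue limit test and the first-match routing rule can be sketched in
Python using the cput values from the three-queue example.  This is an
illustration only: \f5to_seconds\fP, \f5queue_accepts\fP and \f5route\fP are
hypothetical helpers, only the \f5cput\fP resource is checked, time strings are
assumed to have the [[hh:]mm:]ss form used above, and whether a queue is
enabled is ignored.

```python
def to_seconds(value):
    """Convert a time string such as '59', '1:00' or '1:10:00' to seconds."""
    seconds = 0
    for part in value.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def queue_accepts(job_cput, qmin=None, qmax=None):
    """Apply resources_min <= request <= resources_max; unset values pass."""
    if job_cput is None:
        return True  # the job did not request cput, so the test passes
    request = to_seconds(job_cput)
    if qmin is not None and request < to_seconds(qmin):
        return False
    if qmax is not None and request > to_seconds(qmax):
        return False
    return True


def route(job_cput, destinations):
    """Return the first (name, min, max) destination that accepts the job."""
    for name, qmin, qmax in destinations:
        if queue_accepts(job_cput, qmin, qmax):
            return name
    return None
```

With the destinations \f5short\fP (max 59), \f5long\fP (min 1:00, max
1:00:00) and \f5verylong\fP (no limits), a request of 20:00 routes to
\f5long\fP, 1:10:00 to \f5verylong\fP, and a job with no cput request to
\f5short\fP.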
.LP
Caution: if a test is being made on a resource as shown with cput above,
and a job does not specify that resource item (it does not appear in the
.Ty "-l resource=value"
list on the qsub command), the test will pass.
In the above case,
a job without a cpu time limit will be allowed into queue
.B short .
For this reason, together with the fact that an unset limit is considered to
be an infinite limit, you may wish to add a default value to the queues or
to the Server.  Either
.Cs
\f3Qmgr: \fPset queue short resources_default.cput=40
.Ce
or
.Cs
\f3Qmgr: \fPset server resources_default.cput=40
.Ce
will see that a job without a cpu time specification is limited to 40 seconds.
A resources_default attribute at a queue level only applies to jobs in that
queue.
Be aware of three facts:
.RS
.IP 1.
If a default value is assigned, it is done so after the tests against
min and max.
.IP 2.
Default values assigned to a job from a queue
.Ty resources_default
are not carried with the job if the job moves to another queue.  Those resource
limits revert to unset, as they were when the job was submitted.  If the new queue specifies
default values, those values are assigned to the job while it is in the
new queue.
.IP 3.
Server level default values are applied if there is no queue level default.
.RE
In the above example, a default attribute should be applied either at the
Server level or at the routing queue level.
.LP
Minimum and maximum queue limits work with numeric valued resources, 
including time and size values.   Generally, they do not work with string
valued resources because of character comparison order.
However, setting the min and max to the same value to
force an exact match will work even for string valued resources.
For example,
.Cs
\f3Qmgr: \fPset queue big resources_max.arch=unicos8
\f3Qmgr: \fPset queue big resources_min.arch=unicos8
.Ce
can be used to limit jobs entering queue 
.B big
to those specifying
.Ty arch=unicos8 .
Again, remember that if 
.Ty arch
is not specified by the job, the tests pass automatically and the job
will be accepted into the queue.
.LP
It is possible to set limits on queues (and the Server) as to how many
nodes a job can request.  The \f2nodes\fP resource itself is a text string
and difficult to limit.  However,
two additional Read-Only resources exist for jobs.
They are \f2nodect\fP and \f2neednodes\fP.
Nodect (node count) is set by the Server to the integer number of nodes
desired by the user as declared in the \*Qnodes\*U resource specification.
That declaration is parsed and
the resulting total number of nodes is set in nodect.  This is useful when
an administrator wishes to place an integer limit,
.I resources_min
or 
.I resources_max ,
on the number of nodes used by a job entering a queue.
.LP
Based on the earlier example of declaring nodes, suppose a user requested the
following nodes (see section
.B "7.2 Parallel Jobs"
for more information):
.Cs
3:marx+2:stooge
.Ce
nodect would be set to 5 (3+2).
Neednodes is initially set by the Server to the same value as nodes.
Neednodes may be modified by the job Scheduler for special policies.
The contents of
neednodes determines which nodes are actually assigned to the job.
Neednodes is visible to the administrator but not to an unprivileged user.
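.LP
The node count calculation can be sketched in Python.  The function
\f5nodect\fP below is a hypothetical illustration of the rule just described,
not the Server's parser: each \f5+\fP-separated element that starts with a
number contributes that many nodes, an element that names a node contributes
one, and suffixes such as \f5:marx\fP or \f5:ppn=2\fP do not change the count.

```python
def nodect(nodes_spec):
    """Count the nodes requested by a 'nodes' resource string."""
    total = 0
    for element in nodes_spec.split("+"):
        first = element.split(":")[0]  # drop :property and :ppn= suffixes
        total += int(first) if first.isdigit() else 1
    return total
```

Under this sketch both \f53:marx+2:stooge\fP and \f53:ppn=2+2:ppn=4\fP give
a node count of 5.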
.LP
If you wish to set up a queue default value for \*Qnodes\*U (a value to which
the resource is set if the user does not supply one), corresponding
default values must be set for \*Qnodect\*U and \*Qneednodes\*U.  For example
.Cs
\f3Qmgr: \fPset queue foo resources_default.nodes=1
\f3Qmgr: \fPset queue foo resources_default.nodect=1
\f3Qmgr: \fPset queue foo resources_default.neednodes=1
.Ce
Minimum and maximum limits are set for \*Qnodect\*U only.  For example:
.Cs
\f3Qmgr: \fPset queue foo resources_min.nodect=1
\f3Qmgr: \fPset queue foo resources_max.nodect=15
.Ce
Minimum and maximum values must 
.B not
be set for nodes or neednodes as those are string values.
.NH 3
.Tc "Recording Server Configuration"
.LP
Should you wish to record the configuration of a Server for re-use, you
may use the 
.I print
subcommand of
.B qmgr (1B).
For example,
.Cs
qmgr -c "print server" > /tmp/server.con
.Ce
will record in the file server.con the qmgr subcommands required to recreate
the current configuration including the queues.
The commands can be fed back into qmgr via standard input:
.Cs
qmgr < /tmp/server.con
.Ce
.LP
.NH 2
.Tc "\f3Configuring the Execution Server, pbs_mom\fP"
.LP
Mom is configured via a configuration file which she reads at initialization
time and when sent the SIGHUP signal.  This file is described in the pbs_mom(8) man page as well as in the following section.
.LP
If the -c option is not specified when Mom is run, she will open
.I PBS_HOME/mom_priv/config
if it exists.   If it does not, Mom will continue anyway.
This file may be placed elsewhere or given a different name, in which case
pbs_mom must be started with the -c option.
.LP
The file provides several types of run time information to pbs_mom:
static resource names and values, external resources provided 
by a program to be run on request via a shell escape, and values
to pass to internal set up functions at initialization
(and re-initialization).
.LP
Each item type is on a single line with the component parts separated by
white space.  If the line starts with a hash mark (pound sign, #), 
the line is considered to be a comment and is skipped.
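.LP
As a rough illustration of the line types just listed, the following Python
sketch sorts the lines of a Mom configuration file into initialization
directives, static resources and shell-escape resources.  The helper
\f5classify_config\fP is hypothetical, and the assumption that a shell-escape
resource is written as a name followed by \f5!\fP and a command should be
checked against the pbs_mom(8) man page for your version.

```python
def classify_config(text):
    """Split Mom config text into directives, static and shell-escape resources.

    Assumptions (see the lead-in): '$name value' is an initialization
    directive, 'name !command' is a shell-escape resource, and any other
    'name value' line is a static resource.
    """
    result = {"directives": [], "static": {}, "shell": {}}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        parts = stripped.split(None, 1)  # name, then the rest of the line
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if name.startswith("$"):
            # Directives may repeat, e.g. several $clienthost lines.
            result["directives"].append((name[1:], value))
        elif value.startswith("!"):
            result["shell"][name] = value[1:].strip()
        else:
            result["static"][name] = value
    return result
```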
.NH 3
Access Control and Initialization Values
.LP
An initialization value directive has a name which starts with a
dollar sign ($) and must be known to Mom via an internal table.
Currently the entries in this table are: 
.IP clienthost
A 
.I $clienthost
entry causes a host name to be added to the list of hosts which will be allowed
to connect to Mom as long as it is using a privileged port.
For example, here are two lines for the
configuration file which will allow the hosts "fred" and "wilma"
to connect:
.Cs
$clienthost      fred
$clienthost      wilma
.Ce
Two host names are always allowed to connect to pbs_mom, "localhost"
and the name returned to pbs_mom by the system call gethostname().  These
names need not be specified in the configuration file.  The hosts
listed as "clienthosts" comprise a "sisterhood" of hosts.  Any member of
the sisterhood will accept connections from a Scheduler
[Resource Monitor (RM) requests]
or a Server [jobs to execute] located within the sisterhood.
Members also accept Internal Mom (IM) messages from within the sisterhood.
For the members of a sisterhood to exchange IM messages,
they must all use the same RM port.
.IP
For a Scheduler to be able to query resource information from a Mom,
the Scheduler's host must be listed as a 
.I clienthost .
.IP
If the Server is provided with a nodes file, the Server forwards the IP
addresses of the hosts (nodes) listed in that file to the Mom on each of
those hosts.
These hosts need not be listed
in each Mom's configuration file, as they are added internally when
the list is received from the Server.  The Server's host must be either
the same host as the Mom or be listed as a clienthost entry in each Mom's
config file.
.IP restricted
A 
.I $restricted
host entry
causes a host name to be added to the list of hosts which will be allowed
to connect to Mom without needing to use a privileged port.  These names
may contain wildcards.  For example, here is a configuration file
line which will allow queries from any host in the domain "ibm.com":
.Cs
$restricted      *.ibm.com
.Ce
Connections from the specified hosts are restricted in that only
internal queries may be made.  No resources from a config file
will be reported and no control requests can be issued.
This is to prevent any shell commands from being
run by a non-root process.  
.IP
This type of entry is typically used to specify hosts on which a monitoring
tool, such as xpbsmon, can be run.   Xpbsmon will query Mom for general
resource information.
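.IP
The access rules for $clienthost and $restricted hosts can be summarized
in a short C sketch.  It is a simplification under stated assumptions,
not Mom's code:
.RS
.IP -
The function check_access and the enum mom_access are hypothetical.
.IP -
Host names are compared as plain strings; the real Mom may compare addresses.
.IP -
The POSIX fnmatch(3) routine stands in for Mom's own wildcard matching.
.IP -
A source port below 1024 is taken as privileged.
.RE
.Cs
#include <fnmatch.h>
#include <string.h>

/* Hypothetical access levels; these names are not taken from pbs_mom. */
enum mom_access { ACCESS_DENY, ACCESS_RESTRICTED, ACCESS_FULL };

/*
 * host, port: the connecting host name and its source port
 * clients:    $clienthost names, plus "localhost" and Mom's own name
 * patterns:   $restricted wildcard patterns such as "*.ibm.com"
 */
enum mom_access check_access(const char *host, int port,
                             const char **clients, int nclients,
                             const char **patterns, int npatterns)
{
    int i;

    /* Full access: a known client connecting from a privileged port. */
    if (port < 1024)
        for (i = 0; i < nclients; i++)
            if (strcmp(host, clients[i]) == 0)
                return ACCESS_FULL;

    /* Restricted access (internal queries only): any port will do. */
    for (i = 0; i < npatterns; i++)
        if (fnmatch(patterns[i], host, 0) == 0)
            return ACCESS_RESTRICTED;

    return ACCESS_DENY;
}
.Ce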
.IP logevent
A
.I $logevent
entry sets the mask that determines which event types are logged by pbs_mom.
For example:
.Cs
$logevent 0x1ff

$logevent 255
.Ce
.IP
The first example sets the log event mask to 0x1ff (511), which enables
logging of all events, including debug events.  The second example sets
the mask to 0x0ff (255), which enables all events except the second level
of debug events (PBSEVENT_DEBUG2, 0x100).
The two lines above are alternatives, not a pair to be used together.
The values of events are listed in section
.B "6.3 Use and Maintenance of Logs" .
.mc |
.IP ideal_load
An
.I $ideal_load 
directive declares the low water mark for load on a node.  It works in
conjunction with a 
.I $max_load
directive.  When the load average on the node drops below the ideal_load,
Mom on the node will inform the Server that the node is no longer busy.
.br
For example:
.Cs
$ideal_load 2.0
$max_load   3.5
.Ce
.IP max_load
An
.I $max_load
directive declares the high water mark for load on a node.  It is used in
conjunction with a 
.I $ideal_load
directive.  When the load average exceeds the high water mark, Mom on the node
will notify the Server that the node is busy.   The state of the node will be
shown as 
.B busy .
A busy cluster node will not be allocated to jobs.  
This is useful in preventing allocation of jobs to nodes which are busy with
interactive sessions.
.IP
A 
.B busy
time-shared node may still run new jobs under the direction of the Scheduler.
The $ideal_load and $max_load directives each also add a static resource of the
same name, ideal_load and max_load, which the Scheduler may query.  These static
resources are supported by the default FIFO Scheduler when load-balancing jobs.
See the discussion of the FIFO scheduler for more information.
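.IP
The two directives work as a pair.  A node reported busy stays busy until
its load falls below $ideal_load, so it does not flip between states when
the load hovers near a single threshold.  The following C sketch models
that behavior.  The struct load_watch and the function update_load are
hypothetical, and the sketch does not model how often Mom samples the
load average.
.Cs
/* Hypothetical tracker for the two load thresholds on one node. */
struct load_watch {
    double ideal_load;   /* low water mark, from $ideal_load */
    double max_load;     /* high water mark, from $max_load */
    int    busy;         /* 1 while the node is reported busy */
};

/* Feed one load-average sample; returns 1 if Mom would notify the Server. */
int update_load(struct load_watch *w, double loadave)
{
    if (!w->busy && loadave > w->max_load) {
        w->busy = 1;     /* crossed the high water mark: report busy */
        return 1;
    }
    if (w->busy && loadave < w->ideal_load) {
        w->busy = 0;     /* fell below the low water mark: report free */
        return 1;
    }
    return 0;            /* otherwise the reported state is unchanged */
}
.Ce
With the example values above (2.0 and 3.5), a load of 3.0 leaves the
node's reported state unchanged, whether it is currently busy or not.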
.mc
.IP usecp\ 
If Mom is to move a file to a host other than her own, Mom normally uses scp
or rcp
to transfer the file.  This applies to stage-in/out and delivery of the job's
standard output/error.
[See the -o and -e options in the qsub(1) man page, and section
.B "3.3.5 Job Exit"
of the ERS, for the naming convention
used for standard output and error files.]
The destination is recorded as
.Ty hostx:/full/path/name .
So if 
.Ty hostx
is not the same system on which Mom is running, then she uses scp or rcp;
if it is the same system, then Mom uses /bin/cp.
.IP
If the destination file system is NFS mounted among all the systems in the PBS
environment (cluster), then cp may work better than scp or rcp.  One or more
.I $usecp
directives in the config file can be used to inform Mom on which file systems
a cp command can be used instead of s/rcp.  The $usecp entry has the form:
.Cs
$usecp  host_specification:path_prefix  substitute_prefix 
.Ce
The
.I host_specification
is either a fully qualified host\-domain name or a wild carded host\-domain
specification as used in the Server's host ACL attribute.  The 
.I path_prefix
is a leading component of the fully qualified path for the NFS files as
visible on the specified host.  The 
.I substitute_prefix
gives the leading components of the path to the same files on Mom's host.  If
different mount points are used, the path_prefix and the substitute_prefix will
be different.   If the same mount points are used for the cross mounted file
system, then the two prefixes will be the same.
.IP
When given a file destination, Mom will:
.RS
.IP 1.
Compare the host portion of the destination against her own host name.
If they match, Mom will use the cp command to move the file.  If the host
portion is
.Ty localhost ,
then Mom will also use cp.
.IP 2.
If the match in step 1 fails, Mom will match the host portion of the
destination against each $usecp host_specification in turn.  If the host
matches, Mom matches the 
.I path_prefix
against the initial segment of the destination name.   If this matches,
Mom will discard the host name,  replace the initial segment of the path that
matched against
.I path_prefix
with the
.I substitute_prefix
and use cp for the resulting destination.
.IP 3.
If the host is neither the local host nor matches any of the $usecp
directives, then Mom will use scp or rcp to move the file.
.RE
.IP
For example,  a user on host
.B myworkstation.company.com
submits a job while her current working directory is
.B /u/wk/her_home/proj .
The destination for her output would be given by PBS as
.Ty myworkstation.company.com:/u/wk/her_home/proj/123.OU .
If the job runs on host
.B pool2.company.com ,
which has the user's home file system cross mounted as
.B /r/home/her_home ,
then either of the following entries in the config file on pool2
.Cs
$usecp myworkstation.company.com:/u/wk/ /r/home/
$usecp *.company.com:/u/wk/  /r/home/
.Ce
will result in a cp copy to
.Ty /r/home/her_home/proj/123.OU
instead of an rcp to
.Ty myworkstation.company.com:/u/wk/her_home/proj/123.OU .
.IP
Note that the destination is matched against the $usecp entries in the order
in which they appear in the config file.  The first match of host and file
prefix determines the substitution.  Therefore, if the same file system is
mounted on /foo on HostA and on /bar on every other host, then the entries in
the config file on each of the other hosts (for example, pool1) should be in
the following order:
.Cs
$usecp HostA.company.com:/foo /bar
$usecp     *.company.com:/bar /bar
.Ce
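.IP
The three matching steps can be sketched in C as follows.  This is an
illustrative sketch, not Mom's implementation.  The struct usecp and the
function resolve_usecp are hypothetical, fnmatch(3) stands in for the
Server-style host wildcard matching, and host names are compared as
plain strings.
.Cs
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>

struct usecp {                   /* one hypothetical $usecp entry */
    const char *host_spec;       /* e.g. "*.company.com" */
    const char *path_prefix;     /* prefix as seen on that host */
    const char *subst_prefix;    /* same files as seen on Mom's host */
};

/*
 * dest has the form "host:/path".  Returns 1 and writes the local path
 * into out if cp can be used, or 0 if scp/rcp is needed (step 3).
 */
int resolve_usecp(const char *dest, const char *myhost,
                  const struct usecp *tab, int n, char *out, size_t outlen)
{
    const char *colon = strchr(dest, ':');
    const char *path;
    char host[256];
    size_t hlen;
    int i;

    if (colon == NULL || (hlen = (size_t)(colon - dest)) >= sizeof host)
        return 0;
    memcpy(host, dest, hlen);
    host[hlen] = 0;
    path = colon + 1;

    /* Step 1: the destination host is Mom's own host. */
    if (strcmp(host, myhost) == 0 || strcmp(host, "localhost") == 0) {
        snprintf(out, outlen, "%s", path);
        return 1;
    }
    /* Step 2: the first entry whose host and path prefix both match. */
    for (i = 0; i < n; i++) {
        size_t plen = strlen(tab[i].path_prefix);

        if (fnmatch(tab[i].host_spec, host, 0) == 0 &&
            strncmp(path, tab[i].path_prefix, plen) == 0) {
            snprintf(out, outlen, "%s%s", tab[i].subst_prefix, path + plen);
            return 1;
        }
    }
    return 0;                    /* step 3: fall back to scp/rcp */
}
.Ce
Applied to the earlier example with the entry
.Ty "*.company.com:/u/wk/ /r/home/" ,
the destination
.Ty myworkstation.company.com:/u/wk/her_home/proj/123.OU
becomes the local path
.Ty /r/home/her_home/proj/123.OU .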
.IP cputmult
A
.I $cputmult
entry sets a factor used to adjust the CPU time used by a job.  This is provided
to allow adjustment of time charged and limits enforced where the job might
run on systems with different CPU performance.
If Mom's system is faster than the reference system, set cputmult to a decimal
value greater than 1.0.   If Mom's system is slower, set cputmult to a value
between 0.0 and 1.0.  The value is given by
.Cs
value = speed_of_this_system / speed_of_reference_system
.Ce
For example:
.Cs
$cputmult 1.5
.Ce
or
.Cs
$cputmult 0.75
.Ce
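.IP
As a worked illustration, the adjustment amounts to multiplying the raw
time by the factor.  The C helpers below are hypothetical; they are not
part of PBS, and they assume rounding to the nearest second.
.Cs
/* Hypothetical helpers; cputmult = this system's speed / reference speed. */
double cputmult_for(double this_speed, double ref_speed)
{
    return this_speed / ref_speed;
}

/* Scale raw seconds on Mom's host to reference-system seconds. */
long adjusted_seconds(long raw_seconds, double mult)
{
    return (long)(raw_seconds * mult + 0.5);   /* round to nearest second */
}
.Ce
With $cputmult 1.5, 100 seconds of CPU time on Mom's host would count as
150 seconds on the reference system; with 0.75, it would count as 75.
The same arithmetic would apply to wall time with $wallmult.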
.IP wallmult
A
.I $wallmult
entry sets a factor used to adjust a job's wall time usage to a
common reference system.  The factor is used for walltime calculations and
limits in the same way as cputmult is used for cpu time.
.mc |
.IP prologalarm
A $prologalarm entry sets the time-out period in seconds for the prologue and
epilogue scripts.   An alarm is set to prevent the script from locking up the 
job if the script hangs or takes a very long time to execute.   The default
value is 30 seconds.  An example:
.Cs
$prologalarm 60
.Ce
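.IP
One common way to impose such a time-out on a child process is alarm(2),
as in the following C sketch.  This is an assumption about technique,
not a description of how pbs_mom itself implements $prologalarm.  The
function run_with_alarm is hypothetical, and the script is run through
/bin/sh for simplicity.
.Cs
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run a script through /bin/sh, allowing it the given number of seconds.
 * Returns the script's exit status, or -1 on failure or time-out.
 */
int run_with_alarm(const char *script, unsigned seconds)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {                      /* child: run the script */
        alarm(seconds);                  /* pending alarm survives exec */
        execl("/bin/sh", "sh", "-c", script, (char *)NULL);
        _exit(127);                      /* exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))             /* e.g. killed by SIGALRM */
        return -1;
    return WEXITSTATUS(status);
}
.Ce
Because the alarm is set in the child before execl(), it carries over
into the script, and the default action of SIGALRM terminates the script
if it runs past the limit.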
.mc
.LP
.NH 3
Static Resources
.LP
For static resource names and values, the configuration file contains a
list of resource name/value pairs, one pair per line and separated by
white space.  An example of static resource names and values could be
the number of tape drives of each type, which could be specified by
.Cs
tape3480      4
tape3420      2
tapedat       1
tape8mm       1
.Ce
The names can be anything and are not restricted to actual hardware. 
For example the entry
.Ty "pong  1"
could be used to indicate to the Scheduler that a certain piece
of software is available on this system.
.NH 3
Shell Commands
.LP
If the first character of the value portion of a name/value pair
is the exclamation mark (!),
the entire rest of the line is saved to be executed through the services of
the \f3system\fP(3) standard library routine.
The first line of output from the shell command is returned as the response
to the resource query.   
.LP
The shell escape provides a means for the resource monitor to yield
arbitrary information to the Scheduler.  Parameter substitution is
done such that the value of any qualifier sent with the resource query,
as explained below, replaces each token consisting of a percent sign (%)
followed by the name of that qualifier.  For example, here is a
configuration file line which gives a resource name of "escape":
.Cs
escape     !echo %xxx %yyy
.Ce
If a query for "escape" is sent with no qualifiers, the command
executed would be "echo %xxx %yyy".  If one qualifier is sent,
"escape[xxx=hi there]", the command executed would be "echo hi there %yyy".
If two qualifiers are sent, "escape[xxx=hi][yyy=there]", the command
executed would be "echo hi there".  If a qualifier is sent with
no matching token in the command line, "escape[zzz=snafu]", an error
is reported.
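.LP
The substitution rule can be illustrated with the following C sketch.
It is not Mom's implementation, and it rests on these assumptions:
.IP -
The function substitute and the struct qual are hypothetical.
.IP -
The leading exclamation mark has already been removed from the command text.
.IP -
A token name consists of letters, digits, and underscores.
.LP
.Cs
#include <ctype.h>
#include <string.h>

#define MAXQUAL 16

struct qual {                      /* one [name=value] qualifier */
    const char *name;
    const char *value;
};

/* Append len bytes of s to out; fail if out would overflow. */
static int append(char *out, size_t outlen, size_t *used,
                  const char *s, size_t len)
{
    if (*used + len >= outlen)
        return -1;
    memcpy(out + *used, s, len);
    *used += len;
    out[*used] = 0;
    return 0;
}

/*
 * Copy cmd to out, replacing each %name token by the value of the
 * qualifier with that name.  Tokens without a qualifier are copied
 * unchanged.  Returns -1 if some qualifier has no token (the error
 * case) or if out is too small, else 0.
 */
int substitute(const char *cmd, const struct qual *q, int nq,
               char *out, size_t outlen)
{
    int hit[MAXQUAL] = { 0 };
    size_t used = 0;
    int i;

    if (nq > MAXQUAL || outlen == 0)
        return -1;
    out[0] = 0;
    while (*cmd) {
        const char *end = cmd + 1;
        int found = -1;

        if (*cmd == '%') {
            size_t len;

            while (isalnum((unsigned char)*end) || *end == '_')
                end++;
            len = (size_t)(end - cmd - 1);      /* length of the name */
            for (i = 0; i < nq && found < 0; i++)
                if (strlen(q[i].name) == len &&
                    strncmp(q[i].name, cmd + 1, len) == 0)
                    found = i;
        }
        if (found >= 0) {                       /* replace the token */
            hit[found] = 1;
            if (append(out, outlen, &used, q[found].value,
                       strlen(q[found].value)) < 0)
                return -1;
            cmd = end;
        } else {                                /* copy one character */
            if (append(out, outlen, &used, cmd, 1) < 0)
                return -1;
            cmd++;
        }
    }
    for (i = 0; i < nq; i++)
        if (!hit[i])
            return -1;                          /* qualifier with no token */
    return 0;
}
.Ce
Given the command "echo %xxx %yyy" and the single qualifier xxx="hi there",
substitute() produces "echo hi there %yyy".  With only zzz="snafu", it
returns -1, matching the error case above.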
.LP
Another example would allow the Scheduler to have Mom query the existence
of a file.  The following entry would be placed in Mom's config file:
.Cs
file_exists !if test -f %file; then echo yes; else echo no; fi
.Ce
Then the query string "file_exists[file=/tmp/lockout]" would return \*Qyes\*U
if the file exists and \*Qno\*U if it does not.
.LP
Another possible use of the shell command configuration entry is to provide a
means by which the use of floating software licenses may be tracked.
If a program can be written to query the license server, the number of
available licenses could be returned to tell the Scheduler if it is possible
to run a job that needs a certain licensed package.  [You get the fun and
games of writing this program.]
.LP
.NH 3
Examples of Config File
.LP
For the following examples, we will assume your site is \*QThe Widget Company\*U
and your domain name is \*Qwidget.com\*U.
The following is an example of a config file for pbs_mom where the batch
system is a single large system.  We want to log most records and specify that
the system has one 8mm tape drive.
.KS
.Cs
$logevent 0x0ff
tape8mm 1
.Ce
.KE
.LP
If the Scheduler for the large system happened to be on a front end machine,
named fe.widget.com, then you would want to allow it to access Mom, so the
config file becomes:
.KS
.Cs
$logevent 0x0ff
$clienthost fe.widget.com
tape8mm 1
.Ce
.KE
.LP
Now the center has expanded to two large systems.  The new system has two
tape drives and is 30% faster than the old system.  You wish to charge the
users the same regardless of where their job runs.
Basing the charges on the old system, you will need
to multiply the time used on the new system by 1.3 to charge the same as on
the old system.   The config file for the \*Qold\*U system stays the same.
The config file for the \*Qnew\*U system is:
.KS
.Cs
$logevent 0x0ff
$clienthost fe.widget.com
$cputmult 1.3
$wallmult 1.3
tape8mm 2
.Ce
.KE
.LP
Now you have put together a cluster of PCs running Linux named \*Qbevy\*U, as in
a bevy of PCs.  The Scheduler and Server are running on
.I bevyboss.widget.com ,
which also has the users' home file systems mounted as /u/home/...
The nodes are named
.I bevy1.widget.com ,
.I bevy2.widget.com ,
etc.
On the nodes, the users' home file systems are NFS mounted as /r/home/...
Your personal workstation, adm.widget.com, is where you plan to run
.I xpbsmon
to monitor the cluster.  The config file for each Mom would look like:
.KS
.Cs
$logevent 0x1ff
$clienthost bevyboss.widget.com
$restricted adm.widget.com
$usecp bevyboss.widget.com:/u/home /r/home
.Ce
.KE
.LP
.NH 2
.Tc "\f3Configuring the Scheduler, pbs_sched\fP"
.LP
The configuration required for a Scheduler depends on the Scheduler itself.
If you are starting with the delivered 
.I fifo
Scheduler, please jump ahead to section 4.5.1 \*QFIFO Scheduler\*U in this
guide.
.OH 'PBS Administrator Guide''Scheduling'
.EH 'Scheduling''PBS Administrator Guide'
.bp
.NH 1
.Tc \f3\s+2Scheduling Policies\s-2\fP
.LP
PBS provides a separate process, the Scheduler, to decide which jobs should be
placed into execution.  This is a flexible mechanism by which you may implement
a very wide variety of policies.  The Scheduler uses the standard PBS API
to communicate with the Server and an additional API to communicate with the 
PBS resource monitor,
.B pbs_mom .
Should the provided Schedulers be insufficient to meet your site's needs,
it is possible to implement a replacement Scheduler using the provided APIs
which will enforce the desired policies.
.LP
The first generation batch system, NQS, and many of the other batch systems
use various queue based controls to limit or schedule jobs.  Queues would
be turned on and off to control job ordering over time, or given a limit on
the number of running jobs in the queue.
.LP
While PBS supports multiple queues and the queues have some of the
\*Qjob scheduling\*U attributes used by other batch systems, the PBS Server
does not by itself run jobs or enforce any of the restrictions implied by
these queue attributes.  In fact, the Server will happily run a 
.I held
job that resides in a 
.I stopped
queue with a zero limit on running jobs, if it is directed to do so.  The
direction may come from the operator, administrator, or the Scheduler.
Indeed, the Scheduler is nothing more than a client with administration
privilege.
.LP
If you choose to implement your site scheduling policy using a multiple-queue,
queue-control based scheme, you may do so.
The Server and queue attributes used to control job scheduling may be adjusted
by a client with privilege, such as
.B qmgr (8B),
or by one of your own creation.  However, the controls actually
reside in the Scheduler, not in the Server.  The Scheduler must query the
status of the Server and queues, as well as the jobs, to determine the
settings of the Server and queue controls.
It then must use the settings of those controls in its decision making.
.LP
Another approach is the \*Qwhole pool\*U approach, wherein all jobs are
in a single pool (single queue).  The Scheduler evaluates each job on its
merits and decides which, if any, to run.   The policy can easily include
factors such as time of day, system load, size of job, etc.  Ordering of jobs
in the queue need not be considered.
The PBS team believes that this approach is superior for two reasons:
.RS
.IP 1.
Users are not tempted to lie about their requirements in order to \*Qgame\*U
the queue policy. 
.IP 2.
The scheduling can be performed against the complete set of
current jobs resulting in better fits against the available resources.
.RE
.LP
.NH 2
.Tc "\f3Scheduler \- Server Interaction\fP"
.LP
In developing a scheduling policy, it may be important to understand when
and how the Server and the Scheduler interact.  The Server always initiates the
scheduling cycle.
When scheduling is active within the Server, 
the Server opens a connection to the Scheduler and sends a
command indicating the reason for the scheduling cycle.
The reasons or events that trigger a cycle are:
.IP -
A job newly becomes eligible to execute.  The job may be a new job in an
execution queue, or a job in an execution queue that just changed state from
held or waiting to queued.  [\ \s-1SCH_SCHEDULE_NEW\s+1\ ]
.IP -
An executing job terminates.  [\ \s-1SCH_SCHEDULE_TERM\s+1\ ]
.IP -
The time interval since the prior cycle specified by the Server attribute
.At schedule_iteration
is reached.  [\ \s-1SCH_SCHEDULE_TIME\s+1\ ]
.IP -
The Server attribute 
.At scheduling
is set or reset to true.  If it is set true, even if its value was already
true, the Scheduler will be cycled.  This provides the administrator/operator
a means of forcing a scheduling cycle.  [\ \s-1SCH_SCHEDULE_CMD\s+1\ ]
.IP -
If the Scheduler was cycled and it requested one and only one job to be run,
then the Scheduler will be recycled by the Server.  This event is a bit
abstruse.   It exists to \*Qsimplify\*U a Scheduler.   The Scheduler only
need worry about choosing the one best job per cycle.   If other jobs can
also be run, it will get another chance to pick the next job.   If a
Scheduler runs no jobs, or more than one job, in a cycle, it need
not be recalled until conditions change and one of the above events triggers
the next cycle.  [\ \s-1SCH_SCHEDULE_RECYC\s+1\ ]
.IP -
If the Server recently recovered, the first scheduling cycle, resulting
from any of the above, will be indicated uniquely.
[\ \s-1SCH_SCHEDULE_FIRST\s+1\ ]
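.LP
The effect of the recycle rule can be seen in a toy C model in which the
Scheduler starts at most one job per cycle.  Everything in it is
hypothetical: the functions toy_cycle and run_cycles, and the idea of
jobs that need only CPUs.  It shows only how the Server keeps recycling
the Scheduler while each cycle starts exactly one job.
.Cs
/* Toy scheduler: start the first job that fits, at most one per cycle. */
static int toy_cycle(int *free_cpus, const int *need, int *started, int njobs)
{
    int i;

    for (i = 0; i < njobs; i++)
        if (!started[i] && need[i] <= *free_cpus) {
            started[i] = 1;
            *free_cpus -= need[i];
            return 1;                    /* one job started this cycle */
        }
    return 0;                            /* nothing could be started */
}

/* Server side: recycle the Scheduler while a cycle starts exactly one job. */
int run_cycles(int free_cpus, const int *need, int *started, int njobs)
{
    int cycles = 0, ran;

    do {
        ran = toy_cycle(&free_cpus, need, started, njobs);
        cycles++;
    } while (ran == 1);                  /* like SCH_SCHEDULE_RECYC */
    return cycles;
}
.Ce
With 4 free CPUs and jobs needing 2, 2, and 8 CPUs, run_cycles() makes
three cycles: two that each start a job, and a final one that starts none.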
.LP
Once the Server has contacted the Scheduler and sent the reason for the 
contact, the Scheduler then becomes a privileged client of the Server.
As such, it may command the Server to perform any action allowed to a
manager.  
.LP
When the Scheduler has completed all activities it wishes to perform in
this cycle, it will close the connection to the Server.  While a connection
is open, the Server will not attempt to open a new connection.
.LP
Note that the Server contacts the Scheduler to begin a scheduling cycle
only if scheduling is active in the Server.   This is controlled by the value
of the Server attribute 
.At scheduling .
If set true, scheduling is active and \*Qqstat -B\*U will show the Server Status
as Active.  If scheduling is set false, then the Server will not contact the
Scheduler and the Server's status is shown as Idle.  When started, the Server
will recover the value for
.At scheduling
as it was set when the Server shut down.   The value may be changed in two ways:
the -a option on the pbs_server command line, or by setting scheduling to 
true or false via qmgr.
.LP
One point should be clarified about job ordering:
.QP
Queues \*Qare\*U and \*Qare not\*U FIFOs.
.LP
What is meant is that while jobs are ordered first in \- first out in the
Server and in each queue, that fact does NOT imply that running them in that
order is mandated, required, or even desirable.  That is a decision left 
completely up to site policy and implementation.  The Server will maintain
the order across restarts solely as an aid to sites that wish to use a FIFO
ordering in some fashion.
.NH 2
.Tc \f3BaSL Scheduling\fP
.LP
The provided BaSL Scheduler uses a C-like procedural language to write the
scheduling policy. The language provides a number of constructs and predefined
functions that facilitate dealing with scheduling issues. Information about a
PBS Server, the queues that it owns, jobs residing on each queue, and the
computational nodes where jobs can be run are accessed via the BaSL data types
.B Server,
.B Que,
.B Job,
.B CNode,
.B "Set Server",
.B "Set Que",
.B "Set Job",
and
.B "Set CNode".
.sp
The idea is that a site must first write a function (containing the
scheduling algorithm) called
.I sched_main()
(and all functions supporting it) using BaSL constructs, and then translate the
functions into C using the BaSL compiler
.B "basl2c",
which also attaches a
main program to the resulting code.  This main program performs general
initialization and housekeeping chores such as setting up a local socket to
communicate with the Server running on the same machine, cd-ing to the priv
directory, opening log files, opening the configuration file (if any), setting up
locks, forking the child to become a daemon, initializing a scheduling cycle
(i.e., getting node attributes that are static in nature), setting up the signal
handlers, executing global initialization assignment statements specified by
the Scheduler writer, and finally sitting on a loop waiting for a scheduling
command from the Server. The name of the resulting code is
.I pbs_sched.c .
.sp
When the Server sends the Scheduler an appropriate
scheduling command
{\ \s-1SCH_SCHEDULE_NEW\s+1\ , \ \s-1SCH_SCHEDULE_TERM\s+1\ , \ \s-1SCH_SCHEDULE_TIME\s+1\ , \ \s-1SCH_SCHEDULE_RECYC\s+1\ , \ \s-1SCH_SCHEDULE_CMD\s+1\ , \ \s-1SCH_SCHEDULE_FIRST\s+1\ },
the Scheduler wakes up and obtains information about Server(s), jobs, queues, and execution host(s),
and then it calls
.I sched_main().
The list of Servers, execution hosts, and host queries to send to the hosts'
Moms are specified in the Scheduler configuration file. 
.sp
Global variables defined in the BaSL program will retain their values
in between scheduling cycles while locally-defined variables do not.
.LP
.NH 2
.Tc \f3Tcl Based Scheduling\fP
.LP
The provided Tcl based Scheduler framework uses the basic Tcl interpreter
with some extra commands for communicating with the PBS Server and Resource
Monitor.  The scheduling policy
is defined by a script written in Tcl.
A number of sample scripts are provided in the source directory
.I src/scheduler.tcl/sample_scripts .
.LP
The Tcl based Scheduler works, very generally, in the following way:
.IP 1.
On start up, the Scheduler reads the initialization script (if specified with
the -i option)
and executes it.  Then, the body script is read into memory.  This is the
file that will be executed each time a \*Qschedule\*U command is received
from the Server.
It then waits for a \*Qschedule\*U command from the Server.
.IP 2.
When a schedule command is received, the body script is executed.  No
special processing is done for the script except to provide a connection
to the Server.  A typical script will need to retrieve information for
candidate jobs to run from the Server using
.B pbsselstat
or
.B pbsstatjob .
Other information from the Resource Monitor(s) will need to be retrieved
by opening connections with
.B openrm
and submitting queries with
.B addreq
and getting the results with
.B getreq .
The Resource Monitor connections must be closed explicitly with
.B closerm
or the Scheduler will eventually run out of file descriptors.
When a decision is made to run a job, a call to
.B pbsrunjob
must be made.
.IP 3.
When the script evaluation is complete, the Scheduler will close the TCP/IP
connection to the Server.
.NH 3
Tcl Based Scheduling Advice
.LP
The Scheduler does not restart the Tcl interpreter for each cycle.
This gives the ability to carry information from one cycle to the next.
It can also cause problems if variables that are not expected to carry
information from an earlier cycle are not initialized or "unset"
at the beginning of the script.
.LP
System load average is frequently used by a script.  This number
is obtained from the system kernel by pbs_mom.  Most systems smooth the
load average number over a time period.  If one scheduling cycle runs one or
more jobs and the next scheduling cycle occurs quickly, the impact of the
newly run jobs will likely not be reflected in the load average.
The Scheduler may then start more jobs than the system can handle, causing
the load average to shoot up, especially when first starting
the batch system.  Also, when jobs terminate, the delay in lowering the
load average may delay the scheduling of additional jobs.
.LP
The Scheduler redirects the output from \*Qstdout\*U and \*Qstderr\*U to
a file.  This makes it easy to generate debug output to check what your
script is doing.  It is advisable to use this feature heavily until you
are fairly sure that your script is working well.
.NH 3
Implementing a Tcl Scheduler
.LP
The best advice is to study the examples found in src/scheduler.tcl/sample_scripts.
Then, once you have modified or written a scheduler body script and optionally an
initialization script, place them in the directory {PBS_HOME}/sched_priv and
invoke the Scheduler by typing
.Cs
{sbindir}/pbs_sched [-b body_script] [-i init_script]
.Ce
See the pbs_sched_tcl(8) man page for more information.
.NH 2
.Tc "\f3C Based Scheduling\fP"
.LP
The C based Scheduler is similar in structure and operation to the Tcl
Scheduler except that C functions are used rather than Tcl scripts.  
.IP 1.
On start up, the Scheduler calls
.I "schedinit(argc, argv)"
one time only
to initialize whatever is required to be initialized.
.IP 2.
When a schedule command is received, the function
.I "schedule(cmd, connector)"
is invoked.  All scheduling activities occur within that function.
.IP 3.
Upon return to the main loop, the connection to the Server is closed.
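.LP
A skeleton of these two entry points might look like the following C
sketch.  The parameter lists follow the names given above.  The return
values, the static counter, and the helper cycles_seen are assumptions
for illustration; consult the provided samples for the real conventions.
.Cs
/*
 * Skeleton of the two entry points.  The PBS framework supplies main();
 * the return values used here are assumptions.
 */
static int cycle_count;        /* keeps its value between cycles */

int schedinit(int argc, char *argv[])
{
    (void)argc;
    (void)argv;
    cycle_count = 0;           /* read configuration, set defaults, ... */
    return 0;
}

int schedule(int cmd, int connector)
{
    (void)cmd;                 /* reason code, e.g. SCH_SCHEDULE_NEW */
    (void)connector;           /* open connection to the Server */
    cycle_count++;
    /*
     * Here a real scheduler would ask the Server (through connector)
     * for job and queue status, choose jobs, and request that they run.
     * Returning from this function ends the cycle.
     */
    return 0;
}

/* Hypothetical helper for testing: cycles seen since schedinit(). */
int cycles_seen(void)
{
    return cycle_count;
}
.Ce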
.LP
Several working Scheduler code examples are provided in the samples
subdirectory.
The following sections discuss some of the sample schedulers, including
the default scheduler, fifo.
The sources for the samples are found in 
.I src/scheduler.cc/samples
under the Scheduler type name, for example 
.I src/scheduler.cc/samples/fifo .
.NH 3
.Tc "FIFO Scheduler"
.LP
This Scheduler will provide several simple scheduling policies.  It provides the
ability to sort the jobs in several different ways, in addition to FIFO order.
There is also the ability to sort on user and group priority.  Mainly this
Scheduler is intended to be a jumping off point for a real Scheduler to be
written.  A good amount of code has been written to make it easier to change
and add to this Scheduler.  Check the IDS for a more detailed view of the
code.
.LP
As distributed, the fifo Scheduler is configured with the following
options, which are set in the file
.Ty PBS_HOME/sched_priv/sched_config :
.IP -
All jobs in a queue will be considered for execution before the next queue
is examined.
.IP -
The queues are sorted by queue priority.
.IP -
The jobs within each queue are sorted by requested CPU time (cput).
The shortest job is placed first.
.IP -
Jobs which have been queued for more than a day will be considered starving
and heroic measures will be taken to attempt to run them.
.IP -
Any queue whose name starts with \*Qded\*U is treated as a dedicated time queue.
Jobs in that queue will only be considered for execution if the system is in
dedicated time as specified in the 
.Ty dedicated_time
configuration file.  If the system is in dedicated time, jobs not in a \*Qded\*U
queue will not be considered.
(See file
.Ty PBS_HOME/sched_priv/dedicated_time )
.IP -
Prime time is from 4:00 AM to 5:30 PM.  Any holiday is considered non-prime.
Standard federal holidays for the year 1998 are included.
(See file
.Ty PBS_HOME/sched_priv/holidays)
.IP -
A sample
.Ty dedicated_time
and resource group file are also included.
.IP -
These system resources are checked to make sure they are not exceeded: 
.I mem
(memory requested) and 
.I ncpus
(number of CPUs requested).
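.LP
The prime-time rule above can be expressed as a small C check.  This is
a sketch under the following assumptions:
.IP -
The function is_prime_time is hypothetical.
.IP -
The caller determines from the holidays file whether the day is a holiday.
.IP -
5:30 PM is treated as the first non-prime minute.
.IP -
The day of the week is not considered.
.LP
.Cs
#include <time.h>

/*
 * Returns 1 if the local time in tm is prime time (4:00 AM up to, but
 * not including, 5:30 PM) and 0 otherwise.  is_holiday is supplied by
 * the caller; any holiday is non-prime.
 */
int is_prime_time(const struct tm *tm, int is_holiday)
{
    int minutes = tm->tm_hour * 60 + tm->tm_min;   /* minutes since midnight */

    if (is_holiday)
        return 0;
    return minutes >= 4 * 60 && minutes < 17 * 60 + 30;
}
.Ce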
.LP
.NH 4
Installing the FIFO Scheduler
.IP 1. 
As discussed in the build overview, run configure with the following options:
.Ty --set-sched=c
and
.Ty --set-sched-code=fifo ,
which are the defaults.
.IP 2. 
You may wish to read through the src/scheduler.cc/samples/fifo/config.h file.  
Most default values will be fine.
.IP 3. 
Build and install PBS.
.IP 4. 
Change directory into 
.Ty PBS_HOME/sched_priv
and edit the scheduling policy config file
.Ty sched_config ,
or use the default values.  
This file controls the scheduling policy (which jobs are run when).
The default name of
.Ty sched_config
may be changed in config.h.
The format of the sched_config file is: 
.IP
.ft 3
name: value [prime | non_prime | all]
.ft 1
.IP
name and value may not contain any white space
.br
value can be:  true | false | number | string 
.br
any line starting with a '#' is a comment. 
.br
a blank third word is equivalent to \*Qall\*U which is both prime and non-prime
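.IP
A line in this format could be split into its parts as in the following
C sketch.  The function parse_sched_line and the enum period are
hypothetical; they are not the fifo Scheduler's own parser.
.Cs
#include <stdio.h>
#include <string.h>

enum period { PRIME, NON_PRIME, ALL };

/*
 * Parse one "name: value [prime | non_prime | all]" line.  name and
 * value must each have room for 64 bytes.  Returns 1 for an entry,
 * 0 for a comment or blank line, and -1 for a malformed line.
 */
int parse_sched_line(const char *line, char *name, char *value,
                     enum period *when)
{
    char first, third[16];
    int n;

    if (sscanf(line, " %c", &first) != 1 || first == '#')
        return 0;
    n = sscanf(line, " %63[^: ]: %63s %15s", name, value, third);
    if (n < 2)
        return -1;
    if (n == 2 || strcmp(third, "all") == 0)
        *when = ALL;                /* blank third word means "all" */
    else if (strcmp(third, "prime") == 0)
        *when = PRIME;
    else if (strcmp(third, "non_prime") == 0)
        *when = NON_PRIME;
    else
        return -1;
    return 1;
}
.Ce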
.IP 
The default values, as shipped, are shown in braces {}:
.RS
.IP round_robin
.br
boolean: If true \- run jobs one from each queue in a circular fashion; if
false \- run as many jobs as possible up to queue/server limits from one
queue before processing the next queue.
The following server and queue attributes, if set, control whether a job
\*Qcan be\*U run:
.At resources_max ,
.At max_running ,
.At max_user_run ,
and
.At max_group_run .
See the man pages pbs_server_attributes and pbs_queue_attributes.
.br
{false all}

.IP by_queue
.br
boolean: If true \- the jobs will be run from their queues; if false \- the
entire job pool in the Server is looked at as one large queue.
.br
{true all}

.IP strict_fifo
.br
boolean: If true \- will run jobs in a strict FIFO order.
This means if a job fails to run for any reason, no
more jobs will run from that queue/server that 
scheduling cycle.  If 
.I strict_fifo 
is not set, large jobs can be starved, i.e., not allowed to run because a 
never ending series of small jobs use the available resources.   Also see the
server attribute 
.At resources_max
in section 3.5.1, and the fifo parameter
.I help_starving_jobs
below.
.br
{false all}

.IP fair_share
.br
boolean: This will turn on the fair share algorithm.  It will also turn on 
usage collecting and jobs will be selected using a function of their usage
and priority (shares).
.br
{false all}

.IP load_balancing
boolean: If this is set the Scheduler will load balance the jobs between a 
list of time-shared hosts (:ts) obtained from the Server (pbs_server).
The Server reads the list from its nodes file, see section 3.2.
.br
{false all}

.IP help_starving_jobs
boolean: This bit will have the Scheduler turn on its rudimentary starving jobs
support.  Once jobs have waited for the amount of time given by
.Ty starve_max ,
they are considered starving.  If a job is considered starving, then no jobs
will run until the starving job can be run.
.Ty starve_max
must also be set.

.IP sort_by 
string: how the jobs are sorted.  sort_by can be set to a single
sort type or 
.I multi_sort.  
If set to 
.I multi_sort, 
multiple 
.I key 
fields are used.  Each 
.I key 
field will be a key for the multi sort.  The order of the key fields decides 
which sort type is used first.

Sorts: no_sort, shortest_job_first, longest_job_first, 
smallest_memory_first, largest_memory_first, high_priority_first,
low_priority_first, multi_sort, fair_share, large_walltime_first, 
short_walltime_first
.br
{shortest_job_first}
.RS
.IP no_sort 
do not sort the jobs
.IP shortest_job_first
ascending by the cput attribute
.IP longest_job_first
descending by the cput attribute
.IP smallest_memory_first
ascending by the mem attribute
.IP largest_memory_first
descending by the mem attribute
.IP high_priority_first
descending by the job priority attribute
.IP low_priority_first
ascending by the job priority attribute
.IP large_walltime_first
descending by job walltime attribute
.IP short_walltime_first
ascending by job walltime attribute
.IP multi_sort
sort on multiple keys.  
.IP fair_share
If
.Ty fair_share
is given as the sort key, the jobs are sorted based on the values in the
resource group file.  This is only used if strict priority sorting is needed.
.RE
.IP key
Sort type as defined above for multiple sorts.   Each sorting key is listed
on a separate line starting with the word
.I key .
For example:
.Cs 
sort_by: multi_sort
key: shortest_job_first
key: smallest_memory_first
key: high_priority_first
.Ce
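.IP
A multi-key sort is an ordinary comparison function that consults each
key in turn and moves to the next key only on a tie.  The following C
sketch implements the three keys of the example with qsort(3).  The
struct job and its fields are hypothetical stand-ins for the job
attributes the fifo Scheduler reads.
.Cs
#include <stdlib.h>

struct job {               /* only the attributes the keys below use */
    long cput;             /* requested CPU seconds */
    long mem;              /* requested memory, e.g. in KB */
    int  priority;         /* job priority attribute */
};

static int cmp_long(long a, long b) { return (a > b) - (a < b); }

/* Keys: shortest_job_first, smallest_memory_first, high_priority_first. */
static int multi_cmp(const void *pa, const void *pb)
{
    const struct job *a = pa, *b = pb;
    int c;

    if ((c = cmp_long(a->cput, b->cput)) != 0)      /* ascending cput */
        return c;
    if ((c = cmp_long(a->mem, b->mem)) != 0)        /* ascending mem */
        return c;
    return cmp_long(b->priority, a->priority);      /* descending priority */
}

void sort_jobs(struct job *jobs, size_t n)
{
    qsort(jobs, n, sizeof jobs[0], multi_cmp);
}
.Ce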
.IP log_filter
The event types not to log.  The value is the sum of the values of the event
classes to be filtered; because each class is a separate bit, this is the same
as ORing them together.
The numbers are defined in 
.I src/include/log.h.
NOTE: those numbers are in hex and log_filter is in base 10.
.br
{256}

Examples:
.nf
To filter PBSEVENT_DEBUG2, PBSEVENT_DEBUG and PBSEVENT_ADMIN:
    0x100 (256) + 0x080 (128) + 0x004 (4) = 388
log_filter: 388

To filter PBSEVENT_JOB, PBSEVENT_DEBUG and PBSEVENT_SCHED:
    0x008 (8) + 0x080 (128) + 0x040 (64) = 200
log_filter: 200
.fi

.IP dedicated_prefix
Queues whose names begin with this prefix are considered dedicated queues.
For example, if the dedicated prefix is "ded",
then dedicated, ded1, ded5, etc. would all be dedicated queues.
.br
{ded}

.IP max_starve
The amount of time a job may wait before it is considered starving.  This
config variable is not used if
.Ty help_starving_jobs
is not set.

.LP
The following options have no effect unless fair share is turned on (it is
off by default).

.IP half_life
The half-life of the fair share usage, that is, the time after which
accumulated usage has decayed to half its value.
.br
{24:00:00}

.IP unknown_shares
The number of shares for the "unknown" group.
.br
{10}
.IP sync_time
The amount of time between writing the fair share usage data to disk.
.br
{1:00:00}
.RE
.IP
The policy set by the supplied values in sched_config is:
.br
Jobs are run on the basis of queue priority, both in prime and non-prime time.
.br
Jobs within each queue are sorted smallest memory first.
.br
Help for starving jobs takes effect once a job has been waiting for 24 hours.

.IP 5.
If fair share or strict priority is going to be used, the resource group file
.Ty {PBS_HOME}/sched_priv/resources_group
will need to be edited.  A sample file is installed.
When editing the file, use the following format for each line of the file:
.Cs
# comment
username cresgrp resgrp shares
.Ce
.RS
.IP username 
string: the name of the user or group
.IP cresgrp
numeric: an ID for the user or group; it should be unique for each entry.
For users, the UID works well.
.IP resgrp
string: the name of the parent resource group this user or group belongs to.
The root of the entire tree is called
.Ty root
and is added automatically to the tree by the Scheduler.
.IP shares
numeric: the number of shares (priority) the user or group has within its
parent resource group.
.RE

.IP 6.
For strict priority, a fair share tree is needed, but a very simple one
suffices: every user's resgrp is root, and the number of shares is the
user's priority.  Next, set
.Ty unknown_shares
to 1.  Everyone who is not in the tree then shares that single share, which
ensures that everyone in the tree has priority over them.  Lastly, the
main sort (sort_by) must be set to
.I fair_share .
This sorts jobs by the fair share tree just set up.

.IP 7. 
Create the holidays file to handle prime time and holidays.  The holidays file
should use the UNICOS 8 holiday format, and the order of its entries matters.
Any line that begins with a "*" is considered a comment.
.RS
.IP YEAR\ YYYY
This is the current year.  
.IP <day>\ <prime>\ <nonprime>
<day> is one of weekday, saturday, or sunday.
.br
<prime> and <nonprime> are the times at which prime and non-prime time
start.  Each is either a time of the form HHMM (with no colon) or the word
"all" or "none".
.IP <day>\ <date>\ <holiday>
<day> is the day of the year, between 1 and 365.
.br
<date> is the calendar date, e.g. Jan 1.
.br
<holiday> is the name of the holiday, e.g. New Year's Day.
.br
This line is repeated for each company holiday.
.RE
.IP 8.
To load balance between timesharing nodes, three things are needed.
First, a nodes file must be set up as {PBS_HOME}/server_priv/nodes
(see section 3.2).  All timesharing nodes must be marked by appending
:ts to the hostname; these are the nodes between which the
Scheduler will load balance.  Second, a Mom must run on every node, and
each Mom's config file must define two static values: one for the ideal
load and the other for the maximum load.
This is done by putting two lines in the config file, each in the format
"name value".  The names are
.I ideal_load
and
.I max_load ,
and the values are floating point numbers.
Lastly, turn on the load_balancing option in the scheduling policy config file.
When load balancing is in effect, the job comment is updated when the job
is run to show where it was run.
Example of a Mom config file for a 64 processor machine:
.Cs
ideal_load 50
max_load 64
.Ce
.mc |
Note that the $ideal_load and $max_load directives, discussed under Mom's
config file, also create the corresponding ideal_load and max_load entries.
.mc

.IP 9.
Space sharing is done automatically if a nodes file exists and the
job requests nodes.  Make sure to set resources_default.nodes and
resources_default.nodect.
.KS
.IP 10.
The Scheduler honors the following attributes/node resources:
.TS
;
l | l | l.
Source Object	Attribute/Resource	Comparison
_
Queue	started	equal true
Queue	queue_type	equal execution
Queue	max_running	ge #jobs running
Queue	max_user_run	ge #jobs running for a user
Queue	max_group_run	ge #jobs running for a group 
Job	job_state	equal Queued
Server	max_running	ge #jobs running
Server	max_user_run	ge #jobs running for a user
Server	max_group_run	ge #jobs running for a group
Server	resources_available	ge resources requested by job
Server	resources_max	ge resources requested
Node	loadave	less than configured limit
Node	arch	equal type requested
Node	host	equal name requested
Node	ncpus	ge number ncpus requested
Node	physmem	ge amount mem requested
.TE

NOTE: if resources_available.res is set, it is used; if not,
resources_max.res is used.  If neither is set, infinity is
assumed.
.KE
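.LP
The NOTE above describes which limit a job's request is compared against.
The following Python sketch illustrates that fallback rule only; it is not
the Scheduler's code, and the dictionaries and the names effective_limit
and fits are hypothetical stand-ins for the server attributes.
.Cs
```python
import math


def effective_limit(server, res):
    # Prefer resources_available.<res>; fall back to resources_max.<res>.
    for attr in ("resources_available", "resources_max"):
        value = server.get(attr, {}).get(res)
        if value is not None:
            return value
    # Neither attribute is set: treat the resource as unlimited.
    return math.inf


def fits(server, request):
    # The job is eligible only if every limit is "ge" the requested amount.
    return all(amount <= effective_limit(server, res)
               for res, amount in request.items())


# Hypothetical server attributes (plain dictionaries, not a PBS API).
server = {
    "resources_available": {"ncpus": 64},
    "resources_max": {"ncpus": 128, "mem": 16384},
}
print(effective_limit(server, "ncpus"))  # 64: resources_available wins
print(effective_limit(server, "mem"))    # 16384: falls back to resources_max
print(effective_limit(server, "nodes"))  # inf: neither is set
print(fits(server, {"ncpus": 32, "mem": 8192}))  # True
```
.Ce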
.LP
.NH 4
Example FIFO Configuration Files
.LP
The following are only examples and may differ from the files shipped with PBS.
.LP
.LP

.UL "Example of a scheduling config file"

.nf
.ft 2
# 	Set the boolean values which define how the scheduling policy finds 
#	the next job to consider to run.
.ft 5
round_robin: False	ALL
by_queue: True		prime
by_queue: false		non-prime
strict_fifo: true	ALL
fair_share: True	prime
fair_share: false	non-prime

.ft 2
# help jobs which have been waiting too long 
.ft 5
help_starving_jobs: true	prime
help_starving_jobs: false	non-prime

.ft 2
# Set a multi_sort
# This example will sort jobs first by ascending cpu time requested, and then
# by ascending memory requested, and then finally by descending job priority
#
.ft 5
sort_by: multi_sort
key: shortest_job_first
key: smallest_memory_first
key: high_priority_first

.ft 2
# Set the debug level to only show high level messages.
# Currently this only shows jobs being run
.ft 5
debug_level: high_mess

.ft 2
# a job is considered starving if it has waited for this long
.ft 5
max_starve:	24:00:00

.ft 2
# If the Scheduler encounters a user who is not currently in the resource
# group tree, the user is added to the "unknown" group.  The "unknown" group
# is in root's resource group.  This sets how many shares it gets.
.ft 5
unknown_shares: 10

.ft 2
# The usage information needs to be written to disk in case the Scheduler 
# goes down for any reason.  This is the amount of time between when the 
# usage information in memory is written to disk.  The example syncs the 
# information every hour.
.ft 5
sync_time: 1:00:00

.ft 2
# Which events not to log.  The event numbers are defined in
# src/include/log.h.  NOTE: the numbers are in hex, and log_filter is in
# base 10.
# This example filters out DEBUG2 events (0x100 = 256), the most prolific.
.ft 5
log_filter: 256
.fi
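.LP
With multi_sort, as in the example above, jobs are ordered by the first key,
and each later key is consulted only to break ties on the earlier ones.
The following Python sketch shows that ordering for the three keys used in
the example; the Job record, its fields, and multi_sort_key are hypothetical,
and the Scheduler's own comparison code may differ in detail.
.Cs
```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record holding the requested amounts.
    name: str
    cput: int      # requested CPU time, in seconds
    mem: int       # requested memory, in kilobytes
    priority: int  # job priority attribute


def multi_sort_key(job):
    # key: shortest_job_first     -> ascending cput
    # key: smallest_memory_first  -> ascending mem
    # key: high_priority_first    -> descending priority (negated)
    return (job.cput, job.mem, -job.priority)


jobs = [
    Job("a", 3600, 512, 0),
    Job("b", 600, 1024, 5),
    Job("c", 600, 256, 1),
    Job("d", 600, 256, 9),
]
print([job.name for job in sorted(jobs, key=multi_sort_key)])
# ['d', 'c', 'b', 'a']
```
.Ce
Negating the priority turns the descending high_priority_first key into an
ascending one, so a single tuple key covers all three sort directions.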
.LP
.LP

.UL "Here is an example of the holidays file"

.nf
.ft 2
* the current year
.ft 5
YEAR    1997

.ft 2
*
* Start and end of prime time
*
*               Prime   Non-Prime
* Day           Start   Start
.ft 5
.TS
;
l6 l l.
weekday	0400	1130
saturday	none	all
sunday	none	all
.TE

.ft 2
* 
* The holidays
*
* Day of        Calendar        Company
* Year          Date            Holiday
*
.ft 5
.TS
;
l12 l12 l.
1	Jan 1	New Year's Day
20	Jan 20	Martin Luther King Day
48	Feb 17	President's Day
146	May 26	Memorial Day
185	Jul 4	Independence Day
244	Sep 1	Labor Day
286	Oct 13	Columbus Day
315	Nov 11	Veteran's Day
331	Nov 27	Thanksgiving
359	Dec 25	Christmas Day
.TE
.fi
.ft 1
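.LP
As an illustration of the holidays file format, the following Python sketch
parses a small file and decides whether a given moment is prime time.  The
function names are hypothetical, and three points are assumptions rather than
documented Scheduler behavior: holidays are treated as non-prime all day,
prime time ends when non-prime time starts, and a non-prime value of "none"
means prime time lasts until midnight.
.Cs
```python
from datetime import datetime

SAMPLE = """
    * a tiny holidays file in the format described above
    YEAR 1998
    weekday   0400  1130
    saturday  none  all
    sunday    none  all
    1    Jan 1    New Year's Day
    359  Dec 25   Christmas Day
"""


def parse_holidays(text):
    year, days, holidays = None, {}, set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):   # blank line or comment
            continue
        fields = line.split()
        if fields[0] == "YEAR":
            year = int(fields[1])
        elif fields[0] in ("weekday", "saturday", "sunday"):
            days[fields[0]] = (fields[1], fields[2])   # (prime, nonprime)
        else:
            holidays.add(int(fields[0]))               # day of the year
    return year, days, holidays


def is_prime_time(when, year, days, holidays):
    # Assumption: a listed holiday is non-prime time all day.
    if when.year == year and when.timetuple().tm_yday in holidays:
        return False
    kind = {5: "saturday", 6: "sunday"}.get(when.weekday(), "weekday")
    prime, nonprime = days[kind]
    if prime == "all":
        return True
    if prime == "none":
        return False
    hhmm = when.hour * 100 + when.minute
    if nonprime == "none":      # assumption: prime time runs to midnight
        return hhmm >= int(prime)
    if nonprime == "all":
        return False
    # Assumption: prime time ends when non-prime time starts.
    return int(prime) <= hhmm < int(nonprime)


year, days, holidays = parse_holidays(SAMPLE)
print(is_prime_time(datetime(1998, 1, 1, 9, 0), year, days, holidays))   # False (holiday)
print(is_prime_time(datetime(1998, 1, 2, 9, 0), year, days, holidays))   # True
print(is_prime_time(datetime(1998, 1, 2, 12, 0), year, days, holidays))  # False
print(is_prime_time(datetime(1998, 1, 3, 9, 0), year, days, holidays))   # False (Saturday)
```
.Ce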
.LP
.LP

.UL "Example of the resource group file for fair share"

.nf
.ft 2
#
# The groups "root" and "unknown" are added by the Scheduler.
# Each parent group must be listed before its children, which is why all
# the groups are listed first.  The users' cresgrp numbers are their UIDs.
#

# name		cresgrp		parent resgrp	shares

.ft 5
.TS
;
l10 l10 l14 l.
grp1	50	root	10
grp2	51	root	20
grp3	52	root	10
grp4	53	grp1	20
grp5	54	grp1	10
grp6	55	grp2	20
usr1	60	root	5
usr2	61	grp1	10
usr3	62	grp2	10
usr4	63	grp6	10
usr5	64	grp6	10
usr6	65	grp6	20
usr7	66	grp3	10
usr8	67	grp4	10
usr9	68	grp4	10
usr10	69	grp5	10
.TE
.fi
.ft 1
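.LP
One common way to read a share tree like the one above is that each entry
receives its shares divided by the total shares of its siblings, scaled by
its parent's slice.  The following Python sketch computes those slices for a
subset of the example, including the "unknown" group the Scheduler adds
under root; share_fractions is a hypothetical name, and the sketch ignores
the usage history (decayed by half_life) that fair share scheduling also
tracks.
.Cs
```python
from fractions import Fraction


def share_fractions(entries, unknown_shares=10):
    # entries: (name, cresgrp, parent_resgrp, shares) tuples, one per line
    # of the resource group file.
    children, shares = {}, {}
    for name, _cresgrp, parent, n in entries:
        children.setdefault(parent, []).append(name)
        shares[name] = n
    # The Scheduler adds "unknown" under root automatically.
    children.setdefault("root", []).append("unknown")
    shares["unknown"] = unknown_shares

    fractions = {"root": Fraction(1)}

    def walk(node):
        kids = children.get(node, [])
        total = sum(shares[k] for k in kids)
        for k in kids:
            # A child's slice is its share of its siblings' total,
            # scaled by the parent's own slice.
            fractions[k] = fractions[node] * Fraction(shares[k], total)
            walk(k)

    walk("root")
    return fractions


entries = [
    ("grp1", 50, "root", 10),
    ("grp2", 51, "root", 20),
    ("usr1", 60, "root", 5),
    ("usr2", 61, "grp1", 10),
    ("usr3", 62, "grp2", 10),
]
f = share_fractions(entries)
for name in ("usr1", "usr2", "usr3", "unknown"):
    print(name, f[name])
```
.Ce
With unknown_shares at 10, root's children hold 10 + 20 + 5 + 10 = 45
shares, so usr1 receives 5/45 = 1/9, and usr2 inherits all of grp1's 10/45 = 2/9.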
.LP
.LP

.UL "Example of strict priority resource group file"

.nf
.ft 2
# This is a strict priority resource group file.  These are people who
# should get priority over everyone else.  The number of shares is the
# priority of the user.
.ft 5
.fi

.TS
;
l10 l10 l14 l.
sally	1000	root	4
larry	1001	root	6
manager	1010	root	100
vp	1016	root	500
ceo	2000	root	10000
.TE
.ft 1
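.LP
With every user directly under root and no usage considered, the strict
priority file above reduces to ordering jobs by their owner's shares.  The
following Python sketch shows that effect only; strict_priority_order is a
hypothetical name, and users missing from the file are ranked as if they held
unknown_shares (1, as recommended above).
.Cs
```python
def strict_priority_order(jobs, shares, unknown_shares=1):
    # jobs: (job_id, owner) pairs in submission order.
    # Owners missing from the file rank as if they had unknown_shares.
    # sorted() is stable, so jobs with equal priority keep their order.
    return sorted(jobs, key=lambda job: shares.get(job[1], unknown_shares),
                  reverse=True)


shares = {"sally": 4, "larry": 6, "manager": 100, "vp": 500, "ceo": 10000}
jobs = [("1", "sally"), ("2", "guest"), ("3", "ceo"), ("4", "larry"),
        ("5", "vp")]
print([job_id for job_id, _ in strict_priority_order(jobs, shares)])
# ['3', '5', '4', '1', '2']
```
.Ce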
.LP
.LP

.UL "Example of dedicated file"

.nf
.ft 2
# Format: 
#	FROM		 	       TO
# MM/DD/YYYY HH:MM        MM/DD/YYYY HH:MM
.ft 5
.TS
;
l6 l6 l6 l.
04/10/1998	15:30	04/11/1998	23:50
05/15/1998	05:15	05/15/1998	08:30
06/10/1998	23:25	06/10/1998	23:50
.TE
.fi
.ft 1
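.LP
The following Python sketch parses the dedicated file format shown above and
checks whether a job starting at a given time with a given walltime would run
into a dedicated window.  The function names are hypothetical; treating each
window as half-open, and using such a check to hold jobs back, are assumptions
about how the Scheduler applies this file.
.Cs
```python
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %H:%M"

SAMPLE = """
    # FROM               TO
    04/10/1998 15:30    04/11/1998 23:50
    05/15/1998 05:15    05/15/1998 08:30
    06/10/1998 23:25    06/10/1998 23:50
"""


def parse_dedicated(text):
    windows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()
        start = datetime.strptime(f[0] + " " + f[1], FMT)
        end = datetime.strptime(f[2] + " " + f[3], FMT)
        windows.append((start, end))
    return windows


def overlaps_dedicated(start, walltime, windows):
    # Treat both the job and each window as half-open [start, end).
    end = start + walltime
    return any(start < w_end and w_start < end for w_start, w_end in windows)


windows = parse_dedicated(SAMPLE)
print(overlaps_dedicated(datetime(1998, 4, 10, 14, 0), timedelta(hours=1), windows))  # False
print(overlaps_dedicated(datetime(1998, 4, 10, 15, 0), timedelta(hours=1), windows))  # True
```
.Ce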
.LP
.NH 3
.Tc "IBM_SP Scheduler"
.LP
This is a highly optimized scheduler for the IBM SP series of supercomputers.
This scheduler was the first to provide a "dynamic backfill" algorithm for
the SP.  The algorithm is designed to implement a usage policy comparable to
the one used on the traditional vector supercomputers at NAS.
The algorithm's primary goals are to minimize the turnaround time for small
jobs during Prime-Time hours, and to maintain the highest possible node
utilization during NonPrime-Time hours.  Scheduling a diverse workload composed
of interactive, small debugging, and long batch jobs presents significant
difficulties on the SP, due to its limited resource management capabilities
and its parallel job scheduling restrictions (space-sharing only, no
time-sharing).  The space-sharing algorithm uses a dynamic backfilling method
to overcome these SP limitations.  It achieves turnaround times of 10 to 20
minutes for small jobs while keeping node utilization around 75%.  See the
whitepaper included in the scheduler.cc/samples/ibm_sp directory for a full
discussion of the algorithms used.
.NH 4
Installing the IBM_SP Scheduler
.LP
.IP 1.
As discussed in the build overview, run configure with the following options:
.Cs
--set-sched=cc   --set-sched-code=ibm_sp
.Ce
.IP 2.
Review src/scheduler.cc/samples/ibm_sp/sched_globals.h, editing any variables
as necessary, such as the value of SCHED_DEFAULT_CONFIGURATION.
.IP 3.
Build and install PBS.
.IP 4.
Change directory into {PBS_HOME}/sched_priv and edit the scheduler
configuration file "config" (described in the next section). This file controls the
scheduling policy used to determine which jobs are run and when.
The comments in the config file explain what each option is for. If in
doubt, the default option is generally acceptable.
.LP
.NH 4
Configuring the IBM_SP Scheduler
.LP
The ibm_sp scheduler config file contains the following tunable parameters,
which control the policy implemented by the scheduler.  Comments are allowed
anywhere in the file, and begin with a '#' character.  Any non-comment lines
are considered to be statements, and must conform to the syntax:
.Cs
<option> <argument>
.Ce
Arguments must be one of:
.IP <boolean> 12
A boolean value: either 0 (false/off) or 1 (true/on).
.IP <domain>
A registered domain name, e.g. "mrj.com".
.IP <hostname>
A hostname registered in the DNS system.
.IP <integer>
An integral (typically non-negative) decimal value.
.IP <pathname>
A valid pathname (e.g. "/usr/local/pbs/pbs_acctdir").
.IP <real>
A real valued number (e.g. 0.80).
.IP <string>
An uninterpreted string passed to other programs.
.IP <time_spec>
A string of the form HH:MM:SS (e.g. 00:30:00).
.LP
Below is a listing of the available configuration parameters for this
scheduler, with a brief explanation of each.  See the README and the
actual "config" files for a detailed description.
.KS
\s-1
.TS
;
l l l.
Parameter	Type	Definition
_

DEFAULT_ATTR	<string>	Define default node attribute
ENFORCE_ALLOC	<boolean>	Indicate enforcement of allocations
ENFORCE_DEDTIME	<boolean>	Indicate enforcement of dedicated time
LOCAL_DOMAIN	<domain>	Local network domain name
LOWUSAGE_NODEINUSE	<integer>	Threshold where we start to ignore "policy"
MAXJOB_RUNNING	<integer>	Maximum number of jobs allowed per user
MAXJOB_WALLTIME	<integer>	Maximum walltime (seconds) that a job is allowed
		to run in the 'normal' queue.  If the request exceeds it, the job is deleted.
MAX_QUEUED_TIME	<integer>	Seconds to wait before delaying other jobs
MIN_QUEUED_TIME	<integer>	Seconds a short job should remain in the queue.
NODEUSAGE_DECAY	<real>	Decay factor of node/hour usage
NONPRIME_AVAIL	<integer>	Define Non-Prime node high availability
NONPRIME_BATCH_START	<time_spec>	Define start of the NonPrime-Time Batch only period
NONPRIME_BATCH_STOP	<time_spec>	Define end of the NonPrime-Time Batch only period
NONPRIME_SAT_START	<time_spec>	Special case for the interactive period on Saturday
NONPRIME_SAT_STOP	<time_spec>	Special case for the interactive period on Saturday
OVERALLOC_DECAY	<real>	Decay factor for jobs over allocation.
PBS_HOST	<string>	Name of the system (i.e., the whole SP)
PBS_HOST_UPPER	<string>	Upper case version of PBS_HOST
PBS_SERVER	<hostname>	Hostname where PBS server is running
PEER_ENABLE	<boolean>	Enable MetaCenter PEER checking -- for PeerScheduler
PERCENT_TO_LETGO	<integer>	Threshold for % of time shift required for a job to be scheduled.
PRIME_32_END	<time_spec>	End of <32 node window
PRIME_32_START	<time_spec>	Jobs <32 nodes can start during prime
PRIME_AVAIL	<integer>	Define Prime node high availability
PRIME_NODE	<integer>	Define Prime Time Node size Threshold
PRIME_TIME_END	<time_spec>	Define end of the Prime-Time period
PRIME_TIME_START	<time_spec>	Define start of the Prime-Time period
QUEUE_DEDTIME	<pathname>	Name of "dedicated time" queue
QUEUE_PBS	<pathname>	Name of primary/default queue
QUEUE_SPECIAL	<pathname>	Name of "special" queue
RESMON_HOST	<hostname>	Hostname where PBS mom/resmom is running
SCHEDULE_DOWNTIME	<pathname>	Location of 'schedule' command for scheduled downtime
SCHED_ACCT_DIR	<pathname>	Location of the per-group allocation and usage files
SCHED_DEBUGGING	<pathname>	Location of the scheduler debugging config file
SCHED_DECAY	<pathname>	Location of the scheduler usage decay file
SCHED_MAPFILE	<pathname>	Location of the user mapfile
SCHED_OUTPUT	<pathname>	Location of the scheduler output file
SCHED_STATUS	<pathname>	Location of the scheduler status file
SCHED_TIMEOUT	<integer>	Seconds to wait before timing out a connection
SEEK_WORK_DELAY	<integer>	Seconds to wait before contacting a PEER
SHIFT_NODELIMIT	<integer>	Node watermark limit for the dynamic backfilling
SMALL_QUEUED_TIME	<time_spec>	Threshold to separate a long job from a short job.
TYPE_AVAIL	<integer>	Flag to maintain availability for a specific node request
TYPE_NODEAVAIL	<string>	Node request to maintain highly available
USE_SITE_MAPFILE	<boolean>	Indicate use of Username Mapfile
WALLTIME0	<time_spec>	Maximum walltime constants for over-allocation jobs
WALLTIME1	<time_spec>	Walltime limit constants for normal jobs
WALLTIME2	<time_spec>	Walltime limit constants for normal jobs
WALLTIME5	<time_spec>	Maximum walltime constants for over-allocation jobs
_
.TE
\s+1
.KE
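.LP
The following Python sketch reads statements in the <option> <argument>
syntax and converts each argument according to its type.  PARAM_TYPES is a
hypothetical subset of the table above, and stripping trailing comments is an
assumption (the text only says that comments begin with '#'); it would break a
<string> argument that itself contains '#'.
.Cs
```python
# Hypothetical subset of the parameter table above: option -> argument type.
PARAM_TYPES = {
    "ENFORCE_ALLOC": "boolean",
    "MAXJOB_RUNNING": "integer",
    "NODEUSAGE_DECAY": "real",
    "PRIME_TIME_START": "time_spec",
    "PBS_SERVER": "hostname",
}


def parse_value(kind, arg):
    if kind == "boolean":
        # Only 0 (false/off) and 1 (true/on) are valid here.
        if arg not in ("0", "1"):
            raise ValueError(f"expected 0 or 1, got {arg!r}")
        return arg == "1"
    if kind == "integer":
        return int(arg)
    if kind == "real":
        return float(arg)
    if kind == "time_spec":
        # HH:MM:SS converted to seconds.
        hours, minutes, seconds = (int(x) for x in arg.split(":"))
        return hours * 3600 + minutes * 60 + seconds
    # <domain>, <hostname>, <pathname> and <string> are kept as text.
    return arg


def parse_config(text, types=PARAM_TYPES):
    settings = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Assumption: everything from '#' to end of line is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected <option> <argument>")
        option, arg = parts[0], parts[1].strip()
        if option not in types:
            raise ValueError(f"line {lineno}: unknown option {option}")
        settings[option] = parse_value(types[option], arg)
    return settings


sample = """
# sample ibm_sp scheduler config
ENFORCE_ALLOC     1
MAXJOB_RUNNING    4
NODEUSAGE_DECAY   0.80
PRIME_TIME_START  08:00:00   # 8 a.m.
PBS_SERVER        sp-frontend
"""
print(parse_config(sample))
```
.Ce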
.LP
.NH 3
.Tc "SGI_Origin Scheduler"
.LP
This is a highly specialized scheduler for managing a cluster of SGI
Origin2000 systems, providing integrated support for Array Services (for
MPI programs), and NODEMASK (to pin applications via software to dynamically
created regions of nodes within the system). The scheduling algorithm
includes an implementation of static backfill and dynamically calculates
NODEMASKs on a per-job basis. (See the README file in the
scheduler.cc/samples/sgi_origin directory for details of the algorithm.)
.NH 4
Installing the SGI_ORIGIN Scheduler
.LP
.IP 1. 3
As discussed in the build overview, run configure with the following options:
.Cs
--set-sched=cc   --set-sched-code=sgi_origin
.Ce
If you wish to enable scheduler use of the NODEMASK facility, then also
add the configure option
.Ty  --enable-nodemask .
.IP 2.
Review src/scheduler.cc/samples/sgi_origin/toolkit.h, editing any variables
as necessary, such as the value of SCHED_DEFAULT_CONFIGURATION.
.IP 3. 
Build and install PBS.
.IP 4.
Change directory into {PBS_HOME}/sched_priv and edit the scheduler
configuration file "config" (see 4.4.3.2). This file controls the
scheduling policy used to determine which jobs are run and when.
The comments in the config file explain what each option is for.
If in doubt, the default option is generally acceptable.
.NH 4
Configuring the SGI_Origin Scheduler
.LP
The {PBS_HOME}/sched_priv/config file contains the following tunable parameters,
which control the policy implemented by the scheduler.  Comments are allowed
anywhere in the file, and begin with a '#' character.  Any non-comment lines
are considered to be statements, and must conform to the syntax:
.Cs
<option> <argument>
.Ce
See the README and config files for a description of the options listed below,
and the type of argument expected for each of the options.  Arguments must
be one of:
.IP <boolean>
A boolean value.  The strings "true", "yes", "on" and
"1" are all true, anything else evaluates to false.
.IP <hostname>
A hostname registered in the DNS system.
.IP <integer>
An integral (typically non-negative) decimal value.
.IP <pathname>
A valid pathname (e.g. "/usr/local/pbs/pbs_acctdir").
.IP <queue_spec>
The name of a PBS queue: either 'queue@exechost' or just 'queue'.  If the
hostname is not specified, it defaults to the name of the local host machine.
.IP <real>
A real valued number (e.g. 0.80).
.IP <string>
An uninterpreted string passed to other programs.
.IP <time_spec>
A string of the form HH:MM:SS (e.g. 00:30:00 for
thirty minutes, 4:00:00 for four hours).
.IP <variance>
Negative and positive deviation from a value.  The syntax is '-mm%,+nn%'
(e.g. '-10%,+15%' for minus 10 percent and plus 15 percent from some value).
.LP
Syntactical errors in the configuration file are caught by the parser, and
the offending line number and/or configuration option/argument is noted in
the scheduler logs.  The scheduler will not start while there are syntax
errors in its configuration files.
.LP
Before starting up, the scheduler attempts to find common errors in the
configuration files.  If it discovers a problem, it will note it in the
logs (possibly suggesting a fix) and exit.
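.LP
As an illustration of the kind of checks described above, the following
Python sketch validates the <boolean> and <variance> argument types.  The
function names are hypothetical; case-sensitive matching of boolean words and
reading the percentages as relative to the target value are assumptions, not
documented behavior.
.Cs
```python
TRUE_WORDS = ("true", "yes", "on", "1")


def parse_boolean(arg):
    # "true", "yes", "on" and "1" are true; anything else is false.
    # Assumption: matching is case-sensitive.
    return arg in TRUE_WORDS


def parse_variance(arg):
    # '-mm%,+nn%' -> (mm/100, nn/100), e.g. '-10%,+15%' -> (0.1, 0.15)
    minus, sep, plus = arg.partition(",")
    ok = (sep and minus.startswith("-") and minus.endswith("%")
          and plus.startswith("+") and plus.endswith("%"))
    if not ok:
        raise ValueError(f"bad <variance>: {arg!r}")
    return int(minus[1:-1]) / 100, int(plus[1:-1]) / 100


def within_variance(value, target, variance):
    # Assumption: the percentages are relative to the target value.
    below, above = variance
    return target * (1 - below) <= value <= target * (1 + above)


print(parse_boolean("yes"), parse_boolean("off"))        # True False
v = parse_variance("-10%,+15%")
print(v)                                                 # (0.1, 0.15)
print(within_variance(85, 90, v), within_variance(80, 90, v))  # True False
```
.Ce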
.LP
The following is a complete list of the recognized options:
\s-1
.TS
;
l l.
Parameter	Type
_
AVOID_FRAGMENTATION	<boolean>
BATCH_QUEUES	<queue_spec>[,<queue_spec>...]
DECAY_FACTOR	<real>
DEDICATED_QUEUE	<queue_spec>
DEDICATED_TIME_CACHE_SECS	<integer>
DEDICATED_TIME_COMMAND	<pathname>
ENFORCE_ALLOCATION	<boolean>
ENFORCE_DEDICATED_TIME	<boolean>
ENFORCE_PRIME_TIME	<boolean>
EXTERNAL_QUEUES	<queue_spec>[,<queue_spec>...]
FAKE_MACHINE_MULT	<integer>
HIGH_SYSTIME	<integer>
INTERACTIVE_LONG_WAIT	<time_spec>
MAX_DEDICATED_JOBS	<integer>
MAX_JOBS	<integer>
MAX_QUEUED_TIME	<time_spec>
MAX_USER_RUN_JOBS	<integer>
MIN_JOBS	<integer>
NONPRIME_DRAIN_SYS	<boolean>
OA_DECAY_FACTOR	<real>
PRIME_TIME_END	<time_spec>
PRIME_TIME_SMALL_NODE_LIMIT	<integer>
PRIME_TIME_SMALL_WALLT_LIMIT	<time_spec>
PRIME_TIME_START	<time_spec>
PRIME_TIME_WALLT_LIMIT	<time_spec>
SCHED_ACCT_DIR	<pathname>
SCHED_HOST	<hostname>
SCHED_RESTART_ACTION	<string>
SERVER_HOST	<hostname>
SMALL_JOB_MAX	<integer>
SMALL_QUEUED_TIME	<time_spec>
SORT_BY_PAST_USAGE	<boolean>
SPECIAL_QUEUE	<queue_spec>
SUBMIT_QUEUE	<queue_spec>
SYSTEM_NAME	<hostname>
TARGET_LOAD_PCT	<integer>
TARGET_LOAD_VARIANCE	<variance>
TEST_ONLY	<boolean>
WALLT_LIMIT_LARGE_JOB	<time_spec>
WALLT_LIMIT_SMALL_JOB	<time_spec>
_
.TE
\s+1
See the following files for a detailed explanation of these options:
.br
src/scheduler.cc/samples/sgi_origin/README
.br
src/scheduler.cc/samples/sgi_origin/config
.LP
.NH 3
.Tc "CRAY T3E Scheduler"
.LP
This is a highly specialized scheduler for the Cray T3E MPP system.
The supporting code of this scheduler (configuration file parser,
reading of external files, limits specification, etc.) is based on
the previously discussed SGI Origin scheduler (see section 4.4.3 above).
.LP
The scheduling algorithm is an implementation of a priority-based
system wherein jobs inherit an initial priority from the queue
that they are first submitted to, and then the priority is adjusted
based on a variety of factors. These factors include such variables
as: length of time in queue, time of day, length of time requested,
number of nodes and/or amount of memory requested, etc. (See the README
file in the scheduler.cc/samples/cray_t3e directory for details of the
algorithm and configuration options.)
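.LP
The following Python sketch illustrates the general shape of such a scheme:
an initial priority inherited from the queue, then adjusted by waiting time,
requested walltime, and, during prime time, job size.  Every queue name,
weight, and function here is hypothetical and chosen only for illustration;
the actual factors and weights are described in the cray_t3e README.
.Cs
```python
from dataclasses import dataclass

# Hypothetical initial priorities, inherited from the submission queue.
QUEUE_PRIORITY = {"debug": 100, "batch": 50, "background": 0}


@dataclass
class Job:
    queue: str
    wait_secs: int       # time spent in the queue so far
    walltime_secs: int   # walltime requested
    nodes: int           # nodes (PEs) requested


def job_priority(job, prime_time):
    prio = QUEUE_PRIORITY[job.queue]       # initial priority from the queue
    prio += job.wait_secs // 3600          # +1 per hour waited
    prio -= job.walltime_secs // 3600      # -1 per hour of walltime requested
    if prime_time:
        prio -= job.nodes // 32            # favor small jobs during prime time
    return prio


big = Job("batch", wait_secs=7200, walltime_secs=14400, nodes=64)
small = Job("debug", wait_secs=0, walltime_secs=1800, nodes=8)
print(job_priority(big, prime_time=True))    # 46
print(job_priority(big, prime_time=False))   # 48
print(job_priority(small, prime_time=True))  # 100
```
.Ce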
.NH 4
Installing the CRAY_T3E Scheduler
.LP
.IP 1. 3
As discussed in the build overview, run configure with the following options:
.Cs
--set-sched=cc   --set-sched-code=cray_t3e
.Ce
If you wish to enable scheduler use of the PEMASK facility, then also
add the configure option
.Ty  --enable-pemask .
.IP 2.
Review src/scheduler.cc/samples/cray_t3e/toolkit.h, editing any variables
as necessary, such as the value of SCHED_DEFAULT_CONFIGURATION.
.IP 3. 
Build and install PBS.
.IP 4.
Change directory into {PBS_HOME}/sched_priv and edit the scheduler
configuration file "config" (described in the next section). This file controls the
scheduling policy used to determine which jobs are run and when.
The comments in the configuration file explain what each option is.
If in doubt, the default option is generally acceptable.
.NH 4
Configuring the Cray T3E Scheduler
.LP
The {PBS_HOME}/sched_priv/config file contains the following tunable parameters,
which control the policy implemented by the scheduler.  Comments are allowed
anywhere in the file, and begin with a '#' character.  Any non-comment lines
are considered to be statements, and must conform to the syntax:
.Cs
<option> <argument>
.Ce
See the README and config files for a description of the options listed below,
and the type of argument expected for each of the options.  Arguments must
be one of:
.IP <boolean>
A boolean value.  The strings "true", "yes", "on" and
"1" are all true, anything else evaluates to false.
.IP <hostname>
A hostname registered in the DNS system.
.IP <integer>
An integral (typically non-negative) decimal value.
.IP <pathname>
A valid pathname (e.g. "/usr/local/pbs/pbs_acctdir").
.IP <queue_spec>
The name of a PBS queue: either 'queue@exechost' or just 'queue'.  If the
hostname is not specified, it defaults to the name of the local host machine.
.IP <real>
A real valued number (e.g. 0.80).
.IP <string>
An uninterpreted string passed to other programs.
.IP <time_spec>
A string of the form HH:MM:SS (e.g. 00:30:00 for
thirty minutes, 4:00:00 for four hours).
.IP <variance>
Negative and positive deviation from a value.  The syntax is '-mm%,+nn%'
(e.g. '-10%,+15%' for minus 10 percent and plus 15 percent from some value).
.LP
Syntactical errors in the configuration file are caught by the parser, and
the offending line number and/or configuration option/argument is noted in
the scheduler logs.  The scheduler will not start while there are syntax
errors in its configuration files.
.LP
Before starting up, the scheduler attempts to find common errors in the
configuration files.  If it discovers a problem, it will note it in the
logs (possibly suggesting a fix) and exit.
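.LP
The following Python sketch parses <queue_spec> values, including the
comma-separated lists accepted by options such as BATCH_QUEUES.  The function
names are hypothetical, and socket.gethostname() stands in for however the
Scheduler determines the local host name (it may return a short or a fully
qualified name depending on the system).
.Cs
```python
import socket


def parse_queue_spec(spec, default_host=None):
    # 'queue@exechost' or just 'queue'; a missing host defaults to the
    # local host name.
    queue, sep, host = spec.strip().partition("@")
    if not queue or (sep and not host):
        raise ValueError(f"bad <queue_spec>: {spec!r}")
    if not sep:
        host = default_host or socket.gethostname()
    return queue, host


def parse_queue_list(arg, default_host=None):
    # <queue_spec>[,<queue_spec>...], as used by BATCH_QUEUES.
    return [parse_queue_spec(s, default_host) for s in arg.split(",")]


print(parse_queue_list("batch@t3e,debug", default_host="t3e"))
# [('batch', 't3e'), ('debug', 't3e')]
```
.Ce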
.LP
The following is a complete list of the recognized options:
\s-1
.TS
;
l l.
Parameter	Type
_
AVOID_FRAGMENTATION	<boolean>
BACKGROUND_QUEUE_NAME	<string>
BATCH_QUEUES	<queue_spec>[,<queue_spec>...]
CHALLENGE_QUEUE_NAME	<string>
DECAY_FACTOR	<real>
DEDICATED_QUEUES	<queue_spec>
DEDICATED_TIME_CACHE_SECS	<integer>
DEDICATED_TIME_COMMAND	<pathname>
ENFORCE_ALLOCATION	<boolean>
ENFORCE_DEDICATED_TIME	<boolean>
ENFORCE_PRIME_TIME	<boolean>
EXTERNAL_QUEUES	<queue_spec>[,<queue_spec>...]
FAKE_MACHINE_MULT	<integer>
INTERACTIVE_LONG_WAIT	<time_spec>
MAX_JOBS	<integer>
MAX_QUEUED_TIME	<time_spec>
MIN_JOBS	<integer>
NONPRIME_DRAIN_SYS	<boolean>
OA_DECAY_FACTOR	<real>
PRIME_TIME_END	<time_spec>
PRIME_TIME_SMALL_NODE_LIMIT	<integer>
PRIME_TIME_SMALL_WALLT_LIMIT	<time_spec>
PRIME_TIME_START	<time_spec>
PRIME_TIME_WALLT_LIMIT	<time_spec>
SCHED_ACCT_DIR	<pathname>
SCHED_HOST	<hostname>
SCHED_RESTART_ACTION	<string>
SERVER_HOST	<hostname>
SMALL_JOB_MAX	<integer>
SMALL_QUEUED_TIME	<time_spec>
SORT_BY_PAST_USAGE	<boolean>
SORTED_JOB_FILE	<pathname>
SPECIAL_QUEUE	<queue_spec>
SUBMIT_QUEUE	<queue_spec>
SYSTEM_NAME	<hostname>
TARGET_LOAD_PCT	<integer>
TARGET_LOAD_VARIANCE	<variance>
TEST_ONLY	<boolean>
WALLT_LIMIT_LARGE_JOB	<time_spec>
WALLT_LIMIT_SMALL_JOB	<time_spec>
_
.TE
\s+1
See the following files for a detailed explanation of these options:
.br
src/scheduler.cc/samples/cray_t3e/README
.br
src/scheduler.cc/samples/cray_t3e/config
.NH 3
.Tc "MULTITASK Scheduler"
.LP
This scheduler provides support for "multi-tasking" (i.e. timesharing of CPU
and memory resources).  Originally written for the SGI PowerChallenge, and
later ported to the Origin 2000, this scheduler should work for most
shared-memory multiprocessor (SMP) systems.
.NH 4
Installing the MULTITASK Scheduler
.LP
.IP 1. 3
As discussed in the build overview, run configure with the following options:
.Cs
--set-sched=cc   --set-sched-code=multitask
.Ce
.IP 2.
Review src/scheduler.cc/samples/multitask/toolkit.h editing any variables
necessary, such as the value of SCHED_DEFAULT_CONFIGURATION.
.IP 3.
Build and install PBS.
.IP 4.
Change directory into {PBS_HOME}/sched_priv and edit the scheduler
configuration file "config". This file controls the scheduling policy
used to determine which jobs are run and when. The comments in the
config file explain what each option is for. If in doubt, the default
option is generally acceptable.
.LP
.NH 2
.Tc "\f3Scheduling and File Staging\fP"
.LP
A decision must be made about when
to begin staging in files for a job.  The files must be available before the
job executes, but the time required to copy them is unknown to PBS, since it
depends on file size and network speed.
If file in-staging is not started until the job has been selected to run
because the other required resources are available, either those resources
are \*Qwasted\*U while the stage-in occurs, or another job is started that
takes the resources away from the first job and might prevent it from running.
If the files are staged in well before the job is otherwise ready to run,
they may take up valuable disk space needed by running jobs.
.LP
PBS provides two ways to initiate file in-staging for a job.
If a run request is received for a job that requires staging in files,
the stage-in operation is begun, and the job is run when it completes.
Alternatively, a specific stage-in request may be received for a job (see
pbs_stagein(3B)), in which case the files are staged in but the job is not run.
When the job is later run, it begins execution immediately because the files
are already there.
.LP
In either case, if the files cannot be staged in for any reason, the job
is placed into a wait state with an \*Qexecute at\*U time
.B PBS_STAGEFAIL_WAIT
(30 minutes) in the future.  A mail message is sent to the job owner requesting
that the owner look into the problem.  The job is put into the wait state to
prevent the Scheduler from repeatedly retrying a job that would likely keep
on failing.
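.LP
The following Python sketch illustrates why the wait state avoids tight retry
loops.  The job dictionary, STAGEFAIL_WAIT, and the function names are
hypothetical stand-ins; mail to the owner is not modeled, and the actual state
transitions are handled by PBS itself rather than by code like this.
.Cs
```python
from datetime import datetime, timedelta

# 30 minutes, as stated above for PBS_STAGEFAIL_WAIT.
STAGEFAIL_WAIT = timedelta(minutes=30)


def on_stagein_failure(job, now):
    # Put the job into the wait state ("W") with an "execute at" time
    # STAGEFAIL_WAIT in the future, so it is not retried immediately.
    job["state"] = "W"
    job["execution_time"] = now + STAGEFAIL_WAIT
    return job


def may_run(job, now):
    # A waiting job becomes a candidate again once its time has passed.
    return job["state"] != "W" or now >= job["execution_time"]


now = datetime(1998, 4, 10, 12, 0)
job = on_stagein_failure({"id": "123.server", "state": "Q"}, now)
print(may_run(job, now + timedelta(minutes=29)))  # False
print(may_run(job, now + timedelta(minutes=30)))  # True
```
.Ce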
.LP
Figure 5.0 in appendix B of the ERS shows the (sub)state changes for a job
involving file in-staging.  The Scheduler may note the substate of the job and
choose to perform pre-staging via the pbs_stagein() call.
The substate will also indicate completion or failure of the operation.
The Scheduler developer should carefully choose a stage-in approach based
on factors such as the likely source of the files, network speed, and
disk capacity.

.LP
.OH 'PBS Administrator Guide''GUI'
.EH 'GUI''PBS Administrator Guide'
.bp
.NH 1
.Tc "\f3\s+2GUI System Administrator Notes\s-2\fP"
.LP
Currently, PBS provides two GUIs: xpbs and xpbsmon.
.NH 2
.Tc "xpbs"
.LP
\f3xpbs\f1 provides a user-friendly point-and-click interface to the PBS
commands.
The xpbs(1) man page provides full information on configuring and running xpbs.
Some of that information is repeated here.
To run \f3xpbs\fP as a regular, non-privileged user, type:
.Cs
setenv DISPLAY <display_host>:0
xpbs
.Ce
To run \f3xpbs\fP with the additional ability to terminate PBS Servers,
stop and start queues, or run/rerun jobs, type:
.Cs
xpbs -admin
.Ce
.LP
Running \f3xpbs\fP will initialize the X resource database from various sources
in the following order:
.IP "1."
The \f3RESOURCE_MANAGER\fP property on the root window (updated via xrdb) with
settings usually defined in the .Xdefaults file
.IP "2."
Preference settings defined by the system administrator in the global xpbsrc
file
.IP "3."
User's ~/.xpbsrc file, which defines various X resources such as fonts,
colors, the list of PBS hosts to query, criteria for listing queues and jobs,
and various view states.
See the XPBS Preferences section below for a list of resources that can be set.
.LP
The system administrator can specify a global resources file,
\f2{libdir}/xpbs/xpbsrc\f1, which is read by the GUI if a
personal \f2.xpbsrc\f1 file is missing.  Keep in mind that within an
Xresources file (Tk only), later entries take precedence. For example, suppose
in your \f2.xpbsrc\f1 file, the following entries appear in order:
.sp
xpbsrc*backgroundColor: blue
.br
*backgroundColor: green
.sp
The later entry "green" will take precedence even though the first one is more
precise and longer matching.
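.LP
The following Python sketch models only the rule stated above, that a later
matching entry overrides an earlier one regardless of how specific each is.
The function names are hypothetical and the pattern matching is deliberately
simplified; it does not reproduce Tk's full option-database rules.
.Cs
```python
def matches(pattern, resource):
    # Simplified loose binding: "prefix*name" matches any resource that
    # starts with prefix and ends with name; "*name" matches any prefix.
    prefix, star, suffix = pattern.partition("*")
    if not star:
        return pattern == resource
    return resource.startswith(prefix) and resource.endswith(suffix)


def lookup(lines, resource):
    value = None
    for line in lines:
        pattern, _, setting = line.partition(":")
        if matches(pattern.strip(), resource):
            value = setting.strip()   # a later match overrides an earlier one
    return value


xpbsrc = ["xpbsrc*backgroundColor: blue", "*backgroundColor: green"]
print(lookup(xpbsrc, "xpbsrc.main.backgroundColor"))      # green
print(lookup(xpbsrc[:1], "xpbsrc.main.backgroundColor"))  # blue
```
.Ce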
.LP
The things that can be set in the personal preferences file are fonts, colors,
and favorite Server host(s) to query.
.NH 3
XPBS Preferences
.LP
The resources that can be set in the X resources file, ~/.xpbsrc, are:
.IP *serverHosts
list of server hosts (space separated) to query by \f3xpbs\fP.
.IP *timeoutSecs
specify the number of seconds before timing out waiting for a connection to
a PBS host.
.IP *xtermCmd
the xterm command used to run an interactive PBS session.
.IP *labelFont
font applied to text appearing in labels.
.IP *fixlabelFont
font applied to text that label fixed-width widgets such as
listbox labels. This must be a fixed-width font.
.IP *textFont
font applied to a text widget.  This should be a fixed-width
font.
.IP *backgroundColor
the color applied to background of frames, buttons,
entries, scrollbar handles.
.IP *foregroundColor
the color applied to text in any context (under selection, insertion,
etc.).
.IP *activeColor
the color applied to the background of a selection,
a selected command button, or a selected scroll bar
handle.
.IP *disabledColor
color applied to a disabled widget.
.IP *signalColor
color applied to buttons that signal something to the
user about a change of state. For example, the color of the
.At "Track Job"
button when returned output files are detected.
.IP *shadingColor
a color shading applied to some of the frames to
emphasize focus as well as decoration.
.IP *selectorColor
the color applied to the selector box of a radiobutton or
checkbutton.
.IP *selectHosts
list of hosts (space separated) to automatically select/highlight
in the HOSTS listbox.
.IP *selectQueues
list of queues (space separated) to automatically select/highlight
in the QUEUES listbox.
.IP *selectJobs
list of jobs (space separated) to automatically select/highlight
in the JOBS listbox.
.IP *selectOwners
list of owners checked when limiting the
jobs appearing on the Jobs listbox in the main \f3xpbs\fP window.
Specify value as "Owners: <list_of_owners>".
See -u option in \f3qselect(1B)\fP for format of <list_of_owners>.
.IP *selectStates
list of job states to look for (do not space separate) when
limiting the jobs appearing on the Jobs listbox in the main
\f3xpbs\fP window.
Specify value as "Job_States: <states_string>".
See -s option in \f3qselect(1B)\fP for format of <states_string>.
.IP *selectRes
list of resource amounts (space separated) to consult when
limiting the jobs appearing on the Jobs listbox in the main
\f3xpbs\fP window.
Specify value as "Resources: <res_string>".
See -l option in \f3qselect(1B)\fP for format of <res_string>.
.IP *selectExecTime
the Execution Time attribute to consult when limiting the
list of jobs appearing on the Jobs listbox in the main
\f3xpbs\fP window.
Specify value as "Queue_Time: <exec_time>".
See -a option in \f3qselect(1B)\fP for format of <exec_time>.
.IP *selectAcctName
the name of the account that will be checked when limiting the
jobs appearing on the Jobs listbox in the main \f3xpbs\fP
window.
Specify value as "Account_Name: <account_name>".
See -A option in \f3qselect(1B)\fP for format of <account_name>.
.IP *selectCheckpoint
the checkpoint attribute relationship (including the logical
operator) to consult when limiting the list of jobs
appearing on the Jobs listbox in the main \f3xpbs\fP window.
Specify value as "Checkpoint: <checkpoint_arg>".
See -c option in \f3qselect(1B)\fP for format of <checkpoint_arg>.
.IP *selectHold
the hold types string to look for in a job when limiting the
jobs appearing on the Jobs listbox in the main \f3xpbs\fP
window.
Specify value as "Hold_Types: <hold_string>".
See -h option in \f3qselect(1B)\fP for format of <hold_string>.
.IP *selectPriority
the priority relationship (including the logical operator) to
consult when limiting the list of jobs appearing on the Jobs
listbox in the main \f3xpbs\fP window.
Specify value as "Priority: <priority_value>".
See -p option in \f3qselect(1B)\fP for format of <priority_value>.
.IP *selectRerun
the rerunnable attribute to consult when limiting the list of
jobs appearing on the Jobs listbox in the main \f3xpbs\fP
window.
Specify value as "Rerunnable: <rerun_val>".
See -r option in \f3qselect(1B)\fP for format of <rerun_val>.
.IP *selectJobName
name of the job that will be checked when limiting the jobs
appearing on the Jobs listbox in the main \f3xpbs\fP window.
Specify value as "Job_Name: <jobname>".
See -N option in \f3qselect(1B)\fP for format of <jobname>.
.IP *iconizeHostsView
a boolean value (true or false) indicating whether or not to
iconize the HOSTS region.
.IP *iconizeQueuesView
a boolean value (true or false) indicating whether or not to
iconize the QUEUES region.
.IP *iconizeJobsView
a boolean value (true or false) indicating whether or not to
iconize the JOBS region.
.IP *iconizeInfoView
a boolean value (true or false) indicating whether or not to
iconize the INFO region.
.IP *jobResourceList
a curly-braced list of resource names, grouped by the architecture
types known to xpbs. The format is as follows:
.br
{ <arch-type1> resname1 resname2 ... resnameN }
.br
{ <arch-type2> resname1 resname2 ... resnameN }
.br
 . . .
.br
{ <arch-typeN> resname1 resname2 ... resnameN }
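.LP
As an illustration, a personal ~/.xpbsrc might contain entries such as the
following.  This is a sketch assuming the usual X resource syntax of
resource name, colon, value; the host names, owner, and values shown are
hypothetical and should be replaced with your own.
.Cs
*serverHosts:        hostA hostB
*timeoutSecs:        30
*selectOwners:       Owners: userA@hostA
*selectStates:       Job_States: QR
*iconizeHostsView:   true
.Ce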
.LP
.NH 3
XPBS and PBS Commands
.LP
\f3xpbs\fP calls PBS commands as follows:
.IP "\f3Command Button\fP" 22
\f3PBS Command\fP
.IP "detail (Hosts)" 22
qstat -B -f <selected server_host(s)>
.IP "terminate" 22
qterm <selected server_host(s)>
.IP "detail (Queues)" 22
qstat -Q -f <selected queue(s)>
.IP "stop" 22
qstop <selected queue(s)>
.IP "start" 22
qstart <selected queue(s)>
.IP "enable" 22
qenable <selected queue(s)>
.IP "disable" 22
qdisable <selected queue(s)>
.IP "detail (Jobs)" 22
qstat -f <selected job(s)>
.IP "modify" 22
qalter <selected job(s)>
.IP "delete" 22
qdel <selected job(s)>
.IP "hold" 22
qhold <selected job(s)>
.IP "release" 22
qrls <selected job(s)>
.IP "run" 22
qrun <selected job(s)>
.IP "rerun" 22
qrerun  <selected job(s)>
.IP "signal" 22
qsig <selected job(s)>
.IP "msg" 22
qmsg <selected job(s)>
.IP "move" 22
qmove <selected job(s)>
.IP "order" 22
qorder <selected job(s)>
.LP
.NH 2
.Tc "xpbsmon"
.LP
\f3xpbsmon\f1 is the node monitoring GUI for PBS. It graphically displays
information about execution hosts in a PBS environment. Its
view of a PBS environment consists of a list of sites where each site
runs one or more Servers, and each Server runs jobs on one or more execution
hosts (nodes).
.LP
The system administrator needs to define the sites information in a global X
resources file, \f2$PBS_LIB/xpbsmon/xpbsmonrc\f1, which is read by the GUI if
a personal \f2.xpbsmonrc\f1 file is missing. A default xpbsmonrc file is usually
created at install time, defining (under the *sitesInfo
resource) a default site name, the list of Servers that run on a site, the set
of nodes (or execution hosts) where jobs on a particular Server run, and the
list of queries that are communicated to each node's \f3pbs_mom\f1.  If node
queries have been specified, the host where xpbsmon is
running must have been given explicit permission by the \f3pbs_mom\f1 daemon to
post queries to it.  This is done by including a
.Ty $restricted
entry in Mom's config file.  See section 3.6 for more information on
the restricted entry.
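.LP
For example, if xpbsmon runs on a workstation named monitor.example.com (a
hypothetical name), a line like the following in each node's Mom
configuration file would grant it query access.  This is a sketch; check the
exact form of the \f5$restricted\fP entry against pbs_mom(8B) for your release.
.Cs
$restricted monitor.example.com
.Ce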
.LP
It is not recommended to manually update the *sitesInfo value in the xpbsmonrc
file, as its syntax is quite cumbersome. The recommended procedure is to
bring up xpbsmon, click the "Pref.." button, manipulate the widgets in
the Sites, Server, and Query Table dialog boxes, then click the "Close" button
and save the settings to a \f2.xpbsmonrc\f1 file. Then copy this file to
\f2$PBS_LIB/xpbsmon/xpbsmonrc\f1.
.OH 'PBS Administrator Guide''Operation'
.EH 'Operation''PBS Administrator Guide'
.bp
.NH 1 
.Tc "\f3\s+2Operational Issues\s-2\fP"
.LP
This chapter addresses a few of the \*Qday to day\*U operational issues
which will arise.
.NH 2
.Tc \f3Security\fP
.LP
There are three parts to security in the batch system: 
.IP "Internal security"
Can the daemons be trusted?
.IP Authentication
How do we know that a client is who it claims to be?
.IP Authorization
Is the client entitled to have the requested action performed?
.LP
.NH 3
.Tc "Internal Security"
.LP
An effort has been made to ensure that the PBS daemons themselves cannot
be a target of opportunity in an attack on the system.  The two major parts
of this effort are the security of files used by the daemons and the security
of the daemons' environment.
.LP
Any file used by PBS, especially files that specify configuration or other
programs to be run, must be secure.  The files must be owned by root and in
general must not be writable by anyone other than root.
When PBS directories are installed, the make process runs a program to validate
ownership and access to the files.  This can be rechecked at any time by 
running
.Ty check-tree
in the top level make file.
.Ty check-tree
is located in the directory given by the value of bindir in configure.
Each daemon also validates the most critical files and directories each time
it is started.
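.LP
The ownership rule above can also be spot-checked by hand.  The shell
function below is a minimal sketch, not the PBS check-tree program: the name
\f5pbs_secure_check\fP is hypothetical, and it tests only that each named
path is owned by root and is not writable by group or others, using
standard find(1) primaries.
.Cs
# pbs_secure_check PATH...: hypothetical helper, not part of PBS.
# Returns 0 only if every PATH is owned by root and is not writable by
# group or others; each offending (or missing) PATH is reported on stderr.
pbs_secure_check() {
    rc=0
    for f in "$@"; do
        # -prune keeps find from descending into directories; the path is
        # printed only if it passes all three tests.
        if [ -z "$(find "$f" -prune -user root ! -perm -020 ! -perm -002 -print)" ]; then
            echo "insecure or missing: $f" >&2
            rc=1
        fi
    done
    return $rc
}
.Ce
For example, \f5pbs_secure_check "$PBS_HOME"/mom_priv/prologue\fP, where
\f5PBS_HOME\fP has been set in the shell to your PBS home directory.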
.LP
A corrupted environment is another source of attack on a system.  To prevent
this type of attack, each daemon resets its environment when it starts.
The source of the environment is a file named by PBS_ENVIRON set by the
configure option --set-environ, defaulting to 
.Ty {PBS_HOME}/pbs_environment .
If it does not already exist, this file is created during the install
process.
As built by the install process, it will contain a very basic path and,
if found in root's environment, the following variables:
.B TZ ,
.B LANG ,
.B LC_ALL ,
.B LC_COLLATE ,
.B LC_CTYPE ,
.B LC_MONETARY ,
.B LC_NUMERIC ,
and
.B LC_TIME .
It may be edited to include the other variables required on your system.
Please note that 
.B PATH
must be included.  This value of 
.B PATH
will be passed on to batch jobs.
To maintain security, it is important that 
.B PATH
be restricted to known, safe
directories.  Do NOT include "." in 
.B PATH .
Another variable which can be dangerous and should not be set is 
.B IFS .
.LP
The syntax of a PBS_ENVIRON file entry is either
.Cs
.Ty "variable_name=value"
.Ce
or
.Cs
.Ty "variable_name"
.Ce
In the latter case, the value for the variable is obtained from the daemon's
own environment before it is reset.
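.LP
Because the PBS_ENVIRON file is passed on to every job, it can be worth
checking after each edit.  The following shell function is a sketch, not
part of PBS; the name \f5check_pbs_environ\fP is hypothetical.  It rejects a
file that sets IFS, that has no \f5PATH=value\fP entry, or whose last
\f5PATH=value\fP entry contains "." or an empty component.  A bare
\f5PATH\fP entry takes its value from the daemon's environment, so it cannot
be checked from the file alone and is not counted.
.Cs
# check_pbs_environ FILE: hypothetical helper, not part of PBS.
check_pbs_environ() {
    file=$1
    if grep -Eq '^IFS(=|$)' "$file"; then
        echo "$file: IFS must not be set" >&2
        return 1
    fi
    # Use the last PATH=value entry in the file.
    pathline=$(grep '^PATH=' "$file" | tail -n 1)
    if [ -z "$pathline" ]; then
        echo "$file: no PATH=value entry" >&2
        return 1
    fi
    # Wrap in colons so a leading or trailing empty component shows as "::";
    # both "." and an empty component mean the current directory.
    case ":${pathline#PATH=}:" in
        *:.:*|*::*)
            echo "$file: PATH includes the current directory" >&2
            return 1 ;;
    esac
    return 0
}
.Ce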
.LP
.NH 3
.Tc "Host Authentication"
.LP
PBS uses a combination of information to authenticate a host.
If a request is made from a client whose socket is bound to a privileged
port (less than 1024, which requires root privilege), PBS (right or wrong)
believes the IP (Internet Protocol) network layer as to who the host is.
If the client request is from a non-privileged port, the name of the host
making the request must be included in the credential sent
with the request, and it must match the IP network layer's opinion of the
host's identity.
.NH 3
.Tc "Host Authorization"
.LP
Access to pbs_server from another system may be controlled by an access
control list (ACL).  See section 10.1.1 of the ERS for details.
.LP
Access to pbs_mom is controlled through a list of hosts specified
in Mom's configuration file.  By default, only \*Qlocalhost\*U and the name
returned by gethostname(2) are allowed.  See the pbs_mom(8B) man page
for more information on the configuration file.
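.LP
As a sketch, a Mom configuration file entry allowing one additional host to
connect might look like the line below.  It assumes the \f5$clienthost\fP
keyword described in pbs_mom(8B); verify it against the man page for your
release.  The host name is hypothetical.
.Cs
$clienthost pbsserver.example.com
.Ce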
.LP
Access to pbs_sched is not restricted, except that requests must come from a
privileged port.
.NH 3
.Tc User Authentication
.LP
\f2Is the user who he/she claims to be?\fP
.LP
The PBS Server authenticates the user name included in a request with the
supplied PBS credential.  This credential is supplied by 
.B pbs_iff (1B);
see section 10.2 of the ERS.
.NH 3
.Tc "User Authorization"
.LP
\f2Is the user entitled to make the request of the Server under
that name?\fP
.LP
PBS as shipped assumes a consistent user name space within the set of systems
which make up a PBS cluster.  Thus if a job is submitted by
.I UserA@hostA ,
PBS will allow the job to be deleted or altered by 
.I UserA@hostB .
The routine
.I site_map_user ()
is called twice: once to map the name of the requester and again to map
the job owner to a name on the Server's (local) system.
If the two mappings agree, the requester is considered the job owner.
See section 10.1.3 of the ERS.
This behavior may be changed by a site by altering the Server routine
site_map_user() found in the file src/server/site_map_user.c,
see the Internal Design Spec.
.LP
\f2Is the user entitled to execute the job under that name?\fP
.LP
A user may supply a name under which the job is to be executed on a certain
system.  If one is not supplied, the name of the job owner is chosen to be 
the execution name.  See the 
.I "-u user_list"
option of the 
.B qsub (1B)
command.  Authorization to execute the job under the chosen
name is granted under the following conditions:
.IP 1.
The job was submitted on the Server's (local) host and the submitter's name
is the same as the selected execution name.
.IP 2.
The host from which the job was submitted is declared trusted by the
execution host in the /etc/hosts.equiv file, or the submitting host and
submitting user's name are listed in the execution user's .rhosts file.
The system supplied library function,
.I ruserok (),
is used to make these checks.
.LP
If the above are not satisfactory to a site, the routine
.I site_check_user_map ()
in the file src/server/site_check_u.c may be modified.  See the IDS for
more information.
.LP
In addition to the above checks, access to a PBS Server and queues within that
Server may be controlled by access control lists.  See sections 10.1.1 and
10.1.2 of the ERS for more information.
.NH 3
.Tc "Group Authorization"
.LP
PBS allows a user to submit jobs and specify under which group the job should
be executed.  The user specifies a 
.I "group_list"
attribute for the job which contains a list of 
.Ty groups@hosts
similar to the user list.  See the 
.At group_list
attribute under the -W option of qsub(1B).
The PBS Server will ensure that the user is a member of the specified group by 
.IP 1.
Checking if the group is the user's primary group in the password entry.
In this case the user's name does not have to appear in the group entry for his
primary group.
.IP 2.
Checking for the user's name in the specified group entry in /etc/group.
.LP
The job will be aborted if both checks fail.
The checks are skipped if the user does not supply a group list attribute.
In this case the user's primary group from the password file will be used.
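.LP
The two checks can be sketched in shell as follows.  This illustrates the
logic described above and is not the Server's code; the function name and
its optional file arguments (added so it can be tried on copies of the
password and group files) are hypothetical.  In keeping with the
requirement that every execution group have an /etc/group entry, a group
with no entry is rejected.
.Cs
# user_in_group USER GROUP [PASSWD_FILE [GROUP_FILE]]: hypothetical sketch.
user_in_group() {
    user=$1 group=$2
    passwd_file=${3:-/etc/passwd}
    group_file=${4:-/etc/group}
    # Find the group entry: name:password:gid:member,member,...
    gline=$(awk -F: -v g="$group" '$1 == g { print; exit }' "$group_file")
    [ -n "$gline" ] || return 1
    gid=$(printf '%s' "$gline" | cut -d: -f3)
    members=$(printf '%s' "$gline" | cut -d: -f4)
    # Check 1: the group is the user's primary group (field 4 of passwd).
    pgid=$(awk -F: -v u="$user" '$1 == u { print $4; exit }' "$passwd_file")
    [ -n "$pgid" ] && [ "$pgid" = "$gid" ] && return 0
    # Check 2: the user's name appears in the group's member list.
    case ",$members," in
        *",$user,"*) return 0 ;;
    esac
    return 1
}
.Ce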
.LP
When staging files in or out, PBS also uses the selected execution group
for the copy operation.  This provides normal UNIX access security to the
files.   Since all group information is passed as a string of characters,
PBS cannot determine if a numeric string is intended to be a group name or
GID.
.LP
Therefore when a group list is specified by the user, PBS places one
requirement on the groups within a system.
Each and every group in which a user might execute a job MUST have a group
name and an entry in /etc/group.  If no group lists are ever used, PBS will
use the login group and will accept it even if the group is not listed in
/etc/group.  Note that in this case, the
.At egroup
attribute value is a numeric string representing the user's gid rather than
the group \*Qname\*U.
.NH 3
.Tc "Root Owned Jobs"
.LP
The Server will reject any job which would execute under the UID of zero
unless the owner of the job, typically root on this or some other system,
is listed in the Server attribute
.At acl_roots .
.NH 2
.Tc "\f3Job Prologue/Epilogue Scripts\fP"
.LP
PBS provides the ability to run a site supplied script before and/or after
each job runs.  This provides the capability to perform initialization or
cleanup of resources, such as temporary directories or scratch files.
The scripts may also be used to write \*Qbanners\*U on the job's output files.
When multiple nodes are allocated to a job, these scripts are only run by the
\*QMother Superior\*U, the pbs_mom on the first node allocated.  This is also
where the job shell script is run.
.LP
If a prologue or epilogue script is not present,
Mom continues in a normal manner.
If present, the script is run with root privilege.
In order to be run, the script must adhere to the following rules:
.IP \(bu 3
The script must be in the \f2PBS_HOME\f5/mom_priv\f1 directory with the
name \f5prologue\fP for the script to be run before the job and the name
\f5epilogue\fP for the script to be run after the job.
.IP \(bu
The script must be owned by root.
.IP \(bu
The script must be readable and executable by root.
.IP \(bu
The script cannot be writable by anyone but root.
.LP
The script may be a shell script or an executable object file.
Typically, a shell script should start with a line of the form:
.Ty "#! interpreter" .
See the rules under execve(2) or exec(2) on your system.
.NH 3
Prologue and Epilogue Arguments
.LP 
When invoked, the prologue is called with the following arguments:
.IP argv[1] 10
is the job id.
.IP argv[2]
is the user name under which the job executes.
.IP argv[3]
is the group name under which the job executes.
.LP
The epilogue is called with the above, plus:
.IP argv[4] 10
is the job name.
.IP argv[5]
is the session id. \(dg
.IP argv[6]
is the requested resource limits (list). \(dg
.IP argv[7]
is the list of resources used.
.IP argv[8]
is the name of the queue in which the job resides. \(dg
.IP argv[9]
is the account string, if one exists.
.LP
For both the prologue and epilogue:
.IP envp 10
The environment passed to the script is null.
.IP cwd
The current working directory is the user's home directory.
.IP input 
When invoked, both scripts have standard input connected to a system
dependent file.  Currently, for all systems this file is /dev/null.
.IP output
With one exception, the standard output and standard error of the scripts
are connected to the files which contain the standard output and error
of the job.  If a job is an interactive PBS job, the standard output and error
of the epilogue are directed to /dev/null because the pseudo terminal
connection used was released by the system when the job terminated.
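.LP
As an illustration of the arguments and output rules above, the following
is a minimal sketch of an epilogue that writes a banner to the job's
standard output file.  The banner layout is hypothetical.  Because the
environment is null, the script sets its own PATH first; it uses only the
shell's built-in echo, so its exit status (which Mom only logs) is normally 0.
.Cs
#!/bin/sh
# Hypothetical epilogue sketch: install as PBS_HOME/mom_priv/epilogue,
# owned by root and writable only by root.
# The environment passed to the script is null, so set a minimal PATH.
PATH=/bin:/usr/bin
export PATH

# Arguments from Mom: 1 job id, 2 user, 3 group, 4 job name,
# 5 session id, 6 requested limits, 7 resources used, 8 queue, 9 account.
epilogue_banner() {
    echo "==================== PBS epilogue ===================="
    echo "Job id: $1"
    echo "User: $2  Group: $3"
    echo "Job name: $4"
    echo "Queue: $8"
    echo "Resources used: $7"
}

# Standard output is the job's output file (or /dev/null for an
# interactive job); the exit status is that of the last echo.
epilogue_banner "$@"
.Ce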
.NH 3
Prologue Epilogue Time Out
.LP
To prevent a bad script or error condition within the script from delaying
PBS, Mom places an alarm around the script's execution.  This is currently
set to 30 seconds.  If the alarm sounds before the script has terminated,
Mom will kill the script.  The alarm value can be changed by changing the
define of 
.B PBS_PROLOG_TIME
within src/resmom/prolog.c.
.NH 3
Prologue Error Processing
.LP
Normally, the prologue script should exit with a zero exit status.
Mom will record in her log any case of a non-zero exit from a script.
Exit status values and their impact on the job are:
.IP -4
The script timed out (took too long).  The job will be requeued.
.IP -3
The wait(2) call waiting for the script to exit returned with an error.
The job will be requeued.
.IP -2
The input file to be passed to the script could not be opened.
The job will be requeued.
.IP -1
The script has a permission error: it is not owned by root and/or is writable
by users other than root.
The job will be requeued.
.IP 0
The script was successful.   The job will run.
.IP 1
The script returned an exit value of 1; the job will be aborted.
.IP >1
The script returned a value greater than one, the job will be requeued.
.LP
The above apply to normal batch jobs.
Note that interactive-batch jobs (-I option) cannot be requeued on a non-zero
status, because the network connection back to qsub is lost and cannot be
re-established.  Interactive jobs will be aborted on any non-zero prologue
exit.
.LP
The administrator must exercise great caution in setting up the prologue to
prevent jobs from being flushed from the system.
.LP
Epilogue script exit values are logged, if non-zero, but have no impact on the
state of the job.
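.LP
The exit-status rules above suggest a pattern for prologues: exit with a
value greater than one for transient site problems, so that a normal batch
job is requeued rather than aborted.  The sketch below assumes a site
scratch area whose location, \f5SCRATCH_BASE\fP, is hypothetical; it
creates a private per-job directory and asks for a requeue (status 2) if
the scratch area is unavailable or the directory cannot be set up.
.Cs
#!/bin/sh
# Hypothetical prologue sketch: install as PBS_HOME/mom_priv/prologue,
# owned by root and writable only by root.
# The environment passed to the script is null, so set a minimal PATH.
PATH=/bin:/usr/bin
export PATH

# Assumed site scratch area; the default is illustrative only.
SCRATCH_BASE=${SCRATCH_BASE:-/scratch}

# Arguments from Mom: 1 job id, 2 user name, 3 group name.
prologue_main() {
    jobid=$1 user=$2 group=$3
    if [ ! -d "$SCRATCH_BASE" ]; then
        # An exit value greater than 1 asks Mom to requeue the job.
        echo "prologue: $SCRATCH_BASE unavailable" >&2
        return 2
    fi
    dir=$SCRATCH_BASE/$jobid
    # Create a private per-job directory; treat any failure as transient.
    mkdir -p "$dir" && chown "$user:$group" "$dir" && chmod 700 "$dir" || return 2
    return 0
}

# Mom always passes three arguments; the script's exit status is then
# the status returned by prologue_main.
if [ $# -ge 3 ]; then
    prologue_main "$@"
fi
.Ce
Remember that an interactive-batch job is aborted on any non-zero prologue
exit, including the requeue value used here.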
.NH 2
.Tc "\f3Use and Maintenance of Logs\fP"
.LP
The PBS system tends to produce lots of log file entries.
There are two types of logs, the event logs which record events within each
PBS daemon (pbs_server, pbs_mom, and pbs_sched) and the Server's accounting log.
.NH 3
The Daemon Logs
.LP
Each PBS daemon maintains an event log file.
The details of the log format are covered in section
.B "3.3.8. Event Logging"
of the ERS.
The Server (pbs_server), Scheduler (pbs_sched), and Mom (pbs_mom)
default their logs to a file with the
current date as the name in the PBS_HOME/(daemon)_logs directory.  
This location can be overridden with the "-L pathname" option; pathname
must be an absolute path.
.LP
If the default log file name is used (no -L option), the log will be closed
and reopened with the current date daily.  This happens on the first message
after midnight.  If a path is given with the -L option, the automatic
close/reopen does not take place.   All daemons will close and reopen the
same named log file on receipt of SIGHUP.  The pid of the daemon is available
in its lock file in its home directory.
Thus it is possible to move the current log file to a new name and send
SIGHUP to restart the file:
.Cs
cd \f1PBS_HOME\fP/\f1daemon\fP_logs
mv \f1current archive\fP
kill -HUP `cat ../\f1daemon\fP_priv/\f1daemon\fP.lock`
.Ce
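.LP
The same steps can be wrapped in a small shell function, sketched below
under the assumption that the lock file holds the daemon's pid on its first
line.  The function name and the date-stamped archive name are
hypothetical; any new name will do.
.Cs
# rotate_pbs_log LOGFILE LOCKFILE: hypothetical helper, not part of PBS.
# Renames LOGFILE to LOGFILE.YYYYMMDDHHMMSS, then sends SIGHUP to the daemon
# whose pid is on the first line of LOCKFILE so that it reopens LOGFILE.
rotate_pbs_log() {
    log=$1
    lock=$2
    pid=$(head -n 1 "$lock") || return 1
    [ -n "$pid" ] || return 1
    mv "$log" "$log.$(date +%Y%m%d%H%M%S)" || return 1
    kill -HUP "$pid"
}
.Ce
For a daemon started with -L, pass the path given to -L and the daemon's
lock file from its home directory.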
.LP
The amount of output in the logs depends on the selected events to log and
the presence of debug writes, turned on by compiling with -DDEBUG.
The Server and Mom can be directed to record only messages pertaining
to certain event types.  The specified events are logically \*Qor-ed\*U.
Their decimal values are:
.IP 1
Error Events
.IP 2
Batch System/Server Events
.IP 4
Administration Events
.IP 8
Job Events
.IP 16
Job Resource Usage (hex value 0x10)
.IP 32
Security Violations (hex value 0x20)
.IP 64
Scheduler Calls (hex value 0x40)
.IP 128
Debug Messages (hex value 0x80)
.IP 256
Extra Debug Messages (hex value 0x100)
.LP
Everything turned on is of course 511.  127 (everything except the two debug
classes) is a good value to use.
The event logging mask is controlled differently for the Server and Mom.
The Server's mask is set via qmgr(1B) setting the 
.At log_events
attribute.   This can be done at any time.
Mom's mask may be set via her configuration file with a
.Ty $logevent
entry, see the -c option on pbs_mom.  To change her logging mask, edit
the configuration file and send Mom a SIGHUP signal.
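.LP
Because the mask is a bitwise OR of the values listed above, it can be
computed rather than added up by hand.  The shell function below is a
sketch; the short event-class names it accepts are invented for this
example and are not PBS keywords.
.Cs
# log_mask NAME...: hypothetical helper that ORs together event classes.
log_mask() {
    mask=0
    for ev in "$@"; do
        case $ev in
            error)    bit=1 ;;
            system)   bit=2 ;;
            admin)    bit=4 ;;
            job)      bit=8 ;;
            usage)    bit=16 ;;
            security) bit=32 ;;
            sched)    bit=64 ;;
            debug)    bit=128 ;;
            debug2)   bit=256 ;;
            *) echo "log_mask: unknown event class $ev" >&2; return 1 ;;
        esac
        mask=$((mask | bit))
    done
    echo "$mask"
}
.Ce
For example, \f5log_mask error system admin job usage security sched\fP
prints 127, a value that could then be given to the Server with
\f5qmgr -c "set server log_events = 127"\fP or placed on Mom's
\f5$logevent\fP configuration line.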
.LP
The Scheduler, being site-written, may have a different method of changing
its event logging mask, or it may not have the ability at all.
.NH 3
The Accounting Log
.LP
The PBS Server daemon maintains an accounting log.  The format of the log
is described in section 
.B "3.3.9 Accounting"
of the ERS.  The log name defaults to
.Ty PBS_HOME/server_priv/accounting/yyyymmdd
where yyyymmdd is the date.  The accounting log may be placed elsewhere
by specifying the -A option on the pbs_server command line.
The option argument is the full (absolute) path name of the file to be used.
If a null string is given, for example
.Cs
pbs_server -A ""
.Ce
then the accounting log will not be opened and no accounting records will
be recorded.
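.LP
For a quick look at what the accounting log holds, records can be counted
by type.  The sketch below assumes each record has the semicolon-separated
layout \f5date time;record_type;job_id;message\fP described in the ERS;
the function name is hypothetical.
.Cs
# acct_summary FILE...: hypothetical helper that counts accounting records
# by record type (the second semicolon-separated field).
acct_summary() {
    awk -F';' 'NF >= 3 { count[$2]++ } END { for (t in count) print t, count[t] }' "$@" |
        LC_ALL=C sort
}
.Ce
For example, \f5acct_summary PBS_HOME/server_priv/accounting/19990209\fP
prints one line per record type with its count.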
.LP
The accounting file is changed according to the same rules as the log files.
If the default file is used, named for the date, the file will be closed
and a new one opened every day on the first event (write to the file)
after midnight.
With either the default file or a file named with the -A option,
the Server will close the accounting log and reopen it upon the receipt of
a SIGHUP signal.  This allows you to rename the old log and start recording
anew on an empty file.  For example, if the current date is February 9, 1999,
the Server will be writing in the file
.Ty 19990209 .
The following actions will cause
the current accounting file to be renamed
.Ty feb9 ,
and the Server to close the file and start writing a new
.Ty 19990209 .
.Cs
mv 19990209 feb9
kill -HUP 1234     # 1234 is the Server's pid
.Ce
.LP
.NH 2
.Tc "\f3Alternate Test Systems\fP"
.LP
Alternate or test copies of the various daemons may be run through the use of
the command line options which set their home directory and service port.
For example, the following commands would start the three daemons with a home
directory of 
.Ty /tmp/altpbs
and four ports starting at 13001: the Server on 13001, Mom on 13002 and 13003,
and the Scheduler on 13004.
.LP
.Cs
pbs_server -t create -d /tmp/altpbs -p 13001 -M 13002 -R 13003 -S 13004
pbs_mom -d /tmp/altpbs -M 13002 -R 13003
pbs_sched -d /tmp/altpbs -S 13004 -r script_file
.Ce
.LP
The home directories must be pre-built.  The easiest method is to
alter the 
.B PBS_HOME
variable by use of the 
.Ty --set-server-home
option to configure, then rerun configure and rebuild PBS.
.LP
Jobs may be directed to the test system by using the
.Ty server:port
syntax on the -q option.  Status is also obtained using the 
.Ty :port
syntax.
For example, to submit a job to the default queue on the above test Server,
request the status of the test Server, and request the status of jobs at
the test Server:
.Cs
qsub -q @\f1host\fP:13001 job
qstat -Bf \f1host\fP:13001
qstat @\f1host\fP:13001
.Ce
.LP
If you or your users use job dependencies on or between test systems,
there are minor problems of which you (and the users) need to be aware.
The syntax of both the dependency string,
.Ty depend_type:job_id:job_id
and the job id,
.Ty seq_number.host:port ,
uses colons in an indistinguishable manner.
The way to work around this is covered in the
.B "Advice for Users"
section at the end of this guide.
.NH 2
.Tc "\f3Installing an Updated Batch System\fP"
.LP
Once you have a running batch system, there will come a time when you wish to
update it or install a new version.
It is assumed that you will wish to build and test the new version using
the alternative directories and port numbers described above.
You may change the location of PBS_HOME for the
test version; see the configure option
.Ty --set-server-home .
Once you are satisfied with the new system, it is suggested
that you rebuild the three daemons with PBS_HOME set to the directory
which will be used in normal operation.  Otherwise you will always have to
use the -d option when starting the daemons.
.LP
When the new batch system is ready to be placed into service, you will wish to
move jobs from the old system to the new.  The following procedure is
suggested.  All Servers must be run by root.  The qmgr and qmove commands
should be run by a batch administrator (root is a good choice).
.IP 1.
With the old batch system running, disable the queues and stop scheduling
by setting \*Qscheduling=false\*U.
.IP 2.
Back up the pool of jobs in PBS_HOME(old)/server_priv/jobs.
Tar may be used for this.
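.LP
Step 2 can be done with a single tar command; the function below is a
sketch with a hypothetical name.  Give it the old PBS_HOME and an absolute
path for the archive, since it changes directory before running tar.
(Step 1 would typically be done first with
\f5qmgr -c "set server scheduling = false"\fP and qdisable for each queue.)
.Cs
# backup_pbs_jobs OLD_HOME TARFILE: hypothetical helper for step 2.
# Archives OLD_HOME/server_priv/jobs; TARFILE must be an absolute path
# because tar runs inside OLD_HOME/server_priv (in a subshell).
backup_pbs_jobs() {
    ( cd "$1/server_priv" && tar cf "$2" jobs )
}
.Ce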
.LP
Assuming the change is a minor update (change in third digit of the release
version number) or a local change where the job structure did not change
from the old version to the new, it is likely that you could start the new
system in the old HOME and all jobs would be recovered.  However if the job
structure has changed you will need to 
.I move
the jobs from the old system to the new.
The release notes will contain a warning if the job structure has changed or
the move is required for other reasons.
.LP
To move the jobs, continue with the following steps:
.IP 3.
It is likely that the new PBS_HOME was already chosen, and its directory
tree created, during testing.
If not, build a (temporary) server directory tree by changing PBS_HOME
using --set-server-home
and typing
.Cs
buildutils/pbs_mkdirs server
.Ce
while in the top of the object tree.
.IP 4.
Start the new PBS Server in its new home.  If the new home is different from
the directory it was compiled for, use the -d option.
Use the -t option if the Server has not been configured for the new directory.
Also start with an alternative port using the -p option.
Turn off attempts to schedule with the -a option:
.Cs
pbs_server -t create -d \f1new_home\fP -p 13001 -a false
.Ce
Remember, you will need to use the 
.Ty :port
syntax when commanding the new Server.
.IP 5.
Duplicate on the new Server the current queues and server attributes
(assuming you wish to
do so).  Enable each queue which will receive jobs at the new Server.
.Cs
qmgr -c "print server" > \f1/tmp/config\fP
qmgr \f1host\fP:13001  < \f1/tmp/config\fP
qenable \f1queue1\fP@\f1host\fP:13001
qenable \f1queue2\fP@\f1host\fP:13001
...
.Ce
.IP 6.
Now list the jobs at the original Server and move a few jobs one at a time
from the old to the new Server:
.Cs
qstat 
qmove \f1queue\fP@\f1host\fP:13001 \f1job\fP
qstat @\f1host\fP:13001
...
.Ce
If all is going well, move the remaining jobs a queue at a time:
.Cs
qmove \f1queue1\fP@\f1host\fP:13001 `qselect -q \f1queue1\fP`
qstat \f1queue1\fP@\f1host\fP:13001
qmove \f1queue2\fP@\f1host\fP:13001 `qselect -q \f1queue2\fP`
qstat \f1queue2\fP@\f1host\fP:13001
...
.Ce
.IP 7.
At this point, all of the jobs should be under control of the new Server and
located in the new Server's home.  If the new Server's home is a temporary
directory, shut down the new Server and move everything to the real home using
.Cs
cp -R \f1new_home real_home\fP
.Ce
or, if the real (new) home is already set up,
.Cs
cd \f1new_home\fP/server_priv/jobs
cp * \f1real_home\fP/server_priv/jobs
.Ce
to copy just the jobs.
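.LP
The queue-at-a-time moves of step 6 can be looped.  The sketch below uses
only the qselect, qmove, and qstat invocations shown above; the function
name is hypothetical, and it assumes each queue already exists and is
enabled at the new Server.
.Cs
# move_queues_to NEW_SERVER QUEUE...: hypothetical helper for step 6.
# NEW_SERVER is host:port of the new Server (e.g. host:13001).  Jobs in
# each QUEUE at the default (old) Server are moved to the queue of the
# same name at NEW_SERVER, which is then listed with qstat.
move_queues_to() {
    new=$1
    shift
    for q in "$@"; do
        ids=$(qselect -q "$q") || return 1
        if [ -n "$ids" ]; then
            # $ids is deliberately unquoted: one argument per job id.
            qmove "$q@$new" $ids || return 1
        fi
        qstat "$q@$new"
    done
}
.Ce
For example, \f5move_queues_to host:13001 queue1 queue2\fP.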
.LP
At this point, you are ready to bring up and enable the new batch system.
.LP
You should be aware of one quirk when using qmove.   If you wish to move a
job from a Server running on a test port to the Server running on the
normal port (15001), you may attempt, 
.I unsuccessfully ,
to use the following command:
.Cs
qmove \f1queue@host\fP 123.job.\f1host\fP:13001
.Ce
However, that will only move the job to the end of the queue it is already in.
The Server receiving the move request (13001) will compare the destination
server name, host, with its own name only, not including the port.   Hence
it will match and it will not send the job where you intended.  To get the
job to move to the Server running on the normal port you have to specify
that port in the destination:
.Cs
qmove \f1queue@host\fP:15001 123.job.\f1host\fP:13001
.Ce
.LP
.NH 2
.Tc \f3Problem Solving\fP
.LP
The following is a very incomplete list of possible problems and how to
solve them.
.NH 3
.Tc "Clients Unable to Contact Server"
.LP
If a client command (qstat, qmgr, ...) is unable to connect to a Server,
there are several possibilities to check.  If the error return is
15034, \*QNo server to connect to\*U, check (1) that there is indeed a
Server running and (2) that the default server information is set correctly.
The client commands will attempt to connect to the Server specified on the
command line if given, or if not given, the Server specified in the \*Qdefault
server file\*U specified when the commands were built and installed.
.LP
If the error return is 15007, \*QNo permission\*U, check for (2) as above.
Also check that the executable
.I pbs_iff
is located in the search path for the client and that it is setuid root.
Additionally, try running pbs_iff by typing:
.Cs
pbs_iff server_host 15001
.Ce
where
.Ty server_host
is the name of the host on which the Server is running and 
.Ty 15001
is the port to which the Server is listening (if built with a different
port number, use that number instead of 15001).  pbs_iff should print
out a string of garbage characters and exit with a status of 0.
The garbage is the encrypted credential which would be used by the command
to authenticate the client to the Server.  If pbs_iff fails to print the
garbage and/or exits with a non-zero status, either the Server is not running
or was built with a different encryption system than was pbs_iff.
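.LP
The checks for error 15007 can be collected into one function.  This is a
sketch with a hypothetical name: it looks for pbs_iff on the search path,
warns if the file found is not setuid root, runs it against the given host
and port (15001 by default), and treats any output with a zero exit status
as success.  It cannot tell a Server that is down from an encryption
mismatch.
.Cs
# diagnose_pbs_iff SERVER_HOST [PORT]: hypothetical helper, not part of PBS.
diagnose_pbs_iff() {
    host=$1
    port=${2:-15001}
    iff=$(command -v pbs_iff) || { echo "pbs_iff is not in the search path" >&2; return 1; }
    # pbs_iff must be owned by root with the setuid bit (mode 4000) set.
    if [ -f "$iff" ] && [ -z "$(find "$iff" -prune -user root -perm -4000 -print)" ]; then
        echo "warning: $iff is not setuid root" >&2
    fi
    # Expect the encrypted credential on standard output and exit status 0.
    if out=$(pbs_iff "$host" "$port") && [ -n "$out" ]; then
        echo "pbs_iff OK"
        return 0
    fi
    echo "pbs_iff failed: Server not running or different encryption" >&2
    return 1
}
.Ce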
.NH 3
.Tc "Nodes Down"
.LP
The PBS Server determines the state of a node (up or down) by communicating
with Mom on the node.
The state of nodes may be listed by two commands, qmgr and pbsnodes:
\f3Qmgr: \f5 list nodes @active\f1 or \f5pbsnodes -a\fP.
A node in PBS may be marked \*Qdown\*U in one of two substates.
.LP
If the node is listed as
.Cs
Node lensmen
        state = down, state-unknown
        properties = sparc, mine
        ntype = cluster
.Ce
then the Server has not had contact with Mom since the Server came up.
Check to see if a Mom is running on the node.   If there is a Mom and if the
Mom was just started, the Server may have attempted to poll her before she was
up.   The Server should see her during the next polling cycle in 10 minutes.
If the node is still marked \*Qdown, state-unknown\*U after 10+ minutes,
either the node name specified in the Server's node file does not map to the real
network hostname or there is a network problem between the Server's host and
the node.
.LP
If the node is listed as
.Cs
Node lensmen
        state = down
        properties = sparc, mine
        ntype = cluster
.Ce
then the Server has been able to ping Mom on the node in the past, but she
has not responded recently.  The Server will send a \*Qping\*U PBS message
to every free node each ping cycle, 10 minutes.   If a node does not 
acknowledge the ping before the next cycle, the Server will mark the node
down.  On an IBM SP, a node may also be marked down if Mom on the node believes
that the node is not connected to the high speed switch.
When the Server receives an acknowledgement from Mom on the node, the node will
again be marked up (free).
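.LP
On a large cluster it helps to list only the nodes that are down.  The
filter below is a sketch with a hypothetical name; it assumes the layout
shown in the examples above (an unindented line naming the node, followed
by indented \f5state = ...\fP lines), so check it against the actual output
of \f5pbsnodes -a\fP on your system before relying on it.
.Cs
# down_nodes: hypothetical filter, not part of PBS.  Reads a node listing
# on standard input and prints "name: state" for every node whose state
# contains "down".
down_nodes() {
    awk '
        # An unindented, non-empty line starts a node; its last word is the name.
        NF > 0 && index($0, $1) == 1 { name = $NF }
        # Indented "state = ..." lines carry the state.
        $1 == "state" && $2 == "=" {
            state = $0
            sub(/^[^=]*= */, "", state)
            if (state ~ /down/) print name ": " state
        }'
}
.Ce
For example, \f5pbsnodes -a | down_nodes\fP.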
.NH 3
.Tc "Non Delivery of Output"
.LP
If the output of a job cannot be delivered to the user, it is saved in
a special directory, PBS_HOME/undelivered, and mail is sent to the user.
The typical causes of non-delivery are:
.IP (1)
The destination host is not trusted
and the user does not have a .rhosts file.
.IP (2)
An improper path was specified.
.IP (3)
A directory in the specified destination path is not writable.
.IP (4)
The user's .cshrc on the
destination host generates output when executed.
.IP (5)
The PBS spool directory on the execution host does not have the correct
permissions.  This directory must have mode 1777 (drwxrwxrwt).
.LP
These are explained fully in
the section \*QDelivery of Output Files\*U in the next chapter.
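.LP
Cause (5) is easy to check from the shell.  The function below is a
sketch with a hypothetical name; it succeeds only if the directory's mode
is exactly 1777, sticky bit included.
.Cs
# spool_ok DIR: hypothetical check that DIR is a directory with mode
# exactly 1777 (drwxrwxrwt); find's -perm without a leading "-" requires
# an exact match of the permission bits.
spool_ok() {
    [ -d "$1" ] && [ -n "$(find "$1" -prune -perm 1777 -print)" ]
}
.Ce
Assuming the spool directory is PBS_HOME/spool on your installation, a
repair could be \f5spool_ok "$PBS_HOME/spool" || chmod 1777 "$PBS_HOME/spool"\fP,
run as root on the execution host.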
.NH 3
.Tc "Job Cannot be Executed"
.LP
If a user receives a mail message containing a job id and the line
\*QJob cannot be executed\*U,
the job was aborted by Mom when she tried to place it into execution.
The complete reason can be found in one of two places: Mom's log file or
the standard error file of the user's job.
.LP
If the second line of the message is \*QSee Administrator for help\*U,
then Mom aborted the job before the job's files were set up.   The reason will
be noted in Mom's log.  Typical reasons are a bad user/group account, a bad
checkpoint/restart file (Cray), or a system error.
.LP
If the second line of the message is \*QSee job standard error file\*U,
then Mom had created the job's files and additional messages were written to
standard error.  This is typically the result of a bad resource request.
.NH 3
.Tc "Running Jobs with No Active Processes"
.LP
On very rare occasions, PBS may be in a situation where a job is in the
Running state but has no active processes.  This should never happen as the
death of the job's shell should trigger Mom to notify the Server that the job
exited and end of job processing should begin.  The fact that it happens
even rarely means there is a bug in PBS (\f2gasp! Oh the horror of it all.\fP).
.LP
If this situation is noted, PBS offers a way out.  Use the qsig command to 
send SIGNULL (signal 0) to the job, for example \f5qsig -s 0 job_id\fP.
If Mom finds that the job has no processes, she will force the job into the
exiting state.
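.LP
Signal 0 is special at the operating system level: POSIX \f5kill()\fP with a
signal number of 0 performs only its existence and permission checks and
delivers nothing.  The C sketch below shows that test; it is not Mom's code,
and \f5process_exists()\fP is a hypothetical name.
.Cs
```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if a process with this pid exists, 0 if not.  kill() with
 * signal 0 only checks that the process exists and may be signaled. */
int process_exists(pid_t pid)
{
    assert(pid > 0);   /* 0 or a negative pid would address a process group */
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;   /* exists, but belongs to another user */
}
```
.Ce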
.NH 3
.Tc "\f3Dependent Jobs and Test Systems\fP"
.LP
If you have users running on a test batch system using an alternative port
number (the -p option to pbs_server), problems may occur with job dependencies if
the following requirements are not observed:
.IP 1.
For a test system,
the job identifier in a dependency specification must include at least the
first part of the host name.
.IP 2.
The colon in the port number specification must be escaped by a backslash.
This applies both to the Server part of the job identifier and to the current
server part that follows an @.
.LP
For example:
.br
.Ty 123.test_host\e:17000
.br
.Ty 123.old_host@test_host\e:17000
.br
.Ty 123.test_host\e:17000@diff_test_host\e:18000
.LP
On a shell command line, the backslash itself must be escaped from the shell, so the
above become:
.br
.Ty 123.test_host\e\e:17000
.br
.Ty 123.old_host@test_host\e\e:17000
.br
.Ty 123.test_host\e\e:17000@diff_test_host\e\e:18000
.LP
These rules are not documented on the qsub/qalter man pages, since the likelihood
of the general user community setting up dependencies
with jobs on a test system is small and the inclusion would be generally
confusing.
.NH 2
.Tc "\f3Communication with the User\fP"
.LP
Users tend to want to know what is happening to their job.  PBS provides a
special job attribute, 
.I comment ,
which is available to the operator, manager, or the Scheduler program.  This
attribute can be set to a string to pass information to the job owner.  It
might be used to display information about why the job is not being run or why
a hold was placed on the job.  When the attribute is set, users can see it
by using the -f option of the qstat command.  A Scheduler program can
set the comment attribute via the pbs_alterjob() API.  Operators and managers
may use the -W option of the qalter command, for example
.Cs
qalter -W comment="some text" job_id
.Ce
.OH 'PBS Administrator Guide''Advice'
.EH 'Advice''PBS Administrator Guide'
.bp
.NH 1
.Tc "\f3\s+2Advice for Users\s-2\fP"
.LP
The following sections provide information necessary to the general
user community concerning use of PBS.  Please make this information available.
.NH 2
.Tc "\f3Modification of User shell initialization files\fP"
.LP
A user's job may not run if the user's start-up files (.cshrc, .login, or
\&.profile) contain commands which
attempt to set terminal characteristics.
Any such activity should be skipped in batch jobs by enclosing it in
a test of the environment variable \f3PBS_ENVIRONMENT\fP
(or, for NQS compatibility, \f3ENVIRONMENT\fP).
This can be done as shown in the following sample .login:
.Cs
\&...
setenv PRINTER printer_1
setenv MANPATH /usr/man:/usr/local/man:/usr/new/man
if ( ! $?PBS_ENVIRONMENT ) then
	# not a PBS job: set terminal characteristics here,
	# for example with stty or tset
endif
.Ce
.LP
If the user's login shell is csh, the following message may appear in the
standard output of a job:
.Cs
Warning: no access to tty, thus no job control in this shell
.Ce
This message is produced by many csh versions when the shell determines 
that its input is not a terminal.  Short of modifying csh, there is no way
to eliminate the message.   Fortunately, it is just an informative message
and has no effect on the job.
.NH 2 
.Tc "\f3Parallel Jobs\fP"
.LP
If you have set up PBS to manage a cluster of systems or a parallel system,
it is likely that you intend to manage parallel jobs.
As discussed in section
.B "2.1 Planning" 
and
.B "3.2 Multiple Execution Systems" ,
PBS allocates nodes to one job at a time, a policy called space-sharing.
It is important to remember that the entire node is allocated to the job
regardless of the number of processors or the amount of memory in the node.
.LP
To have PBS allocate nodes to a user's job, the user must specify how many
of what type of nodes are required for the job.  Then the user's parallel job
must execute tasks on the allocated nodes.
.NH 3
.Tc "How Users Request Nodes"
.LP
The \f2nodes\fP item of the resource list is set by the user to declare the node
requirements for the job.  It is a string of the form
.Cs
-l nodes=\f2node_spec\fP[+\f2node_spec\fP...]
.Ce
where node_spec is
.Cs
.Ty "number | property[:property...] | number:property[:property...]"
.Ce
An optional global modifier may be appended to the complete node specification.
This is of the form
.Ty #property .
For example:
.Cs
6+3:fat+2:fat:hippi+disk
.Ce
or 
.Cs
.Ty 6+3:fat+2:fat:hippi+disk#prime .
.Ce
Here, fat, hippi, and disk are examples of property names assigned by the
administrator in the 
.Ty {PBS_HOME}/server_priv/nodes
file.  The above example translates as the 
user requesting 6 plain nodes plus 3 \*Qfat\*U nodes plus 2 nodes that are
both \*Qfat\*U and \*Qhippi\*U plus one \*Qdisk\*U node, a total of 12 nodes.
When #prime is appended as a global modifier, the global property
\*Qprime\*U is appended by the Server to each element of the spec.  The result would
be equivalent to
.Cs
6:prime+3:fat:prime+2:fat:hippi:prime+disk:prime
.Ce
A major use of the global modifier is to provide the 
.I shared
keyword.   This specifies that all the nodes are to be temporarily-shared nodes.
The keyword shared is only recognized as such when used as a global modifier.
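.LP
The arithmetic in the example can be checked with a short C sketch.  It
assumes the simple grammar shown above, treats the global modifier as an
ordinary property (as with #prime), and does not model the \f2shared\fP
keyword; \f5count_nodes()\fP and \f5expand_global()\fP are hypothetical
helpers, not the Server's parser.
.Cs
```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Total nodes requested by a spec such as "6+3:fat+2:fat:hippi+disk".
 * Each "+"-separated node_spec may start with a count; a node_spec
 * with no leading number (e.g. "disk") asks for one node.  Anything
 * after a "#" global modifier is ignored. */
int count_nodes(const char *spec)
{
    const char *p = spec;
    int total = 0;

    assert(spec != NULL);
    while (*p != 0 && *p != '#') {
        if (isdigit((unsigned char)*p)) {
            char *end;
            total += (int)strtol(p, &end, 10);
            p = end;
        } else {
            total += 1;
        }
        /* skip the remaining ":property" parts of this node_spec */
        while (*p != 0 && *p != '+' && *p != '#')
            p++;
        if (*p == '+')
            p++;
    }
    return total;
}

/* Append the global modifier (the text after "#") to every node_spec,
 * so "6+3:fat#prime" becomes "6:prime+3:fat:prime".  Returns 0 on
 * success, -1 if out is too small. */
int expand_global(const char *spec, char *out, size_t outlen)
{
    const char *hash;
    const char *mod;
    size_t speclen, mlen, used = 0, i;

    assert(spec != NULL && out != NULL);
    if (outlen == 0)
        return -1;
    hash = strchr(spec, '#');
    mod = (hash != NULL) ? hash + 1 : NULL;
    speclen = (hash != NULL) ? (size_t)(hash - spec) : strlen(spec);
    mlen = (mod != NULL) ? strlen(mod) : 0;

    for (i = 0; i <= speclen; i++) {
        /* at the end of each node_spec, add ":modifier" */
        if (mod != NULL && (i == speclen || spec[i] == '+')) {
            if (used + 1 + mlen >= outlen)
                return -1;
            out[used++] = ':';
            memcpy(out + used, mod, mlen);
            used += mlen;
        }
        if (i == speclen)
            break;
        if (used + 1 >= outlen)
            return -1;
        out[used++] = spec[i];
    }
    out[used] = 0;
    return 0;
}
```
.Ce
For the example above, count_nodes() gives 6 + 3 + 2 + 1 = 12, and
expand_global() produces the equivalent spec with :prime on every element.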
.NH 3
.Tc Parallel Jobs and Nodes
.LP
PBS provides a means by which a parallel job can spawn, monitor and 
control tasks on remote nodes.  See the man page for tm(3).
.I Unfortunately ,
no vendor has made use of this capability though several
contributed to its design.  Therefore, spawning the tasks of a parallel job
falls to the parallel environment itself.
PVM provides one means by which a parallel job spawns processes via the pvmd
daemon.  MPI typically has a vendor-dependent method, often using rsh or rexec.
.LP
All of these means are outside of PBS's control.
PBS cannot control or monitor resource usage of the remote
tasks, only the ones started by the job on Mother Superior.
PBS can only make the list of allocated nodes available to the parallel job
and hope that the vendor and the user make use of the list and stay within the
allocated nodes.
.LP
The names of the allocated nodes are placed in a file in {PBS_HOME}/aux.
The file is owned by root but world readable.  The name of the file is passed
to the job in the environment variable
.B PBS_NODEFILE .
For IBM SP systems, it is also in the variable MP_HOSTFILE.
.LP
If you are running an open source version of MPI, such as MPICH, then the
mpirun command can be modified to check for the PBS environment and use the
PBS supplied host file.
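.LP
As a starting point for such a modification, the C sketch below counts the
host names in the file named by PBS_NODEFILE, one name per line, which a
wrapper could use to decide how many processes to start.  The function names
are hypothetical and the code is not taken from MPICH.
.Cs
```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Count the non-blank lines (one host name each) in an open node file.
 * Lines are assumed to be shorter than the buffer. */
int count_node_lines(FILE *fp)
{
    char line[256];
    int n = 0;

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        size_t i = 0;
        while (line[i] != 0 && isspace((unsigned char)line[i]))
            i++;
        if (line[i] != 0)
            n++;
    }
    return n;
}

/* Count the hosts in a node file, typically called as
 * count_pbs_nodes(getenv("PBS_NODEFILE")).  Returns -1 if path is
 * NULL (not inside a PBS job) or the file cannot be opened. */
int count_pbs_nodes(const char *path)
{
    FILE *fp;
    int n;

    if (path == NULL)
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    n = count_node_lines(fp);
    fclose(fp);
    return n;
}
```
.Ce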
.NH 2
.Tc "\f3Shell Invocation\fP"
.LP
When PBS starts a job, it invokes the user's login shell (unless the user
submitted the job with the -S option).  PBS passes the job script, which is
a shell script, to the login shell in one of two ways, depending on how PBS was
built.
.RS 4
.IP "Name of Script on Standard Input"
The default method (PBS built with --enable-shell-pipe) is to write the
name of the job script to the standard input of the shell program.
This is equivalent to typing the script name as a command to an interactive
shell.  Since this is the only line passed to the shell, standard input will
be empty for any commands in the script.  This approach offers both advantages and
disadvantages:
.RS 4
.IP + 3
Any command which reads from standard input without redirection will get an EOF.
.IP +
The shell syntax can vary from script to script; it does not have to match the
syntax for the user's login shell.   The first line of the script, even before
any #PBS directives, should be
.Ty #!/shell
where 
.Ty shell
is the full path to the shell of choice, /bin/sh, /bin/csh, ...
The login shell
will interpret the #! line and invoke that shell to process the script.
.IP -
An extra shell process is run to process the job script.
.IP -
If the script does not include a #! line as the first line, the wrong shell
may attempt to interpret the script producing syntax errors.
.IP -
If a non-standard shell is used via the -S option, it will not receive the
script, but its name, on its standard input.
.RE
.IP "Script as Standard Input" 5
The alternative method for PBS (built with --disable-shell-pipe) is to
open the script file as standard input for the shell.
This is equivalent to typing
.Ty "shell\ <\ script" .
This also offers advantages and disadvantages:
.RS 4
.IP + 3
The user's script will always be directly processed by the user's login shell.  
.IP +
If the user specifies a non-standard shell (any old program) with the -S option,
the script can be read by that program as its input.
.IP -
If a command within the job script reads from standard input, it may read
lines from the script depending on how far ahead the shell has buffered
its input.  Any command line so read will not be executed by the shell.
A command that reads from standard input without explicit redirection is
generally unwise in a batch job.
.RE
.LP
The choice of shell invocation methods is left to the site.   It is recommended
that all PBS execution servers (pbs_mom) within that site be built to use
the same shell invocation method.
.RE
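.LP
The difference between the two methods can be seen in the following C sketch,
which starts a shell either with the script name written on a pipe to its
standard input (the default method) or with the script file itself as standard
input.  It is a simplified POSIX illustration, not Mom's code;
\f5run_job_script()\fP is a hypothetical name, and with the first method the
script must be executable, just as if its name were typed at a prompt.
.Cs
```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start shell on a job script and return the shell's exit status,
 * or -1 on error.
 *   script_on_stdin == 0: the script's name is written on a pipe that
 *                         becomes the shell's standard input (default)
 *   script_on_stdin != 0: the script file itself is standard input,
 *                         as in "shell < script" */
int run_job_script(const char *shell, const char *script, int script_on_stdin)
{
    pid_t pid;
    int status;

    assert(shell != NULL && script != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                      /* child: becomes the shell */
        int fd;

        if (script_on_stdin) {
            fd = open(script, O_RDONLY);
            if (fd < 0)
                _exit(127);
        } else {
            int p[2];
            char nl = 10;                /* newline ends the one input line */

            if (pipe(p) < 0)
                _exit(127);
            /* a short name fits in the pipe buffer, so these writes do
             * not block; after this line the shell will read EOF */
            if (write(p[1], script, strlen(script)) < 0 ||
                write(p[1], &nl, 1) < 0)
                _exit(127);
            close(p[1]);
            fd = p[0];
        }
        if (fd != STDIN_FILENO) {
            if (dup2(fd, STDIN_FILENO) < 0)
                _exit(127);
            close(fd);
        }
        execl(shell, shell, (char *)NULL);
        _exit(127);                      /* only reached if exec fails */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```
.Ce
In the first case the shell runs the script as a separate command (the extra
process noted above), and any command in the script that reads standard input
sees EOF; in the second case the shell reads the script text itself from
standard input.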
.LP
.NH 2
.Tc "\f3Job Exit Status\fP"
.LP
The exit status of a job is normally the exit status of the shell executing
the job script.  If a user is using 
.I csh
and has a
.I .logout
file in the home directory, the exit status of csh becomes the exit status
of the last command in .logout.  This may impact the use of job dependencies
which depend on the job's exit status.  To preserve the job's status, the
user may either remove .logout or add the following two lines to it.  Add
as the first line:
.br
.Ty "\ \ \ \ set EXITVAL = $status"
.br
and as the last executable line:
.br
.Ty "\ \ \ \ exit $EXITVAL"
.NH 2
.Tc "\f3Delivery of Output Files\fP"
.LP
To transfer output files, or to transfer staged-in or
staged-out files to or from a remote destination, PBS uses either rcp or scp,
depending on the configuration options.
PBS includes the source of a version of the 
.B rcp (1)
command from the BSD 4.4 Lite distribution.  The resulting program,
.B pbs_rcp (1B),
is used.
This version of rcp is
provided because, unlike some rcp implementations, it always exits with a
non-zero exit status for any error.   Thus Mom knows whether or not the file was
delivered.  Fortunately, the secure copy program, scp, is also based on
this version of rcp and exits with the proper status code.
.LP
Using rcp,
the copy of output or staged files can fail for (at least) two reasons.
.IP 1.
If the user's .cshrc script outputs any characters to standard output,
e.g. contains an echo command, 
.B pbs_rcp
will fail.
See the section in this document entitled
.B "Modification of User shell initialization files" .
.IP 2.
The user must have permission to 
.I rsh
to the remote host.
Output is delivered to the remote destination host with the remote file owner's
name being the job owner's name (job submitter).  On the execution host, the
file is owned by the user's execution name which may be different.
For information, see the 
.Ty "-u user_list"
option on the
.B qsub (1)
command.
.IP
If the two names are identical, permission to rcp may be granted at the
system level by an entry in the destination host's 
.I /etc/hosts.equiv
file calling out the execution host.
.IP
If the owner name and the execution name are different or if the destination
host's /etc/hosts.equiv file does not contain an entry for the execution host,
the user must have an ".rhosts" file in her home directory on the system
to which the output files are being returned.  The .rhosts file must contain
an entry for the system on which the job executed with the user name 
under which the job was executed.  It is wise to have two lines, one
with just the "base" host name and one with the full 
.I "host.domain_name" .
.LP
If PBS is built to use the \f2Secure Copy Program\fP, scp, then PBS will first
try to deliver output or stage-in/out files using scp.   If scp fails, PBS
will try again using rcp [assuming that scp might not exist on the remote
host].  If rcp also fails, the above cycle will be repeated after a delay
in case the problem is caused by a temporary network problem.   All failures
are logged in Mom's log.
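.LP
The order of attempts can be summarized in a C sketch.  The copy routines are
passed in as function pointers, so nothing here runs scp or rcp itself; the
cycle limit and delay are hypothetical parameters, since this guide does not
give the values Mom uses.
.Cs
```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>    /* rename() has the copy_fn shape; handy for a local test */
#include <unistd.h>   /* sleep() */

/* A copy routine returns 0 on success and non-zero on failure, the way
 * pbs_rcp and scp report their exit status. */
typedef int (*copy_fn)(const char *src, const char *dst);

/* Try scp (if configured), then rcp; if both fail, wait and repeat.
 * max_cycles and delay_secs are hypothetical knobs, not PBS settings.
 * Returns 0 once a copy succeeds, -1 after max_cycles failed cycles. */
int deliver(copy_fn scp, copy_fn rcp, const char *src, const char *dst,
            int max_cycles, unsigned delay_secs)
{
    int cycle;

    assert(rcp != NULL && src != NULL && dst != NULL);
    for (cycle = 0; cycle < max_cycles; cycle++) {
        if (scp != NULL && scp(src, dst) == 0)
            return 0;
        /* scp may be missing on the remote host: fall back to rcp */
        if (rcp(src, dst) == 0)
            return 0;
        /* Mom would log this failure; waiting assumes the problem is
         * a temporary network problem */
        if (cycle + 1 < max_cycles)
            sleep(delay_secs);
    }
    return -1;
}
```
.Ce
Any function of the form int f(const char *, const char *) that returns 0 on
success can stand in for a copy routine; the C library's rename() is one,
which makes the sketch easy to try on a single host.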
.LP
For delivery of output files on the local host, PBS uses the 
.B /bin/cp (1)
command.
Local and remote delivery of output may fail for the following additional
reasons:
.IP 1.
A directory in the specified destination path does not exist.
.IP 2.
A directory in the specified destination path is not searchable by the user.
.IP 3.
The target directory is not writable by the user.
.LP
Additional information as to the cause of the delivery problem might be
determined from Mom's log file.  Each failure is logged.  The various
error codes are described in requests.c/sys_copy() in the IDS.
.NH 2
.Tc "\f3Stage in and Stage out problems\fP"
.LP
The same requirements and hints discussed above in regard to delivery of
output apply to staging files in and out.  It may also be useful to
note that the stage-in and stage-out options on qsub both take the form
.br
.Ty local_file@remote_host:remote_file
.br
regardless of the direction of transfer.
Thus for stage-in, the direction of travel is
.nf
	local_file  <--  remote_host:remote_file
.fi
and for stage out, the direction of travel is
.nf
	local_file  -->  remote_host:remote_file
.fi
Also note that all relative paths are
relative to the user's home directory on the respective hosts.
PBS uses rcp or scp (or cp if the remote host is the local host) to perform the 
transfer.   Hence, a stage-in is just a
.nf
	rcp -r remote_host:remote_file local_file
.fi
and a stage out is just
.nf
	rcp -r local_file remote_host:remote_file
.fi
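.LP
Because both directives use the same local_file@remote_host:remote_file form,
only the argument order changes with the direction.  The C sketch below builds
the equivalent rcp command line; \f5stage_command()\fP is hypothetical, and it
ignores the scp and local cp cases.
.Cs
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum stage_dir { STAGE_IN, STAGE_OUT };

/* Build the rcp command equivalent to a directive of the form
 * local_file@remote_host:remote_file.  Returns 0 on success, -1 if
 * the directive is malformed or cmd is too small. */
int stage_command(const char *directive, enum stage_dir dir,
                  char *cmd, size_t cmdlen)
{
    const char *at;
    int local_len, n;

    assert(directive != NULL && cmd != NULL);
    at = strchr(directive, '@');
    if (at == NULL || strchr(at + 1, ':') == NULL)
        return -1;
    local_len = (int)(at - directive);
    if (dir == STAGE_IN)   /* remote_host:remote_file --> local_file */
        n = snprintf(cmd, cmdlen, "rcp -r %s %.*s", at + 1, local_len, directive);
    else                   /* local_file --> remote_host:remote_file */
        n = snprintf(cmd, cmdlen, "rcp -r %.*s %s", local_len, directive, at + 1);
    return (n < 0 || (size_t)n >= cmdlen) ? -1 : 0;
}
```
.Ce
For example, the stage-in directive /tmp/bear@somehost:poo becomes
rcp -r somehost:poo /tmp/bear.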
.LP
As with rcp, the remote_file may be a directory name.  Also as
with rcp, the local_file specified in the stage in/out directive may
name a directory.
For stage-in, if remote_file is a directory, then local_file must also be
a directory.   For stage out, if local_file is a directory, then remote_file
must also be a directory.
.LP
If 
.I local_file
on a stage out directive is a directory, that
directory on the execution host, including all files and subdirectories,
will be copied.  At the end of the job, the directory, including all files
and subdirectories, will be deleted.   Users should be aware that this may
create a problem if multiple jobs are using the same directory.
.LP
Stage in presents another problem.   Assume the user wishes to stage-in
the contents of a single file named
.I poo
and gives the following stage-in directive:
.Cs
-W stagein=/tmp/bear@somehost:poo
.Ce
If /tmp/bear is an existing directory, the local file becomes /tmp/bear/poo.
When the job exits, PBS will determine that 
.I /tmp/bear
is a directory and append 
.I /poo 
to it.  Thus 
.I /tmp/bear/poo
will be deleted.
If, however, the user wishes to stage-in the contents of a directory named
.I cat
and gives the following stage-in directive:
.Cs
-W stagein=/tmp/dog/newcat@somehost:cat
.Ce
where 
.I /tmp/dog
is an existing directory, then at job end, PBS will determine that
.I /tmp/dog/newcat
is a directory and append
.I /cat
and then fail on the attempt to delete
.I /tmp/dog/newcat/cat .
.LP
On stage-in when remote_file is a directory, the user should not specify a
new directory as local_file.  In the above case, the user should instead use
.Cs
-W stagein=/tmp/dog@somehost:cat
.Ce
which will produce 
.I /tmp/dog/cat
which will match what PBS will try to delete at the end of the job.
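.LP
The deletion rule described above can be expressed as a C sketch that computes
the path PBS would try to remove at the end of the job.  The helper name is
hypothetical, and whether local_file is a directory is passed in rather than
obtained with \f5stat()\fP on the execution host.
.Cs
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Path removed at job end for a stage-in directive: if local_file is
 * an existing directory, the last component of remote_file is
 * appended to it.  Returns 0 on success, -1 if out is too small. */
int stagein_cleanup_path(const char *local_file, const char *remote_file,
                         int local_is_dir, char *out, size_t outlen)
{
    const char *base;
    int n;

    assert(local_file != NULL && remote_file != NULL && out != NULL);
    if (!local_is_dir) {
        n = snprintf(out, outlen, "%s", local_file);
    } else {
        base = strrchr(remote_file, '/');
        base = (base != NULL) ? base + 1 : remote_file;
        n = snprintf(out, outlen, "%s/%s", local_file, base);
    }
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```
.Ce
Applied to the three directives above, it yields /tmp/bear/poo,
/tmp/dog/newcat/cat (which does not exist, hence the failure), and
/tmp/dog/cat.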
.LP
Wildcards should not be used in either the local_file or the remote_file
name.  PBS does not expand the wildcard character on the local system.
If wildcards are used in the remote_file name, the expansion will occur on
the remote system, since rcp is launched there by rsh.   However, at job end,
PBS will attempt to delete the file whose name actually contains the wildcard
character and will fail to find it.  This will leave all the staged in files
in place (undeleted).
.NH 2
.Tc "\f3Checkpointing MPI Jobs on SGI Systems\fP"
.LP
Under Irix 6.5 and later,
MPI parallel jobs as well as serial jobs can be checkpointed and restarted
on SGI systems provided certain criteria are met.  SGI's checkpoint
system call cannot checkpoint processes that have open sockets.
Therefore it is necessary to tell mpirun either not to create the socket to
the array services daemon used to start the parallel processes, or to close it
when a checkpoint occurs.
One of two options to mpirun must be used:
.RS
.IP -cpr 10
This option directs mpirun to close its connection to the array services daemon
when a checkpoint is to occur.
.IP -miser 10
This option directs mpirun to create the parallel processes directly rather than
use the array services.   This avoids opening the socket connection at all.
.RE
.LP
The -miser option appears to be the better choice as it avoids the socket in the first
place.   If the -cpr option is used, the checkpoint will work, but will be 
slower because the socket connection must be closed first.
.LP
Note that interactive jobs and MPMD jobs (more than one executable program) cannot
be checkpointed in any case.  Both use sockets (and TCP/IP) to communicate,
outside of the job for interactive jobs and between programs in the MPMD case.
.OH 'PBS Administrator Guide''Customizing'
.EH 'Customizing''PBS Administrator Guide'
.bp
.NH 1
.Tc \f3\s+2Customizing PBS\s-2\fP
.LP
Most sites find that PBS works for them with only configuration changes.
As their experience with PBS grows, many sites find it useful to customize the
supplied Scheduler or to develop one of their own to meet very specific policy
requirements.   Custom Schedulers have been written in C, BaSL or Tcl.
.LP
This section addresses several ways that PBS can be customized for your site.
While having the source code is the first step, there are specific actions 
other than modifying the code you can take.
.NH 2
.Tc "\f3Additional Build Options\fP"
.LP
Two header files within the subdirectory src/include provide additional
configuration control over the Server and Mom.
The modification of any symbols in the two
files should not be undertaken lightly.
.NH 3
.Tc pbs_ifl.h
.LP
This header file contains structures, symbols and constants used by the API,
libpbs.a, and the various commands as well as the daemons.
Very little here should ever be changed.  Possible exceptions are the following
symbols.  They must be consistent between all batch systems
which might interconnect.
.IP PBS_MAXHOSTNAME
Defines the length of the maximum possible host name.  This should be set 
at least as large as
.B MAXHOSTNAME
which may be defined in 
.I sys/param.h .
.IP PBS_MAXUSER
Defines the length of the maximum possible user login name.
.IP PBS_MAXGRPN
Defines the length of the maximum possible group name.
.IP PBS_MAXQUEUENAME
Defines the length of the maximum possible PBS queue name.
.IP PBS_USE_IFF
If this symbol is set to zero (0), before the library and commands are built,
the API routine pbs_connect() will not attempt to invoke the program
.B pbs_iff
to generate a secure credential to authenticate the user.
Instead, a clear text credential will be generated.
This credential is completely subject to forgery and is useful only for
debugging the PBS system.
You are strongly advised against using a clear text credential.
.IP PBS_BATCH_SERVICE_PORT
Defines the port number at which the Server listens.
.IP PBS_MOM_SERVICE_PORT
Defines the port number at which Mom, the execution miniserver, listens.
.IP PBS_SCHEDULER_SERVICE_PORT
Defines the port number at which the Scheduler listens.
.LP
.NH 3
.Tc server_limits.h
.LP
This header file contains symbol definitions used by the Server and by Mom.
Only those that 
.I might
be changed are listed here.  These should be changed with care.  It is 
strongly recommended that no other symbols in server_limits.h be changed.
If server_limits.h is to be changed, it may be copied into the include
directory of the 
.I target
(build) tree and modified before compiling.
.IP NO_SPOOL_OUTPUT
If defined, directs Mom to not use a spool directory for the job output,
but to place it in the user's home directory while the job is running.
This allows a site to invoke quota control over the output of running batch
jobs.
.IP PBS_BATCH_SERVICE_NAME
This is the service name used by the Server to determine on which port number
it should listen.   It is set to
.Ty pbs ,
in quotes as it is a character string.   Should you wish to assign PBS a service
port in 
.Ty /etc/services ,
change this string to the service name assigned.
You should also update PBS_SCHEDULER_SERVICE_NAME as required.
.IP PBS_DEFAULT_ADMIN
Defined to the name of the default administrator, typically \*Qroot\*U.
Generally only changed to simplify debugging.
.IP PBS_DEFAULT_MAIL
Set to the user name from which mail will be sent by PBS.  The default is "adm".
This is overridden if the Server attribute
.I mail_from
is set.
.IP PBS_JOBBASE
The length of the job id string used as the basename for job associated files
stored in the spool directory.  It is set to 11, which is 14 minus the 3
characters of the suffixes like 
.Ty .JB
and 
.Ty .OU .
Fourteen is the guaranteed length for a file name under POSIX.   The actual
length that a file name can be depends on the file system and must be determined
at run time, but PBS is too lazy to go to that trouble.  If the Server
and Mom run on a file system that supports longer names (most do), then
you may increase this value so that the names are more readable.
.IP PBS_MAX_HOPCOUNT
Used to limit the number of hops taken when a job is routed from queue to queue.
It is mainly to detect loops.
.IP PBS_NET_MAX_CONNECTIONS
The maximum number of open file descriptors and sockets supported by the
server.  
.IP PBS_NET_RETRY_LIMIT
The limit on retrying requests to remote servers.
.IP PBS_NET_RETRY_TIME
The time between network routing retries to remote queues and for requests
between the Server and Mom.
.IP PBS_RESTAT_JOB
To avoid overburdening any given Mom, the Server will wait this
amount of time (default 30 seconds)
between asking her for updates on running jobs.  In other words, if a user
asks for status of a running job more often than this value, the prior data
will be returned.
.IP PBS_ROOT_ALWAYS_ADMIN
If defined (set to 1), \*Qroot\*U is an administrator of the batch system
even if not listed in the 
.Ty managers
attribute.
.IP PBS_SCHEDULE_CYCLE
The default value for the elapsed time between scheduling cycles with no
change in jobs queued.
This is the initial value used by the Server, but it can be changed via 
.B qmgr (1B).
.LP
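.LP
The run-time check that PBS skips for PBS_JOBBASE is available through POSIX
\f5pathconf()\fP.  The C sketch below, not part of PBS, computes the largest
basename that leaves room for a 3-character suffix on the file system holding
a given directory; the helper name \f5max_jobbase()\fP is hypothetical.
.Cs
```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <limits.h>   /* _POSIX_NAME_MAX: the guaranteed 14 */
#include <unistd.h>   /* pathconf() */

#define SUFFIX_LEN 3  /* ".JB", ".OU" and similar suffixes */

/* Longest job-file basename that still leaves room for a suffix on
 * the file system holding dir.  Falls back to 14 - 3 = 11 (the
 * distributed PBS_JOBBASE) when pathconf() cannot give a limit. */
long max_jobbase(const char *dir)
{
    long name_max;

    assert(dir != NULL);
    name_max = pathconf(dir, _PC_NAME_MAX);
    if (name_max < 0)
        name_max = _POSIX_NAME_MAX;
    return name_max - SUFFIX_LEN;
}
```
.Ce
On a file system with the common 255-character name limit it returns 252; if
no limit can be determined it falls back to 11.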
.NH 2
.Tc "\f3Site Modifiable Source Files\fP"
.LP
It is safe to skip this section until you have played with PBS for a while
and want to start tinkering.
.LP
Dave Tweten of NASA has said, "If it ain't source, it ain't software."
This is part of PBS's philosophy that source distribution should be a major
part of any software product.  Otherwise, the product becomes
\*Qhard\*U\-ware.
The first example of this philosophy is the PBS job Scheduler.   The
implementation of the site policy is left to the site.  PBS provides three
tools for that implementation, the BaSL Scheduler, the Tcl Scheduler, and
the C Scheduler.
.LP
The philosophy does not stop with the Scheduler.
With distribution of the source, a site has the ability to modify any
part of PBS as they so choose.  Of course, indiscriminate modification is
not without dangers, not the least of which is conflict with future releases
by the developers.
.LP
Certain functions of PBS appear to be likely targets of widespread modification
by sites for a number of reasons.  When identified, the developers of PBS have
attempted to improve the ease of modification in these areas by the inclusion
of special 
.I "site specific modification routines" .
These are identified in the IDS under chapter headings of \*QSite Modifiable
Files\*U in the sections on the Server and Mom.
The distributed default version of these files build a private library,
libsite.a, which is included in the linking phase for the Server and for Mom.
They may be replaced as needed by a site.  The procedure is described in
the IDS under \*Qlibsite.a \- Site Modifiable Library\*U in Chapter 10.
.LP
The files include:
.IP Server
.RS
.IP site_allow_u.c
The routine in this file, 
.I site_allow_u() ,
provides an additional point at which a user
can be denied access to the batch system (server).  It may be used instead
of or in addition to the Server Acl_User list.
.IP site_alt_rte.c
The function
.I site_alt_router()
allows a site to add decision capabilities to job routing.  This function
is called on a per-queue basis if the queue attribute 
.At alt_router
is true.  As provided, site_alt_router() just invokes the default router,
.I default_router() .
.IP site_check_u.c
There are two routines in this file.
.IP
The routine 
.I site_check_user_map() ,
provides the service of authenticating that the
job owner is privileged to run the job under the user name specified or
selected for execution on the Server system.  Please see the IDS for the
default authentication method.
.mc |
.IP
The routine 
.I site_acl_check()
provides the site with the ability to restrict entry into a queue in ways
not otherwise covered.   For example, you may wish to check a bank account to
see if the user has the funds to run a job in the specific queue.
.mc
.IP site_map_usr.c
For sites without a common user name/uid space, this function,
.I site_map_user() ,
provides a place to add a user name mapping function.
The mapping occurs at two times.  First, to determine if a user making a request
against a job is the job owner (see \*QUser Authorization\*U).
Second, to map the submitting user (job owner) to an execution uid on the
local machine.
.IP site_*_attr_*.h
These files provide a site with the ability to add local attributes to the
server, queues, and jobs.  The files are installed into the target tree
\*Qinclude\*U subdirectory during the first make.
As delivered, they contain only comments.  If a site wishes to
add attributes, these files can be 
.I carefully
modified.
.IP
The files are in three groups, by server, queue, and job.   In each group
are
.I site_*_attr_def.h
files, which are used to define the name and support functions for the new
attribute or attributes, and
.I site_*_attr_enum.h
files, which insert an enumerated label into the set for the corresponding parent
object.  For server, queue, and node attributes, there is also an additional file
that defines if the qmgr(1) command will include the new attribute in
the set \*Qprinted\*U with the 
.Ty "print server" ,
.Ty "print queue" ,
or 
.Ty "print node"
sub-commands.
.IP
Detailed information on how to modify these files can be found in the IDS
under the \*QSite Modifiable Files\*U section of the Server, Chapter 5.
You should note that just adding attributes will have no effect on how PBS
processes jobs.   The main usage for new attributes would be in providing new
Scheduler controls and/or information.  The scheduling algorithm will have
to be modified to use the new attributes.    If you need Mom to do something
different with a job, you will still need \*Qto get down and dirty\*U with
her source code.
.RE
.IP Mom
.RS
.IP site_mom_chu.c
If a server is feeding jobs to more than one Mom, additional checking for
execution privilege may be required at Mom's level.  It can be added in
this function
.I site_mom_chkuser() .
.IP site_mom_ckp.c
Provides post-checkpoint,
.I site_mom_postchk()
and pre-restart
.I site_mom_prerst()
\*Quser exits\*U for the Cray and SGI systems.
.IP site_mom_jset.c
The function
.I site_job_setup()
allows a site to perform specific actions once the job session has been created
and before the job runs.
.RE
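.LP
As an illustration of the kind of mapping \f5site_map_user()\fP might perform,
the C sketch below looks up a job owner and submission host in a small table
and returns the local execution name.  The table, its entries, and the
function name \f5example_map_user()\fP are hypothetical; consult the IDS for
the real prototype of \f5site_map_user()\fP.
.Cs
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical site table: job owner and submission host map to a
 * local execution name.  A real site might load this from a file. */
struct user_map {
    const char *owner;
    const char *host;
    const char *local;
};

static const struct user_map site_users[] = {
    { "jsmith", "hostA.example.com", "smithj" },
    { "ann",    "hostB.example.com", "alee"   },
};

/* Return the local name for owner on host, or owner itself when the
 * table has no entry (the common name space case). */
const char *example_map_user(const char *owner, const char *host)
{
    size_t i;

    assert(owner != NULL && host != NULL);
    for (i = 0; i < sizeof site_users / sizeof site_users[0]; i++) {
        if (strcmp(site_users[i].owner, owner) == 0 &&
            strcmp(site_users[i].host, host) == 0)
            return site_users[i].local;
    }
    return owner;
}
```
.Ce
Matching on both name and host matters because the same login name on two
hosts may belong to different people when the sites do not share a name space.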
.nr Pb 1  \" turn on -ms version of macros for "man" pages.
.so ../ers/ers.macros
.bp
.NH 1
.Tc "\f3\s+2Useful Man Pages\s-2\fP"
.LP
The following pages are copies of various PBS man pages which are of
special interest to the Administrator.  The full set of man pages is
included in the distribution and may also be found in the ERS.
.NH 2
.Tc "\f3pbs_server\fP"
.LP
.so ../man8/pbs_server.8B
.bp
.NH 2
.Tc "\f3pbs_mom\fP"
.LP
.so ../man8/pbs_mom.8B
.bp
.NH 2
.Tc "\f3C Based Scheduler\fP"
.LP
.so ../man8/pbs_sched_cc.8B
.bp
.NH 2
.Tc "\f3BaSL Scheduler\fP"
.LP
.so ../man8/pbs_sched_basl.8B
.bp
.NH 2
.Tc "\f3Tcl Scheduler\fP"
.LP
.so ../man8/pbs_sched_tcl.8B
.bp
.NH 2
.Tc "\f3Qmgr Command\fP"
.LP
.so ../man8/qmgr.8B
.bp
.NH 2
.Tc "\f3Server Attributes\fP"
.LP
.so ../man1/pbs_server_attributes.7B
.bp
.NH 2
.Tc "\f3Queue Attributes\fP"
.LP
.so ../man1/pbs_queue_attributes.7B
.bp
.NH 2
.Tc "\f3Job Attributes\fP"
.LP
.so ../man1/pbs_job_attributes.7B
