.nr % 1
.OH ''PBS ERS'Resource Monitor'
.EH 'Resource Monitor'PBS ERS''
.P1
.so ers_setup.ms
.Rv $Revision: 2.4 $
.nr H1 8
.NH 1
.Tc \f3\s+2Resource Monitor\s-2\fP
.OF 'Chapt \*(rV''\n(H1-%'
.EF '\n(H1-%''Chapt \*(rV'
.\"         Portable Batch System (PBS) Software License
.\" 
.\" Copyright (c) 1999, MRJ Technology Solutions.
.\" All rights reserved.
.\" 
.\" Acknowledgment: The Portable Batch System Software was originally developed
.\" as a joint project between the Numerical Aerospace Simulation (NAS) Systems
.\" Division of NASA Ames Research Center and the National Energy Research
.\" Supercomputer Center (NERSC) of Lawrence Livermore National Laboratory.
.\" 
.\" Redistribution of the Portable Batch System Software and use in source
.\" and binary forms, with or without modification, are permitted provided
.\" that the following conditions are met:
.\" 
.\" - Redistributions of source code must retain the above copyright and
.\"   acknowledgment notices, this list of conditions and the following
.\"   disclaimer.
.\" 
.\" - Redistributions in binary form must reproduce the above copyright and 
.\"   acknowledgment notices, this list of conditions and the following
.\"   disclaimer in the documentation and/or other materials provided with the
.\"   distribution.
.\" 
.\" - All advertising materials mentioning features or use of this software must
.\"   display the following acknowledgment:
.\" 
.\"   This product includes software developed by NASA Ames Research Center,
.\"   Lawrence Livermore National Laboratory, and MRJ Technology Solutions.
.\" 
.\"         DISCLAIMER OF WARRANTY
.\" 
.\" THIS SOFTWARE IS PROVIDED BY MRJ TECHNOLOGY SOLUTIONS ("MRJ") "AS IS" 
.\" WITHOUT WARRANTY OF ANY KIND, AND ANY EXPRESS OR IMPLIED WARRANTIES, 
.\" INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, 
.\" FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT ARE EXPRESSLY
.\" DISCLAIMED.
.\"
.\" IN NO EVENT, UNLESS REQUIRED BY APPLICABLE LAW, SHALL MRJ, NASA, NOR
.\" THE U.S. GOVERNMENT BE LIABLE FOR ANY DIRECT DAMAGES WHATSOEVER,
.\" NOR ANY INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\" 
.\" This license will be governed by the laws of the Commonwealth of Virginia,
.\" without reference to its choice of law rules.
.LP
This section describes the PBS resource monitor functions of the
Machine Oriented Miniserver (MOM).
MOM is the part of PBS which is machine dependent rather
than pure POSIX.  The resource monitor communicates with the world
(mainly the PBS scheduler) over the network.  All the
resource requests discussed below should be sent in as large a
group as practical.  This helps reduce the impact on the machine
answering the request because pbs_mom tries to gather information
only once for each request.  Each resource reported within each
request will then also be consistent with all others.
.LP
Each resource request to pbs_mom is sent with a command number telling what
the rest of the request will contain.
For the command "tell me about these resources", the following
strings are each a separate resource request.  These consist of
a resource name with parameters enclosed in square brackets.
Each parameter has a parameter name followed by an equal sign
and then a value.  For example
.Ty size[fs=/tmp] .
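.LP
As an illustrative sketch of the grouping recommended above, a scheduler
taking a snapshot of a host might send the following resource requests
together in one group rather than separately.
The choice of resources is an assumption made for the example; each
name is described later in this section.
.Cs
loadave
ncpus
physmem
size[fs=/tmp]
.Ce
Because pbs_mom tries to gather information only once for each request,
the values returned for such a group should be consistent with one another.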
.NH 2
.Tc \f3Resources\fP
.LP
The machines and operating systems supported by
.Ar pbs_mom
are
.IP -
Cray systems using
.Ar "Unicos 8, 9 or 10, or Unicos MK 2" ,
.IP -
IBM 590 Workstations using
.Ar "AIX 4" ,
.IP -
IBM SP using
.Ar "AIX 4"
with
.Ar "PSSP 2.1"
or
.Ar "PSSP 3.1" ,
.IP -
Silicon Graphics systems using
.Ar "IRIX 5.x or 6.x" ,
.IP -
Sun Sparc using
.Ar "SunOS 4.1.x"
or
.Ar "Solaris 2.5 (5.5)" ,
.IP -
Fujitsu VPP300 systems using
.Ar "UXP/V 4.1 ES" ,
.IP -
AMD/Intel/Cyrix systems using
.Ar "Linux"
or
.Ar "FreeBSD" .
.LP
All of these machines will support a standard list of resources.
.IP arch
This returns a string which specifies the machine architecture of
the host being queried.  By default, it will be one of: "aix4", "irix5",
"fujitsu", "irix6", "irix6array",
"linux", "freebsd", "solaris5", "sunos4",
"unicos8" or "unicosmk2".
The string returned may be changed by adding a 
.I "static resource"
or
.I "shell command"
to MOM's configuration file.  The value of the configuration file entry will
override the standard value returned for arch.
.IP
Note that a configuration file entry with the same name as any standard resource
will override the standard value when reported in response to a resource request from
a privileged port on a
.I "client host" .
A resource request from an unprivileged port on a 
.I "restricted host"
will receive the standard value; configuration file entries are ignored.
.IP "cput"
This reports cpu time in seconds.  Two different parameters are
accepted.  One is
.Ar proc
which can be used to specify a process, the other is
.Ar session
which can be used to specify a session.
For example:
.Ty cput[proc=8765]
and
.Ty cput[session=4567] .
.IP "idletime"
This is the time in seconds during which no keystroke or mouse movement
has taken place on any terminal connected to the system.
.IP loadave
This reports the smoothed system load average (the number of processes in the
kernel run queue).
.IP "mem"
This reports memory usage in bytes.  The
.Ar proc
and
.Ar session
parameters are accepted.
For example: 
.Ty mem[proc=8765]
and
.Ty mem[session=4567] .
.IP ncpus\ 
Returns the number of processors that are available.
.IP "nsessions"
Returns the number of sessions which exist in the system.  Sessions owned by
root are excluded.  No parameters are allowed.
.IP "nusers"
Reports the number of users who have processes running in the system, excluding
root.
No parameters are allowed.
.IP "pids"
Reports a list of processes for a session.  A parameter
.Ar session
must be given.  For example
.Ty pids[session=34615] .
.IP "physmem"
Returns the physical (main) memory size in kilobytes.
.IP "sessions"
This will return a list of the sessions which exist in the system.  This
is different from PBS \*Qjobs\*U and the sessions listed may not be part of any
PBS job.
Sessions owned by root are excluded.  No parameters are allowed.
(This resource was named \*Qjobs\*U in prior versions.)
.IP size
Reports the size of file system objects in kilobytes.  Two different
parameters may be given.  One is
.Ar file
which specifies a filename whose size is returned.  The other is
.Ar fs
which specifies a directory in a file system which is examined to
find the amount of free space available.
.IP uname
This returns a string with the POSIX specified information returned
from the
.B uname()
function separated by spaces.  The strings are in order:
.I sysname ,
.I nodename ,
.I release ,
.I version
and
.I machine .
.IP validuser
This returns the string
.Ty yes
if the user name is valid, i.e. has a password entry, and
.Ty no
otherwise.
A parameter of \f5user=\f2name\f1 must be given.
For example:
.Ty validuser[user=joe] .
.IP walltime
Reports the time in seconds for which a
.Ar proc
or 
.Ar session
has existed in the system.
.LP
In prior versions of PBS built with
.Sc NEEDNODES
the following resources were available from pbs_mom:
.B avail ,
.B reserve ,
.B totpool ,
and
.B usepool .
.LP
They are now available from the job server, pbs_server, via the standard
PBS API.
See the man page for
.I pbs_rescquery ()
for more information.
.LP
The following are general examples of resource queries to pbs_mom:
.Cs
pids[session=456]
mem[proc=123]
size[file=/etc/passwd]
idletime
.Ce
In the given order, these resource requests would return a list of process ids
for the session "456", the memory usage for process "123", the size of
the file "/etc/passwd", and the idle time of the system.
.NH 3
.Tc SunOS Resources
.LP
On a Sun system running SunOS, PBS supports several resources in addition to
the above list.
.IP availmem
Returns the virtual memory available to be used.
.IP quota\ 
Returns information about disk quotas.  Several parameters are
used.  The first must be
.Ar type
which can have a value of
.Ar harddata ,
.Ar softdata ,
.Ar currdata ,
.Ar hardfile ,
.Ar softfile ,
.Ar currfile ,
.Ar timedata
or
.Ar timefile .
The second must be
.Ar dir
which is used to specify the directory of the file system for which
quota information is desired.  The last parameter must be
.Ar user
which is used to specify the user name or id number whose quota
information is to be retrieved.
The type "harddata" returns the hard limit for data storage in kilobytes.
The type "softdata" returns the warning limit for data storage in kilobytes.
The type "currdata" returns the current usage of data storage in kilobytes.
The type "hardfile" returns the hard limit for the number of files.
The type "softfile" returns the warning limit for the number of files.
The type "currfile" returns the current number of files.
The type "timedata" returns the number of seconds that a user has left in
the grace period for excessive disk use, or zero if the grace period is
not active.
The type "timefile" returns the number of seconds that a user has left in
the grace period for having an excessive number of files, or zero if
the grace period is not active.
.IP resi
Returns the resident memory size in kilobytes for a
.Ar proc
or
.Ar session .
.IP totmem
Returns the total virtual memory which exists in the system in kilobytes.
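.LP
As a sketch, the following requests use the SunOS memory resources
described above.
The process and session numbers are placeholders chosen for
illustration, not values from a real system.
.Cs
availmem
totmem
resi[proc=8765]
resi[session=4567]
.Ce
In the given order, these would return the available virtual memory,
the total virtual memory in kilobytes, and the resident memory size in
kilobytes of process "8765" and of session "4567".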
.NH 3
.Tc Digital Unix
.LP
The resources for systems running Digital Unix include the standard set
plus:
.IP platform
Returns the cpu type as a string.
.NH 3
.Tc FreeBSD Resources
.LP
The FreeBSD resources include the standard set plus all those supported
by the Sun under SunOS except totmem and availmem.
.NH 3
.Tc SGI Resources
.LP
The Silicon Graphics resources for either Irix 5 or 6 include the standard
set plus all those supported by the Sun under SunOS.  If the MOM is
built for "irix6array", an additional resource is available:
.IP availmask
Returns a MAXCNODES-bit string with a '1' in each position where there
are two CPUs available for a node.
.NH 3
.Tc Solaris Resources
.LP
The Sun Solaris resources include the standard set with the
addition of the following.
.IP platform
Returns the cpu type as a string, for example
.Ty SUNW,Ultra-1 . 
.NH 3
.Tc Fujitsu Resources
.LP
The Fujitsu UXP/V resources include the standard set with the
addition of the following.
.IP totmem
Returns the total virtual memory which exists in the system in kilobytes.
.IP availmem
Returns the virtual memory available to be used.
.NH 3
.Tc Cray Unicos Resources
.LP
Cray Unicos has some fundamental differences from both the Sun and SGI.
It does not use virtual memory and it has a much more sophisticated
quota system.  Some of the same resource names are used with slightly
different meanings.  All of the standard resources are included
as well as all the resources of the SGI except "walltime".
Unicos uses a periodic data gathering routine to calculate
swap rate and cpu idle values.  The time period that is used
for each calculation is SAMPLE_DELTA which is set to 10 seconds
by default.
.IP availmem
Returns the memory which is available for use by programs.
.IP cpuidle
This is the percent of idle time that all the processors have experienced
within the previous SAMPLE_DELTA seconds.  This is a time filtered value.
.IP cpuguest
This is the percent of time that all the processors have spent running
a guest operating system
within the previous SAMPLE_DELTA seconds.  This is a time filtered value.
.IP cpusysw
This is the percent of time that all the processors have spent in
system wait
within the previous SAMPLE_DELTA seconds.  This is a time filtered value.
.IP cpuunix
This is the percent of time that all the processors have spent running
kernel code
within the previous SAMPLE_DELTA seconds.  This is a time filtered value.
.IP cpuuser
This is the percent of time that all the processors have spent running
user code
within the previous SAMPLE_DELTA seconds.  This is a time filtered value.
.IP totmem
Returns the total memory in kilobytes minus what is taken
by the kernel.
.IP swapavail
The number of characters of free space in the swap device(s).
.IP swapinrate
The swap in activity within the previous SAMPLE_DELTA seconds in characters per
second.  This is also a time filtered value.
.IP swapoutrate
The swap out activity within the previous SAMPLE_DELTA seconds in characters per
second.  This is also a time filtered value.
.IP swaprate
The swap activity within the previous SAMPLE_DELTA seconds in characters per
second.  This includes both swap in and swap out transfers and is
time filtered such that previous values of
.Ar swaprate
are used to smooth the current calculated value.
.IP swaptotal
The total number of characters in the swap device(s).
.IP swapused
The number of characters of active data in the swap device(s).
.IP quota\ 
The same quota types as the Sun and SGI are supported by the C90.
The Cray supports several others as well.  The additional types
only operate if the resource monitor is compiled with the symbol
.Ty SRFS .
They are "snap_avail", "ares_avail",
"res_total", "soft_res", "delta" and "reserve".
Additionally, if
.Ty SRFS
is defined, the "dir" attribute can specify one of several special directories
by starting with a dollar sign ($) character.  The allowed special names are
.Ty $TMPDIR ,
.Ty $BIGDIR ,
.Ty $FASTDIR
or
.Ty $WRKDIR .
These names are read from the file
.Ty /etc/tmpdir.conf
so in theory they could be changed but in practice they have been
hardwired into MOM and the Server as resources.  The file just gives
the administrator the ability to change the actual directory
where space will be allocated for a user making a SRFS request.
.sp
If the "type" attribute is one of the SRFS values, there can
be no other parameters.  If the "type" attribute is one of the standard
values, there must be one more parameter.  It can have a name of
"user", "group" or "account" and a value of a name or id number.
The standard types have the same meaning as on the other machines.
The meanings of the SRFS types are very UNICOS specific.
They are
.Ar snap_avail ,
.Ar ares_avail ,
.Ar res_total ,
.Ar soft_res ,
.Ar delta ,
and
.Ar reserve .
The type "snap_avail" returns the amount of total space, in
characters.
The type "ares_avail" returns
.Ar snap_avail
less unused job reserved space.
The type "res_total"
returns the total amount of space reserved for jobs.
The type "soft_res"
returns "true" if soft reservation is allowed and "false" otherwise.
The type "delta"
returns the setting of over or under commitment.
The type "reserve"
returns the number of characters committed to "srfs_assist" mode jobs.
For more information, see the UNICOS SRFS documentation.
.IP srfs_reserve
This gives the ability to set the number of characters committed to
"srfs_assist" mode jobs.  The number is not a delta value so if
there is more than one job in "srfs_assist" mode, their requests
must be added together before making this call.
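.LP
As a sketch, a scheduler deciding whether a Unicos host can accept more
work might send the following requests as one group.
The selection of resources is an assumption made for illustration.
.Cs
cpuidle
swaprate
swapavail
availmem
.Ce
The cpuidle and swaprate values reflect activity over the previous
SAMPLE_DELTA seconds (10 seconds by default) rather than an
instantaneous reading.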
.NH 3
.Tc Cray Unicos MK 2 Resources
.LP
Cray Unicos MK 2 has some of the same resources as Unicos.  Quite a
few are missing because Unicos MK 2 typically comes in a "binary"
release that does not include header files that allow programs
such as PBS's MOM to retrieve information about processes in a general
way.  The resources available are "cput", "totmem", "availmem",
"ncpus" (the comment for this says "Number of started processors" but
it does not show the number of PEs), "physmem", "size", "idletime"
and "quota".
.LP
In addition to these, PBS under Unicos/MK2 supports the following
resources:
.IP mppe_app
The number of application PEs currently configured.
.IP mppe_free
The total number of application PEs currently unallocated.
.IP mppe_avail
The size of the largest contiguous block of application PEs.
.IP mppe_info
Returns the above three resources at the same time (intended to
provide a scheduler interface with lower overhead).
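.LP
As a sketch, a scheduler that needs all three PE counts could send the
single request
.Cs
mppe_info
.Ce
in place of the three separate requests
.Cs
mppe_app
mppe_free
mppe_avail
.Ce
The layout of the combined reply is not described in this section.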
.LP
.NH 3
.Tc IBM SP
.LP
Basically, the SP is a special case of AIX.
However, because the SP Parallel Operating Environment works outside of PBS,
PBS does not have
access to memory usage or cpu time of tasks other than serial processes run
from the job script.
.NH 2
.Tc "\f3Libraries\fP"
.LP
A number of libraries exist to communicate with MOM.  One is specialized
as a Resource Monitor interface and another as a Task Manager interface.
There is also a general network interface that uses UDP.  These
libraries are described below.
.NH 3
.Tc Resource Monitor Library
.LP
The resource monitor library contains functions to facilitate
communication with the resource monitor.  It is set up to make
it easy to connect to several resource monitors and handle the
network communication efficiently.  See the IDS for details.
.NH 3
.Tc Task Management Library
.LP
.so ../man3/tm.3
.NH 3
.Tc Reliable Packet Protocol Library
.LP
.so ../man3/rpp.3
.\" force next chapter to odd page
.bp
.if e \{
\&
.sp 10
.DS C
[Page intentionally left blank.]
.DE
.bp
\}
