.nr % 1
.OH ''PBS IDS'RESOURCE MONITOR'
.EH 'RESOURCE MONITOR'PBS IDS''
.P1
.so ids_setup.ms
.Rv $Revision: 2.2 $
.nr H1 6
.NH 1
.Tc \f3\s+2Resource Monitor\s-2\fP
.LP
.OF 'Chapt \*(rV''\n(H1-%'
.EF '\n(H1-%''Chapt \*(rV'
.\"         Portable Batch System (PBS) Software License
.\" 
.\" Copyright (c) 1999, MRJ Technology Solutions.
.\" All rights reserved.
.\" 
.\" Acknowledgment: The Portable Batch System Software was originally developed
.\" as a joint project between the Numerical Aerospace Simulation (NAS) Systems
.\" Division of NASA Ames Research Center and the National Energy Research
.\" Supercomputer Center (NERSC) of Lawrence Livermore National Laboratory.
.\" 
.\" Redistribution of the Portable Batch System Software and use in source
.\" and binary forms, with or without modification, are permitted provided
.\" that the following conditions are met:
.\" 
.\" - Redistributions of source code must retain the above copyright and
.\"   acknowledgment notices, this list of conditions and the following
.\"   disclaimer.
.\" 
.\" - Redistributions in binary form must reproduce the above copyright and 
.\"   acknowledgment notices, this list of conditions and the following
.\"   disclaimer in the documentation and/or other materials provided with the
.\"   distribution.
.\" 
.\" - All advertising materials mentioning features or use of this software must
.\"   display the following acknowledgment:
.\" 
.\"   This product includes software developed by NASA Ames Research Center,
.\"   Lawrence Livermore National Laboratory, and MRJ Technology Solutions.
.\" 
.\"         DISCLAIMER OF WARRANTY
.\" 
.\" THIS SOFTWARE IS PROVIDED BY MRJ TECHNOLOGY SOLUTIONS ("MRJ") "AS IS" 
.\" WITHOUT WARRANTY OF ANY KIND, AND ANY EXPRESS OR IMPLIED WARRANTIES, 
.\" INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, 
.\" FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT ARE EXPRESSLY
.\" DISCLAIMED.
.\"
.\" IN NO EVENT, UNLESS REQUIRED BY APPLICABLE LAW, SHALL MRJ, NASA, NOR
.\" THE U.S. GOVERNMENT BE LIABLE FOR ANY DIRECT DAMAGES WHATSOEVER,
.\" NOR ANY INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\" 
.\" This license will be governed by the laws of the Commonwealth of Virginia,
.\" without reference to its choice of law rules.
The Resource Monitor is an adjunct to the Job Scheduler. The Resource Monitor
daemon provides the scheduler with information about resources on the local
system. 
.NH 2
.Tc "\f3Resource Monitor Overview\fP"
.LP
The Resource Monitor is part of
.B pbs_mom .
It listens for input on a specified socket
and responds with a list of resource names and values. The Resource Monitor
can respond to requests from many processes, but the socket used
is privileged, so only a root process can connect.
.QP
Note that pbs_mom no longer deals with allocation of execution nodes.  
That function has been moved to pbs_server as part of the full parallel
awareness features introduced in release 1.1.12.
.NH 2
.Tc Packaging
.LP
This chapter of the IDS discusses only the parts of pbs_mom which pertain
to the Resource Monitor function.
The other pieces of pbs_mom are related to job execution.  These are discussed
in the following chapter entitled
.B "MOM - Machine Oriented Miniserver" .
.NH 2
.Tc \f3Program: pbs_mom\fP
.LP
The Resource Monitor portion of pbs_mom consists of an initialization
section and a single main loop which it shares with the rest of pbs_mom.
During the initialization phase, pbs_mom processes the command
line and calls
.Ar init_network()
to begin listening for clients.
The main loop consists of waiting for a message from a client by calling
.Ar wait_request()
which will read the input and call a routine to process the request.
This routine will obtain the required resource values, then
send the information back to the client. 
.LP
The Resource Monitor may also respond to a reconfiguration command by reading
a specified configuration file.
.NH 3
.Tc Configuration File
.LP
The configuration file provides a means to add resource names
to the Resource Monitor and also cause functions to be called.
This is described in the
.B pbs_mom
man page.
.NH 3
.Tc External Interfaces
.LP
The Resource Monitor communicates with the Job Scheduler using the
Reliable Packet Protocol (RPP) routines in the PBS net library.
Communication from the scheduler to the resource monitor consists of a
list of resource names. The resource monitor responds with a list of name/value
pairs.
.IP -
All information is passed as strings.
.IP -
All numeric values are in decimal.
.IP -
Time values are in seconds.
.IP -
Size (memory/disk) values are in kilobytes with the suffix \*Qkb\*U appended.
.LP
.NH 4
.Tc "Scheduler to Resource Monitor communication"
.LP
Scheduler to Resource Monitor messages consist of a header, followed by a
message body.  The format of the message is:
.IP
header, containing command:
.br
.Sc RM_CMD_CLOSE ,
.Sc RM_CMD_REQUEST ,
.Sc RM_CMD_CONFIG
or
.Sc RM_CMD_SHUTDOWN
.IP
command body
.LP
The body of the message has a different usage for each command.  For the
RM_CMD_CLOSE and RM_CMD_SHUTDOWN commands, the body is ignored and
should be zero length.
.LP
For the RM_CMD_REQUEST command, the body consists of a number of
strings listing resource requests.  Each string has the following format:
.IP
.Ty name[qualifier=value][qualifier=value] ...
.LP
The qualifier/value pairs are enclosed in square brackets and are optional.
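.LP
As an illustration of the request format above, the following sketch
builds one request string from a resource name and a list of
qualifier/value pairs.  The names build_request and struct qual are
hypothetical and are not part of the RM library; this is a minimal
sketch under those assumptions, not the PBS implementation.
.Cs
```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: one qualifier/value pair of a request. */
struct qual {
	const char	*name;
	const char	*value;
};

/*
 * Write "name[q1=v1][q2=v2]..." into buf.
 * Returns the length of the string, or -1 if buf is too small.
 */
int
build_request(char *buf, size_t len, const char *name,
	      const struct qual *q, int nq)
{
	int	used, n, i;

	used = snprintf(buf, len, "%s", name);
	if (used < 0 || (size_t)used >= len)
		return -1;
	for (i = 0; i < nq; i++) {
		n = snprintf(buf + used, len - (size_t)used, "[%s=%s]",
			     q[i].name, q[i].value);
		if (n < 0 || (size_t)n >= len - (size_t)used)
			return -1;
		used += n;
	}
	return used;
}
```
.Ce
.LP
With no pairs the result is just the resource name, which matches the
statement that the qualifier/value pairs are optional.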
.LP
For the RM_CMD_CONFIG command, the body should have a single string
containing the full path name of a configuration file to read.
.NH 4
.Tc Resource Monitor to Scheduler communication
.LP
Resource Monitor to Scheduler messages consist of a header, followed by
a message body.
The format of the message is:
.IP
header, containing result:
.Er RM_RSP_OK
or
.Er RM_RSP_ERROR 
.IP
response body
.LP
If the command received was RM_CMD_CLOSE, no response will be returned.
If the command received was RM_CMD_REQUEST, the response
body will consist of the same list of resources which was sent in the
command body with each one followed by an equal sign (=) and
a value.  Each line in the response body has the form
.Ty resource=value .
If no value can be returned, the character following the equal sign
is a question mark (?) followed by a space and an error number:
.Er RM_ERR_UNKNOWN ,
.Er RM_ERR_BADPARAM ,
.Er RM_ERR_NOPARAM ,
.Er RM_ERR_EXIST ,
or
.Er RM_ERR_SYSTEM .
.LP
If the value is a single entity, the character following the equal
sign will not be a space.  If the value is a list, the character
following the equal sign will be a space and each list entity will
be separated from the next with another space.
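.LP
The three response forms described above (single value, list, and
error) can be told apart by looking at the characters which follow the
equal sign.  The sketch below shows one way a client might do this.
The function classify_response and the enum rsp_kind are hypothetical
names introduced only for this example.
.Cs
```c
#include <string.h>

enum rsp_kind { RSP_BAD, RSP_SINGLE, RSP_LIST, RSP_ERROR };

/*
 * Classify one "resource=value" response line.  On success *val
 * points at the first character of the value, of the first list
 * entity, or of the error number text.
 */
enum rsp_kind
classify_response(const char *line, const char **val)
{
	const char	*eq = strchr(line, '=');

	if (eq == NULL)
		return RSP_BAD;
	if (eq[1] == '?' && eq[2] == ' ') {
		*val = eq + 3;		/* error number follows "? " */
		return RSP_ERROR;
	}
	if (eq[1] == ' ') {
		*val = eq + 2;		/* space separated list */
		return RSP_LIST;
	}
	*val = eq + 1;			/* single entity */
	return RSP_SINGLE;
}
```
.Ce
.LP
The error number is left as text here; mapping it to one of the
RM_ERR symbols is up to the caller.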
.LP
If any other command was received, the response body will be zero length.
.NH 4
.Tc Communication Library
.LP
To simplify communication with the resource monitor,
a Resource Monitor (RM) library has been provided to
handle the details of the protocol described above.  The Reliable
Packet Protocol (RPP) and Data Is Strings (DIS) libraries
are used as well.
.NH 4
.Tc Signal Handling
.LP
Sending a SIGHUP signal to the Resource Monitor, pbs_mom, commands it to
re-read the configuration file which was last read.  If no configuration
file has ever been read, no action will take place.  An orderly shutdown
of the Resource Monitor, pbs_mom, will take place if a SIGINT or SIGTERM signal
is received.  On systems which define them, several other signals also cause
an orderly shutdown.  These are SIGXCPU, SIGXFSZ, SIGCPULIM, SIGSHUTDN,
and SIGINFO.
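.LP
A common way to arrange this kind of signal handling is to have the
handler only set a flag which the main loop examines later.  The sketch
below assumes that approach; the flag names and install_handlers are
hypothetical, and the code in pbs_mom may be organized differently.
Only the three signals found on every POSIX system are shown.
.Cs
```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t	want_reconfig = 0;	/* SIGHUP seen */
static volatile sig_atomic_t	want_shutdown = 0;	/* SIGINT/SIGTERM */

/* The handler records the event; real work is left to the main loop. */
static void
on_signal(int sig)
{
	if (sig == SIGHUP)
		want_reconfig = 1;
	else
		want_shutdown = 1;
}

/* Returns 0 on success, -1 if a handler could not be installed. */
int
install_handlers(void)
{
	struct sigaction	act;

	memset(&act, 0, sizeof(act));
	act.sa_handler = on_signal;
	sigemptyset(&act.sa_mask);
	if (sigaction(SIGHUP, &act, NULL) == -1 ||
	    sigaction(SIGINT, &act, NULL) == -1 ||
	    sigaction(SIGTERM, &act, NULL) == -1)
		return -1;
	return 0;
}
```
.Ce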
.NH 3
.Fi resmon.h
.LP
This file defines several structures which will be used
throughout the code as well as some constant values such as
error codes.
.LP
The
.Ar rm_attribute
structure is used to pass name/value pairs from square bracket enclosed
strings in a request to lower level
routines in a convenient form.
.LP
.Cs
struct	rm_attribute {
	char	*a_qualifier;
	char	*a_value;
};
.Ce
The field
.Ar a_qualifier
points to the name to the left of the equal sign.  For example,
the string
.Ty [proc=1234]
could be sent as a qualifier for the
.Ar mem
request.  Here,
.Ar a_qualifier
would point to the string
.Ty proc
and
.Ar a_value
would point to
.Ty 1234 .
.LP
The
.Ar config
structure is used to save a name to be used as a key
for searching and a value or function call to provide an "answer"
for the name in question.
.LP
.Cs
typedef	char	*(*confunc) _A((struct rm_attribute *));
struct	config {
	char	*c_name;
	union	{
		confunc	c_func;
		char	*c_value;
	} c_u;
};
.Ce
For example, suppose the name
.Ty Informix
is found in the config file followed by the value
4.10.UD2
for the version.  In this case,
.Ar c_name
would point to
.Ty Informix
and
.Ar c_value
would point to
.Ty 4.10.UD2 .
In the case of a name that will have a routine provide a value,
the field
.Ar c_func
is used to provide a pointer to the function.
.NH 3
.Fi mom_main.c
.LP
This file contains the routines needed for communication and
processing an array of configuration elements (names and values).
.LP
.Fn main()
.Cs
main(\^int argc, char **argv)
.Ce
.IP Description: 4
Process command line arguments, and call
.Ar read_config()
to read any config files specified.
Set up to ignore or catch signals.
Call
.Ar dep_initialize()
to perform initialization processing based on machine type.
Initialize the network communications by calling
.Ar init_network()
in the PBS net library.
Enter an infinite processing loop which calls
.Ar wait_request()
with
.Ar get_request()
given as the routine to call to handle a request.
Each time a network event or timeout takes place, the routine
.Ar end_proc()
is called to do periodic processing.  The only machine that takes
advantage of this feature right now is the C90.  Others just have
a stub.
.LP
.Fn read_config()
.Cs
int read_config(char *file)
.Ce
.IP Returns: 4
0 on success or 1 on failure.
.IP Description: 4
If the value for the parameter
.Ar file
is not NULL, save the string it points to as the last seen
configuration filename.  If
.Ar file
is NULL, use the previously saved configuration filename.
Open and read the configuration file.  Save the names and values in a linked
list so we can count the number of entries and allocate an array
to hold them.  If the name starts with a dollar sign ($),
this is an entry which should be found in an internal table and
result in a function call.
After reading the file,
create an array, copy the list elements to the array and
free the list.
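.LP
The list-then-array step described above can be sketched as follows.
The types and function names here (struct entry, struct node,
list_push, list_to_array) are hypothetical stand-ins, and the
terminating entry with a NULL name is an assumption of this example.
.Cs
```c
#include <stdlib.h>

struct entry {			/* one array element */
	char	*name;
	char	*value;
};

struct node {			/* temporary list element */
	struct entry	 e;
	struct node	*next;
};

/* Push an entry on the front of the list.  Returns 0, or -1 on error. */
int
list_push(struct node **head, char *name, char *value)
{
	struct node	*np = malloc(sizeof(*np));

	if (np == NULL)
		return -1;
	np->e.name = name;
	np->e.value = value;
	np->next = *head;
	*head = np;
	return 0;
}

/*
 * Count the list, copy it into a new array in file order, add a
 * terminating entry with a NULL name, and free the list nodes.
 * Returns NULL (leaving the list intact) if allocation fails.
 */
struct entry *
list_to_array(struct node *head, int *count)
{
	struct node	*np, *next;
	struct entry	*arr;
	int		 n = 0, i;

	for (np = head; np != NULL; np = np->next)
		n++;
	arr = malloc(((size_t)n + 1) * sizeof(*arr));
	if (arr == NULL)
		return NULL;
	i = n;			/* list is newest-first: fill from the back */
	for (np = head; np != NULL; np = next) {
		next = np->next;
		arr[--i] = np->e;
		free(np);
	}
	arr[n].name = NULL;
	arr[n].value = NULL;
	*count = n;
	return arr;
}
```
.Ce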
.LP
.Fn addclient()
.Cs
int addclient(char *name)
.Ce
.IP Args: 4
.Ar name
is the hostname to be added to the list of hosts which will be allowed
to make requests of Mom.  The routine
.B gethostbyname()
is called and the IP address of the host is stored in the array
.Ar okclients .
.LP
.Fn setlogevent()
.Cs
static u_long setlogevent(char *value)
.Ce
.IP Args: 4
.Ar value
is the value, in either decimal or hex, to which the log event mask is set.
.LP
Sets the external long integer
.Ar log_event_mask
to the value.  Returns 0 if there is an error in the value, such as an illegal
character; returns 1 if ok.
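.LP
One way to accept either decimal or hex is to let strtoul() choose the
base.  The sketch below assumes that technique; the source does not say
how the value is actually parsed.  The name set_log_event is
hypothetical, and the static log_event_mask here only stands in for the
external variable.
.Cs
```c
#include <errno.h>
#include <stdlib.h>

static unsigned long	log_event_mask;	/* stand-in for the PBS global */

/* Returns 1 if value parsed cleanly, 0 on error, as described above. */
unsigned long
set_log_event(const char *value)
{
	char		*end;
	unsigned long	 mask;

	if (value == NULL || *value == 0)
		return 0;
	errno = 0;
	/* Base 0: a "0x" prefix selects hex.  Note that a plain */
	/* leading zero selects octal, which may not be wanted.  */
	mask = strtoul(value, &end, 0);
	if (errno != 0 || end == value || *end != 0)
		return 0;	/* overflow, no digits, or trailing junk */
	log_event_mask = mask;
	return 1;
}
```
.Ce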
.LP
.Fn restricted()
.Cs
static  u_long restricted(char *name)
.Ce
.IP Args: 4
.Ar name
is the name of a host.
.LP
The named host is allowed to query internal or static resources,
but not any that require the execution of a script.  This was provided
to allow xpbsmon to obtain information about nodes in a cluster.
The name is added to the
.Ar maskclient
array.  A connecting host is checked against this array in
.I bad_restrict() .
.Fn cputmult()
.Cs
static u_long cputmult(char *value)
.Ce
.IP Args: 4
.Ar value
is a string representing a floating point multiplier.
.LP
The multiplier is used to adjust the measured/charged cput against a faster or
slower base system.
.Fn wallmult()
.Cs
static u_long wallmult(char *value)
.Ce
.IP Args: 4
.Ar value
is a string representing a floating point multiplier.
.LP
The multiplier is used to adjust the measured/charged wall time against a
faster or slower base system.
.Fn usecp()
.Cs
static u_long usecp(char *value)
.Ce
.IP Args: 4
.Ar value
is a string containing two tokens separated by white space.
.LP
This routine parses the $usecp config file entry.  Value is broken into the
two tokens.  The first token is of the form 
.Ty hostname:/file/path .
The second token is
.Ty /alternate/path .
The host name is separated from the /file/path and the (now) three parts are
stored in an array of structures.   This array is used by
.I told_to_cp()
on behalf of
.I local_or_remote()
to determine if /bin/cp or rcp should be used to copy files.
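.LP
The splitting of the $usecp value into three parts can be sketched as
shown below.  The structure cphost, its field sizes, and the function
parse_usecp are hypothetical; only the input format is taken from the
description above.
.Cs
```c
#include <ctype.h>
#include <string.h>

struct cphost {
	char	host[64];	/* hostname */
	char	from[256];	/* /file/path */
	char	to[256];	/* /alternate/path */
};

/* Advance to the end of the current white space delimited token. */
static const char *
skip_token(const char *p)
{
	while (*p != 0 && !isspace((unsigned char)*p))
		p++;
	return p;
}

/*
 * Parse "hostname:/file/path /alternate/path" into *cp.
 * Returns 0 on success, -1 if a part is missing or too long.
 */
int
parse_usecp(const char *value, struct cphost *cp)
{
	const char	*colon, *from, *from_end, *to, *to_end;
	size_t		 hlen, flen, tlen;

	colon = strchr(value, ':');
	if (colon == NULL)
		return -1;
	from = colon + 1;
	from_end = skip_token(from);
	to = from_end;
	while (isspace((unsigned char)*to))
		to++;
	to_end = skip_token(to);

	hlen = (size_t)(colon - value);
	flen = (size_t)(from_end - from);
	tlen = (size_t)(to_end - to);
	if (hlen == 0 || flen == 0 || tlen == 0)
		return -1;
	if (hlen >= sizeof(cp->host) || flen >= sizeof(cp->from) ||
	    tlen >= sizeof(cp->to))
		return -1;
	memcpy(cp->host, value, hlen);
	cp->host[hlen] = 0;
	memcpy(cp->from, from, flen);
	cp->from[flen] = 0;
	memcpy(cp->to, to, tlen);
	cp->to[tlen] = 0;
	return 0;
}
```
.Ce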
.Fn rm_search()
.Cs
struct config *rm_search(struct config *where, char *what)
.Ce
.IP Args: 4
The
.Ar where
pointer is the beginning of an array of config structures which
are to be searched.
The
.Ar what
pointer is a character string which is the name to search for.
.IP Description: 4
Enter a loop to check each config entry in where.  If one is found
with a name field that matches
.Ar what ,
return that entry.
If no match is found, return a NULL pointer.
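.LP
A minimal sketch of such a search is given below, using the structures
from resmon.h written with plain ANSI prototypes.  It assumes that the
array ends with an entry whose
.Ar c_name
is NULL; the text above does not say how the end of the array is
marked, so treat that as an assumption of the example.
.Cs
```c
#include <stddef.h>
#include <string.h>

struct rm_attribute {
	char	*a_qualifier;
	char	*a_value;
};

typedef char	*(*confunc)(struct rm_attribute *);

struct config {
	char	*c_name;
	union {
		confunc	 c_func;
		char	*c_value;
	} c_u;
};

/*
 * Linear search of a config array.
 * ASSUMPTION: the array is terminated by an entry with a NULL c_name.
 */
struct config *
rm_search(struct config *where, char *what)
{
	struct config	*cp;

	if (where == NULL || what == NULL)
		return NULL;
	for (cp = where; cp->c_name != NULL; cp++) {
		if (strcmp(cp->c_name, what) == 0)
			return cp;	/* first match wins */
	}
	return NULL;
}
```
.Ce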
.LP
.Fn dependent()
.Cs
char *dependent(char *resource, struct rm_attribute *attr)
.Ce
.IP Args: 4
The
.Ar resource
character array is the name of the resource to search for.
The
.Ar attr
pointer specifies a qualifier/value pair in an rm_attribute structure.
.IP Description: 4
This is the routine which will report back
values for resources.  The
.Ar rm_search()
routine
is used to search the array
.Ar dependent_config
of type
.Ar "struct config"
contained in the dependent code.
If the search returns a match, the function in the dependent code
pointed to by the matching entry in the array is called with
.Ar attr
as the parameter.
.IP Return: 4
A string with the value returned from the dependent function.
If no match was found, return a NULL.
.LP
.Fn initialize()
.Cs
void initialize();
.Ce
.IP Description: 4
Set up the
.Ar common_config
array with the entries for "avail", "reserve", "totpool" and "usepool".
Then call
.Ar read_nodes()
and
.Ar dep_initialize() .
.LP
.Fn cleanup()
.Cs
void cleanup();
.Ce
.IP Description: 4
Free all the memory for the node list and call
.Ar dep_cleanup() .
.LP
.Fn get_request()
.Cs
void get_request(int fd);
.Ce
.IP Description: 4
Read the socket to get a request.  Check to see if there is
any previously saved input from this socket.  If there is, add the buffer
just read to the saved input.  Check to see if this input has
an end of packet mark.  If not, return to wait to complete the
packet.  If so, format the reply
and write it back.  If the request is for a resource list,
increment the counter
.Ar reqnum
so the dependent
routine can tell which "packet number" it
is processing.  Next, call
.Ar getattr()
to read the first parameter, if any.  If any other parameters
are needed by the dependent routine, it can call
.Ar getattr()
with a NULL pointer argument.
.LP
.Fn getattr()
.Cs
struct rm_attribute *getattr(char *str);
.Ce
.IP Description: 4
Get an rm_attribute structure from a string.  Remember the
.Ar str
character pointer in a static variable.  If a NULL pointer is passed
for the string, use the previously remembered pointer.
If the rm_attribute name is "tag:", continue to the next attribute.
This allows the use of a special attribute which will be ignored.
For this feature to work correctly, the "tag:" attribute must be the
first one on the line for a request.  This is because
.Ar getattr()
saves the strings for the name and value in static strings which
will be overwritten by subsequent qualifiers.
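.LP
The remembered-pointer behaviour can be sketched as below.  This is
not the PBS routine: the name getattr_sketch and the buffer sizes are
hypothetical, the string is assumed to start at the first open
bracket, and the special "tag:" handling is left out.
.Cs
```c
#include <stddef.h>
#include <string.h>

struct rm_attribute {
	char	*a_qualifier;
	char	*a_value;
};

/*
 * Return the next "[qualifier=value]" from str, or from the position
 * remembered by the previous call if str is NULL.  The result points
 * at static storage which the next call overwrites.
 */
struct rm_attribute *
getattr_sketch(char *str)
{
	static char			*pos;	/* remembered position */
	static char			 qual[80], val[256];
	static struct rm_attribute	 attr = { qual, val };
	char				*eq, *close;
	size_t				 qlen, vlen;

	if (str != NULL)
		pos = str;
	if (pos == NULL || *pos != '[')
		return NULL;		/* nothing (more) to parse */
	eq = strchr(pos, '=');
	close = strchr(pos, ']');
	if (eq == NULL || close == NULL || eq > close)
		return NULL;		/* malformed pair */
	qlen = (size_t)(eq - (pos + 1));
	vlen = (size_t)(close - (eq + 1));
	if (qlen >= sizeof(qual) || vlen >= sizeof(val))
		return NULL;
	memcpy(qual, pos + 1, qlen);
	qual[qlen] = 0;
	memcpy(val, eq + 1, vlen);
	val[vlen] = 0;
	pos = close + 1;		/* resume here on a NULL call */
	return &attr;
}
```
.Ce
.LP
Because the qualifier and value live in static buffers, a caller which
needs more than one attribute at a time must copy each one before
asking for the next.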
.LP
.Fn arch()
.Cs
char *arch(struct rm_attribute *attrib)
.Ce
.IP Description: 4
Return the PBS_MACH string defined in "local.mk".
.LP
.Fn conf_res()
.Cs
char *conf_res(char *s, struct rm_attribute *attr)
.Ce
.IP Description: 4
Return a value for a resource from the configuration file.
If a match is found in
.Ar get_request()
for a resource read from the configuration file, this routine is
called to generate the reply.  The parameter
.Ar s
is the value for the resource in the config file.  The parameter
.Ar attr
is the pointer to the first attribute from the request given by
.Ar getattr() .
If
.Ar s[0]
is an exclamation mark (!), this is a shell escape resource.
If not, then
.Ar attr
must be NULL or an error occurs.  This is because a static resource
has a fixed value and cannot be modified by an attribute.
.sp
If the resource is a shell escape, enter a loop to save all the
attributes sent with the query.  Then enter a loop to scan
the command string passed in
.Ar *s .
Every time a percent character (%) is found, check to see if
a parameter substitution should take place by looking to see
if a token follows the percent sign and matches one of the saved
attribute names.  If so, copy the attribute value into the output
string, otherwise, just copy the current character.  When the
scan is done, check to see if any attributes were not used.  If
so, return an error.  Otherwise, call
.B popen()
to run a shell with the command generated from the request.
Return the first line read from standard out from this command.
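.LP
The percent substitution scan can be sketched as follows.  The function
expand_cmd and its interface are hypothetical, and a token is taken
here to be a run of letters and digits, which is an assumption.  The
caller would report an error if any entry of
.Ar used
is still zero afterwards, and would otherwise pass the result to
.B popen() .
.Cs
```c
#include <ctype.h>
#include <string.h>

struct rm_attribute {
	char	*a_qualifier;
	char	*a_value;
};

/*
 * Copy cmd to out, replacing each %name which matches a saved
 * attribute name with that attribute's value.  used[i] is set to 1
 * for every attribute substituted.  Returns 0, or -1 if out is full.
 */
int
expand_cmd(const char *cmd, const struct rm_attribute *attrs, int n,
	   int *used, char *out, size_t outlen)
{
	size_t	o = 0;
	int	i;

	if (outlen == 0)
		return -1;
	for (i = 0; i < n; i++)
		used[i] = 0;
	while (*cmd != 0) {
		const char	*src = cmd;	/* default: copy one char */
		size_t		 len = 1, skip = 1;

		if (*cmd == '%') {
			size_t	tok = 0;

			while (isalnum((unsigned char)cmd[1 + tok]))
				tok++;
			for (i = 0; tok > 0 && i < n; i++) {
				if (strlen(attrs[i].a_qualifier) == tok &&
				    strncmp(attrs[i].a_qualifier,
					    cmd + 1, tok) == 0) {
					src = attrs[i].a_value;
					len = strlen(src);
					skip = tok + 1;
					used[i] = 1;
					break;
				}
			}
		}
		if (o + len >= outlen)
			return -1;
		memcpy(out + o, src, len);
		o += len;
		cmd += skip;
	}
	out[o] = 0;
	return 0;
}
```
.Ce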
.NH 3
.Fi sunos4/mom_mach.c
.LP
This is the code used to report
values from a Sun workstation.  It will be used as an example
to make it possible to write code to be used on
another type of machine.
.Fn dep_initialize()
.Cs
void dep_initialize()
.Ce
.IP Args: 4
None.
.IP Description: 4
This is one of the external entry points which will have the same
name in the code written for every type of machine.  All the steps
required to prepare the dependent section of code should be
executed here.  In this case, use
.B kvm_open()
and
.B kvm_nlist()
system calls to open the kernel for use.
.IP Returns: 4
Nothing.
.LP
.Fn getprocs()
.Cs
int getprocs()
.Ce
.IP Args: 4
None.
.IP Description: 4
This routine fills in an array with the process table of the running
system.  It first checks to see if the information has already been
retrieved by comparing
.Av reqnum
for equality to a static counter it keeps.
If it is equal, the information has already been retrieved and no
further work needs to be done.  Counter roll over is not a problem
since the comparison is for equality.
If it needs to refresh the information,
it frees the old array, then reads the kernel to get the current number
of processes.  A new array is allocated to hold the table and the
kernel is read to get the information.
.IP Returns: 4
The number of processes in the table.  Zero is returned if an
error occurs.
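.LP
The request number check can be sketched as below.  The function
load_table is a hypothetical stand-in for reading the kernel, and the
.Ar valid
flag is an addition of this example so that the very first call always
loads the table; the counter comparison itself is by equality only, as
described above.
.Cs
```c
static unsigned int	reqnum;	/* bumped once per request by the caller */
static int		loads;	/* counts real refreshes, for illustration */

/* Hypothetical stand-in for reading the kernel process table. */
static int
load_table(void)
{
	loads++;
	return 42;		/* pretend 42 processes were found */
}

/* Return the process count, refreshing at most once per request. */
int
getprocs_sketch(void)
{
	static unsigned int	lastproc;	/* reqnum of cached table */
	static int		nproc;
	static int		valid;

	if (valid && lastproc == reqnum)
		return nproc;	/* already fetched for this request */
	nproc = load_table();
	lastproc = reqnum;
	valid = 1;
	return nproc;
}
```
.Ce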
.LP
.Fn cput()
.Cs
char *cput(struct rm_attribute *attrib)
.Ce
.IP Args: 4
The parameter
.Ar attrib
is a pointer to the attribute structure returned by
.Ar getattr() .
.IP Description: 4
Check
.Ar attrib
to see if the attributes are okay.  There needs to be one attribute
with a qualifier of
"job" or "proc" with a value that is
an integer greater than zero.  Call
.Ar cput_job()
if the qualifier is "job".  Call
.Ar cput_proc()
if the qualifier is "proc".
.IP Returns: 4
A character string giving the formatted response for the cpu time in
seconds or NULL pointer if an error occurred.
.LP
.Fn cput_job()
.Cs
char *cput_job(int jobid)
.Ce
.IP Args: 4
The parameter
.Ar jobid
is used to identify a "job".  On the Sun, it will be compared with
the process group of a process.  If a match is found, it is
considered part of the same job.
.IP Description: 4
Call
.Ar getprocs()
to get the process table.  Loop over each process entry and see if
it is a member of the job identified by
.Ar jobid .
If it is, sum the cpu time used by the process into a counter.
.IP Returns: 4
A character string giving the number of seconds calculated for the
cpu time used by the job or a NULL pointer if an error occurred.
.LP
.Fn cput_proc()
.Cs
char *cput_proc(pid_t pid)
.Ce
.IP Args: 4
The parameter
.Ar pid
gives the pid of the process of interest.
.IP Description: 4
Call
.B kvm_getproc()
to get the process with the pid we are looking for.
.IP Returns: 4
A character string giving the number of seconds calculated for the
cpu time used by the process or a NULL pointer if an error occurred.
.LP
.Fn mem()
.Cs
char *mem(struct rm_attribute *attrib)
.Ce
.IP Args: 4
The parameter
.Ar attrib
is a pointer to the attribute structure returned by
.Ar getattr() .
.IP Description: 4
Check
.Ar attrib
to see if the attributes are okay.  There needs to be one attribute
with a qualifier of
"job" or "proc" with a value that is
an integer greater than zero.  Call
.Ar mem_job()
if the qualifier is "job".  Call
.Ar mem_proc()
if the qualifier is "proc".
.IP Returns: 4
A character string giving the formatted response for the memory used
in bytes or a NULL pointer if an error occurred.
.LP
.Fn mem_job()
.Cs
char *mem_job(int jobid)
.Ce
.IP Args: 4
The parameter
.Ar jobid
is used to identify a "job".  On the Sun, it will be compared with
the process group of a process.  If a match is found, it is
considered part of the same job.
.IP Description: 4
Call
.Ar getprocs()
to get the process table.  Loop over each process entry and see if
it is a member of the job identified by
.Ar jobid .
If it is, sum the memory used by the process into a counter.
.IP Returns: 4
A character string giving the number of bytes calculated for the
memory used by the job or a NULL pointer if an error occurred.
.LP
.Fn mem_proc()
.Cs
char *mem_proc(pid_t pid)
.Ce
.IP Args: 4
The parameter
.Ar pid
gives the pid of the process of interest.
.IP Description: 4
Call
.B kvm_getproc()
to get the process with the pid we are looking for.
.IP Returns: 4
A character string giving the number of bytes calculated for the
memory used by the process or a NULL pointer if an error occurred.
.LP
.Fn jobs()
.Cs
char *jobs(struct rm_attribute *attrib)
.Ce
.IP Description: 4
Check to make sure there are no attributes.
If so, call
.Ar getprocs()
and loop through the process list skipping those owned by root.
For each process job id,
check an array of saved job id's to see if it has been encountered
before.  If not, add the current job id to the array of saved job
id's.
.IP Returns: 4
A string with a space separated list of job id's of all the processes
in the system, or a NULL pointer if an error occurred.
.LP
.Fn pids()
.Cs
char *pids(struct rm_attribute *attrib)
.Ce
.IP Description: 4
Check to make sure there is only one attribute with a qualifier
of "job" and a value greater than zero.  If so, call
.Ar getprocs()
and search through the process list looking for members of
the job specified.
.IP Returns: 4
A string with a space separated list of pid's of all the processes
found to be a part of the job or a NULL pointer if an error occurred.
.LP
.Fn getanon()
.Cs
int getanon(char *id)
.Ce
.IP Args: 4
The character string
.Ar id
is used in logging to identify which routine made the call.
.IP Description: 4
The kernel maintains an area of general information called
.At anoninfo
which is retrieved by this routine.  As usual, the counter
.Av reqnum
is compared to a static counter to see if the information
is already in hand.
.IP Returns: 4
0 if all is well, 1 if an error occurred.
.LP
.Fn totmem()
.Cs
char *totmem(struct rm_attribute *attrib)
.Ce
.IP Args: 4
The parameter
.Ar attrib
is a pointer to the attribute structure returned by
.Ar getattr() .
.IP Description: 4
Check to make sure no attributes have been passed.  Then, call
.Ar getanon()
to fill in the anoninfo structure which contains the total
memory size of the machine.
.IP Returns: 4
A character string with the total memory in bytes or
a NULL pointer if an error occurred.
.LP
.Fn availmem()
.Cs
char *availmem(struct rm_attribute *attrib)
.Ce
.IP Args: 4
The parameter
.Ar attrib
is a pointer to the attribute structure returned by
.Ar getattr() .
.IP Description: 4
Check to make sure no attributes have been passed.  Then, call
.Ar getanon()
to fill in the anoninfo structure which contains the available
memory of the machine.
.IP Returns: 4
A character string with the available memory in bytes or
a NULL pointer if an error occurred.
.LP
.Fn physmem()
.Cs
char *physmem(struct rm_attribute *attrib)
.Ce
.IP Args: 4
The parameter
.Ar attrib
is a pointer to the attribute structure returned by
.Ar getattr() .
.IP Description: 4
Check to make sure no attributes have been passed.  Then, read
the kernel to get the physical
memory size of the machine.
.IP Returns: 4
A character string with the physical memory in bytes or
a NULL pointer if an error occurred.
.LP
.Fn size()
.Cs
char *size(struct rm_attribute *attrib)
.Ce
.IP Args: 4
The parameter
.Ar attrib
is a pointer to the attribute structure returned by
.Ar getattr() .
.IP Description: 4
Check
.Ar attrib
to make sure only one attribute was passed.  If so, check
the qualifier to be sure it is one we understand:  if it is
"file", call
.Ar size_file() .
If it is
"fs", call
.Ar size_fs() . 
.IP Returns: 4
A character string pointer which is returned from the function
called, or a NULL pointer if an error occurred.
.LP
.Fn size_fs()
.Cs
char *size_fs(char *param)
.Ce
.IP Args: 4
The parameter
.Ar param
is a character string which specifies the path to check.
.IP Description: 4
Use
.B statfs()
to get the information about the path specified.
.IP Returns: 4
A character string with the file system space available in bytes or
a NULL pointer if an error occurred.
.LP
.Fn size_file()
.Cs
char *size_file(char *param)
.Ce
.IP Args: 4
The parameter
.Ar param
is a character string which specifies the path to check.
.IP Description: 4
Use
.B stat()
to get the information about the path specified.
.IP Returns: 4
A character string with the file size in bytes or
a NULL pointer if an error occurred.
.LP
.Fn idletime()
.Cs
char *idletime(struct rm_attribute *attrib)
.Ce
.IP Args: 4
The parameter
.Ar attrib
is a pointer to the attribute structure returned by
.Ar getattr() .
.IP Description: 4
Check
.Ar attrib
to make sure no attributes were passed.  Use
.B opendir()
and
.B readdir()
to read the /dev directory and
.B stat()
devices which begin with "tty".  Maintain a time value holding
the maximum access time over all devices tested.  After
checking the "tty" devices, perform the same test on "/dev/kbd"
and "/dev/mouse".
.IP Returns: 4
A character string containing the difference between the current
time and the maximum access time from all devices tested.  This
value is reported in seconds.
If an error occurred, return a NULL pointer.
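.LP
The directory scan can be sketched with the same library calls.  The
function idle_seconds and its arguments are hypothetical, it returns a
number rather than a formatted string, and the extra checks of
"/dev/kbd" and "/dev/mouse" are left out.
.Cs
```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Return the seconds since the most recent access to any entry in
 * dir whose name starts with prefix, or -1 if none could be checked.
 */
long
idle_seconds(const char *dir, const char *prefix)
{
	DIR		*dp;
	struct dirent	*de;
	struct stat	 sb;
	char		 path[1024];
	time_t		 newest = 0;
	int		 seen = 0, n;

	if ((dp = opendir(dir)) == NULL)
		return -1;
	while ((de = readdir(dp)) != NULL) {
		if (strncmp(de->d_name, prefix, strlen(prefix)) != 0)
			continue;
		n = snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
		if (n < 0 || (size_t)n >= sizeof(path))
			continue;	/* name too long, skip it */
		if (stat(path, &sb) == -1)
			continue;
		if (!seen || sb.st_atime > newest)
			newest = sb.st_atime;	/* keep the maximum */
		seen = 1;
	}
	closedir(dp);
	if (!seen)
		return -1;
	return (long)(time(NULL) - newest);
}
```
.Ce
.LP
A call such as idle_seconds("/dev", "tty") corresponds to the first
part of the description above.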
.LP
.Fn walltime()
.Cs
char *walltime(struct rm_attribute *attrib)
.Ce
.IP Args: 4
The parameter
.Ar attrib
is a pointer to the attribute structure returned by
.Ar getattr() .
.IP Description: 4
Check
.Ar attrib
to see if the attributes are okay.  There needs to be one attribute
with a qualifier of
"proc" or "job" with a value that is
an integer greater than zero.  Call
.Ar getprocs()
and search through the list of processes for the process or job as
specified by the attribute.
Call
.B kvm_getu()
to get the user structure for each process.  Check to see if the
start time is less than any other process encountered.  If so,
save the start time of the process being checked.
.IP Returns: 4
A character string containing the difference between the current
time and the smallest start time found.  This
value is reported in seconds.
If an error occurred, return a NULL pointer.
.LP
.Fn loadave()
.Cs
char *loadave(struct rm_attribute *attrib)
.Ce
.IP Args: 4
The parameter
.Ar attrib
is a pointer to the attribute structure returned by
.Ar getattr() .
.IP Description: 4
Check
.Ar attrib
to make sure no attributes were passed.  Use
.B kvm_read()
to get the load average reported by the kernel.
.IP Returns: 4
A character string containing the load average of the system.
If an error occurred, return a NULL pointer.
.LP
.Fn quota()
.Cs
char *quota(struct rm_attribute *attrib)
.Ce
.IP Args: 4
The parameter
.Ar attrib
is a pointer to the attribute structure returned by
.Ar getattr() .
.IP Description: 4
Check
.Ar attrib
for an attribute with a name of "type".  This attribute
must have a value of one of
"harddata", "softdata", "currdata", "hardfile", "softfile", "currfile",
"timedata", or "timefile".
The next attribute must have the name "dir" with the value
being a directory name.  This directory specifies the file system
to check for quota information.  The last attribute must have the
name "user" with a value giving a user name or an integer specifying
a uid.  The system call
.B quotactl()
is used to get quota information for the user in the specified directory.
The type "harddata" returns the hard limit for data storage in characters.
The type "softdata" returns the warning limit for data storage in characters.
The type "currdata" returns the current usage of data storage in characters.
The type "hardfile" returns the hard limit for the number of files.
The type "softfile" returns the warning limit for the number of files.
The type "currfile" returns the current number of files.
The type "timedata" returns the number of seconds that a user has left in
the grace period for excessive disk use, or zero if the grace period is
not active.
The type "timefile" returns the number of seconds that a user has left in
the grace period for having an excessive number of files, or zero if
the grace period is not active.
.LP
.Fn dep_cleanup()
.Cs
void dep_cleanup()
.Ce
.IP Args: 4
None.
.IP Description: 4
This is another external entry point to the dependent code.
Here is where all cleanup operations take place that are specific to
the machine of interest.  In the case of a Sun, all that is needed is to
close the kernel device.
.IP Returns: 4
Nothing.
.NH 3
.Fi irix5/mom_mach.c
.LP
This is the code used to report
values from a Silicon Graphics machine.  It is very similar to the
code for the Sun except SGI IRIX can report the number of cpu's
on a host.
Also, the methods for getting the information about processes and jobs
center around the process file system rather than reading the
kernel structures directly.  This requires IRIX release 5 or later.
.LP
.Fn ncpus()
.Cs
char *ncpus(struct rm_attribute *attrib)
.Ce
.IP Description: 4
Since no attributes are legal for this request, check to make sure
.Ar attrib
is NULL.  Then call
.B sysmp()
with the parameter
.Ty MP_NAPROCS .
Return the value from this call formatted as a decimal number.
.NH 3
.Fi solaris5/mom_mach.c
.LP
This is the code used to report
values from a Sun Solaris machine.  It is very similar to the
code for IRIX5 except Solaris cannot report any quota information
or virtual memory size.
As with IRIX5, the methods for getting the information about processes and jobs
center around the process file system rather than reading the
kernel structures directly.
.NH 3
.Fi unicos8/mom_mach.c
.LP
This file contains the code for the Cray C-90.
It has several routines dealing with swap space and is the only
machine that uses the "periodic processing" capability of the
resource monitor.
Another difference for the Cray is with the
.Ar quota()
routine.  It is much more complex than on any other machine
and can optionally support the Session Reservable File System (SRFS).
.LP
This file also provides functions to read the file
.Ty /etc/tmpdir.conf
to get the temporary directory names the administrator has set up
for SRFS.  These currently must be
.Ty $TMPDIR ,
.Ty $BIGDIR ,
.Ty $FASTDIR
and
.Ty $WRKDIR .
.LP
.Fn end_proc()
.Cs
void *end_proc()
.Ce
.IP Description: 4
The global variable
.Ar last_time
is used to keep track of the last time processing took place.
A call to
.B rtclock()
is made to get the current time in clock ticks.  This is compared
to last_time to see if the network woke us up before it was time
to do something.  If so, calculate the value of
.Ar wait_time
such that the next wakeup will be timed correctly.  This variable
is used by the
.B select()
call in
.Ar wait_request()
as a timeout value.  If it is time to do something, set
.Ar wait_time
to the value of SAMPLE_DELTA, which is a #define'd number.
Call
.B tabinfo()
and
.B tabread()
to get the PWS processor data and the SINFO system data.  Set
.Ar last_time
to the current time.
Calculate the cpu time
percentages, filter them and store them in the global variables
.Ar cpu_idle ,
.Ar cpu_guest ,
.Ar cpu_unix ,
.Ar cpu_sysw
and
.Ar cpu_user .
The method of filtering is to calculate the new value from the
current data and the old value as follows:
.Cs
cpu_idle = a * current  +  (1-a) * old
.Ce
The value of a must fall in the range [0, 1]; the value chosen is 0.75.
Next, calculate the average swap rate since the last call of
this routine.  Use the same filter operation as above and
store the result in the global variable
.Ar swap_rate .
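.LP
The filter step described above can be sketched as follows.
This is a minimal illustration, not the code from mom_mach.c:
the helper name filter_sample() and the macro FILTER_A are
hypothetical, and only the formula and the value 0.75 come from
the description.
.Cs
```c
#include <assert.h>

/* Weight given to the newest sample; 0.75 is the value quoted in the text. */
#define FILTER_A 0.75

/*
 * Blend a new sample with the previous filtered value:
 *     result = a * current + (1 - a) * old
 * filter_sample() is a hypothetical helper name used for illustration.
 */
double filter_sample(double a, double current, double old)
{
    assert(a >= 0.0 && a <= 1.0);   /* a must lie in [0, 1] */
    return a * current + (1.0 - a) * old;
}
```
.Ce
.LP
Starting from an old value of 0 and feeding a constant sample of 100
with a set to 0.75, the first call yields 75 and the second 93.75,
so the filtered value approaches the sample while damping sudden changes.
The same helper would serve for the swap rate.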
.LP
.Fn quota()
.Cs
char *quota(struct rm_attribute *attrib)
.Ce
.IP Args: 4
The parameter
.Ar attrib
is a pointer to the attribute structure returned by
.Ar getattr() .
.IP Description: 4
Check
.Ar attrib
for an attribute with a name of "type".  This attribute
can have one of the standard quota "type" values given for
the other machines.  These are
"harddata", "softdata", "currdata", "hardfile", "softfile", "currfile",
"timedata", or "timefile".
The Cray supports several others as well.  The additional types
only operate if the resource monitor is compiled with the symbol
.Ty SRFS .
They are "snap_avail", "ares_avail",
"res_total", "soft_res", "delta" and "reserve".
The next attribute must have the name "dir" and a value of a directory
name or "variable" directory name.  These are described below.
If the type attribute is one of the SRFS values, there can
be no other attributes.  If the type attribute is one of the standard
values, there must be one more attribute.  It can have a name of
"user", "group" or "account" and a value of a name or id number.
Depending on what the name is, the value is looked up to see if
it is valid.  This is then used to retrieve the quota information.
The standard types have the same meaning as on the other machines.
The meanings of the SRFS types are taken from the UNICOS header file
/usr/include/sys/srfs.h:
.sp
.Cs
int   snap_avail; /* number of currently available blocks if a snap     */
                  /* was taken of the system                            */
int   ares_avail; /* snap_avail less unused reserved blocks             */
                  /* the number of blocks available for reservation the */
                  /* sum of ares_avail, delta, and reserved             */
int   res_total;  /* total number of reserved blocks                    */
int   soft_res;   /* set to TRUE if soft reservation is allowed         */
long  delta;      /* over/under subscription delta                      */
long  reserve;    /* buffer for root demanded allocations on SRFS       */
.Ce
The type "soft_res" will return "true" or "false".  The values for
the rest are converted from blocks to characters.
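.LP
The attribute checks described above can be sketched as follows.
This is an illustration under stated assumptions, not the routine itself:
the names quota_class, classify_quota_type() and expected_attr_count()
are hypothetical, and the attribute counts (type and dir for SRFS types;
type, dir and one of user, group or account for standard types) are
one reading of the description.
.Cs
```c
#include <assert.h>
#include <string.h>

enum quota_class { QUOTA_UNKNOWN, QUOTA_STANDARD, QUOTA_SRFS };

/* Type names taken from the description; the tables are illustrative. */
static const char *standard_types[] = {
    "harddata", "softdata", "currdata", "hardfile",
    "softfile", "currfile", "timedata", "timefile", NULL
};

static const char *srfs_types[] = {
    "snap_avail", "ares_avail", "res_total",
    "soft_res", "delta", "reserve", NULL
};

/* Return 1 if name appears in the NULL-terminated list. */
static int in_list(const char *name, const char **list)
{
    for (; *list != NULL; list++)
        if (strcmp(name, *list) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: decide which family a "type" value belongs to. */
enum quota_class classify_quota_type(const char *type)
{
    assert(type != NULL);
    if (in_list(type, standard_types))
        return QUOTA_STANDARD;
    if (in_list(type, srfs_types))
        return QUOTA_SRFS;
    return QUOTA_UNKNOWN;
}

/* Assumed total attribute count for each family; 0 means invalid. */
int expected_attr_count(enum quota_class c)
{
    switch (c) {
    case QUOTA_STANDARD: return 3;  /* type, dir, user/group/account */
    case QUOTA_SRFS:     return 2;  /* type, dir */
    default:             return 0;
    }
}
```
.Ce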
.LP
.Fn srfs_reserve()
.Cs
char *srfs_reserve(struct rm_attribute *attrib)
.Ce
.IP Description: 4
The names of the attributes (there must be at least one) are passed to
.Ar var_value()
to see if the name exists as a defined temp directory.  If so,
the value is converted to a number and used as a parameter
to the system call
.Ar quotactl() .
This call is done with the command set to
.B SRFS_RESERVE
so that an "srfs_assist" mode reservation can be done.
.LP
.Fn swapused()
.Cs
char *swapused(struct rm_attribute *attrib)
.Ce
.IP Args: 4
The parameter
.Ar attrib
is a pointer to the attribute structure returned by
.Ar getattr() .
.IP Description: 4
Call
.B tabinfo()
and
.B tabread()
to get the swapper information.  Calculate the number of characters
used in the swap areas and format this number in decimal.
.LP
.Fn cpuidle()
.Cs
char *cpuidle(struct rm_attribute *attrib)
.Ce
.IP Args: 4
The parameter
.Ar attrib
is a pointer to the attribute structure returned by
.Ar getattr() .
.IP Description: 4
Check the global variable
.Ar last_time .
If it is zero, return with a system error.  This would mean the
calls to get the cpu usage information in
.Ar end_proc()
had failed and there was nothing to return.  Otherwise, format
the variable
.Ar cpu_idle
and return a pointer to
.Ar ret_string .
.LP
.Fn var_init()
.Cs
void var_init()
.Ce
.IP Description: 4
Open and read
.Ty /etc/tmpdir.conf .
For each line, ignore it if it begins with a hash mark (#) character.
Otherwise, save the temporary directory name and path.
.LP
.Fn var_cleanup()
.Cs
void var_cleanup()
.Ce
.IP Description: 4
Free space allocated in
.I var_init()
for the names and paths.
.LP
.Fn var_value()
.Cs
char *var_value(char *name)
.Ce
.IP Description: 4
Search the saved directory names for
.Ar name .
Return the value or NULL if it is not found.
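.LP
The three routines above amount to a small name-to-path table.
The sketch below illustrates the idea under loud assumptions:
the file format (a name and a path separated by white space on each line)
is a guess, the helper names var_add_line() and var_lookup() are
hypothetical, and fixed-size arrays replace the allocated storage
that the real code frees in var_cleanup().
.Cs
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_VARS 16     /* illustrative table size */
#define MAX_LEN  128    /* illustrative limit on name and path length */

struct dir_var {
    char name[MAX_LEN];
    char path[MAX_LEN];
};

static struct dir_var vars[MAX_VARS];
static int nvars = 0;

/*
 * Process one line of the configuration file.
 * Returns 1 if an entry was stored, 0 if the line was skipped.
 * Assumes a "name path" layout, which the source does not confirm.
 */
int var_add_line(const char *line)
{
    char name[MAX_LEN], path[MAX_LEN];

    assert(line != NULL);
    if (line[0] == '#')             /* comment line */
        return 0;
    if (nvars >= MAX_VARS)          /* table full */
        return 0;
    if (sscanf(line, "%127s %127s", name, path) != 2)
        return 0;                   /* blank or malformed line */
    strcpy(vars[nvars].name, name);
    strcpy(vars[nvars].path, path);
    nvars++;
    return 1;
}

/* Return the saved path for name, or NULL if it is not found. */
const char *var_lookup(const char *name)
{
    int i;

    for (i = 0; i < nvars; i++)
        if (strcmp(vars[i].name, name) == 0)
            return vars[i].path;
    return NULL;
}
```
.Ce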
.NH 3
.Fi aix4/mom_mach.c
.LP
This is the code used to report values from an IBM 590 workstation running
AIX 4.
.LP
.Fn dep_initialize()
.Cs
void dep_initialize()
.Ce
.IP Args: 4
None.
.IP Description: 4
Call
.I open()
to get access to "/dev/kmem" and call
.I knlist()
to get the name list.  The two structures in the kernel we
need access to are "vmker" which has memory information, and "avenrun"
which has the load averages.
.LP
.Fn getproctab()
.Cs
int getproctab()
.Ce
.IP Args: 4
None.
.IP Description: 4
This routine retrieves a table of procsinfo entries.  A library call to
.I getprocs()
is made in a loop which terminates when no more procsinfo entries
are available.  If there are more entries to retrieve, a call to
.B realloc()
is made to expand the table size.  This table is retained so
only the first call should result in several passes through the loop.
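.LP
The grow-and-retry loop described above can be sketched as follows.
This is not the AIX code: fetch_entries() is a hypothetical stand-in for
getprocs() that serves integers from a simulated list, and the names
get_table(), CHUNK, FAKE_TOTAL and last_passes exist only for illustration.
.Cs
```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK      4    /* entries added per growth step (illustrative) */
#define FAKE_TOTAL 10   /* size of the simulated process list */

/*
 * Stand-in for getprocs(): copy up to count entries starting at *index,
 * advance *index, and return the number of entries copied.
 */
static int fetch_entries(int *buf, int *index, int count)
{
    int n = 0;

    while (n < count && *index < FAKE_TOTAL)
        buf[n++] = (*index)++;
    return n;
}

int *table = NULL;      /* retained between calls */
int table_size = 0;     /* capacity of table, in entries */
int last_passes = 0;    /* loop passes used by the most recent call */

/* Fill the table; return the entry count or -1 if allocation fails. */
int get_table(void)
{
    int index = 0, total = 0, got;
    int *bigger;

    if (table == NULL) {
        table = malloc(CHUNK * sizeof(int));
        if (table == NULL)
            return -1;
        table_size = CHUNK;
    }
    last_passes = 0;
    for (;;) {
        last_passes++;
        got = fetch_entries(table + total, &index, table_size - total);
        total += got;
        if (total < table_size)
            break;      /* source ran dry before the table filled */
        /* table is full, so more entries may remain: grow and retry */
        bigger = realloc(table, (table_size + CHUNK) * sizeof(int));
        if (bigger == NULL)
            return -1;
        table = bigger;
        table_size += CHUNK;
    }
    return total;
}
```
.Ce
.LP
With these constants the first call needs three passes to reach a
capacity of 12 entries, and a second call completes in a single pass
because the enlarged table is kept.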
.NH 3
.Fi sp2/mom_mach.c
.LP
This is the code used to report values from an IBM SP-2 parallel
computer.  It is based on the code for the IBM 590 but there are
fewer supported functions for this machine because the parallel
"resource manager" does not support some important functionality.
It cannot return the cpu time or memory utilization for a job's
parallel node usage.
.LP
.Fn getjobstat()
.Cs
int getjobstat()
.Ce
.IP Args: 4
None.
.IP Description: 4
This routine will get a table of JM_JOB_STATUS entries which give
information about jobs running on the nodes.
Check
.I reqnum
to see if data needs to be retrieved.  If so, call the library function
.I jm_connect_ub()
to get access to the "resource manager".  Call
.I jmq_jobs_status()
to retrieve the table of information.  Then call
.I jm_disconnect()
to terminate communication with the resource manager.
.LP
This file also contains a function specific to the IBM SP-2 which
will remove from consideration any node which is shown
by the IBM Job Manager to be busy.
This function is called from
.Ar nodes_inuse() .
.LP
.Fn dep_inuse()
.Cs
int dep_inuse()
.Ce
.IP Description: 4
Call
.Ar getjobstat()
to gather information from the job manager.  Then loop through the
node list checking for a JM_JOB_STATUS entry from the job manager
that contains a matching node.  If one is found, mark the inuse
flag true for the node.
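.LP
The matching loop described above can be sketched as follows.
The structures here are hypothetical stand-ins: the real JM_JOB_STATUS
layout and the resource monitor's node list are not shown in this chapter,
so struct node, struct job_status and mark_inuse() are assumptions made
for illustration only.
.Cs
```c
#include <assert.h>
#include <string.h>

#define MAX_JOB_NODES 4     /* illustrative limit */

struct node {               /* stand-in for a node list entry */
    const char *name;
    int inuse;
};

struct job_status {         /* stand-in for JM_JOB_STATUS */
    const char *nodes[MAX_JOB_NODES];
    int nnodes;
};

/*
 * Mark every node that appears in some job's node list.
 * Returns the number of nodes newly marked as in use.
 */
int mark_inuse(struct node *nodes, int nnodes,
               const struct job_status *jobs, int njobs)
{
    int i, j, k, marked = 0;

    assert(nodes != NULL && jobs != NULL);
    for (i = 0; i < nnodes; i++) {
        /* stop scanning jobs as soon as this node is known to be busy */
        for (j = 0; j < njobs && !nodes[i].inuse; j++) {
            for (k = 0; k < jobs[j].nnodes; k++) {
                if (strcmp(nodes[i].name, jobs[j].nodes[k]) == 0) {
                    nodes[i].inuse = 1;
                    marked++;
                    break;
                }
            }
        }
    }
    return marked;
}
```
.Ce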
.\" force next chapter to odd page
.bp
.if e \{
\&
.sp 10
.DS C
[This page is blank.]
.DE
.bp
\}
