tm_finalize - task management API
SYNOPSIS
#include <tm.h>
int tm_init(info, roots)
void *info;
struct tm_roots *roots;
int tm_nodeinfo(list, nnodes)
tm_node_id *list;
int *nnodes;
int tm_poll(poll_event, result_event, wait, tm_errno)
tm_event_t poll_event;
tm_event_t *result_event;
int wait;
int *tm_errno;
int tm_notify(tm_signal)
int tm_signal;
int tm_spawn(argc, argv, envp, where, tid, event)
int argc;
char **argv;
char **envp;
tm_node_id where;
tm_task_id *tid;
tm_event_t *event;
int tm_kill(tid, sig, event)
tm_task_id tid;
int sig;
tm_event_t *event;
int tm_obit(tid, obitval, event)
tm_task_id tid;
int *obitval;
tm_event_t *event;
int tm_taskinfo(node, tid_list, list_size, ntasks, event)
tm_node_id node;
tm_task_id *tid_list;
int list_size;
int *ntasks;
tm_event_t *event;
int tm_atnode(tid, node)
tm_task_id tid;
tm_node_id *node;
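int tm_rescinfo(node, resource, len, event)
tm_node_id node;
char *resource;
int len;
tm_event_t *event;
int tm_publish(name, info, len, event)
char *name;
void *info;
int len;
tm_event_t *event;
int tm_subscribe(tid, name, info, len, info_len, event)
tm_task_id tid;
char *name;
void *info;
int len;
int *info_len;
tm_event_t *event;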
int tm_finalize()
DESCRIPTION
These functions provide a partial implementation of the task management
interface part of the PSCHED API. In PBS, MOM provides the task manager
functions. This library opens a TCP socket to the MOM running on the
local host and uses it to send and receive messages.
The PSCHED Task Management API description used to create this library
was committed to paper on November 15, 1996 and was given the version
number 0.1. Changes may have taken place since that time which are not
reflected in this library.
The API description uses several data types that it purposefully does
not define. This was done so that an implementation would not be
constrained in the way it was written. For this specific work, the
definitions follow:
typedef int tm_node_id; /* job-relative node id */
#define TM_ERROR_NODE ((tm_node_id)-1)
typedef int tm_event_t; /* event handle, > 0 for real events */
#define TM_NULL_EVENT ((tm_event_t)0)
#define TM_ERROR_EVENT ((tm_event_t)-1)
typedef unsigned long tm_task_id;
#define TM_NULL_TASK (tm_task_id)0
There are a number of error values defined as well: TM_SUCCESS,
TM_ESYSTEM, TM_ENOEVENT, TM_ENOTCONNECTED, TM_EUNKNOWNCMD,
TM_ENOTIMPLEMENTED, TM_EBADENVIRONMENT, TM_ENOTFOUND.
tm_init() initializes the library by opening a socket to the MOM on the
local host and sending a TM_INIT message, then waiting for the reply.
The info parameter is unused and is included only to conform with the
PSCHED document. The roots pointer will contain valid data after the
function returns and has the following structure:
struct tm_roots {
tm_task_id tm_me;
tm_task_id tm_parent;
int tm_nnodes;
int tm_ntasks;
int tm_taskpoolid;
tm_task_id *tm_tasklist;
};
tm_tasklist This will be NULL for PBS.
The tm_ntasks, tm_taskpoolid and tm_tasklist fields are not filled with
data specified by the PSCHED document. PBS does not support task pools
and, at this time, does not return information about currently running
tasks from tm_init. There is a separate call to get information for
currently running tasks called tm_taskinfo which is described below.
The return value from tm_init will be TM_SUCCESS if the library
initialization was successful, or an error code otherwise.
tm_nodeinfo() places a pointer to a malloc'ed array of tm_node_id's in
the pointer pointed at by list. The order of the tm_node_id's in list
is the same as that specified to MOM in the "exec_host" attribute. The
int pointed to by nnodes contains the number of nodes allocated to the
job. This is information that is returned during initialization and
does not require communication with MOM. If tm_init has not been
called, TM_ESYSTEM is returned, otherwise TM_SUCCESS is returned.
tm_poll() retrieves the results of actions requested through the other
routines and stores them in the locations the caller specified when
making each request. The bookkeeping for this is done by generating an
event for each action. When the task manager (MOM) sends a message that
an action is complete, the event is reported by tm_poll and the
information is placed where the caller requested it. The argument
poll_event is meant to be used to request a specific event. This
implementation does not use it, and it must be set to TM_NULL_EVENT or
an error is returned. Upon return, the argument result_event will
contain a valid event number, or TM_ERROR_EVENT on error. If wait is
zero and there are no events to report, result_event is set to
TM_NULL_EVENT. If wait is non-zero and there are no events to report,
the function will block waiting for an event. If no local error takes
place, TM_SUCCESS is returned. If an error is reported by MOM for an
event, then the argument tm_errno will be set to an error code.
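The rules above for reading one tm_poll result can be sketched as a
small classifier. This is only an illustration: classify_poll and the
poll_outcome enum are hypothetical names, the typedef and event macros
are copied from this page as stand-ins for <tm.h>, and the value 0 for
TM_SUCCESS is an assumption, as is the idea that tm_errno equals
TM_SUCCESS when MOM reported no error.

```c
/* Stand-ins copied from the definitions on this page; real code would
 * take them from <tm.h> instead of redefining them. */
typedef int tm_event_t;
#define TM_NULL_EVENT  ((tm_event_t)0)
#define TM_ERROR_EVENT ((tm_event_t)-1)
#ifndef TM_SUCCESS
#define TM_SUCCESS 0   /* assumption: the value is not given on this page */
#endif

/* Hypothetical outcome codes for one tm_poll() call. */
enum poll_outcome {
    POLL_LOCAL_ERROR,   /* tm_poll itself failed locally */
    POLL_NO_EVENT,      /* wait was zero and nothing was pending */
    POLL_EVENT_FAILED,  /* MOM reported an error for the event */
    POLL_EVENT_OK       /* event completed; results are in place */
};

/* Hypothetical helper: interpret the return value of tm_poll() together
 * with the values it stored through result_event and tm_errno. */
enum poll_outcome classify_poll(int rc, tm_event_t result_event,
                                int tm_errno_val)
{
    if (rc != TM_SUCCESS || result_event == TM_ERROR_EVENT)
        return POLL_LOCAL_ERROR;
    if (result_event == TM_NULL_EVENT)
        return POLL_NO_EVENT;
    /* assumption: tm_errno holds TM_SUCCESS unless MOM reported an error */
    if (tm_errno_val != TM_SUCCESS)
        return POLL_EVENT_FAILED;
    return POLL_EVENT_OK;
}
```

A caller would typically set its tm_errno variable to TM_SUCCESS before
each tm_poll call, so that a stale error code from an earlier event is
not misread.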
tm_notify() is described in the PSCHED documentation, but is not imple-
mented for PBS yet. It will return TM_ENOTIMPLEMENTED.
tm_spawn() sends a message to MOM to start a new task. The node id of
the host to run the task is given by where. The parameters argc, argv
and envp specify the program to run and its arguments and environment
very much like exec(). The full path of the program executable must be
given by argv[0] and the number of elements in the argv array is given
by argc. The array envp is NULL terminated. The argument event points
to a tm_event_t variable which is filled in with an event number. When
this event is returned by tm_poll, the tm_task_id pointed to by tid
will contain the task id of the newly created task. In addition, the
tid is available to the process in the PBS_TASKNUM environment vari-
able. Similarly, the node number is in the PBS_NODENUM variable and
the cpu number is in the PBS_VNODENUM variable.
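A spawned task could pick up its own identifiers from the environment
variables named above. The helper below is a minimal sketch:
parse_env_number is a hypothetical name, and it assumes the variables
hold plain decimal text, which this page does not state.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert the text of an environment variable such
 * as PBS_TASKNUM to an unsigned long. Returns 0 and stores the value in
 * *out on success, or -1 if text is NULL, empty, or not a plain number. */
int parse_env_number(const char *text, unsigned long *out)
{
    char *end = NULL;
    unsigned long value;

    if (text == NULL || *text < '0' || *text > '9')
        return -1;              /* unset, empty, signed, or non-numeric */
    errno = 0;
    value = strtoul(text, &end, 10);   /* assumption: decimal text */
    if (errno != 0 || *end != '\0')
        return -1;              /* out of range or trailing characters */
    *out = value;
    return 0;
}
```

For example, parse_env_number(getenv("PBS_TASKNUM"), &tid) would fill
tid when the variable is set, and the same call works for PBS_NODENUM
and PBS_VNODENUM.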
tm_kill() sends a signal specified by sig to the task tid and puts an
event number in the tm_event_t pointed to by event.
tm_atnode() will place the node id where the task tid exists in the
tm_node_id pointed to by node.
tm_rescinfo() makes a request for a string specifying the resources
available on a node given by the argument node. The string is returned
in the buffer pointed to by resource and is terminated by a NUL
character unless the number of characters of information is greater
than specified by len. The resource string PBS returns is formatted as
follows:
A space separated set of strings from the uname system call followed by
a colon (:). The order of the strings is sysname, nodename, release,
version, machine.
A comma separated set of strings giving the components of the
"Resource_List" attribute of the job. Each component has the resource
name, an equal sign, and the limit value.
For example, a return for a task running on an SGI workstation might
look like:
IRIX golum 6.2 03131015 IP22:cput=20:00,mem=400kb
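The layout described above can be taken apart with standard C string
functions. The following is a sketch, not part of the library:
parse_rescinfo, struct rescinfo and the size limits are hypothetical,
and it assumes that no uname field contains a space or a colon, as in
the example. Only the first colon is treated as the separator, because
limit values such as 20:00 contain colons themselves.

```c
#include <stdio.h>
#include <string.h>

#define RESC_MAX 16   /* arbitrary cap on entries for this sketch */

struct rescinfo {
    char sysname[64], nodename[64], release[64], version[64], machine[64];
    int  nres;                   /* number of name=value pairs stored */
    char name[RESC_MAX][64];
    char value[RESC_MAX][64];
};

/* Hypothetical helper: split a tm_rescinfo() string into its parts.
 * Returns 0 on success, -1 if the text does not match the layout.
 * Entries beyond RESC_MAX are ignored. */
int parse_rescinfo(const char *text, struct rescinfo *out)
{
    const char *colon = strchr(text, ':');  /* first colon ends uname part */
    char head[512];

    if (colon == NULL)
        return -1;
    size_t head_len = (size_t)(colon - text);
    if (head_len >= sizeof head)
        return -1;
    memcpy(head, text, head_len);
    head[head_len] = '\0';
    if (sscanf(head, "%63s %63s %63s %63s %63s", out->sysname,
               out->nodename, out->release, out->version,
               out->machine) != 5)
        return -1;

    /* Walk the comma separated name=value components. */
    const char *p = colon + 1;
    out->nres = 0;
    while (*p != '\0' && out->nres < RESC_MAX) {
        size_t item_len = strcspn(p, ",");
        const char *eq = memchr(p, '=', item_len);
        if (eq == NULL)
            return -1;
        size_t nlen = (size_t)(eq - p);
        size_t vlen = item_len - nlen - 1;
        if (nlen == 0 || nlen >= 64 || vlen >= 64)
            return -1;
        memcpy(out->name[out->nres], p, nlen);
        out->name[out->nres][nlen] = '\0';
        memcpy(out->value[out->nres], eq + 1, vlen);
        out->value[out->nres][vlen] = '\0';
        out->nres++;
        p += item_len;
        if (*p == ',')
            p++;                 /* skip the separator */
    }
    return 0;
}
```

Applied to the example string, this yields sysname IRIX, machine IP22,
and two resources: cput with the limit 20:00 and mem with the limit
400kb.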
tm_publish() causes len bytes of information pointed at by info to be
sent to the local MOM to be saved under the name given by name.
tm_subscribe() returns a copy of the information named by name for the
task given by tid. The argument info points to a buffer of size len
where the information will be returned. The argument info_len will be
set with the size of the published data. If this is larger than the
supplied buffer, the data will have been truncated.
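The truncation rule for tm_subscribe amounts to comparing the reported
size with the buffer size. A minimal sketch follows; subscribed_bytes is
a hypothetical helper, and treating negative sizes as "no data" is an
assumption made here, not something this page specifies.

```c
#include <stddef.h>

/* Hypothetical helper: given the buffer size passed as len and the size
 * reported through info_len, return how many bytes of the buffer hold
 * valid data. *truncated is set to 1 if the published data was larger
 * than the buffer, 0 otherwise. */
size_t subscribed_bytes(int len, int info_len, int *truncated)
{
    if (len < 0 || info_len < 0) {   /* assumption: treat as no data */
        *truncated = 0;
        return 0;
    }
    *truncated = (info_len > len);
    return (size_t)(*truncated ? len : info_len);
}
```

A caller would use the returned count, rather than len or info_len
directly, when copying or printing the received data.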
tm_finalize() may be called to free any memory in use by the library
and close the connection to MOM.
SEE ALSO
pbs_mom, PSCHED: An API for Parallel Job/Resource Management,
http://parallel.nas.nasa.gov/Psched/psched-api-report.ps
21 May 1997 TM(3)