tm_finalize - task management API


SYNOPSIS

       #include <tm.h>

       int tm_init(info, roots)
      void *info;
      struct tm_roots *roots;

       int tm_nodeinfo(list, nnodes)
      tm_node_id **list;
      int *nnodes;

       int tm_poll(poll_event, result_event, wait, tm_errno)
      tm_event_t poll_event;
      tm_event_t *result_event;
      int wait;
      int *tm_errno;

       int tm_notify(tm_signal)
      int tm_signal;

       int tm_spawn(argc, argv, envp, where, tid, event)
      int argc;
      char **argv;
      char **envp;
      tm_node_id where;
      tm_task_id *tid;
      tm_event_t *event;

       int tm_kill(tid, sig, event)
      tm_task_id tid;
      int sig;
      tm_event_t *event;

       int tm_obit(tid, obitval, event)
      tm_task_id tid;
      int *obitval;
      tm_event_t *event;

       int tm_taskinfo(node, tid_list, list_size, ntasks, event)
      tm_node_id node;
      tm_task_id *tid_list;
      int list_size;
      int *ntasks;
      tm_event_t *event;

       int tm_atnode(tid, node)
      tm_task_id tid;
      tm_node_id *node;

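       int tm_rescinfo(node, resource, len, event)
      tm_node_id node;
      char *resource;
      int len;
      tm_event_t *event;

       int tm_publish(name, info, len, event)
      char *name;
      void *info;
      int len;
      tm_event_t *event;

       int tm_subscribe(tid, name, info, len, info_len, event)
      tm_task_id tid;
      char *name;
      void *info;
      int len;
      int *info_len;
      tm_event_t *event;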

       int tm_finalize()


DESCRIPTION

       These functions provide a partial implementation of the task management
       interface part of the PSCHED API.  In PBS, MOM provides the task
       manager functions.  This library opens a TCP socket to the MOM running on
       the local host and sends and receives messages.

       The PSCHED Task Management API description used to create this  library
       was  committed  to paper on November 15, 1996 and was given the version
       number 0.1.  Changes may have taken place since that time which are not
       reflected in this library.

       The  API  description uses several data types that it purposefully does
       not define.  This was done so an implementation would not be confined in
       the  way  it was written.  For this specific work, the definitions fol-
       low:

       typedef   int            tm_node_id;    /* job-relative node id */
       #define   TM_ERROR_NODE  ((tm_node_id)-1)

       typedef   int            tm_event_t;    /* event handle, > 0 for real events */
       #define   TM_NULL_EVENT  ((tm_event_t)0)
       #define   TM_ERROR_EVENT ((tm_event_t)-1)

       typedef   unsigned long  tm_task_id;
       #define   TM_NULL_TASK   (tm_task_id)0
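
       The event handle convention above (greater than zero for a real
       event, TM_NULL_EVENT for none, TM_ERROR_EVENT for an error) can be
       sketched as follows.  The typedef and macros are copied from the
       definitions above so the block stands alone; the enum and the
       function classify_event() are hypothetical names used only for
       illustration and are not part of the library.

```c
/* Copied from the definitions above; a real program gets them from <tm.h>. */
typedef int tm_event_t;
#define TM_NULL_EVENT  ((tm_event_t)0)
#define TM_ERROR_EVENT ((tm_event_t)-1)

/* Hypothetical names for this sketch only. */
enum event_kind { EV_ERROR, EV_NULL, EV_REAL, EV_UNKNOWN };

/* Classify an event handle according to the convention described above. */
enum event_kind classify_event(tm_event_t ev)
{
    if (ev == TM_ERROR_EVENT)
        return EV_ERROR;
    if (ev == TM_NULL_EVENT)
        return EV_NULL;
    if (ev > 0)
        return EV_REAL;
    return EV_UNKNOWN;   /* other negative values are not described here */
}
```

       A caller might use such a check on the value returned through the
       result_event argument of tm_poll before acting on it.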

       There are a number of error values defined as well: TM_SUCCESS,
       TM_ESYSTEM, TM_ENOEVENT, TM_ENOTCONNECTED, TM_EUNKNOWNCMD,
       TM_ENOTIMPLEMENTED, TM_EBADENVIRONMENT, TM_ENOTFOUND.

       tm_init() initializes the library by opening a socket to the MOM on the
       local  host  and sending a TM_INIT message, then waiting for the reply.
       The info parameter has no use and is  included  to  conform  with  the
       PSCHED  document.   The roots pointer will contain valid data after the
       function returns and has the following structure:

       struct    tm_roots {
            tm_task_id     tm_me;
            tm_task_id     tm_parent;
            int       tm_nnodes;
            int       tm_ntasks;
            int       tm_taskpoolid;
            tm_task_id     *tm_tasklist;
       };

       tm_tasklist         This will be NULL for PBS.

       The tm_ntasks, tm_taskpoolid and tm_tasklist fields are not filled with
       data specified by the PSCHED document.  PBS does not support task pools
       and,  at  this  time, does not return information about current running
       tasks from tm_init.  There is a separate call to  get  information  for
       current running tasks called tm_taskinfo which is described below.  The
       return value from tm_init will be TM_SUCCESS if the library initialization
       was successful, or an error return otherwise.

       tm_nodeinfo()  places a pointer to a malloc'ed array of tm_node_id's in
       the pointer pointed at by list.  The order of the tm_node_id's in  list
       is the same as that specified to MOM in the "exec_host" attribute.  The
       int pointed to by nnodes contains the number of nodes allocated to  the
       job.   This  is  information that is returned during initialization and
       does not require communication with  MOM.   If  tm_init  has  not  been
       called, TM_ESYSTEM is returned, otherwise TM_SUCCESS is returned.

       tm_poll() retrieves the results of actions requested through the other
       routines and stores them in the locations the caller specified when
       making each request.  The bookkeeping for this is done by generating
       an event for each action.  When the task manager (MOM) sends a
       message  that  an  action is complete, the event is reported by tm_poll
       and information is placed where the caller requested it.  The  argument
       poll_event  is  meant  to  be  used  to request a specific event.  This
       implementation does not use it and it must be set to  TM_NULL_EVENT  or
       an error is returned.  Upon return, the argument result_event will con-
       tain a valid event number or TM_ERROR_EVENT on error.  If wait is  zero
       and   there   are   no   events  to  report,  result_event  is  set  to
       TM_NULL_EVENT.  If wait is non-zero and there are no events  to  report,
       the  function will block waiting for an event.  If no local error takes
       place, TM_SUCCESS is returned.  If an error is reported by MOM  for  an
       event, then the argument tm_errno will be set to an error code.
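
       The non-blocking use of tm_poll described above can be sketched as a
       loop that polls with wait set to zero until TM_NULL_EVENT comes
       back.  This is only an illustration: drain_events(), poll_fn and
       fake_poll() are hypothetical names, the value 0 for TM_SUCCESS is an
       assumption made for this sketch, and fake_poll() is a stand-in so the
       block can run without a MOM.  In a real job the definitions come
       from <tm.h> and tm_poll itself would be passed in.

```c
typedef int tm_event_t;
#define TM_NULL_EVENT  ((tm_event_t)0)
#define TM_ERROR_EVENT ((tm_event_t)-1)
#define TM_SUCCESS 0   /* assumed value, for this sketch only */

/* Same argument list as tm_poll() in the SYNOPSIS. */
typedef int (*poll_fn)(tm_event_t poll_event, tm_event_t *result_event,
                       int wait, int *tm_errno);

/* Hypothetical helper: poll without blocking until no events remain.
 * Returns the number of events seen, or -1 on a local error.
 * *mom_errors counts events for which an error code was set. */
int drain_events(poll_fn poll, int *mom_errors)
{
    int seen = 0;

    *mom_errors = 0;
    for (;;) {
        tm_event_t ev = TM_NULL_EVENT;
        int err = TM_SUCCESS;   /* preset, in case it is left untouched */
        int rc = poll(TM_NULL_EVENT, &ev, 0, &err);

        if (rc != TM_SUCCESS || ev == TM_ERROR_EVENT)
            return -1;
        if (ev == TM_NULL_EVENT)
            return seen;        /* nothing left to report */
        seen++;
        if (err != TM_SUCCESS)
            (*mom_errors)++;
    }
}

/* Stand-in for tm_poll(): reports events 1, 2 and 3, then TM_NULL_EVENT.
 * Event 2 carries an arbitrary non-zero error code. */
static int fake_next = 1;

int fake_poll(tm_event_t poll_event, tm_event_t *result_event,
              int wait, int *tm_errno)
{
    (void)wait;
    if (poll_event != TM_NULL_EVENT) {
        *result_event = TM_ERROR_EVENT;
        return 1;               /* any value other than TM_SUCCESS */
    }
    if (fake_next > 3) {
        *result_event = TM_NULL_EVENT;
        return TM_SUCCESS;
    }
    *result_event = fake_next;
    if (fake_next == 2)
        *tm_errno = 1;
    fake_next++;
    return TM_SUCCESS;
}
```

       Presetting the error variable before each call avoids relying on
       whether tm_poll writes to it when no error is reported, which the
       text above does not state.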

       tm_notify() is described in the PSCHED documentation, but is not imple-
       mented for PBS yet.  It will return TM_ENOTIMPLEMENTED.

       tm_spawn() sends a message to MOM to start a new task.  The node id  of
       the  host to run the task is given by where.  The parameters argc, argv
       and envp specify the program to run and its arguments  and  environment
       very much like exec().  The full path of the program executable must be
       given by argv[0] and the number of elements in the argv array is  given
       by argc.  The array envp is NULL terminated.  The argument event points
       to a tm_event_t variable which is filled in with an event number.  When
       this  event  is  returned by tm_poll, the tm_task_id pointed to by tid
       will contain the task id of the newly created task.  In  addition,  the
       tid  is  available  to the process in the PBS_TASKNUM environment vari-
       able.  Similarly, the node number is in the  PBS_NODENUM  variable  and
       the cpu number is in the PBS_VNODENUM variable.
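
       A spawned task could read these environment variables along the
       following lines.  This is a minimal sketch: parse_decimal() and
       env_number() are hypothetical helpers, and it assumes the values are
       plain decimal integers, which the text above does not state.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-negative decimal integer.
 * Returns 0 on success, -1 if s is NULL or not a clean decimal number. */
int parse_decimal(const char *s, unsigned long *out)
{
    char *end;
    unsigned long v;

    /* Insist on a leading digit so signs and whitespace are rejected. */
    if (s == NULL || !isdigit((unsigned char)s[0]))
        return -1;
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;
    *out = v;
    return 0;
}

/* Hypothetical helper: read a number from an environment variable such as
 * PBS_TASKNUM, PBS_NODENUM or PBS_VNODENUM.  Returns -1 if it is unset. */
int env_number(const char *name, unsigned long *out)
{
    return parse_decimal(getenv(name), out);
}
```

       For example, env_number("PBS_NODENUM", &node) would leave the node
       number in node when the variable is set and well formed.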

       tm_kill()  sends  a signal specified by sig to the task tid and puts an
       event number in the tm_event_t pointed to by event.

       tm_atnode() will place the node id where the task  tid  exists  in  the
       tm_node_id pointed to by node.

       tm_rescinfo()  makes  a  request  for a string specifying the resources
       available on a node given by the argument node.  The string is returned
       in the buffer pointed to by resource and is terminated by a NUL charac-
       ter unless the number of characters  of  information  is  greater  than
       specified by len.  The resource string PBS returns is formatted as
       follows:

       A space separated set of strings from the uname system call followed by
       a  colon  (:).  The order of the strings is sysname, nodename, release,
       version, machine.

       A  comma  separated  set  of  strings  giving  the  components  of  the
       "Resource_List"  attribute of the job.  Each component has the resource
       name, an equal sign, and the limit value.

       For example, a return for a task running on an  SGI  workstation  might
       look like:

       IRIX golum 6.2 03131015 IP22:cput=20:00,mem=400kb
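
       A string in this format could be taken apart as sketched below.  The
       functions split_rescinfo() and find_resource() are hypothetical
       helpers, not part of the library.  The sketch splits at the first
       colon only, because limit values such as cput=20:00 contain colons
       themselves; it therefore assumes that none of the uname fields
       contains a colon, which holds for the example above but may not hold
       on every system.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a tm_rescinfo() style string in place.
 * On success *uname_part points at the uname fields and *res_part at the
 * comma separated resource components.  Returns 0, or -1 if no colon. */
int split_rescinfo(char *buf, char **uname_part, char **res_part)
{
    char *colon = strchr(buf, ':');   /* first colon ends the uname part */

    if (colon == NULL)
        return -1;
    *colon = '\0';
    *uname_part = buf;
    *res_part = colon + 1;
    return 0;
}

/* Hypothetical helper: copy the limit value of one resource, e.g. "mem",
 * into out.  Returns 0 if found and it fits in outsz bytes, else -1. */
int find_resource(const char *res_part, const char *name,
                  char *out, size_t outsz)
{
    size_t nlen = strlen(name);
    const char *p = res_part;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t clen = end ? (size_t)(end - p) : strlen(p);

        /* A component looks like name=value. */
        if (clen > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            size_t vlen = clen - nlen - 1;

            if (vlen >= outsz)
                return -1;            /* value does not fit */
            memcpy(out, p + nlen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        p += clen;
        if (*p == ',')
            p++;
    }
    return -1;
}
```

       Applied to the example string, the uname part is "IRIX golum 6.2
       03131015 IP22" and looking up "mem" in the resource part gives
       "400kb".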

       tm_publish()  causes  len bytes of information pointed at by info to be
       sent to the local MOM to be saved under the name given by name.

       tm_subscribe() returns a copy of the information named by name for  the
       task  given  by  tid.  The argument info points to a buffer of size len
       where the information will be returned.  The argument info_len will  be
       set  with  the  size of the published data.  If this is larger than the
       supplied buffer, the data will have been truncated.
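
       The truncation rule above can be expressed as a small check made
       once the subscribe request has completed.  The function
       subscribed_bytes() is a hypothetical helper used only to illustrate
       the comparison between len and the value stored through info_len.

```c
#include <stddef.h>

/* Hypothetical helper: given the buffer size passed as len and the value
 * stored through info_len, return how many bytes of the buffer hold data.
 * *truncated is set to 1 if the published data was larger than the buffer. */
size_t subscribed_bytes(int len, int info_len, int *truncated)
{
    *truncated = info_len > len;
    if (len < 0 || info_len < 0)
        return 0;                     /* nonsensical sizes: nothing usable */
    return (size_t)(info_len < len ? info_len : len);
}
```

       A caller that finds truncated set could retry the request with a
       buffer of at least info_len bytes.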

       tm_finalize() may be called to free any memory in use  by  the  library
       and close the connection to MOM.


SEE ALSO

       pbs_mom,   PSCHED:   An   API   for  Parallel  Job/Resource  Management,
       http://parallel.nas.nasa.gov/Psched/psched-api-report.ps



                                  21 May 1997                            TM(3)
