AUTOMATED ERROR CHECKING OF BATCH JOBS WITH MPEX/3000
            by Adrian Partridge, GAINSBOROUGH SOFTWARE LTD
              Published by INTERACT Magazine, Apr 1995.


Checking  $STDLISTs for errors and filing them for archive purposes is
an important function of the Information System department. It is also
an  unloved  chore.  Sites  with  XL  machines and  NMSPOOLER have the
benefit  of having their spool files all held as disk files. These can
be searched and printed with normal MPE commands. HP even supply you
with a JOBABORT condition for use with the SPOOLF and LISTSPF commands
to display all $STDLISTs that have JOBABORTed during execution.

  Using  the excellent MPEX/3000 package from  VESOFT I have created a
$STDLIST  Management Tool (SMT)  which provides significant advantages
over  a simple SPOOLF  @;SELEQ=[JOBABORT=TRUE];SHOW when required. The
finished  $STDLIST  Management Tool is a  good example of the features
and power provided by the MPEX software.

  This  is  a  list  of  major elements that  make a comprehensive and
  useful SMT.


$STDLIST SELECTION	

It is important for our SMT to select only $STDLISTs from the spool
queue. These $STDLISTs are filtered to make sure that they are in a
READY state (that is, not OPENed or LOCKed by a job or utility). In
this simple SMT any $STDLIST that has been checked is altered to a
priority of 4, so only $STDLISTs with a priority of 5 or above are
selected.

REPEAT
 ..
 ..
 ..
FORFILES O@.OUT.HPSPOOL(SPOOL.FILE="$STDLIST" AND SPOOL.ISREADY AND &
SPOOL.OUTPRI >= 5)

This REPEAT...FORFILES is an MPEX construct that lets us repeat a
number of commands for each file that matches the FORFILES selection
condition.
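For readers more at home in a general-purpose language, the selection rule can be sketched in Python. This illustrates the logic only and is not MPEX: the SpoolFile record is a hypothetical stand-in for the SPOOL.FILE, SPOOL.ISREADY, SPOOL.OUTPRI and SPOOL.SPOOLFILENUM attributes, and mark_checked() plays the part of SPOOLF n;PRI=4.

```python
from dataclasses import dataclass
from typing import List

CHECKED_PRIORITY = 4  # checked $STDLISTs are lowered to this priority


@dataclass
class SpoolFile:
    """Hypothetical stand-in for the SPOOL.* attributes of one spool file."""
    number: int    # like SPOOL.SPOOLFILENUM
    filedes: str   # like SPOOL.FILE, e.g. "$STDLIST"
    ready: bool    # like SPOOL.ISREADY
    outpri: int    # like SPOOL.OUTPRI


def select_unchecked(spool: List[SpoolFile]) -> List[SpoolFile]:
    """READY $STDLISTs still above the checked priority (OUTPRI >= 5)."""
    return [f for f in spool
            if f.filedes == "$STDLIST" and f.ready
            and f.outpri > CHECKED_PRIORITY]


def mark_checked(f: SpoolFile) -> None:
    """Like SPOOLF n;PRI=4: the file drops out of future selections."""
    f.outpri = CHECKED_PRIORITY
```

Because a checked file is dropped to priority 4, a second pass over the same queue selects nothing, which is what stops a $STDLIST from being checked twice.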

There are other ways you might choose to select $STDLISTs. For example,
you could have a logon UDC for batch jobs that does a SPOOLF
@;SELEQ=[FILEDES=$STDLIST];PRI=1. This defers the $STDLIST down to a
priority of 1, so the SMT need only check $STDLISTs at this priority.
The $STDLIST priority can then be increased once it has been checked;
this could even cause the $STDLISTs to print automatically if the new
priority is above the OUTFENCE. Because the $STDLISTs are deferred
down to 1, there is no way they can be printed accidentally, even if
the OUTFENCE is below 8.

Whatever  choice you make to select the $STDLISTs, the important thing
is that $STDLISTs are only checked once.


$STDLIST DIAGNOSIS

A batch management tool must be able to tell whether or not a job has
completed. This SMT uses a number of basic principles to deduce the
status of the job. JOBABORTed $STDLISTs have definitely failed to
complete and need immediate attention. The JOBABORT condition has one
major drawback: if the command preceding the line with the error is a
CONTINUE statement, JOBABORT will not pick the error up. Any error that
occurs while CONTINUE is in effect is invisible to JOBABORT.

A $STDLIST in our SMT is treated as if it has finished in one of three
states:

The $STDLIST did not complete. This could be caused by the job being
ABORTJOBed, or by an error in one of the commands within the JCL
causing a flush during execution. If the job did complete, there
should be a :EOJ line in the $STDLIST (it's always a good idea to end
your JCLs with !EOJ). This can easily be searched for by putting the
following line into the SMTERROR file:

UPS(R[1:4]) <> ':EOJ' AND RECNUM = VEFINFO(FNUM).EOF - 2

This line instructs SMT to treat any file whose second-from-last line
does not begin with :EOJ as having terminated in error.

The $STDLIST contains an error. If an MPE command fails there is a
good chance that either a CIERR or an FSERR will be reported, but not
every command does this. For example, if you did a STORE of files and
some were not stored correctly, you would want SMT to point out that
the STORE was not 100% successful. The following error strings will
cater for the bulk of MPE failures:

'(CIERR' or '(FSERR' or 'NOT STORED'

Your custom errors can be added into the SMTERROR file also.

The $STDLIST completed successfully. If neither of the two checks
above finds anything, the $STDLIST is deemed OK.
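The three states can be pictured with a small Python sketch, assuming the $STDLIST is available as a list of lines. It is not how MPEX evaluates the SMTERROR file: the function names are hypothetical, and treating RECNUM as counting from 0 (so that EOF - 2 is the second-from-last record, as described above) is an assumption of the sketch.

```python
from typing import List, Sequence

DEFAULT_ERRORS = ("(CIERR", "(FSERR", "NOT STORED")


def missing_eoj(records: Sequence[str]) -> bool:
    """Mirror of UPS(R[1:4]) <> ':EOJ' AND RECNUM = VEFINFO(FNUM).EOF - 2.

    Assumes RECNUM counts from 0. As with the original search, a file
    shorter than two records has no record at EOF - 2, so it never matches.
    """
    eof = len(records)
    return any(rec[:4].upper() != ":EOJ" and recnum == eof - 2
               for recnum, rec in enumerate(records))


def error_lines(records: Sequence[str],
                strings: Sequence[str] = DEFAULT_ERRORS) -> List[str]:
    """Lines containing any of the strings, i.e. the strings ORed together."""
    return [rec for rec in records if any(s in rec for s in strings)]


def classify(records: Sequence[str]) -> str:
    """Return one of the three states a $STDLIST can finish in."""
    if missing_eoj(records):
        return "DID NOT COMPLETE"
    if error_lines(records):
        return "CONTAINS AN ERROR"
    return "OK"
```

Checking all three strings in a single pass over the lines is the same saving the article gets from ORing strings on one SMTERROR line.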

The SMTERROR file contains the error conditions and is read each time
a $STDLIST is checked. Each line in it means another search of the
$STDLIST, so be careful not to put an excessive number of lines in this
file. Remember that strings can be ORed together on one line to reduce
checking time.

Any strings that you wish to look for MUST be enclosed in single
quotes (' '). Other PRINT functions available for use are CL
(CaseLess), DL (DeLimited), RECNUM and virtually any MPEX operator,
such as OR, AND, NOT, MATCHING, BETWEEN, etc. These keywords should be
left unquoted.

The SMTERROR file is read using the REPEAT...FORRECS construct, which
lets us repeat a number of commands for each line read from the file
named in the FORRECS selection. Each line is passed to a PRINT
;SEARCH= statement. If a PRINT command finds something, that $STDLIST
is deemed to have ended in error and is reported immediately. The PRINT
variable MPEXPRINTLINESFOUND is tested (IF MPEXPRINTLINESFOUND > 0) to
detect this.


$STDLIST HANDLING

What do you do with these $STDLISTs? What special treatment do you
give to $STDLISTs that have ended in error? How do you reduce paper
consumption by unnecessary $STDLIST printing? How do you make your
operators' time more productive, instead of having them check
$STDLISTs all day long? Our SMT, that's how!

At the moment our SMT does only the most basic $STDLIST handling,
because different sites might want to do something slightly
different. Some possible options for $STDLIST handling that you can
easily implement in the SMT are:

*  By  copying  $STDLISTs  selected  as  being in an  error state to a
different name (for example, STDERROR), you can then delete  $STDLISTs
when they become READY.

*  Using different spool file priorities you could create a daily tier
system.  The $STDLIST priorities reflect the day on which the $STDLIST
was  created  (5=today,  0=5  days  previous).  Each day  you roll the
$STDLISTs down a priority, deleting ones at 0.

*  Because  spool  files are on disk you  could copy the $STDLIST to a
group created each day. This group would contain a contents file which
is  written  to  each  time  a spoolfile is copied  into the group. An
ON-LINE  system could easily be written to pull back any contents file
and from it, pull back $STDLISTs from any number of days previous.
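The daily tier system in the second option is simple arithmetic. In the Python sketch below the spool queue is a hypothetical dictionary mapping spool file numbers to output priorities; on a real system the deletions and priority changes would presumably be done with SPOOLF commands instead.

```python
from typing import Dict, List


def roll_tiers(spool: Dict[int, int]) -> List[int]:
    """Perform one daily roll; returns the spool file numbers deleted.

    spool maps a (hypothetical) spool file number to its output priority,
    where 5 = created today and 0 = created five days ago.
    """
    # $STDLISTs already at priority 0 are five days old: delete them
    deleted = sorted(n for n, pri in spool.items() if pri <= 0)
    for n in deleted:
        del spool[n]
    # Everything that is left moves down one tier
    for n in spool:
        spool[n] -= 1
    return deleted
```

A $STDLIST created today at priority 5 therefore survives five daily rolls and is deleted on the sixth.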

The SMT we run uses a number of the above options. All $STDLISTs are
first copied to a log group created daily. OK $STDLISTs are then
deleted from the queue, and all $STDLISTs that are in error are copied
to a new printout called STDERROR and the original $STDLIST purged.


CONTINUAL EXAMINATION OF $STDLISTs

One of the following scenarios might apply to you:

On big sites, where batch jobs run continually throughout the day,
$STDLISTs that have ended in error might not be noticed for anywhere
from 10 minutes to 1 hour.

When you arrive in the morning you have the overnight batch run to
check through before any users can log on, just in case some files
have not been backed up, or a job which updates your data has not run
successfully.

In both cases it is essential that $STDLISTs be checked quickly. The
easiest way of doing this is to read the $STDLISTs in a loop and keep
cycling around that loop until it is broken. Keeping the job looping
until you wish to stop posed a few problems at first. Many people
might want to pause for up to 10 minutes between checks, but a test
for a stop flag file could be made at only one point in each pass
through the loop. This means that if a pause had just started when you
requested the SMT to stop by BUILDing the stop flag, it would not
finish until the pause had completed. I have devised a control
mechanism that allows the user to stop the job instantly after a check
has completed. The same mechanism also instructs the SMT to do a check
as and when one is requested. It uses good old message files, a
background task and all sorts of other trickery:

CONTROL SKELETON -

FILE SMTMESS=SMTMESS,OLD;SHR;GMULTI
CONTINUE
PURGE SMTMESS
BUILD SMTMESS;REC=-10,,,ASCII;DISC=1;MSG
SETVAR OPTION "CHECK"
SETVAR GOONPIN 0
WHILE TRUE DO
 IF OPTION = "STOP" THEN
  RETURN	
 ELSEIF OPTION = "CHECK" THEN
  < $STDLIST CHECKER ROUTINES >
 ENDIF
 IF SONALIVE(GOONPIN) = FALSE THEN
  RUN MAIN.PUB.VESOFT;PARM=1;INFO="PAUSESMT";GOON;STDLIST=$NULL;PRI=DS
  SETVAR GOONPIN MPEXPIN
 ENDIF
 INPUT OPTION < *SMTMESS
ENDWHILE

PAUSESMT -

PAUSE 300
ECHO CHECK >*SMTMESS

Well, what does all this do? A message file is special in that a
program reading an empty message file will wait until something is
written to it. This principle is applied in the control skeleton above.
First a message file called SMTMESS is created. This file receives
instructions either from users or from the program itself. In each
pass around the loop SMT reads SMTMESS with the INPUT command, which in
turn causes SMT to wait. Before this, a background process (PAUSESMT)
is started that pauses for 300 seconds (5 minutes) and then writes to
SMTMESS, which causes SMT to continue processing. The INPUT command
also sets a variable (OPTION) to whatever value was written into the
message file. With this variable we can instruct SMT either to stop or
to do another check of the $STDLISTs in the queue. Because you can also
write to SMTMESS yourself, SMT checks whether PAUSESMT is still running
and, if it is, does not start a new one. This ensures that a check will
begin every 5 minutes regardless of any outside user intervention!
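The same control idea can be sketched in Python, where queue.Queue.get() blocks in the way INPUT blocks on an empty message file and a daemon thread plays the part of PAUSESMT. This is an analogy rather than a port, and the names are hypothetical. One deliberate difference: instead of asking whether the pauser is still alive (SONALIVE), the sketch uses a flag that the pauser clears just before posting CHECK, so the loop never mistakes a pauser that has already posted for one that is still sleeping.

```python
import queue
import threading
import time
from typing import Callable


def pausesmt(mailbox: "queue.Queue[str]", pending: threading.Event,
             seconds: float) -> None:
    """Stand-in for PAUSESMT: PAUSE, then ECHO CHECK >*SMTMESS."""
    time.sleep(seconds)
    pending.clear()          # no pause is in progress once CHECK is posted
    mailbox.put("CHECK")


def run_smt(mailbox: "queue.Queue[str]", check: Callable[[], None],
            pause_seconds: float = 300.0) -> int:
    """Stand-in for the SMT control loop; returns how many checks were run."""
    option = "CHECK"
    pending = threading.Event()   # set while a pauser thread is sleeping
    checks = 0
    while True:
        if option == "STOP":
            return checks
        if option == "CHECK":
            check()               # the $STDLIST checker routines
            checks += 1
        if not pending.is_set():  # like IF SONALIVE(GOONPIN) = FALSE
            pending.set()
            threading.Thread(target=pausesmt,
                             args=(mailbox, pending, pause_seconds),
                             daemon=True).start()
        # Blocks like INPUT OPTION < *SMTMESS; strip() mirrors RTRIM(OPTION)
        option = mailbox.get().strip()
```

From another thread, mailbox.put("STOP") does the job of TOSMT STOP, and mailbox.put("CHECK") that of TOSMT CHECK.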


INFORM OF SUSPECTED ERRORS

OK, so this fancy SMT has found errors in some $STDLISTs. What now?
The utility must make as much noise as possible, with flashing
highlighted text, to inform the console operator that such a job has
aborted or needs checking, so that he or she can promptly inform the
people concerned.

To inform the console of any errors within $STDLISTs we use TELLOPs
and the ;FORMAT= option of the PRINT command. A variable called R
holds the current record that PRINT is processing. Combine this with a
TELLOP and we can send lines from the $STDLIST straight to the
console. This tells the job's creator almost instantly why it aborted,
without anyone having to print out the $STDLIST. It is done by writing
the current record (R), with TELLOP in front of it, to a file and then
executing that file:

BUILD TELERROR;TEMP;REC=-60,1,F,ASCII;NOCCTL
FILE TELERROR=TELERROR,OLDTEMP
PRINT <FILE AND CONDITION>;FORMAT="TELLOP "+R[1:50];OUT=*TELERROR
XEQ TELERROR
	
Only the first 50 characters of each $STDLIST line are copied across,
so that the lines do not wrap around when the message appears on the
console.
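When the search is also given ;CONTEXT=2,2 (as in the complete SMTCHECK routine), each match is shown together with the lines around it. The Python sketch below builds the same kind of TELLOP command lines from a list of records. It assumes that CONTEXT=2,2 means two records before and two after each match, and that overlapping windows print each record only once; tellop_commands() is a hypothetical name, and MPEX's exact behaviour may differ.

```python
from typing import Callable, List, Sequence


def tellop_commands(records: Sequence[str], is_hit: Callable[[str], bool],
                    before: int = 2, after: int = 2,
                    width: int = 50) -> List[str]:
    """Build TELLOP command lines for every hit plus its surrounding context.

    Assumes CONTEXT=before,after shows that many records either side of a
    hit and that overlapping windows print each record once.
    """
    keep = set()
    for i, rec in enumerate(records):
        if is_hit(rec):
            first = max(0, i - before)
            last = min(len(records) - 1, i + after)
            keep.update(range(first, last + 1))
    # Like FORMAT="TELLOP "+R[1:50]: keep the first `width` characters only
    return ["TELLOP " + records[i][:width] for i in sorted(keep)]
```

Since "TELLOP " is 7 characters, each generated line is at most 57 characters long, which is why the TELERROR file needs records at least that long; the REC=-60 in the BUILD above leaves room for it.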

If you want errors to go not to the console but to a device dedicated
to SMT messages, the TELLOP command can be replaced by the WARN or
TELL command.

PRINT <FILE AND CONDITION>;FORMAT="WARN LDEV=nnn " + R[1:50];OUT=*TELERROR

where 'nnn' is the logical device number to which the messages are WARNed.

An alternative idea is to have errors sent not only to the console but
also to your PAGER or BEEPER!! This SMT could then be used to report
errors 24 hours a day.

	FILE TOMOD;DEV=nnn
	ECHO  ATDTttt >>*TOMOD
	
where 'nnn' is the device number of the MODEM and 'ttt' is your PAGER
or BEEPER number!

To save time, the SMT should not search each $STDLIST for every entry
in the SMTERROR file. When an error is detected we want to inform the
console and then continue with the next $STDLIST.
TRAPERROR...IFERROR/ENDIFERROR is another unique MPEX construct: it
catches errors caused within a command file or JCL and then executes
the commands between IFERROR and ENDIFERROR to rectify the problem. In
our SMT, TRAPERROR...IFERROR/ENDIFERROR is used not just to trap
errors but also as a way of using the ESCAPE command. ESCAPE sets the
CIERROR variable to a number and forces a jump from the TRAPERROR
routine to the IFERROR/ENDIFERROR subroutine. We use this feature as
the MPEX equivalent of the GOTO statement found in languages such as
BASIC: whenever an error is found, instead of proceeding with the next
string, we jump from that routine to another.

GOTO CONSTRUCT -

REPEAT
 TRAPERROR
  REPEAT
   PRINT $STDLIST
   IF ERROR THEN
    ESCAPE 1
   ENDIF
  FORRECS RECORD=SMTCHECK,OLD
 TELLOP $STDLIST OK
 IFERROR
  IF CIERROR = 1 THEN
   TELLOP ERROR FOUND!!!
  ENDIF
 ENDIFERROR
FORFILES $STDLIST	
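In a language with exceptions, the same GOTO-like escape looks like the Python sketch below. It is only an analogy: Escape stands in for ESCAPE and CIERROR, try/except for TRAPERROR...IFERROR/ENDIFERROR, and each condition is a hypothetical function that takes a $STDLIST's lines and returns True on a hit.

```python
from typing import Callable, Dict, List


class Escape(Exception):
    """Stand-in for MPEX ESCAPE n: abandons the current routine with a code."""

    def __init__(self, code: int):
        super().__init__(code)
        self.code = code


def check_stdlists(files: Dict[str, List[str]],
                   conditions: List[Callable[[List[str]], bool]]
                   ) -> Dict[str, str]:
    """Run the conditions against every file, stopping at the first hit."""
    results = {}
    for name, records in files.items():      # REPEAT ... FORFILES
        try:                                 # TRAPERROR
            for condition in conditions:     # REPEAT ... FORRECS
                if condition(records):
                    raise Escape(1)          # ESCAPE 1 skips the rest
            results[name] = "OK"             # TELLOP $STDLIST OK
        except Escape as e:                  # IFERROR
            if e.code == 1:                  # IF CIERROR = 1
                results[name] = "ERROR FOUND"
            else:
                raise
    return results
```

As with ESCAPE 1 in the SMT, the first condition that matches ends the search for that file, so the remaining SMTERROR entries are never evaluated.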

The complete SMT job stream looks like:-

!JOB SMT,MANAGER.SYS
!SETVAR VESOFTDEFAULTNOSPACE 1
!RUN MAIN.PUB.VESOFT;PARM=1;INFO="SMTCHECK"
!EOJ

The complete SMTCHECK routine looks like:-

FILE SMTMESS=SMTMESS,OLD;SHR;GMULTI
CONTINUE
PURGE SMTMESS
BUILD SMTMESS;REC=-10,,,ASCII;DISC=1;MSG
CONTINUE
PURGE TELERROR,TEMP
BUILD TELERROR;TEMP;REC=-60,,,ASCII
FILE TELERROR = TELERROR,OLDTEMP
SETVAR OPTION "CHECK"
SETVAR GOONPIN 0
WHILE TRUE DO
 IF OPTION = "STOP" THEN
  RETURN
 ELSEIF OPTION = "CHECK" THEN
 COMMENT *** START OF CHECK SECTION ***
  REPEAT
   TRAPERROR
    REPEAT
     ECHO ![MPEXCURRENTFILE]
     PRINT ![MPEXCURRENTFILE];SEARCH=!RECORD;CONTEXT=2,2;&
FORMAT=" TELLOP "+R[1:50];OUT=*TELERROR
     IF MPEXPRINTLINESFOUND > 0 THEN
      ECHO ![SPOOL.JOBNUMBER] - FAILED
      ESCAPE 1
     ENDIF
    FORRECS RECORD = SMTERROR,OLD
    COMMENT *** $STDLIST IS OK ***
    TELLOP ![SPOOL.JOBNUMBER] - ![SPOOL.JOBNAME]&
,![SPOOL.USER].![SPOOL.ACCOUNT]
    TELLOP $STDLIST (![SPOOL.SPOOLFILENUM]) IS OK
    TELLOP
    ECHO ![SPOOL.JOBNUMBER] - OK
   IFERROR
    IF CIERROR = 1 THEN
     TELLOP *********************
     TELLOP * A T T E N T I O N *
     TELLOP *********************
     TELLOP
     TELLOP ![SPOOL.JOBNUMBER]  - ![SPOOL.JOBNAME]&
,![SPOOL.USER].![SPOOL.ACCOUNT]
     TELLOP MAY CONTAIN A POSSIBLE ERROR
     TELLOP
     XEQ TELERROR
     TELLOP
     TELLOP PLEASE CHECK THIS $STDLIST IMMEDIATELY
    ENDIF
   ENDIFERROR
   COMMENT *** ALTER $STDLISTs TO PRI OF 4 ***
   SPOOLF ![SPOOL.SPOOLFILENUM];PRI=4
  FORFILES O@.OUT.HPSPOOL(SPOOL.FILE="$STDLIST" AND &
SPOOL.ISREADY AND SPOOL.OUTPRI >= 5)
 ENDIF
 COMMENT *** START BACKGROUND PAUSE ***
 IF SONALIVE(GOONPIN) = FALSE THEN
  RUN MAIN.PUB.VESOFT;PARM=1;INFO="PAUSESMT";STDLIST=$NULL;GOON
  SETVAR GOONPIN MPEXPIN
 ENDIF
 COMMENT *** START WAIT ***
 INPUT OPTION < *SMTMESS
 SETVAR OPTION RTRIM(OPTION)
ENDWHILE

The PAUSESMT routine is simply :-

	PAUSE 300
	ECHO CHECK >*SMTMESS

To shut down the SMT job you need a little command file, which I have
called TOSMT. This command file interfaces directly with the SMT job
and instructs it to either STOP or CHECK.

	PARM SMTMODE
	FILE SMTMESS=SMTMESS,OLD;SHR;GMULTI
	ECHO !SMTMODE > *SMTMESS

With this command file, typing TOSMT STOP will shut down the SMT job
almost immediately. Typing TOSMT CHECK will instruct SMT to do a check
of the available $STDLISTs now.

Some essential lines for the SMTERROR file are:

UPS(R[1:4]) <> ':EOJ' AND RECNUM = VEFINFO(FNUM).EOF - 2
'(FSERR' or '(CIERR' or 'NOT STORED'

The above is a good example of the power and flexibility of the
MPEX/3000 software from VESOFT. A further enhancement to the package,
which we have developed, allows viewing and printing of the $STDLISTs
of all jobs that have run in the last 30 days. This is a
function-key-driven online system, again written entirely in
MPEX/3000. If you require any assistance in implementing any of the
above at your site, please call me (Adrian Partridge) at VESOFT in the
UK, on UK - 121-352-0707.
