MAKING OTHER PEOPLE'S PROGRAMS DO
                 WHAT THEY WERE NEVER INTENDED TO DO
                       by Eugene Volokh, VESOFT
        Presented at 1990 INTEREX Conference, Boston, MA, USA
              Published by INTERACT Magazine, Apr 1991.


THE PROBLEM

We  first  got  the  idea  for the MPEX hook in  1980, when one of our
earliest users complained to us about the time it took him to get into
MPEX.  Whenever he was in EDITOR and  wanted to use MPEX, he'd have to
/KEEP the file, exit EDITOR, get into MPEX, do the command, exit MPEX,
re-enter EDITOR, and re-/TEXT the file -- a lot of work, especially on
his overloaded Series III (remember those?).

We  were  thus  faced  with  a substantial problem.  What our customer
really  wanted  was  a change to EDITOR --  some way in which he could
execute  MPEX commands directly from within EDITOR, without exiting or
re-entering,  /TEXTing  or  /KEEPing. He wanted  us to modify somebody
else's program.

Unfortunately,  we did not have the  EDITOR source code; however, even
if  we  had it and modified it to  suit our needs, we'd have to repeat
this  modification  every  time a new version  of EDITOR came out (and
re-send  this  new version to all  of our customers). Furthermore, the
same sort of feature would be needed in TDP, QUERY, etc. -- even if we
had the source code to these programs, we wouldn't want to modify them
all (and then re-modify them for each subsequent version).

Now, if we couldn't modify the source code, could we modify the object
code?  Perhaps  find  out  which  locations  needed  to be  patched to
implement  this  feature, much like HP  sometimes sends out patches to
fix  certain MPE bugs? There were several  reasons why we could not do
this:

   *  Patching  object  code -- especially someone  else's -- is hard.
     Object  code is very hard to understand, and often it's difficult
     to tell if a patch you make might have an unexpected side-effect.
     (I say this now, in 1990 -- in 1980, it was even harder for me to
     deal with.)

   *  While patches can be used to delete chunks of code (by branching
     around  them)  or  to make small changes,  they cannot readily be
     used  for  additions.  It's very difficult to  insert code into a
     segment,  and  even  more  difficult  to  add  calls  to external
     procedures  that  the  segment doesn't already  call. To do this,
     you'd  almost have to write a  "program file editor" program that
     could  manipulate  program files, and though I  know how to do it
     now, I didn't know how to do it then, and don't want to do it now
     even though I know how.

   *  Patches would have to be generated  for every new version of the
     patched  program  that  comes out, and we'd  have to start almost
     from  scratch for every such version  (since the locations of the
     various  pieces  of  code in the program  and in each segment are
     likely  to  change  quite  radically,  and  the  entire  internal
     structure of this part of the program might change).

All in all, just patching object code was dangerous and difficult.


TRAPPING PROCEDURE CALLS

Fortunately,  there  was an alternative to  patching code directly, an
alternative  that  was pioneered (to the best  of my knowledge) by Bob
Green  of  Robelle.  (Even  if  he  didn't originate  it himself, he's
certainly the one from whom I adapted it.)

For  space  and performance reasons, the  files handled by Bob's QEDIT
text  editor had their own special  internal format; a program (like a
compiler)  that expected normal EDITOR-generated  files would be quite
surprised  to  get  a  QEDIT  file.  But,  since  QEDIT  aimed  to  be
substantially  faster  (as  well  as  more powerful)  than EDITOR, Bob
didn't  want to have to convert the QEDIT file to EDITOR format before
each compile.

The thing that Bob took advantage of -- and I eventually did too -- is
that  programs are not self-contained. All  of their dealings with the
outside  world -- with disc files, with the terminal, etc. -- are done
through  intrinsics  (or some other system  SL procedures). If only we
could  cause  the programs to call our  own procedures that would look
like  the  system intrinsics but actually do  our own stuff, too (e.g.
pretend  that QEDIT files are actually normal EDITOR files, or process
user  input before the program gets it in order to possibly execute it
as  an  MPEX command), we could  change the program's behavior without
its  even noticing. (The way that we  would make the programs call our
own  procedures  is  by moving the programs  into a separate group and
putting  procedures  with the same names  as intrinsics into the group
SL.)  For instance, if we could replace  the READX called by a program
by our own procedure that:

   * accepts exactly the same parameters as the real READX;

   * calls the real READX;

   *  checks  to see if the user's  input starts with a "%" character,
     and if so, passes it to MPEX to be executed as an MPEX command;

   *  returns exactly the same values as the real READX (including the
     condition code)

then a user will be able to execute MPEX commands from that program by
just prefixing his input with a "%"; the program itself would not have
to be patched, since all of the logic will be in our SL procedure.
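
The  shape  of  such a replacement can be modelled outside MPE. The
sketch below is in Python rather than SPL, purely so that it can be
run anywhere. The names real_readx and run_mpex_command are
hypothetical stand-ins for the real intrinsic and for MPEX, the
condition code is modelled as an ordinary return value, and the count
returned is assumed to be in bytes.

```python
# Model of a "hook" wrapper. Nothing here is an MPE API: real_readx and
# run_mpex_command are hypothetical stand-ins supplied by the caller.
CCG, CCL, CCE = 0, 1, 2   # 0 = CCG, 1 = CCL, 2 = CCE

def make_hooked_readx(real_readx, run_mpex_command, prefix="%"):
    """Return a function with the same calling sequence as real_readx."""
    def hooked_readx(buffer, length):
        count, ccode = real_readx(buffer, length)     # call the real thing
        if ccode == CCE and count > 0:
            text = bytes(buffer[:count]).decode("ascii", "replace")
            if text.startswith(prefix):
                run_mpex_command(text[len(prefix):])  # our extra behaviour
        return count, ccode                           # same results as the original
    return hooked_readx


# Usage: a fake terminal that "types" one line into the caller's buffer.
def fake_readx(buffer, length):
    line = b"%LISTF @.PUB"
    buffer[:len(line)] = line
    return len(line), CCE

executed = []
readx = make_hooked_readx(fake_readx, executed.append)
buf = bytearray(80)
result = readx(buf, -80)
```

The calling program sees exactly what the real procedure returned; the
only difference is the side effect recorded in the executed list.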

Note how this approach avoids the problems of object code patching:

   *  We don't have to read object code,  since all that we need to do
     is emulate the calling sequence of a well-documented procedure.

   *  We can easily add new  functionality, since the SL procedure can
     be of almost unlimited size.

   *  We  probably  won't  have  to  worry  about new  versions of the
     program,  since  no matter what changes  are made to the program,
     the  program  will probably still call  READX in the same context
     and for the same purpose as it did before.

Of course, there are also some limitations with this approach:

   *  We  can only alter those aspects  of the program's behavior that
     are  accomplished  through  external procedure  calls -- internal
     calculations  and  checks  will  often  be beyond  our reach. For
     example,  we  can implement a multi-line  REDO facility in EDITOR
     (since  we can, by intercepting  terminal input calls, record all
     the input that the user's given us and replace "redo" commands by
     the appropriate line of input), but we can't implement, say, a new
     feature  on a /CHANGE or /LIST  command because we don't have any
     access  to the EDITOR work file  and all of the work-file-related
     tables that EDITOR keeps.

   *  Though we will know all the parameters of the procedure call, we
     may  have  very  limited  information  about  its purpose  -- for
     instance, is this READX call intended to prompt for a command (in
     which case we want to process commands prefixed by a "%"), or for
     a  line  of text being added to the  file (in which case we don't
     want this)?

And, there are some practical difficulties that we need to overcome to
make  this  approach  fully  workable -- more about  them in the pages
ahead.


WRITING THE PROCEDURE

The best way of discussing things further, I think, is to walk through
a  very simple example of "hooking" a program that you can actually do
as  you read the paper. Unfortunately,  it'll be of limited use (since
it  was chosen for simplicity rather than utility), but it might still
be somewhat impressive -- we'll "teach" LISTDIR5 to honor MPE commands
prefixed  by  ":"s,  so  you  can, for instance,  say ":ALTSEC xxx" or
":NEWGROUP xxx" or something like that when prompted with the LISTDIR5
">" prompt.

The first question that we must ask is

   which system intrinsic call can we intercept to get the job done?

The answer to this is quite simple -- the READX intrinsic. Our plan of
attack will be:

   *  Write  a  procedure  with  exactly the same  calling sequence as
     READX.

   * Call the real READX to get the input from the user.

   *  Check the input to see if it starts with a ":" -- if so, execute
     it as an MPE command using the COMMAND intrinsic.

   * Return exactly the same results as READX would.

So, let's begin:

   $CONTROL USLINIT, SUBPROGRAM, SEGMENT=MY'READX
   BEGIN
   INTEGER PROCEDURE READX (BUFFER, LEN);
   VALUE LEN;
   ARRAY BUFFER;
   INTEGER LEN;
   BEGIN

The very first thing that you notice is: this is written in SPL. What,
you   say   that   you   don't   know   SPL?  Well,  that's  perfectly
understandable,  but unfortunately there are two crucial things that a
hook procedure needs to be able to do that just cannot be done in some
languages:

   *  Accept as input virtually any  kind of parameter -- word address
     or byte address, by value or by reference.

   * Return a condition code.

To  the best of my knowledge, only  SPL and PASCAL procedures can take
by-value  parameters (like READX's LEN  parameter); in MPE/V, only SPL
procedures  can  return  a condition code  (though MPE/XL's HPSETCCODE
intrinsic permits other languages to do this on MPE/XL).

However, with the following procedure:

   PROCEDURE VESETCCODE (I << 0 = CCG, 1 = CCL, 2 = CCE >>);
   VALUE I;
   INTEGER I;
   BEGIN
   INTEGER ARRAY Q(*)=Q+0;
   Q(-Q(0)-1).(6:2):=I;
   END;

you  can set the condition code from, say, a PASCAL procedure, as long
as  you  call  it (VESETCCODE) from the  hook procedure itself and not
from any of the procedures called from within it.

Armed  with  VESETCCODE,  there's  no reason why  you can't write hook
procedures  in  PASCAL  on MPE/V (in fact, I'll  even use it in my SPL
examples)  though  I  think that you still can't  do them in any other
language.
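
The  assignment  Q(-Q(0)-1).(6:2):=I in VESETCCODE deposits I into a
two-bit field of the status word saved in the caller's stack marker.
As a rough model of just that bit manipulation (in Python, so that it
can be run anywhere), and assuming the HP 3000 convention that bit 0
is the most significant bit of a 16-bit word:

```python
def deposit_field(word, start_bit, width, value):
    """Model of SPL's  word.(start_bit:width) := value  on a 16-bit word.

    Assumes HP 3000 bit numbering, where bit 0 is the MOST significant bit.
    """
    shift = 16 - start_bit - width          # distance of the field from the LSB
    mask = ((1 << width) - 1) << shift      # ones over the field only
    return (word & ~mask & 0xFFFF) | ((value << shift) & mask)

def set_ccode(status_word, ccode):
    """Put a condition code (0 = CCG, 1 = CCL, 2 = CCE) into bits 6-7."""
    return deposit_field(status_word, 6, 2, ccode)
```

Only the two condition code bits change; the rest of the status word
is left exactly as the caller had it.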

OK,  back  to  our sample procedure. Note  that we created a procedure
header  that exactly corresponds to the  calling sequence of the READX
intrinsic. Each of the parameters must match exactly, both in type and
in  mode (by value/by reference); the return value must be exactly the
right  type,  and  if  the  procedure we're intercepting  is an OPTION
VARIABLE  procedure,  so  must  ours be. (PASCAL  programmers: you can
still  hook  OPTION VARIABLE procedures if  you realize that an OPTION
VARIABLE  procedure is just the same as  a normal one but has an extra
by-value  parameter  at the end that  contains the OPTION VARIABLE bit
mask.)

Now, let's continue:

   $CONTROL USLINIT, SUBPROGRAM, SEGMENT=MY'READX
   BEGIN
   INTEGER PROCEDURE READX (BUFFER, LEN);
   VALUE LEN;
   ARRAY BUFFER;
   INTEGER LEN;
   BEGIN
   INTRINSIC READX;
   BYTE ARRAY BUFFER'B(*)=BUFFER;
   INTEGER LEN'READ;
   BYTE ARRAY TEMP'CMD(0:255);
   INTEGER CIERR;
   INTEGER FSERR;
   LEN'READ:=READX (BUFFER, LEN);
   IF > THEN VESETCCODE (0)
   ELSE IF < THEN VESETCCODE (1)
   ELSE
     BEGIN
     IF LEN'READ<>0 AND BUFFER'B=":" THEN
       BEGIN
       MOVE TEMP'CMD:=BUFFER'B(1),(LEN'READ-1);
       TEMP'CMD(LEN'READ-1):=%15;  << carriage return >>
       COMMAND (TEMP'CMD, CIERR, FSERR);
       END;
     VESETCCODE (2);
     END;
   READX:=LEN'READ;
   END;

As  you  see,  we  call  READX, check the condition  code, set our own
return condition code appropriately, and if the read succeeded and the
input line starts with a ":", call the COMMAND intrinsic.

What is wrong with this picture? Well, there are three problems:

   *  First, and most important of all (I'm sure you noticed this), we
     have  our  procedure  called  READX calling  the intrinsic READX.
     You'd  think  that since you declared  READX as an intrinsic, the
     compiler will recognize that you want to call the READX intrinsic
     in the system SL.

     This,  unfortunately,  is not the case.  When the linker sees the
     call  to  READX,  it  views  it  as  a recursive call  to our own
     procedure and not a call to the READX intrinsic. (To make matters
     worse,  the  SPL  compiler  will  not flag  the "INTRINSIC READX"
     declaration  as a duplicate symbol error.)  In fact, we will find
     that  this  --  how  to  call  the  real procedure  from our hook
     procedure  --  is  one  of the more  substantial problems that we
     face.

   *   Secondly,   note  the  MOVE  TEMP'CMD:=BUFFER'B(1),(LEN'READ-1)
     statement  --  why  is  it wrong? Because the  way that the READX
     intrinsic is defined, its result (which we put into LEN'READ) may
     be  the number of bytes or the number of words read (depending on
     whether the LEN parameter was negative or positive). Actually, we
     might  discover  that  LISTDIR5  always  passes  a  negative  LEN
     parameter  and thus the READX result will always be in bytes, but
     we  don't  want to count on that  (especially if we want the hook
     procedure  to  be  general).  The  rule is thus  that you must be
     prepared  for  any  possible  set  of  input  parameters  and any
     possible result returned by the intrinsic.

     In  other  words,  instead  of LEN'READ, we  should have said (IF
     LEN<0 THEN LEN'READ ELSE 2*LEN'READ).

   *  Thirdly, what happens when a command that's prefixed by a ":" is
     input? Indeed, it will be executed as an MPE command, but it will
     then  be  returned  to  LISTDIR5  as  the  result of  the read --
     LISTDIR5  will  see  it as an invalid  command, and will output a
     nasty message.

     This  is important to remember --  when you intercept a procedure
     call,  from the program's point of view the call still completes,
     and  the  program  will  act  upon the data  returned by the hook
     procedure.  In  this  case,  we  should  make sure  that the data
     returned to LISTDIR5 is such that LISTDIR5 will do as little with
     it  as possible -- in LISTDIR5's  case, returning an empty string
     (just  as if the user hit return). For this, we'd have to set the
     function  result to 0 (0 characters  read), but we'd also have to
     make  sure that the buffer returned to the program is in the same
     state  as  it  was when our procedure  was called, since programs
     often  calculate  the  length of the data  input not by the READX
     result, but by the position of some terminating character (e.g. a
     carriage return) that they filled the buffer with.
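
Taken  together, the second and third corrections might look like the
following model. It is written in Python rather than SPL so that it
can be run anywhere; real_readx and run_command are hypothetical
stand-ins for the real intrinsic and for COMMAND, and the condition
code is modelled as a return value.

```python
CCG, CCL, CCE = 0, 1, 2

def bytes_read(len_param, result):
    """READX counts bytes if LEN was negative, words (2 bytes) otherwise."""
    return result if len_param < 0 else 2 * result

def make_hook(real_readx, run_command):
    """real_readx and run_command are hypothetical stand-ins, not MPE calls."""
    def hooked_readx(buffer, len_param):
        saved = bytes(buffer)                    # buffer as the program filled it
        result, ccode = real_readx(buffer, len_param)
        if ccode != CCE:
            return result, ccode                 # pass errors/EOF through untouched
        nbytes = bytes_read(len_param, result)
        if nbytes > 0 and buffer[0:1] == b":":
            run_command(bytes(buffer[1:nbytes]).decode("ascii", "replace"))
            buffer[:] = saved                    # undo what the read did to the buffer
            return 0, CCE                        # look like an empty line
        return result, ccode
    return hooked_readx

# Usage: the program pre-fills its buffer with carriage returns, then reads
# with a positive (word-count) length.
def fake_readx(buffer, len_param):
    line = b":SHOWJOB"                           # 8 bytes = 4 words
    buffer[:len(line)] = line
    return len(line) // 2, CCE

ran = []
hooked = make_hook(fake_readx, ran.append)
buf = bytearray(b"\r" * 20)
outcome = hooked(buf, 10)
```

After the call the command has been executed, the program is told
that zero characters were read, and its buffer still holds the
carriage returns it put there.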


CALLING THE REAL PROCEDURE

Let's  get  back  for  a moment to the  first problem mentioned in the
above list -- if we call the intrinsic READX from our READX procedure,
we get an infinite loop. What can we do about this?

Well, there are three possible solutions:

   * Since we're putting our READX in a group or account SL anyway, we
     can  put an "intermediary" procedure  called, say, INT'READX into
     the  system SL (or any SL higher  than the one in which our READX
     is)  -- our READX can call INT'READX, which will then call READX.
     Since  SL's  are  always  searched  in the  order group, account,
     system, a call to READX from an INT'READX that's in the system SL
     will  not call back to our  group/account SL READX, but rather go
     to the real READX in the system SL.

     The INT'READX procedure might then look something like this:

        INTEGER PROCEDURE INT'READX (BUFFER, LEN);
        VALUE LEN;
        ARRAY BUFFER;
        INTEGER LEN;
        BEGIN
        INTRINSIC READX;
        INT'READX:=READX (BUFFER, LEN);
        IF > THEN VESETCCODE (0)
        ELSE IF < THEN VESETCCODE (1)
        ELSE VESETCCODE (2);
        END;

   *  Another  alternative  is  to have our READX  call the real READX
     using  the LOADPROC intrinsic. Among  other things, LOADPROC lets
     you  specify that you want to  load the procedure from the system
     SL,  so  we won't get into the  same recursive loop that we would
     have had if our READX just tried to call READX directly.

     I  won't go into any more detail as to how this is done, but I do
     want to point out that one problem with this approach is where to
     put  the  plabel  returned  by the LOADPROC  procedure so that we
     don't have to re-LOADPROC for every procedure call. Actually, for
     intercepting  READX  we can afford to  re-LOADPROC the real READX
     every  time our READX is called because subsequent LOADPROCs of a
     procedure   that's  already  been  LOADPROCed  take  only  a  few
     milliseconds; however, if we're intercepting a more time-critical
     procedure,  like FREAD, we'd have to  be sure to LOADPROC it only
     once,  in  which  case we'd need some  global storage to keep the
     plabel. More about the global storage problem later.

   *  One other alternative that's  worth considering is somewhat more
     difficult to do but cures a very substantial disadvantage present
     in the first two solutions.

     Say  that  you  want  to hook a number  of programs all over your
     system,   for  instance  to  call  your  own  DBOPEN  replacement
     procedure  instead  of  the DBOPEN intrinsic (which  we do in our
     SECURITY/3000  VEOPEN  module), or to  call a replacement COMMAND
     procedure  (which we do in  our SECURITY/3000 STREAMX module). If
     you  want to intercept these  calls using procedures named DBOPEN
     and  COMMAND, you'd have to put these procedures into local group
     or account SLs in every group or account in which the programs to
     be  hooked  reside.  This can prove  quite cumbersome, especially
     when it comes time to install a new version of your procedures --
     you  might  have  to replace dozens of SLs in dozens of different
     accounts.  The trouble, of course, is that you can't put your own
     hook  procedures  into  the  system SL, since  they have the same
     names as the real intrinsics.

     The  way  that  you  can  get around this  problem is by actually
     patching  all  the programs to be hooked  to call not a procedure
     called  DBOPEN, but rather one called, say, VEOPEN. Then, you can
     put  the  VEOPEN  procedure into the system  SL, since it will no
     longer  conflict with the real DBOPEN -- furthermore, since it is
     called  VEOPEN,  there'll be no problem  with it calling the real
     DBOPEN without threat of recursion. When a new version of VEOPEN,
     incidentally,  is  installed, you won't have  to re-patch all the
     programs,  but only replace the module  in the system SL. (On the
     other  hand,  whenever  you  roll  in a new  version of a patched
     program, you'd have to re-patch it.)

     Patching  the programs might at  first glance seem difficult, but
     it  actually isn't. All program files (I'm speaking here of MPE/V
     and  CM  programs)  contain at a  well-defined place an "external
     reference  list",  which  is  a  list of the names  of all the SL
     procedures that they call (together with some other information).
     Simply  by  replacing  the  procedure  name "DBOPEN"  by the name
     "VEOPEN"  you  can  make  the  program  call  VEOPEN  instead  of
     DBOPEN.  (Note  that  the  two procedure  names are intentionally
     chosen  to  be  the  same  length.)  The layout of  this table is
     described in Chapter 10 of the MPE/V System Tables Manual -- it's
     not  that hard to write a program that modifies it. It's not much
     more  difficult (though it is extra work) to write a program that
     modifies the external reference list of an SL segment.

     All  in all, we've found patching to  work quite well for us, but
     the  additional  cost of writing a  program to patch the external
     reference  list  might  make  it a rather  expensive solution for
     some.
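
The  heart  of such a patch is a same-length name replacement. The
sketch below (in Python, for illustration only) performs it on an
in-memory byte string. It does not reproduce the real external
reference list layout from the System Tables Manual; patch_name is a
hypothetical helper and the sample bytes are made up.

```python
def patch_name(image, old, new):
    """Replace every occurrence of old by new in a copy of image.

    Refuses names of different lengths, since that would shift everything
    that follows the name. A real patcher would confine itself to the
    external reference list instead of scanning the whole file.
    """
    if len(old) != len(new):
        raise ValueError("replacement name must be the same length")
    if old not in image:
        raise ValueError("name not found")
    return image.replace(old, new)

# Usage on a made-up fragment (NOT the real table format):
fragment = b"\x06DBOPEN\x05FREAD\x06DBOPEN"
patched = patch_name(fragment, b"DBOPEN", b"VEOPEN")
```

Because the two names are the same length, nothing else in the
fragment moves, which is why the names were chosen that way.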


GLOBAL STORAGE

So  far,  with  the READX and INT'READX  procedures, we've done pretty
much  what  needs  to be done to  get our new-and-improved LISTDIR5 to
work. All we need to do is:

   *  Copy  LISTDIR5  into  our  own  group  (it'll  have  to  have PM
     capability, but that's just because LISTDIR5 itself needs PM).

   * Add the READX procedure (as finally corrected) to the group SL.

   * Add the INT'READX procedure to the account or system SL.

   * :RUN LISTDIR5.ourgroup;LIB=G

If  we've done everything right, our toy should work just fine; we can
even  move  other  programs (e.g. DBUTIL) into  our group and have the
very same procedure work for them.
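
Why  this  arrangement does not loop can be seen in a small model of
the search order (in Python, for illustration). It assumes, as
described earlier, that a call is resolved by searching upward from
the SL in which the caller lives: group, then account, then system.
The resolve function and the dictionaries are hypothetical.

```python
SEARCH_ORDER = ["group", "account", "system"]

def resolve(name, libraries, start):
    """Find which SL satisfies a call to name, searching from start upward."""
    for level in SEARCH_ORDER[SEARCH_ORDER.index(start):]:
        if name in libraries[level]:
            return level
    raise LookupError(name)

libraries = {
    "group":   {"READX"},                 # our hook
    "account": {"INT'READX"},             # the intermediary
    "system":  {"READX", "COMMAND"},      # the real intrinsics
}

# The program, run with ;LIB=G, starts its search at the group SL.
from_program = resolve("READX", libraries, "group")
# Our hook (in the group SL) calls INT'READX ...
from_hook = resolve("INT'READX", libraries, "group")
# ... and INT'READX (in the account SL) calls READX.
from_intermediary = resolve("READX", libraries, "account")
```

The program's call lands in the group SL, the hook's call lands in
the account SL, and the intermediary's call can only reach the system
SL, so the chain ends at the real READX.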

Unfortunately,  one of the reasons why this was so easy (you did think
it  was  easy,  didn't  you?)  is  because the problem that we set for
ourselves  was  quite  easy.  The feature that  we wanted to implement
could be implemented entirely within one READX call; we didn't need to
save any information from one call to the next.

What  if we did need to save information this way? For instance, if we
wanted  to  implement  a  multi-line  REDO,  we'd  have  to  save some
information  (e.g.  the  file number of the  REDO file) from one READX
call to another -- we'd also need to be able to tell when our READX is
called for the first time, so that we can initialize this information.
(Actually,  a number of useful features -- like SECURITY/3000's VEOPEN
and  STREAMX's  interception  of  the  COMMAND  intrinsic  --  can  be
implemented  without  using  global  storage, but  many other features
can't be.)

SL procedures in MPE/V are not allowed to have global storage of their
own -- if you try to add a procedure that uses global or OWN variables
to  an  SL, it will fail. Procedures  that have the cooperation of the
caller  (like the V/3000 intrinsics) can get around this by having the
caller  pass  them  an array that contains the  data that they need to
preserve  from  call to call (e.g. the  V/3000 VCOMAREA array), but we
don't  have  that luxury since we  must remain scrupulously compatible
with the calling sequences of the procedures we're intercepting.

Where  can  we  put  our global data? There are  a number of places in
which  MPE  lets  us  keep  information  that  won't  vanish  from one
procedure call to the next, but all of them have their own problems:

   *  Files and extra data segments -- you can put a lot of data here,
     but  it's rather slow to access (even a few milliseconds per call
     can be slow when we're intercepting a frequently called procedure
     like  FREAD, though it can be acceptable for, say, READX, DBOPEN,
     or  COMMAND,  which  take  much longer  anyway). Furthermore, you
     still  have to find a place to keep the file number or extra data
     segment index!

   *  JCWs -- these are also rather  slow, and they can only contain a
     single  word each. Furthermore, they're session-local rather than
     process-local,  so multiple hooked processes  in the same session
     might have trouble.

   *  The  "DL-to-DB"  area of the stack --  this can be accessed very
     quickly  (since  it's  just  as  much part of  your stack as your
     procedure-local  variables),  but  is  often already  used by the
     hooked  program  (especially if it calls  V/3000 or uses PASCAL's
     "heap"  mechanism).  There are a few words  a little bit below DB
     (DB-10  through  DB-1) that are often  not used by most programs,
     especially  programs written in SPL, but again it's possible that
     the  program you're trying to hook  uses them. This is especially
     relevant if you're trying to write a general-purpose hook routine
     that  is supposed to work for all  programs -- in fact, the first
     version  of  our MPEX hook routine  used one of these DB-negative
     words until we ran into a program that wanted to use it, too.

As  you  see, this is not a pretty  sight. There are things you can do
with  one or more of the above mechanisms that might work in your case
(especially if speed is not a problem), but there doesn't seem to be a
very good general solution.

The  best  solution (pioneered by Bob  Green) is somewhat difficult to
implement  but  ultimately  far  superior  to any of  the above. As we
mentioned  before, Bob's QEDIT text editor was written with efficiency
very  much  in mind, and when he  decided to have compilers read QEDIT
files,  it  was very important that they  do this as fast as possible.
One  of  the key procedures that he  needed to intercept was the FREAD
intrinsic  (which  often takes only a  couple of milliseconds), so the
access  to his global storage had to be as fast as possible. He pretty
much had to have all the global storage be kept in the stack.

To  understand  how  this  approach  works, one has  to realize what a
program  file contains. A program file  is essentially a blueprint for
the  loader  that  describes  how  to  load  the process.  It contains
information  on all the code segments (which  is to go into the CSTX),
the  names of all the external references (which are to be loaded from
SLs), and an image of what the process's stack is to look like when it
starts  up.  All  the  initial  values of all  of the program's global
variables  are  kept  here, and when the  loader loads the program, it
allocates the right amount of global area (the size of the global area
is  also  kept  in  the program file) and  fills it with these initial
values.

Bob  was  already patching the program  file's external reference list
(see  the  discussion  above),  so he decided  to expand the program's
initial  global values area to include room for his own global values.
Since  the  program  by  definition didn't use any  of the global area
beyond  what  it  thought was available,  his storage wouldn't collide
with  the  program's  storage;  and, he could add  as much space as he
needed  (keeping in mind, however, that  if the program already used a
lot of stack space, this might cause stack overflow problems).

So  the  general  plan  -- again, you might want to look at Chapter 10
of your System Tables Manual for this -- was:

   *  Modify the "global area size" word  in record 0 to indicate that
     there is more global area.

   *  Insert  as  many  records  as needed after  the global area (and
     before  the code segment area)  in the program file, initializing
     them  to  whatever  values you wanted to  initialize them to. The
     insertion,  of  course, has to be done  by creating a new copy of
     the program and copying all the data from the old program.

   *  Modify  the  record  numbers (in record 0)  of the segment area,
     external  reference list area, and entry list area to reflect the
     fact that we've inserted records.

   * Set the record number of the FPMAP area to 0, since unfortunately
     the FPMAP area of the program contains a lot of internal pointers
     with  record  numbers, and rather than  readjusting them all, you
     should  probably  just tell the system that  there is no FPMAP in
     this program.

For  example, if the old program had 5000 words of global area and you
wanted  to  have  256 words of your own,  you'd change the global area
size  to  5256, insert 2 records (256  words, 128 words per record) at
the  end of the global area,  and increment the segment area, external
reference list area, and entry list area record numbers by 2. It seems
like  a  fairly  complicated manipulation, but  it really isn't; armed
with  Chapter 10 of the System Tables  Reference Manual, you can do it
quite easily.
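
The  arithmetic of this example can be written down as a short model
(in Python, for illustration). The field names and the starting
record numbers are made up; the real layout of record 0 is the one
described in the System Tables Manual.

```python
WORDS_PER_RECORD = 128

def plan_expansion(header, extra_words):
    """Return a new header dict reflecting extra_words of added global area.

    header uses made-up field names, not the real record 0 layout.
    """
    records = -(-extra_words // WORDS_PER_RECORD)      # ceiling division
    new = dict(header)
    new["global_size"] = header["global_size"] + extra_words
    for field in ("segment_area", "extref_area", "entry_area"):
        new[field] = header[field] + records           # these areas move down
    new["fpmap_area"] = 0                              # claim there is no FPMAP
    return new, records

# Usage with invented record numbers:
old = {"global_size": 5000, "segment_area": 45, "extref_area": 90,
       "entry_area": 95, "fpmap_area": 100}
new, inserted = plan_expansion(old, 256)
```

For the 5000-word program this gives a global area size of 5256, two
inserted records, and each of the three record numbers raised by 2.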

There  is  one  more  problem  to  be dealt with.  You run the patched
program  and it has 256 extra words  of global area; but how does your
hook procedure know where those words are? You can't just hardcode the
address  into  the procedure, since you'd like  it to work for various
programs  (and  in  any  event the end of the  global area of even one
program will probably change from version to version). Instead, here's
what you can do:

   *  When  you insert your global area,  make sure that the part that
     you  want  to  use  starts  on,  say,  a  128-word  boundary. For
     instance,  in our 5000-word-global-area  program, you'd make sure
     that  your  256-word global area starts  at word #5120 (the first
     multiple of 128 after 5000) -- thus, you'd expand the global area
     to 5376 words and just waste the 120 words between words 5000 and
     5119.

   *  Set  the  first few words of the  non-waste part of the data you
     insert  into the program to some unique pattern that's not likely
     to  appear in a normal program's  global area. (Remember that the
     data  that  you  insert  into the copy of  the global area in the
     program file will make its way into the program's stack.)

   *  In  your  hook  procedure,  try  to find this  unique pattern by
     looking at word 0, then word 128, then word 256, etc.

This  way, you find your global area by the unique pattern that you've
initialized  its first few words to, but you don't have to check every
word  in your stack (which would take  too long) because you know that
your  global area starts on a 128-word boundary. An example of this in
SPL might be:

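   INTEGER POINTER IP;
   @IP:=0;  << make IP point to DB+0 >>
   << keep stepping while ANY of the four words fails to match >>
   WHILE IP(0)<>123 OR IP(1)<>456 OR IP(2)<>789 OR IP(3)<>555 DO
     @IP:=@IP(128);  << advance to the next 128-word boundary >>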

Note  that  your  unique  sequence  must  be a  sequence that's highly
unlikely  to ever appear in the program's stack; if, for instance, you
choose  a  normal piece of text,  it's possible (though unlikely) that
this  piece  of  text will somehow appear in  the program's stack at a
128-word boundary (perhaps input from the terminal or a file) and will
thus make you find the wrong area. I use a fairly unlikely sequence of
5 words, many of which represent unprintable ASCII characters.
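
The  alignment  arithmetic and the boundary-by-boundary search can be
checked with a small model (in Python, for illustration). The stack
is modelled as a list of words, the five-word pattern is made up, and
unlike a bare loop this version stops at the end of the stack rather
than running off it.

```python
BOUNDARY = 128
SENTINEL = [0x0102, 0x7F03, 0x1F1E, 0x0405, 0x8081]   # made-up 5-word pattern

def aligned_layout(global_size, extra_words):
    """Return (start, new_size, wasted) with start on a 128-word boundary."""
    start = -(-global_size // BOUNDARY) * BOUNDARY     # round up
    return start, start + extra_words, start - global_size

def find_global_area(stack):
    """Return the word index of our area, or None if it is not there."""
    n = len(SENTINEL)
    for base in range(0, len(stack) - n + 1, BOUNDARY):
        if stack[base:base + n] == SENTINEL:
            return base
    return None

# Usage: the 5000-word program with 256 words of our own.
start, new_size, wasted = aligned_layout(5000, 256)
stack = [0] * new_size
stack[start:start + len(SENTINEL)] = SENTINEL
found = find_global_area(stack)
```

The numbers agree with the example above: the area starts at word
5120, the global area grows to 5376 words, and 120 words are wasted.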


WHICH PROCEDURES TO PATCH

The preceding discussion assumed that you knew exactly which procedure
is to be patched, e.g. READX in LISTDIR5. Unfortunately, things aren't
always quite this simple.

Most  tasks  can  be  performed  by a program  in different ways. Some
programs,  for  instance,  use  READX  to read from  the terminal, but
others (like SPOOK) use READ, and others (like EDITOR) use FREAD. When
writing  our  "MPEX  hook"  procedures, we wanted to  work with all of
these  programs,  so we needed to hook  all of the procedures. Hooking
READ  was quite simple, since it is very similar to READX, but dealing
with  FREAD was more difficult, because it  was used by EDITOR to read
both  from  the  terminal  and from files. We  wanted to have terminal
input  that  was prefixed by "%" be  executed as MPEX commands, and we
wanted  to save terminal input in  the multi-line REDO history, but we
obviously  didn't want this done to, say,  lines from the file that we
were /TEXTing in.

Our  first  thought was to call FGETINFO  inside each execution of our
FREAD  hook  procedure to see if we  were reading from the terminal or
not,  but this was far too inefficient -- imagine calling FGETINFO for
each  line  of  a  10,000-line long file.  Instead, we found ourselves
having  to  hook  FOPEN calls just so that  we can check once per file
open  whether  we  were opening $STDIN or  $STDINX, and recording this
information  for each file -- then our FREAD hook could just look into
this array of flags to see if this was a terminal file or not.
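
In  outline,  the  bookkeeping looks like the model below (in Python,
for illustration). The real FOPEN and FREAD take many more parameters
than shown; real_fopen, real_fread and on_terminal_line are
hypothetical stand-ins, and a dictionary plays the part of the array
of flags.

```python
class TerminalTracker:
    """Remembers, per file number, whether the file is the terminal."""
    TERMINAL_NAMES = {"$STDIN", "$STDINX"}

    def __init__(self, real_fopen, real_fread, on_terminal_line):
        self.real_fopen = real_fopen          # hypothetical stand-ins for
        self.real_fread = real_fread          # the real intrinsics
        self.on_terminal_line = on_terminal_line
        self.is_terminal = {}                 # file number -> bool

    def fopen(self, name):
        fnum = self.real_fopen(name)
        # The expensive check happens once per open, not once per read.
        self.is_terminal[fnum] = name.upper() in self.TERMINAL_NAMES
        return fnum

    def fread(self, fnum):
        line = self.real_fread(fnum)
        if self.is_terminal.get(fnum, False):  # cheap lookup on every read
            self.on_terminal_line(line)
        return line

# Usage with fake intrinsics:
files = {}

def fake_fopen(name):
    fnum = len(files) + 1
    files[fnum] = name
    return fnum

def fake_fread(fnum):
    return "line from " + files[fnum]

seen = []
hook = TerminalTracker(fake_fopen, fake_fread, seen.append)
term = hook.fopen("$STDINX")
disc = hook.fopen("MYFILE")
from_disc = hook.fread(disc)
from_term = hook.fread(term)
```

Only the read from the terminal file reaches the extra processing;
the disc file read passes straight through.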

Similar  problems arise when programs use other mechanisms for reading
from  the  terminal  --  programs  written in PASCAL  often use PASCAL
compiler  library  routines to do terminal  I/O; these routines can be
quite  difficult  to  hook  simply  because, unlike  intrinsics, their
calling sequences are undocumented.

The  problem  of  FREADs  from  the terminal vs.  FREADs from files is
actually  a  symptom  of  a greater problem --  what we really want to
distinguish  is  not  terminal  vs.  file  input, but  rather input of
commands  (which might come from files,  e.g. /USE files) from reading
of  data  (which might come from the  terminal, e.g. in /ADD mode). We
really want to distinguish FREAD calls based on what EDITOR intends to
do  with  the  data read, which unfortunately  we cannot do, since the
essence  of  the problem is that EDITOR  isn't cooperating with us and
isn't telling us anything about what it's doing.

We  might try to tell which FREAD call is which by looking at where in
the  program  the  FREAD is being called  (we can get this information
from  our hook FREAD procedure's stack marker), and seeing if it's one
of  those locations in which EDITOR does command input; unfortunately,
this  leaves us with almost all the problems that would be involved in
directly  patching the program's code --  we'd have to read the object
code  to find all the right locations  to patch, they would only apply
to this particular program, and they would have to be recalculated for
each new version of the program.

Fortunately, sometimes, you can get information as to the "purpose" of
a  call  in  surprising  places  -- for instance, you  may find that a
particular program reads command input by passing a read length of -80
to  READX but reads /ADD-mode input by  passing a read length of -255,
and thus use the read length to distinguish the two kinds of input.

To  keep your hook procedure general, you might even want to keep the
expected  read  length  as  a value in the global area. The "hooking
program"  (the  program  that you use to hook other program files) is
what  inserts  this  global  area, so it can prompt for the expected
read  length  and  store  it there. This way, you can, at the time you
hook  a  program,  communicate  various  special  attributes  of the
hooked program to the hook procedure that will be called from it.
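
A  minimal  C  sketch  of  this  read-length test follows. The names
classify_read  and  set_expected_command_length  are hypothetical, the
static  variable  merely stands in for the word in the global area, and
-80 and -255 are just the example values used above:

```c
#include <assert.h>

enum read_kind { READ_COMMAND, READ_DATA };

/* Stand-in for the value kept in the global area; -80 is only the
   example length from the text. */
static int expected_command_length = -80;

/* What the hooking program would do after prompting its user. */
void set_expected_command_length(int len)
{
    assert(len != 0);   /* a zero-length read tells us nothing */
    expected_command_length = len;
}

/* What the hook procedure would do with the length it was passed. */
enum read_kind classify_read(int read_length)
{
    return read_length == expected_command_length ? READ_COMMAND
                                                  : READ_DATA;
}
```

With  the default setting, classify_read(-80) reports command input and
classify_read(-255)  reports  data;  a different program can be handled
by storing a different length rather than by changing the hook.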


HOOKING NATIVE MODE PROGRAMS

Hooking  MPE/XL  native  mode  (NM)  programs is  a somewhat different
story,  but  the  essentials  are  still  the same. Just  as you might
intercept,  say,  FREAD calls from a CM  (or MPE/V) program by putting
your  own FREAD procedure into a group SL and running the program with
;LIB=G, so you can intercept FREAD calls from an NM program by putting
your  own  FREAD  into  an  XL  and  then  running  the  program  with
;XL=yourfile.  Some  aspects  of  this  are easier to  do than with CM
programs, while others are a bit harder.


UPPER AND LOWER CASE

Unlike  MPE/V,  in which all procedure  names are in uppercase, MPE/XL
lets  you  have upper and lower (and  mixed) case procedure names; the
procedures  "FREAD" and "fread" are two different procedures. (This is
necessary for supporting C, which cares about case.)

All  system  intrinsics  are  declared  in  uppercase, but  by default
PASCAL/XL  procedures are created with lowercase names. If your native
mode  program calls "FREAD", and you run  it with an XL= of yours that
contains your own procedure called "fread" (which is what a "PROCEDURE
FREAD"  declaration  will by default create),  your procedure will not
get called because of the difference in names.

Fortunately,  the solution is simple. Do a "$UPPERCASE ON$" before the
procedure declaration; this will tell the PASCAL/XL compiler to create
the procedure with an uppercase name.
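
The  effect of case-sensitive names can be modelled in a few lines of
C.  This  is  only  a model of exact-match name lookup, not a claim
about  how  the  MPE/XL loader is implemented; struct library and
resolve are invented for the illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of the procedure names a library exports. */
struct library {
    const char *const *names;
    size_t count;
};

/* Returns the index of the entry whose name matches exactly, or -1.
   strcmp is case-sensitive, so "fread" does not satisfy "FREAD". */
int resolve(const struct library *lib, const char *wanted)
{
    assert(lib != NULL && wanted != NULL);
    for (size_t i = 0; i < lib->count; i++) {
        if (strcmp(lib->names[i], wanted) == 0)
            return (int)i;
    }
    return -1;
}
```

A  library  exporting  only  "fread"  yields -1 for a request for
"FREAD",  which  corresponds to the hook silently not being called.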


DECLARING YOUR PARAMETERS CORRECTLY

Just  as it is vital in MPE/V  to emulate exactly the calling sequence
of  the procedure to be intercepted, so it is equally vital in MPE/XL.
All  the  by value parameters must be  by value, the by reference ones
must be by reference, and all the types must be identical.

However, there are quite a few more subtle issues involved as well:

   *  ALIGNMENT:  You  can  tell  the  PASCAL/XL  compiler  whether  a
     parameter  must start on a byte,  half-word, or word boundary. By
     default, PASCAL/XL will expect most parameters to start on a word
     boundary, so if you just declare your procedure as

       TYPE TARRAY = ARRAY [1..65536] OF INTEGER;
       ...
       FUNCTION FREAD (FNUM: SHORTINT;
                       VAR BUFFER: TARRAY;
                       LEN: SHORTINT): SHORTINT;

     then  PASCAL/XL will emit code that assumes that BUFFER begins on
     a  word  boundary.  If the caller then  calls FREAD with a BUFFER
     that  doesn't  start  on  a word boundary,  you'll get a run-time
     error.

     What can you do? You should get a listing of the system intrinsic
     file by compiling the following short program:

       $LISTINTR 'LISTFILE'$
       PROGRAM DUMMY;
       BEGIN
       END.

     This  will  send  a  listing of the calling  sequences of all the
     intrinsics  in  the system intrinsics file  to the file LISTFILE;
     you  can then look up your intrinsic there (note that the file is
     not  in  alphabetical  order)  and see what  sort of alignment it
     shows  for  that  parameter. If it's  "8-BIT ALIGNED", you should
     prefix   the   parameter  with  "$ALIGNMENT  1$"  (e.g.  "BUFFER:
     $ALIGNMENT 1$ TARRAY"); if it's "16-BIT ALIGNED", use "$ALIGNMENT
     2$";  if  it's  "32-BIT  ALIGNED",  you don't  need a $ALIGNMENT$
     keyword.

     I  suspect,  however,  that if you declare  all your by reference
     parameters  with  "$ALIGNMENT  1$", you should  have no problems;
     your procedure will run a bit slower, but not by very much.

   * LONG VS. SHORT POINTERS: One other problem with the FREAD calling
     sequence  shown above is that it declares BUFFER as just being of
     type  "TARRAY", i.e. being passed as a 32-bit pointer to an array
     kept  in  the  process's "short address  space". Actually, if you
     look  at  the  intrinsic  file  listing for  the FREAD intrinsic,
     you'll  find  that  BUFFER  is passed as a  "LONG ADDR", a 64-bit
     pointer.  You  must  declare  the parameter as  a 64-bit pointer,
     either by saying:

       TYPE TARRAY_PTR = ^ $EXTNADDR$ ARRAY [1..65536] OF INTEGER;
       ...
       FUNCTION FREAD (... BUFFER: TARRAY_PTR; ...): SHORTINT;

     or by saying

       FUNCTION FREAD (... BUFFER: GLOBALANYPTR; ...): SHORTINT;

     Note  that, once you've declared BUFFER  as a pointer rather than
     as an array, you should no longer pass it as a "VAR".

   *  OPTION EXTENSIBLE: Some MPE/XL  intrinsics which take a variable
     number  of  parameters  are  declared as  OPTION EXTENSIBLE. This
     tells  the compiler to pass a single word at the beginning of the
     parameter list that contains the total number of parameters being
     passed.  If  you're  trying  to  intercept  an  OPTION EXTENSIBLE
     procedure, you need to make your own procedure OPTION EXTENSIBLE,
     too.

     Unfortunately,  it's  hard  to  tell which  procedures are OPTION
     EXTENSIBLE  and  which  are  not.  Some, instead  of being OPTION
     EXTENSIBLE,  are  declared with OPTION  DEFAULT_PARMS; this tells
     the  compiler  to  set  the  values of omitted  parameters to the
     specified  default  values, which makes  the parameter count word
     unnecessary.  You must look at the intrinsic file listing and see
     whether the procedure was indeed declared with OPTION EXTENSIBLE.
     If  it  was,  however, it doesn't  matter how many non-extensible
     parameters  it  has;  an  "OPTION  EXTENSIBLE  0"  declaration is
     enough.  You need not give your intercepting procedure the same
     OPTION  DEFAULT_PARMS  values as the intrinsic has; those only
     matter when the calling program is compiled.

   *  ANYVARs:  If  a  PASCAL/XL  procedure  is  declared  with ANYVAR
     parameters  but does not have  an OPTION UNCHECKABLE_ANYVAR, then
     for  every  such  ANYVAR parameter, the size  of the parameter is
     passed  together  with  its  address.  If an  ANYVAR parameter is
     passed  to  the  procedure  being  intercepted, and it  is not an
     OPTION  UNCHECKABLE_ANYVAR, then the  intercepting procedure must
     also  be non-UNCHECKABLE_ANYVAR and must declare the parameter as
     an  ANYVAR. If, however, the parameter is  a VAR, or is an ANYVAR
     and  the  procedure  is  an  OPTION UNCHECKABLE_ANYVAR,  then the
     intercepting procedure can declare it as a simple VAR, too.
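
Two  of  the  points  above,  alignment and the hidden parameter count,
can  be  illustrated  in C. This is a sketch of the ideas only, not of
PASCAL/XL's  actual  code  generation:  load_word  shows  an access that
makes no alignment assumption, and the two first_param functions model
a  parameter  list  as  an  array  of  words whose first word is the
count, as described above. All names here are invented:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Alignment: fetch the index-th 32-bit value from a buffer that may
   start on any byte boundary.  memcpy assumes nothing about alignment,
   which is the C counterpart of treating the buffer as byte-aligned. */
int32_t load_word(const unsigned char *buffer, size_t index)
{
    int32_t value;
    assert(buffer != NULL);
    memcpy(&value, buffer + index * sizeof value, sizeof value);
    return value;
}

/* Hidden count: in this model words[0] holds the number of parameters
   passed and the real parameters follow it. */

/* An interceptor that expects the count skips over it. */
int first_param_extensible(const int *words)
{
    assert(words != NULL);
    return words[1];
}

/* An interceptor declared without it mistakes the count for its first
   parameter, and every later parameter is shifted the same way. */
int first_param_misdeclared(const int *words)
{
    assert(words != NULL);
    return words[0];
}
```

For  a  call  modelled as the words { 2, 7, -80 }, the first function
sees  7  as  its first parameter while the misdeclared one sees 2,
which  is  the  kind  of  mismatch a wrong declaration produces.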


GLOBAL STORAGE

Native  mode XL routines can have global  storage of their own, so you
don't need to use any of the tricks we've discussed above to save data
from  one  call  of the procedure to the  next. In particular, if your
FREAD  needs to call the real FREAD, it can do an HPGETPROCPLABEL (the
NM  equivalent of LOADPROC) of the real FREAD and save the plabel in a
global variable. Declaring these global variables is quite simple:

   $SUBPROGRAM$
   $GLOBAL$
   PROGRAM DUMMY_OUTER_BLOCK;
   VAR globvar: type;
   ...
   PROCEDURE FREAD ...

One  problem  is  that  PASCAL/XL  has  no way  of initializing global
variables  to a particular value, and  thus no way of checking whether
the  procedure  has been called before  and thus some special behavior
(e.g.  loading  a  procedure,  opening a file,  etc.) is required. One
trick  that  you can use is to check  if the variable is equal to some
special  constant  of yours, and if it  isn't, assume that this is the
first call to the procedure, do the initialization stuff, and then set
the  variable to that constant. Unfortunately, it is possible that the
variable will have had that value by accident from the beginning; this
may  be more likely than you might  expect, since this chunk of memory
might  have been used earlier by another process which was running the
same  routine,  and which initialized that  location in memory to your
flag value.

The  safest  solution would probably be to  write your procedure in HP
C/XL  (which  lets you initialize global  variables), or possibly just
write  one small procedure in HP  C/XL that declares this variable and
lets you access it.
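
A  sketch  of  what  this  buys you, in plain C rather than HP C/XL
specifically.  Here  find_real_fread  is  a  hypothetical stand-in for
the  one-time  HPGETPROCPLABEL  lookup  (it makes no claim about that
intrinsic's  parameters),  and  fake_real_fread  is a dummy so that the
example is self-contained:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fread_fn)(int fnum, void *buffer, int length);

/* C guarantees that these hold NULL and 0 before any code runs, so no
   "magic constant" is needed to detect the first call. */
static fread_fn real_fread = NULL;
static int lookups_done = 0;

/* Dummy "real" routine: pretends to read exactly 'length' units. */
static int fake_real_fread(int fnum, void *buffer, int length)
{
    (void)fnum;
    (void)buffer;
    return length;
}

/* Hypothetical stand-in for the expensive one-time lookup. */
static fread_fn find_real_fread(void)
{
    lookups_done++;
    return fake_real_fread;
}

/* The hook: looks up the real routine on the first call only, then
   passes every call through to it. */
int hooked_fread(int fnum, void *buffer, int length)
{
    if (real_fread == NULL) {
        real_fread = find_real_fread();
        assert(real_fread != NULL);
    }
    return real_fread(fnum, buffer, length);
}

int lookup_count(void)
{
    return lookups_done;
}
```

However  many  times  hooked_fread  is  called, lookup_count() stays at
1,  and  there  is  no  flag  value that could be present by accident.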


RUNNING THE HOOKED PROGRAM

To run the hooked program, you should simply

   :RUN myprog;XL="interceptingxl"

What  if  you don't like to have  to always specify the XL= parameter?
Too  bad. Although a program can be :LINKed with a default ;XL=, it is
very  hard  to  patch  after  a  :LINK;  also, the  MPE/V technique of
changing the name of the called routine in the external reference list
and  adding  the procedure to the system  SL doesn't work, because the
format  of  the external reference list  is not documented and because
adding things to the system XL is much more difficult than adding them
to the SL.

The  only  time  that  having to run the program  with ;XL= would be a
serious  problem  is  if  the program is  process-handled from another
program.  Fortunately, hooking gives us a solution (albeit a difficult
one)  to  this  problem  -- just intercept CREATEPROCESS (or CREATE or
COMMAND  or HPCICOMMAND or whatever the program uses) and "add" an XL=
parameter  to  the  calling  sequence.  (This  is what we  did for our
VECMMND  and  VEOPEN  routines,  which  intercept  COMMAND  and DBOPEN
calls.)
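
For  the  case  where  the son process is started through a command
string,  the  "adding"  step might look like the following C sketch.
It  covers  only  the  string  handling,  and it assumes the hook has
already  decided  that  the  command is a RUN; add_xl is an invented
name,  and the case-sensitive search for ";XL=" is a simplification:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies 'cmd' into 'out', appending ;XL="<xl>" unless the command
   already carries an ;XL= parameter.  Returns 0 on success and -1 if
   'out' is too small to hold the result. */
int add_xl(const char *cmd, const char *xl, char *out, size_t out_size)
{
    int n;

    assert(cmd != NULL && xl != NULL && out != NULL);
    if (strstr(cmd, ";XL=") != NULL)
        n = snprintf(out, out_size, "%s", cmd);
    else
        n = snprintf(out, out_size, "%s;XL=\"%s\"", cmd, xl);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Given  "RUN MYPROG" and "HOOKXL", this produces RUN MYPROG;XL="HOOKXL";
a  command  that  already  names  an  XL is passed through unchanged.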

Another  possible solution, which might be easier in some cases, is to
rename  your old program (say,  MYPROG) to MYPROGUH ("Un-Hooked"), and
create  a  small  shell  program called  MYPROG, which CREATEPROCESSes
MYPROGUH  with  the  right XL= parameter (and  possibly also passes on
whatever  ;PARM= and ;INFO= values it was run with). This will cause a
bit more overhead, but this way whenever MYPROG is run, it will always
execute MYPROGUH with the correct XL=.


CONCLUSION

In  part,  this  paper is more a discussion  of an interesting type of
problem  solved  in  an interesting way than  a blueprint for your own
development  -- not everyone has the  needs described in this paper or
the means to satisfy these needs.

However,  various  people in the HP3000  community have,  more or less
independently,  used these techniques to accomplish some very valuable
things:

   * Robelle has gotten compilers to read QEDIT-format files.

   *  Various  people  have  intercepted IMAGE calls  to instead go to
     their own extended-IMAGE utilities.

   *  VESOFT has used hooking to  implement MPEX command execution and
     multi-line  REDO  from  EDITOR,  TDP,  etc., to allow  an SM user
     running some such editor to save files across account boundaries,
     to preserve the ACDs of files being edited, to implement an IMAGE
     database  security  system  by  hooking DBOPEN,  and to intercept
     COMMAND  intrinsic  calls  so  as  to  route executions of the
     STREAM command through STREAMX.

There  are  a  few relatively simple things that come to mind that you
might do yourself:

   *  If you have your own internal  data storage format, you can hook
     your  favorite  text  editor  to  be able to  properly read those
     files.

   *  If  you  want  to  prevent people from executing certain MPE
     commands  from  a  program  that normally allows MPE command
     execution  (e.g.  EDITOR),  you  can  hook  it to reject those
     commands.

   *  If you want to implement a  control-Y trap in a program, you can
     hook  some procedure that the program calls at the very beginning
     and have your hook procedure arm the control-Y trap.
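
The  command-rejecting hook in the list above needs little more than a
test  like  the  one sketched here in C. The deny-list contents and the
name  command_is_denied  are  examples  only; a real hook would make
this  test  on  the  command  image it was passed and return an error
instead of calling the real routine:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical deny-list; the command names are only examples. */
static const char *const denied[] = { "PURGE", "RENAME" };

/* True if the first word of 'cmd', ignoring leading blanks and case,
   is on the deny-list. */
bool command_is_denied(const char *cmd)
{
    size_t len = 0;

    assert(cmd != NULL);
    while (*cmd == ' ')
        cmd++;
    while (cmd[len] != '\0' && isalpha((unsigned char)cmd[len]))
        len++;

    for (size_t i = 0; i < sizeof denied / sizeof denied[0]; i++) {
        size_t j = 0;

        if (strlen(denied[i]) != len)
            continue;
        while (j < len &&
               toupper((unsigned char)cmd[j]) == denied[i][j])
            j++;
        if (j == len)
            return true;
    }
    return false;
}
```

The  test  is  on the whole first word, so a command such as "PURGEX"
is  not  rejected  merely  because  it  starts  with a denied name.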

If you really want to do something substantial, I believe that you can
hook  QUERY to handle MPE and KSAM files by intercepting all the DBxxx
calls  to make the MPE and KSAM  files look like IMAGE databases. This
would be truly a feat.
