MAKING OTHER PEOPLE'S PROGRAMS DO WHAT THEY WERE NEVER INTENDED TO DO

by Eugene Volokh, VESOFT
Presented at 1990 INTEREX Conference, Boston, MA, USA
Published by INTERACT Magazine, Apr 1991.

THE PROBLEM

We first got the idea for the MPEX hook in 1980, when one of our earliest users complained to us about the time it took him to get into MPEX. Whenever he was in EDITOR and wanted to use MPEX, he'd have to /KEEP the file, exit EDITOR, get into MPEX, do the command, exit MPEX, re-enter EDITOR, and re-/TEXT the file -- a lot of work, especially on his overloaded Series III (remember those?).

We were thus faced with a substantial problem. What our customer really wanted was a change to EDITOR -- some way in which he could execute MPEX commands directly from within EDITOR, without exiting or re-entering, /TEXTing or /KEEPing. He wanted us to modify somebody else's program.

Unfortunately, we did not have the EDITOR source code; however, even if we had it and modified it to suit our needs, we'd have to repeat this modification every time a new version of EDITOR came out (and re-send this new version to all of our customers). Furthermore, the same sort of feature would be needed in TDP, QUERY, etc. -- even if we had the source code to these programs, we wouldn't want to modify them all (and then re-modify them for each subsequent version).

Now, if we couldn't modify the source code, could we modify the object code? Perhaps find out which locations needed to be patched to implement this feature, much like HP sometimes sends out patches to fix certain MPE bugs? There were several reasons why we could not do this:

* Patching object code -- especially someone else's -- is hard. Object code is very hard to understand, and often it's difficult to tell if a patch you make might have an unexpected side-effect. (I say this now, in 1990 -- in 1980, it was even harder for me to deal with.)
* While patches can be used to delete chunks of code (by branching around them) or to make small changes, they cannot readily be used for additions. It's very difficult to insert code into a segment, and even more difficult to add calls to external procedures that the segment doesn't already call. To do this, you'd almost have to write a "program file editor" program that could manipulate program files; I didn't know how to do that then, and though I know how now, I still don't want to.
* Patches would have to be generated for every new version of the patched program that comes out, and we'd have to start almost from scratch for every such version (since the locations of the various pieces of code in the program and in each segment are likely to change quite radically, and the entire internal structure of this part of the program might change).

All in all, just patching object code was dangerous and difficult.

TRAPPING PROCEDURE CALLS

Fortunately, there was an alternative to patching code directly, an alternative that was pioneered (to the best of my knowledge) by Bob Green of Robelle. (Even if he didn't originate it himself, he's certainly the one from whom I adapted it.)

For space and performance reasons, the files handled by Bob's QEDIT text editor had their own special internal format; a program (like a compiler) that expected normal EDITOR-generated files would be quite surprised to get a QEDIT file. But, since QEDIT aimed to be substantially faster (as well as more powerful) than EDITOR, Bob didn't want to have to convert the QEDIT file to EDITOR format before each compile.

The thing that Bob took advantage of -- and I eventually did too -- is that programs are not self-contained. All of their dealings with the outside world -- with disc files, with the terminal, etc. -- are done through intrinsics (or some other system SL procedures).
If only we could cause the programs to call our own procedures that would look like the system intrinsics but actually do our own stuff, too (e.g. pretend that QEDIT files are actually normal EDITOR files, or process user input before the program gets it in order to possibly execute it as an MPEX command), we could change the program's behavior without its even noticing. (The way that we would make the programs call our own procedures is by moving the programs into a separate group and putting procedures with the same names as intrinsics into the group SL.)

For instance, if we could replace the READX called by a program by our own procedure that:

* accepts exactly the same parameters as the real READX;
* calls the real READX;
* checks to see if the user's input starts with a "%" character, and if so, passes it to MPEX to be executed as an MPEX command;
* returns exactly the same values as the real READX (including the condition code)

then a user will be able to execute MPEX commands from that program by just prefixing his input with a "%"; the program itself would not have to be patched, since all of the logic will be in our SL procedure.

Note how this approach avoids the problems of object code patching:

* We don't have to read object code, since all that we need to do is emulate the calling sequence of a well-documented procedure.
* We can easily add new functionality, since the SL procedure can be of almost unlimited size.
* We probably won't have to worry about new versions of the program, since no matter what changes are made to the program, the program will probably still call READX in the same context and for the same purpose as it did before.

Of course, there are also some limitations with this approach:

* We can only alter those aspects of the program's behavior that are accomplished through external procedure calls -- internal calculations and checks will often be beyond our reach.
For example, we can implement a multi-line REDO facility in EDITOR (since we can, by intercepting terminal input calls, record all the input that the user's given us and replace "redo" commands by the appropriate line of input), but we can't implement, say, a new feature on a /CHANGE or /LIST command because we don't have any access to the EDITOR work file and all of the work-file-related tables that EDITOR keeps.
* Though we will know all the parameters of the procedure call, we may have very limited information about its purpose -- for instance, is this READX call intended to prompt for a command (in which case we want to process commands prefixed by a "%"), or for a line of text being added to the file (in which case we don't want this)?

And there are some practical difficulties that we need to overcome to make this approach fully workable -- more about them in the pages ahead.

WRITING THE PROCEDURE

The best way of discussing things further, I think, is to walk through a very simple example of "hooking" a program that you can actually do as you read the paper. Unfortunately, it'll be of limited use (since it was chosen for simplicity rather than utility), but it might still be somewhat impressive -- we'll "teach" LISTDIR5 to honor MPE commands prefixed by ":"s, so you can, for instance, say ":ALTSEC xxx" or ":NEWGROUP xxx" or something like that when prompted with the LISTDIR5 ">" prompt.

The first question that we must ask is which system intrinsic call can we intercept to get the job done? The answer to this is quite simple -- the READX intrinsic. Our plan of attack will be:

* Write a procedure with exactly the same calling sequence as READX.
* Call the real READX to get the input from the user.
* Check the input to see if it starts with a ":" -- if so, execute it as an MPE command using the COMMAND intrinsic.
* Return exactly the same results as READX would.
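
Before turning to SPL, the four steps above can be modelled in a few lines of Python. This is only a sketch of the control flow, not HP 3000 code: `make_hook`, `real_read` and `run_command` are hypothetical names invented for illustration, and the "condition code" is simply returned alongside the text.

```python
from typing import Callable, Tuple

# Condition-code values, numbered as in the paper's VESETCCODE comment.
CCG, CCL, CCE = 0, 1, 2

# A "read" here takes a maximum length and returns (text, condition code).
ReadFn = Callable[[int], Tuple[str, int]]


def make_hook(real_read: ReadFn, run_command: Callable[[str], None]) -> ReadFn:
    """Build a replacement with the same signature as real_read (hypothetical model)."""

    def hooked_read(length: int) -> Tuple[str, int]:
        text, ccode = real_read(length)           # call the real routine
        if ccode == CCE and text.startswith(":"):
            run_command(text[1:])                 # run everything after the ":"
        return text, ccode                        # same result, same condition code

    return hooked_read
```

The caller cannot tell the wrapper from the routine it wraps: the signature and the returned values are unchanged, which is the whole point of the plan.
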
So, let's begin:

   $CONTROL USLINIT, SUBPROGRAM, SEGMENT=MY'READX
   BEGIN
   INTEGER PROCEDURE READX (BUFFER, LEN);
   VALUE LEN;
   ARRAY BUFFER;
   INTEGER LEN;
   BEGIN

The very first thing that you notice is: this is written in SPL. What, you say that you don't know SPL? Well, that's perfectly understandable, but unfortunately there are two crucial things that a hook procedure needs to be able to do that just cannot be done in some languages:

* Accept as input virtually any kind of parameter -- word address or byte address, by value or by reference.
* Return a condition code.

To the best of my knowledge, only SPL and PASCAL procedures can take by-value parameters (like READX's LEN parameter); in MPE/V, only SPL procedures can return a condition code (though MPE/XL's HPSETCCODE intrinsic permits other languages to do this on MPE/XL). However, with the following procedure:

   PROCEDURE VESETCCODE (I << 0 = CCG, 1 = CCL, 2 = CCE >>);
   VALUE I;
   INTEGER I;
   BEGIN
   INTEGER ARRAY Q(*)=Q+0;
   Q(-Q(0)-1).(6:2):=I;
   END;

you can set the condition code from, say, a PASCAL procedure, as long as you call it (VESETCCODE) from the hook procedure itself and not from any of the procedures called from within it. Armed with VESETCCODE, there's no reason why you can't write hook procedures in PASCAL on MPE/V (in fact, I'll even use it in my SPL examples) though I think that you still can't do them in any other language.

OK, back to our sample procedure. Note that we created a procedure header that exactly corresponds to the calling sequence of the READX intrinsic. Each of the parameters must match exactly, both in type and in mode (by value/by reference); the return value must be exactly the right type, and if the procedure we're intercepting is an OPTION VARIABLE procedure, so must ours be.
(PASCAL programmers: you can still hook OPTION VARIABLE procedures if you realize that an OPTION VARIABLE procedure is just the same as a normal one but has an extra by-value parameter at the end that contains the OPTION VARIABLE bit mask.)

Now, let's continue:

   $CONTROL USLINIT, SUBPROGRAM, SEGMENT=MY'READX
   BEGIN
   INTEGER PROCEDURE READX (BUFFER, LEN);
   VALUE LEN;
   ARRAY BUFFER;
   INTEGER LEN;
   BEGIN
   INTRINSIC READX;
   BYTE ARRAY BUFFER'B(*)=BUFFER;
   INTEGER LEN'READ;
   BYTE ARRAY TEMP'CMD(0:255);
   INTEGER CIERR;
   INTEGER FSERR;
   LEN'READ:=READX (BUFFER, LEN);
   IF > THEN
     VESETCCODE (0)
   ELSE
   IF < THEN
     VESETCCODE (1)
   ELSE
     BEGIN
     IF LEN'READ<>0 AND BUFFER'B=":" THEN
       BEGIN
       MOVE TEMP'CMD:=BUFFER'B(1),(LEN'READ-1);
       TEMP'CMD(LEN'READ-1):=%15;  << carriage return >>
       COMMAND (TEMP'CMD, CIERR, FSERR);
       END;
     VESETCCODE (2);
     END;
   READX:=LEN'READ;
   END;

As you see, we call READX, check the condition code, set our own return condition code appropriately, and if the read succeeded and the input line starts with a ":", call the COMMAND intrinsic.

What is wrong with this picture? Well, there are three problems:

* First, and most important of all (I'm sure you noticed this), we have our procedure called READX calling the intrinsic READX. You'd think that since you declared READX as an intrinsic, the compiler will recognize that you want to call the READX intrinsic in the system SL. This, unfortunately, is not the case. When the linker sees the call to READX, it views it as a recursive call to our own procedure and not a call to the READX intrinsic. (To make matters worse, the SPL compiler will not flag the "INTRINSIC READX" declaration as a duplicate symbol error.) In fact, we will find that this -- how to call the real procedure from our hook procedure -- is one of the more substantial problems that we face.
* Secondly, note the MOVE TEMP'CMD:=BUFFER'B(1),(LEN'READ-1) statement -- why is it wrong?
Because of the way that the READX intrinsic is defined, its result (which we put into LEN'READ) may be the number of bytes or the number of words read (depending on whether the LEN parameter was negative or positive). Actually, we might discover that LISTDIR5 always passes a negative LEN parameter and thus the READX result will always be in bytes, but we don't want to count on that (especially if we want the hook procedure to be general). The rule is thus that you must be prepared for any possible set of input parameters and any possible result returned by the intrinsic. In other words, instead of LEN'READ, we should have said (IF LEN<0 THEN LEN'READ ELSE 2*LEN'READ).
* Thirdly, what happens when a command that's prefixed by a ":" is input? Indeed, it will be executed as an MPE command, but it will then be returned to LISTDIR5 as the result of the read -- LISTDIR5 will see it as an invalid command, and will output a nasty message. This is important to remember -- when you intercept a procedure call, from the program's point of view the call still completes, and the program will act upon the data returned by the hook procedure. In this case, we should make sure that the data returned to LISTDIR5 is such that LISTDIR5 will do as little with it as possible -- in LISTDIR5's case, returning an empty string (just as if the user hit return). For this, we'd have to set the function result to 0 (0 characters read), but we'd also have to make sure that the buffer returned to the program is in the same state as it was when our procedure was called, since programs often calculate the length of the data input not by the READX result, but by the position of some terminating character (e.g. a carriage return) that they filled the buffer with.

CALLING THE REAL PROCEDURE

Let's get back for a moment to the first problem mentioned in the above list -- if we call the intrinsic READX from our READX procedure, we get an infinite loop. What can we do about this?
Well, there are three possible solutions:

* Since we're putting our READX in a group or account SL anyway, we can put an "intermediary" procedure called, say, INT'READX into the system SL (or any SL higher than the one in which our READX is) -- our READX can call INT'READX, which will then call READX. Since SL's are always searched in the order group, account, system, a call to READX from an INT'READX that's in the system SL will not call back to our group/account SL READX, but rather go to the real READX in the system SL. The INT'READX procedure might then look something like this:

   INTEGER PROCEDURE INT'READX (BUFFER, LEN);
   VALUE LEN;
   ARRAY BUFFER;
   INTEGER LEN;
   BEGIN
   INTRINSIC READX;
   INT'READX:=READX (BUFFER, LEN);
   IF > THEN
     VESETCCODE (0)
   ELSE
   IF < THEN
     VESETCCODE (1)
   ELSE
     VESETCCODE (2);
   END;

* Another alternative is to have our READX call the real READX using the LOADPROC intrinsic. Among other things, LOADPROC lets you specify that you want to load the procedure from the system SL, so we won't get into the same recursive loop that we would have had if our READX just tried to call READX directly. I won't go into any more detail as to how this is done, but I do want to point out that one problem with this approach is where to put the plabel returned by the LOADPROC procedure so that we don't have to re-LOADPROC for every procedure call. Actually, for intercepting READX we can afford to re-LOADPROC the real READX every time our READX is called because subsequent LOADPROCs of a procedure that's already been LOADPROCed take only a few milliseconds; however, if we're intercepting a more time-critical procedure, like FREAD, we'd have to be sure to LOADPROC it only once, in which case we'd need some global storage to keep the plabel. More about the global storage problem later.
* One other alternative that's worth considering is somewhat more difficult to do but cures a very substantial disadvantage present in the first two solutions.
Say that you want to hook a number of programs all over your system, for instance to call your own DBOPEN replacement procedure instead of the DBOPEN intrinsic (which we do in our SECURITY/3000 VEOPEN module), or to call a replacement COMMAND procedure (which we do in our SECURITY/3000 STREAMX module). If you want to intercept these calls using procedures named DBOPEN and COMMAND, you'd have to put these procedures into local group or account SLs in every group or account in which the programs to be hooked reside. This can prove quite cumbersome, especially when it comes time to install a new version of your procedures -- you might have to replace dozens of SLs in dozens of different accounts.

The trouble, of course, is that you can't put your own hook procedures into the system SL, since they have the same names as the real intrinsics. The way that you can get around this problem is by actually patching all the programs to be hooked to call not a procedure called DBOPEN, but rather one called, say, VEOPEN. Then, you can put the VEOPEN procedure into the system SL, since it will no longer conflict with the real DBOPEN -- furthermore, since it is called VEOPEN, there'll be no problem with it calling the real DBOPEN without threat of recursion. When a new version of VEOPEN, incidentally, is installed, you won't have to re-patch all the programs, but only replace the module in the system SL. (On the other hand, whenever you roll in a new version of a patched program, you'd have to re-patch it.)

Patching the programs might at first glance seem difficult, but it actually isn't. All program files (I'm speaking here of MPE/V and CM programs) contain at a well-defined place an "external reference list", which is a list of the names of all the SL procedures that they call (together with some other information). Simply by replacing the procedure name "DBOPEN" by the name "VEOPEN" you can make the program call VEOPEN instead of DBOPEN.
(Note that the two procedure names are intentionally chosen to be the same length.) The layout of this table is described in Chapter 10 of the MPE/V System Tables Manual -- it's not that hard to write a program that modifies it. It's not much more difficult (though it is extra work) to write a program that modifies the external reference list of an SL segment.

All in all, we've found patching to work quite well for us, but the additional cost of writing a program to patch the external reference list might make it a rather expensive solution for some.

GLOBAL STORAGE

So far, with the READX and INT'READX procedures, we've done pretty much what needs to be done to get our new-and-improved LISTDIR5 to work. All we need to do is:

* Copy LISTDIR5 into our own group (it'll have to have PM capability, but that's just because LISTDIR5 itself needs PM).
* Add the READX procedure (as finally corrected) to the group SL.
* Add the INT'READX procedure to the account or system SL.
* :RUN LISTDIR5.ourgroup;LIB=G

If we've done everything right, our toy should work just fine; we can even move other programs (e.g. DBUTIL) into our group and have the very same procedure work for them.

Unfortunately, one of the reasons why this was so easy (you did think it was easy, didn't you?) is because the problem that we set for ourselves was quite easy. The feature that we wanted to implement could be implemented entirely within one READX call; we didn't need to save any information from one call to the next.

What if we did need to save information this way? For instance, if we wanted to implement a multi-line REDO, we'd have to save some information (e.g. the file number of the REDO file) from one READX call to another -- we'd also need to be able to tell when our READX is called for the first time, so that we can initialize this information.
(Actually, a number of useful features -- like SECURITY/3000's VEOPEN and STREAMX's interception of the COMMAND intrinsic -- can be implemented without using global storage, but many other features can't be.)

SL procedures in MPE/V are not allowed to have global storage of their own -- if you try to add a procedure that uses global or OWN variables to an SL, it will fail. Procedures that have the cooperation of the caller (like the V/3000 intrinsics) can get around this by having the caller pass them an array that contains the data that they need to preserve from call to call (e.g. the V/3000 VCOMAREA array), but we don't have that luxury since we must remain scrupulously compatible with the calling sequences of the procedures we're intercepting.

Where can we put our global data? There are a number of places in which MPE lets us keep information that won't vanish from one procedure call to the next, but all of them have their own problems:

* Files and extra data segments -- you can put a lot of data here, but it's rather slow to access (even a few milliseconds per call can be slow when we're intercepting a frequently called procedure like FREAD, though it can be acceptable for, say, READX, DBOPEN, or COMMAND, which take much longer anyway). Furthermore, you still have to find a place to keep the file number or extra data segment index!
* JCWs -- these are also rather slow, and they can only contain a single word each. Furthermore, they're session-local rather than process-local, so multiple hooked processes in the same session might have trouble.
* The "DL-to-DB" area of the stack -- this can be accessed very quickly (since it's just as much part of your stack as your procedure-local variables), but is often already used by the hooked program (especially if it calls V/3000 or uses PASCAL's "heap" mechanism).
There are a few words a little bit below DB (DB-10 through DB-1) that most programs do not use, especially programs written in SPL, but again it's possible that the program you're trying to hook uses them. This is especially relevant if you're trying to write a general-purpose hook routine that is supposed to work for all programs -- in fact, the first version of our MPEX hook routine used one of these DB-negative words until we ran into a program that wanted to use it, too.

As you see, this is not a pretty sight. There are things you can do with one or more of the above mechanisms that might work in your case (especially if speed is not a problem), but there doesn't seem to be a very good general solution.

The best solution (pioneered by Bob Green) is somewhat difficult to implement but ultimately far superior to any of the above. As we mentioned before, Bob's QEDIT text editor was written with efficiency very much in mind, and when he decided to have compilers read QEDIT files, it was very important that they do this as fast as possible. One of the key procedures that he needed to intercept was the FREAD intrinsic (which often takes only a couple of milliseconds), so the access to his global storage had to be as fast as possible. He pretty much had to have all the global storage be kept in the stack.

To understand how this approach works, one has to realize what a program file contains. A program file is essentially a blueprint for the loader that describes how to load the process. It contains information on all the code segments (which is to go into the CSTX), the names of all the external references (which are to be loaded from SLs), and an image of what the process's stack is to look like when it starts up.
All the initial values of all of the program's global variables are kept here, and when the loader loads the program, it allocates the right amount of global area (the size of the global area is also kept in the program file) and fills it with these initial values.

Bob was already patching the program file's external reference list (see the discussion above), so he decided to expand the program's initial global values area to include room for his own global values. Since the program by definition didn't use any of the global area beyond what it thought was available, his storage wouldn't collide with the program's storage; and, he could add as much space as he needed (keeping in mind, however, that if the program already used a lot of stack space, this might cause stack overflow problems).

So the general plan -- again, you might want to look at Chapter 10 of your System Tables Manual for this -- was:

* Modify the "global area size" word in record 0 to indicate that there is more global area.
* Insert as many records as needed after the global area (and before the code segment area) in the program file, initializing them to whatever values you wanted to initialize them to. The insertion, of course, has to be done by creating a new copy of the program and copying all the data from the old program.
* Modify the record numbers (in record 0) of the segment area, external reference list area, and entry list area to reflect the fact that we've inserted records.
* Set the record number of the FPMAP area to 0, since unfortunately the FPMAP area of the program contains a lot of internal pointers with record numbers, and rather than readjusting them all, you should probably just tell the system that there is no FPMAP in this program.

For example, if the old program used 5000 words of stack space and you wanted to have 256 words of your own, you'd change the global area size to 5256, insert 2 records (256 words, 128 words per record) at the end of the global area, and increment the segment area, external reference list area, and entry list area record numbers by 2. It seems like a fairly complicated manipulation, but it really isn't; armed with Chapter 10 of the System Tables Reference Manual, you can do it quite easily.

There is one more problem to be dealt with. You run the patched program and it has 256 extra words of global area; but how does your hook procedure know where those words are? You can't just hardcode the address into the procedure, since you'd like it to work for various programs (and in any event the end of the global area of even one program will probably change from version to version). Instead, here's what you can do:

* When you insert your global area, make sure that the part that you want to use starts on, say, a 128-word boundary. For instance, in our 5000-word-global-area program, you'd make sure that your 256-word global area starts at word #5120 (the first multiple of 128 after 5000) -- thus, you'd expand the global area to 5376 words and just waste the 120 words between words 5000 and 5119.
* Set the first few words of the non-waste part of the data you insert into the program to some unique pattern that's not likely to appear in a normal program's global area. (Remember that the data that you insert into the copy of the global area in the program file will make its way into the program's stack.)
* In your hook procedure, try to find this unique pattern by looking at word 0, then word 128, then word 256, etc. This way, you find your global area by the unique pattern that you've initialized its first few words to, but you don't have to check every word in your stack (which would take too long) because you know that your global area starts on a 128-word boundary.
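
The arithmetic and the boundary search can be checked with a small Python model. It is a sketch under stated assumptions, not HP 3000 code: the stack is modelled as a plain list of words, `plan_expansion` and `find_area` are hypothetical helper names, and the search is bounded by the list length rather than running open-ended.

```python
from typing import Dict, List

BOUNDARY = 128  # words; the alignment chosen in the text


def plan_expansion(old_size: int, extra_words: int) -> Dict[str, int]:
    """Place a private area on the next 128-word boundary at or after old_size."""
    start = -(-old_size // BOUNDARY) * BOUNDARY   # round up to a multiple of 128
    return {
        "start": start,                     # first word of the private area
        "new_size": start + extra_words,    # new total global area size
        "wasted": start - old_size,         # padding words that go unused
    }


def find_area(stack: List[int], signature: List[int]) -> int:
    """Return the first 128-word boundary where signature appears, else -1."""
    for base in range(0, len(stack) - len(signature) + 1, BOUNDARY):
        if stack[base:base + len(signature)] == signature:
            return base
    return -1
```

For the 5000-word program above, `plan_expansion(5000, 256)` gives a start of 5120, a new size of 5376 and 120 wasted words, matching the figures in the text. Because `find_area` only inspects every 128th word, a copy of the signature at any other offset is never mistaken for the private area.
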

An example of this in SPL might be:

   INTEGER POINTER IP;
   @IP:=0;  << make IP point to DB+0 >>
   << keep stepping while any one of the signature words fails to match >>
   WHILE IP(0)<>123 OR IP(1)<>456 OR IP(2)<>789 OR IP(3)<>555 DO
     @IP:=@IP(128);

Note that your unique sequence must be a sequence that's highly unlikely to ever appear in the program's stack; if, for instance, you choose a normal piece of text, it's possible (though unlikely) that this piece of text will somehow appear in the program's stack at a 128-word boundary (perhaps input from the terminal or a file) and will thus make you find the wrong area. I use a fairly unlikely sequence of 5 words, many of which represent unprintable ASCII characters.

WHICH PROCEDURES TO PATCH

The preceding discussion assumed that you knew exactly which procedure is to be patched, e.g. READX in LISTDIR5. Unfortunately, things aren't always quite this simple.

Most tasks can be performed by a program in different ways. Some programs, for instance, use READX to read from the terminal, but others (like SPOOK) use READ, and others (like EDITOR) use FREAD. When writing our "MPEX hook" procedures, we wanted to work with all of these programs, so we needed to hook all of the procedures.

Hooking READ was quite simple, since it is very similar to READX, but dealing with FREAD was more difficult, because it was used by EDITOR to read both from the terminal and from files. We wanted to have terminal input that was prefixed by "%" be executed as MPEX commands, and we wanted to save terminal input in the multi-line REDO history, but we obviously didn't want this done to, say, lines from the file that we were /TEXTing in. Our first thought was to call FGETINFO inside each execution of our FREAD hook procedure to see if we were reading from the terminal or not, but this was far too inefficient -- imagine calling FGETINFO for each line of a 10,000-line long file.
Instead, we found ourselves having to hook FOPEN calls just so that we can check once per file open whether we were opening $STDIN or $STDINX, and recording this information for each file -- then our FREAD hook could just look into this array of flags to see if this was a terminal file or not.

Similar problems arise when programs use other mechanisms for reading from the terminal -- programs written in PASCAL often use PASCAL compiler library routines to do terminal I/O; these routines can be quite difficult to hook simply because, unlike intrinsics, their calling sequences are undocumented.

The problem of FREADs from the terminal vs. FREADs from files is actually a symptom of a greater problem -- what we really want to distinguish is not terminal vs. file input, but rather input of commands (which might come from files, e.g. /USE files) from reading of data (which might come from the terminal, e.g. in /ADD mode). We really want to distinguish FREAD calls based on what EDITOR intends to do with the data read, which unfortunately we cannot do, since the essence of the problem is that EDITOR isn't cooperating with us and isn't telling us anything about what it's doing. We might try to tell which FREAD call is which by looking at where in the program the FREAD is being called (we can get this information from our hook FREAD procedure's stack marker), and seeing if it's one of those locations in which EDITOR does command input; unfortunately, this leaves us with almost all the problems that would be involved in directly patching the program's code -- we'd have to read the object code to find all the right locations to patch, they would only apply to this particular program, and they would have to be recalculated for each new version of the program.
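
The per-file flag table described earlier in this section (check once at open time, look the answer up at read time) can be modelled in Python as follows. This is a simplified sketch, not MPE code: the real FOPEN and FREAD take many more parameters, and `ReadHooks`, `real_open`, `real_read` and `on_terminal_line` are hypothetical names.

```python
from typing import Callable, Dict

TERMINAL_NAMES = {"$STDIN", "$STDINX"}


class ReadHooks:
    """Pairs an open hook with a read hook that share a table of flags."""

    def __init__(self,
                 real_open: Callable[[str], int],
                 real_read: Callable[[int], str],
                 on_terminal_line: Callable[[str], None]) -> None:
        self.real_open = real_open
        self.real_read = real_read
        self.on_terminal_line = on_terminal_line
        self.is_terminal: Dict[int, bool] = {}   # file number -> terminal?

    def fopen(self, name: str) -> int:
        fnum = self.real_open(name)
        # The comparatively expensive check happens once per open, not per read.
        self.is_terminal[fnum] = name.upper() in TERMINAL_NAMES
        return fnum

    def fread(self, fnum: int) -> str:
        line = self.real_read(fnum)
        if self.is_terminal.get(fnum, False):    # cheap table lookup per read
            self.on_terminal_line(line)
        return line
```

Only lines read through a file number that was opened as a terminal file reach `on_terminal_line`; reads from ordinary files pass straight through.
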

Fortunately, sometimes, you can get information as to the "purpose" of a call in surprising places -- for instance, you may find that a particular program reads command input by passing a read length of -80 to READX but reads /ADD-mode input by passing a read length of -255, and thus use the read length to distinguish the two kinds of input. To keep your hook procedure general, you might even want to keep the expected read length as a value in the global area that the program that you use to hook other program files inserts, and have this "hooking program" prompt for what expected read length is to be put into the global area. This way, you can, at the time you hook a program, communicate various special attributes of the hooked program to the hook procedure that will be called from this program.

HOOKING NATIVE MODE PROGRAMS

Hooking MPE/XL native mode (NM) programs is a somewhat different story, but the essentials are still the same. Just as you might intercept, say, FREAD calls from a CM (or MPE/V) program by putting your own FREAD procedure into a group SL and running the program with ;LIB=G, so you can intercept FREAD calls from an NM program by putting your own FREAD into an XL and then running the program with ;XL=yourfile. Some aspects of this are easier to do than with CM programs, while others are a bit harder.

UPPER AND LOWER CASE

Unlike MPE/V, in which all procedure names are in uppercase, MPE/XL lets you have upper and lower (and mixed) case procedure names; the procedures "FREAD" and "fread" are two different procedures. (This is necessary for supporting C, which cares about case.) All system intrinsics are declared in uppercase, but by default PASCAL/XL procedures are created with lowercase names.
If your native mode program calls "FREAD", and you run it with an XL= of yours that contains your own procedure called "fread" (which is what a "PROCEDURE FREAD" declaration will by default create), your procedure will not get called because of the difference in names.

Fortunately, the solution is simple. Do a "$UPPERCASE ON$" before the procedure declaration; this will tell the PASCAL/XL compiler to create the procedure with an uppercase name.

DECLARING YOUR PARAMETERS CORRECTLY

Just as it is vital in MPE/V to emulate exactly the calling sequence of the procedure to be intercepted, so it is equally vital in MPE/XL. All the by value parameters must be by value, the by reference ones must be by reference, and all the types must be identical. However, there are quite a few more subtle issues involved as well:

* ALIGNMENT: You can tell the PASCAL/XL compiler whether a parameter must start on a byte, half-word, or word boundary. By default, PASCAL/XL will expect most parameters to start on a word boundary, so if you just declare your procedure as

   TYPE TARRAY = ARRAY [1..65536] OF INTEGER;
   ...
   FUNCTION FREAD (FNUM: SHORTINT;
                   VAR BUFFER: TARRAY;
                   LEN: SHORTINT): SHORTINT;

then PASCAL/XL will emit code that assumes that BUFFER begins on a word boundary. If the caller then calls FREAD with a BUFFER that doesn't start on a word boundary, you'll get a run-time error.

What can you do? You should get a listing of the system intrinsic file by compiling the following short program:

   $LISTINTR 'LISTFILE'$
   PROGRAM DUMMY;
   BEGIN
   END.

This will send a listing of the calling sequences of all the intrinsics in the system intrinsics file to the file LISTFILE; you can then look up your intrinsic there (note that the file is not in alphabetical order) and see what sort of alignment it shows for that parameter. If it's "8-BIT ALIGNED", you should prefix the parameter with "$ALIGNMENT 1$" (e.g. "BUFFER: $ALIGNMENT 1$ TARRAY"); if it's "16-BIT ALIGNED", use "$ALIGNMENT 2$"; if it's "32-BIT ALIGNED", you don't need a $ALIGNMENT$ keyword. I suspect, however, that if you declare all your by reference parameters with "$ALIGNMENT 1$", you should have no problems; your procedure will run a bit slower, but not by very much.

* LONG VS. SHORT POINTERS: One other problem with the FREAD calling sequence shown above is that it declares BUFFER as just being of type "TARRAY", i.e. being passed as a 32-bit pointer to an array kept in the process's "short address space". Actually, if you look at the intrinsic file listing for the FREAD intrinsic, you'll find that BUFFER is passed as a "LONG ADDR", a 64-bit pointer. You must declare the parameter as a 64-bit pointer, either by saying:

   TYPE TARRAY_PTR = ^ $EXTNADDR$ ARRAY [1..65536] OF INTEGER;
   ...
   FUNCTION FREAD (... BUFFER: TARRAY_PTR; ...): SHORTINT;

or by saying

   FUNCTION FREAD (... BUFFER: GLOBALANYPTR; ...): SHORTINT;

Note that, once you've declared BUFFER as a pointer rather than as an array, you should no longer pass it as a "VAR".

* OPTION EXTENSIBLE: Some MPE/XL intrinsics which take a variable number of parameters are declared as OPTION EXTENSIBLE. This tells the compiler to pass a single word at the beginning of the parameter list that contains the total number of parameters being passed. If you're trying to intercept an OPTION EXTENSIBLE procedure, you need to make your own procedure OPTION EXTENSIBLE, too. Unfortunately, it's hard to tell which procedures are OPTION EXTENSIBLE and which are not. Some, instead of being OPTION EXTENSIBLE, are declared with OPTION DEFAULT_PARMS; this tells the compiler to set the values of omitted parameters to the specified default values, which makes the parameter count word unnecessary. You must look at the intrinsic file listing and see whether the procedure was indeed declared with OPTION EXTENSIBLE.
  If it was, however, it doesn't matter how many non-extensible parameters it has; an "OPTION EXTENSIBLE 0" declaration is enough. You need not compile your intercepting procedure with the same OPTION DEFAULT_PARMS values as the intrinsic was; those only matter when the calling program is compiled.

* ANYVARs: If a PASCAL/XL procedure is declared with ANYVAR parameters but does not have an OPTION UNCHECKABLE_ANYVAR, then for every such ANYVAR parameter, the size of the parameter is passed together with its address. If an ANYVAR parameter is passed to the procedure being intercepted, and it is not an OPTION UNCHECKABLE_ANYVAR, then the intercepting procedure must also be non-UNCHECKABLE_ANYVAR and must declare the parameter as an ANYVAR. If, however, the parameter is a VAR, or is an ANYVAR and the procedure is an OPTION UNCHECKABLE_ANYVAR, then the intercepting procedure can declare it as a simple VAR, too.

GLOBAL STORAGE

Native mode XL routines can have global storage of their own, so you don't need to use any of the tricks we've discussed above to save data from one call of the procedure to the next. In particular, if your FREAD needs to call the real FREAD, it can do an HPGETPROCPLABEL (the NM equivalent of LOADPROC) of the real FREAD and save the plabel in a global variable. Declaring these global variables is quite simple:

     $SUBPROGRAM$
     $GLOBAL$
     PROGRAM DUMMY_OUTER_BLOCK;
     VAR globvar: type;
     ...
     PROCEDURE FREAD ...

One problem is that PASCAL/XL has no way of initializing global variables to a particular value, and thus no way of checking whether the procedure has been called before and thus some special behavior (e.g. loading a procedure, opening a file, etc.) is required. One trick that you can use is to check if the variable is equal to some special constant of yours, and if it isn't, assume that this is the first call to the procedure, do the initialization stuff, and then set the variable to that constant.
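The check-a-flag trick just described might look roughly like the following. This is only a sketch in portable C, not MPE/XL code: the names hooked_read, system_read, lookup_real_read and HOOK_READY are all hypothetical, the three-parameter calling sequence is a simplification, and lookup_real_read merely stands in for a dynamic lookup such as HPGETPROCPLABEL (whose real calling sequence is not shown here). Note also that C starts static variables at zero, a guarantee that belongs to C and not to the PASCAL/XL globals discussed in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value; any constant unlikely to occur by accident. */
#define HOOK_READY 0x5645534FL

/* Simplified, hypothetical signature of the routine being intercepted. */
typedef int (*read_fn)(int fnum, char *buffer, int len);

/* Globals that persist from one call of the hook to the next.
   C guarantees that these start out as zero / NULL. */
static long hook_state = 0;
static read_fn real_read = NULL;
static int init_count = 0;   /* kept only so the behaviour can be observed */

/* Stand-in for the real routine (on MPE/XL, the system FREAD). */
static int system_read(int fnum, char *buffer, int len)
{
    const char *text = "HELLO";
    int n = (int)strlen(text);
    (void)fnum;                       /* unused in this sketch */
    if (n > len)
        n = len;
    memcpy(buffer, text, (size_t)n);
    return n;                         /* number of bytes "read" */
}

/* Stand-in for a dynamic lookup such as HPGETPROCPLABEL. */
static read_fn lookup_real_read(void)
{
    return system_read;
}

/* The intercepting routine: initialize once, then forward every call. */
int hooked_read(int fnum, char *buffer, int len)
{
    if (hook_state != HOOK_READY) {   /* flag not set: first call */
        real_read = lookup_real_read();
        init_count++;
        hook_state = HOOK_READY;      /* later calls skip the setup */
    }
    assert(real_read != NULL);
    return real_read(fnum, buffer, len);
}
```

Calling hooked_read twice performs the lookup only on the first call; every later call goes straight through the saved pointer.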
Unfortunately, it is possible that the variable will have had that value by accident from the beginning; this may be more likely than you might expect, since this chunk of memory might have been used earlier by another process which was running the same routine, and which initialized that location in memory to your flag value. The safest solution would probably be to write your procedure in HP C/XL (which lets you initialize global variables), or possibly just write one small procedure in HP C/XL that declares this variable and lets you access it.

RUNNING THE HOOKED PROGRAM

To run the hooked program, you should simply

     :RUN myprog;XL="interceptingxl"

What if you don't like to have to always specify the XL= parameter? Too bad. Although a program can be :LINKed with a default ;XL=, it is very hard to patch after a :LINK; also, the MPE/V technique of changing the name of the called routine in the external reference list and adding the procedure to the system SL doesn't work, because the format of the external reference list is not documented and because adding things to the system XL is much more difficult than adding them to the SL.

The only time that having to run the program with ;XL= would be a serious problem is if the program is process-handled from another program. Fortunately, hooking gives us a solution (albeit a difficult one) to this problem -- just intercept CREATEPROCESS (or CREATE or COMMAND or HPCICOMMAND or whatever the program uses) and "add" an XL= parameter to the calling sequence. (This is what we did for our VECMMND and VEOPEN routines, which intercept COMMAND and DBOPEN calls.)

Another possible solution, which might be easier in some cases, is to rename your old program (say, MYPROG) to MYPROGUH ("Un-Hooked"), and create a small shell program called MYPROG, which CREATEPROCESSes MYPROGUH with the right XL= parameter (and possibly also passes on whatever ;PARM= and ;INFO= values it was run with).
This will cause a bit more overhead, but this way whenever MYPROG is run, it will always execute MYPROGUH with the correct XL=.

CONCLUSION

In part, this paper is more a discussion of an interesting type of problem solved in an interesting way than a blueprint for your own development -- not everyone has the needs described in this paper or the means to satisfy these needs. However, various people in the HP3000 community have, more or less independently, used these techniques to accomplish some very valuable things:

* Robelle has gotten compilers to read QEDIT-format files.

* Various people have intercepted IMAGE calls to instead go to their own extended-IMAGE utilities.

* VESOFT has used hooking to implement MPEX command execution and multi-line REDO from EDITOR, TDP, etc., to allow an SM user running some such editor to save files across account boundaries, to preserve the ACDs of files being edited, to implement an IMAGE database security system by hooking DBOPEN, and to intercept COMMAND intrinsic calls and route executions of the STREAM command through STREAMX.

There are a couple of relatively simple things that come to mind that you might do yourself:

* If you have your own internal data storage format, you can hook your favorite text editor to be able to properly read those files.

* If you want to prevent people from executing certain MPE commands from a program that normally allows MPE command execution (e.g. EDITOR), you can hook it to reject those commands.

* If you want to implement a control-Y trap in a program, you can hook some procedure that the program calls at the very beginning and have your hook procedure arm the control-Y trap.

If you really want to do something substantial, I believe that you can hook QUERY to handle MPE and KSAM files by intercepting all the DBxxx calls to make the MPE and KSAM files look like IMAGE databases. This would truly be a feat.
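As an illustration of the simplest of these suggestions, rejecting certain commands, here is a minimal sketch in portable C. Everything in it is an assumption made for illustration: the one-parameter hooked_command does not reproduce the real calling sequence of the COMMAND intrinsic, real_command is a stand-in for the genuine routine (which an actual hook would have to locate and call), and the deny list and the HOOK_REJECTED value are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status returned when the hook refuses a command. */
#define HOOK_REJECTED (-1)

/* Hypothetical list of command names the hooked program may not run. */
static const char *const denied[] = { "PURGE", "RENAME" };

static int commands_run = 0;  /* kept only so the behaviour can be observed */

/* Stand-in for the real command executor; always "succeeds" with 0. */
static int real_command(const char *cmd)
{
    (void)cmd;
    commands_run++;
    return 0;
}

/* Does cmd begin with the uppercase word `name`, ignoring case and
   leading blanks?  The word must be followed by a blank or end of string. */
static int starts_with_word(const char *cmd, const char *name)
{
    while (*cmd == ' ')
        cmd++;
    while (*name != '\0') {
        if (toupper((unsigned char)*cmd) != *name)
            return 0;
        cmd++;
        name++;
    }
    return *cmd == '\0' || *cmd == ' ';
}

/* The intercepting routine: refuse denied commands, forward the rest. */
int hooked_command(const char *cmd)
{
    size_t i;
    for (i = 0; i < sizeof denied / sizeof denied[0]; i++) {
        if (starts_with_word(cmd, denied[i]))
            return HOOK_REJECTED;
    }
    return real_command(cmd);
}
```

The whole-word comparison means that a different command which merely begins with the same letters as a denied one is still passed through to the real routine.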