DISC SPACE: HOW MUCH IS ENOUGH?
by Vladimir Volokh, VESOFT

Presented at 1993 INTEREX Conference, San Francisco, CA, USA
Published by INTERACT Magazine, June 1993.

ABSTRACT.

In spite of many new developments in disc technology, the good old Winchester drive, with its mechanically movable arms, is still the primary medium for all our files -- be it programs, sources or Data Bases. It seems that many simple questions related to disc usage do not have easy answers: How do we measure disc space -- what's the relation between megs and sectors? How can an MPE user find out the disc capacity? How much of it is free and how usable is the free space? And is 'used space' really used by us? What is reblocking, squeezing, trimming, condensing and other space transformations? The paper presents some observations on HP3000 file structure, similarities and differences between "Classic" and "Spectrum". This author's hope is that such knowledge will help users to better control their computing environment.

BIOGRAPHY OF THE AUTHOR:

Vladimir Volokh is the president of VESOFT, Inc., a software house based in Los Angeles, CA, USA which was founded in 1980 by him and his son, Eugene. They are the creators of MPEX/3000, a productivity and system control tool, SECURITY/3000, a log-on access control package, and VEAUDIT/3000, an auditing tool that reports loopholes on the HP3000 system, of which there are 10,000+ packages installed worldwide. Vladimir Volokh is a computer scientist with more than 10 years of HP3000 experience as system analyst, consultant, and technical manager; he is a frequent speaker at users groups around the world.

In spite of many new developments in disc technology, the good old Winchester drive, with its mechanically movable arms, is still the primary medium for all of our files.
In this article I will try to present some observations on HP3000 file structure -- both for "Classic" (MPE/V) and "Spectrum" (MPE/iX) computers -- in the hope that it might help HP3000 users manage disc space better. It seems that many simple questions have answers that are not so simple.

HOW DO WE MEASURE DISC SPACE?

In various discussions about disc space, you've seen terms like "sectors", "kilobytes", "megabytes", and "gigabytes". What do these words mean? Well, nothing is simple. A sector, by HP's definition, is 256 bytes; "kilo" (K) is 1000, or 1024 for memory devices; "mega" is 1000000, or 1024K for memory devices; and "giga" is a prefix denoting a billion, or 1.073 billion for memory devices. My dictionary tells me that "tera" means one trillion (10^12) of a given unit. HP's Glossary of Terms mentions only "kilo" and "mega" (in 1989).

So, considering all this mathematics, how many megabytes does your disc have if after :DISCFREE C on your XL machine you see the following:

    ALL MEASUREMENTS ARE IN SECTORS.
    ALL PERCENTAGES ARE RELATIVE TO THE DEVICE SIZE.

               |    Configured     |      In Use       |   Availab
    -----------+-------------------+-------------------+---------------
    LDEV :   1 -- (MPEXL_SYSTEM_VOLUME_SET:MEMBER1)
     Device    |   2232192         |   1708144 ( 77%)  |    524048
     Permanent |   1852720 ( 83%)  |   1605344 ( 72%)  |    247376
     Transient |   1852720 ( 83%)  |    102800 (  5%)  |    524048

Considering that 4*256 is close enough to 1000, you can do it easily -- just divide the number of sectors by 4000 and you will have it in megs. In this case, it'll be 2232192/4000 = 558, close enough to the real answer, 545 megs (2232192 sectors * 256 bytes = 571441152 bytes, divided by 1024K).

HOW BIG IS THE DISC?

As you've seen above, MPE/iX gives you an answer via :DISCFREE. In MPE/V the utility FREE5.PUB.SYS -- true to its name -- shows only free space. But if it shows you that X sectors are free, that's X sectors out of how many? This information is hidden deep inside the VINIT utility (as if it were unimportant).
Try this:

    :VINIT
    >pfspace 1;addr

You will see a lot of information about addresses and sizes of free space (and you don't care much about that). But at the end of this listing you will see:

    TOTAL VOLUME CAPACITY: 216832 SECTORS
    TOTAL FREE SPACE AVAILABLE: 16490 SPACE
    MAXIMUM CONTIGUOUS AREA: 5505 SECTORS

By the way, it's not my typo (if you're wondering about "16490 SPACE") -- it's an unknown MPE designer's mistake, frozen in time ...

HOW MUCH OF IT IS FREE?

MPE/iX gives a pretty straightforward answer to this question: look at its output in the example above -- this time not on the first line but on the second:

     Permanent |   1852720 ( 83%)  |   1605344 ( 72%)  |    247376

As you see, 83% of the whole space is configured to be used as permanent, 72% is used, so only 11% (which is 83-72) is available for permanent files. But why doesn't this simple calculation work for the third line (transient space)?

     Transient |   1852720 ( 83%)  |    102800 (  5%)  |    524048

Even though transient space can also take up to 83% of the space on LDEV 1, in this case only 28% is left for it: 17% (100-83) can't be permanent and 11% is unused by permanent files. Because 5% is actually used by transient space, 23% (28-5) is available.

On MPE/V machines available space is supposed to be shown by the FREE5.PUB.SYS utility or via the PFSPACE command of :VINIT (16490 sectors in the PFSPACE example above). But what about virtual (transient) space? This information, again, is hidden -- this time inside the :SYSDUMP output:

    :SYSDUMP $NULL
    ANY CHANGES? YES
    ...
    DISC ALLOCATION CHANGES? YES
    VIRTUAL MEMORY CHANGES? YES
    LIST VIRTUAL MEMORY DEVICE ALLOCATION? YES
    VOLUME NAME   LDEV #   VM ALLOCATION
    LDEV1            1         25
    ...
    ENTER VOLUME NAME , SIZE IN KILOSECTORS (MAX = 255 )?

In other words, MPE/V shows only how much virtual space is allocated and knows nothing about how much of it is utilized at the moment; some space is also taken (possibly) by spool files and by temporary files. Note also that even though total, free and virtual space are given by DEV#, used space is not.
(The :REPORT command gives used filespace-sectors by group and account.) One way to know this distribution is to use the MPEX command %LISTF @.@.@,DISCUSE.

IS FREE SPACE REALLY AVAILABLE TO US?

Looking at the FREE5 output on a "Classic", one should pay attention not only to the "TOTAL FREE SPACE" line but also to the preceding ones:

    :RUN FREE5.PUB.SYS
    VOLUME MH7945U1   LDEV 1
    LARGEST FREE AREA= 25530

       SIZE    COUNT    SPACE  AVERAGE
    >100000        0        0        0
     >10000        2    42796    21398
      >1000        0        0        0
       >100        4      540      135
        >10       29     1065       36
         >1      107      217        2
    TOTAL FREE SPACE=44618

If you have a lot of small pieces, they might not be usable at all because none of your files may have small enough extents (more on this later). What you need is not just free space but CONTIGUOUS space. On "Classics", disc space can be condensed to some degree by the >COND command in :VINIT; on "Spectrum" machines disc fragmentation shouldn't be a problem (or so HP tells us).

IS THE "USED" SPACE REALLY USED BY US?

OK, by subtracting "free" space from the "total" space or just looking at the :DISCFREE output we might get an idea of how many sectors are "used" -- physically, that is. Keep in mind, however, that probably about half of those files which you see on the full backup listing have not been used (either modified or accessed) for a long time -- 6 months or more. But which half? Some answers to this question can be found in the :STORE command of MPE or, better yet, using selection by ACCDATE and/or MODDATE in MPEX (with totals of files and space). Archiving and purging seldom-used files saves a lot of disc space, directory space, and backup time.

TO BLOCK OR NOT TO BLOCK?

Another question is: how is the space used inside "active" files? One factor -- relevant on MPE/V machines but not on MPE/iX machines -- is blocking. MPE/V does all disc I/Os in multiples of one sector (256 bytes). The blocking factor is the number of records that we choose to fit into a certain number of sectors (block).
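To make this definition concrete, here is a minimal Python sketch of the arithmetic behind a blocking factor. It is not anything MPE itself provides: the function name block_usage is made up for illustration, the 132-byte record length is just a sample value, and the sketch assumes fixed-length records with each block rounded up to a whole number of 256-byte sectors.

```python
SECTOR_BYTES = 256  # HP's sector size, as stated in the text


def block_usage(record_bytes, blocking_factor):
    """Return (sectors_per_block, wasted_bytes_per_block).

    Illustrative assumption: fixed-length records, and every block
    occupies a whole number of sectors (rounded up).
    """
    data_bytes = record_bytes * blocking_factor
    sectors = -(-data_bytes // SECTOR_BYTES)  # ceiling division
    wasted = sectors * SECTOR_BYTES - data_bytes
    return sectors, wasted


# Compare a few blocking factors for a 132-byte record.
for factor in (1, 9, 60):
    sectors, wasted = block_usage(132, factor)
    print(f"R/B={factor:3d}: {sectors:3d} sectors per block, "
          f"{wasted:3d} bytes wasted per block")
```

The fewer bytes wasted per block, the more records each disc I/O carries, which is why the choice of blocking factor matters for both space and I/O time.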
But very often we don't choose -- we simply rely on MPE/V defaults, which can range from good to very bad (see [1] for more details). A bad blocking factor wastes not only disc space, but also I/O time -- the more records we read or write per I/O, the better. Consider some examples:

    ACCOUNT=  SYS         GROUP=  OPERATOR

    FILENAME  CODE  ------------LOGICAL RECORD-----------  ----SPACE----
                      SIZE  TYP        EOF      LIMIT R/B  SECTORS #X MX

    REPORT1           132B  FA          26      10000   1     1251  1  8
    REPORT2           132B  FA          26      10000  60      651  1  8
    REPORT3           132B  FA          26         26  60       62  1  1
    REPORT4           132B  FA          26         26   9       20  1  1

Here the file REPORT1 is built with the default blocking factor of 1 (256 bytes / 132 bytes = 1, rounded down); the remainder (256 - 132 = 124 bytes) is simply wasted, though it's almost 50% of the space; this file is like a piece of Swiss cheese -- with many big holes inside. The second file is the result of changing the blocking factor to 60, thus achieving the BEST space utilization for this file -- now 60 records take 60*132=7920 bytes, which is close to the size of a block of 31 sectors (256*31=7936). However, we can get an even bigger saving by SQUEEZEing this file (setting FLIMIT down to EOF) -- that's how we got the file REPORT3. By reblocking it again we save more space and get REPORT4; as a result, the difference in size between the REPORT1 and REPORT4 files is quite significant. Things like this can be done using our very own MPEX (the %ALTFILE command with options SQUEEZE and BLKFACT=BEST).

And what about XL (or should we say iX) computers? The blocking factor does not mean much there; all the records are tightly packed, except for the last extent, which can (for very big files) be up to 2048 sectors. The good news is that the FCLOSE intrinsic (on the XL) has a new option called "XLTRIM", which allows the system to reuse free space beyond the end of file without decreasing the file limit.
Look at the following before-and-after example:

    ACCOUNT=  SYS         GROUP=  PUB

    FILENAME  CODE  ------------LOGICAL RECORD-----------  ----SPACE----
                      SIZE  TYP        EOF      LIMIT R/B  SECTORS #X MX

    REPORT1           132B  FA          26      10000   1      256  1  *
    REPORT2           132B  FA          26      10000   1       16  1  8

Quite a savings (MPEX's %ALTFILE ;XLTRIM does it) -- and we can append to the file!

THE EXTENT QUESTION, OR WHERE THE FILE IS?

The extent is MPE's compromise between two extremes in file size management: assigning all file space requested to the file immediately or giving space one record (or sector) at a time. In MPE/V a file can consist of anywhere from 1 to 32 extents (the default number is 8). Each extent resides wholly on one disc, but different extents may be located on different discs. So where is any given file? You have to know the full extent map of the file and only then can you think about improving system performance through disc balancing. If you use the LISTDIR5 >LISTF you might see the DISC DEV # line, but this of course is only the device of the first extent (the same goes for :STORE listings).
>LISTF ...;MAP, however, gives you a map (the first digit is the "volume table index", which is not necessarily the device number, and is hard to convert to the device number):

    LISTDIR5 G.06.00 (C) HEWLETT-PACKARD CO., 1983
    >LISTF VESOFT.PUB.SYS
    FCODE: 0                 FOPTIONS: STD,ASCII,VARIABLE
    BLK FACTOR: 1            CREATOR: **
    REC SIZE: 1276(B)        LOCKWORD: **
    BLK SIZE: 640(W)         SECURITY--READ: ANY
    EXT SIZE: 10(S)                    WRITE: ANY
    # REC: 482                         APPEND: ANY
    # SEC: 70                          LOCK: ANY
    # EXT: 7                           EXECUTE: ANY
    MAX REC: 13                        **SECURITY IS ON
    MAX EXT: 7               COLD LOAD ID: %24025
    # LABELS: 0              CREATED: THU, 9 APR 1992
    MAX LABELS: 0            MODIFIED: THU, 9 APR 1992
    DISC DEV #: 3            ACCESSED: THU, 9 APR 1992
    DISC TYPE: 3             LABEL ADR: **
    DISC SUBTYPE: 4          SEC OFFSET: %5
    CLASS: DISC              FLAGS: NO ACCESSORS

    >LISTF VESOFT.PUB.SYS;MAP
    FCODE: 0                 FOPTIONS: STD,ASCII,VARIABLE
    BLK FACTOR: 1            CREATOR: **
    REC SIZE: 1276(B)        LOCKWORD: **
    BLK SIZE: 640(W)         SECURITY--READ: ANY
    EXT SIZE: 10(S)                    WRITE: ANY
    # REC: 482                         APPEND: ANY
    # SEC: 70                          LOCK: ANY
    # EXT: 7                           EXECUTE: ANY
    MAX REC: 13                        **SECURITY IS ON
    MAX EXT: 7               COLD LOAD ID: %24025
    # LABELS: 0              CREATED: THU, 9 APR 1992
    MAX LABELS: 0            MODIFIED: THU, 9 APR 1992
    DISC DEV #: 3            ACCESSED: THU, 9 APR 1992
    DISC TYPE: 3             LABEL ADR: **
    DISC SUBTYPE: 4          SEC OFFSET: %5
    CLASS: DISC              FLAGS: NO ACCESSORS
    EXT MAP: %300161067 %200233735 %300162124 %200240307
             %300162207 %100211521 %200240326
    >

In MPE/XL file labels are kept separately from the data, and yet :LISTF ,3 still shows the file label address, which might have no relevance to the location of the data at all.
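Returning for a moment to the MPE/V EXT MAP line shown above, the following Python sketch tallies how many extents of the file sit under each volume table index. It relies only on what the text says (the first digit of each entry is the volume table index); the helper names split_entry and extents_per_volume are invented for illustration, the remaining digits are kept as an uninterpreted string, and the sketch assumes every entry is printed with its leading index digit.

```python
from collections import Counter

# EXT MAP entries copied from the LISTDIR5 >LISTF ...;MAP listing above.
EXT_MAP = ("%300161067 %200233735 %300162124 %200240307 "
           "%300162207 %100211521 %200240326")


def split_entry(entry):
    """Split one EXT MAP entry into (volume_table_index, remaining_digits).

    Assumption: the first octal digit after '%' is the volume table
    index; the remaining digits are returned as-is, uninterpreted.
    """
    digits = entry.lstrip("%")
    return int(digits[0], 8), digits[1:]


def extents_per_volume(ext_map):
    """Count extents per volume table index for a whole EXT MAP string."""
    return Counter(split_entry(entry)[0] for entry in ext_map.split())


for index, count in sorted(extents_per_volume(EXT_MAP).items()):
    print(f"volume table index {index}: {count} extent(s)")
```

Even this rough tally makes the point that the single DISC DEV # line describes only the first extent: the seven extents of this file are spread over three different volume table indexes.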
Here is an example of :LISTF ,3 and MPEX's %LISTF ,4 showing the full extent map of the file:

    :LISTF LOG3320,3
    FILE CODE : 0               FOPTIONS: BINARY,VARIABLE,NOCCTL,STD
    BLK FACTOR: 1               CREATOR : **
    REC SIZE: 2044(BYTES)       LOCKWORD: **
    BLK SIZE: 2048(BYTES)       SECURITY--READ   : CR
    EXT SIZE: 0(SECT)                     WRITE  : CR
    NUM REC: 2720                         APPEND : CR
    NUM SEC: 2304                         LOCK   : CR
    NUM EXT: 9                            EXECUTE : CR
    MAX REC: 1024                       **SECURITY IS ON
                                FLAGS   : 1 ACCESSORS,SHARED,1 R,1 W
    NUM LABELS: 0               CREATED : THU, APR 9, 1992, 2:01 PM
    MAX LABELS: 0               MODIFIED: THU, APR 9, 1992, 2:01 PM
    DISC DEV #: 1               ACCESSED: THU, APR 9, 1992, 2:01 PM
    SEC OFFSET: 0               LABEL ADDR: **
    VOLCLASS  : MPEXL_SYSTEM_VOLUME_SET:DISC

    MPEX %LISTF log3320   PAGE 1   MANAGER.SYS,PUB   THU, APR 9, 1992, 4:01 PM

    ACCOUNT= SYS         GROUP= PUB

    -----FILE------  EXTENTS  -----SECTORS-----  DEVICE
    NAME       CODE  NUM MAX  USED NOW  SAVABLE  CLASS

    LOG3320           10   *      2560      208  DISC
      Dev/Sector: 2/%0000004516700  2/%0000000444440  2/%0000000072040
      Dev/Sector: 3/%0000001407620  3/%0000007737620  1/%0000003207300
      Dev/Sector: 1/%0000002675520  2/%0000006572620  3/%0000000536200
      Dev/Sector: 2/%0000006577760

To finish this little essay I propose a puzzle to MPE/iX users: what do these two "*" mean in the following :LISTF ,2?

    ACCOUNT=  SYS         GROUP=  PUB

    FILENAME  CODE  ------------LOGICAL RECORD-----------  ----SPACE----
                      SIZE  TYP        EOF      LIMIT R/B  SECTORS #X MX

    PUZZLE            128W  FB        1608       2222   1     1616  *  *

The answer is in one of the recommended reading items:

1. Eugene Volokh, "The Truth About Disc Files", Presented at 1982 HPIUG Conference, San Antonio, TX, USA
2. Andy Tauber, "Disc Balancing", INTERACT Magazine, Jan. 1986
3. Greg Englestad, "HP3000 Disc Management", SUPERGROUP Magazine, Sep.-Nov. 1987
4. Eugene Volokh, "The Truth About MPE/XL Disc Files", Presented at 1989 INTEREX Conference, San Francisco, CA, USA
5. S. Gordon, V. Volokh, "The Art And Science Of Disc Space Management", INTERACT Magazine, July 1991