A Comparison of PDS and PDSE

PDSEs (partitioned data sets extended) were first introduced by IBM in MVS/DFP 3.2 in 1989. Although regular PDSes were adequate for normal tasks, many IBM customers were not happy with them. One of the IBM user groups, SHARE, ran a project on MVS storage management and published a white paper that summarized the project's findings and asked for a number of improvements and new features to the PDSes of the time. Another IBM user group, GUIDE, published its own requirements and asked for similar changes. IBM listened, and the result was the PDSE. But first, a bit of background on the plain old PDS to understand its limitations.

A PDS is essentially made up of two parts: a directory and the members. The directory is a set of contiguous 256-byte blocks at the beginning of the dataset. Each directory block starts with a 2-byte count field, followed by 3 to 21 directory entries. There is one directory entry for each member in the PDS. Each directory entry contains an 8-byte member name (padded with spaces if needed), the starting position of the member in the PDS (in TTR addressing format), and optional user data (up to 62 bytes).

A directory block contains only as many complete entries as fit in 254 bytes (2 bytes are reserved for the count field); any remaining bytes are left unused. The length of the user data therefore determines how many entries fit in one directory block. The 2-byte count field holds the number of used (also called ‘active’) bytes in the block, including the two bytes of the count field itself.
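
The "3 to 21 entries" range can be checked with a little arithmetic. The Python sketch below is only an illustration under stated assumptions: it assumes every entry in a block carries the same amount of user data, and it assumes each entry has one further fixed byte besides the name and the TTR (not described above), so that the shortest entry is 12 bytes, which is what the range of 3 to 21 implies. All names in it are made up for this example.

```python
BLOCK_SIZE = 256      # bytes in one PDS directory block
COUNT_FIELD_LEN = 2   # count field at the start of the block
NAME_LEN = 8          # member name, padded with spaces
TTR_LEN = 3           # starting position of the member
EXTRA_LEN = 1         # ASSUMPTION: one more fixed byte per entry (not described in the text)
MAX_USER_DATA = 62    # optional user data, in bytes


def entry_length(user_data_len):
    """Length in bytes of one directory entry with the given user data."""
    if not 0 <= user_data_len <= MAX_USER_DATA:
        raise ValueError("user data must be 0 to 62 bytes")
    return NAME_LEN + TTR_LEN + EXTRA_LEN + user_data_len


def entries_per_block(user_data_len):
    """Number of complete entries that fit after the count field."""
    return (BLOCK_SIZE - COUNT_FIELD_LEN) // entry_length(user_data_len)


def count_field_value(n_entries, user_data_len):
    """Active bytes in the block, including the count field itself."""
    return COUNT_FIELD_LEN + n_entries * entry_length(user_data_len)


print(entries_per_block(0))      # 21 entries with no user data
print(entries_per_block(62))     # 3 entries with the maximum user data
print(count_field_value(21, 0))  # 254 active bytes in a full block
```

With no user data the block holds 21 entries, and with the full 62 bytes it holds 3, matching the limits quoted above.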

This directory structure was the main reason the PDS needed improving. When IBM introduced the PDSE, it replaced the rigid directory structure of the PDS with a new, flexible scheme and also brought in many new features. All this was done while keeping the PDSE backward compatible with the PDS, which means that, except for very low-level (hardware-dependent) processing, users need not even be aware of which of the two they are dealing with.

Some of the new features introduced in PDSE are compared with PDS below:

  1. Expandable directory size: The number of directory blocks in a PDS is specified when the dataset is created and cannot be changed afterwards. The space for all the directory blocks is also allocated at creation time. Let's say that a PDS was allocated with a directory block count of 20, and assume that an average 256-byte directory block holds 10 directory entries. This PDS can then contain at most 20×10 = 200 members. But what if you use this up and want to create a 201st member? Tough luck!

    PDSE solved this problem by creating an indexed directory structure in which each directory entry points to the next one. As a result, there is no need to allocate all the directory blocks when the dataset is created, and the blocks need not be contiguous or fixed in number. They can be interleaved with the member data blocks, and indeed they are! When you create new members, a new directory block is created in the next available storage and the pointers are updated.

    Note that it's only the directory blocks that increase in number. The total size of the PDSE does not grow beyond its limit of 123 extents. In other words, the directory can expand only if there is enough space in the dataset. The maximum size of the PDSE itself remains fixed.

  2. Better search and insertion: The directory entries in a PDS are stored in alphabetical order of member names. So if a new entry is to be created, all the entries that come after it must be shifted to make room for it. This is called ‘Ripple Stow’, and it results in many I/O operations, making the whole process a lot slower. The same holds true for searching for a member within a PDS: the directory has to be scanned to locate a particular member.

    Since the directory in a PDSE is an indexed structure, there are no such performance problems in a PDSE. It takes the same amount of time to search for or insert a member whether its name starts with ‘A’ or ‘Z’.
  3. Improved sharing facilities: The locking mechanism in a PDS operates at the dataset level. If you want to update a single member in a PDS, you need exclusive access to the entire dataset, and no other user or job can update any other member in that PDS during that time. In a PDSE, by contrast, access control is implemented at the member level, so two users can update two different members at the same time. Makes you wonder how people worked before PDSE came along.
  4. Better use of disk space: When a PDS member is deleted, the space that is freed up is not reused for new members. Since deleting a member deletes its directory entry, the pointer to that member's location is lost, and so is the space. As members are created and deleted during the lifetime of a PDS, the amount of wasted space keeps growing. This wasted space, also called PDS gas, can be as much as 40% of the total allocated space, so the PDS needs to be compressed periodically to reclaim it. The compression can be done either by typing ‘Z’ in front of the PDS name (in ISPF) or by using the IEBCOPY utility.

    A PDSE, on the other hand, keeps reclaiming the freed space automatically, using a first-fit algorithm. Issuing a ‘Z’ command or running an IEBCOPY compress has no effect on a PDSE.

    Also, whenever a new member is created in a PDS, the data blocks allocated for it have to be contiguous. There is no such restriction in a PDSE, so the space reclaimed from deleted members can be allocated to new or existing members. This results in much better space utilization.

  5. Improved dataset integrity: If a PDS is opened for output in a sequential mode, e.g. if an IEBGENER step omits the member name and uses only the PDS name, say in

    //SYSUT1 DD DSN=Some.input.sequential.file,DISP=SHR

    //SYSUT2 DD DSN=PDS

    the entire directory would be destroyed and all the members would be lost. If the same thing is attempted on a PDSE, the job terminates with an abend code of S213-4C and the PDSE remains intact.

    S213-4C : WHEN OPENING A PDSE DSORG=PS WAS SPECIFIED, BUT NO MEMBER WAS SPECIFIED.

  6. Hardware independence: A PDS uses an addressing scheme called TTR (Track-Track-Record), which is based on the DASD geometry. TTR addresses are written in hexadecimal, so an address of X’002E26’ means track number X’002E’ and record number X’26’. The name TTR comes from the fact that the first two bytes of the address denote the track number and the third byte denotes the record number. This dependence on the DASD geometry makes it very difficult to migrate a PDS from one type of DASD to another, e.g. from 3380 to 3390.

    The PDSE addressing scheme does not depend on the physical device geometry. It uses a “simulated” 3-byte TTR address to locate the members and the records, which makes migration easier. Incidentally, this simulation of addresses places some limits on the number of members and the number of records per member in a PDSE. A TTR address of X’000001’ in a PDSE points to the directory. The addresses from X’000002’ to X’07FFFF’ point to the first record of each member, which is why there is a limit of 524,286 members. The addresses from X’100001’ to X’FFFFFF’ point to records within each member, which is why there is a limit of 15,728,639 records in each member.
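
The TTR arithmetic in item 6 is easy to verify. The short Python sketch below is only an illustration: the function and constant names are made up for this example, and it does nothing more than split a 3-byte TTR into its two parts and count the addresses in the two simulated ranges quoted above.

```python
def split_ttr(ttr):
    """Split a 3-byte TTR into (track number, record number)."""
    if not 0 <= ttr <= 0xFFFFFF:
        raise ValueError("a TTR is a 3-byte value")
    track = ttr >> 8      # first two bytes
    record = ttr & 0xFF   # third byte
    return track, record


# Simulated TTR ranges in a PDSE, as quoted in the text.
MEMBER_FIRST, MEMBER_LAST = 0x000002, 0x07FFFF   # first record of each member
RECORD_FIRST, RECORD_LAST = 0x100001, 0xFFFFFF   # records within a member

max_members = MEMBER_LAST - MEMBER_FIRST + 1
max_records = RECORD_LAST - RECORD_FIRST + 1

track, record = split_ttr(0x002E26)
print(hex(track), hex(record))   # 0x2e 0x26
print(max_members)               # 524286
print(max_records)               # 15728639
```

Both counts include the first and last address of their range, which is why each limit is the difference of the two endpoints plus one.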

16 responses to “A Comparison of PDS and PDSE”

  1. Really informative !!!

  2. Very Informative article. Are you aware of any issues with PDSEs NDM?

    Thank you for your assistance!

    Bob

  3. Very helpful – Thanks!

  4. Norman Graessle

    Excellent article. I just wanted to find out whether PDSEs still have a problem with sharing for update across multiple LPARs.

  5. Hi Norman, I do not know anything about this problem with PDSEs. I am sorry I can’t help you on this one.

  6. The IBM-MAIN listserv http://bama.ua.edu/archives/ibm-main.html had some traffic on PDSE, and the net was that PDSE still has problems with updates from multiple LPARs. There are some new ways to cache and manage PDSEs, though.

  7. I’m trying to work out an algorithm to calculate the space taken up by what is effectively a PDSE directory.

    If you take Pages used and multiply them by 4096 this falls short of the % Utilized. I assume the difference is the space used by the PDSE directory.

    If the PDSE has a small number of members, say 100, the difference works out to about 1,300 bytes per member. However, in a PDSE with 4,000+ members the difference works out to approximately 600 bytes per member.

    Any ideas on how to calculate the directory space based on the number of members?

  8. It's very informative, thanks for the knowledge !!!

  9. Dilip Chakrabarty

    Useful info, well researched. Some of the nitty gritty of the structure blocks will be useful in assembler programming.

    Found the IBM manual http://publib.boulder.ibm.com/infocenter/zoslnctr/v1r7/topic/com.ibm.zconcepts.doc/zconcepts.pdf , pages 162-163 have coverage on PDSE.

  10. Excellent info !! Thanks for sharing.

  11. Hello, I am facing a problem deleting members in a PDSE with IDCAMS. As you mentioned, access control is at the member level, but when multiple jobs delete members with IDCAMS at the same time, they fail with a “dataset already in use” message. Possibly only one job can hold the dataset at a time. Is there any solution for this problem?

  12. I agree, a very useful article. Allow me to add some points that I have found in my tests with large PDSEs.

    1) PDS space waste can go over 90 percent. It all depends on how many times the same member gets STOWed.

    2) PDSE directory performance should theoretically be better than PDS performance, but unfortunately in my tests I have found no proof of this. Maybe new member insertion is faster, but a simple BLDL or browsing a PDS or PDSE directory (B on 3.4) with 30,000 or more members is definitely not faster when using PDSE. On the contrary, a BLDL for a very large PDSE performs worse than the same BLDL on the PDS.
    I cannot imagine what would happen with a PDSE of 500K members. Don’t even think of doing it!

    3) PDSEs cannot be compressed. That is correct. Z or IEBCOPY compress will say that this is not needed. I have experienced that PDSE copies of PDSEs (with the same DCBs) are smaller than their original data sets. So although PDSE compression is not needed, over time there is some waste of space. Note that this waste of space is close to nothing compared to the potential waste of PDS space.

  13. Jan,

    thanks for your comments, they are very useful.

  14. Thanks Jan for adding valuable points.

    1. Thanks for the information, but there are several areas where a PDSE brings no benefit, and you need to take care with them:

      – When does a PDSE release its unused space? If the library is in LLA or XCF, or is allocated as DFHRPL in a CICS region, for example, you need to release the “connections” (stop the corresponding task); the free space is then released on the next library update. I don’t know how to find out, when several tasks (jobs, STCs, TSO users) have a library allocated, how much free space is pending release for each one.
      – If you have an STC that updates a PDSE library (and releases/allocates it on each use), take care with performance, because the directory is loaded into memory on each access (with big libraries, this can take seconds).
