GPACK  -  general input/output package


Author:   Sergey Esenov
               HERA-B Collaboration
               ITEP-Moscow
               Russia

Introduction


GPACK is an attempt to replace the existing ARTE I/O, which is based on the ZEBRA FZ package.  The HERA-B collaboration uses ZEBRA FZ for storing events (data records of variable length).

The package is intended to be a layer between ARTE and the UNIX operating system and is oriented to the needs of ARTE.

The design draws on the experience and general ideas of H1 FPACK (a Fortran-based I/O package used by the H1 collaboration):

    - machine-independent format of data files
    - index of data records
    - record selections

The implementation of GPACK is completely different:

 o  No fixed-length physical records with their own internal structure.

    An FPACK file is a sequence of blocks of fixed length.  The logical records
    (events) are laid over the physical records.  To let a logical record span
    several physical records, each physical record carries internal descriptors.
    All this was needed to support IBM mainframes and keep the package
    portable.  This solution is now considered obsolete and time consuming.

 o  No network support (for the time being).

    Network support in the I/O package is not currently a first-priority
    task for the collaboration.  Access to remote data is expected to be
    provided by system mechanisms: NFS, AFS, etc.

 o  Data transfer between processes on one computer through shared
    memory.

    A simple (and therefore reliable), well-known scheme is used for
    transferring events through shared memory: a ring buffer with three
    semaphores that control concurrent access to the buffer by the
    "reading" and "writing" processes.  One possible application is a
    "data logger", the final part of the "data taking" chain.
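
    The ring-buffer scheme can be sketched as follows.  This is only an
    illustration, not the GPACK code: it assumes the three semaphores play
    the classic roles (free slots, used slots, mutual exclusion), it uses a
    fixed number of event slots instead of a byte ring, and it runs inside
    one process, with a small mutex-based Semaphore class standing in for
    system semaphores.  The names Semaphore and EventRing are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Minimal counting semaphore (C++11); stands in for a system semaphore.
class Semaphore {
public:
    explicit Semaphore(std::size_t initial) : count_(initial) {}
    void wait() {                        // block until the count is positive
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
    void post() {                        // increment and wake one waiter
        { std::lock_guard<std::mutex> lock(m_); ++count_; }
        cv_.notify_one();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t count_;
};

// Ring of event slots guarded by three semaphores (assumed roles).
class EventRing {
public:
    explicit EventRing(std::size_t slots)
        : ring_(slots), freeSlots_(slots), usedSlots_(0), access_(1) {
        assert(slots > 0);
    }
    void put(std::string event) {        // called by a "writing" process
        freeSlots_.wait();               // wait for an empty slot
        access_.wait();                  // enter the critical section
        ring_[head_] = std::move(event);
        head_ = (head_ + 1) % ring_.size();
        access_.post();
        usedSlots_.post();               // one more event is available
    }
    std::string get() {                  // called by a "reading" process
        usedSlots_.wait();               // wait for an event
        access_.wait();
        std::string event = std::move(ring_[tail_]);
        tail_ = (tail_ + 1) % ring_.size();
        access_.post();
        freeSlots_.post();               // one more slot is free
        return event;
    }
private:
    std::vector<std::string> ring_;
    std::size_t head_ = 0, tail_ = 0;
    Semaphore freeSlots_, usedSlots_, access_;
};
```

    A writer blocks in put() when the ring is full and a reader blocks in
    get() when it is empty, so neither side has to poll.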

 o  Embedded data compression "on the fly".

    The data have a table structure in the relational database sense.  The
    tables are written to the media column by column, not row by row.  This
    makes it possible to apply data-type-dependent algorithms to a whole
    column.  For the time being, the only method used is "zero bits
    suppression".  Other methods may be added in the near future.
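
    A rough sketch of column-wise storage with zero-bit suppression is
    given below.  The exact GPACK algorithm is not described here, so the
    sketch assumes one plausible reading: the high-order bits that are zero
    in every value of a column are dropped, and the remaining bits are
    packed tightly.  The row type Hit and all function names are
    hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Hit { std::uint32_t wire; std::uint32_t adc; };     // hypothetical row
struct HitColumns { std::vector<std::uint32_t> wire, adc; };

// Store the table along columns: one vector per field.
HitColumns toColumns(const std::vector<Hit>& rows) {
    HitColumns c;
    for (const Hit& h : rows) { c.wire.push_back(h.wire); c.adc.push_back(h.adc); }
    return c;
}

// Number of low-order bits needed to hold every value of the column.
unsigned bitsNeeded(const std::vector<std::uint32_t>& col) {
    std::uint32_t all = 0;
    for (std::uint32_t v : col) all |= v;
    unsigned bits = 0;
    while (all != 0) { ++bits; all >>= 1; }
    return bits;
}

// Pack the low `bits` bits of each value, least significant bit first.
std::vector<std::uint8_t> packColumn(const std::vector<std::uint32_t>& col,
                                     unsigned bits) {
    assert(bits <= 32);
    std::vector<std::uint8_t> out((col.size() * bits + 7) / 8, 0);
    std::size_t pos = 0;                                    // bit position in out
    for (std::uint32_t v : col)
        for (unsigned b = 0; b < bits; ++b, ++pos)
            if ((v >> b) & 1u)
                out[pos / 8] |= static_cast<std::uint8_t>(1u << (pos % 8));
    return out;
}

// Inverse of packColumn for a column of n values.
std::vector<std::uint32_t> unpackColumn(const std::vector<std::uint8_t>& in,
                                        unsigned bits, std::size_t n) {
    assert(bits <= 32 && in.size() * 8 >= n * bits);
    std::vector<std::uint32_t> col(n, 0);
    std::size_t pos = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (unsigned b = 0; b < bits; ++b, ++pos)
            if ((in[pos / 8] >> (pos % 8)) & 1u) col[i] |= (1u << b);
    return col;
}
```

    Because each column holds values of one type and usually of similar
    magnitude, a single bit width per column is enough, which would not be
    true for data stored row by row.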

 o  Index files or Event Directory
 
 

            General structure of GPACK
 

   The package is implemented in C++ as a hierarchy of classes.  At the base
   are the 'filestream' class, for storing to and retrieving from disk files,
   and the 'ringbase' and 'shmstream' classes, for access to shared memory.
   The purpose of these classes is to hide the differences between storage
   media.  They treat the user data as "strings" of variable length and are
   not interested in the internal structure of those "strings" (a "string"
   is a sequence of bytes of some length).

   On the next level is the 'datastream' parameterized class (template).
   This class is not interested in the details of the storage media, but is
   responsible for the main part of the job: data compression, conversions
   and so on.
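
   The layering can be illustrated with a small sketch.  It assumes that
   the template parameter of the upper class is the storage class, which
   is not stated explicitly in this description.  MemoryMedium, DataStream
   and their member functions are hypothetical names, and the "conversion"
   is a trivial placeholder for the real compression and conversion work.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical storage medium: stores and returns opaque byte strings.
// A file-backed or shared-memory class would offer the same two calls.
class MemoryMedium {
public:
    void write(const std::string& bytes) { queue_.push_back(bytes); }
    bool read(std::string& bytes) {
        if (queue_.empty()) return false;
        bytes = queue_.front();
        queue_.pop_front();
        return true;
    }
private:
    std::deque<std::string> queue_;
};

// Upper layer, parameterized by the medium.  It knows about the record
// content but nothing about where the bytes end up.
template <class Medium>
class DataStream {
public:
    explicit DataStream(Medium& medium) : medium_(medium) {}
    void put(const std::string& record) { medium_.write(encode(record)); }
    bool get(std::string& record) {
        std::string raw;
        if (!medium_.read(raw)) return false;
        record = decode(raw);
        return true;
    }
private:
    // Placeholder conversion: prefix each record with a one-byte tag.
    static std::string encode(const std::string& s) { return "R" + s; }
    static std::string decode(const std::string& s) {
        assert(!s.empty() && s[0] == 'R');
        return s.substr(1);
    }
    Medium& medium_;
};
```

   Swapping the medium changes where the records go without touching the
   code that prepares them.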
 
 

                 +-------------------+
                 |  C wrapper        |
                 +-------------------+
                   |             |
                   |             |
                   |             |
   +--------------------+   +---------------------+
   | "PublicED" class   |   | "datastream" class  |
   +--------------------+   +---------------------+
            |                 |              |
   +--------------------+     |              |
   | "evdstream" class  |     |              |
   +--------------------+     |              |
            |                 |        +-------------------+
   +-------------------------------+   | "shmstream" class |
   |      "filestream" class       |   +-------------------+
   +-------------------------------+             |
                                       +-------------------+
                                       | "ringbase" class  |
                                       +-------------------+
 

   In parallel with the hierarchy described above, the 'evdstream' class is
   derived from the 'filestream' class to support Event Directory files
   ("Private" and "Public").

   On top of all these classes sits a C wrapper that hides the details of
   the package.

   As a result, the following interface is provided:
 

           Short description of GPACK functions
 

   NOTE:

   1. A return value < 0 is an error code.

   2. Output variables are underlined (marked with dashes on the line below
      the call).

   3. To use these functions you need to include two files:

       gpack.h        describes the data structures used by the user, and
       gpackproto.h   describes the function prototypes.

   4. gpack.h defines four stream types:

      FileStream, ShmStream, PublicStream, PrivateStream
 
 

      Declare the stream:
 
      lun = gp_setstream( type )
      --
         Input:  type (integer) -- stream type, see above

         Output:
             lun >= 0 -- Logical Unit Number connected to the stream
 

      Define the file name and open mode:  ( FileStream & PrivateStream only )
 
      rc = gp_setname(lun, filename, namelen, openmode)
      --
         Input: lun (integer)
                filename (character string, for example, CHARACTER*256)
                namelen  (integer) --- MUST be LEN(filename)
                openmode (integer) --- see above
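
      The requirement that namelen equal LEN(filename) suggests Fortran-style
      fixed-length character variables.  The helpers below are a sketch of
      how a C++ caller might prepare and read such buffers.  They assume
      the usual Fortran convention of blank padding without a terminating
      NUL, which this description does not confirm; the function names are
      hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copy `text` into a fixed-length, blank-padded buffer of `len` characters,
// mimicking a Fortran CHARACTER*len variable.
std::string toFixedLength(const std::string& text, std::size_t len) {
    assert(text.size() <= len);          // the buffer must be large enough
    std::string buf(len, ' ');
    buf.replace(0, text.size(), text);
    return buf;
}

// Strip the trailing blanks from a fixed-length buffer.
std::string fromFixedLength(const std::string& buf) {
    std::size_t end = buf.find_last_not_of(' ');
    return end == std::string::npos ? std::string() : buf.substr(0, end + 1);
}
```

      With such a buffer, the length passed as namelen would be buf.size(),
      that is, the declared length and not the length of the trimmed name.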
 

      Define the key to the shared memory & semaphores. ( ShmStream only )
 
      rc = gp_setkey(lun, key)
      --
         Input: lun (integer)
                key (integer) -- range (0 ... 65535)
 

      Define the buffer size
 

      ShmStream

         o  size of the shared-memory ring; it should be large enough to hold
            one or several events

      FileStream
      PrivateStream
      PublicStream

         o  internal cache size; it can be smaller than the event size

      rc = gp_setbuf(lun, size)
      --
         Input: lun (integer)
                size (integer) -- buffer size in bytes
 

      Create the shared-memory ring buffer:      ( ShmStream only )
 
      rc = gp_create_ring(lun)
      --
 

      Destroy the shared-memory ring buffer:     ( ShmStream only )
 
      rc = gp_destroy_ring(lun)
      --
 

      Set target system type:
 

      This selects the machine format in which the data will be written.
      The following target types are supported:

              IEEE_BigEndian           ( IRIX, AIX, HP_UX, Sun, ... )
              IEEE_LittleEndian        ( Linux, OSF Alpha )
              G_Float_LittleEndian     ( OpenVMS Alpha )

      rc = gp_settarget(lun, TargetType type)
      --
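
      What writing "in the format of a particular machine" means for
      integers can be sketched as below.  The sketch covers byte order only;
      floating-point format conversion (for example to G_Float) is not
      shown.  ByteOrder, encode32 and decode32 are hypothetical names, not
      the GPACK TargetType or its internal routines.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class ByteOrder { BigEndian, LittleEndian };   // hypothetical

// Serialize a 32-bit integer in the requested byte order,
// independent of the byte order of the host.
std::array<std::uint8_t, 4> encode32(std::uint32_t value, ByteOrder order) {
    std::array<std::uint8_t, 4> out{};
    for (std::size_t i = 0; i < 4; ++i) {
        std::size_t shift = (order == ByteOrder::BigEndian) ? 8 * (3 - i) : 8 * i;
        out[i] = static_cast<std::uint8_t>((value >> shift) & 0xFFu);
    }
    return out;
}

// Rebuild the integer from bytes stored in the given order.
std::uint32_t decode32(const std::array<std::uint8_t, 4>& in, ByteOrder order) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        std::size_t shift = (order == ByteOrder::BigEndian) ? 8 * (3 - i) : 8 * i;
        value |= static_cast<std::uint32_t>(in[i]) << shift;
    }
    return value;
}

// Self-check: a value survives encoding and decoding in either order.
void checkRoundTrip(std::uint32_t v) {
    assert(decode32(encode32(v, ByteOrder::BigEndian), ByteOrder::BigEndian) == v);
    assert(decode32(encode32(v, ByteOrder::LittleEndian), ByteOrder::LittleEndian) == v);
}
```

      Working with shifts instead of casting pointers keeps the result the
      same on big-endian and little-endian hosts.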
 

      Open the stream
 
      rc = gp_open(lun)
      --
 

      Close the stream
 
      rc = gp_close(lun)
      --
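
      Taken together, the calls above suggest the following life cycle for a
      file stream.  This is a sketch only.  The functions in namespace stub
      are stand-ins so that the example compiles without GPACK; real code
      would include gpack.h and gpackproto.h instead.  The argument types,
      the value of FileStream, the open mode OpenRead and the order of the
      set-up calls (taken from the order of this list) are all assumptions.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// --- Stand-ins, NOT the real GPACK prototypes or constants. -------------
namespace stub {
const int FileStream = 0;                 // hypothetical value
const int OpenRead   = 0;                 // hypothetical open mode
std::vector<std::string> calls;           // records the call order
int gp_setstream(int)                      { calls.push_back("setstream"); return 1; }
int gp_setname(int, const char*, int, int) { calls.push_back("setname");   return 0; }
int gp_setbuf(int, int)                    { calls.push_back("setbuf");    return 0; }
int gp_open(int)                           { calls.push_back("open");      return 0; }
int gp_close(int)                          { calls.push_back("close");     return 0; }
}  // namespace stub

// Declare, configure, open and close a file stream, stopping at the
// first negative return code (negative values are error codes).
int openAndClose(const char* filename) {
    using namespace stub;
    assert(filename != nullptr);
    int lun = gp_setstream(FileStream);
    if (lun < 0) return lun;
    int rc = gp_setname(lun, filename,
                        static_cast<int>(std::strlen(filename)), OpenRead);
    if (rc < 0) return rc;
    rc = gp_setbuf(lun, 64 * 1024);       // internal cache size in bytes
    if (rc < 0) return rc;
    rc = gp_open(lun);
    if (rc < 0) return rc;
    // ... read or write events here ...
    return gp_close(lun);
}
```

      For a ShmStream the gp_setname step would be replaced by gp_setkey,
      since the file name applies to FileStream and PrivateStream only.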
 

      Read the event header
 
      rc = gp_getevt(lun, EventHdr *evh)
      --                  -------------

      Input:

         lun (integer) --- Logical Unit Number connected to the stream

      Output:

         returns EventHdr structure:

              run (integer)          - run number
              event (integer)        - event number
              experiment (integer)   - experiment number
              datime (integer)       - time stamp ( usual Unix time format )
              classmask (integer)    - event classification mask

              rc > 0                 - event length in bytes
              rc == GP_END_OF_DATA   - end-of-file condition ( not an error )
              rc < 0                 - otherwise an error code ( see above )
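
      A read loop has to tell the three kinds of return code apart.  The
      sketch below shows one way to do it.  The EventHdr layout follows the
      field list above, but the integer types are assumed, and kEndOfData is
      a hypothetical stand-in for GP_END_OF_DATA, whose real value comes
      from gpack.h.  The reader is passed in as a callable with the shape
      of gp_getevt so that the example stays self-contained.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct EventHdr {            // field names as listed above, types assumed
    int run;
    int event;
    int experiment;
    int datime;              // Unix time stamp
    int classmask;
};

const int kEndOfData = -1;   // hypothetical stand-in for GP_END_OF_DATA

struct ReadSummary { int events; long bytes; int error; };

// Read event headers until end of data or an error occurs.
ReadSummary readAll(int lun,
                    const std::function<int(int, EventHdr*)>& getevt) {
    assert(static_cast<bool>(getevt));
    ReadSummary s = {0, 0, 0};
    for (;;) {
        EventHdr hdr = {};
        int rc = getevt(lun, &hdr);
        if (rc == kEndOfData) break;           // test this before rc < 0
        if (rc < 0) { s.error = rc; break; }   // any other negative value
        ++s.events;                            // otherwise rc is the length
        s.bytes += rc;
    }
    return s;
}
```

      Testing for the end-of-data code first matters if that code is itself
      negative, as the wording of the list above suggests.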
 

      Write event header
 
      rc = gp_putevt(lun, EventHdr *evh)
      --

      Input:

              lun (integer)
              evh                  - pointer to EventHdr structure

      Output: rc
 

      Get the length of the next table in the stream
 
      length = gp_gettablen(name)
 

      Read the next table header from the current event
 
      rc = gp_gettab(lun,name,namelen, ncols, nrows, desc, desclen)
      --                 ----          -----  -----  ----
      Input:  lun (integer)
              namelen (integer)        - string length ( LEN(name) )
              desclen (integer)        - string length ( LEN(desc) )

      Output: name (CHARACTER string)  - table name; it MUST be large enough
                                         to accommodate the name
              ncols (integer)          - number of columns (fields) in the
                                         table; a column can be an array of
                                         simple types ( integers, floats, ... )
              nrows (integer)          - number of rows; ALL rows MUST have
                                         identical structure
              desc (CHARACTER string)  - description of the row; the string
                                         MUST be large enough to accommodate
                                         the description

              rc (integer)             - as above
              rc == GP_END_OF_DATA     - no more tables in the current event
 

      Write the table header to the stream
 
      rc = gp_puttab(lun,name,namelen,ncols,nrows,desc,desclen)
      --
      Input:  lun, name, namelen, ncols, nrows, desc, desclen -- see above
      Output: rc
 

      Read data from the current table
 
      rc = gp_getdat(lun, array, nfields)
      --                  -----
      Input:  lun (integer)
              nfields (integer)           - number of fields (columns)

      Output: array (any single type)     - table data; the caller must know
                                            the type of the data being read

              rc (integer)
              rc == GP_END_OF_DATA        - No more data in the current table
 

      Write data to the current table
 
      rc = gp_putdat(lun, array, nfields)
      --
      Input:   lun, array, nfields      -- see above

      Output:  rc (integer)
 

      Flush the current event
 
      rc = gp_flush(lun)
      --
 

      Set event type
 
      rc = gp_setevtype(lun, eventtype)
      --
      Input: lun (integer)
             eventtype (integer) -- see record types above
 

      Get event type
 
      rc = gp_getevtype(lun,eventtype)
      --                    ---------
      Input:  lun (integer)
      Output: eventtype (integer) -- see possible event (record) types above
              rc (integer)
 
 

      Get event position along the stream
 
      position = gp_getevtpos(lun)
      --------
      Input: lun (integer)
      Output: position - byte position of the beginning of the current event.

              For GP_FILESTREAM:  position in the file
              For GP_SHMSTREAM:   returns 0
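
      The byte position of an event, together with fields of its header, is
      the kind of information an index could hold in order to select
      records without reading the whole file.  The sketch below shows that
      idea only; it is not the GPACK Event Directory format.  DirEntry and
      selectByClass are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One hypothetical index entry per event.
struct DirEntry {
    int run;
    int event;
    std::uint32_t classmask;   // event classification mask
    long position;             // byte position of the event in the file
};

// Return the file positions of all events whose classification mask
// has at least one bit in common with `wanted`.
std::vector<long> selectByClass(const std::vector<DirEntry>& dir,
                                std::uint32_t wanted) {
    assert(wanted != 0);       // an empty mask would select nothing
    std::vector<long> positions;
    for (const DirEntry& e : dir)
        if (e.classmask & wanted) positions.push_back(e.position);
    return positions;
}
```

      A reader could then seek directly to each returned position instead
      of scanning the events in between.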


 !!! A T T E N T I O N !!!


This description reflects the status of GPACK as of 16 Dec 1997.  The package is not finished yet (DESCRIPTION format, Event Directory, etc.).