Minutes of the Level-2 Trigger meeting, 8 August 1994

Present: I. Abt, R. Dipple, H. Leich, M. Medinnis, P. Wegner

The meeting took the form of an informal discussion. The prose here is derived from my notes, with some embellishments added to make the discussion more comprehensible.

At present, the interests of the people in the Level-2 group appear to be as follows: the Zeuthen group is primarily interested in the reconstruction of tracks in the chamber system using a DSP-based processor, as explained below. The Munich group (in particular M. Manz) is interested in simulation and algorithm development. The UCLA and U. Mass. groups will presumably work primarily on the implementation of the silicon trigger algorithm; the UCLA group will also involve itself in the simulation and algorithm development effort. This division of labor may well change, since the track reconstruction problems in the chambers and in the silicon are not fundamentally different, although the higher data rates and larger number of planes in the silicon system pose a somewhat harder technical challenge.

The expected suppression factors at Level-2 are: a factor of 3 from refining the lepton candidate tracks in the chambers, a factor of 5 from lepton track reconstruction in the silicon, and a factor of 5 from the requirement that the two lepton tracks have a common vertex. It is not required that the lepton vertex be separated from the wires. The factor of 3 from reconstruction in the chambers is likely to be highly correlated with the factor of 5 from track reconstruction in the silicon, so the total suppression is probably closer to 25 than to 75. If this is the case, it is not clear that track refinement in the chambers is worthwhile.

P. Wegner explained the work which led to the estimate of 60 microsec for the prediction / sum-update step of a Kalman filter algorithm given in the LOI.
The calculation was done on a VME board manufactured by LSI (Loughborough Sound Images) which contains four daughter boards, each with a Texas Instruments TMS320C40 DSP chip and 20 Mbytes of memory. Each C40 connects to the other three via 4-bit data paths, and a 4-bit bus connects the DSPs to the VME backplane. Each DSP has two additional 4-bit buses which connect to the front panel. The LSI development system runs on a Sun workstation and includes a C cross-compiler and debugger. The board is equipped with a JTAG serial bus (called the SBus) which connects the DSPs to the VME backplane and is useful for control and debugging. The C40s operate with a 40 MHz clock (50 nsec instruction execution). Later this year a 60 MHz version should be available, and next year an 80 MHz version.

The Kalman filter algorithm used for the benchmark is essentially the "offline" algorithm, recoded in C40 assembler, and uses the full covariance matrix. The four DSPs contained identical copies of the program and were interconnected as a 4-unit pipeline, each DSP performing the Kalman filter step for one chamber station. The hit data for the benchmark consisted of hits for a single track and were downloaded into each DSP before the test began. The estimate of 60 microsec for a single Kalman filter step thus does not include the steps needed to turn raw hit information into coordinates and to find those inside a selected interval.

Implications for track finding in the silicon detector

According to M. Spahn's simulation, the number of silicon planes traversed by a track is between 5 and 12. The latency for a single track is therefore between 300 and 720 microsec (excluding hit conditioning). Since the system is pipelined, an additional track adds just 60 microsec. I estimate (very roughly; this needs to be checked) that 15 instructions are needed to read a hit, check it against a list of hot strips, perform clustering, add an offset, and check upper and lower limits.
With an average of 8 hits per "quadrant" and assuming a 40 MHz clock (50 nsec instruction execution), this procedure adds an average of 12 microsec for both views. I also assume that, on average, only half the list is searched. As for the worst case, Spahn's hit distribution histogram extends to 30 hits, which leads to 90 microsec, assuming the full lists for both views are searched. With this added latency, the worst-case estimate becomes 1920 microsec for the first track, exceeding the maximum allowable latency (1 millisec) by nearly a factor of two. The average latency for the first track becomes nearly 600 microsec. When 80 MHz DSPs become available next year, the maximum latency limit is just satisfied, but the vertex estimation step is still not included.

If these numbers are correct, they indicate that a full Kalman filter algorithm on a system of pipelined DSPs, with each DSP responsible for one plane, probably won't be fast enough, particularly once allowances are made for inefficiencies and for more than one hit within the selected region of a detector.

Another problem with this approach comes from the need to maintain a 50 kHz decision rate. Since each DSP is occupied for at least 120 microsec per event (two tracks), a farm of more than 6 pipelines is needed (3 with 80 MHz DSPs).

It may still be possible to implement Level-2 in a DSP pipeline if a simple Kalman filter algorithm, more in the spirit of Level-1, gives acceptable results. I estimate that a total of 82 arithmetic operations is needed for a Level-1-style Kalman filter step, including rotations and the chi-square, slope, intercept, and limit calculations. The arithmetic would take 2 microsec on an 80 MHz DSP. Some overhead must be added for reading and writing. Hit conditioning and searching would then dominate the calculation time. Additional speed-ups could come from conditioning the hits in parallel in a separate unit.
It thus appears feasible to implement a simplified Kalman filter algorithm on a fully pipelined DSP system. Before this conclusion can be confirmed, a realistic implementation, which allows for hit inefficiencies and the possibility of double hits, must be written and simulated.

Other architectures and algorithms must also be considered. Among architectures, the Nevis / U. Mass. approach, possibly supplemented with DSPs or other additions to the existing board set, is being considered by the U. Mass. group. Another interesting possibility is the DecPerle board, whose heart is a 4x4 array of Xilinx 3090 FPGA chips. The RD-11 (East) collaboration has implemented a track-finding algorithm for TRDs based on the Hough transform which operates at a 100 kHz decision rate with latency well under 1 millisec. Several other complex algorithms have been implemented by Digital's Paris Research Lab. Unfortunately, for budgetary reasons, the Paris Research Lab is being closed and the future availability of the board is uncertain. Nonetheless, the approach is appealing and could prove very cost effective.

As for track-finding algorithms, the choice appears limited to a family of Kalman filter algorithms, a Hough transform approach, or the general Nevis-based track-finding algorithm described in the LOI. The latter approach suffers from not exploiting the regions of interest defined by the Level-1 trigger, which leads to an unnecessarily complex system. If general track finding in the silicon detector before Level-3 is desirable, a more cost-effective solution would be to insert a special-purpose track-finding processor between Level-2 and Level-3. Such a processor would then benefit from the factor of 20 or so reduction in event rate coming from Level-2.

Detailed simulation work will begin in September, when a sample of Level-1 triggered events becomes available. A working session of the Level-2 group is scheduled for Sept. 31 through Oct. 2 at Zeuthen.

M. Medinnis