Minutes of the Level-2 Trigger meeting, 8 August 1994

Present: I. Abt, R. Dipple, H. Leich, M. Medinnis, P. Wegner

The meeting took the form of an informal discussion. The prose here is derived from my notes, with some embellishments added to make the discussion more comprehensible.

At present, the interests of the people in the Level-2 group appear to be as follows: the Zeuthen group is primarily interested in the reconstruction of tracks in the chamber system using a DSP-based processor, as explained below. The Munich group (in particular M. Manz) is interested in simulation and algorithm development. The UCLA and U. Mass. groups will presumably work primarily on the implementation of the silicon trigger algorithm; the UCLA group will also involve itself in the simulation and algorithm development effort. This division of labor may well change, since the track reconstruction problems in the chambers and in the silicon are not fundamentally different, although the higher data rates and larger number of planes in the silicon system pose a somewhat harder technical challenge.

The expected suppression factors at Level-2 are: a factor of 3 from refining the lepton candidate tracks in the chambers, a factor of 5 from lepton track reconstruction in the silicon, and a factor of 5 from the requirement that the two lepton tracks have a common vertex. It is not required that the lepton vertex be separated from the wires. The factor of 3 from reconstruction in the chambers is likely to be highly correlated with the factor of 5 from track reconstruction in the silicon, so the total suppression is probably closer to 25 than to 75. If this is the case, it is not clear that track refinement in the chambers is worthwhile.

P. Wegner explained the work which led to the estimate of 60 microsec for the prediction / sum-update step of a Kalman filter algorithm given in the LOI.
The calculation was done on a VME board manufactured by LSI (Loughborough Sound Images) which contains four daughter boards, each with a Texas Instruments TMS320C40 DSP chip and 20 Mbytes of memory. Each C40 connects to the other three via 4-bit data paths, and a 4-bit bus connects the DSPs to the VME backplane. Each DSP has two additional 4-bit buses which connect to the front panel. The LSI development system runs on a Sun workstation and includes a C cross-compiler and debugger. The board is equipped with a JTAG serial bus (called the SBus) which connects the DSPs to the VME backplane and is useful for control and debugging. The C40s operate with a 40 MHz clock (50 nsec instruction execution). Later this year a 60 MHz version should be available, and next year an 80 MHz version.

The Kalman filter algorithm used for the benchmark is essentially the "offline" algorithm, recoded in C40 assembler, and uses the full covariance matrix. The four DSPs contained identical copies of the program and were interconnected as a 4-unit pipeline, each DSP performing the Kalman filter step for one chamber station. The hit data for the benchmark consisted of hits for a single track and were downloaded into each DSP before the test began. The estimate of 60 microsec for a single Kalman filter step thus does not include the steps needed to turn raw hit information into coordinates and to find those inside a selected interval.

Implications for track finding in the silicon detector

According to M. Spahn's simulation, the number of silicon planes traversed by a track is between 5 and 12. The latency for a single track is therefore between 300 and 720 microsec (excluding hit conditioning). Since the system is pipelined, an additional track adds just 60 microsec. I estimate (very roughly; this needs to be checked) that 15 instructions are needed to read a hit, check it against a list of hot strips, perform clustering, add an offset, and check upper and lower limits.
With an average of 8 hits per "quadrant" and assuming a 40 MHz clock (50 nsec instruction execution), this procedure adds an average of 12 microsec for both views. I also assume that, on average, only half the list is searched. As for the worst case, Spahn's hit distribution histogram extends to 30 hits, which leads to 90 microsec, assuming the full lists for both views are searched. With this added latency, the worst-case estimate becomes 1920 microsec for the first track, exceeding the maximum allowable latency (1 millisec) by nearly a factor of two. The average latency for the first track becomes nearly 600 microsec. When 80 MHz DSPs become available next year, the maximum latency limit is just satisfied, but the vertex estimation step is still not included.

If these numbers are correct, they indicate that a full Kalman filter algorithm on a system of pipelined DSPs, with each DSP responsible for one plane, probably won't be fast enough, particularly once allowances are made for inefficiencies and for more than one hit within the selected region of a detector.

Another problem with this approach comes from the need to maintain a 50 kHz decision rate. Since each DSP is occupied for at least 120 microsec per event (two tracks), a farm of more than 6 pipelines is needed (3 with 80 MHz DSPs).

It may still be possible to implement Level-2 in a DSP pipeline if a simple Kalman filter algorithm, more in the spirit of Level-1, gives acceptable results. I estimate that a total of 82 arithmetic operations is needed for a Level-1-style Kalman filter step, including rotations and the chi-square, slope, intercept, and limit calculations. The arithmetic would take 2 microsec on an 80 MHz DSP. Some overhead must be added for reading and writing. Hit conditioning and searching would then dominate the calculation time. Additional speed-ups could come from conditioning the hits in parallel in a separate unit.
It thus appears feasible to implement a simplified Kalman filter algorithm on a fully pipelined DSP system. Before this conclusion can be confirmed, a realistic implementation, which allows for hit inefficiencies and the possibility of double hits, must be written and simulated.

Other architectures and algorithms must also be considered. Among architectures, the Nevis / U. Mass. approach, possibly supplemented with DSPs or other additions to the existing board set, is being considered by the U. Mass. group. Another interesting possibility is the DecPerle board, whose heart is a 4x4 array of Xilinx 3090 FPGA chips. The RD-11 (East) collaboration has implemented a track-finding algorithm for TRDs based on the Hough transform which operates at a 100 kHz decision rate with latency well under 1 millisec. Several other complex algorithms have been implemented by Digital's Paris Research Lab. Unfortunately, for budgetary reasons, the Paris Research Lab is being closed and the future availability of the board is uncertain. Nonetheless, the approach is appealing and could prove very cost effective.

As for track-finding algorithms, the choice appears limited to a family of Kalman filter algorithms, a Hough transform approach, or the general Nevis-based track-finding algorithm described in the LOI. The latter approach suffers from not exploiting the regions of interest defined by the Level-1 trigger, which leads to an unnecessarily complex system. If general track finding in the silicon detector before Level-3 is desirable, a more cost-effective solution would be to insert a special-purpose track-finding processor between Level-2 and Level-3. Such a processor would then benefit from the factor of 20 or so reduction in event rate coming from Level-2.

Detailed simulation work will begin in September, when a sample of Level-1 triggered events becomes available. A working session of the Level-2 group is scheduled for Sept. 31 through Oct. 2 at Zeuthen.

M. Medinnis