Reprocessing

Author: Jose Hernandez, e-mail:  Jose.Hernandez@desy.de, office: 7224 (Zeuthen), 4846 (Hamburg), mobile 01704509401

  1. Starting the DAQ gui for reprocessing

  2. On hb-cr10, in the hbshift account, click the "Reprocessing" button (or run /online/ONLINE/pro/Linux_intel/bin/start_repro directly).
     
  3. Configuring the DAQ for running reprocessing

  4. Click on "New Run". The run configuration window will pop up. In the list of run types click on "repro". The repro menu will pop up:

     1: Experiment number and reprocessing number (fixed)
     2: To manually select a range of runs to be reprocessed, enter the first and last run numbers and click the "All Runs" button.
     3: One can also add a single run by typing the run number and pressing <return>. The run will appear in the list. Click on the run to select it.
     4: If a run has already been partially reprocessed, the background turns pink when selected.
     5: If a run has not been reprocessed at all, the background turns green when selected.
     6: If the background of a run does not change color when selected, the run has already been completely reprocessed.
     7: One can change the order of reprocessing of the runs using the buttons top, bottom, up and down.
     8: Pressing this button removes the completely reprocessed runs from the list.
     9: The standard way of selecting the runs to be reprocessed is to press this button. A pop-up window appears where one can select the list of runs available for the current reprocessing number.
    10: List of runs and total number of events selected in the list of runs (runs with pink or green background)
    11: Timeout for changing from one run to the next one when all events of the run have been provided to the reconstruction node but the total number of reprocessed events is smaller than the total number of events in the run.
    12: This button shows the list of the runs reprocessed since the gui was started.
     

    - Make sure the logging type is "Archive" and that the maximum number of nodes in the SLT and 4LT farms is selected.
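
The colour coding in items 4 to 6 and the totals in item 10 can be summarised in a short sketch. This is only an illustration of the rules described above, not the gui's actual code: the Run record, its field names and the two helper functions are hypothetical, and it assumes that "Events:" sums the total events of the pink and green runs.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical record; the real gui reads this information from the run DB.
    number: int
    total_events: int
    reprocessed_events: int


def background(run):
    """Return the background colour a run would get when selected."""
    if run.reprocessed_events == 0:
        return "green"      # not reprocessed at all
    if run.reprocessed_events < run.total_events:
        return "pink"       # partially reprocessed
    return "unchanged"      # completely reprocessed


def selected_totals(runs):
    """Count the runs and events still to be reprocessed (pink or green)."""
    pending = [r for r in runs if background(r) != "unchanged"]
    return len(pending), sum(r.total_events for r in pending)
```

With three runs of 100 events each, of which one is untouched, one is partially done and one is complete, selected_totals reports 2 runs and 200 events.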
     
     

  5. Selecting runs to be reprocessed

  6. Click on "Runs from DB" to select the list of runs.
    - The number of runs selected and the total number of events will be displayed under "Runs:" and "Events:"
    - Click OK in the repro menu and click OK in the window that pops up asking whether you are ready to start the run.
     
  7. Starting the run

  8. - In the run control window press "Menu" and then "create". All the processes will be created in the same way as in a normal /RUN_switch/ run. When the system is in the INITIALIZED state, press "Ready" to bring the system to the READY state and then press "Start" to bring the system to the RUN state.
    - When the reprocessing of a run is completed, the system automatically moves to the next run, performing the transitions RUN->STANDBY->READY->STANDBY->RUN.
    - The repro menu is accessible from the RunCfg->REPROCESSING->Reprocessing menu of the run control window. At any moment runs can be added or deselected.
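
The sequence of states described above can be written down as a small sketch, which may help when reading the run control window during a run change. It only lists the states named in this section; the function name and the run numbers in the example are made up for illustration, and the real run control is of course not driven by this code.

```python
# States passed through when the first run is started by hand
# ("Ready" and "Start" buttons), as described in this section.
STARTUP = ["INITIALIZED", "READY", "RUN"]

# States passed through automatically when moving from a finished run
# (already in RUN) to the next one.
RUN_CHANGE = ["STANDBY", "READY", "STANDBY", "RUN"]


def state_trace(run_numbers):
    """Return (run, state) pairs in the order the states are entered."""
    trace = []
    for index, run in enumerate(run_numbers):
        steps = STARTUP if index == 0 else RUN_CHANGE
        trace.extend((run, state) for state in steps)
    return trace
```

For two runs, the trace ends with STANDBY, READY, STANDBY, RUN for the second run; if the run control window stays in one of the intermediate states for several minutes, the transition is probably stuck.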
     
  9. Monitoring the reprocessing

  10. - The reprocessing monitor looks like this:

    It shows the input rate (upper left) and the output rates in every logger (middle and lower left). The number of free ARTE processes is displayed in the upper right plot.

    - The yellow error logger window displays useful information and error messages. In many cases one can diagnose a problem by reading these messages.

    - The OSM Robot staging and archiving status and queues can be checked here: Usage, Queues.
     

    Known problems and solutions:

    - Sometimes the RHP gatherer 4LT processes have problems with the state transitions. When changing from one run to the next, if the transition gets stuck for a few minutes you might have to terminate and restart the reprocessing. In the "expert" menu, select the "checkDaughtherList" item to check which process is holding up the transition. The reprocessing cannot be terminated while the state is STANDBY or RUN. If the state transition is stuck, you will have to skip all the branches by selecting the item "CheckDautherCfg" in the "expert" menu and changing all the components from "active" to "skipped".

    - The number of active FARM nodes (caption of the upper right plot) may become too small compared to the total number of booted nodes. The reason is that ARTE processes die from time to time due to crashes in the reconstruction. The input and output rates decrease as fewer ARTE processes are running. If the reprocessing rate has decreased significantly, one should terminate and restart the reprocessing.
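
The two rules of thumb above can be expressed as a minimal sketch. The function names are hypothetical, and the threshold of 0.5 is an arbitrary placeholder: this section does not define how small the fraction of active nodes must be before a restart is warranted, so that judgement stays with the shift crew.

```python
def should_restart(active_nodes, booted_nodes, min_fraction=0.5):
    """Flag a restart when too few booted FARM nodes are still active.

    min_fraction is an assumed placeholder value, not an official limit.
    """
    if booted_nodes <= 0:
        raise ValueError("booted_nodes must be positive")
    return active_nodes / booted_nodes < min_fraction


def can_terminate(state):
    """The reprocessing cannot be terminated in the STANDBY or RUN state."""
    return state not in {"STANDBY", "RUN"}
```

For example, with 40 active nodes out of 100 booted the first check suggests a restart, and the second check is a reminder that the system must first leave RUN or STANDBY before the reprocessing can be terminated.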