
Testing drunc with the latest nightly

Pawel Plesniak edited this page Aug 8, 2024 · 17 revisions

Setup

You first need to set up an OKS nightly/release.

cd <directory_above_where_you_want_the_new_software_area>

source /cvmfs/dunedaq.opensciencegrid.org/setup_dunedaq.sh

setup_dbt latest_v5

dbt-create -n NFD_DEV_YYMMDD_A9 <work_dir> # NFD_DEV_240705_A9 or newer
# Or, to use the v5.1.0 rc1 candidate release
# dbt-create -b candidate fddaq-v5.1.0-rc1-a9 <work_dir>

cd <work_dir>/sourcecode

# List below is indicative
git clone https://github.com/DUNE-DAQ/appmodel.git -b develop
git clone https://github.com/DUNE-DAQ/confmodel.git -b develop # maybe don't clone this one, it takes ages to compile it.

cd ..
source env.sh

dbt-build
dbt-workarea-env

Optional step, if you want to develop drunc or use the latest and greatest version of it

Clone and install drunc (better to do that in the <work_dir> described in the step above):

cd <work_dir>

git clone git@github.com:DUNE-DAQ/drunc.git
cd drunc
pip install -e .

cd <work_dir>

git clone git@github.com:DUNE-DAQ/druncschema.git
cd druncschema
pip install -e .

Getting the DruncPM Configuration

You will need a process-manager configuration file. Again, there are two choices:

  • Either you have git clone'd drunc, in which case the file is already there, in the repository's data directory
  • Or you need to download it:
wget https://raw.githubusercontent.com/DUNE-DAQ/drunc/develop/data/process-manager-CERN-kafka.json

Getting the DAQ Configuration

Let's say you are using the default: appmodel/test/config/test-session.data.xml.

  • Either you have cloned appmodel, in which case you can modify this line to point to your RTE script (which should be in your <work_dir>/install directory) and compile (dbt-build) again;
  • Or you need to download appmodel/test/config/test-session.data.xml to your current working directory with wget, and modify the line there:
wget https://raw.githubusercontent.com/DUNE-DAQ/appmodel/develop/test/config/test-session.data.xml
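If you prefer scripting the edit, the substitution can be sketched as below. This is a blunt text substitution, not an OKS-aware edit, and the attribute name and old path in the usage example are hypothetical; open test-session.data.xml and check what the RTE-script line actually looks like in your copy.

```python
from pathlib import Path

def point_to_rte(xml_path: str, old_rte: str, new_rte: str) -> None:
    """Swap the RTE-script path inside a session XML file.

    `old_rte` and `new_rte` are plain strings; the function fails loudly
    if the old path is not present, so a typo does not silently no-op.
    """
    path = Path(xml_path)
    text = path.read_text()
    if old_rte not in text:
        raise ValueError(f"{old_rte!r} not found in {xml_path}")
    path.write_text(text.replace(old_rte, new_rte))

# Hypothetical usage -- the attribute and paths are illustrative only:
# point_to_rte("test-session.data.xml",
#              "/old/path/to/rte_script.sh",
#              "<work_dir>/install/daq_app_rte.sh")
```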

Running

drunc-unified-shell <configuration>

Where <configuration> is the file from the instructions above. It can either be one of the predefined types:

  • k8s
  • ssh-CERN-kafka
  • ssh-kafka
  • ssh-standalone

These are the packaged configurations found in drunc/src/drunc/data/process-manager. The configuration can also be a path relative to the current working directory, given with a file:// prefix, e.g. file://data/process-manager-no-kafka.json if working from the drunc root.
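The two ways of pointing at a configuration can be sketched as a small resolver. This is a hypothetical illustration, not drunc's actual lookup code, and the filename scheme for the packaged types is an assumption:

```python
from pathlib import Path

# Predefined configuration names, as listed above
PACKAGED = {"k8s", "ssh-CERN-kafka", "ssh-kafka", "ssh-standalone"}

def resolve_configuration(arg: str,
                          data_dir: str = "src/drunc/data/process-manager") -> Path:
    """Map a drunc-unified-shell <configuration> argument to a file path.

    NOTE: sketch only -- the exact filename scheme for packaged
    configurations is an assumption, not drunc's real layout.
    """
    if arg in PACKAGED:
        # Packaged configurations ship inside the drunc repository
        return Path(data_dir) / f"process-manager-{arg}.json"
    if arg.startswith("file://"):
        # Explicit path, relative to the current working directory
        return Path(arg[len("file://"):])
    raise ValueError(f"Unknown configuration: {arg!r}")
```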

Once there, you can boot:

drunc-unified-shell > boot test/config/test-session.data.xml test-session
# OR, if you decided to wget test-session.data.xml to PWD:
drunc-unified-shell > boot test-session.data.xml test-session

Listing apps

You can list all the apps with ps:

drunc-unified-shell > ps
                                          Processes running
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┓
┃ session      ┃ user     ┃ friendly name    ┃ uuid                                 ┃ alive ┃ exit-code ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━┩
│ test-session │ plasorak │ root-controller  │ ecf48bea-c4f3-404a-9b07-a762c2f5aaa7 │ True  │ 0         │
│ test-session │ plasorak │ ru-controller    │ c24f579c-60ca-4e9d-a288-698ba1c2ad42 │ True  │ 0         │
│ test-session │ plasorak │ ru-01            │ c085cd31-4fd6-450a-a3f0-93ee762b5efa │ True  │ 0         │
│ test-session │ plasorak │ df-controller    │ 4dcaf726-7bf3-440d-8c76-ff175b5f4507 │ True  │ 0         │
│ test-session │ plasorak │ df-01            │ daaecb18-921a-4a98-a89c-185bce247dc5 │ False │ 134       │
│ test-session │ plasorak │ dfo-01           │ f7859b07-93c3-4390-9e44-e1a8c3eb2398 │ True  │ 0         │
│ test-session │ plasorak │ tp-stream-writer │ 1ed50313-96fb-459e-9eb9-36c6309c4d4f │ True  │ 0         │
│ test-session │ plasorak │ trg-controller   │ bd116248-2d47-4957-bb24-8fb339bab21e │ True  │ 0         │
│ test-session │ plasorak │ mlt              │ a114ba8b-9152-4350-899b-ffbaa29dfdcf │ True  │ 0         │
└──────────────┴──────────┴──────────────────┴──────────────────────────────────────┴───────┴───────────┘

Looking at the logs

... can be done with the logs command:

drunc-unified-shell > logs --name df-01
───────────────────────────────────── daaecb18-921a-4a98-a89c-185bce247dc5 logs ──────────────────────────────────────
<snippet>
2024-Mar-11 18:35:44,768 LOG [dunedaq::iomanager::QueueSenderModel<Datatype>::QueueSenderModel(const
dunedaq::iomanager::connection::ConnectionId&) [with Datatype =
std::unique_ptr<dunedaq::daqdataformats::TriggerRecord>] at
/cvmfs/dunedaq-development.opensciencegrid.org/nightly/NB_DEV_240306_A9/spack-0.20.0/opt/spack/linux-almalinux9-x86_64
/gcc-12.1.0/iomanager-NB_DEV_240306_A9-2xz2rt44fleigv3eqwu4arneebeq5bhv/include/iomanager/queue/detail/QueueSenderMode
l.hxx:68] QueueSenderModel created with DT! Addr: 0x7f25db252090
2024-Mar-11 18:35:44,768 LOG [dunedaq::iomanager::QueueSenderModel<Datatype>::QueueSenderModel(const
dunedaq::iomanager::connection::ConnectionId&) [with Datatype =
std::unique_ptr<dunedaq::daqdataformats::TriggerRecord>] at
/cvmfs/dunedaq-development.opensciencegrid.org/nightly/NB_DEV_240306_A9/spack-0.20.0/opt/spack/linux-almalinux9-x86_64
/gcc-12.1.0/iomanager-NB_DEV_240306_A9-2xz2rt44fleigv3eqwu4arneebeq5bhv/include/iomanager/queue/detail/QueueSenderMode
l.hxx:70] QueueSenderModel m_queue=0x7f25db223080
2024-Mar-11 18:35:44,768 ERROR [static void ers::ErrorHandler::SignalHandler::action(int, siginfo_t*, void*) at
/tmp/root/spack-stage/spack-stage-ers-NB_DEV_240306_A9-ypb44oo4yxx6glfbk4bna7ogqvbnlauw/spack-src/src/ErrorHandler.cpp
:90] Got signal 11 Segmentation fault (invalid memory reference)
        Parameters = 'name=Segmentation fault (invalid memory reference)' 'signum=11'
        Qualifiers = 'unknown'
        host = np04-srv-019
        user = plasorak (122687)
        process id = 336403
        thread id = 336403
        process wd = /nfs/home/plasorak/NFD_DEV_240306_A9/runarea
        stack trace of the crashing thread:
          #0
/cvmfs/dunedaq-development.opensciencegrid.org/nightly/NB_DEV_240306_A9/spack-0.20.0/opt/spack/linux-almalinux9-x86_64
/gcc-12.1.0/dfmodules-NB_DEV_240306_A9-vbjtsx3ofhhig3gc7ocrub44iy4ldd3o/lib64/libdfmodules_TriggerRecordBuilder_duneDA
QModule.so(dunedaq::dfmodules::TriggerRecordBuilder::setup_data_request_connections(dunedaq::appdal::ReadoutApplicatio
n const*)+0xaa2) [0x7f25d7de6b22]
          #1  /lib64/libc.so.6(+0x54df0) [0x7f25de254df0]
          #2
/cvmfs/dunedaq-development.opensciencegrid.org/nightly/NB_DEV_240306_A9/spack-0.20.0/opt/spack/linux-almalinux9-x86_64
/gcc-12.1.0/dfmodules-NB_DEV_240306_A9-vbjtsx3ofhhig3gc7ocrub44iy4ldd3o/lib64/libdfmodules_TriggerRecordBuilder_duneDA
QModule.so(dunedaq::dfmodules::TriggerRecordBuilder::setup_data_request_connections(dunedaq::appdal::ReadoutApplicatio
n const*)+0xaa2) [0x7f25d7de6b22]
          #3
/cvmfs/dunedaq-development.opensciencegrid.org/nightly/NB_DEV_240306_A9/spack-0.20.0/opt/spack/linux-almalinux9-x86_64
/gcc-12.1.0/dfmodules-NB_DEV_240306_A9-vbjtsx3ofhhig3gc7ocrub44iy4ldd3o/lib64/libdfmodules_TriggerRecordBuilder_duneDA
QModule.so(dunedaq::dfmodules::TriggerRecordBuilder::init(std::shared_ptr<dunedaq::appfwk::ModuleConfiguration>)+0xf3b
) [0x7f25d7de807b]
          #4  daq_application() [0x438b82]
          #5  daq_application() [0x4396d8]
          #6  daq_application() [0x439abf]
          #7  daq_application() [0x42e7d1]
          #8  /lib64/libc.so.6(+0x3feb0) [0x7f25de23feb0]
          #9  /lib64/libc.so.6(__libc_start_main+0x80) [0x7f25de23ff60]
          #10 daq_application() [0x4301a5]
bash: line 1: 336403 Aborted                 (core dumped) daq_application --name df-01 -c rest://localhost:3339 -i
kafka://monkafka.cern.ch:30092/opmon --configurationService oksconfig:test/config/test-session.data.xml

Killing the session

... with kill (to kill the applications) and flush (to erase them from memory, i.e. after flushing you won't be able to restart them):

drunc-unified-shell > kill --user plasorak
<snip>
drunc-unified-shell > flush
<snip>

Starting the run

drunc-unified-shell > fsm conf
drunc-unified-shell > fsm start run_number 123 # Note the run number here!
drunc-unified-shell > fsm enable_trigger
drunc-unified-shell > status

Stopping the run

drunc-unified-shell > fsm disable_trigger
drunc-unified-shell > fsm stop_trigger_sources
drunc-unified-shell > fsm stop
drunc-unified-shell > fsm scrap
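The start/stop cycle above can be modelled as a small state machine, which is handy for remembering which fsm command is legal when. This is a revision aid only, not drunc's actual FSM implementation, and the state names are invented for the sketch:

```python
# Hypothetical model of the DAQ FSM transitions used above.
# State names are assumptions; the command names come from the wiki text.
TRANSITIONS = {
    ("initial", "conf"): "configured",
    ("configured", "start"): "running",
    ("running", "enable_trigger"): "taking_data",
    ("taking_data", "disable_trigger"): "running",
    ("running", "stop_trigger_sources"): "trigger_sources_stopped",
    ("trigger_sources_stopped", "stop"): "configured",
    ("configured", "scrap"): "initial",
}

class DAQFSM:
    def __init__(self) -> None:
        self.state = "initial"

    def send(self, command: str) -> str:
        """Apply one fsm command, or raise if it is not allowed here."""
        key = (self.state, command)
        if key not in TRANSITIONS:
            raise RuntimeError(
                f"cannot send {command!r} from state {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state

# A full run cycle, command by command:
fsm = DAQFSM()
for cmd in ["conf", "start", "enable_trigger",
            "disable_trigger", "stop_trigger_sources", "stop", "scrap"]:
    fsm.send(cmd)
```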

Getting help on FSM commands

drunc-unified-shell > describe --command fsm

Caveats

  • The FSM sequences have not been implemented. This is the occasion for you to revise the DAQ FSM! So you will actually need to send each transition individually: enable_trigger, drain_dataflow, disable_trigger, stop_trigger_sources...
  • The thread pinning file that is used is specified in appdal/config/appdal/fsm.data.xml; it is readoutlibs/share/config/cpupins/cpupin-example.json before and after conf, and after start. In this example it is the same for all the pre- and post-transitions, but that can be modified (do check out fsm.data.xml, you should be able to figure out how).