Design and Implement Data Models For Raw/Calibrated Data #74

Open
drbenmorgan opened this issue Dec 12, 2017 · 19 comments


@drbenmorgan
Member

drbenmorgan commented Dec 12, 2017

As outlined in the Analysis Board Meeting of 11/12/2017 (see DocDB-4521), we should document, review and implement data models for:

  1. The raw data objects, including the so-called "UDD" and "EH" banks
  2. The calibrated data objects, mainly the "CD" data bank
  3. Define the major interactions between the calibration stage and the conditions database:
    • Use of input raw data to derive calibration constants, store in conditionsDB
    • Use of input raw data, read calibration constants from conditionsDB, output calibrated data objects

Useful inputs to the discussion are provided in the above DocDB, and existing C++ code is available in

The initial aim is just to identify the main objects and their attributes, e.g.

struct RawGeigerHit {
  int anAttribute; ///< this records the blah blah
};

moving on to relations and interactions. It's expected this will be an iterative process, but eventually lead to a concrete write up on the Caen Wiki and as a tech note.

The main "stakeholders" in this data are the Data Quality, Calibration and Reconstruction, and Software (inc. developers of the digitization), hence the assignees, but all input is welcome.
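
The two conditions database interactions listed in item 3 above could be sketched as follows. This is only an illustration of the data flow: `RawHit`, `CalibratedHit`, `CalibrationConstants`, `ConditionsDB`, `deriveConstants` and `calibrate` are all hypothetical names, the in-memory map stands in for a real database, and the linear gain/offset model is an assumption made purely to have something to compute.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// All types here are hypothetical placeholders, not existing Falaise/Bayeux classes.
struct RawHit { std::uint32_t channel; double rawValue; };
struct CalibratedHit { std::uint32_t channel; double value; };
struct CalibrationConstants { double gain; double offset; };

// Stand-in for the conditions database: constants keyed by channel number.
class ConditionsDB {
 public:
  void store(std::uint32_t channel, CalibrationConstants c) { table_[channel] = c; }
  std::optional<CalibrationConstants> fetch(std::uint32_t channel) const {
    const auto it = table_.find(channel);
    if (it == table_.end()) return std::nullopt;
    return it->second;
  }
 private:
  std::map<std::uint32_t, CalibrationConstants> table_;
};

// Interaction 1: raw data in, calibration constants stored in the conditions DB.
// Toy model (assumption): the gain maps each channel's mean raw value to `reference`.
void deriveConstants(const std::vector<RawHit>& hits, double reference, ConditionsDB& db) {
  std::map<std::uint32_t, std::pair<double, std::size_t>> sums;
  for (const auto& h : hits) {
    auto& s = sums[h.channel];
    s.first += h.rawValue;
    ++s.second;
  }
  for (const auto& [channel, s] : sums) {
    const double mean = s.first / static_cast<double>(s.second);
    if (mean != 0.0) db.store(channel, {reference / mean, 0.0});
  }
}

// Interaction 2: raw data in, constants read from the conditions DB, calibrated data out.
// Hits on channels without stored constants are skipped in this sketch.
std::vector<CalibratedHit> calibrate(const std::vector<RawHit>& hits, const ConditionsDB& db) {
  std::vector<CalibratedHit> out;
  for (const auto& h : hits) {
    if (const auto c = db.fetch(h.channel)) {
      out.push_back({h.channel, c->gain * h.rawValue + c->offset});
    }
  }
  return out;
}
```

The point of separating the two functions is that only the constants cross between them, which is where validity ranges and versioning would need to be defined.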

@drbenmorgan
Member Author

I think a good starting point would be to see what, if anything, we know about:

  • "UDD" data: what is known about high level objects in this?
  • "CD" data: what should calibrated geiger/calorimeter objects hold? Are existing objects in Falaise sufficient?

@pguzowski

Can you explain more about the UDD & EH data banks? Your links to the documentation are 404ing, and there is nothing explaining what they are for in DocDB-4521.

@drbenmorgan
Member Author

@pguzowski links should now be fixed. DocDB 4521 has additional DocDB ids where there's some info on the banks: DocDB-4504, DocDB-4467, DocDB-4254 (Calibration/Reconstruction side: DocDB-4483). "UDD" is basically the raw event data (so geiger, calo etc), "EH" is event level metadata, likely counter, timestamps and so on.

@pguzowski

Is there anywhere we should be doing this work? I.e. a shared document...

@drbenmorgan
Member Author

@pguzowski, right here in the comments! It's a discovery/discussion exercise at this point, so useful to link to code, add snippets etc. Once we converge on structures/design, it can be written up as a tech note.

@pguzowski

From the point of view of the tracker

UDD will need, for each event, a list of geiger digits.
A geiger digit is made of a channel ID and 7 timestamps (anode t0, t1, t2, t3, t4; bottom cathode; top cathode). I assume the timestamps will be relative to the trigger time, probably stored in the EH.

CD will store tracker hits.
A tracker hit has some geometric ID, a radial distance and longitudinal position (if available), along with uncertainties.

@drbenmorgan
Member Author

Thanks @pguzowski! So sketching things out as they stand:

struct EventHeader {
  TimeStamp triggerTime;
};

struct RawGeigerHit {
  ChannelID id;

  TimeType anodeT0;
  TimeType anodeT1;
  TimeType anodeT2;
  TimeType anodeT3;
  TimeType anodeT4;

  TimeType cathodeTop;
  TimeType cathodeBottom;
};

struct CalibratedGeigerHit {
  GeometryID id;

  float radialDistance;
  float radialDistanceError;

  float longitudinalDistance;
  float longitudinalDistanceError;
};

Here I've only sketched attributes. One thing this highlights is the need to identify what TimeType and TimeStamp should be, plus the ChannelID and GeometryID types (and how these map to each other).

The existing classes are https://github.com/SuperNEMO-DBD/Falaise/blob/develop/source/falaise/snemo/datamodels/raw_tracker_hit.h and https://github.com/SuperNEMO-DBD/Falaise/blob/develop/source/falaise/snemo/datamodels/calibrated_tracker_hit.h, which have additional data that I'm not sure about. There's a possible geometric ID from Bayeux: https://github.com/SuperNEMO-DBD/Bayeux/blob/develop/source/bxgeomtools/include/geomtools/geom_id.h, which basically reduces to

struct GeometryID {
  uint32_t type;
  std::vector<uint32_t> addresses;
};
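
As a starting point for the question of how ChannelID and GeometryID map to each other, a lookup table could be sketched as below. The `ChannelID` fields (crate, board, channel) and the `ChannelMap` class are assumptions for illustration only; the real electronic addressing scheme is whatever the electronics documentation defines, and the type value and addresses used in any example are placeholders.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical electronic address; the real fields are not defined in this thread.
struct ChannelID {
  std::uint16_t crate;
  std::uint16_t board;
  std::uint16_t channel;
  // Lexicographic ordering so ChannelID can be used as a std::map key.
  bool operator<(const ChannelID& other) const {
    return std::tie(crate, board, channel) <
           std::tie(other.crate, other.board, other.channel);
  }
};

// Reduced form of the Bayeux geometric ID as sketched above.
struct GeometryID {
  std::uint32_t type;
  std::vector<std::uint32_t> addresses;
};

// Hypothetical one-directional cabling table: electronic channel -> geometric cell.
class ChannelMap {
 public:
  void add(const ChannelID& channel, GeometryID cell) {
    table_[channel] = std::move(cell);
  }
  // Returns std::nullopt for channels that are not cabled to any cell.
  std::optional<GeometryID> find(const ChannelID& channel) const {
    const auto it = table_.find(channel);
    if (it == table_.end()) return std::nullopt;
    return it->second;
  }
 private:
  std::map<ChannelID, GeometryID> table_;
};
```

Returning an optional makes the "unmapped channel" case explicit, which is likely to matter during commissioning when cabling is incomplete. Whether such a table lives in code, a file, or the conditions database is left open here.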

@goliviero
Contributor

Hi,
For the commissioning of Feb 2017, I coded some classes in fecom (front-end commissioning) for raw data parsing:
https://github.com/SuperNEMO-DBD/Falaise/tree/develop/companions/fecom/src/libs/libfecom/fecom

The idea was to parse the raw data text files produced by the Front-End Boards, serialize the hits, and pack calorimeter and tracker hits into events.
To this end, I also wrote a first 'event builder' with reasonable gates: https://github.com/SuperNEMO-DBD/Falaise/blob/develop/companions/fecom/programs/hc_event_builder.cxx

I think this preliminary work is a good starting point for the Raw Data / Event Builder parts and should be integrated into the Falaise source (and certainly merged, because a few of the classes already exist there).

@drbenmorgan
Member Author

@goliviero definitely useful! I noted a few references to "SNDER":

// SNDER p.30-31 for channel / cell association
geomtools::geom_id cell_geometric_id; ///< Cell ID [Type:layer,row]

Is this on DocDB, if so which ID?

@goliviero
Contributor

goliviero commented Feb 19, 2018

The SNDER (SuperNEMO Demonstrator electronics reference) ID on DocDB is #2557.
It is not up to date concerning the trigger and readout parts, but we will update it soon with Yves.

@pguzowski

I think we should add some sort of calibrated time (probably the time at which the underlying charged particle passed through the cell) to the calibrated tracker hit:

struct CalibratedGeigerHit {
  GeometryID id;

  float radialDistance;
  float radialDistanceError;

  float longitudinalDistance;
  float longitudinalDistanceError;

  float time;
  float timeError;
};

@drbenmorgan
Member Author

@goliviero, @lemiere (and cc'ing @yramachers, @macolino for comment if required), I'd like to follow up on a couple of items from DocDB-4612 presented at the last collaboration meeting. Specifically, the proposed data flow from Slides 3, 4:

DAQ -> "RawHitData" (RHD) -> EventBuilder -> "RawDigitizedData" (RDD)
                      |                                           |
                    Files                                       Files
                                                                  |
                                                                  `-> "DataQuality/UDD/CD" (TBD)
  1. Are the tracker_digitized_hit and calo_digitized_hit structs shown on slides 8 and 9 the same in RHD and RDD?
  2. What is the role of the EventBuilder here? There are seemingly two distinct "Raw" outputs here, RHD and RDD, so I think we need some clarification on how these differ.
    • e.g. Is RHD purely an internal/hardware stream, with the EventBuilder being an online client of the DAQ that builds events on the fly and persists them to file?
    • If it isn't internal, what is the format of the RHD files and where and who runs EventBuilder?
  3. Given the question in 2., how tied is EventBuilder to the configuration of the DAQ, i.e. what additional metadata might be needed to generate RDD from RHD, and how will this be stored/cross-referenced through the data flow?

I realise that some of the above might not be totally definable at this point, but I think some rough outlines/ideas are critical to start thinking about what will go into the following DataQuality/UDD/CD steps.

@drbenmorgan
Member Author

One further item: the current tracker_digitized_hit and calo_digitized_hit structs have no data members that identify which channel (physical/electronic) they originate from. How will this addressing be done?

@goliviero
Contributor

So, the tracker_digitized_hit and calo_digitized_hit should have 2 GIDs: one for the geometrical ID and the other for the electronic ID. I totally forgot them in my presentation... They should inherit from a base_hit, probably geomtools::base_hit.

On the other points, things are not that clear for the moment:

  1. I don't think they will be the same, because the RHD comes directly from the DAQ and its format is strict. Maybe the calorimeter hits will be the same, but for the tracker it is impossible because the DAQ sends timestamps per FEAST channel. Here is an example of DAQ tracker hits:

Hit2 TRACKER ID 1
Slot2 F0 Ch0 AN R0 97758333
Hit3 TRACKER ID 1
Slot2 F0 Ch0 AN R1 97758333

As you can see, each threshold crossing fills a register. The FEAST reads the channels one by one and then sends the data to the DAQ. So for each Geiger cell, 7 registers (timestamps) are filled.

  2. The RHD stream will not be sorted in time, so the role of the event builder is to construct commissioning events from the unsorted stream. It has to 'pack' calorimeters_hits and tracker_channels_hit (previously defined as 1 register only). A tracker_hit then has 7 timestamps in its structure and corresponds to 1 Geiger cell (built from 7 raw tracker_channels_hit).

I think we will save the RHD stream because it is raw data and it is mandatory to keep everything that comes from the DAQ. The format is not defined yet, but it will be compressed into a binary format.
I don't know where the EventBuilder will be run; @fmauger can comment on that.

  3. For this point, I don't know yet how the EventBuilder will be linked to the DAQ.
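
The packing step described above, from single-register tracker channel hits to one tracker hit per Geiger cell, might look roughly like the sketch below. Everything here is hypothetical: the struct and function names, the assumption that all seven registers of a cell share one (slot, FEAST, channel) address, and the naming of the registers after the seven timestamps listed earlier in this thread. A real event builder would also apply time gates to separate events, which this sketch deliberately ignores.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <tuple>
#include <vector>

// The seven timestamps per Geiger cell (naming is an assumption).
enum class Register : std::size_t {
  AnodeT0, AnodeT1, AnodeT2, AnodeT3, AnodeT4, CathodeBottom, CathodeTop
};
constexpr std::size_t kRegisterCount = 7;

// One entry of the unsorted raw stream: a single register of a single channel.
struct TrackerChannelHit {
  std::uint16_t slot;
  std::uint16_t feast;
  std::uint16_t channel;
  Register reg;
  std::uint64_t timestamp;
};

// One packed hit per Geiger cell; registers that never fired stay empty.
struct TrackerHit {
  std::uint16_t slot;
  std::uint16_t feast;
  std::uint16_t channel;
  std::array<std::optional<std::uint64_t>, kRegisterCount> timestamps;
};

// Group register-level hits by electronic address, regardless of stream order.
// If a register appears more than once, the earliest timestamp is kept.
std::vector<TrackerHit> packTrackerHits(const std::vector<TrackerChannelHit>& stream) {
  using Key = std::tuple<std::uint16_t, std::uint16_t, std::uint16_t>;
  std::map<Key, TrackerHit> cells;
  for (const auto& h : stream) {
    const Key key{h.slot, h.feast, h.channel};
    auto it = cells.try_emplace(key, TrackerHit{h.slot, h.feast, h.channel, {}}).first;
    auto& stored = it->second.timestamps[static_cast<std::size_t>(h.reg)];
    if (!stored || h.timestamp < *stored) stored = h.timestamp;
  }
  std::vector<TrackerHit> hits;
  hits.reserve(cells.size());
  for (const auto& entry : cells) hits.push_back(entry.second);
  return hits;
}
```

Using optional timestamps keeps "register not filled" distinct from a timestamp of zero, which seems worth preserving for data quality checks.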

@drbenmorgan
Member Author

> So, the tracker_digitized_hit and calo_digitized_hit should have 2 GIDs: one for the geometrical ID and the other for the electronic ID. I totally forgot them in my presentation... They should inherit from a base_hit, probably geomtools::base_hit.

I think composition is better here, both as a model and because geomtools::base_hit has extraneous data for raw data, so something like:

struct tracker_digitized_hit {
  geomtools::geom_id geo_address;
  geomtools::geom_id daq_address;
  ... other attributes ...
};

and similar for the calorimeter?

@drbenmorgan
Member Author

> I don't think they will be the same, because the RHD comes directly from the DAQ and its format is strict. Maybe the calorimeter hits will be the same, but for the tracker it is impossible because the DAQ sends timestamps per FEAST channel. Here is an example of DAQ tracker hits:
>
> Hit2 TRACKER ID 1
> Slot2 F0 Ch0 AN R0 97758333
> Hit3 TRACKER ID 1
> Slot2 F0 Ch0 AN R1 97758333
>
> As you can see, each threshold crossing fills a register. The FEAST reads the channels one by one and then sends the data to the DAQ. So for each Geiger cell, 7 registers (timestamps) are filled.

O.k., these should be fully documented ASAP in Section 8 of DocDB #2557 (most things appear there, but not everything).
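
Pending that documentation, reading the register line of the quoted example could be sketched as below. The interpretation of the fields (slot, FEAST index, channel, signal kind such as "AN", register index, timestamp) is an assumption inferred only from the two sample lines, and `TrackerRegisterRecord` and `parseRegisterLine` are hypothetical names rather than existing fecom code.

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical record for one line such as "Slot2 F0 Ch0 AN R0 97758333".
struct TrackerRegisterRecord {
  int slot;                 // assumed: board slot number
  int feast;                // assumed: FEAST chip index
  int channel;              // assumed: channel on the FEAST
  std::string signal;       // "AN" in the sample; other values are unknown here
  int reg;                  // assumed: register index
  std::uint64_t timestamp;  // raw counter value, units not specified in the thread
};

// Returns std::nullopt for any line that does not match the assumed layout,
// including the "HitN TRACKER ID M" header lines.
std::optional<TrackerRegisterRecord> parseRegisterLine(const std::string& line) {
  TrackerRegisterRecord record{};
  char signal[3] = {0, 0, 0};
  unsigned long long timestamp = 0;
  const int matched = std::sscanf(line.c_str(), "Slot%d F%d Ch%d %2s R%d %llu",
                                  &record.slot, &record.feast, &record.channel,
                                  signal, &record.reg, &timestamp);
  if (matched != 6) return std::nullopt;
  record.signal = signal;
  record.timestamp = timestamp;
  return record;
}
```

A binary RHD format would replace this text parsing entirely, but the same record layout could serve as the in-memory representation.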

> The RHD stream will not be sorted in time, so the role of the event builder is to construct commissioning events from the unsorted stream. It has to 'pack' calorimeters_hits and tracker_channels_hit (previously defined as 1 register only). A tracker_hit then has 7 timestamps in its structure and corresponds to 1 Geiger cell (built from 7 raw tracker_channels_hit).
> I think we will save the RHD stream because it is raw data and it is mandatory to keep everything that comes from the DAQ. The format is not defined yet, but it will be compressed into a binary format.

Agreed on saving the stream, and I'd encourage some thought and discussion now on the format so we have a heads up on any additional requirements for external packages, and have a rough idea of what it'll take to read the files.

@drbenmorgan
Member Author

To update this, the https://github.com/supernemo-dbd/SNRawDataProducts project will form the basis for the Raw Data side. It builds on the internal data+daq project at Lyon which, as agreed at the Annecy workshop, will eventually be split out here for sharing between online and offline.

@fmauger, @lemiere, could we please have an update on the status of the internal project? This is needed for critical upcoming work on tracker commissioning (cc'ing @cherylepatrick, @davewaters, @yramachers).

@drbenmorgan drbenmorgan changed the title from "Review Data Models For Raw/Calibrated Data" to "Design and Implement Data Models For Raw/Calibrated Data" Sep 16, 2019
@drbenmorgan
Member Author

@fmauger, @lemiere, could we have an update on the status of the internal "snfee" project on the Lyon GitLab please? Whilst a DocDB is available, there has been no update to the Lyon GitLab in several months, including review/merge of a Merge Request to improve ROOT output for commissioning analysis.

In going forward with the tracker commissioning we will require at least the https://github.com/supernemo-dbd/SNRawDataProducts library to enable reading of raw data in Falaise. I'd like to get this on a more stable and available path as agreed at Annecy so we're ready for tracker data both online and offline.

Just to cross-reference: related issues/tasks are #157 on metadata tracking, plus handling Boost.serialization schema evolution as requested in BxCppDev/Bayeux#48.

@drbenmorgan drbenmorgan pinned this issue Oct 20, 2019
@drbenmorgan drbenmorgan unpinned this issue Apr 3, 2020
@fmauger
Contributor

fmauger commented Mar 25, 2022

TODO: Discussion about some digitization datamodel and calibration datamodel + connection with the raw data model.
@manu @francois @goliviero @yves
