Speculative visit directory service #283
Conversation
Codecov Report
```
@@           Coverage Diff            @@
##             main     #283    +/-   ##
========================================
- Coverage   89.07%   88.34%   -0.73%
========================================
  Files          39       44       +5
  Lines        1446     1536      +90
========================================
+ Hits         1288     1357      +69
- Misses        158      179      +21
```
... and 9 files with indirect coverage changes
```python
def stage(self) -> List[object]:
    collection = self._provider.current_data_collection
    if collection is None:
        raise Exception("No active collection")
```
This seems pretty readable to me; it requires explicit knowledge of a DataCollectionProvider when making the class, but that makes it obvious where the collection number is coming from.
From discussion, happy with this because it is theoretically easy to strip out later, as long as:
- We keep in mind that we want to strip it out later
- We make an interface or ABC for the singleton in ophyd-async
- We make a DLS-specific implementation in blueapi
- We maintain a parallel implementation in ophyd-async for offline use, that just writes to a preconfigured directory or similar
- We inject references to the singleton into each device
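A minimal sketch of the shape described in the bullets above — an abstract interface, a facility-agnostic implementation that just returns a preconfigured directory, and provider references injected into each device. All class and method names here are hypothetical stand-ins, not the actual ophyd-async or blueapi API:

```python
from abc import ABC, abstractmethod


class DirectoryProviderABC(ABC):
    # Hypothetical interface for the singleton discussed above
    @abstractmethod
    def directory(self) -> str:
        ...


class PreconfiguredDirectoryProvider(DirectoryProviderABC):
    """Offline implementation that just returns a preconfigured directory."""

    def __init__(self, directory: str) -> None:
        self._directory = directory

    def directory(self) -> str:
        return self._directory


class Detector:
    # Each device gets a reference to the provider injected at construction,
    # rather than looking up a global singleton itself
    def __init__(self, provider: DirectoryProviderABC) -> None:
        self._provider = provider

    def stage(self) -> None:
        self.write_path = self._provider.directory()


det = Detector(PreconfiguredDirectoryProvider("/tmp/data"))
det.stage()
```

A DLS-specific implementation would subclass the same ABC, so stripping it out later only means swapping which implementation is injected.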
```python
wrapped_plan = functools.reduce(
    lambda wrapped, next_wrapper: next_wrapper(wrapped),
    self.plan_wrappers,
    plan,
)
```
Might be obvious to people coming from python, but making plan a kwarg (initializer=plan) would stop me running off to read the docs for functools every time I see it.
functools.reduce explicitly disallows keyword arguments.
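For illustration, a toy version of the wrapping above (the wrapper functions here are hypothetical; a "plan" is stood in for by a list of steps). Note the initial value must be passed positionally, since reduce() takes no keyword arguments:

```python
import functools


# Hypothetical wrappers: each takes a plan and returns a new plan
# with extra steps around it, like the entries in self.plan_wrappers.
def add_setup(plan):
    return ["setup", *plan]


def add_teardown(plan):
    return [*plan, "teardown"]


plan_wrappers = [add_setup, add_teardown]
plan = ["scan"]

# reduce threads the plan through each wrapper in turn; the third
# argument is the initial value and must be positional.
wrapped_plan = functools.reduce(
    lambda wrapped, next_wrapper: next_wrapper(wrapped),
    plan_wrappers,
    plan,
)
print(wrapped_plan)  # ['setup', 'scan', 'teardown']
```

Attempting functools.reduce(..., initializer=plan) raises a TypeError.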
src/blueapi/plugins/data_writing.py (outdated)
```python
DATA_COLLECTION_NUMBER = "data_collection_number"


class DataCollectionProvider(ABC):
```
This should be removed and replaced with
I can't see a direct replacement for the DataCollectionProvider in the file you've linked there... your DataCollectionProvider gives a DataCollection, which has four things: collection_number, group, raw_data_files_root and nexus_file_path. Tom's work that you've linked has a DirectoryProvider, which gives a DirectoryInfo object that just contains a directory path and a filename prefix.
So it's not a direct mapping. Perhaps I'll chat to you about this tomorrow morning in more detail.
It doesn't need to be a direct mapping; you can just change the logic here to match up with Tom's DirectoryProvider.
Cool, got it
I've come across a bit of an annoyance: I want to subclass DirectoryProvider so that it gets the directory from the visit service. However, I want to use aiohttp or some similar asynchronous library... and DirectoryProvider has one synchronous method, __call__.
My solution for now is to make a subclass GDADirectoryProvider (suggestions for a better name?) and give it an async update method. Then the synchronous __call__ can do something like asyncio.wait_for(update).
Yep, I think that's what I told @coretl I had in mind at the time
As discussed, the pre-processor should do the async access to the directory service, then cache the result in the DirectoryProvider; the __call__ method then just returns the cached value.
src/blueapi/plugins/data_writing.py (outdated)
```python
def data_writing_wrapper(
    plan: MsgGenerator,
    provider: DataCollectionProvider,
) -> MsgGenerator:
    staging = False
    for message in plan:
        if message.command == "stage":
            if not staging:
                yield from bps.wait_for([provider.update])
                staging = True
            if provider.current_data_collection is None:
                raise Exception("There is no active data collection")
        elif staging:
            staging = False

        if message.command == "open_run":
            if provider.current_data_collection is None:
                yield from bps.wait_for([provider.update])
            if provider.current_data_collection is None:
                raise Exception("There is no active data collection")
            message.kwargs[
                DATA_COLLECTION_NUMBER
            ] = provider.current_data_collection.collection_number
        yield message


data_writing_decorator = make_decorator(data_writing_wrapper)
```
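As a plain-Python sketch of how a message-rewriting preprocessor like this composes (using a minimal hypothetical stand-in for bluesky's Msg, so there is no bluesky dependency and no provider plumbing):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator


@dataclass
class Msg:
    # Minimal stand-in for bluesky.utils.Msg
    command: str
    kwargs: Dict[str, Any] = field(default_factory=dict)


def tagging_wrapper(plan: Iterator[Msg], collection_number: int) -> Iterator[Msg]:
    """Inject a collection number into every open_run message,
    analogous to what data_writing_wrapper does above."""
    for message in plan:
        if message.command == "open_run":
            message.kwargs["data_collection_number"] = collection_number
        yield message


plan = iter([Msg("stage"), Msg("open_run"), Msg("close_run")])
processed = list(tagging_wrapper(plan, collection_number=7))
```

Because the wrapper is itself a generator of messages, wrappers stack: the output of one can be fed straight into the next, which is what make_decorator relies on.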
This is the only actual part of this file that should stick around. I'm not convinced it should be called plugins/data_writing.py either. Maybe preprocessors/xyz.py.
If it's going to be a plan preprocessor should it be moved to dls-bluesky-core?
I thought about this. It is specifically the part of blueapi that fudges plans to write data in our structure; maybe it shouldn't be allowed to get outside of blueapi?
I'm fine with it existing only within BlueAPI and being applied on plan import: that prevents any possibility of it being applied multiple times (pretty sure from glancing through the logic that that would just doubly increment the data collection number each run, but that's disruptive enough) or being missed entirely.
Should anyone want to a. use vanilla Bluesky rather than BlueAPI, and b. write to the DLS filesystem, they can extract and duplicate the logic. But I don't think that'll happen very often.
But on the original comment: Yes, having a blueapi.preprocessors package seems best.
```python
plans: Dict[str, Plan] = field(default_factory=dict)
devices: Dict[str, Device] = field(default_factory=dict)
plan_functions: Dict[str, PlanGenerator] = field(default_factory=dict)

_reference_cache: Dict[Type, Type] = field(default_factory=dict)

def wrap(self, plan: MsgGenerator) -> MsgGenerator:
```
This should be tested
```python
wrapped_plan_generator = ctx.wrap(plan_generator)
ctx.run_engine(wrapped_plan_generator)
```
This needs testing too
src/blueapi/plugins/data_writing.py (outdated)
```python
)


def data_writing_wrapper(
```
There may be a nicer way of doing this, @rosesyrett to investigate and make my flowchart look less ugly
to be discussed further...
src/blueapi/config.py (outdated)
```diff
-    return str(self.sources) == str(other.sources)
+    return (
+        (str(self.sources) == str(other.sources))
+        and (self.facility == other.facility)
```
You've left out visit_id here.
```python
return DataCollectionIdentifier(collectionNumber=self._count)


class VisitDirectoryProvider(DirectoryProvider):
```
Needs tests
```diff
@@ -86,6 +104,18 @@ def with_config(self, config: EnvironmentConfig) -> None:
         elif source.kind is SourceKind.DODAL:
             self.with_dodal_module(mod)

+        call_in_bluesky_event_loop(self.connect_devices(self.sim))
+
+    async def connect_devices(self, sim: bool = False) -> None:
```
Should make private and rename to _connect_ophyd_async_devices
…315) Production version of #283
This PR allows coordination with a central service for creating unique groups of data (called collections) and configures ophyd-async detectors to write their data to the same location for a given collection.
Changes:
- Add mechanism to preprocess all plans with set bluesky [preprocessors](https://blueskyproject.io/bluesky/plans.html#plan-preprocessors)
- Create directory provider that knows how to talk to GDA's visit directory API and provides a unique collection number to group data files
- Create dummy directory provider that works in a similar way without the need for an external server (useful for development)
- Create preprocessor that uses the directory provider and groups detectors by staging; also bundles the data group information into run start documents on a best-effort basis
- Add tests
---------
Co-authored-by: Rose Yemelyanova <rose.yemelyanova@diamond.ac.uk>
Willing to test with the webcam at ViSR https://github.com/DiamondLightSource/ViSR — @coretl and @callumforrester, is this a good idea?
This is very old and out of date; everything in it now exists in blueapi in some form.
PLEASE DO NOT MERGE!
This PR is intended as a speculative prototyping exercise only.
Design
In order to comply with the DLS visit directory structure, we need to group acquisitions into files. There are many solutions to this, the basics of this one:
Design Issues
I remain very sceptical about point 2! It introduces a shared mutable state between all detectors just to get around the fact that there is no direct, universal way to pass the current visit down to them, by design! The other alternative I explored is to put all support for configuring individual detectors in one place, in blueapi; if your detector isn't supported it will just write as it normally does.
Simple example
What's Prototyped Here?
The actual contents of this PR:
- Runs gain a data_collection_number, which can be passed back to the visit service for more information.
- A new collection is requested each time the plan stages (if your plan never stages, it will be unaffected).
How to test
Checkout this branch. To test with the dummy detector, run
To start the visit service:
It should run on port 8089, you can test it with
```shell
curl -X POST "http://localhost:8089/collection/mybeamline"
```
Which should create a new group and return a blob of JSON about it.
Running a scan with blueapi should now produce a start document that contains an attribute called data_collection_number, which we can use to aggregate runs.
Still to do
This PR contains a dummy detector with the correct hooks. It would be good to get this working with an actual Ophyd v2 detector, and ideally a v1 one too. We could then run a full collection via blueapi that uses this functionality @coretl @callumforrester
More discussion on point 2. of the design would be helpful, I think. @coretl @DiamondJoseph @keithralphs @callumforrester
We should try to get the nexus writer to interpret the group number in the run start document and ensure there are no problems there @DiamondJoseph
Possibly leverage the data_session metadata in the run start schema. @callumforrester @DiamondJoseph