What problem will this feature solve?
The upcoming GFDL model run to generate POD sample data (diag table info was added for documentation purposes in #269) will output data no more frequently than every 6 hours, to reduce data volume. The two current high-frequency PODs, convective_transition_diag and precip_diurnal_cycle, can in principle perform a valid analysis on data at this frequency, but are currently set to request data at 1hr and 3hr frequencies, respectively.
In order to run these PODs on the sample model data being generated, the POD settings file format needs to be extended to allow PODs to request data in a range of acceptable frequencies, and the data query logic needs to be extended to execute that query.
Describe the solution you'd like
The user-facing changes have been described in the docs for some time, but the feature hasn't been implemented in the framework's data query logic. Each varlist entry in the POD settings file can have optional min_frequency and max_frequency attributes to specify a range of acceptable data frequencies, either instead of or alongside the currently recognized frequency attribute.
Input parsing: I believe the code to parse these settings from the JSON file is already functional.
Input validation: for each varlist entry, verify that min_frequency <= frequency <= max_frequency for whichever of these attributes are present.
Query rewriting: PODs should be able to use frequency to specify a preferred data frequency, with the min_frequency-max_frequency range serving as a fallback if data at frequency is not available. The general mechanism for this is to specify alternate VarlistEntries via the edit_request() method on the preprocessor. For VarlistEntries that specify both frequency and a min_frequency-max_frequency range, an alternate with that range would need to be inserted after every alternate in the linked list of alternates. Because this step is preprocessor-independent, it would happen after edit_request() is called.
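The splicing step can be sketched on a simplified model of the alternates list. The Entry dataclass and its next_alt pointer are hypothetical stand-ins for VarlistEntry and its alternates, and the preprocessor and edit_request() are not modelled. The sketch reads "after every alternate" as interleaving, so that each alternate at its preferred frequency is followed directly by a range-based copy of itself. That interpretation is an assumption.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical, simplified stand-in for a VarlistEntry."""
    name: str
    frequency: Optional[str] = None
    min_frequency: Optional[str] = None
    max_frequency: Optional[str] = None
    next_alt: Optional["Entry"] = None  # next alternate to try if this one fails


def insert_range_alternates(head: Entry) -> Entry:
    """After each alternate with both a preferred frequency and a range,
    splice in a copy that drops the preferred frequency and queries on the range."""
    node = head
    while node is not None:
        following = node.next_alt
        if (node.frequency is not None
                and node.min_frequency is not None
                and node.max_frequency is not None):
            # Copy keeps min/max_frequency but has no single preferred frequency.
            fallback = replace(node, frequency=None, next_alt=following)
            node.next_alt = fallback
        node = following  # skip the fallback just inserted
    return head


def chain(head: Optional[Entry]) -> List[Tuple[str, Optional[str]]]:
    """List (name, frequency) pairs in the order alternates would be tried."""
    out = []
    while head is not None:
        out.append((head.name, head.frequency))
        head = head.next_alt
    return out


# Hypothetical example: a preferred entry with one existing alternate.
head = Entry("pr", "1hr", "6hr", "1hr", next_alt=Entry("pr_alt", "3hr", "6hr", "1hr"))
insert_range_alternates(head)
print(chain(head))
# [('pr', '1hr'), ('pr', None), ('pr_alt', '3hr'), ('pr_alt', None)]
```

Under this reading, the range is tried as a fallback for each alternate before moving on to the next one. Appending all range-based copies at the end of the list would be a reasonable alternative design.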
Query logic: querying on the min_frequency-max_frequency range has been implemented but not tested.
Query tiebreaker logic: this is needed for the case where the query finds multiple variables whose frequency lies within the min_frequency-max_frequency range. This should be done by defining a base class for the ExperimentSelectionMixin classes in data_sources.py, with a resolve_var_expt() method that acts on the DataFrame of data catalog entries and selects the row with the desired frequency (presumably the highest available within the range).
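The range query and the tiebreaker can be illustrated together. This is a standalone sketch, not the proposed resolve_var_expt() method. To stay self-contained it uses a list of dicts in place of the pandas DataFrame of catalog entries. The frequency-to-period table, the "path" field, and the function names are all hypothetical. The tiebreaker follows the issue's presumption of picking the highest frequency, which is the shortest period.

```python
from datetime import timedelta

# Assumed frequency strings and their sampling periods (hypothetical mapping).
PERIODS = {
    "1hr": timedelta(hours=1),
    "3hr": timedelta(hours=3),
    "6hr": timedelta(hours=6),
    "day": timedelta(days=1),
}


def in_range(freq, min_frequency, max_frequency):
    """True if freq lies in [min_frequency, max_frequency].

    Higher frequency means shorter period, so max_frequency bounds the
    period from below and min_frequency bounds it from above.
    """
    period = PERIODS.get(freq)
    if period is None:
        return False  # unknown frequency strings never match
    return PERIODS[max_frequency] <= period <= PERIODS[min_frequency]


def query_frequency_range(rows, min_frequency, max_frequency):
    """Keep catalog rows whose 'frequency' lies in the requested range."""
    return [r for r in rows if in_range(r["frequency"], min_frequency, max_frequency)]


def resolve_by_frequency(rows):
    """Tiebreaker: pick the row with the highest frequency (shortest period)."""
    if not rows:
        raise ValueError("No catalog rows within the requested frequency range")
    return min(rows, key=lambda r: PERIODS[r["frequency"]])


catalog = [
    {"path": "pr.6hr.nc", "frequency": "6hr"},
    {"path": "pr.3hr.nc", "frequency": "3hr"},
    {"path": "pr.day.nc", "frequency": "day"},
]
matches = query_frequency_range(catalog, min_frequency="6hr", max_frequency="1hr")
best = resolve_by_frequency(matches)
print(best["path"])  # pr.3hr.nc
```

In a DataFrame-based version, the same idea would presumably become a boolean mask on a period column followed by selecting the row with the smallest period.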
POD compatibility: The code for convective_transition_diag and precip_diurnal_cycle should be checked to verify that these PODs correctly handle data at different frequencies. The claim above is based only on the PODs' documentation and hasn't been substantiated.
Describe alternatives you've considered
N/A
Additional context