Change format of input control file to a standard format? #395

Open
ekluzek opened this issue Jun 1, 2023 · 9 comments

Comments

@ekluzek
Collaborator

ekluzek commented Jun 1, 2023

The control file for mizuRoute is in a custom format rather than an industry standard format. When industry standard formats are used (such as XML, JSON, YAML, INI-config, etc.) readers and validators can also be used to handle the file, to read, write, change, and verify syntax. This can be very useful.

In bringing mizuRoute into CESM, we may get pushback unless a standard format is used. I just want to start discussing this to hear how important the file format for the control file is. If there is flexibility in the mizuRoute community, maybe we'll just pick one and convert. If there is a desire to keep the same format, we can start preparing our arguments. We could also possibly use different formats for standalone vs CESM-coupled.

We don't need to decide on this anytime soon, but should be thinking about it longer term.

@ekluzek
Collaborator Author

ekluzek commented Jun 1, 2023

@nmizukami and I talked about this some yesterday. In terms of a format to pick we thought the INI config file format is preferred (this is the format used for the Externals.cfg file at the top level).

@martynpclark @ShervanGharari feel free to add your thoughts here. Thanks.

@nmizukami
Collaborator

Personally, I am OK with INI-config or YAML. These also seem more modern than namelist (which is OK with me too). A format like JSON requires opening and closing brackets and quotes around strings, so it looks cumbersome and more error-prone.

So, like Erik said, this is not urgent, and for now the current control file works just fine with CESM coupling.

@andywood

andywood commented Jun 1, 2023

Hi Erik,
Good topic - it's worth making good choices since the control file is often the most handled in workflows that use a model.
I think some requirements are that it be: commentable (rules out JSON), easily human readable/editable and concise/non-cluttered (rules out XML), machine-readable/writable with common open source libraries and code (e.g., Python), keyword-controlled rather than order-controlled, and ideally able to handle array and structured/hierarchical entries. I'm not familiar with the INI format, but it looks like it might be suitable, and consistency with other CESM models also gets a heavy weight here. For our new PyGMET program we adopted a TOML format, but if INI looks better I'd be open to switching. -Andy
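
As a rough illustration of how the INI format measures up against these requirements, here is a minimal sketch using Python's standard-library `configparser`. The section and key names (`directories`, `simulation`, `route_methods`, and so on) are hypothetical and are not mizuRoute's actual control-file settings.

```python
import configparser

# Hypothetical control-file content; section and key names are made up
# for illustration and are not mizuRoute's actual settings.
INI_TEXT = """
# Comments are allowed on their own line.
[directories]
input_dir = ./input
output_dir = ./output

[simulation]
case_name = example_case
route_methods = 1, 2, 5
lakes_enabled = true
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Lookup is by section and key, so the order of entries does not matter.
case_name = parser["simulation"]["case_name"]

# Every value is stored as a string; booleans need an explicit getter.
lakes_enabled = parser["simulation"].getboolean("lakes_enabled")

# There is no native list type, so a list has to be split by hand.
route_methods = [int(v) for v in parser["simulation"]["route_methods"].split(",")]

print(case_name, lakes_enabled, route_methods)
```

In this sketch the comment and keyword requirements are met directly, while arrays and deeper hierarchy need conventions layered on top, since `configparser` offers one level of sections and string values only.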

@ekluzek
Collaborator Author

ekluzek commented Jun 1, 2023

From Wikipedia, it looks like INI config files are almost identical to TOML. But TOML is standardized, so if we convert, it would be the leading candidate.

Here's the quote from Wikipedia...

"TOML's syntax primarily consists of key = value pairs, [section names], and # (for comments). TOML's syntax somewhat resembles that of .INI files, but it includes a formal specification, whereas the INI file format suffers from many competing variants."

@martynpclark
Collaborator

Andy's framing of the problem is excellent. I do not have experience with TOML -- at first glance it looks really good.

@ShervanGharari
Collaborator

TOML looks very good! One advantage that comes to mind for TOML in comparison with the existing control file is the modularity in reading variables or groups of variables. Flags can be sorted, and if a flag such as lake is True then it is easier to check if needed variables for lakes are given and populated as a subset of network topology or forcing. With a structured config file, it can be easier to have these checks in a centralized location. Just my opinion working with lake parameters (that are sometimes many).
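
The kind of centralized check described here could look roughly like the sketch below. It works on an already-parsed configuration (a plain nested dictionary, whichever file format produced it). The flag name `lakes`, the required key names, and the helper `find_missing_settings` are all hypothetical and chosen only for illustration.

```python
# Hypothetical mapping: feature flag -> keys that must be present when the
# flag is on. None of these names come from mizuRoute itself.
REQUIRED_WHEN_ENABLED = {
    "lakes": ["lake_id_var", "lake_area_var", "lake_storage_var"],
}


def find_missing_settings(config):
    """Return {flag: [missing keys]} for every enabled flag."""
    flags = config.get("flags", {})
    missing = {}
    for flag, required_keys in REQUIRED_WHEN_ENABLED.items():
        if not flags.get(flag, False):
            continue  # flag is off, so its settings are not required
        section = config.get(flag, {})
        absent = [key for key in required_keys if key not in section]
        if absent:
            missing[flag] = absent
    return missing


# Example: lakes are switched on but one required variable is not given.
example_config = {
    "flags": {"lakes": True},
    "lakes": {"lake_id_var": "lakeId", "lake_area_var": "area"},
}
print(find_missing_settings(example_config))
```

Keeping the flag-to-requirements table in one place means a new optional feature only needs a new table entry rather than checks scattered through the reader.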

@ekluzek
Collaborator Author

ekluzek commented Jun 5, 2023

I looked more into this for bringing it into CESM/cime, and one caveat is that in Python the TOML library is a non-standard library that you have to include in your environment. In cime we've tried to limit Python to the standard library. We could set it up so that TOML is only required if mizuRoute is used, but you would need a Python environment that includes it. Using conda that would be simple, but it is an extra step. On cheyenne you could use the npl environment, so it would be easy to set up. But this is an additional concern.

I'm going to propose that we add TOML as an option to cime, so this is a bit of pushback that I do expect to get from that.
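
Whether an extra package is needed depends on the Python version: Python 3.11 and later include a read-only TOML parser, `tomllib`, in the standard library, while older interpreters need a third-party package such as `tomli`. A minimal sketch of reading a TOML control file under that assumption follows; the table and key names are hypothetical, not mizuRoute settings.

```python
try:
    import tomllib  # standard library in Python 3.11 and later
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older Pythons

# Hypothetical control-file content; names are for illustration only.
TOML_TEXT = """
# Comments use the same '#' prefix as INI.
[simulation]
case_name = "example_case"
route_methods = [1, 2, 5]
lakes_enabled = true

[simulation.output]
frequency = "daily"
"""

config = tomllib.loads(TOML_TEXT)
sim = config["simulation"]

# Arrays, booleans, and nested tables arrive as native Python types.
print(sim["route_methods"], sim["lakes_enabled"], sim["output"]["frequency"])
```

Note that `tomllib` only reads TOML; writing a file back out would still need a separate package.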

@andywood

andywood commented Jun 6, 2023

Hi Erik, thanks for looking deeper into this. I understand the desire to use a standard library stack (I have that desire myself in creating software). Perhaps this opens up the question of whether CESM/cime could have a defined 'extended library', a kind of supplemental stack that could evolve a bit ahead of the standard library. I don't know enough about the pros/cons to know if that would be radical or incremental to their plans. If TOML is a bridge too far, we can just go with INI. I like the extension, actually.

@andywood

andywood commented Jun 6, 2023

Also the fact that TOML is in NPL does make it easy to use for those on Cheyenne (nothing overly custom).
