Start adding tests to Derecho for mizuRoute #417

Open
ekluzek opened this issue Aug 9, 2023 · 8 comments · Fixed by #451
Labels: cesm-coupling, high priority, infrastructure

@ekluzek
Collaborator

ekluzek commented Aug 9, 2023

Derecho is now available for general use, so we should add mizuRoute tests on Derecho. The first step is to assess which tests work; eventually we will switch the Cheyenne tests over to Derecho.

There are three Intel compiler options on Derecho: intel, intel-oneapi, and intel-classic. intel-classic will be retired first, so intel should be used for standard tests and intel-oneapi for bleeding-edge testing. There are also gnu, nvhpc, and nvhpc-gpu. nvhpc is the only compiler that can target the GPUs on Derecho, so we should also investigate running mizuRoute on Derecho's GPUs.

@ekluzek ekluzek added cesm-coupling For cesm coupling infrastructure issues or code changes related to code organization, data structure, refactoring labels Aug 9, 2023
@ekluzek ekluzek self-assigned this Aug 9, 2023
@ekluzek
Collaborator Author

ekluzek commented Nov 8, 2023

The CTSM issue is here:

ESCOMP/CTSM#1995

We will need a CTSM version that works with mizuRoute and has updated externals for running on Derecho (along with PE layouts for it).

@ekluzek
Collaborator Author

ekluzek commented Nov 8, 2023

PE layouts don't need to be done in mizuRoute, only in CTSM. The important Derecho tasks for mizuRoute are:

  1. A testlist for Derecho (mostly the same as Cheyenne's, with some tweaks for compilers)
  2. Update to a CTSM version that can run on Derecho (no such version exists yet)
  3. Add Derecho to the standalone build

We'll wait for CTSM to have a tag before starting on this. It would also be most efficient for @ekluzek to do it. It shouldn't take too long, but it is enough work that it isn't a trivial task.

@nmizukami will work on the standalone build.

@ekluzek ekluzek added the low priority no immediate attention needed label Nov 8, 2023
@nmizukami nmizukami added high priority Need immediate attention and fix and removed low priority no immediate attention needed labels Nov 22, 2023
@ekluzek
Collaborator Author

ekluzek commented Jan 4, 2024

ctsm5.1.dev159 is the CTSM tag to update to. I've done the update, but I'm running into problems because the file

route/settings/mizuRoute_control.py

imports the "six" module, which is no longer available in CIME. It looks like it wasn't actually used, so removing the import seems to work.
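Before dropping an import like this, it helps to confirm that the module is imported but never referenced. Below is a minimal sketch using Python's standard `ast` module; the function name `six_usage` is hypothetical, and it assumes the script only uses plain `import six`, `import six.moves as m`, or `from six... import name` forms (star imports are not detected). The actual contents of mizuRoute_control.py aren't shown here, so this is a generic check, not a description of that file.

```python
import ast
from typing import List, Tuple


def six_usage(source: str) -> Tuple[bool, List[int]]:
    """Return (imported, sorted line numbers where a 'six' name is referenced).

    Hypothetical helper: handles 'import six[.sub] [as x]' and
    'from six[.sub] import name [as y]'. Star imports are not tracked.
    """
    tree = ast.parse(source)
    names = set()      # local names bound to six or to things imported from it
    imported = False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "six" or alias.name.startswith("six."):
                    imported = True
                    # 'import six.moves' binds 'six'; 'import six.moves as m' binds 'm'
                    names.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            mod = node.module or ""
            if mod == "six" or mod.startswith("six."):
                imported = True
                for alias in node.names:
                    names.add(alias.asname or alias.name)
    # Import statements hold alias nodes, not Name nodes, so any Name hit is a real use
    uses = sorted({n.lineno for n in ast.walk(tree)
                   if isinstance(n, ast.Name) and n.id in names})
    return imported, uses
```

Usage: `six_usage(open("route/settings/mizuRoute_control.py").read())` returning `(True, [])` would mean `six` is imported but never referenced, so the import line can be deleted safely.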

@nmizukami
Collaborator

Thanks Erik. I fetched the add_mizuRoute branch from your repo and rebased it onto my local add_mizuRoute. Or should I fetch ctsm5.1.dev159 from the ESCOMP/CTSM repo and use that instead?

@ekluzek
Collaborator Author

ekluzek commented Jan 4, 2024

@nmizukami I haven't pushed the branches yet; I want to get some tests running first. And yes, you'll need to fetch the add_mizuRoute branch once I've pushed the updates.

@nmizukami
Collaborator

Ah, OK, I was wondering. I fetched and looked at add_mizuRoute, and it was last updated 8 weeks ago. So I'll just wait. I've already changed the Python script (and merged that change).

@ekluzek
Collaborator Author

ekluzek commented Jan 5, 2024

OK, all of the tests fail on Derecho.

These fail in MODEL_BUILD because of a known problem with the externals, documented here: ESMCI/ccs_config_cesm#130

ERP_D_Mmpi-serial_P1x25.5x5_amazon_r05.I2000Clm50SpMizGs.derecho_intel.mizuroute-default MODEL_BUILD
ERS_D_Mmpi-serial.5x5_amazon_r05.I2000Clm50SpMizGs.derecho_intel.mizuroute-default MODEL_BUILD
ERS_D_Mmpi-serial_P1x25.5x5_amazon_r05.I2000Clm50SpMizGs.derecho_intel.mizuroute-default MODEL_BUILD
SMS_D_Mmpi-serial.5x5_amazon_r05.I2000Clm50SpMizGs.derecho_intel.mizuroute-default MODEL_BUILD
SMS_Mmpi-serial_D_P1x25.5x5_amazon_r05.I2000Clm50SpMizGs.derecho_intel.mizuroute-default MODEL_BUILD

And these fail in RUN with a timeout, even though they were given an overly generous 3:40 wallclock time:

ERI.nldas2_nldas2_rHDMA_mnldas2.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
ERI_Mmpi-serial.5x5_amazon_r05.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
ERI_PS.f19_f19_rHDMAlk_mg17.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
ERS.f09_f09_mg17.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
ERS_PS.f19_f19_mg17.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
ERS_PS.f19_f19_mg17.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
ERS_PS.f19_f19_rHDMAlk_mg17.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
ERS_PS.f19_f19_rHDMAlk_mg17.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
ERS_PS.nldas2_nldas2_rHDMA_mnldas2.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
ERS_PS.nldas2_nldas2_rHDMA_mnldas2.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
ERS_PS.nldas2_nldas2_rUSGS_mnldas2.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
ERS_PS.nldas2_nldas2_rUSGS_mnldas2.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
PET_Mmpi-serial_P1x25.5x5_amazon_r05.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
PET_Mmpi-serial_P1x25.5x5_amazon_r05.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
PET_P215x8.nldas2_nldas2_rHDMA_mnldas2.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
PET_P215x8.nldas2_nldas2_rHDMA_mnldas2.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
PFS.f19_f19_rHDMA_mg17.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
PFS.f19_f19_rHDMA_mg17.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
SMS.f09_f09_rHDMAlk_mg17.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
SMS.f09_f09_rMERIT_mg17.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
SMS.f19_f19_rMERIT_mg17.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
SMS.f19_f19_rMERIT_mg17.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
SMS_D.5x5_amazon_r05.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
SMS_D.5x5_amazon_r05.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
SMS_D.5x5_amazon_rHDMA.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
SMS_D.nldas2_nldas2_rUSGS_mnldas2.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
SMS_D_Mmpi-serial.5x5_amazon_r05.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
SMS_Mmpi-serial_D_P1x25.5x5_amazon_r05.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
SMS_P720x4.nldas2_nldas2_rMERIT_mnldas2.I2000Clm50SpMizGs.derecho_gnu.mizuroute-default RUN
SMS_P720x4.nldas2_nldas2_rMERIT_mnldas2.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
SMS_P80x18.f19_f19_rMERIT_mg17.I2000Clm50SpMizGs.derecho_intel.mizuroute-default RUN
SMS_PS.hcru_hcru_mt13.I2000Clm50SpMizGs.derecho_intel.mizuroute-hcru RUN
SMS_PS.hcru_hcru_rHDMAlk_mt13.I2000Clm50SpMizGs.derecho_intel.mizuroute-hcru RUN

Looking at the log file

/glade/derecho/scratch/erik/SMS_D.5x5_amazon_r05.I2000Clm50SpMizGs.derecho_intel.mizuroute-default.GC.mizu_c-cpln2_v211_ctsm51d159delist/run/cesm.log.2727922.desched1.240104-171614

I don't see much information, but the run seems to hang just after initialization.

@nmizukami
Collaborator

In the traceback in the cesm log file I see line 590 of rof_comp_nuopc.F90. Line 590 is:

Mesh = ESMF_MeshCreate(filename=trim(cvalue), fileformat=ESMF_FILEFORMAT_ESMFMESH, rc=rc)

Is there some problem reading the mesh file?

@ekluzek ekluzek added this to the cesm3.0.0 milestone Aug 7, 2024