Add GEFS regression test suite from EP5r2 configuration/case #2442

Draft
wants to merge 66 commits into
base: develop

NickSzapiro-NOAA
Collaborator

@NickSzapiro-NOAA NickSzapiro-NOAA commented Sep 19, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

This PR updates the cpld_bmark_p8 tests to a prototype GEFS test case: a fully coupled S2SWA configuration with IAU and stochastic physics, with configuration and warm starts taken from the restarts of EP5r2 ensemble member 1 for 2021-03-25 06Z.

The EP5r2 test case was kindly provided by @bingfu-NOAA via @junwang-noaa with aerosol input data and configurations from @lipan-NOAA.

A separate INPUTDATA_ROOT_BMIC is no longer needed and is removed.

This PR is in draft mode pending basic reproducibility/quality checks. The following have been tested on Hera:

  • control reproduces itself
  • restart reproduces control
  • changing number of tasks reproduces control
  • Intel debug version runs
  • GNU debug version runs
    • GNU debug on Hera fails with a likely Open MPI error:
      140: The OSC pt2pt component does not support MPI_THREAD_MULTIPLE in this release.
      140: Workarounds are to run on a single node, or to use a system with an RDMA
      140: capable network such as Infiniband.
    • GNU debug on Hercules fails with a NetCDF HDF error:
      Error in handle_err: get_var3_r4 get_vara_real delp_inc NetCDF: HDF error
  • Runs on supported platforms
    • Hera
    • Hercules
    • Derecho
cxil_map: write error
...
libmpi_intel.so.1  000015539CBE3611  PMPI_File_write_a     Unknown  Unknown
libpnetcdf.so.4.0  0000155393BBA048  ncmpio_read_write     Unknown  Unknown
libpnetcdf.so.4.0  0000155393BB4DA9  Unknown               Unknown  Unknown
libpnetcdf.so.4.0  0000155393BB2319  Unknown               Unknown  Unknown
libpnetcdf.so.4.0  0000155393BAFA5B  Unknown               Unknown  Unknown
libpnetcdf.so.4.0  0000155393AFA1A2  ncmpi_wait_all        Unknown  Unknown
libpioc.so         0000155399FE9E05  flush_output_buff     Unknown  Unknown
libpioc.so         0000155399FE3746  PIOc_write_darray     Unknown  Unknown
libpioc.so         0000155399FEA085  flush_buffer          Unknown  Unknown
libpioc.so         0000155399FB3955  PIOc_sync             Unknown  Unknown
libpiof.so         0000155399900160  piolib_mod_mp_fre     Unknown  Unknown
fv3.exe            0000000005B4880F  ice_history_write        1237  ice_history_write.F90
fv3.exe            000000000589BC2A  ice_history_mp_ac        4134  ice_history.F90
fv3.exe            00000000057F47B7  ice_comp_nuopc_mp         888  ice_comp_nuopc.F90
...
libmpi_intel.so.1  000014E4701213A5  PMPI_Alltoallw        Unknown  Unknown
libpioc.so         000014E46F8CE5EF  pio_swapm             Unknown  Unknown
libpioc.so         000014E46F8D0EBA  rearrange_io2comp     Unknown  Unknown
libpioc.so         000014E46F8F20BA  PIOc_read_darray      Unknown  Unknown
fv3.exe            00000000018D148B  Unknown               Unknown  Unknown
fv3.exe            0000000001610043  Unknown               Unknown  Unknown
fv3.exe            000000000160F20D  Unknown               Unknown  Unknown
fv3.exe            0000000000CC4D85  Unknown               Unknown  Unknown
fv3.exe            0000000000CBF971  Unknown               Unknown  Unknown
fv3.exe            000000000059B5B3  Unknown               Unknown  Unknown
fv3.exe            0000000001C48B10  wav_comp_nuopc_mp         823  wav_comp_nuopc.F90

This wav line calls ESMF_MeshCreate

  • No major diffs from GEFS workflow configuration

Input data is currently in user space on Hera; the scripts will need updating once the file paths are in shared space.

Commit Message:

* UFSWM - Add GEFS regression test suite from EP5r2 configuration/case

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

  • None

UFSWM Blocking Dependencies:

  • None

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.

Input data Changes:

  • New input data.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

NickSzapiro-NOAA and others added 30 commits May 6, 2024 06:24
@NickSzapiro-NOAA
Collaborator Author

Yes, this divide-by-zero error occurs with the Intel debug build:

COMPILE | s2swa_debug | intel | -DAPP=S2SWA -DDEBUG=ON -DCCPP_SUITES=FV3_GFS_v17_coupled_p8_ugwpv1 | | fv3 |
RUN | cpld_debug_gefs                                   | - noaacloud                          | baseline |

The only change for the debug test is a shorter forecast length (fhmax), since the debug build runs more slowly.

@weiyuan-jiang
Collaborator

weiyuan-jiang commented Sep 20, 2024 via email

@NickSzapiro-NOAA
Collaborator Author

NickSzapiro-NOAA commented Sep 23, 2024

@weiyuan-jiang @tclune I added

k = count(dz .LT. .001)
write(*,*) 'Gocart dz check for vanishing dz : ', k, i1,i2,j1,j2,km

There are dozens of tasks (out of 768) with small dz, with counts ranging from 1 up to 111. All have the same size
[Task 232:] 111 1 48 1 24 127
and the count on each task does not change throughout the simulation

The curiosities continue as using the alternate loop with tau(i,j,k) = vs(i,j,k)/dz(i,j,k) leads to different counts, with counts ranging up to 110...
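
For comparing runs, the per-task counts printed by the check above could be summarized with a small helper. This is only a sketch: the function name `summarize_dz_check` is hypothetical, and it assumes each log line carries a task prefix of the form `<task>:` followed by the message and the count, which may not match the actual log layout.

```shell
# Sketch: summarize 'Gocart dz check' lines from a task-prefixed log file.
# Assumed (hypothetical) line shape:
#   <task>:  Gocart dz check for vanishing dz :  <count> i1 i2 j1 j2 km
summarize_dz_check() {
  awk '
    /Gocart dz check for vanishing dz/ {
      # Everything after the message text holds the count and the bounds.
      split($0, parts, "vanishing dz :")
      split(parts[2], vals, " ")
      task = $1
      sub(/:$/, "", task)
      count = vals[1] + 0
      # Track the largest nonzero count seen for each task.
      if (count > 0 && count > maxc[task]) maxc[task] = count
    }
    END {
      n = 0; maxall = 0
      for (t in maxc) { n++; if (maxc[t] > maxall) maxall = maxc[t] }
      printf "tasks_with_small_dz=%d max_count=%d\n", n, maxall
    }
  ' "$1"
}
```

Calling `summarize_dz_check <logfile>` on the logs from the two loop variants would make the difference in counts (111 versus 110) easy to spot.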

fltng_pnt
lossless
pos_pert_fcst
12
Contributor

@NickSzapiro-NOAA It seems to me the UPP control files "postxconfig-NT-gefs.txt" and "postxconfig-NT-gefs_FH00.txt" haven't been updated to the new format. I wonder if any grib2 files (inline post results) have been successfully generated from your new RT.

Collaborator Author

Thank you @WenMeng-NOAA. These postxconfig files are from the provided EP5r2 workflow and stopped working after the UPP update in #2326.

Do you know how to reformat these?

As a temporary fix, I'm using the gfs postxconfig instead. This choice happens in tests/fv3_conf/cpld_control_run.IN

#inline post
if [ $WRITE_DOPOST = .true. ]; then
  cp    ${PATHRT}/parm/post_itag_gfs itag
  cp    ${PATHRT}/parm/postxconfig-NT-gfs.txt postxconfig-NT.txt
  cp    ${PATHRT}/parm/postxconfig-NT-gfs_FH00.txt postxconfig-NT_FH00.txt
  cp    ${PATHRT}/parm/params_grib2_tbl_new params_grib2_tbl_new
  if [[ ${BMIC} == .true. ]]; then
    cp    ${PATHRT}/parm/post_itag_gefs itag
    #copied "gefs" postxconfig files not working after UFS #2326
    #cp    ${PATHRT}/parm/postxconfig-NT-gefs.txt postxconfig-NT.txt
    #cp    ${PATHRT}/parm/postxconfig-NT-gefs_FH00.txt postxconfig-NT_FH00.txt
    cp    ${PATHRT}/parm/postxconfig-NT-gfs.txt postxconfig-NT.txt
    cp    ${PATHRT}/parm/postxconfig-NT-gfs_FH00.txt postxconfig-NT_FH00.txt
    cp    ${PATHRT}/parm/params_grib2_tbl_new params_grib2_tbl_new
  else
    cp    ${PATHRT}/parm/post_itag_gfs itag
    cp    ${PATHRT}/parm/postxconfig-NT-gfs.txt postxconfig-NT.txt
    cp    ${PATHRT}/parm/postxconfig-NT-gfs_FH00.txt postxconfig-NT_FH00.txt
    cp    ${PATHRT}/parm/params_grib2_tbl_new params_grib2_tbl_new
  fi
fi
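
Since both branches currently copy the same gfs postxconfig files, the selection above could be condensed so that only the itag depends on BMIC. A minimal sketch, assuming the same file names as in the snippet; the function name `stage_inline_post` and its arguments are hypothetical and not part of the regression test scripts.

```shell
# Sketch: stage inline-post inputs; only the itag differs between GFS and GEFS.
# Arguments (hypothetical): parm directory, run directory, BMIC flag.
stage_inline_post() {
  sip_parm_dir=$1
  sip_run_dir=$2
  sip_bmic=$3

  # Pick the itag by case type; default to the GFS one.
  sip_itag=post_itag_gfs
  if [ "${sip_bmic}" = ".true." ]; then
    sip_itag=post_itag_gefs
  fi

  cp "${sip_parm_dir}/${sip_itag}" "${sip_run_dir}/itag"
  # Temporary: the gfs postxconfig files are used for both cases.
  cp "${sip_parm_dir}/postxconfig-NT-gfs.txt" "${sip_run_dir}/postxconfig-NT.txt"
  cp "${sip_parm_dir}/postxconfig-NT-gfs_FH00.txt" "${sip_run_dir}/postxconfig-NT_FH00.txt"
  cp "${sip_parm_dir}/params_grib2_tbl_new" "${sip_run_dir}/params_grib2_tbl_new"
}
```

Once regenerated gefs postxconfig files are available, only the two postxconfig source names would need to follow the same BMIC switch as the itag.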

Contributor

@NickSzapiro-NOAA @lipan-NOAA If you provide me the UPP control files in xml format, I can generate the text files in new format for you.

Contributor

@NickSzapiro-NOAA I have regenerated "postxconfig-NT-gefs.txt" with the xml file "postcntrl_gefs.xml" provided by @lipan-NOAA. Please let me know if you pick it up from Hera or other machines.

Collaborator Author

@WenMeng-NOAA Can you open a PR against this branch with the file changes? If not, I'm happy to bring them in from Hera.

Contributor

@NickSzapiro-NOAA A PR was just submitted to your branch.

Collaborator Author

An example run directory using the updated postxconfig files is on Hera at:
/scratch1/NCEPDEV/nems/Nick.Szapiro/tasks/updateToEP5/uwm_gefs_upp/tests/run_dir/cpld_control_gefs_intel/
The wgrib2 -v output seems reasonable, but please let me know how we can verify that the contents are OK.

Contributor

@NickSzapiro-NOAA Your test results look good to me, except for missing aerosol fields. I will provide you changes for generating these aerosol fields from the inline post.

Collaborator Author

I re-ran and see aerosol fields (same run_dir). Please let me know if there is anything more to resolve

Contributor

@NickSzapiro-NOAA Your test results look good to me. @lipan-NOAA Can you also validate aerosol fields in grib2 files?

MODELNAME='GFS'
/
&NAMPGB
KPO=50,PO=1000.,975.,950.,925.,900.,875.,850.,825.,800.,775.,750.,725.,700.,675.,650.,625.,600.,575.,550.,525.,500.,475.,450.,425.,400.,375.,350.,325.,300.,275.,250.,225.,200.,175.,150.,125.,100.,70.,50.,40.,30.,20.,15.,10.,7.,5.,3.,2.,1.,0.4,
Contributor

@NickSzapiro-NOAA To generate aerosol fields from the inline post, please change this line from

KPO=50,PO=1000.,975.,950.,925.,900.,875.,850.,825.,800.,775.,750.,725.,700.,675.,650.,625.,600.,575.,550.,525.,500.,475.,450.,425.,400.,375.,350.,325.,300.,275.,250.,225.,200.,175.,150.,125.,100.,70.,50.,40.,30.,20.,15.,10.,7.,5.,3.,2.,1.,0.4,

into

KPO=50,PO=1000.,975.,950.,925.,900.,875.,850.,825.,800.,775.,750.,725.,700.,675.,650.,625.,600.,575.,550.,525.,500.,475.,450.,425.,400.,375.,350.,325.,300.,275.,250.,225.,200.,175.,150.,125.,100.,70.,50.,40.,30.,20.,15.,10.,7.,5.,3.,2.,1.,0.4,nasa_on=.true.,

Collaborator

@NickSzapiro-NOAA Also copy all optics_luts_*_nasa.dat files from UPP/fix/chem to your run directory.
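
The two requests above (append nasa_on=.true. to the KPO line, and copy the NASA optics tables) could be scripted roughly as follows. This is a sketch under assumptions: the function names and their arguments are hypothetical, the itag is assumed to keep the KPO entry on a single line ending in a comma, and the location of the UPP fix/chem directory depends on the checkout.

```shell
# Sketch: append nasa_on=.true., to the KPO line of an itag file, once.
enable_nasa_on() {
  eno_itag=$1
  if ! grep -q 'nasa_on=' "${eno_itag}"; then
    # -i.bak is accepted by both GNU and BSD sed; drop the backup afterwards.
    sed -i.bak '/^[[:space:]]*KPO=/ s/$/nasa_on=.true.,/' "${eno_itag}" \
      && rm -f "${eno_itag}.bak"
  fi
}

# Sketch: copy the NASA optics lookup tables into the run directory.
# First argument is the UPP fix/chem directory (path is an assumption).
stage_nasa_optics() {
  sno_chem_dir=$1
  sno_run_dir=$2
  cp "${sno_chem_dir}"/optics_luts_*_nasa.dat "${sno_run_dir}/"
}
```

The grep guard keeps the edit idempotent, so rerunning the staging step does not append the flag twice.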

@junwang-noaa
Collaborator

> @weiyuan-jiang @tclune I added
>
> k = count(dz .LT. .001)
> write(*,*) 'Gocart dz check for vanishing dz : ', k, i1,i2,j1,j2,km
>
> There are dozens of tasks (out of 768) with small dz, with counts ranging from 1 up to 111. All have the same size [Task 232:] 111 1 48 1 24 127 and the count on each task does not change throughout the simulation
>
> The curiosities continue as using the alternate loop with tau(i,j,k) = vs(i,j,k)/dz(i,j,k) leads to different counts, with counts ranging up to 110...

@yangfanglin @

@NickSzapiro-NOAA Can you check whether dz is zero at any grid points? Since the error message is "Floating point exception: floating-point divide by zero", we may then need to track down where the zero-valued dz comes from. Thanks.

@NickSzapiro-NOAA
Collaborator Author

NickSzapiro-NOAA commented Oct 1, 2024

The GOCART dz=0 issue seems to have been resolved at the scripting level: the test case now runs the specified 3 hours on Hera (about 40 minutes of runtime). The fix comes from a commit that moves where ExtData is symbolically linked (commit), e.g.,

ln -sf stage_ExtData run_dir
...
cp run_dir/ExtData/monochromatic/optics_BC.v1_3.nc  run_dir/optics_BC.nc
cp run_dir/ExtData/monochromatic/optics_OC.v1_3.nc  run_dir/optics_OC.nc
[more cp's]
...
ln -sf stage_ExtData run_dir {*removing this line fixes crash*}

While I understand that there are subtle filesystem differences when reading files in debug mode, I don't actually know why anything has changed. The good news is that the Intel debug GEFS test case now runs, at least on Hera.
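
One way to keep the staging step from linking ExtData more than once is a small guard like the one below. This is a sketch only, not a diagnosis of the crash: the function name `link_extdata_once` and the link name `ExtData` inside the run directory are assumptions made for illustration.

```shell
# Sketch: create the ExtData link in the run directory only if it is absent.
# Arguments (hypothetical): staged ExtData directory, run directory.
link_extdata_once() {
  leo_src=$1
  leo_link="$2/ExtData"
  # Leave an existing link or directory untouched rather than relinking.
  if [ -L "${leo_link}" ] || [ -e "${leo_link}" ]; then
    return 0
  fi
  ln -s "${leo_src}" "${leo_link}"
}
```

With this guard a second call is a no-op, so the order of the link and the later cp commands no longer matters.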

tests/rt.conf (outdated)
Collaborator

@bbakernoaa Could you make these consistent with the monthly data?

@WenMeng-NOAA
Contributor

@NickSzapiro-NOAA The UPP develop branch recently had a commit (a6c1a38c) that includes aerosol fields in the UPP control files for GEFS. Could you add changes to link postxconfig-NT-gefs.txt and postxconfig-NT-gefs_FH00.txt to the files postxconfig-NT-gefs.txt and postxconfig-NT-gefs-f00.txt under parm/gefs in the UPP repository? Thanks!

@NickSzapiro-NOAA
Collaborator Author

@WenMeng-NOAA Should there be an fv3atm PR first to update the UPP hash?

@WenMeng-NOAA
Contributor

> @WenMeng-NOAA Should there be an fv3atm PR first to update the UPP hash?

@NickSzapiro-NOAA That's right.

@NickSzapiro-NOAA
Collaborator Author

@WenMeng-NOAA Can we reduce the number of times lines like these get logged?

 820:  GEFS env var            0           0           0
 820:  Processing for GEFS and default setting is tmpl4_1 and tmpl4_11
 820:  After g2sec1 call we need to set listsec1(2) =            2
 820:  After g2sec1 call we need to set listsec1(13) =            1

They may be for development and double the size of the log file

@WenMeng-NOAA
Contributor

> @WenMeng-NOAA Can we reduce the number of times lines like these get logged?
>
>  820:  GEFS env var            0           0           0
>  820:  Processing for GEFS and default setting is tmpl4_1 and tmpl4_11
>  820:  After g2sec1 call we need to set listsec1(2) =            2
>  820:  After g2sec1 call we need to set listsec1(13) =            1
>
> They may be for development and double the size of the log file

@NickSzapiro-NOAA Could you open an issue at https://github.com/NOAA-EMC/UPP/issues, so we will work on that?

@NickSzapiro-NOAA
Collaborator Author

> > @WenMeng-NOAA Can we reduce the number of times lines like these get logged?
> >
> >  820:  GEFS env var            0           0           0
> >  820:  Processing for GEFS and default setting is tmpl4_1 and tmpl4_11
> >  820:  After g2sec1 call we need to set listsec1(2) =            2
> >  820:  After g2sec1 call we need to set listsec1(13) =            1
> >
> > They may be for development and double the size of the log file
>
> @NickSzapiro-NOAA Could you open an issue at https://github.com/NOAA-EMC/UPP/issues, so we will work on that?

NOAA-EMC/UPP#1074

Successfully merging this pull request may close these issues.

  • Update cpld_bmark_p8 with GEFSv13 EP5 configuration
  • Add RT test for gocart_on, gccpp_on, nasa_on
7 participants