
ZStd with Netcdf #2444

Draft: wants to merge 5 commits into base: develop

Conversation

BrianCurtis-NOAA
Collaborator

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

This PR brings in the zstd library and enables netCDF support for it.

Commit Message:

* UFSWM - Add ZStd library and enable netcdf support for it.

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

  • None

UFSWM Blocking Dependencies:

  • None

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.
  • PR Updates/Changes Baselines.
  • No Baseline Changes.

Input data Changes:

  • None.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@BrianCurtis-NOAA
Collaborator Author

I've generated new baselines with the test_changes.list and they generated OK and passed using those in comparison (-c and -m).
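(For reference, the workflow referred to here is roughly the following, run from the tests/ directory. If memory serves, -c creates new baselines and -m compares against them; the account flag and the subset list file are illustrative placeholders rather than exact options on every platform.)

# create new baselines (-c) for the subset of tests expected to change
./rt.sh -a <account> -c -l <subset_rt.conf>
# rerun the same subset and compare (-m) against the baselines just created
./rt.sh -a <account> -m -l <subset_rt.conf>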

@junwang-noaa I'm trying to recall, were baseline changes acceptable, or do I need to make changes to the tests/tests files to ensure no baselines change in the end?

@junwang-noaa
Collaborator

I don't expect those tests to change results. Would you please check which files are changed in cpld_control_p8 intel? It seems these tests have GOCART, but the zstd netCDF support should not impact those tests.

@BrianCurtis-NOAA
Collaborator Author

@junwang-noaa

baseline dir = /lfs/h2/emc/nems/noscrub/emc.nems/RT/NEMSfv3gfs/develop-20240909/cpld_control_p8_intel
working dir  = /lfs/h2/emc/ptmp/brian.curtis/FV3_RT/rt_255026/cpld_control_p8_intel
Checking test cpld_control_p8_intel results ....
 Comparing sfcf021.tile1.nc .....USING NCCMP......OK
 Comparing sfcf021.tile2.nc .....USING NCCMP......OK
 Comparing sfcf021.tile3.nc .....USING NCCMP......OK
 Comparing sfcf021.tile4.nc .....USING NCCMP......OK
 Comparing sfcf021.tile5.nc .....USING NCCMP......OK
 Comparing sfcf021.tile6.nc .....USING NCCMP......OK
 Comparing atmf021.tile1.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile2.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile3.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile4.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile5.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile6.nc .....USING NCCMP......NOT IDENTICAL
 Comparing sfcf024.tile1.nc .....USING NCCMP......OK
 Comparing sfcf024.tile2.nc .....USING NCCMP......OK
 Comparing sfcf024.tile3.nc .....USING NCCMP......OK
 Comparing sfcf024.tile4.nc .....USING NCCMP......OK
 Comparing sfcf024.tile5.nc .....USING NCCMP......OK
 Comparing sfcf024.tile6.nc .....USING NCCMP......OK
 Comparing atmf024.tile1.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile2.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile3.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile4.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile5.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile6.nc .....USING NCCMP......NOT IDENTICAL
 Comparing gocart.inst_aod.20210323_0600z.nc4 .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.coupler.res .....USING CMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile1.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile2.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile3.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile4.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile5.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile6.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile1.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile2.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile3.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile4.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile5.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile6.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_tracer.res.tile1.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile2.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile3.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile4.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile5.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile6.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.phy_data.tile1.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile2.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile3.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile4.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile5.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile6.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile1.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile2.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile3.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile4.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile5.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile6.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.MOM.res.nc .....USING NCCMP......OK
 Comparing RESTART/iced.2021-03-23-21600.nc .....USING NCCMP......OK
 Comparing RESTART/ufs.cpld.cpl.r.2021-03-23-21600.nc .....USING NCCMP......OK
 Comparing 20210323.060000.out_pnt.ww3 .....USING CMP......OK
 Comparing 20210323.060000.out_grd.ww3 .....USING CMP......OK

@junwang-noaa
Collaborator

junwang-noaa commented Sep 24, 2024

Thanks, Brian. @lipan-NOAA @bbakernoaa @weiyuan-jiang @tclune may I ask if you have any idea why the GOCART results change when the netcdf library is built with zstandard?

@mathomp4

mathomp4 commented Sep 25, 2024

@junwang-noaa I honestly have no idea why, as zstandard should be lossless.

I'm currently doing a test build of GEOS with preliminary zstandard support. Once I get that working, I'll see what I can see with a run of GEOS+GOCART. (I'll go with level 5 as I think that is what you are using...)

NOTE: There is no zstandard support in MAPL yet, so whatever you are producing is either compressed offline or not compressed at all with zstandard, I suppose.

@mathomp4

Update. I was able to get zstandard support into MAPL. And I do not see any difference in the output.

I had history output a 3D GOCART collection uncompressed (tavg3d_aer_p_uncompress), with deflate compression (tavg3d_aer_p), and with zstandard compression (tavg3d_aer_p_zstd). As can be seen, the compressed ones are smaller:

❯ lt addzstd-2024Sep25-1day-c24.tavg3d_aer_p*
Permissions Size User     Group Date Modified    Name
.rw-r--r--@  13M mathomp4 staff 2024-09-25 10:55  addzstd-2024Sep25-1day-c24.tavg3d_aer_p_uncompress.20000414_2230z.nc4
.rw-r--r--@ 9.3M mathomp4 staff 2024-09-25 10:55  addzstd-2024Sep25-1day-c24.tavg3d_aer_p.20000414_2230z.nc4
.rw-r--r--@ 9.4M mathomp4 staff 2024-09-25 10:55  addzstd-2024Sep25-1day-c24.tavg3d_aer_p_zstd.20000414_2230z.nc4

Now, comparing with nccmp we see no data differences:

❯ nccmp -dmfsB addzstd-2024Sep25-1day-c24.tavg3d_aer_p_uncompress.20000414_2230z.nc4 addzstd-2024Sep25-1day-c24.tavg3d_aer_p.20000414_2230z.nc4
Files "addzstd-2024Sep25-1day-c24.tavg3d_aer_p_uncompress.20000414_2230z.nc4" and "addzstd-2024Sep25-1day-c24.tavg3d_aer_p.20000414_2230z.nc4" are identical.

❯ nccmp -dmfsB addzstd-2024Sep25-1day-c24.tavg3d_aer_p_uncompress.20000414_2230z.nc4 addzstd-2024Sep25-1day-c24.tavg3d_aer_p_zstd.20000414_2230z.nc4
Files "addzstd-2024Sep25-1day-c24.tavg3d_aer_p_uncompress.20000414_2230z.nc4" and "addzstd-2024Sep25-1day-c24.tavg3d_aer_p_zstd.20000414_2230z.nc4" are identical.

❯ nccmp -dmfsB addzstd-2024Sep25-1day-c24.tavg3d_aer_p.20000414_2230z.nc4 addzstd-2024Sep25-1day-c24.tavg3d_aer_p_zstd.20000414_2230z.nc4
Files "addzstd-2024Sep25-1day-c24.tavg3d_aer_p.20000414_2230z.nc4" and "addzstd-2024Sep25-1day-c24.tavg3d_aer_p_zstd.20000414_2230z.nc4" are identical.

Now, technically if you turn on comparison of global metadata there is one difference:

❯ nccmp -dmgfsB addzstd-2024Sep25-1day-c24.tavg3d_aer_p_uncompress.20000414_2230z.nc4 addzstd-2024Sep25-1day-c24.tavg3d_aer_p_zstd.20000414_2230z.nc4
DIFFER : LENGTHS OF GLOBAL ATTRIBUTE : Filename : 23 <> 17 : VALUES : tavg3d_aer_p_uncompress <> tavg3d_aer_p_zstd

but that's just the Filename metadata.
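(Side note: a quick way to confirm which compression filter a file actually carries is to dump its special per-variable attributes with ncdump; depending on the netCDF version, zstandard shows up either as a dedicated attribute or as a generic _Filter entry with the registered HDF5 filter id 32015. The file name below is just one of the examples above.)

# -h: header only, -s: include special virtual attributes (_Storage, _DeflateLevel, _Filter, ...)
ncdump -s -h addzstd-2024Sep25-1day-c24.tavg3d_aer_p_zstd.20000414_2230z.nc4 | grep -Ei 'filter|deflate|zstandard'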

@junwang-noaa
Collaborator

@mathomp4 Thanks for the testing. In our test case, we just write out the fields without doing any compression:

ideflate= 0
quantize_mode=quantize_bitround quantize_nsd= 0
zstandard_level= 0

Also I do see the aerosol fields, e.g. dms, are different:

8122c8122
<     8.579448e-32, 1.445623e-31,
---
>     8.579447e-32, 1.445623e-31,
8127c8127
<     5.531654e-32, 0, 0, 0, 0, 0, 6.213818e-32, 1.554074e-31, 1.65499e-31,
---
>     5.531652e-32, 0, 0, 0, 0, 0, 6.213818e-32, 1.554074e-31, 1.65499e-31,
...

Any clue?

@mathomp4

I don't know your system but does this:

quantize_mode=quantize_bitround quantize_nsd= 0

mean you are not quantizing? In MAPL, I don't think we allow for a mode to be set without an nsd also set.

But beyond that, I can't see how MAPL would care about zstandard in your case since I only wrote in zstandard support today!

Are you reading any zstandard compressed files? Even then, ExtData shouldn't care since at that point we depend on netCDF to read in files correctly.

And I looked, and MAPL hasn't done anything in recent times that should change answers. The last non-zero-diff change was a bug fix for bit-shaved binary output, which was a weird edge case someone reported (we don't do binary output much).

@mathomp4

Well, one note is that you are running with Intel 19 it looks like. We haven't used that in years (5, 6 years?) so it's possible MAPL is interacting badly with it? We would never test it. (Heck our latest machines don't have anything newer than Intel 2022...and even that was a "we'll install for you this time")

I know @AlexanderRichert-NOAA also was having issues with Intel 19 and MAPL, but that was at unit test time and it was a test about reading in CS data. Are you ingesting cubed-sphere input data?

@junwang-noaa
Collaborator

junwang-noaa commented Sep 27, 2024

Yes, we do ingest cubed-sphere grid data. Since there are only last-digit changes in the GOCART fields, we won't be able to move to a newer Intel compiler with this PR, and we already have a list of tests with changed results, I think we can move on with this PR.

@BrianCurtis-NOAA
Collaborator Author

@jkbk2004 all RDHPCS spack installs have netcdf with zstd, correct?
@AlexanderRichert-NOAA does the Acorn spack install have netcdf with zstd?

@jkbk2004
Collaborator

@RatkoVasic-NOAA @ulmononian Please make sure that netcdf with zstd is available in the current version of the spack-stack on the RDHPCS machines.

@BrianCurtis-NOAA
Collaborator Author

BrianCurtis-NOAA commented Sep 27, 2024

I'm getting this error:

+ mpiexec -n 256 -ppn 128 -depth 1 ./fv3.exe
 file: module_write_netcdf.F90 line:          424
 NetCDF: Filter error: undefined filter encountered

on these tests:

control_wrtGauss_netcdf_parallel_intel
control_p8_intel
control_p8.v2.sfc_intel
regional_netcdf_parallel_intel
rrfs_v1beta_intel
control_wam_intel
control_wrtGauss_netcdf_parallel_debug_intel
control_debug_p8_intel
control_wam_debug_intel

line 424 of module_write_netcdf is:
https://github.com/NOAA-EMC/fv3atm/blob/a9364591091c836984a40107729720705847c195/io/module_write_netcdf.F90#L424

@jkbk2004
Collaborator

@BrianCurtis-NOAA which machine?

@BrianCurtis-NOAA
Collaborator Author

WCOSS2

@junwang-noaa
Collaborator

@BrianCurtis-NOAA where is your run directory?

@BrianCurtis-NOAA
Collaborator Author

/lfs/h2/emc/ptmp/brian.curtis/FV3_RT/rt_259224

@junwang-noaa
Collaborator

Thanks, Brian. Have you run these tests before? I saw your comment: "I've generated new baselines with the test_changes.list and they generated OK and passed using those in comparison (-c and -m)"

@DusanJovic-NOAA would you please take a look? Is it OK to set quantize_nsd to 0? I see in control_wrtGauss_netcdf_parallel_intel, we have:

quilting:                .true.
quilting_restart:        .true.
write_groups:            1
write_tasks_per_group:   6
itasks:                  1
output_history:          .true.
history_file_on_native_grid: .false.
write_dopost:            .true.
write_nsflip:            .true.
num_files:               2
filename_base:           'atm' 'sfc'
output_grid:             gaussian_grid
output_file:             'netcdf'
zstandard_level:         5
ideflate:                0
quantize_mode:           'quantize_bitround'
quantize_nsd:            0

@RatkoVasic-NOAA
Collaborator

I checked on 5 machines, it looks OK:

Hera:
[role.epic]# grep zstd /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/fms-2024.01/install/modulefiles/intel-oneapi-mpi/2021.5.1/intel/2021.5.0/netcdf-c/4.9.2.lua
-- netcdf-c@4.9.2%intel@2021.5.0+blosc~byterange+dap~fsync~hdf4~jna+mpi~nczarr_zip+optimize~parallel-netcdf+pic+shared+szip+zstd build_system=autotools patches=0161eb8 arch=linux-rocky8-haswell/ejp7j3k
depends_on("zstd/1.5.2")

Jet
[role.epic]$ grep zstd /contrib/spack-stack/spack-stack-1.6.0/envs/fms-2024.01/install/modulefiles/intel-oneapi-mpi/2021.5.1/intel/2021.5.0/netcdf-c/4.9.2.lua
-- netcdf-c@4.9.2%intel@2021.5.0+blosc~byterange+dap~fsync~hdf4~jna+mpi~nczarr_zip+optimize~parallel-netcdf+pic+shared+szip+zstd build_system=autotools patches=0161eb8 arch=linux-rocky8-core2/wxvro24
depends_on("zstd/1.5.2")

Gaea:
[role.epic]# grep zstd /ncrc/proj/epic/spack-stack/spack-stack-1.6.0/envs/fms-2024.01/install/modulefiles/cray-mpich/8.1.25/intel/2023.1.0/netcdf-c/4.9.2.lua
-- netcdf-c@4.9.2%intel@2023.1.0+blosc~byterange+dap~fsync~hdf4~jna+mpi~nczarr_zip+optimize~parallel-netcdf+pic+shared+szip+zstd build_system=autotools patches=0161eb8 arch=linux-sles15-zen2/zo6ia6l
depends_on("zstd/1.5.2")

Hercules:
[role-epic]# grep zstd /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/fms-2024.01/install/modulefiles/intel-oneapi-mpi/2021.9.0/intel/2021.9.0/netcdf-c/4.9.2.lua
-- netcdf-c@4.9.2%intel@2021.9.0+blosc~byterange+dap~fsync~hdf4~jna+mpi~nczarr_zip+optimize~parallel-netcdf+pic+shared+szip+zstd build_system=autotools patches=0161eb8 arch=linux-rocky9-icelake/tslbcfy
depends_on("zstd/1.5.2")

Orion:
[role-epic]$ grep zstd /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/fms-2024.01/install/modulefiles/intel-oneapi-mpi/2021.9.0/intel/2021.9.0/netcdf-c/4.9.2.lua
-- netcdf-c@4.9.2%intel@2021.9.0+blosc~byterange+dap~fsync~hdf4~jna+mpi~nczarr_zip+optimize~parallel-netcdf+pic+shared+szip+zstd build_system=autotools patches=0161eb8 arch=linux-rocky9-skylake_avx512/oup2wyi
depends_on("zstd/1.5.2")

@BrianCurtis-NOAA
Collaborator Author

@junwang-noaa The first run was to see what was going to change before switching IDEFLATE to ZSTANDARD_LEVEL. These newly failed tests are issues arising after switching IDEFLATE to ZSTANDARD_LEVEL.

@junwang-noaa
Collaborator

I see, thanks.

@AlexanderRichert-NOAA
Collaborator

@BrianCurtis-NOAA Yes, spack-stack installs netcdf with zstd support (including on Acorn).

@DusanJovic-NOAA
Collaborator

> @DusanJovic-NOAA would you please take a look? Is it OK to set quantize_nsd to 0?

Setting quantize_nsd to zero turns off quantization.
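(A quick way to check on an output file whether quantization was actually applied: netCDF records it as a per-variable attribute whose exact name depends on the quantize mode, so a case-insensitive grep is the simplest check. The file name is just an example taken from the comparisons above.)

# no matches means no quantization attribute was written for any variable
ncdump -h sfcf021.tile1.nc | grep -i quantize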

@DusanJovic-NOAA
Collaborator

DusanJovic-NOAA commented Sep 27, 2024

> NetCDF: Filter error: undefined filter encountered

This error means that the netcdf library does not support the zstd filter.
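(Two quick checks along these lines, assuming the module's NETCDF variable points at the install prefix and that libnetcdf.settings sits under its lib directory, as it usually does:)

# does the netcdf-c build list zstd among its standard filters?
grep "Standard Filters" "$NETCDF/lib/libnetcdf.settings"
# can the runtime locate HDF5 filter plugins at all?
echo "HDF5_PLUGIN_PATH=${HDF5_PLUGIN_PATH:-<unset>}"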

@BrianCurtis-NOAA
Collaborator Author

@DusanJovic-NOAA does it make sense that it's only those tests? There are many others that use ZSTANDARD_LEVEL.

@DusanJovic-NOAA
Collaborator

Where is your run directory?

@BrianCurtis-NOAA
Collaborator Author

/lfs/h2/emc/ptmp/brian.curtis/FV3_RT/rt_259224

@DusanJovic-NOAA
Collaborator

Ok. Which tests use zstd and ran successfully?

@BrianCurtis-NOAA
Collaborator Author

All of the tests listed in the changed-files list on GitHub are the ones I've changed to use ZSTANDARD_LEVEL=5.

@BrianCurtis-NOAA
Collaborator Author

Looks like conus13km_control passed.

@DusanJovic-NOAA
Collaborator

cpld_control_p8_intel is listed in the changes file, but it does not use zstd:

$ grep zstandard_level /lfs/h2/emc/ptmp/brian.curtis/FV3_RT/rt_259224/cpld_control_p8_intel/model_configure
zstandard_level:         0

@DusanJovic-NOAA
Collaborator

Only these tests are using zstd:

$ grep zstandard_level */model_configure | grep ' 5'
control_debug_p8_intel/model_configure:zstandard_level:         5
control_p8_intel/model_configure:zstandard_level:         5
control_p8.v2.sfc_intel/model_configure:zstandard_level:         5
control_wam_debug_intel/model_configure:zstandard_level:         5
control_wam_intel/model_configure:zstandard_level:         5
control_wrtGauss_netcdf_parallel_debug_intel/model_configure:zstandard_level:         5
control_wrtGauss_netcdf_parallel_intel/model_configure:zstandard_level:         5
regional_netcdf_parallel_intel/model_configure:zstandard_level:         5
rrfs_v1beta_intel/model_configure:zstandard_level:         5

@BrianCurtis-NOAA
Collaborator Author

Right, I mean the GitHub list of changed files, not test_changes.list. That one has a last-bit difference coming from MAPL, but we are OK with that for now. Try conus13km_control.

@BrianCurtis-NOAA
Collaborator Author

That is weird, OK. https://github.com/ufs-community/ufs-weather-model/pull/2444/files#diff-7c4488807d15b8e4cd0ac60d420a1857ef84720e5287db3858e8401b5b1450b0 I've added WCOSS2 to that list, so it should be using it. Why wouldn't it make it into model_configure?

@DusanJovic-NOAA
Collaborator

> try conus13km_control

$ grep zstandard_level /lfs/h2/emc/ptmp/brian.curtis/FV3_RT/rt_259224/conus13km_control_intel/model_configure
(no output: conus13km_control_intel does not set zstandard_level)

@DusanJovic-NOAA
Collaborator

Only 9 tests have zstd enabled, and I think those 9 tests are the ones that failed, which indicates a problem with the netcdf installation to me.

@DusanJovic-NOAA
Collaborator

Try running on some other machine, Hera for example.

@BrianCurtis-NOAA
Collaborator Author

@Hang-Lei-NOAA can you double-check that everything looks OK with the WCOSS2 install of netcdf with zstd?

@DusanJovic-NOAA
Collaborator

This module:

$ module show netcdf/4.9.2
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   /lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/modulefiles/mpi/intel/19.1.3.304/cray-mpich/8.1.12/netcdf/4.9.2.lua:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
help([[]])
conflict("netcdf")
prepend_path("PATH","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/netcdf/4.9.2/bin")
prepend_path("LD_LIBRARY_PATH","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/netcdf/4.9.2/lib")
prepend_path("DYLD_LIBRARY_PATH","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/netcdf/4.9.2/lib")
prepend_path("CPATH","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/netcdf/4.9.2/include")
prepend_path("MANPATH","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/netcdf/4.9.2/share/man")
setenv("NETCDF","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/netcdf/4.9.2")
setenv("NETCDF_ROOT","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/netcdf/4.9.2")
setenv("NETCDF_INCLUDES","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/netcdf/4.9.2/include")
setenv("NETCDF_LIBRARIES","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/netcdf/4.9.2/lib")
setenv("NETCDF_VERSION","4.9.2")
setenv("NetCDF","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/netcdf/4.9.2")
setenv("NetCDF_ROOT","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/netcdf/4.9.2")
setenv("NetCDF_INCLUDES","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/netcdf/4.9.2/include")
setenv("NetCDF_LIBRARIES","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/netcdf/4.9.2/lib")
setenv("NetCDF_VERSION","4.9.2")
whatis("Name: netcdf")
whatis("Version: 4.9.2")
whatis("Category: library")
whatis("Description: NetCDF4 C, CXX and Fortran library")

does not set the HDF5_PLUGIN_PATH environment variable, but it should, so the executable cannot 'find' the filter dynamic library.
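(A minimal sketch of the fix and a verification, assuming the plugin directory that later shows up in the updated hdf5 modulefile; the exact zstd plugin file name varies by build, so just list the directory:)

# point HDF5 at its filter plugins (the module should really do this)
export HDF5_PLUGIN_PATH=/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/hdf5/1.14.0/lib/plugin
# there should be a zstd filter shared library in here
ls "$HDF5_PLUGIN_PATH"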

@Hang-Lei-NOAA

@BrianCurtis-NOAA I double-checked the old installations on Acorn and Cactus.
The only difference between the zstd-enabled netcdf libraries is that, due to an environment difference (not our settings), the old one on Acorn has the curl lib while the Cactus one does not have it.

===========acorn==(Dusan test last year)====================
libnetcdf.la:dependency_libs=' -L/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.9/hdf5/1.14.0/lib -L/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel-19.1.3.304/zlib/1.2.13/lib -L/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.9/pnetcdf/1.12.2/lib -L/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel-19.1.3.304/zstd/1.5.0/lib -lstdc++ /lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.9/hdf5/1.14.0/lib/libhdf5_hl.la /lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.9/hdf5/1.14.0/lib/libhdf5.la -lz -ldl /lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.9/pnetcdf/1.12.2/lib/libpnetcdf.la -lmpifort_intel -lifcoremt_pic -lirc -lm -lbz2 -lzstd -lcurl'
libnetcdf.settings:LDFLAGS: -L/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.9/hdf5/1.14.0/lib -lhdf5_hl -lhdf5 -L/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel-19.1.3.304/zlib/1.2.13/lib -lz -ldl -lm -L/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.9/pnetcdf/1.12.2/lib -lpnetcdf -L/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel-19.1.3.304/zstd/1.5.0/lib
libnetcdf.settings:Extra libraries: -lpnetcdf -lm -lbz2 -lzstd -lcurl
libnetcdf.settings:Standard Filters: deflate bz2 zstd

===========cactus======================
libnetcdf.la:dependency_libs=' -L/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/hdf5/1.14.0/lib -L/apps/spack/zlib/1.2.11/intel/19.1.3.304/hjotqkckeoyt6j6tibalwzrlfljcjtdh/lib -L/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/pnetcdf/1.12.2/lib -L/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel/19.1.3.304/zstd/1.5.0/lib -lstdc++ /lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/hdf5/1.14.0/lib/libhdf5_hl.la /lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/hdf5/1.14.0/lib/libhdf5.la -lz -ldl /lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/pnetcdf/1.12.2/lib/libpnetcdf.la -lmpifort_intel -lifcoremt_pic -lirc -lm -lbz2 -lzstd'
libnetcdf.settings:LDFLAGS: -L/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/hdf5/1.14.0/lib -lhdf5_hl -lhdf5 -L/apps/spack/zlib/1.2.11/intel/19.1.3.304/hjotqkckeoyt6j6tibalwzrlfljcjtdh/lib -lz -ldl -lm -L/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/pnetcdf/1.12.2/lib -lpnetcdf -L/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel/19.1.3.304/zstd/1.5.0/lib
libnetcdf.settings:Extra libraries: -lpnetcdf -lm -lbz2 -lzstd
libnetcdf.settings:Standard Filters: deflate bz2 zstd

======================================

@DusanJovic-NOAA
Collaborator

Curl has nothing to do with zstd. You need to export HDF5_PLUGIN_PATH to point to the location of the filter dynamic libraries.

@DusanJovic-NOAA
Collaborator

Our libraries should not depend on or use curl at all, but that's a separate issue.

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Sep 27, 2024

@DusanJovic-NOAA and @BrianCurtis-NOAA I have added "HDF5_PLUGIN_PATH" into the hdf5 modulefile:

hang.lei@clogin06:/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/modulefiles/mpi/intel/19.1.3.304/cray-mpich/8.1.12> module show hdf5/1.14.0

/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/modulefiles/mpi/intel/19.1.3.304/cray-mpich/8.1.12/hdf5/1.14.0.lua:

help([[]])
conflict("hdf5")
prepend_path("PATH","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/hdf5/1.14.0/bin")
prepend_path("LD_LIBRARY_PATH","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/hdf5/1.14.0/lib")
prepend_path("DYLD_LIBRARY_PATH","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/hdf5/1.14.0/lib")
prepend_path("CPATH","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/hdf5/1.14.0/include")
prepend_path("MANPATH","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/hdf5/1.14.0/share/man")
prepend_path("HDF5_PLUGIN_PATH","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/hdf5/1.14.0/lib/plugin")
setenv("HDF5_ROOT","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/hdf5/1.14.0")
setenv("HDF5_INCLUDES","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/hdf5/1.14.0/include")
setenv("HDF5_LIBRARIES","/lfs/h2/emc/eib/save/hang.lei/forgdit/nco_wcoss2/install2/intel-19.1.3.304/cray-mpich-8.1.12/hdf5/1.14.0/lib")
setenv("HDF5_VERSION","1.14.0")
whatis("Name: hdf5")
whatis("Version: 1.14.0")
whatis("Category: library")
whatis("Description: HDF5 library")

@BrianCurtis-NOAA
Collaborator Author

OK, I'll give it a whirl.

@BrianCurtis-NOAA
Collaborator Author

All tests ran to completion, but not all tests are getting zstandard_level into model_configure, so I imagine there's still some work on the different model_configure templates to make sure it gets added where needed.
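(A quick way to see which run directories are still missing the setting, run from the regression-test run directory; grep -L lists files that contain no match:)

# list model_configure files that do not mention zstandard_level at all
grep -L zstandard_level */model_configure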

@BrianCurtis-NOAA
Collaborator Author

After modifying the model_configures to add zstandard_level and running the full suite, all tests were able to run to completion with the following list of tests failing in comparison:

cpld_control_p8_mixedmode intel
cpld_control_p8 intel
cpld_control_p8.v2.sfc intel
cpld_restart_p8 intel
cpld_control_qr_p8 intel
cpld_restart_qr_p8 intel
cpld_2threads_p8 intel
cpld_decomp_p8 intel
cpld_mpi_p8 intel
cpld_control_ciceC_p8 intel
cpld_bmark_p8 intel
cpld_restart_bmark_p8 intel
cpld_s2sa_p8 intel
cpld_control_p8_faster intel
control_wrtGauss_netcdf_parallel intel
control_p8 intel
control_p8.v2.sfc intel
control_restart_p8 intel
regional_netcdf_parallel intel
rrfs_v1beta intel
control_wam intel
control_wrtGauss_netcdf_parallel_debug intel
control_debug_p8 intel
control_wam_debug intel
conus13km_control intel
conus13km_2threads intel
conus13km_restart_mismatch intel
hafs_regional_atm intel
hafs_regional_atm_ocn intel
hafs_regional_atm_wav intel
hafs_regional_atm_ocn_wav intel
hafs_regional_1nest_atm intel
hafs_regional_telescopic_2nests_atm intel
hafs_global_1nest_atm intel
hafs_global_multiple_4nests_atm intel
hafs_regional_specified_moving_1nest_atm intel
hafs_regional_storm_following_1nest_atm intel
hafs_regional_storm_following_1nest_atm_ocn intel
hafs_global_storm_following_1nest_atm intel
hafs_regional_storm_following_1nest_atm_ocn_debug intel
hafs_regional_storm_following_1nest_atm_ocn_wav intel
hafs_regional_storm_following_1nest_atm_ocn_wav_inline intel
hafs_regional_storm_following_1nest_atm_ocn_wav_mom6 intel
hafs_regional_docn intel
hafs_regional_docn_oisst intel
atmaero_control_p8 intel
atmaero_control_p8_rad intel
atmaero_control_p8_rad_micro intel

All of the "UNABLE TO START TEST" results were explainable by their parent tests failing comparison.

On to running the full suite on Hera to get the official test_changes.list.

@BrianCurtis-NOAA
Collaborator Author

@Hang-Lei-NOAA Are these in the official locations already? If not, let me know and I'll re-run a full suite to confirm we still see the anticipated changes.

@Hang-Lei-NOAA

@BrianCurtis-NOAA They are not; they are still there as before. We have not heard a confirmation from the UFS team on whether the differences are acceptable.

@BrianCurtis-NOAA
Collaborator Author

OK, thanks! My re-test for WCOSS2 is almost finished.

@BrianCurtis-NOAA
Collaborator Author

@Hang-Lei-NOAA go ahead and tell NCO to install officially. Please let me know if the official location is any different from normal.

@BrianCurtis-NOAA
Collaborator Author

Please hold off if you can. I'm running a couple more things; there may be some issue.

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Oct 16, 2024 via email

Successfully merging this pull request may close these issues.

Get zstd compression in netcdf on wcoss2 operation