How do you avoid default fill value of NaN in xarray? #35

Open
emontgomery-usgs opened this issue Oct 15, 2018 · 4 comments

Comments

@emontgomery-usgs
Contributor

emontgomery-usgs commented Oct 15, 2018

@dnowacki-usgs, @mmartini-usgs
One approach to using Python for our code in the future is to use xarray for everything (so time is CF) and then convert back to EPIC at the end. Dan, I think you've already written most of this, but I'm stumped on some of the details.

I'm testing with a file MM wrote with xarray. In that file, _FillValue is NaN for all variables, which doesn't match our convention. In the files I've reviewed that were generated with your code, _FillValue is correct. Do you avoid the wrong value from the start with some xarray.Dataset argument, or did you write a replace_nan_fillvalue function that I haven't found?

In utils I found ds_add_attributes(), which contains this:
def add_attributes(var, dsattrs):
    var.attrs.update({
        'serial_number': dsattrs['serial_number'],
        'initial_instrument_height': dsattrs['initial_instrument_height'],
        'nominal_instrument_depth': dsattrs['nominal_instrument_depth'],
        'height_depth_units': 'm',
        'sensor_type': dsattrs['INST_TYPE'],
        '_FillValue': 1e35})

Is that how you deal with it? What about variables that are defined as short? Is it smart enough to cast the 1e35 to float or double, depending on how the variable is declared?

Thanks!

@mmartini-usgs
Contributor

I am currently tracking down why NaNs reappear in the data once I access it with xarray; I'm not sure yet whether it's operator error.

@dnowacki-usgs
Member

Hi Ellyn, yes, I set _FillValue to 1e35 on every file before writing to netCDF. When xarray reads a netCDF file, it interprets the _FillValue attribute and replaces those values with NaN for use in Python. You need to set the _FillValue encoding on every variable before writing back out to netCDF.
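A minimal sketch of that round trip, assuming xarray and NumPy are installed. To avoid needing a netCDF backend, it builds an undecoded dataset in memory and applies `xr.decode_cf`, the same CF decoding that `open_dataset` performs on a file. The variable name `T_20` and its values are made up for illustration. After decoding, the 1e35 sentinel becomes NaN and the fill value moves from `.attrs` into `.encoding`, which is where it has to be set before calling `to_netcdf`.

```python
import numpy as np
import xarray as xr

# An undecoded dataset as it might look on disk: 1e35 marks missing data.
# "T_20" and its values are hypothetical examples.
raw = xr.Dataset(
    {"T_20": ("time", np.array([10.5, 1e35, 11.2]), {"_FillValue": 1e35})},
    coords={"time": np.arange(3)},
)

# Apply CF decoding in memory, as open_dataset does when reading a file.
decoded = xr.decode_cf(raw)

# The 1e35 sentinel in the middle is now NaN for use in Python.
print(decoded["T_20"].values)

# The original fill value was moved out of .attrs and into .encoding.
print(decoded["T_20"].encoding["_FillValue"])
print("_FillValue" in decoded["T_20"].attrs)

# Before writing back out, set the fill value explicitly in the encoding,
# e.g. decoded.to_netcdf("out.nc") afterwards (hypothetical output path).
decoded["T_20"].encoding["_FillValue"] = 1e35
```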

@mmartini-usgs
Contributor

It looks like we need a helper function similar to "ds_coord_no_fillvalue" that instead makes sure fill values are correct for all variables. It will need to check the original data file's fill value and data type, because integer variables are obviously not set to 1e35, and the data I'm dealing with can contain short and long ints.
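One way such a dtype-aware helper might look, as a sketch only. It relies on the fact that a decoded variable's `.encoding` keeps the original `_FillValue` and on-disk `dtype`. The helper name `choose_fill_value`, the variable names `depth_bin` and `u`, and the integer fallback (the dtype's minimum value) are all hypothetical choices for illustration, not an EPIC convention.

```python
import numpy as np
import xarray as xr

# Float fill matching the 1e35 used elsewhere in this thread.
FLOAT_FILL = 1e35


def choose_fill_value(var):
    """Pick a _FillValue for one variable (hypothetical helper).

    Prefers the fill value recorded from the original file; otherwise
    falls back on the on-disk dtype kept in var.encoding. The integer
    fallback (dtype minimum) is an assumption, not an EPIC rule.
    """
    if "_FillValue" in var.encoding:
        return var.encoding["_FillValue"]
    dtype = np.dtype(var.encoding.get("dtype", var.dtype))
    if np.issubdtype(dtype, np.integer):
        return dtype.type(np.iinfo(dtype).min)
    if np.issubdtype(dtype, np.floating):
        return dtype.type(FLOAT_FILL)
    return None  # e.g. strings: leave the default to xarray


# Example: a short-int variable with an explicit fill, and a float
# variable with none (both hypothetical).
raw = xr.Dataset({
    "depth_bin": ("z", np.array([1, 2, -32768], dtype="int16"),
                  {"_FillValue": np.int16(-32768)}),
    "u": ("z", np.array([0.1, 0.2, 0.3], dtype="float32")),
})
decoded = xr.decode_cf(raw)
fills = {name: choose_fill_value(v) for name, v in decoded.data_vars.items()}
```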

@mmartini-usgs
Contributor

This little combo just worked. Shall I add it to utils and open a pull request?

# save the fill values we will need
# ds = an xarray dataset
# returns var_fills = a dict of {varname: fill value} for every variable
# (coordinate variables included) whose encoding has a _FillValue
def get_fill_values(ds):
    var_fills = {}
    for var_name, var in ds.variables.items():
        # this method positively sets coordinate variables to False
        #try:
        #    var_fills[var_name] = var.encoding['_FillValue']
        #except KeyError:
        #    # coordinate variables should not have fill values
        #    var_fills[var_name] = False

        # this method is succinct, and depends on coordinate variables'
        # _FillValue remaining untouched by xarray
        if '_FillValue' in var.encoding:
            var_fills[var_name] = var.encoding['_FillValue']

    return var_fills

# restore original fill values
# ds = an xarray dataset
# var_fills = dict of {varname: fill value}, as returned by get_fill_values
def apply_fill_values(ds, var_fills):
    for varname, fill in var_fills.items():
        if varname in ds:
            ds[varname].encoding['_FillValue'] = fill

    return ds
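An alternative to mutating each variable's `.encoding`, sketched here under the assumption that the saved fills come from something like `get_fill_values` above: build the `encoding=` argument for `Dataset.to_netcdf` in one place. By default xarray writes a `_FillValue` of NaN for floating-point variables, and setting `_FillValue` to `None` in the encoding suppresses it, which is one way to keep fill values off coordinate variables. The helper name `build_encoding` and the variable `T_20` are hypothetical.

```python
import numpy as np
import xarray as xr


def build_encoding(ds, var_fills):
    """Build a to_netcdf(encoding=...) dict (hypothetical helper).

    Coordinate variables get _FillValue None so xarray writes no fill
    value for them; other variables get their saved fill value, if any.
    """
    encoding = {}
    for name in ds.variables:
        if name in ds.coords:
            encoding[name] = {"_FillValue": None}
        elif name in var_fills:
            encoding[name] = {"_FillValue": var_fills[name]}
    return encoding


ds = xr.Dataset(
    {"T_20": ("time", np.array([10.5, np.nan]))},
    coords={"time": [0, 1]},
)
enc = build_encoding(ds, {"T_20": 1e35})
# ds.to_netcdf("out.nc", encoding=enc)  # hypothetical output path
```

Keeping the fills in one dict passed at write time makes it easy to inspect exactly what will be written, at the cost of having to carry that dict alongside the dataset.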
