.. _tutorial-files:

netCDF4 Files
-------------

.. _tutorial-files-write:

Writing Files
^^^^^^^^^^^^^

pysat includes support for creating netCDF4 files suitable for public
scientific distribution. Both the data and metadata attached to a
:py:class:`pysat.Instrument` object are used to create a file that both
humans and machines may understand and parse without any outside information.
This process is built with a variety of options to help meet the range of
needs of the scientific community.

For many users a netCDF4 file suitable for distribution to research
colleagues may be created using default parameters as shown below.

.. code:: python

   import datetime as dt
   import pysat

   # Instantiate Instrument object
   inst = pysat.Instrument('pysat', 'testing')
   stime = dt.datetime(2009, 1, 1)

   # Load data into Instrument
   inst.load(date=stime)

   # Create netCDF4 file
   fname = stime.strftime('example/file/path/name/test_%Y%j.nc')
   pysat.utils.io.inst_to_netcdf(inst, fname)

This process writes all of the data within ``inst.data`` to a netCDF4 file,
including the metadata stored at ``inst.meta.data`` and ``inst.meta.header``.
It also adds a variety of supplemental attributes to the file indicating the
file's conventions, creation date, and more.

pysat's default conventions are a simplified implementation of the standards
developed as part of NASA's Ionospheric Connections (ICON) Explorer Mission.
ICON's standards were generated by creating the most compatible combination
of parameters from other existing standards and software implementations
within the community. The primary underlying standard comes from the Space
Physics Data Facility (SPDF) International Solar-Terrestrial
Physics/Inter-Agency Consultative Group (ISTP/IACG) standard. This standard
formally applies to NASA Common Data Format (CDF) files officially
distributed to the public via government systems. The standard has been
modified, as noted above, and accommodates and includes basic netCDF4
standards.
While the overlap between standards results in some duplicated information,
pysat's default user-facing configuration minimizes this duplication.

A table of attributes written to every netCDF file is shown below. Any
:py:class:`pysat.Instrument` attributes added by a user are also written to
the file.

================  ==================================
File Attribute    Description
================  ==================================
acknowledgements  Acknowledgements from Instrument
Conventions       File metadata convention name
Date_End          Timestamp of last data entry
Date_Start        Timestamp of first data entry
File              Original filepath and filename
File_Date         Timestamp of last data entry
Generation_Date   YearMonthDay of file creation
inst_id           pysat.Instrument ``inst_id``
Logical_File_ID   Filename without any path or type
name              pysat.Instrument ``name``
pysat_version     pysat version information
platform          pysat.Instrument ``platform``
references        Journal references from Instrument
tag               pysat.Instrument ``tag``
Text_Supplement   Supplement string
================  ==================================

Metadata is also provided for each variable. An example of the default
metadata stored within a variable, as directly loaded using
:py:mod:`netCDF4`, is included below for the variable ``longitude``. Note
that pysat added the Format, Depend_0, Display_Type, and Var_Type metadata
parameters, which are part of the SPDF standard.

.. code::

   float64 longitude(Epoch)
       units: degrees
       long_name: Longitude
       notes:
       desc:
       value_min: 0.0
       value_max: 360.0
       _FillValue : nan
       FillVal : nan
       fill: nan
       Format: f8
       Var_Type: data
       Depend_0: Epoch
       Display_Type: Time Series
   unlimited dimensions: Epoch
   current shape = (86400,)
   filling on, default _FillValue of 9.969209968386869e+36 used

An example of the output produced when loading a pysat-produced file
directly via netCDF4 is included below.
Note that the pysat-produced file attributes are present along with some
user-defined values, such as references and acknowledgements, that are
attached to the pysat testing Instrument object. Further, for basic netCDF
standards, as well as community compatibility, the fill metadata information
is automatically replicated as fill, _FillValue, and FillVal.

.. code::

   netCDF4.Dataset(fname)

   root group (NETCDF4 data model, file format HDF5):
       acknowledgements: Test instruments provided through the pysat project. https://www.github.com/pysat/pysat
       new_thing: 1
       references: Stoneback, Russell, et al. (2021). pysat/pysat v3.0 (Version v3.0). Zenodo. http://doi.org/10.5281/zenodo.1199703
       test_clean_kwarg:
       test_init_kwarg:
       test_preprocess_kwarg:
       pysat_version: 3.0.1
       Conventions: pysat-simplified SPDF ISTP/IACG for NetCDF
       Text_Supplement:
       Date_End: Thu, 01 Jan 2009, 2009-01-01T23:59:59.000 UTC
       Date_Start: Thu, 01 Jan 2009, 2009-01-01T00:00:00.000 UTC
       File: ['.', 'test.nc']
       File_Date: Thu, 01 Jan 2009, 2009-01-01T23:59:59.000 UTC
       Generation_Date: 20211022
       Logical_File_ID: test
       dimensions(sizes): Epoch(86400)
       variables(dimensions): int64 Epoch(Epoch), float64 uts(Epoch),
           float64 mlt(Epoch), float64 slt(Epoch), float64 longitude(Epoch),
           float64 latitude(Epoch), float64 altitude(Epoch),
           int64 orbit_num(Epoch), int64 dummy1(Epoch), int64 dummy2(Epoch),
           float64 dummy3(Epoch), float64 dummy4(Epoch), string_dummy(Epoch),
           unicode_dummy(Epoch), int8 int8_dummy(Epoch),
           int16 int16_dummy(Epoch), int32 int32_dummy(Epoch),
           int64 int64_dummy(Epoch)
       groups:

When writing files, pysat processes metadata for both xarray and pandas
before writing the file. For xarray, pysat leverages xarray's built-in file
writing capabilities. For pandas, pysat interfaces with netCDF4 directly to
translate data into netCDF4.

.. _tutorial-files-meta:

Translating Metadata
^^^^^^^^^^^^^^^^^^^^

Compatible file formats, such as those used by ICON, may achieve that
compatibility by simultaneously adopting multiple standards. As different
file standards may attempt to cover the same functionality, this can result
in duplicated information.

To minimize the impact of working with duplicated metadata, pysat includes
support for automatically translating the metadata labels used at the
Instrument level into one or more different labels used when writing the
file. Thus, simple metadata labels may be maintained throughout a user's
code, but when writing files the metadata labels will be expanded to maintain
standards compatibility.

Consider the following example. The current metadata labels used by an
Instrument are accessed programmatically and used to define the keys of a
meta label translation table. Thus, regardless of the label setting at
runtime, the current metadata keys will be assigned appropriately. The
targets for the metadata labels at the file level are defined as the values
for each key in the dictionary. Fill metadata values,
``inst.meta.labels.fill_val``, will be written to the file as both
'_FillValue' and 'FillVal'. Similarly, the maximum and minimum supported
variable values, ``inst.meta.labels.max_val`` and
``inst.meta.labels.min_val``, will be written to the file as 'ValidMax' and
'Valid_Max', and as 'ValidMin' and 'Valid_Min', respectively. If no
translation table is provided then pysat will use a default translation that
maps ``inst.meta.labels.fill_val`` to '_FillValue', 'FillVal', and 'fill'.

.. code:: python

   # Define translation between metadata labels currently in use by
   # the Instrument object (inst.meta.labels.*) and those that will
   # be used when writing the netCDF file.
   meta_translation_table = {
       inst.meta.labels.fill_val: ['_FillValue', 'FillVal'],
       inst.meta.labels.desc: ['CatDesc'],
       inst.meta.labels.name: ['Long_Name'],
       inst.meta.labels.units: ['Units'],
       inst.meta.labels.max_val: ['ValidMax', 'Valid_Max'],
       inst.meta.labels.min_val: ['ValidMin', 'Valid_Min'],
       inst.meta.labels.notes: ['Var_Notes']}

   # Write netCDF file
   pysat.utils.io.inst_to_netcdf(inst, fname,
                                 meta_translation=meta_translation_table)

As noted above, pysat will add some metadata for variables as part of pysat's
file standard. To further ensure compatibility with netCDF formats, boolean
values are translated to integers (1/0 for True/False), and fill and range
metadata for string variables is removed.

The ``export_nan`` keyword in :py:func:`pysat.utils.io.inst_to_netcdf`
controls which of the metadata labels are allowed to transfer values of NaN
to the file. By default, the ``fill_val``, ``min_val``, and ``max_val``
labels support NaN values. Similarly, the ``check_type`` keyword accepts a
list of metadata labels where the type of the metadata value is compared
against the data type of the variable. By default, the ``fill_val``,
``min_val``, and ``max_val`` labels are checked.

Custom metadata labels, in addition to :py:mod:`pysat`'s defaults, can be
written to the file by adding the information to a
:py:class:`pysat.Instrument`. The simplest method is shown below. The case of
the label is retained when writing to the file.

.. code:: python

   # Add additional metadata to cover the default plot label, as used by ICON.
   # Default values of '' for 'FieldNam' are added for all remaining
   # variables. Remaining metadata labels for 'longitude' other than
   # 'FieldNam' are left unchanged.
   inst.meta['longitude'] = {'FieldNam': 'Geographic Longitude'}

   # Create netCDF4 file
   fname = stime.strftime('example/file/path/name/test_%Y%j.nc')
   pysat.utils.io.inst_to_netcdf(inst, fname)

For the most general method of adding additional metadata, it is recommended
that a :py:class:`pysat.Instrument` is instantiated with the additional
metadata labels, including the type.

.. code:: python

   import numpy as np

   # Define SPDF metadata labels
   labels = {'units': ('units', str), 'name': ('long_name', str),
             'notes': ('notes', str), 'desc': ('desc', str),
             'plot': ('plot_label', str), 'axis': ('axis', str),
             'scale': ('scale', str),
             'min_val': ('value_min', np.float64),
             'max_val': ('value_max', np.float64),
             'fill_val': ('fill', np.float64)}

   # Instantiate instrument
   inst = pysat.Instrument('pysat', 'testing', labels=labels)

   # Load data into the new Instrument so there is something to write
   inst.load(date=stime)

   # Define translation of pysat metadata labels to those in the netCDF file
   meta_translation_table = {
       inst.meta.labels.fill_val: ['_FillValue', 'FillVal'],
       inst.meta.labels.desc: ['CatDesc'],
       inst.meta.labels.name: ['Long_Name'],
       inst.meta.labels.units: ['Units'],
       inst.meta.labels.max_val: ['ValidMax', 'Valid_Max'],
       inst.meta.labels.min_val: ['ValidMin', 'Valid_Min'],
       inst.meta.labels.notes: ['Var_Notes'],
       inst.meta.labels.scale: ['ScaleTyp'],
       inst.meta.labels.plot: ['FieldNam'],
       inst.meta.labels.axis: ['LablAxis']}

   # Write netCDF file
   pysat.utils.io.inst_to_netcdf(inst, fname,
                                 meta_translation=meta_translation_table)

The final opportunity to modify metadata before it is written to a file is
provided by the ``meta_processor`` keyword. This keyword accepts a function
that will receive a dictionary with all metadata, modify it as needed, and
return the modified dictionary. The returned dictionary will then be written
to the netCDF file. The function itself provides an opportunity for
developers to add, modify, or delete metadata in any manner. Note that the
processor function is applied as the last step in the pysat metadata
processing.
Thus, all translations, filtering, and other modifications to metadata are
applied before the ``meta_processor``.

.. code:: python

   def example_processor(meta_dict):
       """Example meta processor function.

       Parameters
       ----------
       meta_dict : dict
           Dictionary with all metadata information, keyed by variable name.

       Returns
       -------
       meta_dict : dict
           Updated metadata information.

       """

       for variable in meta_dict.keys():
           for label in meta_dict[variable].keys():
               # Cast the value to a string, since metadata values may be
               # numeric as well as text
               fstr = ''.join(['Information for variable: ', variable,
                               ' and label: ', label,
                               ' is easily accessible.', ' Value is: ',
                               str(meta_dict[variable][label])])
               print(fstr)

       return meta_dict

   # Write netCDF file
   pysat.utils.io.inst_to_netcdf(inst, fname,
                                 meta_translation=meta_translation_table,
                                 meta_processor=example_processor)

.. _tutorial-files-load:

Loading Files
^^^^^^^^^^^^^

pysat includes support for loading netCDF4 files, particularly those produced
by pysat, directly into compatible pandas and xarray formats. These routines
will load the data and metadata into the appropriate structures. pysat netCDF
files may also be directly loaded into a general
:py:class:`pysat.Instrument`.

Loading functions are provided under :py:mod:`pysat.utils.io` and include a
general, data-independent interface, :py:func:`pysat.utils.io.load_netcdf`,
as well as pandas and xarray specific readers
(:py:func:`pysat.utils.io.load_netcdf_pandas` and
:py:func:`pysat.utils.io.load_netcdf_xarray`). These functions are intended
to be used within a :py:class:`pysat.Instrument` support module, particularly
the :py:meth:`load` function. For example, consider the complete instrument
load function needed (single dataset) when loading a pysat-produced file into
pandas. For more information on adding a new dataset to pysat, see
:ref:`rst_new_inst`.

.. code:: python

   def load(fnames, tag='', inst_id=''):
       """Load the example Instrument pysat produced data files.

       Parameters
       ----------
       fnames : list
           List of filenames
       tag : str
           Instrument tag (accepts '' or a string to change the behaviour of
           certain instrument aspects for testing). (default='')
       inst_id : str
           Instrument ID (accepts ''). (default='')

       Returns
       -------
       data : pds.DataFrame
           Instrument data
       meta : pysat.Meta
           Metadata

       """

       return pysat.utils.io.load_netcdf_pandas(fnames)

Now consider loading the file written in the example shown in Section
:ref:`tutorial-files-write`. Because this :py:class:`pysat.Instrument` module
may support either pandas or xarray data, the expected type must be specified
upon :py:class:`pysat.Instrument` instantiation. pysat also expects all
filenames to have some type of date format. However, by using the
``data_dir`` keyword argument, we can easily load files outside of the
standard pysat data paths.

.. code:: python

   import datetime as dt
   import pysat

   stime = dt.datetime(2009, 1, 1)
   test_inst = pysat.Instrument("pysat", "netcdf", pandas_format=True,
                                data_dir='/example/file/path/name',
                                file_format='test_{year:04}{day:03}.nc')
   test_inst.load(date=stime)

To enable support for a wider variety of netCDF file standards, pysat also
provides support for translating, dropping, and modifying metadata
information after it is loaded from file but before it is input into a
:py:class:`pysat.Meta` instance. We will use the file with SPDF standards as
an example. The general order of metadata operations is: load from file,
remove netCDF4 specific metadata ('Format', 'Var_Type', 'Depend_0'), apply
table translations, apply the meta processor, apply a meta array expander
(pysat does not support array elements within metadata), and finally load the
metadata information into a :py:class:`pysat.Meta` instance.

.. code:: python

   import numpy as np

   # Define metadata labels. The keys are the labels used by pysat,
   # while the values are the labels in the file and type.
   # Only one type is currently supported for each metadata label.
   labels = {'units': ('Units', str), 'name': ('Long_Name', str),
             'notes': ('Var_Notes', str), 'desc': ('CatDesc', str),
             'plot': ('FieldNam', str), 'axis': ('LablAxis', str),
             'scale': ('ScaleTyp', str),
             'min_val': ('Valid_Min', np.float64),
             'max_val': ('Valid_Max', np.float64),
             'fill_val': ('FillVal', np.float64)}

   # Both 'ValidMin' and 'Valid_Min' are in the file with the same
   # content. Only one is needed.
   drop_labels = ['ValidMin', 'ValidMax']

   # Instantiate generic Instrument and pass in modification options
   test_inst = pysat.Instrument("pysat", "netcdf", pandas_format=True,
                                data_dir='/example/file/path/name',
                                file_format='test_{year:04}{day:03}.nc',
                                load_labels=labels,
                                drop_meta_labels=drop_labels)

   # Load data
   test_inst.load(date=stime)

   # Feedback on metadata
   print(list(test_inst.meta.attrs()))

   ['FieldNam', 'LablAxis', 'ScaleTyp', 'units', 'long_name', 'notes',
    'desc', 'value_min', 'value_max', 'fill']

Metadata labels for units, long name, notes, description, value min/max, and
fill were all translated to the default metadata labels of ``test_inst``. The
default metadata labels don't include entries for all SPDF parameters, thus
'FieldNam', 'LablAxis', and 'ScaleTyp' retain the labels used in the file.

While users can apply their own labels when instantiating a
:py:class:`pysat.Instrument`, for non-default metadata labels we recommend
developers apply a translation table to map the labels in the file to a more
user-friendly label.

.. code:: python

   # Define metadata labels. The keys are the labels used by pysat,
   # while the values are the labels from the file and type.
   # The labels are applied last in the loading process.
   # Only one type is currently supported for each metadata label.
   labels = {'units': ('Units', str), 'name': ('Long_Name', str),
             'notes': ('Var_Notes', str), 'desc': ('CatDesc', str),
             'plot': ('plot', str), 'axis': ('axis', str),
             'scale': ('scale', str),
             'min_val': ('Valid_Min', np.float64),
             'max_val': ('Valid_Max', np.float64),
             'fill_val': ('fill', np.float64)}

   # Generate custom meta translation table. When left unspecified the
   # default table handles the multiple values for fill. We must recreate
   # that functionality in our table. The targets for meta_translation
   # should map to values in `labels` above.
   meta_translation = {'FieldNam': 'plot', 'LablAxis': 'axis',
                       'ScaleTyp': 'scale', 'ValidMin': 'Valid_Min',
                       'Valid_Min': 'Valid_Min', 'ValidMax': 'Valid_Max',
                       'Valid_Max': 'Valid_Max', '_FillValue': 'fill',
                       'FillVal': 'fill'}

   # Instantiate generic Instrument and pass in modification options
   test_inst = pysat.Instrument("pysat", "netcdf", pandas_format=True,
                                data_dir='/example/file/path/name',
                                file_format='test_{year:04}{day:03}.nc',
                                load_labels=labels,
                                meta_translation=meta_translation)

   # Load data
   test_inst.load(date=stime)

   # Feedback on metadata
   print(list(test_inst.meta.attrs()))

   ['fill', 'plot', 'axis', 'scale', 'units', 'long_name', 'notes', 'desc',
    'value_min', 'value_max']

Note that ``drop_labels`` is no longer used. Instead, multiple metadata
labels in the file are mapped to a single label using the
``meta_translation`` keyword. If there is an inconsistency in values during
this process, a warning is issued.

The example below demonstrates how users can control the labels used to
access metadata.

.. code:: python

   # Define the labels used to access metadata after loading. The keys are
   # the labels used by pysat, while the values are the user-chosen labels
   # and type. Only one type is currently supported for each metadata label.
   local_labels = {'units': ('UNITS', str), 'name': ('LongEST_Name', str),
                   'notes': ('FLY', str), 'desc': ('DIGits', str),
                   'plot': ('plottER', str), 'axis': ('axisER', str),
                   'scale': ('scalER', str),
                   'min_val': ('INVALIDmin', np.float64),
                   'max_val': ('invalidMAX', np.float64),
                   'fill_val': ('fillerest', np.float64)}

   # Instantiate generic Instrument and pass in modification options
   test_inst = pysat.Instrument("pysat", "netcdf", pandas_format=True,
                                data_dir='/example/file/path/name',
                                file_format='test_{year:04}{day:03}.nc',
                                load_labels=labels, labels=local_labels,
                                meta_translation=meta_translation)

   # Load data
   test_inst.load(date=stime)

   # Feedback on metadata
   print(list(test_inst.meta.attrs()))

   ['UNITS', 'LongEST_Name', 'FLY', 'DIGits', 'INVALIDmin', 'invalidMAX',
    'fillerest', 'plottER', 'axisER', 'scalER']

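
Because a meta processor is an ordinary function that receives and returns a
dictionary, it can be tried on a plain dictionary before it is handed to
pysat. The sketch below is only an illustration: ``drop_empty_labels`` and
``example_meta`` are hypothetical names, and the dictionary layout (keyed by
variable name, with one dictionary of label and value pairs per variable) is
assumed from the ``example_processor`` docstring rather than taken from
pysat's implementation.

```python
def drop_empty_labels(meta_dict):
    """Return a copy of the metadata without empty-string entries.

    Hypothetical meta processor. Assumes `meta_dict` is keyed by variable
    name and that each value is a dict of metadata label/value pairs.
    """
    cleaned = {}
    for variable, labels in meta_dict.items():
        # Keep every entry except those whose value is exactly ''
        cleaned[variable] = {
            label: value for label, value in labels.items()
            if not (isinstance(value, str) and value == '')}
    return cleaned


# Stand-in metadata, loosely modelled on the `longitude` listing in the
# section on writing files. These values are illustrative only.
example_meta = {'longitude': {'units': 'degrees', 'long_name': 'Longitude',
                              'notes': '', 'desc': '',
                              'value_min': 0.0, 'value_max': 360.0}}

processed = drop_empty_labels(example_meta)
print(sorted(processed['longitude']))
# ['long_name', 'units', 'value_max', 'value_min']
```

A function like this could then be supplied through the ``meta_processor``
keyword described in the section on translating metadata. Whether dropping a
label suits a given file standard remains a decision for the developer.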