API

Instrument

class pysat.Instrument(platform=None, name=None, tag='', inst_id='', clean_level=None, update_files=None, pad=None, orbit_info=None, inst_module=None, data_dir='', directory_format=None, file_format=None, temporary_file_list=False, strict_time_flag=True, ignore_empty_files=False, meta_kwargs=None, custom=None, **kwargs)

Download, load, manage, modify and analyze science data.

Parameters:
  • platform (str or NoneType) – Name of instrument platform. If None and name is also None, creates an Instrument with empty platform and name attributes. (default=None)

  • name (str or NoneType) – Name of instrument. If None and platform is also None, creates an Instrument with empty platform and name attributes. (default=None)

  • tag (str) – Identifies particular subset of instrument data (default=’’)

  • inst_id (str) – Secondary level of identification, such as spacecraft within a constellation platform (default=’’)

  • clean_level (str or NoneType) – Level of data quality. If not provided, will default to the setting in pysat.params[‘clean_level’]. (default=None)

  • update_files (bool or NoneType) – If True, immediately query filesystem for instrument files and store. If False, the local files are presumed to be the same. By default, this setting will be obtained from pysat.params (default=None)

  • pad (pandas.DateOffset, dict, or NoneType) – Length of time to pad the beginning and end of loaded data for time-series processing. Extra data is removed after applying all custom functions. Dictionary, if supplied, is simply passed to pandas DateOffset. (default=None)

  • orbit_info (dict or NoneType) – Orbit information, {‘index’: index, ‘kind’: kind, ‘period’: period}. See pysat.Orbits for more information. (default=None)

  • inst_module (module or NoneType) – Provide instrument module directly, takes precedence over platform/name. (default=None)

  • data_dir (str) – Directory without sub-directory variables that allows one to bypass the directories provided by pysat.params[‘data_dirs’]. Only applied if the directory exists. (default=’’)

  • directory_format (str, function, or NoneType) – Sub-directory naming structure, which is expected to exist or be created within one of the pysat.params[‘data_dirs’] directories. Variables such as platform, name, tag, and inst_id will be filled in as needed using python string formatting, if a string is supplied. The default directory structure, which is used if None is specified, is provided by pysat.params[‘directory_format’] and is typically ‘{platform}/{name}/{tag}/{inst_id}’. If a function is provided, it must take tag and inst_id as arguments and return an appropriate string. (default=None)

  • file_format (str or NoneType) – File naming structure in string format. Variables such as year, month, day, etc. will be filled in as needed using python string formatting. The default file format structure is supplied in the instrument list_files routine. See pysat.utils.files.parse_delimited_filenames and pysat.utils.files.parse_fixed_width_filenames for more information. The value will be None if not specified by the user at instantiation. (default=None)

  • temporary_file_list (bool) – If true, the list of Instrument files will not be written to disk (default=False)

  • strict_time_flag (bool) – If true, pysat will check data to ensure times are unique and monotonically increasing (default=True)

  • ignore_empty_files (bool) – Flag controlling behavior for listing available files. If True, the list of files found will be checked to ensure the filesizes are greater than zero. Empty files are removed from the stored list of files. (default=False)

  • meta_kwargs (dict or NoneType) – Dict to specify custom Meta initialization (default=None)

  • custom (list or NoneType) – Input list containing dicts of inputs for the custom_attach method that may be applied, or None (default=None)

platform
name
tag
inst_id
clean_level
pad
orbit_info
inst_module
data_dir
directory_format
file_format
temporary_file_list
strict_time_flag
bounds

Tuple of datetime objects or filenames indicating bounds for loading data, or a tuple of NoneType objects. Users may provide as a tuple or tuple of lists (useful for bounds with gaps). The attribute is always stored as a tuple of lists for consistency.

Type:

tuple

custom_functions

List of functions to be applied by instrument nano-kernel

Type:

list

custom_args

List of lists containing arguments to be passed to particular custom function

Type:

list

custom_kwargs

List of dictionaries with keywords and values to be passed to a custom function

Type:

list

data

Class object holding the loaded science data

Type:

pandas.DataFrame or xarray.Dataset

date

Date and time for loaded data, None if no data is loaded

Type:

dt.datetime or NoneType

doy

Day of year for loaded data, None if no data is loaded

Type:

int or NoneType

files

Class to hold and interact with the available instrument files

Type:

pysat.Files

kwargs

Keyword arguments passed to the standard Instrument routines

Type:

dict

kwargs_supported

Stores all supported keywords for user edification

Type:

dict

kwargs_reserved

Keyword arguments for reserved method arguments

Type:

dict

load_step

The temporal increment for loading data, defaults to a timestep of one day

Type:

dt.timedelta

meta

Class holding the instrument metadata

Type:

pysat.Meta

meta_kwargs

Dict containing defaults for Meta data

Type:

dict

orbits

Interface to extracting data orbit-by-orbit

Type:

pysat.Orbits

pandas_format

Flag indicating whether data is stored as a pandas.DataFrame (True) or an xarray.Dataset (False)

Type:

bool

today

Date and time for the current day in UT

Type:

dt.datetime

tomorrow

Date and time for tomorrow in UT

Type:

dt.datetime

variables

List of loaded data variables

Type:

list

yesterday

Date and time for yesterday in UT

Type:

dt.datetime

yr

Year for loaded data, None if no data is loaded

Type:

int or NoneType

Raises:

ValueError – If platform and name are a mixture of None and str, an unknown or reserved keyword is used, or if file_format, custom, or pad are improperly formatted

Note

pysat attempts to load the module platform_name.py located in the pysat/instruments directory. This module provides the underlying functionality to download, load, and clean instrument data. Alternatively, the module may be supplied directly using keyword inst_module.

Examples

# 1-second mag field data
vefi = pysat.Instrument(platform='cnofs', name='vefi', tag='dc_b')
start = dt.datetime(2009, 1, 1)
stop = dt.datetime(2009, 1, 2)
vefi.download(start, stop)
vefi.load(date=start)
print(vefi['dB_mer'])
print(vefi.meta['db_mer'])

# 1-second thermal plasma parameters
ivm = pysat.Instrument(platform='cnofs', name='ivm')
ivm.download(start, stop)
ivm.load(2009, 1)
print(ivm['ionVelmeridional'])

# Ionosphere profiles from GPS occultation. Enable binning profile
# data using a constant step-size. Feature provided by the underlying
# COSMIC support code.
cosmic = pysat.Instrument('cosmic', 'gps', 'ionprf', altitude_bin=3)
cosmic.download(start, stop, user=user, password=password)
cosmic.load(date=start)

# Nano-kernel functionality enables instrument objects that are
# 'set and forget'. The functions are always run whenever
# the instrument load routine is called so instrument objects may
# be passed safely to other routines and the data will always
# be processed appropriately.

# Define custom function to modify Instrument in place.
def custom_func(inst, opt_param1=False, opt_param2=False):
    # perform calculations and store in new_data
    inst['new_data'] = new_data
    return

inst = pysat.Instrument('pysat', 'testing')
inst.custom_attach(custom_func, kwargs={'opt_param1': True})

# Custom methods are applied to data when loaded.
inst.load(date=date)

print(inst['new_data'])

# Custom methods may also be attached at instantiation.
# Create a dictionary for each custom method and associated inputs
custom_func_1 = {'function': custom_func,
                 'kwargs': {'opt_param1': True}}
custom_func_2 = {'function': custom_func, 'args': [True, False]}
custom_func_3 = {'function': custom_func, 'at_pos': 0,
                 'kwargs': {'opt_param2': True}}

# Combine all dicts into a list in order of application and execution,
# although this can be modified by specifying 'at_pos'. The actual
# order these functions will run is: 3, 1, 2.
custom = [custom_func_1, custom_func_2, custom_func_3]

# Instantiate `pysat.Instrument`
inst = pysat.Instrument(platform, name, inst_id=inst_id, tag=tag,
                        custom=custom)

Initialize pysat.Instrument object.

property bounds

Boundaries for iterating over instrument object by date or file.

Parameters:
  • start (dt.datetime, str, or NoneType) – Start of iteration, disregarding any time of day information. If None uses first data date. List-like collection also accepted, allowing multiple bound ranges. (default=None)

  • stop (dt.datetime, str, or None) – Stop of iteration, inclusive of the entire day regardless of time of day in the bounds. If None uses last data date. List-like collection also accepted, allowing multiple bound ranges, though it must match start. (default=None)

  • step (str, int, or NoneType) – Step size used when iterating from start to stop. Use a Pandas frequency string (‘3D’, ‘1M’) or an integer (will assume a base frequency equal to the file frequency). If None, defaults to a single unit of file frequency (typically 1 day) (default=None).

  • width (pandas.DateOffset, int, or NoneType) – Data window used when loading data within iteration. If None, defaults to a single file frequency (typically 1 day) (default=None)

Raises:

ValueError – If start and stop don’t have the same type, or if too many input arguments are supplied, or if there are an unequal number of elements in start/stop, or if bounds aren’t in increasing order, or if the input type for start or stop isn’t recognized

Note

Both start and stop must be the same type (date, or filename) or None. Only the year, month, and day are used for date inputs.

Examples

import datetime as dt
import pandas as pds
import pysat

inst = pysat.Instrument(platform=platform,
                        name=name,
                        tag=tag)
start = dt.datetime(2009, 1, 1)
stop = dt.datetime(2009, 1, 31)

# Defaults to stepping by a single day and a data loading window
# of one day/file.
inst.bounds = (start, stop)

# Set bounds by file. Iterates a file at a time.
inst.bounds = ('filename1', 'filename2')

# Create a more complicated season, multiple start and stop dates.
start2 = dt.datetime(2010, 1, 1)
stop2 = dt.datetime(2010, 2, 14)
inst.bounds = ([start, start2], [stop, stop2])

# Iterate via a non-standard step size of two days
inst.bounds = ([start, start2], [stop, stop2], '2D')

# Load more than a single day/file at a time when iterating
inst.bounds = ([start, start2], [stop, stop2], '2D',
               dt.timedelta(days=3))
concat_data(new_data, prepend=False, include=None, **kwargs)

Concatenate data to self.data for xarray or pandas as needed.

Parameters:
  • new_data (pandas.DataFrame, xarray.Dataset, or list of such objects) – New data objects to be concatenated

  • prepend (bool) – If True, assign new data before existing data; if False append new data (default=False)

  • include (int or NoneType) – Index at which self.data should be included in new_data or None to use prepend (default=None)

  • **kwargs (dict) – Optional keyword arguments passed to pds.concat or xr.concat

Note

For pandas, sort=False is passed along to the underlying pandas.concat method. If sort is supplied as a keyword, the user provided value is used instead. Recall that sort orders the data columns, not the data values or the index.

For xarray, dim=Instrument.index.name is passed along to xarray.concat except if the user includes a value for dim as a keyword argument.

Examples

# Concatenate data before and after the existing Instrument data
inst.concat_data([prev_data, next_data], include=1)
copy()

Create a deep copy of the entire Instrument object.

Return type:

pysat.Instrument

custom_apply_all()

Apply all of the custom functions to the satellite data object.

Raises:

ValueError – Raised when function returns any value

Note

This method does not generally need to be invoked directly by users.

custom_attach(function, at_pos='end', args=None, kwargs=None)

Attach a function to custom processing queue.

Custom functions are applied automatically whenever the .load() command is called.

Parameters:
  • function (str or function object) – Name of function or function object to be added to queue

  • at_pos (str or int) – Accepts string ‘end’ or a number that will be used to determine the insertion order if multiple custom functions are attached to an Instrument object (default=’end’)

  • args (list, tuple, or NoneType) – Ordered arguments following the instrument object input that are required by the custom function (default=None)

  • kwargs (dict or NoneType) – Dictionary of keyword arguments required by the custom function (default=None)

Note

Functions applied using custom_attach may add, modify, or use the data within Instrument inside of the function, and so should not return anything.
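
Example

A minimal sketch of attaching a custom function that takes both positional and keyword arguments. The shift_data function, its offset keyword, and the assumption that the 'pysat', 'testing' test instrument provides an 'mlt' variable are all illustrative.

import pysat

# Hypothetical custom function that creates a shifted copy of a variable.
def shift_data(inst, variable, offset=0.0):
    # Add `offset` to `variable` and store the result as a new data product.
    inst[variable + '_shifted'] = inst[variable] + offset
    return

inst = pysat.Instrument('pysat', 'testing')

# Queue the function; it runs every time data is loaded.
inst.custom_attach(shift_data, args=['mlt'], kwargs={'offset': 1.5})
inst.load(2009, 1)
print(inst['mlt_shifted'])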

custom_clear()

Clear the custom function list.

property date

Date for loaded data.

download(start=None, stop=None, date_array=None, **kwargs)

Download data for given Instrument object from start to stop.

Parameters:
  • start (pandas.datetime or NoneType) – Start date to download data, or yesterday if None is provided. (default=None)

  • stop (pandas.datetime or NoneType) – Stop date (inclusive) to download data, or tomorrow if None is provided (default=None)

  • date_array (list-like or NoneType) – Sequence of dates to download data for. Takes precedence over start and stop inputs (default=None)

  • **kwargs (dict) – Dictionary of keywords that may be options for specific instruments. The keyword arguments ‘user’ and ‘password’ are expected for remote databases requiring sign in or registration. The keyword ‘freq’ is temporarily ingested through this input option.

Raises:

ValueError – Raised if there is an issue creating self.files.data_path

Note

Data will be downloaded to self.files.data_path

If Instrument bounds are set to defaults they are updated after files are downloaded.

See also

pandas.DatetimeIndex
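
Example

A short usage sketch based on the parameters above; the 'pysat', 'testing' test instrument accepts the call without credentials, while the commented line shows how a remote database requiring registration might be called (the user and password values are placeholders).

import datetime as dt
import pysat

inst = pysat.Instrument('pysat', 'testing')

# Download a short, inclusive date range.
start = dt.datetime(2009, 1, 1)
stop = dt.datetime(2009, 1, 3)
inst.download(start, stop)

# For instruments requiring a login:
# inst.download(start, stop, user='my_name', password='my_password')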

download_updated_files(**kwargs)

Download new files after comparing available remote and local files.

Parameters:

**kwargs (dict) – Dictionary of keywords that may be options for specific instruments

Note

Data will be downloaded to self.files.data_path

If Instrument bounds are set to defaults they are updated after files are downloaded.

If no remote file listing method is available, existing local files are assumed to be up-to-date and gaps are assumed to be missing files.

If start, stop, or date_array are provided, only files at/between these times are considered for updating. If no times are provided and a remote listing method is available, all new files will be downloaded. If no remote listing method is available, the current file limits are used as the starting and ending times.

drop(names)

Drop variables from Instrument.

Parameters:

names (str or list-like) – String or list of strings specifying the variable names to drop

Raises:

KeyError – If none of the variable names provided in names are found in the variable list. If only a subset is missing, a logger warning is issued instead.
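
Example

A brief sketch of removing variables after loading; the 'dummy1', 'dummy2', and 'dummy3' variable names are assumed to exist in the 'pysat', 'testing' test instrument.

import pysat

inst = pysat.Instrument('pysat', 'testing')
inst.load(2009, 1)

# Drop a single variable, then several at once.
inst.drop('dummy1')
inst.drop(['dummy2', 'dummy3'])
print(inst.variables)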

property empty

Boolean flag reflecting lack of data, True if there is no data.

property index

Time index of the loaded data.

load(yr=None, doy=None, end_yr=None, end_doy=None, date=None, end_date=None, fname=None, stop_fname=None, verifyPad=False, **kwargs)

Load the instrument data and metadata.

Parameters:
  • yr (int or NoneType) – Year for desired data. pysat will load all files with an associated date between yr, doy and yr, doy + 1. (default=None)

  • doy (int or NoneType) – Day of year for desired data. Must be present with yr input. (default=None)

  • end_yr (int or NoneType) – Used when loading a range of dates, from yr, doy to end_yr, end_doy based upon the dates associated with the Instrument’s files. Date range is inclusive for yr, doy but exclusive for end_yr, end_doy. (default=None)

  • end_doy (int or NoneType) – Used when loading a range of dates, from yr, doy to end_yr, end_doy based upon the dates associated with the Instrument’s files. Date range is inclusive for yr, doy but exclusive for end_yr, end_doy. (default=None)

  • date (dt.datetime or NoneType) – Date to load data. pysat will load all files with an associated date between date and date + 1 day. (default=None)

  • end_date (dt.datetime or NoneType) – Used when loading a range of data from date to end_date based upon the dates associated with the Instrument’s files. Date range is inclusive for date but exclusive for end_date. (default=None)

  • fname (str or NoneType) – Filename to be loaded (default=None)

  • stop_fname (str or NoneType) – Used when loading a range of filenames from fname to stop_fname, inclusive. (default=None)

  • verifyPad (bool) – If True, padding data not removed for debugging. Padding parameters are provided at Instrument instantiation. (default=False)

  • **kwargs (dict) – Dictionary of keywords that may be options for specific instruments.

Raises:
  • TypeError – For incomplete or incorrect input

  • ValueError – For input incompatible with Instrument set-up

Note

Loads data for a chosen instrument into .data. Any functions chosen by the user and added to the custom processing queue (custom_attach) are automatically applied to the data before it is available to the user in .data.

A mixed combination of .load() keywords such as yr and date is not allowed.

end kwargs have exclusive ranges (stop before the condition is reached), while stop kwargs have inclusive ranges (stop once the condition is reached).

Examples

import datetime as dt
import pysat

inst = pysat.Instrument('pysat', 'testing')

# Load a single day by year and day of year
inst.load(2009, 1)

# Load a single day by date
date = dt.datetime(2009, 1, 1)
inst.load(date=date)

# Load a single file, first file in this example
inst.load(fname=inst.files[0])

# Load a range of days, data between
# Jan. 1st (inclusive) - Jan. 3rd (exclusive)
inst.load(2009, 1, 2009, 3)

# Load a range of days using datetimes
date = dt.datetime(2009, 1, 1)
end_date = dt.datetime(2009, 1, 3)
inst.load(date=date, end_date=end_date)

# Load several files by filename. Note the change in index due to
# inclusive slicing on filenames!
inst.load(fname=inst.files[0], stop_fname=inst.files[1])
next(verifyPad=False)

Iterate forward through the data loaded in Instrument object.

Bounds of iteration and iteration type (day/file) are set by bounds attribute.

Parameters:

verifyPad (bool) – Passed to self.load(). If True, then padded data within the load method will be retained. (default=False)

Note

If there were no previous calls to load then the first day (default) or file will be loaded.
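
Example

A sketch of stepping through bounds with next(); it assumes the 'pysat', 'testing' test instrument and the default day-by-day iteration described above.

import datetime as dt
import pysat

inst = pysat.Instrument('pysat', 'testing')
inst.bounds = (dt.datetime(2009, 1, 1), dt.datetime(2009, 1, 5))

# With no prior load, the first call loads the first day within the bounds.
inst.next()
print(inst.date)

# Each additional call steps forward one day/file; StopIteration is raised
# once iteration passes the final bound.
inst.next()
print(inst.date)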

property pandas_format

Boolean flag for pandas data support.

prev(verifyPad=False)

Iterate backwards through the data in Instrument object.

Bounds of iteration and iteration type (day/file) are set by bounds attribute.

Parameters:

verifyPad (bool) – Passed to self.load(). If True, then padded data within the load method will be retained. (default=False)

Note

If there were no previous calls to load then the first day (default) or file will be loaded.

remote_date_range(start=None, stop=None, **kwargs)

Determine first and last available dates for remote data.

Parameters:
  • start (dt.datetime or NoneType) – Starting time for file list. A None value will start with the first file found. (default=None)

  • stop (dt.datetime or NoneType) – Ending time for the file list. A None value will stop with the last file found. (default=None)

  • **kwargs (dict) – Dictionary of keywords that may be options for specific instruments. The keyword arguments ‘user’ and ‘password’ are expected for remote databases requiring sign in or registration.

Returns:

First and last datetimes obtained from remote_file_list

Return type:

List

Note

Default behaviour is to search all files. User may additionally specify a given year, year/month, or year/month/day combination to return a subset of available files.

remote_file_list(start=None, stop=None, **kwargs)

Retrieve a time-series of remote files for chosen instrument.

Parameters:
  • start (dt.datetime or NoneType) – Starting time for file list. A None value will start with the first file found. (default=None)

  • stop (dt.datetime or NoneType) – Ending time for the file list. A None value will stop with the last file found. (default=None)

  • **kwargs (dict) – Dictionary of keywords that may be options for specific instruments. The keyword arguments ‘user’ and ‘password’ are expected for remote databases requiring sign in or registration.

Returns:

pandas Series of filenames indexed by date and time

Return type:

pds.Series

Note

Default behaviour is to return all files. User may additionally specify a given year, year/month, or year/month/day combination to return a subset of available files.
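
Example

A sketch of listing remote files for a limited window and checking the full remote date range; it assumes an instrument support module that implements a remote file listing, so platform, name, and tag here are placeholders.

import datetime as dt
import pysat

# Placeholders for an instrument with remote listing support.
inst = pysat.Instrument(platform=platform, name=name, tag=tag)

# pandas Series of remote filenames, indexed by time, for January 2009.
remote_files = inst.remote_file_list(start=dt.datetime(2009, 1, 1),
                                     stop=dt.datetime(2009, 1, 31))

# First and last datetimes available at the remote source.
first, last = inst.remote_date_range()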

rename(mapper, lowercase_data_labels=False)

Rename variables within both data and metadata.

Parameters:
  • mapper (dict or func) – Dictionary with old names as keys and new names as values, or a function to apply to all names

  • lowercase_data_labels (bool) – If True, the labels applied to self.data are forced to lowercase. The case supplied in mapper is retained within inst.meta. (default=False)

Examples

# Standard renaming using a dict
new_mapper = {'old_name': 'new_name', 'old_name2': 'new_name2'}
inst.rename(new_mapper)

# Standard renaming using a function
inst.rename(str.upper)

pysat supports differing case for variable labels across the data and metadata objects attached to an Instrument. Since Meta is case-preserving (on assignment) but case-insensitive to access, the labels used for data are always valid for metadata. This feature may be used to provide friendlier variable names within pysat while also maintaining external format compatibility when writing files.

# Example with lowercase_data_labels
inst = pysat.Instrument('pysat', 'testing')
inst.load(2009, 1)
mapper = {'uts': 'Pysat_UTS'}
inst.rename(mapper, lowercase_data_labels=True)

# Note that 'Pysat_UTS' was applied to data as 'pysat_uts'
print(inst['pysat_uts'])

# Case is retained within inst.meta, though data access to meta is
# case insensitive
print('True meta variable name is ', inst.meta['pysat_uts'].name)

# Note that the labels in meta may be used when creating a file,
# thus, 'Pysat_UTS' would be found in the resulting file
inst.to_netcdf4('./test.nc', preserve_meta_case=True)

# Load in file and check
raw = netCDF4.Dataset('./test.nc')
print(raw.variables['Pysat_UTS'])
to_netcdf4(fname, base_instrument=None, epoch_name=None, zlib=False, complevel=4, shuffle=True, preserve_meta_case=False, export_nan=None, export_pysat_info=True, unlimited_time=True, modify=False)

Store loaded data into a netCDF4 file.

Parameters:
  • fname (str) – Full path to save instrument object to netCDF

  • base_instrument (pysat.Instrument or NoneType) – Class used as a comparison, only attributes that are present with self and not on base_instrument are written to netCDF. Using None assigns an unmodified pysat.Instrument object. (default=None)

  • epoch_name (str or NoneType) – Label in file for datetime index of Instrument object (default=None)

  • zlib (bool) – Flag for engaging zlib compression (True - compression on) (default=False)

  • complevel (int) – An integer flag between 1 and 9 describing the level of compression desired. Ignored if zlib=False. (default=4)

  • shuffle (bool) – The HDF5 shuffle filter will be applied before compressing the data. This significantly improves compression. Ignored if zlib=False. (default=True)

  • preserve_meta_case (bool) – Flag specifying the case of the meta data variable strings. If True, then the variable strings within the MetaData object (which preserves case) are used to name variables in the written netCDF file. If False, then the variable strings used to access data from the pysat.Instrument object are used instead. (default=False)

  • export_nan (list or NoneType) – By default, the set of metadata labels for which a NaN value may be written to the netCDF4 file is maintained by the Meta object attached to the pysat.Instrument object. A list supplied here will override the settings provided by Meta, and all parameters included will be written to the file. If a label is not listed and its value is NaN, then that attribute simply won’t be included in the netCDF4 file. (default=None)

  • export_pysat_info (bool) – If True, platform, name, tag, and inst_id will be appended to the metadata. (default=True)

  • unlimited_time (bool) – Flag specifying whether or not the epoch/time dimension should be unlimited; it is when the flag is True. (default=True)

  • modify (bool) – Flag specifying whether or not the changes made to the Instrument object needed to prepare it for writing should also be made to this object. If False, the current Instrument object will remain unchanged. (default=False)

Raises:

ValueError – If required kwargs are not given values

See also

pysat.utils.io.to_netcdf
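
Example

A minimal write sketch using the keywords above; the output filename is arbitrary and the compression settings are optional choices.

import pysat

inst = pysat.Instrument('pysat', 'testing')
inst.load(2009, 1)

# Write the loaded data and metadata with zlib compression enabled.
inst.to_netcdf4('./pysat_testing_20090101.nc', zlib=True, complevel=4)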

today()

Get today’s date (UTC), with no hour, minute, second, etc.

Returns:

today_utc – Today’s date in UTC

Return type:

datetime

tomorrow()

Get tomorrow’s date (UTC), with no hour, minute, second, etc.

Returns:

Tomorrow’s date in UTC

Return type:

datetime

property variables

List of variables for the loaded data.

property vars_no_time

List of variables for the loaded data, excluding time index.

yesterday()

Get yesterday’s date (UTC), with no hour, minute, second, etc.

Returns:

Yesterday’s date in UTC

Return type:

datetime

Constellation

class pysat.Constellation(platforms=None, names=None, tags=None, inst_ids=None, const_module=None, instruments=None, index_res=None, common_index=True, custom=None, **kwargs)

Manage and analyze data from multiple pysat Instruments.

Parameters:
  • platforms (list or NoneType) – List of strings indicating the desired Instrument platforms. If None is specified on initiation, a list will be created to hold the platform attributes from each pysat.Instrument object in instruments. (default=None)

  • names (list or NoneType) – List of strings indicating the desired Instrument names. If None is specified on initiation, a list will be created to hold the name attributes from each pysat.Instrument object in instruments. (default=None)

  • tags (list or NoneType) – List of strings indicating the desired Instrument tags. If None is specified on initiation, a list will be created to hold the tag attributes from each pysat.Instrument object in instruments. (default=None)

  • inst_ids (list or NoneType) – List of strings indicating the desired Instrument inst_ids. If None is specified on initiation, a list will be created to hold the inst_id attributes from each pysat.Instrument object in instruments. (default=None)

  • const_module (module or NoneType) – Name of a pysat constellation module (default=None)

  • instruments (list-like or NoneType) – A list of pysat Instruments to include in the Constellation (default=None)

  • index_res (float or NoneType) – Output index resolution in seconds or None to determine from Constellation instruments (default=None)

  • common_index (bool) – True to include times where all instruments have data, False to use the maximum time range from the Constellation (default=True)

  • custom (list or NoneType) – Input list containing dicts of inputs for the custom_attach method that may be applied to all Instruments or at the Constellation level, or None (default=None)

  • **kwargs (dict) – Additional keyword arguments are passed to Instruments instantiated within the class through use of input arguments platforms, names, tags, and inst_ids. Additional keywords are not applied when using the const_module or instruments inputs.

platforms
names
tags
inst_ids
instruments
index_res
common_index
bounds

Tuple of two datetime objects or filenames indicating bounds for loading data, or a tuple of NoneType objects. Users may provide as a tuple or tuple of lists (useful for bounds with gaps). The attribute is always stored as a tuple of lists for consistency.

Type:

tuple

custom_functions

List of functions to be applied at the Constellation-level upon load

Type:

list

custom_args

List of lists containing arguments to be passed to particular Constellation-level custom function

Type:

list

custom_kwargs

List of dictionaries with keywords and values to be passed to a Constellation-level custom function

Type:

list

date

Date and time for loaded data, None if no data is loaded

Type:

dt.datetime or NoneType

yr

Year for loaded data, None if no data is loaded

Type:

int or NoneType

doy

Day of year for loaded data, None if no data is loaded

Type:

int or NoneType

yesterday

Date and time for yesterday in UT

Type:

dt.datetime

today

Date and time for the current day in UT

Type:

dt.datetime

tomorrow

Date and time for tomorrow in UT

Type:

dt.datetime

empty

Flag that indicates all Instruments do not contain data when True.

Type:

bool

empty_partial

Flag that indicates at least one Instrument in the Constellation does not have data when True.

Type:

bool

variables

List of loaded data variables for all instruments.

Type:

list

Raises:
  • ValueError – When instruments is not list-like, when all inputs to load through the registered Instrument list are unknown, or when one of the items assigned is not an Instrument.

  • AttributeError – When module provided through const_module is missing the required attribute instruments.

Note

Omit platforms, names, tags, inst_ids, instruments, and const_module to create an empty constellation.
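
Examples

The class docstring provides no examples, so the sketch below shows two common ways to build a Constellation; it reuses the 'pysat', 'testing' test instrument, while real use would mix different platforms, names, tags, or inst_ids.

import pysat

# Build from existing Instrument objects.
inst1 = pysat.Instrument('pysat', 'testing')
inst2 = pysat.Instrument('pysat', 'testing')
const = pysat.Constellation(instruments=[inst1, inst2])

# Equivalent construction from platform/name lists, matched by position.
const = pysat.Constellation(platforms=['pysat', 'pysat'],
                            names=['testing', 'testing'])

# Loading applies to every Instrument in the Constellation.
const.load(2009, 1)
print(const.variables)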

Initialize the Constellation object.

property bounds

Obtain boundaries for Instruments in Constellation.

When setting, sets for all instruments in Constellation.

Parameters:

value (tuple or NoneType) – Tuple containing starting time and ending time for Instrument bounds attribute or None (default=None)

custom_attach(function, apply_inst=True, at_pos='end', args=None, kwargs=None)

Register a function to modify data of member Instruments.

Parameters:
  • function (str or function object) – Name of function or function object to be added to queue

  • apply_inst (bool) – Apply the custom function to all Instruments if True, or at the Constellation level if False. (default=True)

  • at_pos (str or int) – Accepts string ‘end’ or a number that will be used to determine the insertion order if multiple custom functions are attached to an Instrument object. (default=’end’).

  • args (list, tuple, or NoneType) – Ordered arguments following the instrument object input that are required by the custom function (default=None)

  • kwargs (dict or NoneType) – Dictionary of keyword arguments required by the custom function (default=None)

Note

Functions applied using custom_attach may add, modify, or use the data within any Instrument inside of the function, and so should not return anything.

Constellation-level custom functions are applied after Instrument-level custom functions whenever the load method is called.

Unlike Instrument-level custom functions, Constellation-level custom functions should take a Constellation object as their first input argument.

See also

Instrument.custom_attach

base method for attaching custom functions
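
Example

A sketch of a Constellation-level custom function attached with apply_inst=False, as described above. The add_mean_mlt function, the new 'constellation_mean_mlt' variable, and the assumption that each member Instrument provides an 'mlt' variable are all illustrative.

import numpy as np
import pysat

const = pysat.Constellation(platforms=['pysat', 'pysat'],
                            names=['testing', 'testing'])

# Constellation-level functions receive the Constellation as the first input.
def add_mean_mlt(const):
    # Compute a constellation-wide mean and broadcast it onto each member.
    mean_mlt = np.mean([inst['mlt'].mean() for inst in const.instruments])
    for inst in const.instruments:
        inst['constellation_mean_mlt'] = mean_mlt
    return

const.custom_attach(add_mean_mlt, apply_inst=False)

# The function is applied after Instrument-level custom functions on load.
const.load(2009, 1)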

custom_clear()

Clear the custom function list.

See also

Instrument.custom_clear

base method for clearing custom functions

property date

Date for loaded data.

download(*args, **kwargs)

Download data for all Instruments in the Constellation.

Parameters:
  • *args (list reference) – References a list of input arguments

  • **kwargs (dict reference) – References a dict of input keyword arguments

See also

Instrument.download

base method for downloading Instrument data

Note

If individual instruments require specific kwargs that differ from other instruments, define that in the individual instrument rather than this method.

drop(names)

Drop variables (names) from metadata.

Parameters:

names (str or list-like) – String or list of strings specifying the variable names to drop

Raises:

KeyError – If none of the keys provided in names are found in the standard metadata, labels, or header metadata. If only a subset is missing, a logger warning is issued instead.

property empty

Boolean flag reflecting lack of data.

Note

True only if none of the Constellation Instruments contain data.

property empty_partial

Boolean flag reflecting lack of data.

Note

True if at least one of the Constellation Instruments contains no data.

property index

Obtain time index of loaded data.

load(*args, **kwargs)

Load instrument data into Instrument object.data.

Parameters:
  • *args (list reference) – References a list of input arguments

  • **kwargs (dict reference) – References a dict of input keyword arguments

See also

Instrument.load

base method for loading Instrument data

to_inst(common_coord=True, fill_method=None)

Combine Constellation data into an Instrument.

Parameters:
  • common_coord (bool) – For Constellations with any xarray.Dataset Instruments, True to include locations where all coordinate arrays cover, False to use the maximum location range from the list of coordinates (default=True)

  • fill_method (str or NoneType) – Fill method if common data coordinates do not match exactly. If one of ‘nearest’, ‘pad’/’ffill’, ‘backfill’/’bfill’, or None then no interpolation will occur. If ‘linear’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, or ‘polynomial’ are used, then 1D or ND interpolation will be used. (default=None)

Returns:

inst – A pysat Instrument containing all data from the constellation at a common time index

Return type:

pysat.Instrument

Note

Uses the common index, self.index, that was defined using information from the Constellation Instruments in combination with a potential user-supplied resolution defined through self.index_res.
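
Example

A short sketch of collapsing Constellation data onto the common time index, continuing from a Constellation built with the 'pysat', 'testing' test instrument and the default keyword values above.

import pysat

const = pysat.Constellation(platforms=['pysat', 'pysat'],
                            names=['testing', 'testing'])
const.load(2009, 1)

# Combine all member data at the common Constellation index.
merged = const.to_inst()
print(merged.variables)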

today()

Obtain UTC date for today, see pysat.Instrument for details.

tomorrow()

Obtain UTC date for tomorrow, see pysat.Instrument for details.

property variables

Retrieve list of uniquely named variables from all loaded data.

yesterday()

Obtain UTC date for yesterday, see pysat.Instrument for details.

Files

class pysat.Files(inst, data_dir=None, directory_format=None, update_files=False, file_format=None, write_to_disk=True, ignore_empty_files=False)

Maintain collection of files and associated methods.

Parameters:
  • inst (pysat.Instrument) – Instrument object

  • data_dir (str or NoneType) – Directory without sub-directory variables that allows one to bypass the directories provided by pysat.params[‘data_dirs’]. Only applied if the directory exists. (default=None)

  • directory_format (str or NoneType) – Sub-directory naming structure, which is expected to exist or be created within one of the pysat.params[‘data_dirs’] directories. Variables such as platform, name, tag, and inst_id will be filled in as needed using python string formatting, if a string is supplied. The default directory structure, which is used if None is specified, is provided by pysat.params[‘directory_format’] and is typically ‘{platform}/{name}/{tag}/{inst_id}’. (default=None)

  • update_files (bool) – If True, immediately query filesystem for instrument files and store (default=False)

  • file_format (str or NoneType) – File naming structure in string format. Variables such as year, month, day, etc. will be filled in as needed using python string formatting. The default file format structure is supplied in the instrument list_files routine. See pysat.utils.files.parse_delimited_filenames and pysat.utils.files.parse_fixed_width_filenames for more information. (default=None)

  • write_to_disk (bool) – If true, the list of Instrument files will be written to disk. (default=True)

  • ignore_empty_files (bool) – If True, the list of files found will be checked to ensure the filesizes are greater than zero. Empty files are removed from the stored list of files. (default=False)

directory_format
update_files
file_format
write_to_disk
ignore_empty_files
home_path

Path to the pysat information directory.

Type:

str

data_path

Path to the top-level directory containing instrument files, selected from data_paths.

Type:

str

data_paths

Available paths that pysat will use when looking for files. The class uses the first directory with relevant data, stored in data_path.

Type:

list of str

files

Series of data files, indexed by file start time.

Type:

pds.Series

inst_info

Contains pysat.Instrument parameters ‘platform’, ‘name’, ‘tag’, and ‘inst_id’, identifying the source of the files.

Type:

dict

list_files_creator

Experimental feature for Instruments that internally generate data and thus don’t have a defined supported date range.

Type:

functools.partial or NoneType

list_files_rtn

Method used to locate relevant files on the local system. Provided by associated pysat.Instrument object.

Type:

method

multi_file_day

Flag copied from associated pysat.Instrument object that indicates when data for day n may be found in files for days n-1, or n+1

Type:

bool

start_date

Date of first file, used as default start bound for instrument object, or None if no files are loaded.

Type:

datetime or NoneType

stop_date

Date of last file, used as default stop bound for instrument object, or None if no files are loaded.

Type:

datetime or NoneType

stored_file_name

Name of the hidden file containing the list of archived data files for this instrument.

Type:

str

sub_dir_path

directory_format string formatted for the local system.

Type:

str

Raises:

NameError – If pysat.params[‘data_dirs’] not assigned

Note

Interfaces with the list_files method for a given instrument support module to create an ordered collection of files in time, used primarily by the pysat.Instrument object to identify files to be loaded. The Files class mediates access to the files by datetime and contains helper methods for determining the presence of new files and filtering out empty files.

User should generally use the interface provided by a pysat.Instrument instance. Exceptions are the classmethod from_os, provided to assist in generating the appropriate output for an instrument routine.

Examples

# Instantiate instrument to generate file list
inst = pysat.Instrument(platform=platform, name=name, tag=tag,
                        inst_id=inst_id)
# First file
inst.files[0]

# Files from start up to stop (exclusive on stop)
start = dt.datetime(2009, 1, 1)
stop = dt.datetime(2009, 1, 3)
print(inst.files[start:stop])

# Files for date
print(inst.files[start])

# Files by slicing
print(inst.files[0:4])

# Get a list of new files. New files are those that weren't present
# the last time a given instrument's file list was stored.
new_files = inst.files.get_new()

# Search pysat appropriate directory for instrument files and
# update Files instance.
inst.files.refresh()

Initialize pysat.Files object.

copy()

Provide a deep copy of object.

Returns:

Copy of self

Return type:

Files class instance

classmethod from_os(data_path=None, format_str=None, two_digit_year_break=None, delimiter=None)

Produce a list of files and format it for Files class.

Parameters:
  • data_path (str or NoneType) – Top level directory to search files for. This directory is provided by pysat to the instrument_module.list_files functions as data_path. (default=None)

  • format_str (str or NoneType) – Provides the naming pattern of the instrument files and the locations of date information so an ordered list may be produced. Supports ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘version’, ‘revision’, and ‘cycle’. Ex: ‘cnofs_cindi_ivm_500ms_{year:4d}{month:02d}{day:02d}_v01.cdf’ (default=None)

  • two_digit_year_break (int or NoneType) – If filenames only store two digits for the year, then ‘1900’ will be added for years >= two_digit_year_break and ‘2000’ will be added for years < two_digit_year_break. If None, then four-digit years are assumed. (default=None)

  • delimiter (str or NoneType) – Delimiter string upon which files will be split (e.g., ‘.’). If None, filenames will be parsed presuming a fixed width format. (default=None)

Returns:

A Series of filenames indexed by time. See pysat.utils.files.process_parsed_filenames for details.

Return type:

pds.Series

Raises:

ValueError – If data_path or format_str is None

Note

Requires a fixed-width or delimited filename format

Does not produce a Files instance, but rather the proper output of an instrument_module.list_files method

The ‘?’ may be used to indicate a set number of spaces for a variable part of the name that need not be extracted. ‘cnofs_cindi_ivm_500ms_{year:4d}{month:02d}{day:02d}_v??.cdf’

When parsing using fixed width filenames (delimiter=None), leading ‘*’ wildcards are supported, ‘*{year:4d}{month:02d}{day:02d}_v??.cdf’, though the ‘*’ is not supported after the first template variable. The ‘?’ wildcard may be used anywhere in the template string.

When parsing using a delimiter, the ‘*’ wildcard is supported when leading, trailing, or wholly contained between delimiters, such as ‘data_name-{year:04d}-*-{day:02d}.txt’, or ‘*-{year:04d}-{day:02d}*’, where ‘-’ is the delimiter. There cannot be a mixture of a template variable and ‘*’ without a delimiter in between, unless the ‘*’ occurs after the variable. The ‘?’ wildcard may be used anywhere in the template string.

The ‘day’ format keyword may be used to specify either day of month (if month is included) or day of year.
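
Example

A sketch of how an instrument support module's list_files routine typically wraps from_os; the filename template and keyword defaults shown here are illustrative, and data_path is supplied by pysat at runtime.

import pysat

def list_files(tag='', inst_id='', data_path=None, format_str=None):
    # Illustrative fixed-width template with date and version information.
    if format_str is None:
        format_str = ''.join(['mydata_{year:04d}{month:02d}{day:02d}',
                              '_v{version:02d}.cdf'])

    return pysat.Files.from_os(data_path=data_path, format_str=format_str)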

get_file_array(start, stop)

Return a list of filenames between and including start and stop.

Parameters:
  • start (array-like or str) – Filenames for start of returned filelist

  • stop (array-like or str) – Filenames inclusive of the ending of list provided by the stop time

Returns:

files – A list of filenames between and including start and stop times over all intervals.

Return type:

list

Note

start and stop must be of the same type: both array-like or both strings

get_index(fname)

Return index for a given filename.

Parameters:

fname (str) – Filename for the desired time index

Raises:

ValueError – Filename not in index

Note

If fname not found in the file information already attached to the instrument.files instance, then a files.refresh() call is made.

get_new()

List new files since last recorded file state.

Returns:

A datetime-indexed Series of all new filenames since the last known change to the files.

Return type:

pandas.Series

Note

pysat stores filenames in the user_home/.pysat directory. Filenames are stored if there is a change and either update_files is True at instrument object level or files.refresh() is called.

refresh()

Update list of files, if there are changes.

Note

Calls underlying list_files_rtn for the particular science instrument. Typically, these routines search in the pysat provided path, pysat_data_dir/platform/name/tag/inst_id, where pysat_data_dir is set by pysat.params[‘data_dirs’] = path.

set_top_level_directory(path)

Set top-level data directory.

Sets a valid self.data_path using provided top-level directory path and the associated pysat subdirectories derived from the directory_format attribute as stored in self.sub_dir_path

Parameters:

path (str) – Top-level path to use when looking for files. Must be in pysat.params[‘data_dirs’].

Raises:

ValueError – If path not in pysat.params[‘data_dirs’]

Warning

If there are Instrument files on the system under a top-level directory other than path, then, under certain conditions, self.data_path may be later updated by the object to point back to the directory with files.

Meta

class pysat.Meta(metadata=None, header_data=None, labels={'desc': ('desc', <class 'str'>), 'fill_val': ('fill', (<class 'float'>, <class 'int'>, <class 'str'>)), 'max_val': ('value_max', (<class 'float'>, <class 'int'>)), 'min_val': ('value_min', (<class 'float'>, <class 'int'>)), 'name': ('long_name', <class 'str'>), 'notes': ('notes', <class 'str'>), 'units': ('units', <class 'str'>)}, export_nan=None, data_types=None)

Store metadata for the Instrument and Constellation classes.

Parameters:
  • metadata (pandas.DataFrame) – DataFrame should be indexed by variable name that contains at minimum the standard_name (name), units, and long_name for the data stored in the associated pysat Instrument object.

  • header_data (dict or NoneType) – Global meta data to be assigned to the header attribute. Keys denote the desired attribute names and values the metadata for that attribute. (default=None)

  • labels (dict) – Dict where keys are the label attribute names and the values are tuples that have the label values and value types in that order. (default={‘units’: (‘units’, str), ‘name’: (‘long_name’, str), ‘notes’: (‘notes’, str), ‘desc’: (‘desc’, str), ‘min_val’: (‘value_min’, (float, int)), ‘max_val’: (‘value_max’, (float, int)), ‘fill_val’: (‘fill’, (float, int, str))})

  • export_nan (list or NoneType) – List of labels that should be exported even if their value is NaN or None for an empty list. When used, metadata with a value of NaN will be excluded from export. Will always allow NaN export for labels of the float type. (default=None)

  • data_types (dict or NoneType) – Dict of data types for variables names or None to determine after loading the data. (default=None)

data

Index is the variable standard name; ‘units’, ‘long_name’, and other default labels are stored along with any additional user-provided labels.

Type:

pandas.DataFrame

labels

Labels for MetaData attributes

Type:

MetaLabels

mutable

If True, attributes directly attached to Meta are modifiable

Type:

bool

header

Class containing global metadata

Type:

MetaHeader

Note

Meta object preserves the case of variables and attributes as it first receives the data. Subsequent calls to set new metadata with the same variable or attribute will use the case of the first call. Accessing or setting data thereafter is case insensitive. In practice, use is case insensitive but the original case is preserved. Case preservation is built in to support writing files with a desired case to meet standards.

Supports any custom metadata values in addition to the expected metadata attributes (units, name, notes, desc, value_min, value_max, and fill). These base attributes may be used to programmatically access and set types of metadata regardless of the string values used for the attribute. String values for attributes may need to be changed depending upon the standards of code or files interacting with pysat.

Meta objects returned as part of pysat loading routines are automatically updated to use the same values of units, etc. as found in the pysat.Instrument object.

Meta objects have a structure similar to the CF-1.6 netCDF data standard.

Examples

# Instantiate Meta object, default values for attribute labels are used
meta = pysat.Meta()

# Set several variable units. Note that other base parameters are not
# set below, and so will be assigned a default value.
meta['var_name'] = {meta.labels.name: 'Variable Name',
                    meta.labels.units: 'MegaUnits'}

# Update only 'units' to new value.  You can use the value of
# `meta.labels.units` instead of the class attribute, as was done in
# the above example.
meta['var_name'] = {'units': 'MU'}

# Custom meta data variables may be assigned using the same method.
# This example uses non-standard meta data variables 'scale', 'PI',
# and 'axis_multiplier'.  You can include or not include any of the
# standard meta data information.
meta['var_name'] = {'units': 'MU', 'long_name': 'Variable Name',
                    'scale': 'linear', 'axis_multiplier': 1e4}
meta['var_name'] = {'PI': 'Dr. R. Song'}

# Meta data may be assigned to multiple variables at once
meta[['var_name1', 'var_name2']] = {'long_name': ['Name1', 'Name2'],
                                    'units': ['Units1', 'Units2'],
                                    'scale': ['linear', 'linear']}

# Sometimes n-Dimensional (nD) variables require multi-dimensional
# meta data structures.
meta2 = pysat.Meta()
meta2['var_name41'] = {'long_name': 'name1of4', 'units': 'Units1'}
meta2['var_name42'] = {'long_name': 'name2of4', 'units': 'Units2'}
meta['var_name4'] = {'meta': meta2}

# Meta data may be assigned from another Meta object using dict-like
# assignments
key1 = 'var_name'
key2 = 'var_name4'
meta[key1] = meta2[key2]

# When accessing one meta data value for any data variable, first use
# the data variable and then the meta data label.
meta['var_name', 'fill']

# A more robust method is to use the available Meta variable attributes
# in the attached MetaLabels class object.
meta[key1, meta.labels.fill_val]

# You may change a label used by Meta object to have a different value
meta.labels.fill_val = '_FillValue'

# Note that the fill label is intended for use when interacting
# with external files. Thus, any fill values (NaN) within the Meta
# object are not updated when changing the metadata string label,
# or when updating the value representing fill data. A future update
# (Issue #707) will expand functionality to include these custom
# fill values when producing files.

Initialize pysat.Meta object.

accept_default_labels(other_meta)

Apply labels for default meta labels from other onto self.

Parameters:

other_meta (Meta) – Meta object to take default labels from

add_epoch_metadata(epoch_name)

Add epoch or time-index metadata if it is missing.

Parameters:

epoch_name (str) – Data key for time-index or epoch data

apply_meta_labels(other_meta)

Apply the existing meta labels from self onto different MetaData.

Parameters:

other_meta (Meta) – Meta object to have default labels applied

Returns:

other_updated – Meta object with the default labels applied

Return type:

Meta

attr_case_name(name)

Retrieve preserved case name for case insensitive value of name.

Parameters:

name (str or list) – Single or multiple variable name(s) to get stored case form.

Returns:

out_name – Maintains same type as input. Name(s) in proper case.

Return type:

str or list

Note

Checks first within standard attributes. If not found, returns supplied name as it is available for use. Intended to be used to help ensure that the same case is applied to all repetitions of a given variable name.

attrs()

Yield metadata products stored for each variable name.

concat(other_meta, strict=False)

Concatenate two metadata objects.

Parameters:
  • other_meta (Meta) – Meta object to be concatenated

  • strict (bool) – If True, this flag ensures there are no duplicate variable names (default=False)

Returns:

mdata – Concatenated object

Return type:

Meta

Raises:

KeyError – If there are duplicate keys and the strict flag is True.

Note

Uses units and name label of self if other_meta is different

copy()

Deep copy of the meta object.

property data

Retrieve data.

May be set using data.setter(new_frame), where new_frame is a pandas DataFrame containing the metadata with label names as columns.

drop(names)

Drop variables (names) from metadata.

Parameters:

names (str or list-like) – String or list of strings specifying the variable names to drop

Raises:

KeyError – If none of the keys provided in names are found in the standard metadata, labels, or header metadata. If only a subset is missing, a logger warning is issued instead.

property empty

Return boolean True if there is no metadata.

Returns:

Returns True if there is no data, and False if there is data

Return type:

bool

classmethod from_csv(filename=None, col_names=None, sep=None, **kwargs)

Create instrument metadata object from csv.

Parameters:
  • filename (string) – Absolute filename for csv file or name of file stored in pandas instruments location

  • col_names (list-like collection of strings) – Column names in csv and resultant meta object

  • sep (string) – Column separator for supplied csv filename

  • **kwargs (dict) – Optional kwargs used by pds.read_csv

Note

Column names must include at least [‘name’, ‘long_name’, ‘units’], which are assumed if col_names is None
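
Example

A sketch of loading metadata from a delimited text file; the 'meta.csv' filename and its contents are hypothetical, and the column names follow the note above.

import pysat

# 'meta.csv' is assumed to contain rows such as:
#   dB_mer,Meridional delta B,nT
meta = pysat.Meta.from_csv(filename='meta.csv', sep=',',
                           col_names=['name', 'long_name', 'units'])
print(meta['dB_mer', 'units'])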

hasattr_case_neutral(attr_name)

Case-insensitive check for attribute names in this class.

Parameters:

attr_name (str) – Name of attribute to find

Returns:

has_name – True if the case-insensitive check for attribute name is successful, False if no attribute name is present.

Return type:

bool

keep(keep_names)

Keep variables (keep_names) while dropping other parameters.

Parameters:

keep_names (list-like) – Variables to keep

keys()

Yield variable names stored for 1D variables.

merge(other)

Add metadata variables to self that are in other but not in self.

Parameters:

other (pysat.Meta) – Metadata to be merged into self

pop(label_name)

Remove and return metadata about variable.

Parameters:

label_name (str) – Meta key for a data variable

Returns:

output – Series of metadata for variable

Return type:

pds.Series

rename(mapper)

Update the preserved case name for mapped value of name.

Parameters:

mapper (dict or func) – Dictionary with old names as keys and new names as values, or a function to apply to all names

Note

Checks first within standard attributes. If not found, returns supplied name as it is available for use. Intended to be used to help ensure that the same case is applied to all repetitions of a given variable name.

to_dict(preserve_case=False)

Convert self into a dictionary.

Parameters:

preserve_case (bool) – If True, the case of variables within self is preserved. If False, all variables are returned as lower case. (default=False)

Returns:

export_dict – A dictionary of the metadata for each variable of an output file

Return type:

dict

transfer_attributes_to_header(strict_names=False)

Transfer non-standard attributes in Meta to the MetaHeader object.

Parameters:

strict_names (bool) – If True, produces an error if the MetaHeader object already has an attribute with the same name to be copied (default=False)

Raises:

AttributeError – If strict_names is True and a global attribute would be updated.

transfer_attributes_to_instrument(inst, strict_names=False)

Transfer non-standard attributes in Meta to Instrument object.

Parameters:
  • inst (pysat.Instrument) – Instrument object to transfer attributes to

  • strict_names (bool) – If True, produces an error if the Instrument object already has an attribute with the same name to be copied (default=False)

Raises:

ValueError – If inst type is not pysat.Instrument.

Note

pysat.utils.io.load_netcdf and similar routines are only able to attach netCDF4 attributes to a Meta object. This routine identifies these attributes and removes them from the Meta object. Intent is to support simple transfers to the pysat.Instrument object.

Will not transfer names that conflict with pysat default attributes.

var_case_name(name)

Provide stored name (case preserved) for case insensitive input.

Parameters:

name (str or list) – Single or multiple variable name(s) using any capitalization scheme.

Returns:

case_names – Maintains the same type as input, returning the stored name(s) of the meta object.

Return type:

str or list

Note

If name is not found (case-insensitive check) then name is returned, as input. This function is intended to be used to help ensure the case of a given variable name is the same across the Meta object.

MetaLabels

class pysat.MetaLabels(metadata=None, units=('units', <class 'str'>), name=('long_name', <class 'str'>), notes=('notes', <class 'str'>), desc=('desc', <class 'str'>), min_val=('value_min', (<class 'float'>, <class 'int'>)), max_val=('value_max', (<class 'float'>, <class 'int'>)), fill_val=('fill', (<class 'float'>, <class 'int'>, <class 'str'>)), **kwargs)

Store metadata labels for Instrument instance.

Parameters:
  • units (tuple) – Units label name and value type(s) (default=(‘units’, str))

  • name (tuple) – Name label name and value type(s) (default=(‘long_name’, str))

  • notes (tuple) – Notes label name and value type(s) (default=(‘notes’, str))

  • desc (tuple) – Description label name and value type(s) (default=(‘desc’, str))

  • min_val (tuple) – Minimum value label name and value type(s) (default=(‘value_min’, (float, int)))

  • max_val (tuple) – Maximum value label name and value type(s) (default=(‘value_max’, (float, int)))

  • fill_val (tuple) – Fill value label name and value type(s) (default=(‘fill’, (float, int, str)))

  • kwargs (dict) – Dictionary containing optional label attributes, where the keys are the attribute names and the values are tuples containing the label name and value type

meta

Coupled MetaData data object or NoneType

Type:

pandas.DataFrame or NoneType

units

String used to label units in storage (default=’units’)

Type:

str

name

String used to label long_name in storage (default=’long_name’)

Type:

str

notes

String used to label notes in storage (default=’notes’)

Type:

str

desc

String used to label variable descriptions in storage (default=’desc’)

Type:

str

min_val

String used to label typical variable value min limit in storage (default=’value_min’)

Type:

str

max_val

String used to label typical variable value max limit in storage (default=’value_max’)

Type:

str

fill_val

String used to label fill value in storage. The default follows the netCDF4 standards. (default=’fill’)

Type:

str

label_type

Dict with attribute names as keys and expected data types as values

Type:

dict

label_attrs

Dict with label names (the attribute values) as keys and the corresponding attribute names as values

Type:

dict

Raises:

TypeError – If meta data type is invalid

Note

Meta object preserves the case of variables and attributes as it first receives the data. Subsequent calls to set new metadata with the same variable or attribute will use the case of the first call. Accessing or setting data thereafter is case insensitive. In practice, use is case insensitive but the original case is preserved. Case preservation is built in to support writing files with a desired case to meet standards.

Supports any custom metadata values in addition to the expected metadata attributes (units, name, notes, desc, value_min, value_max, and fill). These base attributes may be used to programmatically access and set types of metadata regardless of the string values used for the attribute. String values for attributes may need to be changed depending upon the standards of code or files interacting with pysat.

Meta objects returned as part of pysat loading routines are automatically updated to use the same values of units, etc. as found in the pysat.Instrument object.
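
An illustrative sketch of defining a custom label set (not from the pysat docstrings; the 'axis' label is a hypothetical addition):

import pysat

# Override two default label names and add a custom 'axis' label; each value
# is a tuple of (label name used in storage, allowed value type(s))
labels = pysat.MetaLabels(units=('Units', str), name=('Long_Name', str),
                          axis=('axis', str))

# Each attribute holds the string used to label that quantity in storage
print(labels.units)   # 'Units'
print(labels.axis)    # 'axis'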

Initialize the MetaLabels class.

default_values_from_attr(attr_name, data_type=None)

Retrieve the default values for each label based on their type.

Parameters:
  • attr_name (str) – Label attribute name (e.g., max_val)

  • data_type (type or NoneType) – Type for the data values or None if not specified (default=None)

Returns:

default_val – NaN for all float values, -1 for all int values, and ‘’ for all str values, except for ‘scale’, which defaults to ‘linear’; None is used for any other data type

Return type:

str, float, int, or NoneType

Raises:

ValueError – For unknown attr_name

default_values_from_type(val_type, data_type=None)

Retrieve the default values for each label based on their type.

Parameters:
  • val_type (type) – Variable type for the value to be assigned to a MetaLabel

  • data_type (type or NoneType) – Type for the data values or None if not specified (default=None)

Returns:

default_val – Sets NaN for all float values, -1 for all int values, and ‘’ for all str values, and None for any other data type

Return type:

str, float, int, NoneType

drop(names)

Remove data from MetaLabels.

Parameters:

names (str or list-like) – Attribute or MetaData name(s)

Raises:

AttributeError or KeyError – If any part of names is missing and cannot be dropped

update(lattr, lname, ltype)

Update MetaLabels with a new label.

Parameters:
  • lattr (str) – Attribute for this new label

  • lname (str) – MetaData name for this label

  • ltype (type) – Expected data type for this label

Raises:

TypeError – If meta data type is invalid

MetaHeader

class pysat.MetaHeader(header_data=None)

Stores global metadata.

Parameters:

header_data (dict or NoneType) – Meta data to be assigned to the class. Keys denote the desired attribute names and values the metadata for that attribute. (default=None)

global_attrs

List of global attribute names

Type:

list

<attrs>

Attributes with names corresponding to the values of global_attrs, may have any type

to_dict()

Convert global attributes to a dictionary.
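
A short sketch (not from the pysat docstrings; the attribute names below are arbitrary examples):

import pysat

# Keys of the input dict become attributes on the MetaHeader object
header = pysat.MetaHeader({'acknowledgements': 'Demo mission', 'version': 2})

print(header.acknowledgements)   # 'Demo mission'
print(header.global_attrs)       # e.g. ['acknowledgements', 'version']
print(header.to_dict())          # {'acknowledgements': 'Demo mission', 'version': 2}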

Initialize the MetaHeader class.

drop(names)

Drop variables (names) from MetaHeader.

Parameters:

names (list-like) – List of strings specifying the variable names to drop

to_dict()

Convert the header data to a dictionary.

Returns:

header_data – Global meta data where the keys are the attribute names and values the metadata for that attribute.

Return type:

dict

Orbits

class pysat.Orbits(inst, index=None, kind='local time', period=None)

Determine orbits on the fly and provide orbital data in .data.

Parameters:
  • inst (pysat.Instrument) – Instrument object for which the orbits will be determined

  • index (str or NoneType) – Name of the data series to use for determining orbit breaks (default=None)

  • kind (str) – Kind of orbit, which specifies how orbital breaks are determined. Expects one of: ‘local time’, ‘longitude’, ‘polar’, or ‘orbit’. (default=’local time’)
      - local time: negative gradients in lt or breaks in inst.data.index
      - longitude: negative gradients or breaks in inst.data.index
      - polar: zero crossings in latitude or breaks in inst.data.index
      - orbit: uses unique values of orbit number

  • period (np.timedelta64 or NoneType) – length of time for orbital period, used to gauge when a break in the datetime index inst.index is large enough to consider it a new orbit (default=None)

inst
kind
orbit_period

Pandas Timedelta that specifies the orbit period. Used instead of dt.timedelta to enable np.timedelta64 input. (default=97 min)

Type:

pds.Timedelta

num

Number of orbits in loaded data

Type:

int

orbit_index

Index of currently loaded orbit, zero indexed

Type:

int

Raises:

ValueError – If kind is unsupported

Note

Determines the locations of orbit breaks in the loaded data in inst.data and provides iteration tools and convenient orbit selection via inst.orbits[orbit_num]

This class should not be instantiated directly by the user; it is accessed through the interface provided by inst.orbits, where inst = pysat.Instrument()

Examples

import datetime as dt

import pysat

# Use orbit_info Instrument keyword to pass all Orbit kwargs
orbit_info = {'index': 'longitude', 'kind': 'longitude'}
vefi = pysat.Instrument(platform='cnofs', name='vefi', tag='dc_b',
                        clean_level=None, orbit_info=orbit_info)

# Set the instrument bounds
start = dt.datetime(2009, 1, 1)
stop = dt.datetime(2009, 1, 10)
vefi.bounds = (start, stop)

# Load data
vefi.load(date=start)

# Iterate over orbits
for loop_vefi in vefi.orbits:
    print('Next available orbit ', loop_vefi['dB_mer'])

# Load fifth orbit of first day
vefi.load(date=start)
vefi.orbits[5]

# Equivalent but less convenient load
vefi.orbits.load(5)

# Manually iterate forwards to the orbit
vefi.orbits.next()

# Manually iterate backwards to the previous orbit
vefi.orbits.prev()

Initialize pysat.Instrument.orbits object.

copy()

Provide a deep copy of object.

Returns:

Copy of self

Return type:

Orbits class instance

property current

Retrieve current orbit number.

Returns:

None if no orbit data. Otherwise, returns the orbit number, beginning with zero. The first and last orbits of a day are somewhat ambiguous. The first orbit for day n is generally also the last orbit on day n - 1. When iterating forward, the orbit will be labeled as the first (0). When iterating backward, it will be labeled as the last.

Return type:

int or NoneType

load(orbit_num)

Load a particular orbit into .data for loaded day.

Parameters:

orbit_num (int) – Orbit number to load, one indexed (1 to the number of orbits, or -1 to -(number of orbits)), with the sign denoting forward or backward indexing

Raises:

ValueError – If index requested lies beyond the number of orbits

Note

A day of data must be loaded before this routine functions properly. If the last orbit of the day is requested, it will automatically be padded with data from the next day. The orbit counter will be reset to 1.

next()

Load the next orbit into associated Instrument.data object.

Raises:

RuntimeError – Raised if a code path that users should never be able to reach is executed

Note

Forms complete orbits across day boundaries. If no data loaded then the first orbit from the first date of data is returned.

prev()

Load the previous orbit into associated Instrument.data object.

Raises:

RuntimeError – Raised if a code path that users should never be able to reach is executed

Note

Forms complete orbits across day boundaries. If no data loaded then the last orbit of data from the last day is loaded.

Parameters

class pysat._params.Parameters(path=None, create_new=False)

Stores user parameters used by pysat.

Also stores custom user parameters provided the keys don’t conflict with default pysat parameters.

Parameters:
  • path (str) – If provided, the directory path will be used to load/store a parameters file with name ‘pysat_settings.json’ (default=None)

  • create_new (bool) – If True, a new parameters file is created. It will be created at path if provided; if not, the file will be created in the ‘.pysat’ directory under the user’s home directory. (default=False)

data

pysat user settings dictionary

Type:

dict

defaults

Default parameters (keys) and values used by pysat that include {‘clean_level’: ‘clean’, ‘directory_format’: os.path.join(‘{platform}’, ‘{name}’, ‘{tag}’, ‘{inst_id}’), ‘ignore_empty_files’: False, ‘update_files’: True, ‘file_timeout’: 10, ‘user_modules’ : {}, ‘warn_empty_file_list’: False}

Type:

dict

file_path

Location of file used to store settings

Type:

str

non_defaults

List of pysat parameters (strings) that don’t have a defined default and are unaffected by self.restore_defaults()

Type:

list

Raises:
  • ValueError – The ‘user_modules’ parameter may not be set directly by the user. Please use the pysat.utils.registry module to modify the packages stored in ‘user_modules’.

  • OSError – User provided path does not exist

Note

This class will look for the ‘pysat_settings.json’ file first in the current working directory and then in the home ‘~/.pysat’ directory.

All pysat parameters are automatically stored whenever a parameter is assigned or modified. The default parameters and values tracked by this class are grouped by type below.

Values that map to the corresponding keywords on pysat.Instrument: clean_level, directory_format, ignore_empty_files, and update_files. See the Instrument docstring for more information on these keywords.

Values that map to internal pysat settings: file_timeout, user_modules, and warn_empty_file_list.

Stored pysat parameters without a working default value: data_dirs.

file_timeout - Time in seconds that pysat will wait to modify a busy file

user_modules - Stores information on modules registered by pysat

warn_empty_file_list - Raise a warning when no Instrument files are found

data_dirs - Directory(ies) where data are stored, in access order
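
A usage sketch (not from the pysat docstrings; assumes pysat has already been set up with at least one data directory). Settings are reached through the pysat.params instance rather than by creating Parameters directly:

import pysat

# Read a stored setting
print(pysat.params['data_dirs'])

# Change a default parameter; the change is written to pysat_settings.json
pysat.params['update_files'] = False

# Custom keys may be stored as long as they do not clash with pysat defaults
pysat.params['my_custom_key'] = 'my_value'

# Return all parameters with defined defaults back to their default values
pysat.params.restore_defaults()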

Initialize Parameters object.

clear_and_restart()

Clear all stored settings and set pysat defaults.

Note

pysat parameters without a default value are set to []

restore_defaults()

Restore default pysat parameters.

Note

Does not modify any stored custom user keys or pysat parameters without a default value.

store()

Store parameters using the filename specified in self.file_path.

Instrument Methods

The following methods support a variety of actions commonly needed by pysat.Instrument modules regardless of the data source.

General

Provides generalized routines for integrating instruments into pysat.

pysat.instruments.methods.general.filename_creator(value, format_str=None, start_date=None, stop_date=None)

Create filenames as needed to support use of generated pysat data sets.

Parameters:
  • value (slice) – Datetime slice, see _instrument.py, fname = self.files[date:(date + inc)]

  • format_str (str or NoneType) – File format template string (default=None)

  • start_date (datetime.datetime or NoneType) – First date supported (default=None)

  • stop_date (datetime.datetime or NoneType) – Last date supported (default=None)

Returns:

Created filenames from format_str indexed by datetime

Return type:

pandas.Series

Raises:

NotImplementedError – This method is a stub to support development of a potential feature slated for a future release.

pysat.instruments.methods.general.is_daily_file_cadence(file_cadence)

Evaluate file cadence to see if it is daily or greater than daily.

Parameters:

file_cadence (dt.timedelta or pds.DateOffset) – pysat assumes a daily file cadence, but some instrument data files contain longer periods of time. This parameter allows the specification of regular file cadences greater than or equal to a day (e.g., weekly, monthly, or yearly). (default=dt.timedelta(days=1))

Returns:

is_daily – True if the cadence is daily or less, False if the cadence is greater than daily

Return type:

bool

pysat.instruments.methods.general.list_files(tag='', inst_id='', data_path='', format_str=None, supported_tags=None, file_cadence=datetime.timedelta(days=1), two_digit_year_break=None, delimiter=None)

Return a Pandas Series of every file for chosen Instrument data.

This routine provides a standard interface for pysat instrument modules.

Parameters:
  • tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • inst_id (str) – Instrument ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • data_path (str) – Path to data directory. This input is nominally provided by pysat itself. (default=’’)

  • format_str (string or NoneType) – User specified file format. If None is specified, the default formats associated with the supplied tags are used. See Files.from_os format_str kwarg for more details. (default=None)

  • supported_tags (dict or NoneType) – Keys are inst_id, each containing a dict keyed by tag where the values are file format template strings. (default=None)

  • file_cadence (dt.timedelta or pds.DateOffset) – pysat assumes a daily file cadence, but some instrument data files contain longer periods of time. This parameter allows the specification of regular file cadences greater than or equal to a day (e.g., weekly, monthly, or yearly). (default=dt.timedelta(days=1))

  • two_digit_year_break (int or NoneType) – If filenames only store two digits for the year, then ‘1900’ will be added for years >= two_digit_year_break and ‘2000’ will be added for years < two_digit_year_break. If None, then four-digit years are assumed. (default=None)

  • delimiter (str or NoneType) – Delimiter string upon which files will be split (e.g., ‘.’). If None, filenames will be parsed presuming a fixed width format. (default=None)

Returns:

out – A class containing the verified available files

Return type:

pysat.Files.from_os : pysat._files.Files

Note

This function is intended to be invoked by pysat and not the end user.

Examples

import functools

from pysat.instruments.methods import general as mm_gen

fname = 'instrument_{year:04d}{month:02d}{day:02d}_v{version:02}.cdf'
supported_tags = {'tag_name': fname}
list_files = functools.partial(mm_gen.list_files,
                               supported_tags=supported_tags)

pysat.instruments.methods.general.load_csv_data(fnames, read_csv_kwargs=None)

Load CSV data from a list of files into a single DataFrame.

Parameters:
  • fnames (array-like) – Series, list, or array of filenames

  • read_csv_kwargs (dict or NoneType) – Dict of kwargs to apply to pds.read_csv. (default=None)

Returns:

data – Data frame with data from all files in the fnames list

Return type:

pds.DataFrame

See also

pds.read_csv
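
A minimal sketch (not from the pysat docstrings); the file paths are hypothetical and the keyword arguments are passed straight through to pandas.read_csv:

from pysat.instruments.methods import general as mm_gen

fnames = ['/path/to/inst_20090101.csv', '/path/to/inst_20090102.csv']
data = mm_gen.load_csv_data(fnames,
                            read_csv_kwargs={'index_col': 0,
                                             'parse_dates': True})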

pysat.instruments.methods.general.remove_leading_text(inst, target=None)

Remove leading text on variable names.

Parameters:
  • inst (pysat.Instrument) – associated pysat.Instrument object

  • target (str or list of strings) – Leading string to remove. If None is supplied, the Instrument is returned unmodified. (default=None)
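
A hedged sketch (not from the pysat docstrings; assumes inst is a loaded pysat.Instrument whose variable names begin with the hypothetical prefix 'mission_'):

from pysat.instruments.methods import general as mm_gen

# 'mission_density' becomes 'density', 'mission_temp' becomes 'temp', etc.
mm_gen.remove_leading_text(inst, target='mission_')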

Testing

Standard functions for the test instruments.

pysat.instruments.methods.testing.clean(self, test_clean_kwarg=None)

Pass through when asked to clean a test instrument.

Parameters:

test_clean_kwarg (any) – Testing keyword. If this keyword contains ‘logger’, ‘warning’, or ‘error’, the message entered as the value for that key will be returned as a logging.WARNING, UserWarning, or ValueError, respectively. If the ‘change’ kwarg is set, the clean level will be changed to the specified value. (default=None)

pysat.instruments.methods.testing.concat_data(self, new_data, **kwargs)

Concatenate data to self.data for extra time dimensions.

Parameters:
  • new_data (xarray.Dataset or list of such objects) – New data objects to be concatenated

  • **kwargs (dict) – Optional keyword arguments passed to xr.concat

Note

Expects the extra time dimensions to have a variable name that starts with ‘time’, and no other dimensions to have a name that fits this format.

pysat.instruments.methods.testing.create_files(inst, start, stop, freq='1D', use_doy=True, root_fname='pysat_testing_{year:04d}_{day:03d}.txt', version=False, content=None, timeout=None)

Create a file set using the year and day of year.

Parameters:
  • inst (pysat.Instrument) – A test instrument, used to generate file path

  • start (dt.datetime) – The date for the first file to create

  • stop (dt.datetime) – The date for the last file to create

  • freq (str) – Frequency of file output. Codes correspond to pandas.date_range codes (default=’1D’)

  • use_doy (bool) – If True use Day of Year (doy), if False use day of month and month. (default=True)

  • root_fname (str) – The format of the file name to create. Supports standard pysat template variables ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘version’, ‘revision’, ‘cycle’. (default=’pysat_testing_{year:04d}_{day:03d}.txt’)

  • version (bool) – If True, iterate over version / revision / cycle. If False, ignore version / revision / cycle. (default=False)

  • content (str) – Custom text to write to temporary files (default=None)

  • timeout (float) – Time in seconds to lock the files being created. If None, no timeout is used. (default=None)

Examples

# Commands below create empty files located at `inst.files.data_path`,
# one per day, spanning 2008, where `year`, `month`, and `day`
# are filled in using the provided template string appropriately.
# The produced files are named like: 'pysat_testing_2008_01_01.txt'
import datetime as dt
inst = pysat.Instrument('pysat', 'testing')
root_fname='pysat_testing_{year:04d}_{month:02d}_{day:02d}.txt'
create_files(inst, dt.datetime(2008, 1, 1), dt.datetime(2008, 12, 31),
             root_fname=root_fname, use_doy=False)

# The command below uses the default values for `create_files`, which
# produces a daily set of files, labeled by year and day of year.
# The files are named like: 'pysat_testing_2008_001.txt'
create_files(inst, dt.datetime(2008, 1, 1), dt.datetime(2008, 12, 31))

pysat.instruments.methods.testing.define_period()

Define the default periods for the fake data functions.

Returns:

def_period – Dictionary of periods to use in test instruments

Return type:

dict

Note

Local time and longitude slightly out of sync to simulate motion of Earth

pysat.instruments.methods.testing.define_range()

Define the default ranges for the fake data functions.

Returns:

def_range – Dictionary of value ranges to use in test instruments

Return type:

dict

pysat.instruments.methods.testing.download(date_array, tag, inst_id, data_path='', user=None, password=None, test_download_kwarg=None)

Pass through when asked to download for a test instrument.

Parameters:
  • date_array (array-like) – list of datetimes to download data for. The sequence of dates need not be contiguous.

  • tag (str) – Tag identifier used for particular dataset. This input is provided by pysat.

  • inst_id (str) – Instrument ID string identifier used for particular dataset. This input is provided by pysat.

  • data_path (str) – Path to directory to download data to. (default=’’)

  • user (string or NoneType) – User string input used for download. Provided by user and passed via pysat. If an account is required for downloads, this routine must raise an error when user is not supplied. (default=None)

  • password (string or NoneType) – Password for data download. (default=None)

  • test_download_kwarg (any) – Testing keyword (default=None)

Raises:

ValueError – When user/password are required but not supplied

Warning

When no download support will be provided

Note

This routine is invoked by pysat and is not intended for direct use by the end user.

pysat.instruments.methods.testing.generate_fake_data(t0, num_array, period=5820, data_range=[0.0, 24.0], cyclic=True)

Generate fake data over a given range.

Parameters:
  • t0 (float) – Start time in seconds

  • num_array (array_like) – Array of time steps from t0. This is the index of the fake data

  • period (int) – The number of seconds per period. (default=5820)

  • data_range (list of float) – For cyclic functions, the [min, max] range of data values cycled over one period. Not used for non-cyclic functions. (default=[0.0, 24.0])

  • cyclic (bool) – If True, assume that fake data is a cyclic function (i.e., longitude, slt) that will reset to data_range[0] once it reaches data_range[1]. If False, continue to monotonically increase. (default=True)

Returns:

data – Array with fake data

Return type:

array-like
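
For example, a sketch of generating one day of cyclic, solar-local-time-like values at one sample per second (not from the pysat docstrings):

import numpy as np

from pysat.instruments.methods import testing as mm_test

seconds = np.arange(86400)
fake_slt = mm_test.generate_fake_data(0.0, seconds, period=5820,
                                      data_range=[0.0, 24.0], cyclic=True)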

pysat.instruments.methods.testing.generate_times(fnames, num, freq='1S', start_time=None)

Construct list of times for simulated instruments.

Parameters:
  • fnames (list) – List of filenames.

  • num (int) – Maximum number of times to generate. Data points will not go beyond the current day.

  • freq (str) – Frequency of temporal output, compatible with pandas.date_range (default=’1S’)

  • start_time (dt.timedelta or NoneType) – Offset time of start time in fractional hours since midnight UT. If None, set to 0. (default=None)

Returns:

  • uts (array) – Array of integers representing uts for a given day

  • index (pds.DatetimeIndex) – The DatetimeIndex to be used in the pysat test instrument objects

  • date (datetime) – The requested date reconstructed from the fake file name

pysat.instruments.methods.testing.init(self, test_init_kwarg=None)

Initialize the Instrument object with instrument specific values.

Runs once upon instantiation.

Shifts the time index of files by 5 minutes if mangle_file_dates is set to True at pysat.Instrument instantiation.

Creates a file list for a given range if the file_date_range keyword is set at instantiation.

Parameters:

test_init_kwarg (any) – Testing keyword (default=None)

pysat.instruments.methods.testing.initialize_test_meta(epoch_name, data_keys)

Initialize meta data for test instruments.

This routine should be applied to test instruments at the end of the load routine.

Parameters:
  • epoch_name (str) – The variable name of the instrument epoch.

  • data_keys (list-like) – The dataset keys (data variable names) from the instrument.

pysat.instruments.methods.testing.list_files(tag='', inst_id='', data_path='', format_str=None, file_date_range=None, test_dates=None, mangle_file_dates=False, test_list_files_kwarg=None)

Produce a fake list of files spanning three years.

Parameters:
  • tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • inst_id (str) – Instrument ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • data_path (str) – Path to data directory. This input is nominally provided by pysat itself. (default=’’)

  • format_str (str or NoneType) – File format string. This is passed from the user at pysat.Instrument instantiation, if provided. (default=None)

  • file_date_range (pds.date_range) – File date range. The default mode generates a list of 3 years of daily files (1 year back, 2 years forward) based on the test_dates passed through below. Otherwise, accepts a range of files specified by the user. (default=None)

  • test_dates (dt.datetime or NoneType) – Pass the _test_date object through from the test instrument files

  • mangle_file_dates (bool) – If True, file dates are shifted by 5 minutes. (default=False)

  • test_list_files_kwarg (any) – Testing keyword (default=None)

Return type:

Series of filenames indexed by file time

pysat.instruments.methods.testing.list_remote_files(tag='', inst_id='', data_path='', format_str=None, start=None, stop=None, test_dates=None, user=None, password=None, mangle_file_dates=False, test_list_remote_kwarg=None)

Produce a fake list of files to simulate new files on a remote server.

Note

List spans three years and one month.

Parameters:
  • tag (str) – Tag name used to identify particular data set. This input is nominally provided by pysat itself. (default=’’)

  • inst_id (str) – Instrument ID used to identify particular data. This input is nominally provided by pysat itself. (default=’’)

  • data_path (str) – Path to data directory. This input is nominally provided by pysat itself. (default=’’)

  • format_str (str or NoneType) – file format string (default=None)

  • start (dt.datetime or NoneType) – Starting time for file list. A None value will start 1 year before test_date (default=None)

  • stop (dt.datetime or NoneType) – Ending time for the file list. A None value will stop 2 years 1 month after test_date (default=None)

  • test_dates (dt.datetime or NoneType) – Pass the _test_date object through from the test instrument files

  • user (str or NoneType) – User string input used for download. Provided by user and passed via pysat. If an account is required for downloads, this routine must raise an error when user is not supplied. (default=None)

  • password (str or NoneType) – Password for data download. (default=None)

  • mangle_file_dates (bool) – If True, file dates are shifted by 5 minutes. (default=False)

  • test_list_remote_kwarg (any) – Testing keyword (default=None)

Returns:

Filenames indexed by file time, see list_files for more info

Return type:

pds.Series

pysat.instruments.methods.testing.non_monotonic_index(index)

Adjust the index to be non-monotonic.

Parameters:

index (pds.DatetimeIndex) – The index generated in an instrument test file.

Returns:

new_index – A non-monotonic index

Return type:

pds.DatetimeIndex

pysat.instruments.methods.testing.non_unique_index(index)

Adjust the index to be non-unique.

Parameters:

index (pds.DatetimeIndex) – The index generated in an instrument test file.

Returns:

new_index – A non-unique index

Return type:

pds.DatetimeIndex

pysat.instruments.methods.testing.preprocess(self, test_preprocess_kwarg=None)

Perform standard preprocessing.

This routine is automatically applied to the Instrument object on every load by the pysat nanokernel (first in queue). Object modified in place.

Parameters:

test_preprocess_kwarg (any) – Testing keyword (default=None)

Utilities

The utilities module contains functions used throughout the pysat package. This includes utilities for determining the available Instruments, loading files, et cetera.

Core Utilities

These utilities are available directly from the pysat.utils module.

class pysat.utils._core.NetworkLock(*args, **kwargs)

Lock manager compatible with networked file systems.

Initialize lock manager compatible with networked file systems.

Parameters:
  • *args (list reference) – References a list of input arguments

  • **kwargs (dict reference) – References a dict of input keyword argument

Note

See portalocker.utils.Lock for more details (portalocker.utils.Lock)

Examples

from pysat.utils import NetworkLock

with NetworkLock(file_to_be_written, 'w') as locked_file:
    locked_file.write('content')

release()

Release the Lock from the file system.

From portalocker docs:

On some networked filesystems it might be needed to force a os.fsync() before closing the file so it’s actually written before another client reads the file.

pysat.utils._core.available_instruments(inst_loc=None)

Obtain basic information about instruments in a given subpackage.

Parameters:

inst_loc (python subpackage or NoneType) – The location of the instrument subpackage (e.g., pysat.instruments) or None to list all registered instruments (default=None)

Returns:

inst_info – Nested dictionary with ‘platform’, ‘name’, ‘inst_module’, ‘inst_ids_tags’, ‘inst_id’, and ‘tag’ with the tag descriptions given as the value for each unique dictionary combination.

Return type:

dict

pysat.utils._core.display_available_instruments(inst_loc=None, show_inst_mod=None, show_platform_name=None)

Display basic information about instruments in a given subpackage.

Parameters:
  • inst_loc (python subpackage or NoneType) – The location of the instrument subpackage (e.g., pysat.instruments) or None to list all registered instruments (default=None)

  • show_inst_mod (boolean or NoneType) – Displays the instrument module if True, does not include it if False, and reverts to standard display based on inst_loc type if None. (default=None)

  • show_platform_name (boolean or NoneType) – Displays the platform and name if True, does not include it if False, and reverts to standard display based on inst_loc type if None. (default=None)

Note

Prints to standard output; a user-friendly interface for available_instruments. Defaults to including the instrument module and not the platform/name values if inst_loc is an instrument module, and to including the platform/name values and not the instrument module if inst_loc is None (listing the registered instruments).
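
For example (a minimal sketch; output depends on which instrument packages have been registered):

import pysat
import pysat.instruments

# List all registered instruments by platform and name
pysat.utils.display_available_instruments()

# List only the instrument modules shipped with pysat itself
pysat.utils.display_available_instruments(pysat.instruments)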

pysat.utils._core.display_instrument_stats(inst_locs=None)

Display supported instrument stats.

Parameters:

inst_locs (list of packages) – List of instrument library modules to inspect for pysat support. If None, report on default pysat package. (default=None)

pysat.utils._core.fmt_output_in_cols(out_strs, ncols=3, max_num=6, lpad=None)

Format a string with desired output values in columns.

Parameters:
  • out_strs (array-like) – Array like object containing strings to print

  • ncols (int) – Number of columns to print (default=3)

  • max_num (int) – Maximum number of out_strs members to print. Best display achieved if this number is divisible by 2 and ncols (default=6)

  • lpad (int or NoneType) – Left padding or None to use length of longest string + 1 (default=None)

Returns:

output – String with desired data formatted in columns

Return type:

str

pysat.utils._core.generate_instrument_list(inst_loc, user_info=None)

Iterate through and classify instruments in a given subpackage.

Parameters:
  • inst_loc (python subpackage) – The location of the instrument subpackage to test, e.g., ‘pysat.instruments’

  • user_info (dict or NoneType) – Nested dictionary with user and password info for instrument module name. If None, no user or password is assumed. (default=None) EX: user_info = {‘jro_isr’: {‘user’: ‘myname’, ‘password’: ‘email’}}

Returns:

output – Dictionary with keys ‘names’, ‘download’, ‘load_options’, and ‘no_download’, each containing a list: ‘names’ holds platform_name combinations; ‘download’ holds dicts containing ‘inst_module’, ‘tag’, and ‘inst_id’ for instruments with download routines; ‘load_options’ holds dicts containing load and download options; ‘no_download’ holds dicts containing ‘inst_module’, ‘tag’, and ‘inst_id’ for instruments without download routines

Return type:

dict

Note

This routine currently supports classification of instruments for unit tests both in the core package and in separate instrument packages that use pysat.

pysat.utils._core.get_mapped_value(value, mapper)

Adjust value using mapping dict or function.

Parameters:
  • value (str) – MetaData variable name to be adjusted

  • mapper (dict or function) – Dictionary with old names as keys and new names as variables or a function to apply to all names

Returns:

mapped_val – Adjusted MetaData variable name or NoneType if input value should stay the same

Return type:

str or NoneType

pysat.utils._core.listify(iterable)

Produce a flattened list of items from input that may not be iterable.

Parameters:

iterable (iter-like) – An iterable object that will be wrapped within a list

Returns:

A flattened 1-D list of the input; non-list input is enclosed in a list

Return type:

list

Note

Does not accept dict_keys or dict_values as input.
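
For example (a minimal sketch based on the description above):

from pysat.utils import listify

listify('single_file.txt')        # ['single_file.txt']
listify(['a.txt', 'b.txt'])       # ['a.txt', 'b.txt']
listify([['a.txt'], ['b.txt']])   # ['a.txt', 'b.txt'], nested input flattened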

pysat.utils._core.scale_units(out_unit, in_unit)

Determine the scaling factor between two units.

Parameters:
  • out_unit (str) – Desired unit after scaling

  • in_unit (str) – Unit to be scaled

Returns:

unit_scale – Scaling factor that will convert from in_units to out_units

Return type:

float

Note

Accepted units include degrees (‘deg’, ‘degree’, ‘degrees’), radians (‘rad’, ‘radian’, ‘radians’), hours (‘h’, ‘hr’, ‘hrs’, ‘hour’, ‘hours’), lengths (‘m’, ‘km’, ‘cm’), volumes (‘m-3’, ‘cm-3’, ‘/cc’, ‘n/cc’, ‘km-3’, ‘m$^{-3}$’, ‘cm$^{-3}$’, ‘km$^{-3}$’), and speeds (‘m/s’, ‘cm/s’, ‘km/s’, ‘m s$^{-1}$’, ‘cm s$^{-1}$’, ‘km s$^{-1}$’, ‘m s-1’, ‘cm s-1’, ‘km s-1’). Can convert between degrees, radians, and hours or different lengths, volumes, or speeds.

Examples

import numpy as np
two_pi = 2.0 * np.pi
scale = scale_units("deg", "RAD")
two_pi *= scale
two_pi  # will show 360.0

pysat.utils._core.stringify(strlike)

Convert input into a str type.

Parameters:

strlike (str or bytes) – Input values in str or byte form

Returns:

strlike – If input is not string-like then the input type is retained.

Return type:

str or input type

pysat.utils._core.update_fill_values(inst, variables=None, new_fill_val=nan)

Update Instrument data so that the fill value is consistent with Meta.

Parameters:
  • inst (pysat.Instrument) – Instrument object with data loaded

  • variables (str, list, or NoneType) – List of variables to update or None to update all (default=None)

  • new_fill_val (any) – New fill value to use (default=np.nan)

Note

On Windows OS, this function may not work for data variables that are also xarray coordinates.
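
A usage sketch (not from the pysat docstrings; assumes the test instrument can be loaded and that 'dummy1' is one of its variables):

import datetime as dt

import numpy as np
import pysat

inst = pysat.Instrument('pysat', 'testing')
inst.load(date=dt.datetime(2009, 1, 1))

# Replace the stored fill value with NaN in both the data and the metadata
pysat.utils.update_fill_values(inst, variables='dummy1', new_fill_val=np.nan)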

Coordinates

Coordinate transformation functions for pysat.

pysat.utils.coords.adjust_cyclic_data(samples, high=6.283185307179586, low=0.0)

Adjust cyclic values such as longitude to a different scale.

Parameters:
  • samples (array_like) – Input array of cyclic values

  • high (float or int) – Upper boundary of the cyclic range (default=2 pi)

  • low (float or int) – Lower boundary of the cyclic range (default=0)

Returns:

out_samples – Input array adjusted to the range defined by low and high

Return type:

array_like

pysat.utils.coords.calc_solar_local_time(inst, lon_name=None, slt_name='slt', apply_modulus=True, ref_date=None)

Append solar local time to an instrument object.

Parameters:
  • inst (pysat.Instrument) – Instrument class object to be updated

  • lon_name (str) – Name of the longitude data key (assumes data are in degrees)

  • slt_name (str) – Name of the output solar local time data key (default=’slt’)

  • apply_modulus (bool) – If True, SLT values are confined to [0, 24), if False they may be positive or negative based on the value of their universal time relative to that of the reference date ref_date. (default=True)

  • ref_date (dt.datetime or NoneType) – Reference initial date. If None, will use the date found at inst.date. Only valid if apply_modulus is True. (default=None)

Note

Updates Instrument data in column specified by slt_name, as well as Metadata
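
A short sketch (not from the pysat docstrings; assumes the test instrument provides a 'longitude' variable in degrees):

import datetime as dt

import pysat
from pysat.utils import coords

inst = pysat.Instrument('pysat', 'testing')
inst.load(date=dt.datetime(2009, 1, 1))

# Add solar local time, confined to [0, 24), under the new key 'slt_calc'
coords.calc_solar_local_time(inst, lon_name='longitude', slt_name='slt_calc')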

pysat.utils.coords.establish_common_coord(coord_vals, common=True)

Create a coordinate array that is appropriate for multiple data sets.

Parameters:
  • coord_vals (list-like) – A list of coordinate arrays of the same type: e.g., all geodetic latitude in degrees

  • common (bool) – True to include locations where all coordinate arrays cover, False to use the maximum location range from the list of coordinates (default=True)

Returns:

out_coord – An array appropriate for the list of coordinate values

Return type:

array-like

Note

Assumes that the supplied coordinates are distinct representations of the same value in the same units and range (e.g., longitude in degrees from 0-360).

pysat.utils.coords.expand_xarray_dims(data_list, meta, dims_equal=False, exclude_dims=None)

Ensure that dimensions do not vary when concatenating data.

Parameters:
  • data_list (list-like) – List of xr.Dataset objects with the same dimensions and variables

  • meta (pysat.Meta) – Metadata for the data in data_list

  • dims_equal (bool) – Assert that all xr.Dataset objects have the same dimensions if True, the Datasets in data_list may have differing dimensions if False. (default=False)

  • exclude_dims (list-like or NoneType) – Dimensions to exclude from evaluation or None (default=None)

Returns:

out_list – List of xr.Dataset objects with the same dimensions and variables, and with dimensions that all have the same values and data padded when needed.

Return type:

list-like

pysat.utils.coords.update_longitude(inst, lon_name=None, high=180.0, low=-180.0)

Update longitude to the desired range.

Parameters:
  • inst (pysat.Instrument) – Instrument class object to be updated

  • lon_name (str) – Name of the longitude data in inst

  • high (float) – Highest allowed longitude value (default=180.0)

  • low (float) – Lowest allowed longitude value (default=-180.0)

Note

Updates instrument data in column provided by lon_name
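
A short sketch (not from the pysat docstrings; assumes a loaded Instrument with a 'longitude' variable stored in degrees from 0 to 360):

import datetime as dt

import pysat
from pysat.utils import coords

inst = pysat.Instrument('pysat', 'testing')
inst.load(date=dt.datetime(2009, 1, 1))

# Shift longitude values into the -180 to 180 degree range, in place
coords.update_longitude(inst, lon_name='longitude', high=180.0, low=-180.0)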

I/O

Input/Output utilities for pysat data.

pysat.utils.io.add_netcdf4_standards_to_metadict(inst, in_meta_dict, epoch_name, check_type=None, export_nan=None)

Add metadata variables needed to meet SPDF ISTP/IACG NetCDF standards.

Parameters:
  • inst (pysat.Instrument) – Object containing data and meta data

  • in_meta_dict (dict) – Metadata dictionary, can be obtained from inst.meta.to_dict().

  • epoch_name (str) – Name for epoch or time-index variable.

  • check_type (NoneType or list) – List of keys associated with meta_dict that should have the same data type as coltype. Passed to pysat.utils.io.filter_netcdf4_metadata. (default=None)

  • export_nan (NoneType or list) – Metadata parameters allowed to be NaN. Passed along to pysat.utils.io.filter_netcdf4_metadata. (default=None)

Returns:

in_meta_dict – Input dictionary with additional information for standards.

Return type:

dict

See also

filter_netcdf4_metadata

Removes unsupported SPDF ISTP/IACG variable metadata.

Note

Removes unsupported SPDF ISTP/IACG variable metadata.

For xarray inputs, converts datetimes to integers representing milliseconds since 1970. This does not include the main index, ‘time’.

pysat.utils.io.apply_table_translation_from_file(trans_table, meta_dict)

Modify meta_dict by applying trans_table to metadata keys.

Parameters:
  • trans_table (dict) – Mapping of metadata label used in a file to new value.

  • meta_dict (dict) – Dictionary with metadata information from a loaded file.

Returns:

filt_dict – meta_dict after the mapping in trans_table has been applied.

Return type:

dict

Note

The purpose of this function is to maintain default compatibility with meta.labels and existing code that writes and reads netcdf files via pysat while also changing the labels for metadata within the file.

pysat.utils.io.apply_table_translation_to_file(inst, meta_dict, trans_table=None)

Translate labels in meta_dict using trans_table.

Parameters:
  • inst (pysat.Instrument) – Instrument object with data to be written to file.

  • meta_dict (dict) – Output starting from Instrument.meta.to_dict() supplying attribute data.

  • trans_table (dict or NoneType) – Keyed by current metadata labels, containing a list of metadata labels to use within the returned dict. If None, a default translation using self.labels will be used, except that self.labels.fill_val will be mapped to [‘_FillValue’, ‘FillVal’, ‘fill’]. (default=None)

Returns:

export_dict – A dictionary of the metadata for each variable of an output file.

Return type:

dict

Raises:

ValueError – If there is a duplicated variable label in the translation table

pysat.utils.io.default_from_netcdf_translation_table(meta)

Create metadata translation table with minimal netCDF requirements.

Parameters:

meta (pysat.Meta) – Meta instance to get appropriate default values for.

Returns:

trans_table – Keyed by self.labels with a list of strings to be used when writing netcdf files.

Return type:

dict

Note

The purpose of this function is to maintain default compatibility with meta.labels and existing code that writes and reads netcdf files via pysat while also changing the labels for metadata within the file.

pysat.utils.io.default_to_netcdf_translation_table(inst)

Create metadata translation table with minimal netCDF requirements.

Parameters:

inst (pysat.Instrument) – Instrument object to be written to file.

Returns:

trans_table – Keyed by self.labels with a list of strings to be used when writing netcdf files.

Return type:

dict

pysat.utils.io.filter_netcdf4_metadata(inst, mdata_dict, coltype, remove=False, check_type=None, export_nan=None, varname='')

Filter metadata properties to be consistent with netCDF4.

Parameters:
  • inst (pysat.Instrument) – Object containing data and metadata

  • mdata_dict (dict) – Dictionary equivalent to Meta object info

  • coltype (type or dtype) – Data type provided by pysat.Instrument._get_data_info. If boolean, int will be used instead.

  • remove (bool) – If True, remove metadata that should have the same data type as coltype but does not. If False, recast the metadata to the expected type. (default=False)

  • check_type (list or NoneType) – List of keys associated with meta_dict that should have the same data type as coltype. These will be removed from the filtered output if they differ. If None, this check will not be performed. (default=None)

  • export_nan (list or NoneType) – Metadata parameters allowed to be NaN. If None, assumes no Metadata parameters are allowed to be Nan. (default=None)

  • varname (str) – Variable name to be processed. Used for error feedback. (default=’’)

Returns:

filtered_dict – Modified as needed for netCDF4

Return type:

dict

Warning

UserWarning

When data are removed due to conflict between value and type, and removal was not explicitly requested (remove is False).

Note

Metadata values that are NaN and not listed in export_nan are removed.

pysat.utils.io.inst_to_netcdf(inst, fname, base_instrument=None, epoch_name=None, mode='w', zlib=False, complevel=4, shuffle=True, preserve_meta_case=False, check_type=None, export_nan=None, export_pysat_info=True, unlimited_time=True, meta_translation=None, meta_processor=None)

Store pysat data in a netCDF4 file.

Parameters:
  • inst (pysat.Instrument) – Instrument object with loaded data to save

  • fname (str) – Output filename with full path

  • base_instrument (pysat.Instrument or NoneType) – Class used as a comparison, only attributes that are present with inst and not on base_instrument are written to netCDF. Using None assigns an unmodified pysat.Instrument object. (default=None)

  • epoch_name (str or NoneType) – Label in file for datetime index of inst. If None, uses ‘Epoch’ for pandas data formats and ‘time’ for xarray formats. (default=None)

  • mode (str) – Write (‘w’) or append (‘a’) mode. If mode=’w’, any existing file at this location will be overwritten. If mode=’a’, existing variables will be overwritten. (default=’w’)

  • zlib (bool) – Flag for engaging zlib compression, if True compression is used (default=False)

  • complevel (int) – An integer flag between 1 and 9 describing the level of compression desired. Ignored if zlib=False. (default=4)

  • shuffle (bool) – The HDF5 shuffle filter will be applied before compressing the data. This significantly improves compression. Ignored if zlib=False. (default=True)

  • preserve_meta_case (bool) – Flag specifying the case of the meta data variable strings. If True, then the variable strings within the MetaData object (which preserves case) are used to name variables in the written netCDF file. If False, then the variable strings used to access data from the pysat.Instrument object are used instead. (default=False)

  • check_type (list or NoneType) – List of keys associated with meta_dict that should have the same data type as coltype. These will be removed from the filtered output if they differ. If None, this check will default to include fill, min, and max values. (default=None)

  • export_nan (list or NoneType) – By default, the metadata variables where a value of NaN is allowed and written to the netCDF4 file is maintained by the Meta object attached to the pysat.Instrument object. A list supplied here will override the settings provided by Meta, and all parameters included will be written to the file. If not listed and a value is NaN then that attribute simply won’t be included in the netCDF4 file. (default=None)

  • export_pysat_info (bool) – Appends the platform, name, tag, and inst_id to the metadata if True. Otherwise these attributes are lost. (default=True)

  • unlimited_time (bool) – Flag specifying whether or not the epoch/time dimension should be unlimited; it is when the flag is True. (default=True)

  • meta_translation (dict or NoneType) – The keys in the input dict are used to map metadata labels for inst to one or more values used when writing the file. E.g., {meta.labels.fill_val: [‘FillVal’, ‘_FillValue’]} would result in both ‘FillVal’ and ‘_FillValue’ being used to store variable fill values in the netCDF file. Overrides use of inst._meta_translation_table.

  • meta_processor (function or NoneType) – If not None, a dict containing all of the metadata will be passed to meta_processor which should return a processed version of the input dict. If None and inst has a valid inst._export_meta_post_processing function then that function is used for meta_processor. (default=None)

Note

Depending on which kwargs are specified, the input class, inst, will be modified.

Stores 1-D data along dimension ‘Epoch’ - the date time index.

  • The name of the main variable column is used to prepend subvariable names within netCDF, var_subvar_sub

  • A netCDF4 dimension is created for each main variable column with higher order data; first dimension Epoch

  • The index organizing the data is stored as a dimension variable, and its long_name will be set to ‘Epoch’.

  • from_netcdf uses the variable dimensions to reconstruct data structure

All attributes attached to instrument meta are written to netCDF attrs with the exception of ‘Date_End’, ‘Date_Start’, ‘File’, ‘File_Date’, ‘Generation_Date’, and ‘Logical_File_ID’. These are defined within to_netCDF at the time the file is written, as per the adopted standard, SPDF ISTP/IACG Modified for NetCDF. Attributes ‘Conventions’ and ‘Text_Supplement’ are given default values if not present.
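
A minimal write sketch (not from the pysat docstrings; the output filename is hypothetical and assumes the test instrument can be loaded):

import datetime as dt

import pysat
from pysat.utils import io

inst = pysat.Instrument('pysat', 'testing')
inst.load(date=dt.datetime(2009, 1, 1))

# Write the loaded day to a compressed netCDF4 file
io.inst_to_netcdf(inst, fname='pysat_testing_20090101.nc',
                  zlib=True, complevel=4)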

pysat.utils.io.load_netcdf(fnames, strict_meta=False, file_format='NETCDF4', epoch_name=None, epoch_unit='ms', epoch_origin='unix', pandas_format=True, decode_timedelta=False, combine_by_coords=True, meta_kwargs=None, meta_processor=None, meta_translation=None, drop_meta_labels=None, decode_times=None, strict_dim_check=True)

Load netCDF-3/4 file produced by pysat.

Parameters:
  • fnames (str or array_like) – Filename(s) to load, will fail if None. (default=None)

  • strict_meta (bool) – Flag that checks if metadata across fnames is the same if True (default=False)

  • file_format (str) – file_format keyword passed to netCDF4 routine. Expects one of ‘NETCDF3_CLASSIC’, ‘NETCDF3_64BIT’, ‘NETCDF4_CLASSIC’, or ‘NETCDF4’. (default=’NETCDF4’)

  • epoch_name (str or NoneType) – Data key for epoch variable. The epoch variable is expected to be an array of integer or float values denoting time elapsed from an origin specified by epoch_origin with units specified by epoch_unit. This epoch variable will be converted to a DatetimeIndex for consistency across pysat instruments. If None, then epoch_name set by the load_netcdf_pandas or load_netcdf_xarray as appropriate. (default=None)

  • epoch_unit (str) – The pandas-defined unit of the epoch variable (‘D’, ‘s’, ‘ms’, ‘us’, ‘ns’). (default=’ms’)

  • epoch_origin (str or timestamp-convertable) – Origin of epoch calculation, following convention for pandas.to_datetime. Accepts timestamp-convertable objects, as well as two specific strings for commonly used calendars. These conversions are handled by pandas.to_datetime. If ‘unix’ (or POSIX) time; origin is set to 1970-01-01. If ‘julian’, epoch_unit must be ‘D’, and origin is set to beginning of Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC. (default=’unix’)

  • pandas_format (bool) – Flag specifying if data is stored in a pandas DataFrame (True) or xarray Dataset (False). (default=True)

  • decode_timedelta (bool) – Used for xarray data (pandas_format is False). If True, variables with unit attributes that are ‘timelike’ (‘hours’, ‘minutes’, etc) are converted to np.timedelta64. (default=False)

  • combine_by_coords (bool) – Used for xarray data (pandas_format is False) when loading a multi-file dataset. If True, uses xarray.combine_by_coords. If False, uses xarray.combine_nested. (default=True)

  • meta_kwargs (dict or NoneType) – Dict to specify custom Meta initialization or None to use Meta defaults (default=None)

  • meta_processor (function or NoneType) – If not None, a dict containing all of the loaded metadata will be passed to meta_processor which should return a filtered version of the input dict. The returned dict is loaded into a pysat.Meta instance and returned as meta. (default=None)

  • meta_translation (dict or NoneType) – Translation table used to map metadata labels in the file to those used by the returned meta. Keys are labels from file and values are labels in meta. Redundant file labels may be mapped to a single pysat label. If None, will use default_from_netcdf_translation_table. This feature is maintained for file compatibility. To disable all translation, input an empty dict. (default=None)

  • drop_meta_labels (list or NoneType) – List of variable metadata labels that should be dropped. Applied to metadata as loaded from the file. (default=None)

  • decode_times (bool or NoneType) – If True, variables with unit attributes that are ‘timelike’ (‘hours’, ‘minutes’, etc) are converted to np.timedelta64 by xarray. If False, then epoch_name will be converted to datetime using epoch_unit and epoch_origin. If None, will be set to False for backwards compatibility. For xarray only. (default=None)

  • strict_dim_check (bool) – Used for xarray data (pandas_format is False). If True, warn the user that the desired epoch is not present in xarray.dims. If False, no warning is raised. (default=True)

Returns:

  • data (pandas.DataFrame or xarray.Dataset) – Class holding file data

  • meta (pysat.Meta) – Class holding file meta data

Raises:
  • KeyError – If epoch/time dimension could not be identified.

  • ValueError – When attempting to load data with more than 2 dimensions or if strict_meta is True and meta data changes across files.

See also

load_netcdf_pandas, load_netcdf_xarray, pandas.to_datetime
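
A minimal read sketch (not from the pysat docstrings; the filename is hypothetical and assumes the file was written by pysat with the default pandas ‘Epoch’ index):

from pysat.utils import io

# Returns the data plus a pysat.Meta object reconstructed from file metadata
data, meta = io.load_netcdf('pysat_testing_20090101.nc', pandas_format=True)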

pysat.utils.io.load_netcdf_pandas(fnames, strict_meta=False, file_format='NETCDF4', epoch_name='Epoch', epoch_unit='ms', epoch_origin='unix', meta_kwargs=None, meta_processor=None, meta_translation=None, drop_meta_labels=None)

Load netCDF-3/4 file produced by pysat in a pandas format.

Parameters:
  • fnames (str or array_like) – Filename(s) to load

  • strict_meta (bool) – Flag that checks if metadata across fnames is the same if True (default=False)

  • file_format (str) – file_format keyword passed to netCDF4 routine. Expects one of ‘NETCDF3_CLASSIC’, ‘NETCDF3_64BIT’, ‘NETCDF4_CLASSIC’, or ‘NETCDF4’. (default=’NETCDF4’)

  • epoch_name (str or NoneType) – Data key for epoch variable. The epoch variable is expected to be an array of integer or float values denoting time elapsed from an origin specified by epoch_origin with units specified by epoch_unit. This epoch variable will be converted to a DatetimeIndex for consistency across pysat instruments. (default=’Epoch’)

  • epoch_unit (str) – The pandas-defined unit of the epoch variable (‘D’, ‘s’, ‘ms’, ‘us’, ‘ns’). (default=’ms’)

  • epoch_origin (str or timestamp-convertable) – Origin of epoch calculation, following convention for pandas.to_datetime. Accepts timestamp-convertable objects, as well as two specific strings for commonly used calendars. These conversions are handled by pandas.to_datetime. If ‘unix’ (or POSIX) time; origin is set to 1970-01-01. If ‘julian’, epoch_unit must be ‘D’, and origin is set to beginning of Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC. (default=’unix’)

  • meta_kwargs (dict or NoneType) – Dict to specify custom Meta initialization or None to use Meta defaults (default=None)

  • meta_processor (function or NoneType) – If not None, a dict containing all of the loaded metadata will be passed to meta_processor which should return a filtered version of the input dict. The returned dict is loaded into a pysat.Meta instance and returned as meta. (default=None)

  • meta_translation (dict or NoneType) – Translation table used to map metadata labels in the file to those used by the returned meta. Keys are labels from file and values are labels in meta. Redundant file labels may be mapped to a single pysat label. If None, will use default_from_netcdf_translation_table. This feature is maintained for file compatibility. To disable all translation, input an empty dict. (default=None)

  • drop_meta_labels (list or NoneType) – List of variable metadata labels that should be dropped. Applied to metadata as loaded from the file. (default=None)

Returns:

  • data (pandas.DataFrame) – Class holding file data

  • meta (pysat.Meta) – Class holding file meta data

Raises:
  • KeyError – If epoch/time dimension could not be identified.

  • ValueError – When attempting to load data with more than 2 dimensions, or if strict_meta is True and meta data changes across files, or if epoch/time dimension could not be identified.

See also

load_netcdf

pysat.utils.io.load_netcdf_xarray(fnames, strict_meta=False, file_format='NETCDF4', epoch_name='time', epoch_unit='ms', epoch_origin='unix', decode_timedelta=False, combine_by_coords=True, meta_kwargs=None, meta_processor=None, meta_translation=None, drop_meta_labels=None, decode_times=False, strict_dim_check=True)

Load netCDF-3/4 file produced by pysat into an xarray Dataset.

Parameters:
  • fnames (str or array_like) – Filename(s) to load.

  • strict_meta (bool) – Flag that checks if metadata across fnames is the same if True. (default=False)

  • file_format (str or NoneType) – file_format keyword passed to netCDF4 routine. Expects one of ‘NETCDF3_CLASSIC’, ‘NETCDF3_64BIT’, ‘NETCDF4_CLASSIC’, or ‘NETCDF4’. (default=’NETCDF4’)

  • epoch_name (str or NoneType) – Data key for epoch variable. The epoch variable is expected to be an array of integer or float values denoting time elapsed from an origin specified by epoch_origin with units specified by epoch_unit. This epoch variable will be converted to a DatetimeIndex for consistency across pysat instruments. (default=’time’)

  • epoch_unit (str) – The pandas-defined unit of the epoch variable (‘D’, ‘s’, ‘ms’, ‘us’, ‘ns’). (default=’ms’)

  • epoch_origin (str or timestamp-convertable) – Origin of epoch calculation, following convention for pandas.to_datetime. Accepts timestamp-convertable objects, as well as two specific strings for commonly used calendars. These conversions are handled by pandas.to_datetime. If ‘unix’ (or POSIX) time; origin is set to 1970-01-01. If ‘julian’, epoch_unit must be ‘D’, and origin is set to beginning of Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC. (default=’unix’)

  • decode_timedelta (bool) – If True, variables with unit attributes that are ‘timelike’ (‘hours’, ‘minutes’, etc) are converted to np.timedelta64. (default=False)

  • combine_by_coords (bool) – Used for xarray data (pandas_format is False) when loading a multi-file dataset. If True, uses xarray.combine_by_coords. If False, uses xarray.combine_nested. (default=True)

  • meta_kwargs (dict or NoneType) – Dict to specify custom Meta initialization or None to use Meta defaults (default=None)

  • meta_processor (function or NoneType) – If not None, a dict containing all of the loaded metadata will be passed to meta_processor which should return a filtered version of the input dict. The returned dict is loaded into a pysat.Meta instance and returned as meta. (default=None)

  • meta_translation (dict or NoneType) – Translation table used to map metadata labels in the file to those used by the returned meta. Keys are labels from file and values are labels in meta. Redundant file labels may be mapped to a single pysat label. If None, will use default_from_netcdf_translation_table. This feature is maintained for compatibility. To disable all translation, input an empty dict. (default=None)

  • drop_meta_labels (list or NoneType) – List of variable metadata labels that should be dropped. Applied to metadata as loaded from the file. (default=None)

  • decode_times (bool or NoneType) – If True, variables with unit attributes that are ‘timelike’ (‘hours’, ‘minutes’, etc) are converted to np.timedelta64 by xarray. If False, then epoch_name will be converted to datetime using epoch_unit and epoch_origin. If None, will be set to False for backwards compatibility. (default=None)

  • strict_dim_check (bool) – Used for xarray data (pandas_format is False). If True, warn the user that the desired epoch is not present in xarray.dims. If False, no warning is raised. (default=True)

Returns:

  • data (xarray.Dataset) – Class holding file data

  • meta (pysat.Meta) – Class holding file meta data

See also

load_netcdf
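
A minimal sketch of calling this routine directly; the filename below is a placeholder for any pysat-written netCDF file on disk.

import pysat

# 'example_pysat_file.nc' is a hypothetical pysat-written netCDF file
data, meta = pysat.utils.io.load_netcdf_xarray('example_pysat_file.nc',
                                               epoch_name='time')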

pysat.utils.io.meta_array_expander(meta_dict)

Expand meta arrays by storing each element with new incremented label.

If meta_dict[variable][‘label’] = [ item1, item2, …, itemn] then the returned dict will contain: meta_dict[variable][‘label0’] = item1, meta_dict[variable][‘label1’] = item2, and so on up to meta_dict[variable][‘labeln-1’] = itemn.

Parameters:

meta_dict (dict) – Keyed by variable name with a dict as a value. Each variable dict is keyed by metadata name and the value is the metadata.

Returns:

meta_dict – Input dict with expanded array elements.

Return type:

dict

Note

pysat.Meta cannot take array-like or list-like data.
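
A short illustration, using a hypothetical variable and label, of how array-valued metadata is flattened into incremented labels.

import pysat

mdict = {'dummy_var': {'label': [1, 2, 3]}}
flat = pysat.utils.io.meta_array_expander(mdict)
# Expect flat['dummy_var'] to contain 'label0', 'label1', and 'label2'
# in place of the original list-valued 'label' entry.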

pysat.utils.io.pysat_meta_to_xarray_attr(xr_data, pysat_meta, epoch_name)

Attach pysat metadata to xarray Dataset as attributes.

Parameters:
  • xr_data (xarray.Dataset) – Xarray Dataset whose attributes will be updated.

  • pysat_meta (dict) – Output starting from Instrument.meta.to_dict() supplying attribute data.

  • epoch_name (str) – Label for datetime index information.
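
A minimal sketch using one of the bundled xarray test instruments; the instrument choice is illustrative only.

import pysat

inst = pysat.Instrument('pysat', 'ndtesting')
inst.load(date=inst.inst_module._test_dates[''][''])

# Attach the pysat metadata to the underlying xarray Dataset attributes
pysat.utils.io.pysat_meta_to_xarray_attr(inst.data, inst.meta.to_dict(),
                                         epoch_name='time')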

pysat.utils.io.remove_netcdf4_standards_from_meta(mdict, epoch_name, labels)

Remove metadata from loaded file using SPDF ISTP/IACG NetCDF standards.

Parameters:
  • mdict (dict) – Contains all of the loaded file’s metadata.

  • epoch_name (str) – Name for epoch or time-index variable. Use ‘’ if no epoch variable.

  • labels (Meta.labels) – Meta.labels instance.

Returns:

mdict – File metadata with unnecessary netCDF4 SPDF information removed.

Return type:

dict

See also

add_netcdf4_standards_to_metadict

Adds SPDF ISTP/IACG netCDF4 metadata.

Note

Removes metadata for epoch_name. Also removes metadata such as ‘Depend_*’, ‘Display_Type’, ‘Var_Type’, ‘Format’, ‘Time_Scale’, ‘MonoTon’, ‘calendar’, and ‘Time_Base’.

pysat.utils.io.return_epoch_metadata(inst, epoch_name)

Create epoch or time-index metadata.

Parameters:
  • inst (pysat.Instrument) – Instrument object with data and metadata.

  • epoch_name (str) – Data key for time-index or epoch data.

Returns:

meta_dict – Dictionary with epoch metadata, keyed by metadata label.

Return type:

dict

pysat.utils.io.xarray_all_vars(data)

Extract all variable names, including dimensions and coordinates.

Parameters:

data (xarray.Dataset) – Dataset to get all variables from.

Returns:

all_vars – List of all data.data_vars, data.dims, and data.coords.

Return type:

list
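
A small sketch with a hypothetical Dataset, showing that dimensions and coordinates are returned alongside data variables.

import xarray as xr

import pysat

ds = xr.Dataset({'dummy': (('time',), [1, 2, 3])}, coords={'time': [0, 1, 2]})
all_vars = pysat.utils.io.xarray_all_vars(ds)
# Expect something like ['dummy', 'time'] (order may vary)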

pysat.utils.io.xarray_vars_no_time(data, time_label='time')

Extract all DataSet variables except time_label dimension.

Parameters:
  • data (xarray.Dataset) – Dataset to get variables from.

  • time_label (str) – Label used within data for time information.

Returns:

vars – All variables, dimensions, and coordinates, except for time_label.

Return type:

list

Raises:

ValueError – If time_label not present in data.

Files

Utilities for file management and parsing file names.

pysat.utils.files.check_and_make_path(path, expand_path=False)

Check if path exists and create it if needed.

Parameters:
  • path (str) – String specifying a directory path without any file names. All directories needed to create the full path will be created.

  • expand_path (bool) – If True, input path will be processed through os.path.expanduser (accounting for ~ and ~user constructs, if $HOME and user are known) and os.path.expandvars (accounting for environment variables)

Returns:

made_dir – True, if new directory made. False, if path already existed.

Return type:

bool

Raises:
  • ValueError – If an invalid path is supplied.

  • RuntimeError – If the input path and internally constructed paths differ.

See also

os.path.expanduser, os.path.expandvars
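
A minimal sketch; the path below is hypothetical and will be created if it does not already exist.

import pysat

made_dir = pysat.utils.files.check_and_make_path('~/pysat_demo/data',
                                                 expand_path=True)
# made_dir is True if the directory was created, False if it already existed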

pysat.utils.files.construct_searchstring_from_format(format_str, wildcard=False)

Parse a file format string and return a string formatted for searching.

Each variable in the string template is replaced with an appropriate number of ‘?’ based upon the provided length of the data.

Parameters:
  • format_str (str) – Provides the naming pattern of the instrument files and the locations of date information so an ordered list may be produced. For example, instrument_{year:04d}{month:02d}{day:02d}_v{version:02d}.cdf

  • wildcard (bool) – If True, replaces each ‘?’ sequence that would normally be returned with a single ‘*’. (default=False)

Returns:

out_dict – An output dict with the following keys:
  • ‘search_string’ (format_str with data to be parsed replaced with ?)
  • ‘keys’ (keys for data to be parsed)
  • ‘type’ (type of data expected for each key to be parsed)
  • ‘lengths’ (string length for data to be parsed)
  • ‘string_blocks’ (the filenames broken into fixed width segments)

Return type:

dict

Raises:

ValueError – If a filename template isn’t provided in format_str

Note

The ‘?’ may be used to indicate a set number of characters for a variable part of the name that need not be extracted, e.g., cnofs_cindi_ivm_500ms_{year:4d}{month:02d}{day:02d}_v??.cdf

A standards compliant filename can be constructed by adding the first element from string_blocks, then the first item in keys, and iterating that alternating pattern until all items are used.

This is the first function employed by pysat.Files.from_os.

If no type is supplied for datetime parameters, int will be used.
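
A short sketch using the template from the example above; the commented output is the expected form of the search string, not a guaranteed value.

import pysat

out = pysat.utils.files.construct_searchstring_from_format(
    'instrument_{year:04d}{month:02d}{day:02d}_v{version:02d}.cdf')
# out['search_string'] should look like 'instrument_????????_v??.cdf'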

pysat.utils.files.get_file_information(paths, root_dir='')

Retrieve system statistics for the input path(s).

Parameters:
  • paths (str or list) – Full pathnames of files to get attribute information.

  • root_dir (str) – Common root path shared by all paths, if any. (default=’’)

Returns:

file_info – Keyed by file attribute, which uses names that mirror or are expanded upon those used by os.stat. Each attribute maps to a list of values for each file in paths.

Return type:

dict

See also

os.stat

pysat.utils.files.parse_delimited_filenames(files, format_str, delimiter)

Extract specified info from a list of files using a delimiter.

Parses each filename using the supplied delimiter. The function does not require every parsed item to be a template variable, and more than one variable may be contained within a parsed section. Thus, the main practical difference from parse_fixed_width_filenames is broader support for the ‘*’ wildcard within format_str. Overuse of the ‘*’ wildcard increases the probability of false positive matches if multiple instrument file sets share a directory.

Parameters:
  • files (list) – List of files, typically provided by pysat.utils.files.search_local_system_formatted_filename.

  • format_str (str) – Provides the naming pattern of the instrument files and the locations of date information so an ordered list may be produced. Supports all provided string formatting codes though only ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘version’, ‘revision’, and ‘cycle’ will be used for time and sorting information. For example, *_{year:4d}_{month:02d}_{day:02d}_*_v{version:02d}_*.cdf

  • delimiter (str) – Delimiter string upon which files will be split (e.g., ‘.’)

Returns:

stored – Information parsed from filenames that includes: ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘version’, ‘revision’, and ‘cycle’, as well as any other user provided template variables. Also includes files, an input list of files, and format_str.

Return type:

collections.OrderedDict

Note

The ‘*’ wildcard is supported when leading, trailing, or wholly contained between delimiters, such as ‘data_name-{year:04d}-*-{day:02d}.txt’, or ‘*-{year:04d}-*-{day:02d}’, where ‘-’ is the delimiter. There cannot be a mixture of a template variable and ‘*’ without a delimiter in between, unless the ‘*’ occurs after the variables. The ‘*’ should not be used to replace the delimited character in the filename.

pysat.utils.files.parse_fixed_width_filenames(files, format_str)

Extract specified info from a list of files with a fixed name width.

Parameters:
  • files (list) – List of files, typically provided by pysat.utils.files.search_local_system_formatted_filename.

  • format_str (str) – Provides the naming pattern of the instrument files and the locations of date information so an ordered list may be produced. Supports all provided string formatting codes though only ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘version’, ‘revision’, and ‘cycle’ will be used for time and sorting information. For example, instrument-{year:4d}_{month:02d}-{day:02d}_v{version:02d}.cdf, or *-{year:4d}_{month:02d}hithere{day:02d}_v{version:02d}.cdf

Returns:

stored – Information parsed from filenames that includes: ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘version’, ‘revision’, and ‘cycle’, as well as any other user provided template variables. Also includes files, an input list of files, and format_str.

Return type:

collections.OrderedDict

Note

The function uses the lengths of the fixed characters within format_str, as well as the supplied lengths for template variables, to determine where to parse out information. Thus, support for the wildcard ‘*’ is limited to locations before the first template variable.

pysat.utils.files.process_parsed_filenames(stored, two_digit_year_break=None)

Create a Files pandas Series of filenames from a formatted dict.

Parameters:
  • stored (collections.OrderedDict) – Ordered dictionary produced by parse_fixed_width_filenames or parse_delimited_filenames, containing date, time, version, and other information extracted from the filenames.

  • two_digit_year_break (int or NoneType) – If filenames only store two digits for the year, then ‘1900’ will be added for years >= two_digit_year_break and ‘2000’ will be added for years < two_digit_year_break. If None, then four-digit years are assumed. (default=None)

Returns:

Series, indexed by datetime, with file strings

Return type:

pds.Series

Note

If two files have the same date and time information in the filename then the file with the higher version/revision/cycle is used. Series returned only has one file per datetime. Version is required for this filtering, revision and cycle are optional.
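
A minimal sketch chaining parse_fixed_width_filenames and process_parsed_filenames; the filenames and template are hypothetical.

import pysat

files = ['inst_20190101_v01.cdf', 'inst_20190102_v01.cdf']
format_str = 'inst_{year:04d}{month:02d}{day:02d}_v{version:02d}.cdf'

stored = pysat.utils.files.parse_fixed_width_filenames(files, format_str)
series = pysat.utils.files.process_parsed_filenames(stored)
# series is a pandas Series of filenames indexed by datetime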

pysat.utils.files.search_local_system_formatted_filename(data_path, search_str)

Search the local file system for files matching the formatted search string.

Parameters:
  • data_path (str) – Top level directory to search files for. This directory is provided by pysat to the instrument_module.list_files functions as data_path.

  • search_str (str) – String used to search for local files. For example, cnofs_cindi_ivm_500ms_????????_v??.cdf or inst-name-*-v??.cdf Typically this input is provided by files.construct_searchstring_from_format.

Returns:

files – list of files matching the specified file format

Return type:

list

Note

The use of ?s (1 ? per character) rather than the full wildcard * provides a more specific filename search string that limits the false positive rate.
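
A minimal sketch; the data path is hypothetical and the search string follows the example above.

import pysat

files = pysat.utils.files.search_local_system_formatted_filename(
    '/path/to/data', 'cnofs_cindi_ivm_500ms_????????_v??.cdf')
# files is a list of matching filenames, empty if nothing matches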

pysat.utils.files.update_data_directory_structure(new_template, test_run=True, full_breakdown=False, remove_empty_dirs=False)

Update pysat data directory structure to match supplied template.

Translates all of pysat’s managed science files to a new directory structure. By default, pysat uses the template string stored in pysat.params[‘directory_format’] to organize files. This method makes it possible to transition an existing pysat installation so it works with the supplied new template.

Parameters:
  • new_template (str) – New directory template string. The default value for pysat is os.path.join('{platform}', '{name}', '{tag}', '{inst_id}')

  • test_run (bool) – If True, a printout of all proposed changes will be made, but the directory changes will not be enacted. (default=True)

  • full_breakdown (bool) – If True, a full path for every file is printed to terminal. (default=False)

  • remove_empty_dirs (bool) – If True, all directories that had pysat.Instrument data moved to another location and are now empty are deleted. Traverses the directory chain up to the top-level directories in pysat.params[‘data_dirs’]. (default=False)

Note

After updating the data directory structure, users should nominally assign new_template as the directory format via

pysat.params['directory_format'] = new_template
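
A hedged sketch of previewing a reorganization before applying it; with test_run=True the proposed changes are only printed and no files are moved.

import os

import pysat

new_template = os.path.join('{platform}', '{name}', '{tag}', '{inst_id}')
pysat.utils.files.update_data_directory_structure(new_template, test_run=True)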

Registry

pysat user module registry utilities.

This module allows pysat to provide direct access to external or custom instrument modules by maintaining information about these instrument modules.

Examples

Instrument support modules must be registered before use. This may be done individually or for a collection of Instruments at once. For example, assume there is an implementation for myInstrument in the module my.package.myInstrument with platform and name attributes ‘myplatform’ and ‘myname’. Such an instrument may be registered with

registry.register(['my.package.myInstrument'])

The full module name “my.package.myInstrument” will be registered in pysat.params[‘user_modules’] and stored as a dict of dicts keyed by platform and name.

Once registered, subsequent calls to Instrument may use the platform and name string identifiers.

Instrument('myplatform', 'myname')

A full suite of instrument support modules may be registered at once using

# General form where my.package contains a collection of
# submodules to support Instrument data sets.
registry.register_by_module(my.package)

# Register published packages from pysat team
import pysatSpaceWeather
registry.register_by_module(pysatSpaceWeather.instruments)

import pysatNASA
registry.register_by_module(pysatNASA.instruments)

import pysatModels
registry.register_by_module(pysatModels.models)
pysat.utils.registry.load_saved_modules()

Load registered pysat.Instrument modules.

Returns:

instrument module strings are keyed by platform then name

Return type:

dict of dicts

pysat.utils.registry.register(module_names, overwrite=False)

Register a user pysat.Instrument module by name.

Enables instantiation of a third-party Instrument module using

inst = pysat.Instrument(platform, name, tag=tag, inst_id=inst_id)
Parameters:
  • module_names (list-like of str) – specify package name and instrument modules

  • overwrite (bool) – If True, an existing registration will be updated with the new module information. (default=False)

Raises:

ValueError – If a new module is input with a platform and name that is already associated with a registered module and the overwrite flag is set to False.

Warning

Registering a module that contains code other than pysat instrument files could result in unexpected consequences.

Note

Modules should be importable using

from my.package.name import my_instrument

Module names do not have to follow the pysat platform_name naming convention.

Currently registered modules may be found at

pysat.params['user_modules']

which is stored as a dict of dicts keyed by platform and name.

Examples

from pysat import Instrument
from pysat.utils import registry

registry.register(['my.package.name.myInstrument'])

testInst = Instrument(platform, name)
pysat.utils.registry.register_by_module(module, overwrite=False)

Register all sub-modules attached to input module.

Enables instantiation of a third-party Instrument module using

inst = pysat.Instrument(platform, name)
Parameters:
  • module (Python module) – Module with one or more pysat.Instrument support modules attached as sub-modules to the input module

  • overwrite (bool) – If True, an existing registration will be updated with the new module information. (default=False)

Raises:

ValueError – If platform and name associated with a module are already registered

Note

Gets a list of sub-modules by using the __all__ attribute, defined in the module’s __init__.py

Examples

import pysat
import pysatModels
pysat.utils.registry.register_by_module(pysatModels.models)
pysat.utils.registry.remove(platforms, names)

Remove module from registered user modules.

Parameters:
  • platforms (list-like of str) – Platform identifiers to remove

  • names (list-like of str) – Name identifiers, paired with platforms, to remove. If the names element paired with the platform element is None, then all instruments under the specified platform will be removed. Should be the same type as platforms.

Raises:

ValueError – If platform and/or name are not currently registered

Note

Currently registered user modules are available at pysat.params[‘user_modules’]

Examples

platforms = ['platform1', 'platform2']
names = ['name1', 'name2']

# remove all instruments with platform=='platform1'
registry.remove(['platform1'], [None])
# remove all instruments with platform 'platform1' or 'platform2'
registry.remove(platforms, [None, None])
# remove all instruments with 'platform1', and individual instrument
# for 'platform2', 'name2'
registry.remove(platforms, [None, 'name2'])
# remove 'platform1', 'name1', as well as 'platform2', 'name2'
registry.remove(platforms, names)
pysat.utils.registry.store()

Store current registry onto disk.

Time

Date and time handling utilities.

pysat.utils.time.calc_freq(index)

Determine the frequency for a time index.

Parameters:

index (array-like) – Datetime list, array, or Index

Returns:

freq – Frequency string as described in Pandas Offset Aliases

Return type:

str

Note

Calculates the minimum time difference and sets that as the frequency.

To reduce the amount of calculations done, the returned frequency is either in seconds (if no sub-second resolution is found) or nanoseconds.

See also

pds.offsets.DateOffset
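
A short sketch with a regularly spaced index; the commented frequency is the expected form of the output.

import pandas as pds

import pysat

index = pds.date_range('2019-01-01', periods=10, freq='10s')
freq = pysat.utils.time.calc_freq(index)
# Expect a seconds-based alias such as '10S'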

pysat.utils.time.calc_res(index, use_mean=False)

Determine the resolution for a time index.

Parameters:
  • index (array-like) – Datetime list, array, or Index

  • use_mean (bool) – Use the minimum time difference if False, use the mean time difference if True (default=False)

Returns:

res_sec – Resolution value in seconds

Return type:

float

Raises:

ValueError – If index is too short to calculate a time resolution

pysat.utils.time.create_date_range(start, stop, freq='D')

Create array of datetime objects using input freq from start to stop.

Parameters:
  • start (dt.datetime or list-like of dt.datetime) – The beginning of the date range. Supports list, tuple, or ndarray of start dates.

  • stop (dt.datetime or list-like of dt.datetime) – The end of the date range. Supports list, tuple, or ndarray of stop dates.

  • freq (str) – The frequency of the desired output. Codes correspond to pandas date_range codes: ‘D’ daily, ‘M’ monthly, ‘S’ secondly

Returns:

season – Range of dates over desired time with desired frequency.

Return type:

pds.date_range
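
A minimal sketch creating a daily range between a single start and stop date.

import datetime as dt

import pysat

season = pysat.utils.time.create_date_range(dt.datetime(2019, 1, 1),
                                            dt.datetime(2019, 1, 31),
                                            freq='D')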

pysat.utils.time.create_datetime_index(year=None, month=None, day=None, uts=None)

Create a timeseries index using supplied date and time.

Parameters:
  • year (array_like or NoneType) – Array of year values as np.int (default=None)

  • month (array_like or NoneType) – Array of month values as np.int. Leave None if using day for day of year. (default=None)

  • day (array_like or NoneType) – Array of number of days as np.int. If month=None then value interpreted as day of year, otherwise, day of month. (default=None)

  • uts (array-like or NoneType) – Array of UT seconds as np.float64 values (default=None)

Return type:

Pandas timeseries index.

Note

Leap seconds have no meaning here.
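
A small sketch; because month is omitted, day is interpreted as day of year.

import pysat

index = pysat.utils.time.create_datetime_index(year=[2019, 2019, 2019],
                                               day=[1, 1, 2],
                                               uts=[0.0, 43200.0, 0.0])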

pysat.utils.time.datetime_to_dec_year(dtime)

Convert datetime timestamp to a decimal year.

Parameters:

dtime (dt.datetime) – Datetime timestamp

Returns:

year – Year with decimal containing time increments of less than a year

Return type:

float

pysat.utils.time.filter_datetime_input(date)

Create a datetime object that only includes year, month, and day.

Parameters:

date (NoneType, array-like, or datetime) – Single or sequence of datetime inputs

Returns:

out_date – NoneType input yields NoneType output, array-like input yields a list of datetimes, and a datetime object yields a datetime object. All datetime output excludes the sub-daily temporal increments (keeps only date information).

Return type:

NoneType, datetime, or array-like

Note

Checks for timezone information not in UTC

pysat.utils.time.freq_to_res(freq)

Convert a frequency string to a resolution value in seconds.

Parameters:

freq (str) – Frequency string as described in Pandas Offset Aliases

Returns:

res_sec – Resolution value in seconds

Return type:

np.float64

See also

pds.offsets.DateOffset

pysat.utils.time.getyrdoy(date)

Return a tuple of year, day of year for a supplied datetime object.

Parameters:

date (datetime.datetime) – Datetime object

Returns:

  • year (int) – Integer year

  • doy (int) – Integer day of year

Raises:

AttributeError – If input date does not have toordinal method

pysat.utils.time.parse_date(str_yr, str_mo, str_day, str_hr='0', str_min='0', str_sec='0', century=2000)

Convert string dates to dt.datetime.

Parameters:
  • str_yr (str) – String containing the year (2 or 4 digits)

  • str_mo (str) – String containing month digits

  • str_day (str) – String containing day of month digits

  • str_hr (str) – String containing the hour of day (default=’0’)

  • str_min (str) – String containing the minutes of hour (default=’0’)

  • str_sec (str) – String containing the seconds of minute (default=’0’)

  • century (int) – Century, only used if str_yr is a 2-digit year (default=2000)

Returns:

out_date – datetime object

Return type:

dt.datetime

Raises:

ValueError – If any input results in an unrealistic datetime object value
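
A short sketch with a two-digit year; with the default century of 2000 the result is expected to be 2019-01-01.

import pysat

out_date = pysat.utils.time.parse_date('19', '01', '01')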

pysat.utils.time.today()

Obtain today’s date (UTC), with no hour, minute, second, etc.

Returns:

today_utc – Today’s date in UTC

Return type:

datetime

Testing

Utilities to perform common evaluations.

pysat.utils.testing.assert_hasattr(obj, attr_name)

Provide useful info if object is missing a required attribute.

Parameters:
  • obj (object) – Object to check

  • attr_name (str) – Name of required attribute that must be present in obj

Raises:

AssertionError – If obj does not have attribute attr_name

pysat.utils.testing.assert_isinstance(obj, obj_type)

Provide useful info if object is the wrong type.

Parameters:
  • obj (object) – Object to check

  • obj_type (type) – Required type of object

Raises:

AssertionError – If obj is not type obj_type

pysat.utils.testing.assert_list_contains(small_list, big_list, test_nan=False, test_case=True)

Assert all elements of one list exist within the other list.

Parameters:
  • small_list (list) – List whose values must all be present within big_list

  • big_list (list) – List that must contain all the values in small_list

  • test_nan (bool) – Test the lists for the presence of NaN values

  • test_case (bool) – Requires strings to be the same case when testing

Raises:

AssertionError – If a small_list value is missing from big_list

pysat.utils.testing.assert_lists_equal(list1, list2, test_nan=False, test_case=True)

Assert that the lists contain the same elements.

Parameters:
  • list1 (list) – Input list one

  • list2 (list) – Input list two

  • test_nan (bool) – Test the lists for the presence of NaN values

  • test_case (bool) – Requires strings to be the same case when testing

Raises:

AssertionError – If a list1 value is missing from list2 or list lengths are unequal

Note

This test does not require that the lists have the same elements in the same order, and so is also a good test for keys.

pysat.utils.testing.eval_bad_input(func, error, err_msg, input_args=None, input_kwargs=None)

Evaluate bad function or method input.

Parameters:
  • func (function, method, or class) – Function, class, or method to be evaluated

  • error (class) – Expected error or exception

  • err_msg (str) – Expected error message

  • input_args (list or NoneType) – Input arguments or None for no input arguments (default=None)

  • input_kwargs (dict or NoneType) – Input keyword arguments or None for no input kwargs (default=None)

Raises:
  • AssertionError – If unexpected error message is returned

  • Exception – If error or exception of unexpected type is returned, it is raised
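
A minimal sketch using a hypothetical helper function to show the calling pattern; 'division by zero' is the standard message raised by Python in this case.

import pysat


def divide(numerator, denominator):
    # Hypothetical function used only to demonstrate eval_bad_input
    return numerator / denominator


pysat.utils.testing.eval_bad_input(divide, ZeroDivisionError,
                                   'division by zero', input_args=[1, 0])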

pysat.utils.testing.eval_warnings(warns, check_msgs, warn_type=<class 'DeprecationWarning'>)

Evaluate warnings by category and message.

Parameters:
  • warns (list) – List of warnings.WarningMessage objects

  • check_msgs (list) – List of strings containing the expected warning messages

  • warn_type (type) – Type or list-like for the warning messages (default=DeprecationWarning)

Raises:

AssertionError – If warning category doesn’t match type or an expected message is missing
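
A short sketch that records a deliberately raised DeprecationWarning and then checks it; the warning message is illustrative.

import warnings

import pysat

with warnings.catch_warnings(record=True) as war:
    warnings.simplefilter('always')
    warnings.warn('this method is deprecated', DeprecationWarning)

pysat.utils.testing.eval_warnings(war, ['this method is deprecated'])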

pysat.utils.testing.nan_equal(value1, value2)

Determine if values are equal or are both NaN.

Parameters:
  • value1 (scalar-like) – Value of any type that can be compared without iterating

  • value2 (scalar-like) – Another value of any type that can be compared without iterating

Returns:

is_equal – True if both values are equal or NaN, False if they are not

Return type:

bool

Instrument Template

Template for a pysat.Instrument support file.

Modify this file as needed when adding a new Instrument to pysat.

This is a good area to introduce the instrument, provide background on the mission, operations, instrumentation, and measurements.

Also a good place to provide contact information. This text will be included in the pysat API documentation.

Properties

platform

List platform string here

name

List name string here

tag

List supported tag strings here

inst_id

List supported inst_id strings here

Note

  • Optional section, remove if no notes

Warning

  • Optional section, remove if no warnings

  • Two blank lines needed afterward for proper formatting

Examples

Example code can go here
pysat.instruments.templates.template_instrument.clean(self)

Return platform_name data cleaned to the specified level.

Cleaning level is specified in inst.clean_level and pysat will accept user input for several strings. The clean_level is specified at instantiation of the Instrument object, though it may be updated to a more stringent level and re-applied after instantiation. The clean method is applied after the preprocess routine every time data is loaded.

Note

  • ‘clean’ All parameters are good, suitable for scientific studies

  • ‘dusty’ Most parameters are good, requires instrument familiarity

  • ‘dirty’ There are data areas that have issues, use with caution

  • ‘none’ No cleaning applied, routine not called in this case.

pysat.instruments.templates.template_instrument.download(date_array, tag, inst_id, data_path=None, user=None, password=None, **kwargs)

Download platform_name data from the remote repository.

This routine is called as needed by pysat. It is not intended for direct user interaction.

Parameters:
  • date_array (array-like) – list of datetimes to download data for. The sequence of dates need not be contiguous.

  • tag (str) – Tag identifier used for particular dataset. This input is provided by pysat. (default=’’)

  • inst_id (str) – Satellite ID string identifier used for particular dataset. This input is provided by pysat. (default=’’)

  • data_path (str or NoneType) – Path to directory to download data to. (default=None)

  • user (str or NoneType (OPTIONAL)) – User string input used for download. Provided by user and passed via pysat. If an account is required for downloads, this routine must raise an error if user is not supplied. (default=None)

  • password (str or NoneType (OPTIONAL)) – Password for data download. (default=None)

  • custom_keywords (placeholder (OPTIONAL)) – Additional keywords supplied by user when invoking the download routine attached to a pysat.Instrument object are passed to this routine. Use of custom keywords here is discouraged.

pysat.instruments.templates.template_instrument.init(self)

Initialize the Instrument object with instrument specific values.

Runs once upon instantiation. Object modified in place. Use this to set the acknowledgements and references.

pysat.instruments.templates.template_instrument.list_files(tag='', inst_id='', data_path='', format_str=None)

Produce a list of files corresponding to PLATFORM/NAME.

This routine is invoked by pysat and is not intended for direct use by the end user. Arguments are provided by pysat.

Parameters:
  • tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • inst_id (str) – Instrument ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • data_path (str) – Full path to directory containing files to be loaded. This is provided by pysat. The user may specify their own data path at Instrument instantiation and it will appear here. (default=’’)

  • format_str (str) – String template used to parse the datasets filenames. If a user supplies a template string at Instrument instantiation then it will appear here, otherwise defaults to None. (default=None)

Returns:

Series of filename strings, including the path, indexed by datetime.

Return type:

pandas.Series

Examples

If a filename is SPORT_L2_IVM_2019-01-01_v01r0000.NC then the template is

'SPORT_L2_IVM_{year:04d}-{month:02d}-{day:02d}_v{version:02d}r{revision:04d}.NC'

Note

The returned Series should not have any duplicate datetimes. If there are multiple versions of a file the most recent version should be kept and the rest discarded. This routine uses the pysat.Files.from_os constructor, thus the returned files are up to pysat specifications.

Multiple data levels may be supported via the tag input string, and multiple instruments via the inst_id string.

pysat.instruments.templates.template_instrument.list_remote_files(tag, inst_id, user=None, password=None)

Return a Pandas Series of every file for chosen remote data.

This routine is intended to be used by pysat instrument modules supporting a particular NASA CDAWeb dataset.

Parameters:
  • tag (str) – Denotes type of file to load. Accepted types are <tag strings>.

  • inst_id (str) – Specifies the satellite or instrument ID. Accepted types are <inst_id strings>.

  • user (str or NoneType) – Username to be passed along to resource with relevant data. (default=None)

  • password (str or NoneType) – User password to be passed along to resource with relevant data. (default=None)

Note

If defined, the expected return variable is a pandas.Series formatted for the Files class (pysat._files.Files) containing filenames and indexed by date and time

pysat.instruments.templates.template_instrument.load(fnames, tag='', inst_id='', custom_keyword=None)

Load platform_name data and meta data.

This routine is called as needed by pysat. It is not intended for direct user interaction.

Parameters:
  • fnames (array-like) – iterable of filename strings, full path, to data files to be loaded. This input is nominally provided by pysat itself.

  • tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself, which supplies ‘’ as the default tag unless one is specified by the user at Instrument instantiation. (default=’’)

  • inst_id (str) – Satellite ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • custom_keyword (type to be set) – Developers may include any custom keywords, with default values defined in the method signature. This is included here as a place holder and should be removed.

Returns:

  • data (pds.DataFrame or xr.Dataset) – Data to be assigned to the pysat.Instrument.data object.

  • mdata (pysat.Meta) – Pysat Meta data for each data variable.

Note

  • Any additional keyword arguments passed to pysat.Instrument upon instantiation or via load that are defined above will be passed along to this routine.

  • When using pysat.utils.load_netcdf4 for xarray data, pysat will use decode_timedelta=False to prevent automated conversion of data to np.timedelta64 objects if the units attribute is time-like (‘hours’, ‘minutes’, etc). This can be added as a custom keyword if timedelta conversion is desired.

Examples

inst = pysat.Instrument('ucar', 'tiegcm')
inst.load(2019, 1)
pysat.instruments.templates.template_instrument.preprocess(self)

Perform standard preprocessing.

This routine is automatically applied to the Instrument object on every load by the pysat nanokernel (first in queue). Object modified in place.

General Instruments

The following Instrument modules support I/O and analysis in pysat.

pysat_ndtesting

Produces fake instrument data for testing.

pysat.instruments.pysat_ndtesting.load(fnames, tag='', inst_id='', sim_multi_file_right=False, sim_multi_file_left=False, root_date=None, non_monotonic_index=False, non_unique_index=False, start_time=None, num_samples=864, sample_rate='100S', test_load_kwarg=None, max_latitude=90.0, num_extra_time_coords=0)

Load the test files.

Parameters:
  • fnames (list) – List of filenames.

  • tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • inst_id (str) – Instrument ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • sim_multi_file_right (bool) – Adjusts date range to be 12 hours in the future or twelve hours beyond root_date. (default=False)

  • sim_multi_file_left (bool) – Adjusts date range to be 12 hours in the past or twelve hours before root_date. (default=False)

  • root_date (dt.datetime or NoneType) – Optional central date, uses _test_dates if not specified. (default=None)

  • non_monotonic_index (bool) – If True, time index will be non-monotonic (default=False)

  • non_unique_index (bool) – If True, time index will be non-unique (default=False)

  • start_time (dt.timedelta or NoneType) – Offset time of start time since midnight UT. If None, instrument data will begin at midnight. (default=None)

  • num_samples (int) – Maximum number of times to generate. Data points will not go beyond the current day. (default=864)

  • sample_rate (str) – Frequency of data points, using pandas conventions. (default=’100S’)

  • test_load_kwarg (any) – Keyword used for pysat unit testing to ensure that functionality for custom keywords defined in instrument support functions is working correctly. (default=None)

  • max_latitude (float) – Latitude simulated as max_latitude * cos(theta(t)), where theta is a linear periodic signal bounded by [0, 2 * pi) (default=90.0)

  • num_extra_time_coords (int) – Number of extra time coordinates to include. (default=0)

Returns:

  • data (xr.Dataset) – Testing data

  • meta (pysat.Meta) – Testing metadata
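
A hedged sketch of using this module through the Instrument interface; custom keywords such as num_samples are passed through to this load routine.

import pysat

inst = pysat.Instrument('pysat', 'ndtesting', num_samples=100)
inst.load(date=inst.inst_module._test_dates[''][''])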

pysat_netcdf

General Instrument for loading pysat-written netCDF files.

Properties

platform

‘pysat’, will be updated if file contains a platform attribute

name

‘netcdf’, will be updated if file contains a name attribute

tag

‘’, will be updated if file contains a tag attribute

inst_id

‘’, will be updated if file contains an inst_id attribute

Note

Only tested against pysat created netCDF files

Examples

import pysat

# Load a test Instrument
inst = pysat.Instrument("pysat", "testing")
inst.load(date=inst.inst_module._test_dates[''][''])

# Create a NetCDF file
fname = "test_pysat_file_%Y%j.nc"
inst.to_netcdf4(fname=inst.date.strftime(fname))

# Load the NetCDF file
file_inst = pysat.Instrument(
    "pysat", "netcdf", temporary_file_list=True, directory_format="./",
    file_format="test_pysat_file_{year:04}{day:03}.nc")
file_inst.load(date=inst.inst_module._test_dates[''][''])
pysat.instruments.pysat_netcdf.clean(self)

Clean the file data.

pysat.instruments.pysat_netcdf.download(date_array, tag, inst_id, data_path=None)

Download data from the remote repository; not supported.

Parameters:
  • date_array (array-like) – list of datetimes to download data for. The sequence of dates need not be contiguous.

  • tag (str) – Tag identifier used for particular dataset. This input is provided by pysat. (default=’’)

  • inst_id (str) – Satellite ID string identifier used for particular dataset. This input is provided by pysat. (default=’’)

  • data_path (str or NoneType) – Path to directory to download data to. (default=None)

pysat.instruments.pysat_netcdf.init(self, pandas_format=True)

Initialize the Instrument object with instrument specific values.

pysat.instruments.pysat_netcdf.load(fnames, tag='', inst_id='', strict_meta=False, file_format='NETCDF4', epoch_name=None, epoch_unit='ms', epoch_origin='unix', pandas_format=True, decode_timedelta=False, meta_kwargs=None, meta_processor=None, meta_translation=None, drop_meta_labels=None, decode_times=None)

Load pysat-created NetCDF data and meta data.

Parameters:
  • fnames (array-like) – iterable of filename strings, full path, to data files to be loaded. This input is nominally provided by pysat itself.

  • tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • inst_id (str) – Instrument ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • strict_meta (bool) – Flag that checks if metadata across fnames is the same if True (default=False)

  • file_format (str) – file_format keyword passed to netCDF4 routine. Expects one of ‘NETCDF3_CLASSIC’, ‘NETCDF3_64BIT’, ‘NETCDF4_CLASSIC’, or ‘NETCDF4’. (default=’NETCDF4’)

  • epoch_name (str or NoneType) – Data key for epoch variable. The epoch variable is expected to be an array of integer or float values denoting time elapsed from an origin specified by epoch_origin with units specified by epoch_unit. This epoch variable will be converted to a DatetimeIndex for consistency across pysat instruments. (default=None)

  • epoch_unit (str) – The pandas-defined unit of the epoch variable (‘D’, ‘s’, ‘ms’, ‘us’, ‘ns’). (default=’ms’)

  • epoch_origin (str or timestamp-convertible) – Origin of epoch calculation, following the convention for pandas.to_datetime. Accepts timestamp-convertible objects, as well as two specific strings for commonly used calendars; these conversions are handled by pandas.to_datetime. If ‘unix’ (or POSIX), the origin is set to 1970-01-01. If ‘julian’, epoch_unit must be ‘D’ and the origin is set to the beginning of the Julian Calendar; Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC. (default=’unix’)

  • pandas_format (bool) – Flag specifying if data is stored in a pandas DataFrame (True) or xarray Dataset (False). (default=True)

  • decode_timedelta (bool) – Used for xarray data (pandas_format is False). If True, variables with unit attributes that are ‘timelike’ (‘hours’, ‘minutes’, etc) are converted to np.timedelta64. (default=False)

  • meta_kwargs (dict or NoneType) – Dict to specify custom Meta initialization or None to use Meta defaults (default=None)

  • meta_processor (function or NoneType) – If not None, a dict containing all of the loaded metadata will be passed to meta_processor which should return a filtered version of the input dict. The returned dict is loaded into a pysat.Meta instance and returned as meta. (default=None)

  • meta_translation (dict or NoneType) – Translation table used to map metadata labels in the file to those used by the returned meta. Keys are labels from file and values are labels in meta. Redundant file labels may be mapped to a single pysat label. If None, will use default_from_netcdf_translation_table. This feature is maintained for file compatibility. To disable all translation, input an empty dict. (default=None)

  • drop_meta_labels (list or NoneType) – List of variable metadata labels that should be dropped. Applied to metadata as loaded from the file. (default=None)

  • decode_times (bool or NoneType) – If True, variables with unit attributes that are ‘timelike’ (‘hours’, ‘minutes’, etc) are converted to np.timedelta64 by xarray. If False, then epoch_name will be converted to datetime using epoch_unit and epoch_origin. If None, will be set to False for backwards compatibility. For xarray only. (default=None)

Returns:

  • data (pds.DataFrame or xr.Dataset) – Data to be assigned to the pysat.Instrument.data object.

  • mdata (pysat.Meta) – Pysat Meta data for each data variable.

pysat.instruments.pysat_netcdf.preprocess(self)

Extract Instrument attrs from file attrs loaded to Meta.header.

Test Instruments

The following Instrument modules support unit and integration testing for packages that depend on pysat.

pysat_testing

Produces fake instrument data for testing.

pysat.instruments.pysat_testing.load(fnames, tag='', inst_id='', sim_multi_file_right=False, sim_multi_file_left=False, root_date=None, non_monotonic_index=False, non_unique_index=False, start_time=None, num_samples=86400, test_load_kwarg=None, max_latitude=90.0)

Load the test files.

Parameters:
  • fnames (list) – List of filenames.

  • tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • inst_id (str) – Instrument ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • sim_multi_file_right (bool) – Adjusts date range to be 12 hours in the future or twelve hours beyond root_date. (default=False)

  • sim_multi_file_left (bool) – Adjusts date range to be 12 hours in the past or twelve hours before root_date. (default=False)

  • root_date (dt.datetime or NoneType) – Optional central date, uses _test_dates if not specified. (default=None)

  • non_monotonic_index (bool) – If True, time index will be non-monotonic (default=False)

  • non_unique_index (bool) – If True, time index will be non-unique (default=False)

  • start_time (dt.timedelta or NoneType) – Offset time of start time since midnight UT. If None, instrument data will begin at midnight. (default=None)

  • num_samples (int) – Maximum number of times to generate. Data points will not go beyond the current day. (default=86400)

  • test_load_kwarg (any) – Keyword used for pysat unit testing to ensure that functionality for custom keywords defined in instrument support functions is working correctly. (default=None)

  • max_latitude (float) – Latitude simulated as max_latitude * cos(theta(t)), where theta is a linear periodic signal bounded by [0, 2 * pi) (default=90.0)

Returns:

  • data (pds.DataFrame) – Testing data

  • meta (pysat.Meta) – Metadata

pysat_testmodel

Produces fake instrument data for testing.

pysat.instruments.pysat_testmodel.load(fnames, tag='', inst_id='', start_time=None, num_samples=96, test_load_kwarg=None)

Load the test files.

Parameters:
  • fnames (list) – List of filenames.

  • tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • inst_id (str) – Instrument ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)

  • start_time (dt.timedelta or NoneType) – Offset time of start time since midnight UT. If None, instrument data will begin at midnight. (default=None)

  • num_samples (int) – Maximum number of times to generate. Data points will not go beyond the current day. (default=96)

  • test_load_kwarg (any) – Keyword used for pysat unit testing to ensure that functionality for custom keywords defined in instrument support functions is working correctly. (default=None)

Returns:

  • data (xr.Dataset) – Testing data

  • meta (pysat.Meta) – Metadata

Test Constellations

The following Constellation modules support unit and integration testing for packages that depend on pysat.

Testing

Create a constellation with 5 testing instruments.

pysat.constellations.testing.instruments

List of pysat.Instrument objects

Type:

list

Note

Each instrument has a different sample size to test the common_index
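
A minimal sketch, assuming the const_module keyword of pysat.Constellation accepts this bundled module.

import pysat

const = pysat.Constellation(const_module=pysat.constellations.testing)
# const.instruments holds the five testing Instruments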

Single Test

Create a constellation with one testing instrument.

pysat.constellations.single_test.instruments

List of pysat.Instrument objects

Type:

list

Testing Empty

Create an empty constellation for testing.

pysat.constellations.testing_empty.instruments

List of pysat.Instrument objects

Type:

list