API
Instrument
- class pysat.Instrument(platform=None, name=None, tag='', inst_id='', clean_level=None, update_files=None, pad=None, orbit_info=None, inst_module=None, data_dir='', directory_format=None, file_format=None, temporary_file_list=False, strict_time_flag=True, ignore_empty_files=False, meta_kwargs=None, custom=None, **kwargs)
Download, load, manage, modify and analyze science data.
- Parameters:
platform (str or NoneType) – Name of instrument platform. If None and name is also None, creates an Instrument with empty platform and name attributes. (default=None)
name (str or NoneType) – Name of instrument. If None and platform is also None, creates an Instrument with empty platform and name attributes. (default=None)
tag (str) – Identifies particular subset of instrument data (default=’’)
inst_id (str) – Secondary level of identification, such as spacecraft within a constellation platform (default=’’)
clean_level (str or NoneType) – Level of data quality. If not provided, will default to the setting in pysat.params[‘clean_level’]. (default=None)
update_files (bool or NoneType) – If True, immediately query filesystem for instrument files and store. If False, the local files are presumed to be the same. By default, this setting will be obtained from pysat.params (default=None)
pad (pandas.DateOffset, dict, or NoneType) – Length of time to pad the beginning and end of loaded data for time-series processing. Extra data is removed after applying all custom functions. Dictionary, if supplied, is simply passed to pandas DateOffset. (default=None)
orbit_info (dict or NoneType) – Orbit information, {‘index’: index, ‘kind’: kind, ‘period’: period}. See pysat.Orbits for more information. (default=None)
inst_module (module or NoneType) – Provide instrument module directly, takes precedence over platform/name. (default=None)
data_dir (str) – Directory without sub-directory variables that allows one to bypass the directories provided by pysat.params[‘data_dirs’]. Only applied if the directory exists. (default=’’)
directory_format (str, function, or NoneType) – Sub-directory naming structure, which is expected to exist or be created within one of the pysat.params[‘data_dirs’] directories. Variables such as platform, name, tag, and inst_id will be filled in as needed using python string formatting, if a string is supplied. The default directory structure, which is used if None is specified, is provided by pysat.params[‘directory_format’] and is typically ‘{platform}/{name}/{tag}/{inst_id}’. If a function is provided, it must take tag and inst_id as arguments and return an appropriate string. (default=None)
file_format (str or NoneType) – File naming structure in string format. Variables such as year, month, day, etc. will be filled in as needed using python string formatting. The default file format structure is supplied in the instrument list_files routine. See pysat.utils.files.parse_delimited_filenames and pysat.utils.files.parse_fixed_width_filenames for more information. The value will be None if not specified by the user at instantiation. (default=None)
temporary_file_list (bool) – If True, the list of Instrument files will not be written to disk (default=False)
strict_time_flag (bool) – If True, pysat will check data to ensure times are unique and monotonically increasing (default=True)
ignore_empty_files (bool) – Flag controlling behavior for listing available files. If True, the list of files found will be checked to ensure the filesizes are greater than zero. Empty files are removed from the stored list of files. (default=False)
meta_kwargs (dict or NoneType) – Dict to specify custom Meta initialization (default=None)
custom (list or NoneType) – Input list containing dicts of inputs for custom_attach method inputs that may be applied or None (default=None)
- platform
- name
- tag
- inst_id
- clean_level
- pad
- orbit_info
- inst_module
- data_dir
- directory_format
- file_format
- temporary_file_list
- strict_time_flag
- bounds
Tuple of datetime objects or filenames indicating bounds for loading data, or a tuple of NoneType objects. Users may provide as a tuple or tuple of lists (useful for bounds with gaps). The attribute is always stored as a tuple of lists for consistency.
- Type:
tuple
- custom_functions
List of functions to be applied by instrument nano-kernel
- Type:
list
- custom_args
List of lists containing arguments to be passed to particular custom function
- Type:
list
- custom_kwargs
List of dictionaries with keywords and values to be passed to a custom function
- Type:
list
- data
Class object holding the loaded science data
- Type:
pandas.DataFrame or xarray.Dataset
- date
Date and time for loaded data, None if no data is loaded
- Type:
dt.datetime or NoneType
- doy
Day of year for loaded data, None if no data is loaded
- Type:
int or NoneType
- files
Class to hold and interact with the available instrument files
- Type:
pysat.Files
- kwargs
Keyword arguments passed to the standard Instrument routines
- Type:
dict
- kwargs_supported
Stores all supported keywords for user edification
- Type:
dict
- kwargs_reserved
Keyword arguments for reserved method arguments
- Type:
dict
- load_step
The temporal increment for loading data, defaults to a timestep of one day
- Type:
dt.timedelta
- meta
Class holding the instrument metadata
- Type:
pysat.Meta
- meta_kwargs
Dict containing defaults for Meta data
- Type:
dict
- orbits
Interface for extracting data orbit-by-orbit
- Type:
pysat.Orbits
- pandas_format
Flag indicating whether data is stored as a pandas.DataFrame (True) or an xarray.Dataset (False)
- Type:
bool
- today
Date and time for the current day in UT
- Type:
dt.datetime
- tomorrow
Date and time for tomorrow in UT
- Type:
dt.datetime
- variables
List of loaded data variables
- Type:
list
- yesterday
Date and time for yesterday in UT
- Type:
dt.datetime
- yr
Year for loaded data, None if no data is loaded
- Type:
int or NoneType
- Raises:
ValueError – If platform and name are a mixture of None and str, an unknown or reserved keyword is used, or if file_format, custom, or pad are improperly formatted
Note
pysat attempts to load the module platform_name.py located in the pysat/instruments directory. This module provides the underlying functionality to download, load, and clean instrument data. Alternatively, the module may be supplied directly using keyword inst_module.
Examples
# 1-second mag field data
vefi = pysat.Instrument(platform='cnofs', name='vefi', tag='dc_b')
start = dt.datetime(2009, 1, 1)
stop = dt.datetime(2009, 1, 2)
vefi.download(start, stop)
vefi.load(date=start)
print(vefi['dB_mer'])
print(vefi.meta['db_mer'])

# 1-second thermal plasma parameters
ivm = pysat.Instrument(platform='cnofs', name='ivm')
ivm.download(start, stop)
ivm.load(2009, 1)
print(ivm['ionVelmeridional'])

# Ionosphere profiles from GPS occultation. Enable binning profile
# data using a constant step-size. Feature provided by the underlying
# COSMIC support code.
cosmic = pysat.Instrument('cosmic', 'gps', 'ionprf', altitude_bin=3)
cosmic.download(start, stop, user=user, password=password)
cosmic.load(date=start)

# Nano-kernel functionality enables instrument objects that are
# 'set and forget'. The functions are always run whenever
# the instrument load routine is called so instrument objects may
# be passed safely to other routines and the data will always
# be processed appropriately.

# Define custom function to modify Instrument in place.
def custom_func(inst, opt_param1=False, opt_param2=False):
    # Perform calculations and store in new_data
    inst['new_data'] = new_data
    return

inst = pysat.Instrument('pysat', 'testing')
inst.custom_attach(custom_func, kwargs={'opt_param1': True})

# Custom methods are applied to data when loaded.
inst.load(date=date)
print(inst['new_data'])

# Custom methods may also be attached at instantiation.
# Create a dictionary for each custom method and associated inputs
custom_func_1 = {'function': custom_func, 'kwargs': {'opt_param1': True}}
custom_func_2 = {'function': custom_func, 'args': [True, False]}
custom_func_3 = {'function': custom_func, 'at_pos': 0,
                 'kwargs': {'opt_param2': True}}

# Combine all dicts into a list in order of application and execution,
# although this can be modified by specifying 'at_pos'. The actual
# order these functions will run is: 3, 1, 2.
custom = [custom_func_1, custom_func_2, custom_func_3]

# Instantiate `pysat.Instrument`
inst = pysat.Instrument(platform, name, inst_id=inst_id, tag=tag,
                        custom=custom)
Initialize pysat.Instrument object.
- property bounds
Boundaries for iterating over instrument object by date or file.
- Parameters:
start (dt.datetime, str, or NoneType) – Start of iteration, disregarding any time of day information. If None uses first data date. List-like collection also accepted, allowing multiple bound ranges. (default=None)
stop (dt.datetime, str, or None) – Stop of iteration, inclusive of the entire day regardless of time of day in the bounds. If None uses last data date. List-like collection also accepted, allowing multiple bound ranges, though it must match start. (default=None)
step (str, int, or NoneType) – Step size used when iterating from start to stop. Use a Pandas frequency string (‘3D’, ‘1M’) or an integer (will assume a base frequency equal to the file frequency). If None, defaults to a single unit of file frequency (typically 1 day) (default=None).
width (pandas.DateOffset, int, or NoneType) – Data window used when loading data within iteration. If None, defaults to a single file frequency (typically 1 day) (default=None)
- Raises:
ValueError – If start and stop don’t have the same type, too many input arguments are supplied, start and stop have an unequal number of elements, bounds aren’t in increasing order, or the input type for start or stop isn’t recognized
Note
Both start and stop must be the same type (date, or filename) or None. Only the year, month, and day are used for date inputs.
Examples
import datetime as dt
import pandas as pds
import pysat

inst = pysat.Instrument(platform=platform, name=name, tag=tag)
start = dt.datetime(2009, 1, 1)
stop = dt.datetime(2009, 1, 31)

# Defaults to stepping by a single day and a data loading window
# of one day/file.
inst.bounds = (start, stop)

# Set bounds by file. Iterates a file at a time.
inst.bounds = ('filename1', 'filename2')

# Create a more complicated season, multiple start and stop dates.
start2 = dt.datetime(2010, 1, 1)
stop2 = dt.datetime(2010, 2, 14)
inst.bounds = ([start, start2], [stop, stop2])

# Iterate via a non-standard step size of two days
inst.bounds = ([start, start2], [stop, stop2], '2D')

# Load more than a single day/file at a time when iterating
inst.bounds = ([start, start2], [stop, stop2], '2D',
               dt.timedelta(days=3))
- concat_data(new_data, prepend=False, include=None, **kwargs)
Concatenate data to self.data for xarray or pandas as needed.
- Parameters:
new_data (pandas.DataFrame, xarray.Dataset, or list of such objects) – New data objects to be concatenated
prepend (bool) – If True, assign new data before existing data; if False append new data (default=False)
include (int or NoneType) – Index at which self.data should be included in new_data or None to use prepend (default=None)
**kwargs (dict) – Optional keyword arguments passed to pds.concat or xr.concat
Note
For pandas, sort=False is passed along to the underlying pandas.concat method. If sort is supplied as a keyword, the user provided value is used instead. Recall that sort orders the data columns, not the data values or the index.
For xarray, dim=Instrument.index.name is passed along to xarray.concat except if the user includes a value for dim as a keyword argument.
Examples
# Concatenate data before and after the existing Instrument data
inst.concat_data([prev_data, next_data], include=1)
- copy()
Create a deep copy of the entire Instrument object.
- Return type:
pysat.Instrument
- custom_apply_all()
Apply all of the custom functions to the satellite data object.
- Raises:
ValueError – Raised when function returns any value
Note
This method does not generally need to be invoked directly by users.
- custom_attach(function, at_pos='end', args=None, kwargs=None)
Attach a function to custom processing queue.
Custom functions are applied automatically whenever the .load() command is called.
- Parameters:
function (str or function object) – Name of function or function object to be added to queue
at_pos (str or int) – Accepts string ‘end’ or a number that will be used to determine the insertion order if multiple custom functions are attached to an Instrument object (default=’end’)
args (list, tuple, or NoneType) – Ordered arguments following the instrument object input that are required by the custom function (default=None)
kwargs (dict or NoneType) – Dictionary of keyword arguments required by the custom function (default=None)
Note
Functions applied using custom_attach may add, modify, or use the data within Instrument inside of the function, and so should not return anything.
- custom_clear()
Clear the custom function list.
- property date
Date for loaded data.
- download(start=None, stop=None, date_array=None, **kwargs)
Download data for given Instrument object from start to stop.
- Parameters:
start (pandas.datetime or NoneType) – Start date to download data, or yesterday if None is provided. (default=None)
stop (pandas.datetime or NoneType) – Stop date (inclusive) to download data, or tomorrow if None is provided (default=None)
date_array (list-like or NoneType) – Sequence of dates to download data for. Takes precedence over start and stop inputs (default=None)
**kwargs (dict) – Dictionary of keywords that may be options for specific instruments. The keyword arguments ‘user’ and ‘password’ are expected for remote databases requiring sign in or registration. The ‘freq’ keyword is temporarily ingested through this input option.
- Raises:
ValueError – Raised if there is an issue creating self.files.data_path
Note
Data will be downloaded to self.files.data_path
If Instrument bounds are set to defaults they are updated after files are downloaded.
See also
pandas.DatetimeIndex
- download_updated_files(**kwargs)
Download new files after comparing available remote and local files.
- Parameters:
**kwargs (dict) – Dictionary of keywords that may be options for specific instruments
Note
Data will be downloaded to self.files.data_path
If Instrument bounds are set to defaults they are updated after files are downloaded.
If no remote file listing method is available, existing local files are assumed to be up-to-date and gaps are assumed to be missing files.
If start, stop, or date_array are provided, only files at/between these times are considered for updating. If no times are provided and a remote listing method is available, all new files will be downloaded. If no remote listing method is available, the current file limits are used as the starting and ending times.
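For example, a minimal, hedged sketch; platform and name are placeholders for a registered data set whose support module provides a remote file listing.

import pysat

# Compare the stored local file list against the remote server and
# download only the files that are new or updated
inst = pysat.Instrument(platform=platform, name=name)
inst.download_updated_files()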
- drop(names)
Drop variables from Instrument.
- Parameters:
names (str or list-like) – String or list of strings specifying the variables names to drop
- Raises:
KeyError – If none of the variable names provided in names are found in the variable list. If only a subset is missing, a logger warning is issued instead.
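For example, a minimal sketch using the bundled test instrument; the dropped variable name ('mlt') is illustrative of the test data set.

import pysat

inst = pysat.Instrument('pysat', 'testing')
inst.load(2009, 1)

# Remove a single variable; a list of names is also accepted
inst.drop('mlt')
print('mlt' in inst.variables)  # False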
- property empty
Boolean flag reflecting lack of data, True if there is no data.
- property index
Time index of the loaded data.
- load(yr=None, doy=None, end_yr=None, end_doy=None, date=None, end_date=None, fname=None, stop_fname=None, verifyPad=False, **kwargs)
Load the instrument data and metadata.
- Parameters:
yr (int or NoneType) – Year for desired data. pysat will load all files with an associated date between yr, doy and yr, doy + 1. (default=None)
doy (int or NoneType) – Day of year for desired data. Must be present with yr input. (default=None)
end_yr (int or NoneType) – Used when loading a range of dates, from yr, doy to end_yr, end_doy based upon the dates associated with the Instrument’s files. Date range is inclusive for yr, doy but exclusive for end_yr, end_doy. (default=None)
end_doy (int or NoneType) – Used when loading a range of dates, from yr, doy to end_yr, end_doy based upon the dates associated with the Instrument’s files. Date range is inclusive for yr, doy but exclusive for end_yr, end_doy. (default=None)
date (dt.datetime or NoneType) – Date to load data. pysat will load all files with an associated date between date and date + 1 day. (default=None)
end_date (dt.datetime or NoneType) – Used when loading a range of data from date to end_date based upon the dates associated with the Instrument’s files. Date range is inclusive for date but exclusive for end_date. (default=None)
fname (str or NoneType) – Filename to be loaded (default=None)
stop_fname (str or NoneType) – Used when loading a range of filenames from fname to stop_fname, inclusive. (default=None)
verifyPad (bool) – If True, padding data not removed for debugging. Padding parameters are provided at Instrument instantiation. (default=False)
**kwargs (dict) – Dictionary of keywords that may be options for specific instruments.
- Raises:
TypeError – For incomplete or incorrect input
ValueError – For input incompatible with Instrument set-up
Note
Loads data for a chosen instrument into .data. Any functions chosen by the user and added to the custom processing queue (.custom_attach) are automatically applied to the data before it is available to the user in .data.
A mixed combination of .load() keywords such as yr and date are not allowed.
end kwargs have exclusive ranges (stop before the condition is reached), while stop kwargs have inclusive ranges (stop once the condition is reached).
Examples
import datetime as dt
import pysat

inst = pysat.Instrument('pysat', 'testing')

# Load a single day by year and day of year
inst.load(2009, 1)

# Load a single day by date
date = dt.datetime(2009, 1, 1)
inst.load(date=date)

# Load a single file, first file in this example
inst.load(fname=inst.files[0])

# Load a range of days, data between
# Jan. 1st (inclusive) - Jan. 3rd (exclusive)
inst.load(2009, 1, 2009, 3)

# Load a range of days using datetimes
date = dt.datetime(2009, 1, 1)
end_date = dt.datetime(2009, 1, 3)
inst.load(date=date, end_date=end_date)

# Load several files by filename. Note the change in index due to
# inclusive slicing on filenames!
inst.load(fname=inst.files[0], stop_fname=inst.files[1])
- next(verifyPad=False)
Iterate forward through the data loaded in Instrument object.
Bounds of iteration and iteration type (day/file) are set by bounds attribute.
- Parameters:
verifyPad (bool) – Passed to self.load(). If True, then padded data within the load method will be retained. (default=False)
Note
If there were no previous calls to load, then the first day (default) or file will be loaded.
- property pandas_format
Boolean flag for pandas data support.
- prev(verifyPad=False)
Iterate backwards through the data in Instrument object.
Bounds of iteration and iteration type (day/file) are set by bounds attribute.
- Parameters:
verifyPad (bool) – Passed to self.load(). If True, then padded data within the load method will be retained. (default=False)
Note
If there were no previous calls to load then the first day (default) or file will be loaded.
- remote_date_range(start=None, stop=None, **kwargs)
Determine first and last available dates for remote data.
- Parameters:
start (dt.datetime or NoneType) – Starting time for file list. A None value will start with the first file found. (default=None)
stop (dt.datetime or NoneType) – Ending time for the file list. A None value will stop with the last file found. (default=None)
**kwargs (dict) – Dictionary of keywords that may be options for specific instruments. The keyword arguments ‘user’ and ‘password’ are expected for remote databases requiring sign in or registration.
- Returns:
First and last datetimes obtained from remote_file_list
- Return type:
List
Note
Default behaviour is to search all files. User may additionally specify a given year, year/month, or year/month/day combination to return a subset of available files.
- remote_file_list(start=None, stop=None, **kwargs)
Retrieve a time-series of remote files for chosen instrument.
- Parameters:
start (dt.datetime or NoneType) – Starting time for file list. A None value will start with the first file found. (default=None)
stop (dt.datetime or NoneType) – Ending time for the file list. A None value will stop with the last file found. (default=None)
**kwargs (dict) – Dictionary of keywords that may be options for specific instruments. The keyword arguments ‘user’ and ‘password’ are expected for remote databases requiring sign in or registration.
- Returns:
pandas Series of filenames indexed by date and time
- Return type:
pds.Series
Note
Default behaviour is to return all files. User may additionally specify a given year, year/month, or year/month/day combination to return a subset of available files.
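For illustration, a hedged sketch; platform and name are placeholders for a data set whose support module provides a remote file listing, and credentials are only needed where the database requires them.

import datetime as dt
import pysat

inst = pysat.Instrument(platform=platform, name=name)

# List remote files available during January 2009
start = dt.datetime(2009, 1, 1)
stop = dt.datetime(2009, 1, 31)
remote_files = inst.remote_file_list(start=start, stop=stop)
print(remote_files.head())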
- rename(mapper, lowercase_data_labels=False)
Rename variables within both data and metadata.
- Parameters:
mapper (dict or func) – Dictionary with old names as keys and new names as values, or a function to apply to all names
lowercase_data_labels (bool) – If True, the labels applied to self.data are forced to lowercase. The case supplied in mapper is retained within inst.meta. (default=False)
Examples
# Standard renaming using a dict
new_mapper = {'old_name': 'new_name', 'old_name2': 'new_name2'}
inst.rename(new_mapper)

# Standard renaming using a function
inst.rename(str.upper)
pysat supports differing case for variable labels across the data and metadata objects attached to an Instrument. Since Meta is case-preserving (on assignment) but case-insensitive to access, the labels used for data are always valid for metadata. This feature may be used to provide friendlier variable names within pysat while also maintaining external format compatibility when writing files.
# Example with lowercase_data_labels
inst = pysat.Instrument('pysat', 'testing')
inst.load(2009, 1)
mapper = {'uts': 'Pysat_UTS'}
inst.rename(mapper, lowercase_data_labels=True)

# Note that 'Pysat_UTS' was applied to data as 'pysat_uts'
print(inst['pysat_uts'])

# Case is retained within inst.meta, though data access to meta is
# case insensitive
print('True meta variable name is ', inst.meta['pysat_uts'].name)

# Note that the labels in meta may be used when creating a file,
# thus, 'Pysat_UTS' would be found in the resulting file
inst.to_netcdf4('./test.nc', preserve_meta_case=True)

# Load in file and check
raw = netCDF4.Dataset('./test.nc')
print(raw.variables['Pysat_UTS'])
- to_netcdf4(fname, base_instrument=None, epoch_name=None, zlib=False, complevel=4, shuffle=True, preserve_meta_case=False, export_nan=None, export_pysat_info=True, unlimited_time=True, modify=False)
Store loaded data into a netCDF4 file.
- Parameters:
fname (str) – Full path to save instrument object to netCDF
base_instrument (pysat.Instrument or NoneType) – Class used as a comparison, only attributes that are present with self and not on base_instrument are written to netCDF. Using None assigns an unmodified pysat.Instrument object. (default=None)
epoch_name (str or NoneType) – Label in file for datetime index of Instrument object (default=None)
zlib (bool) – Flag for engaging zlib compression (True - compression on) (default=False)
complevel (int) – An integer flag between 1 and 9 describing the level of compression desired. Ignored if zlib=False. (default=4)
shuffle (bool) – The HDF5 shuffle filter will be applied before compressing the data. This significantly improves compression. Ignored if zlib=False. (default=True)
preserve_meta_case (bool) – Flag specifying the case of the meta data variable strings. If True, then the variable strings within the MetaData object (which preserves case) are used to name variables in the written netCDF file. If False, then the variable strings used to access data from the pysat.Instrument object are used instead. (default=False)
export_nan (list or NoneType) – By default, the Meta object attached to the pysat.Instrument object maintains the set of metadata variables for which a NaN value is allowed and written to the netCDF4 file. A list supplied here will override the settings provided by Meta, and all parameters included will be written to the file. If an attribute is not listed and its value is NaN, it simply won't be included in the netCDF4 file. (default=None)
export_pysat_info (bool) – If True, platform, name, tag, inst_id, acknowledgements, and references will be appended to the metadata. For some operational uses (e.g., conversion of Level 1 to Level 2 data), it may be desirable to set this to False to avoid conflicting versions of these parameters. (default=True)
unlimited_time (bool) – Flag specifying whether or not the epoch/time dimension should be unlimited; it is when the flag is True. (default=True)
modify (bool) – Flag specifying whether or not the changes made to the Instrument object needed to prepare it for writing should also be made to this object. If False, the current Instrument object will remain unchanged. (default=False)
- Raises:
ValueError – If required kwargs are not given values
See also
pysat.utils.io.to_netcdf
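As a minimal, hedged sketch using the bundled test instrument; the output filename is illustrative.

import pysat

inst = pysat.Instrument('pysat', 'testing')
inst.load(2009, 1)

# Write the loaded data and metadata to a netCDF4 file
inst.to_netcdf4('./pysat_testing_2009001.nc')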
- today()
Get today’s date (UTC), with no hour, minute, second, etc.
- Returns:
today_utc – Today’s date in UTC
- Return type:
datetime
- tomorrow()
Get tomorrow’s date (UTC), with no hour, minute, second, etc.
- Returns:
Tomorrow’s date in UTC
- Return type:
datetime
- property variables
List of variables for the loaded data.
- property vars_no_time
List of variables for the loaded data, excluding time index.
- yesterday()
Get yesterday’s date (UTC), with no hour, minute, second, etc.
- Returns:
Yesterday’s date in UTC
- Return type:
datetime
Constellation
- class pysat.Constellation(platforms=None, names=None, tags=None, inst_ids=None, const_module=None, instruments=None, index_res=None, common_index=True, custom=None, **kwargs)
Manage and analyze data from multiple pysat Instruments.
- Parameters:
platforms (list or NoneType) – List of strings indicating the desired Instrument platforms. If None is specified on initiation, a list will be created to hold the platform attributes from each pysat.Instrument object in instruments. (default=None)
names (list or NoneType) – List of strings indicating the desired Instrument names. If None is specified on initiation, a list will be created to hold the name attributes from each pysat.Instrument object in instruments. (default=None)
tags (list or NoneType) – List of strings indicating the desired Instrument tags. If None is specified on initiation, a list will be created to hold the tag attributes from each pysat.Instrument object in instruments. (default=None)
inst_ids (list or NoneType) – List of strings indicating the desired Instrument inst_ids. If None is specified on initiation, a list will be created to hold the inst_id attributes from each pysat.Instrument object in instruments. (default=None)
const_module (module or NoneType) – Name of a pysat constellation module (default=None)
instruments (list-like or NoneType) – A list of pysat Instruments to include in the Constellation (default=None)
index_res (float or NoneType) – Output index resolution in seconds or None to determine from Constellation instruments (default=None)
common_index (bool) – True to include times where all instruments have data, False to use the maximum time range from the Constellation (default=True)
custom (list or NoneType) – Input list containing dicts of inputs for custom_attach method inputs that may be applied to all instruments or at the Constellation-level or None (default=None)
**kwargs (dict) – Additional keyword arguments are passed to Instruments instantiated within the class through use of input arguments platforms, names, tags, and inst_ids. Additional keywords are not applied when using the const_module or instruments inputs.
- platforms
- names
- tags
- inst_ids
- instruments
- index_res
- common_index
- bounds
Tuple of two datetime objects or filenames indicating bounds for loading data, or a tuple of NoneType objects. Users may provide as a tuple or tuple of lists (useful for bounds with gaps). The attribute is always stored as a tuple of lists for consistency.
- Type:
tuple
- custom_functions
List of functions to be applied at the Constellation-level upon load
- Type:
list
- custom_args
List of lists containing arguments to be passed to particular Constellation-level custom function
- Type:
list
- custom_kwargs
List of dictionaries with keywords and values to be passed to a Constellation-level custom function
- Type:
list
- date
Date and time for loaded data, None if no data is loaded
- Type:
dt.datetime or NoneType
- yr
Year for loaded data, None if no data is loaded
- Type:
int or NoneType
- doy
Day of year for loaded data, None if no data is loaded
- Type:
int or NoneType
- yesterday
Date and time for yesterday in UT
- Type:
dt.datetime
- today
Date and time for the current day in UT
- Type:
dt.datetime
- tomorrow
Date and time for tomorrow in UT
- Type:
dt.datetime
- empty
Flag that is True when none of the Constellation Instruments contain data.
- Type:
bool
- empty_partial
Flag that indicates at least one Instrument in the Constellation does not have data when True.
- Type:
bool
- variables
List of loaded data variables for all instruments.
- Type:
list
- Raises:
ValueError – When instruments is not list-like, when all inputs to load through the registered Instrument list are unknown, or when one of the items assigned is not an Instrument.
AttributeError – When module provided through const_module is missing the required attribute instruments.
Note
Omit platforms, names, tags, inst_ids, instruments, and const_module to create an empty constellation.
Initialize the Constellation object.
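A minimal, hedged sketch of building a Constellation directly from Instrument objects; both members use the bundled test instrument here purely for illustration.

import pysat

inst1 = pysat.Instrument('pysat', 'testing')
inst2 = pysat.Instrument('pysat', 'testing')

# Build the Constellation from existing Instrument objects
const = pysat.Constellation(instruments=[inst1, inst2])
print(len(const.instruments))  # 2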
- property bounds
Obtain boundaries for Instruments in Constellation.
When setting, sets for all instruments in Constellation.
- Parameters:
value (tuple or NoneType) – Tuple containing starting time and ending time for Instrument bounds attribute or None (default=None)
- custom_attach(function, apply_inst=True, at_pos='end', args=None, kwargs=None)
Register a function to modify data of member Instruments.
- Parameters:
function (str or function object) – Name of function or function object to be added to queue
apply_inst (bool) – Apply the custom function to all Instruments if True, or at the Constellation level if False. (default=True)
at_pos (str or int) – Accepts string ‘end’ or a number that will be used to determine the insertion order if multiple custom functions are attached to an Instrument object. (default=’end’).
args (list, tuple, or NoneType) – Ordered arguments following the instrument object input that are required by the custom function (default=None)
kwargs (dict or NoneType) – Dictionary of keyword arguments required by the custom function (default=None)
Note
Functions applied using custom_attach may add, modify, or use the data within any Instrument inside of the function, and so should not return anything.
Constellation-level custom functions are applied after Instrument-level custom functions whenever the load method is called.
Unlike Instrument-level custom functions, Constellation-level custom functions should take a Constellation object as their first input argument.
See also
Instrument.custom_attach
base method for attaching custom functions
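For illustration, a hedged sketch of a Constellation-level custom function; the attribute name set inside the function is illustrative, and const is a previously created Constellation.

import pysat

const = pysat.Constellation(instruments=[pysat.Instrument('pysat', 'testing')])

def const_func(const_obj):
    # Store the number of uniquely named variables across all members
    # ('n_unique_vars' is an illustrative attribute name)
    const_obj.n_unique_vars = len(const_obj.variables)
    return

# Apply at the Constellation level rather than to each Instrument
const.custom_attach(const_func, apply_inst=False)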
- custom_clear()
Clear the custom function list.
See also
Instrument.custom_clear
base method for clearing custom functions
- property date
Date for loaded data.
- download(*args, **kwargs)
Download instrument data into Instrument object.data.
- Parameters:
*args (list reference) – References a list of input arguments
**kwargs (dict reference) – References a dict of input keyword arguments
See also
Instrument.download
base method for loading Instrument data
Note
If individual instruments require specific kwargs that differ from other instruments, define that in the individual instrument rather than this method.
- drop(names)
Drop variables (names) from metadata.
- Parameters:
names (str or list-like) – String or list of strings specifying the variable names to drop
- Raises:
KeyError – If none of the keys provided in names are found in the standard metadata, labels, or header metadata. If only a subset is missing, a logger warning is issued instead.
- property empty
Boolean flag reflecting lack of data.
Note
True if none of the Constellation Instruments contain data.
- property empty_partial
Boolean flag reflecting lack of data.
Note
True if at least one Constellation Instrument contains no data.
- property index
Obtain time index of loaded data.
- load(*args, **kwargs)
Load instrument data into Instrument object.data.
- Parameters:
*args (list reference) – References a list of input arguments
**kwargs (dict reference) – References a dict of input keyword arguments
See also
Instrument.load
base method for loading Instrument data
- to_inst(common_coord=True, fill_method=None)
Combine Constellation data into an Instrument.
- Parameters:
common_coord (bool) – For Constellations with any xarray.Dataset Instruments, True to include locations where all coordinate arrays cover, False to use the maximum location range from the list of coordinates (default=True)
fill_method (str or NoneType) – Fill method if common data coordinates do not match exactly. If one of ‘nearest’, ‘pad’/’ffill’, ‘backfill’/’bfill’, or None then no interpolation will occur. If ‘linear’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, or ‘polynomial’ are used, then 1D or ND interpolation will be used. (default=None)
- Returns:
inst – A pysat Instrument containing all data from the constellation at a common time index
- Return type:
pysat.Instrument
Note
Uses the common index, self.index, that was defined using information from the Constellation Instruments in combination with a potential user-supplied resolution defined through self.index_res.
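For example, a minimal, hedged sketch using a single-member Constellation built from the bundled test instrument; the load date is illustrative.

import datetime as dt
import pysat

const = pysat.Constellation(instruments=[pysat.Instrument('pysat', 'testing')])

# Load one day of data for all member Instruments, then collapse the
# Constellation into a single Instrument on a common time index
const.load(date=dt.datetime(2009, 1, 1))
combined = const.to_inst()
print(combined.variables)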
- today()
Obtain UTC date for today, see pysat.Instrument for details.
- tomorrow()
Obtain UTC date for tomorrow, see pysat.Instrument for details.
- property variables
Retrieve list of uniquely named variables from all loaded data.
- yesterday()
Obtain UTC date for yesterday, see pysat.Instrument for details.
Files
- class pysat.Files(inst, data_dir=None, directory_format=None, update_files=False, file_format=None, write_to_disk=True, ignore_empty_files=False)
Maintain collection of files and associated methods.
- Parameters:
inst (pysat.Instrument) – Instrument object
data_dir (str or NoneType) – Directory without sub-directory variables that allows one to bypass the directories provided by pysat.params[‘data_dirs’]. Only applied if the directory exists. (default=None)
directory_format (str or NoneType) – Sub-directory naming structure, which is expected to exist or be created within one of the pysat.params[‘data_dirs’] directories. Variables such as platform, name, tag, and inst_id will be filled in as needed using python string formatting, if a string is supplied. The default directory structure, which is used if None is specified, is provided by pysat.params[‘directory_format’] and is typically ‘{platform}/{name}/{tag}/{inst_id}’. (default=None)
update_files (bool) – If True, immediately query filesystem for instrument files and store (default=False)
file_format (str or NoneType) – File naming structure in string format. Variables such as year, month, day, etc. will be filled in as needed using python string formatting. The default file format structure is supplied in the instrument list_files routine. See pysat.utils.files.parse_delimited_filenames and pysat.utils.files.parse_fixed_width_filenames for more information. (default=None)
write_to_disk (bool) – If True, the list of Instrument files will be written to disk. (default=True)
ignore_empty_files (bool) – If True, the list of files found will be checked to ensure the filesizes are greater than zero. Empty files are removed from the stored list of files. (default=False)
- directory_format
- update_files
- file_format
- write_to_disk
- ignore_empty_files
- home_path
Path to the pysat information directory.
- Type:
str
- data_path
Path to the top-level directory containing instrument files, selected from data_paths.
- Type:
str
- data_paths
Available paths that pysat will use when looking for files. The class uses the first directory with relevant data, stored in data_path.
- Type:
list of str
- files
Series of data files, indexed by file start time.
- Type:
pds.Series
- inst_info
Contains pysat.Instrument parameters ‘platform’, ‘name’, ‘tag’, and ‘inst_id’, identifying the source of the files.
- Type:
dict
- list_files_creator
Experimental feature for Instruments that internally generate data and thus don’t have a defined supported date range.
- Type:
functools.partial or NoneType
- list_files_rtn
Method used to locate relevant files on the local system. Provided by associated pysat.Instrument object.
- Type:
method
- multi_file_day
Flag copied from associated pysat.Instrument object that indicates when data for day n may be found in files for days n-1 or n+1
- Type:
bool
- start_date
Date of first file, used as default start bound for instrument object, or None if no files are loaded.
- Type:
datetime or NoneType
- stop_date
Date of last file, used as default stop bound for instrument object, or None if no files are loaded.
- Type:
datetime or NoneType
- stored_file_name
Name of the hidden file containing the list of archived data files for this instrument.
- Type:
str
- sub_dir_path
directory_format string formatted for the local system.
- Type:
str
- Raises:
NameError – If pysat.params[‘data_dirs’] not assigned
Note
Interfaces with the list_files method for a given instrument support module to create an ordered collection of files in time, used primarily by the pysat.Instrument object to identify files to be loaded. The Files class mediates access to the files by datetime and contains helper methods for determining the presence of new files and filtering out empty files.
Users should generally use the interface provided by a pysat.Instrument instance. An exception is the classmethod from_os, provided to assist in generating the appropriate output for an instrument routine.
Examples
# Instantiate instrument to generate file list
inst = pysat.Instrument(platform=platform, name=name, tag=tag,
                        inst_id=inst_id)

# First file
inst.files[0]

# Files from start up to stop (exclusive on stop)
start = dt.datetime(2009, 1, 1)
stop = dt.datetime(2009, 1, 3)
print(inst.files[start:stop])

# Files for date
print(inst.files[start])

# Files by slicing
print(inst.files[0:4])

# Get a list of new files. New files are those that weren't present
# the last time a given instrument's file list was stored.
new_files = inst.files.get_new()

# Search pysat appropriate directory for instrument files and
# update Files instance.
inst.files.refresh()
Initialize pysat.Files object.
- copy()
Provide a deep copy of object.
- Returns:
Copy of self
- Return type:
Files class instance
- classmethod from_os(data_path=None, format_str=None, two_digit_year_break=None, delimiter=None)
Produce a list of files and format it for Files class.
- Parameters:
data_path (str or NoneType) – Top level directory to search files for. This directory is provided by pysat to the instrument_module.list_files functions as data_path. (default=None)
format_str (str or NoneType) – Provides the naming pattern of the instrument files and the locations of date information so an ordered list may be produced. Supports ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘version’, ‘revision’, and ‘cycle’ Ex: ‘cnofs_cindi_ivm_500ms_{year:4d}{month:02d}{day:02d}_v01.cdf’ (default=None)
two_digit_year_break (int or NoneType) – If filenames only store two digits for the year, then ‘1900’ will be added for years >= two_digit_year_break and ‘2000’ will be added for years < two_digit_year_break. If None, then four-digit years are assumed. (default=None)
delimiter (str or NoneType) – Delimiter string upon which files will be split (e.g., ‘.’). If None, filenames will be parsed presuming a fixed width format. (default=None)
- Returns:
A Series of filenames indexed by time. See pysat.utils.files.process_parsed_filenames for details.
- Return type:
pds.Series
- Raises:
ValueError – If data_path or format_str is None
Note
Requires fixed_width or delimited filename
Does not produce a Files instance, but the proper output from instrument_module.list_files method
The ‘?’ may be used to indicate a set number of spaces for a variable part of the name that need not be extracted. ‘cnofs_cindi_ivm_500ms_{year:4d}{month:02d}{day:02d}_v??.cdf’
When parsing using fixed width filenames (delimiter=None), leading ‘*’ wildcards are supported, e.g., ‘*{year:4d}{month:02d}{day:02d}_v??.cdf’, though the ‘*’ is not supported after the first template variable. The ‘?’ wildcard may be used anywhere in the template string.
When parsing using a delimiter, the ‘*’ wildcard is supported when leading, trailing, or wholly contained between delimiters, such as ‘data_name-{year:04d}-*-{day:02d}.txt’, or ‘*-{year:04d}-{day:02d}*’, where ‘-’ is the delimiter. There cannot be a mixture of a template variable and ‘*’ without a delimiter in between, unless the ‘*’ occurs after the variable. The ‘?’ wildcard may be used anywhere in the template string.
The ‘day’ format keyword may be used to specify either day of month (if month is included) or day of year.
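A hedged sketch of how an instrument support module's list_files routine might wrap from_os; the function, instrument, and filename template shown here are illustrative, not part of any shipped module.

import pysat

def list_files(tag='', inst_id='', data_path='', format_str=None):
    """Locate local files for this (hypothetical) instrument."""
    if format_str is None:
        # Illustrative fixed-width template with date and version info
        format_str = 'mydata_{year:04d}{month:02d}{day:02d}_v{version:02d}.cdf'
    return pysat.Files.from_os(data_path=data_path, format_str=format_str)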
- get_file_array(start, stop)
Return a list of filenames between and including start and stop.
- Parameters:
start (array-like or str) – Filename(s) marking the start of the returned file list
stop (array-like or str) – Filename(s) marking the inclusive end of the returned file list
- Returns:
files – A list of filenames between and including start and stop times over all intervals.
- Return type:
list
Note
start and stop must be of the same type: both array-like or both strings
- get_index(fname)
Return index for a given filename.
- Parameters:
fname (str) – Filename for the desired time index
- Raises:
ValueError – Filename not in index
Note
If fname not found in the file information already attached to the instrument.files instance, then a files.refresh() call is made.
- get_new()
List new files since last recorded file state.
- Returns:
A datetime-indexed Series of all new filenames since the last known change to the files.
- Return type:
pandas.Series
Note
pysat stores filenames in the user_home/.pysat directory. Filenames are stored if there is a change and either update_files is True at instrument object level or files.refresh() is called.
- refresh()
Update list of files, if there are changes.
Note
Calls underlying list_files_rtn for the particular science instrument. Typically, these routines search in the pysat provided path, pysat_data_dir/platform/name/tag/inst_id, where pysat_data_dir is set by pysat.params[‘data_dirs’] = path.
- set_top_level_directory(path)
Set top-level data directory.
Sets a valid self.data_path using provided top-level directory path and the associated pysat subdirectories derived from the directory_format attribute as stored in self.sub_dir_path
- Parameters:
path (str) – Top-level path to use when looking for files. Must be in pysat.params[‘data_dirs’].
- Raises:
ValueError – If path not in pysat.params[‘data_dirs’]
Warning
If there are Instrument files on the system under a top-level directory other than path, then, under certain conditions, self.data_path may be later updated by the object to point back to the directory with files.
Meta
- class pysat.Meta(metadata=None, header_data=None, labels={'desc': ('desc', <class 'str'>), 'fill_val': ('fill', (<class 'float'>, <class 'int'>, <class 'str'>)), 'max_val': ('value_max', (<class 'float'>, <class 'int'>)), 'min_val': ('value_min', (<class 'float'>, <class 'int'>)), 'name': ('long_name', <class 'str'>), 'notes': ('notes', <class 'str'>), 'units': ('units', <class 'str'>)}, export_nan=None, data_types=None)
Store metadata for the Instrument and Constellation classes.
- Parameters:
metadata (pandas.DataFrame) – DataFrame should be indexed by variable name that contains at minimum the standard_name (name), units, and long_name for the data stored in the associated pysat Instrument object.
header_data (dict or NoneType) – Global meta data to be assigned to the header attribute. Keys denote the desired attribute names and values the metadata for that attribute. (default=None)
labels (dict) – Dict where keys are the label attribute names and the values are tuples that have the label values and value types in that order. (default={‘units’: (‘units’, str), ‘name’: (‘long_name’, str), ‘notes’: (‘notes’, str), ‘desc’: (‘desc’, str), ‘min_val’: (‘value_min’, (float, int)), ‘max_val’: (‘value_max’, (float, int)), ‘fill_val’: (‘fill’, (float, int, str))})
export_nan (list or NoneType) – List of labels that should be exported even if their value is NaN or None for an empty list. When used, metadata with a value of NaN will be excluded from export. Will always allow NaN export for labels of the float type. (default=None)
data_types (dict or NoneType) – Dict of data types for variables names or None to determine after loading the data. (default=None)
- data
Index is the variable standard name; ‘units’, ‘long_name’, and other defaults are stored along with additional user-provided labels.
- Type:
pandas.DataFrame
- labels
Labels for MetaData attributes
- Type:
pysat.MetaLabels
- mutable
If True, attributes directly attached to Meta are modifiable
- Type:
bool
- header
Class containing global metadata
- Type:
MetaHeader
Note
Meta object preserves the case of variables and attributes as it first receives the data. Subsequent calls to set new metadata with the same variable or attribute will use case of first call. Accessing or setting data thereafter is case insensitive. In practice, use is case insensitive but the original case is preserved. Case preservation is built in to support writing files with a desired case to meet standards.
Supports any custom metadata values in addition to the expected metadata attributes (units, name, notes, desc, value_min, value_max, and fill). These base attributes may be used to programmatically access and set types of metadata regardless of the string values used for the attribute. String values for attributes may need to be changed depending upon the standards of code or files interacting with pysat.
Meta objects returned as part of pysat loading routines are automatically updated to use the same values of units, etc. as found in the pysat.Instrument object.
Meta objects have a structure similar to the CF-1.6 netCDF data standard.
Examples
# Instantiate Meta object, default values for attribute labels are used
meta = pysat.Meta()

# Set several variable units. Note that other base parameters are not
# set below, and so will be assigned a default value.
meta['var_name'] = {meta.labels.name: 'Variable Name',
                    meta.labels.units: 'MegaUnits'}

# Update only 'units' to new value. You can use the value of
# `meta.labels.units` instead of the class attribute, as was done in
# the above example.
meta['var_name'] = {'units': 'MU'}

# Custom meta data variables may be assigned using the same method.
# This example uses non-standard meta data variables 'scale', 'PI',
# and 'axis_multiplier'. You can include or not include any of the
# standard meta data information.
meta['var_name'] = {'units': 'MU', 'long_name': 'Variable Name',
                    'scale': 'linear', 'axis_multiplier': 1e4}
meta['var_name'] = {'PI': 'Dr. R. Song'}

# Meta data may be assigned to multiple variables at once
meta[['var_name1', 'var_name2']] = {'long_name': ['Name1', 'Name2'],
                                    'units': ['Units1', 'Units2'],
                                    'scale': ['linear', 'linear']}

# Sometimes n-Dimensional (nD) variables require multi-dimensional
# meta data structures.
meta2 = pysat.Meta()
meta2['var_name41'] = {'long_name': 'name1of4', 'units': 'Units1'}
meta2['var_name42'] = {'long_name': 'name2of4', 'units': 'Units2'}
meta['var_name4'] = {'meta': meta2}

# Meta data may be assigned from another Meta object using dict-like
# assignments
key1 = 'var_name'
key2 = 'var_name4'
meta[key1] = meta2[key2]

# When accessing one meta data value for any data variable, first use
# the data variable and then the meta data label.
meta['var_name', 'fill']

# A more robust method is to use the available Meta variable attributes
# in the attached MetaLabels class object.
meta[key1, meta.labels.fill_val]

# You may change a label used by Meta object to have a different value
meta.labels.fill_val = '_FillValue'

# Note that the fill label is intended for use when interacting
# with external files. Thus, any fill values (NaN) within the Meta
# object are not updated when changing the metadata string label,
# or when updating the value representing fill data. A future update
# (Issue #707) will expand functionality to include these custom
# fill values when producing files.
Initialize pysat.Meta object.
- accept_default_labels(other_meta)
Apply labels for default meta labels from other onto self.
- Parameters:
other_meta (Meta) – Meta object to take default labels from
- add_epoch_metadata(epoch_name)
Add epoch or time-index metadata if it is missing.
- Parameters:
epoch_name (str) – Data key for time-index or epoch data
- apply_meta_labels(other_meta)
Apply the existing meta labels from self onto different MetaData.
- Parameters:
other_meta (Meta) – Meta object to have default labels applied
- Returns:
other_updated – Meta object with the default labels applied
- Return type:
Meta
- attr_case_name(name)
Retrieve preserved case name for case insensitive value of name.
- Parameters:
name (str or list) – Single or multiple variable name(s) to get stored case form.
- Returns:
out_name – Maintains same type as input. Name(s) in proper case.
- Return type:
str or list
Note
Checks first within standard attributes. If not found, returns supplied name as it is available for use. Intended to be used to help ensure that the same case is applied to all repetitions of a given variable name.
- attrs()
Yield metadata products stored for each variable name.
- concat(other_meta, strict=False)
Concatenate two metadata objects.
- Parameters:
other_meta (Meta) – Meta object to be concatenated
strict (bool) – If True, this flag ensures there are no duplicate variable names (default=False)
- Returns:
mdata – Concatenated object
- Return type:
Meta
- Raises:
KeyError – If there are duplicate keys and the strict flag is True.
Note
Uses units and name label of self if other_meta is different
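For example, a minimal sketch combining two Meta objects; the variable names and labels are illustrative.

import pysat

meta1 = pysat.Meta()
meta1['var1'] = {'units': 'm', 'long_name': 'Variable 1'}

meta2 = pysat.Meta()
meta2['var2'] = {'units': 's', 'long_name': 'Variable 2'}

# Concatenate; with strict=True a duplicate variable name would raise
combined = meta1.concat(meta2)
print(list(combined.keys()))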
- copy()
Deep copy of the meta object.
- property data
Retrieve data.
May be set using data.setter(new_frame), where new_frame is a pandas DataFrame containing the metadata with label names as columns.
- drop(names)
Drop variables (names) from metadata.
- Parameters:
names (str or list-like) – String or list of strings specifying the variable names to drop
- Raises:
KeyError – If none of the keys provided in names are found in the standard metadata, labels, or header metadata. If only a subset is missing, a logger warning is issued instead.
- property empty
Return boolean True if there is no metadata.
- Returns:
Returns True if there is no data, and False if there is data
- Return type:
bool
- classmethod from_csv(filename=None, col_names=None, sep=None, **kwargs)
Create instrument metadata object from csv.
- Parameters:
filename (string) – Absolute filename for csv file or name of file stored in pandas instruments location
col_names (list-like collection of strings) – Column names in csv and resultant meta object
sep (string) – Column separator for supplied csv filename
**kwargs (dict) – Optional kwargs used by pds.read_csv
Note
Column names must include at least [‘name’, ‘long_name’, ‘units’], which are assumed if col_names is None
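As a hedged sketch: the CSV filename below is illustrative and must exist locally with one row per data variable; when col_names is omitted the ‘name’, ‘long_name’, and ‘units’ columns are assumed.

import pysat

# Build a Meta object from a metadata CSV file
meta = pysat.Meta.from_csv(filename='my_instrument_metadata.csv')
print(meta['my_variable', 'units'])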
- hasattr_case_neutral(attr_name)
Case-insensitive check for attribute names in this class.
- Parameters:
attr_name (str) – Name of attribute to find
- Returns:
has_name – True if the case-insensitive check for attribute name is successful, False if no attribute name is present.
- Return type:
bool
- keep(keep_names)
Keep variables (keep_names) while dropping other parameters.
- Parameters:
keep_names (list-like) – Variables to keep
- keys()
Yield variable names stored for 1D variables.
- merge(other)
Add metadata variables to self that are in other but not in self.
- Parameters:
other (pysat.Meta) – Metadata to be merged into self
- pop(label_name)
Remove and return metadata about variable.
- Parameters:
label_name (str) – Meta key for a data variable
- Returns:
output – Series of metadata for variable
- Return type:
pds.Series
- rename(mapper)
Update the preserved case name for mapped value of name.
- Parameters:
mapper (dict or func) – Dictionary with old names as keys and new names as variables or a function to apply to all names
Note
Checks first within standard attributes. If not found, returns supplied name as it is available for use. Intended to be used to help ensure that the same case is applied to all repetitions of a given variable name.
- to_dict(preserve_case=False)
Convert self into a dictionary.
- Parameters:
preserve_case (bool) – If True, the case of variables within self are preserved. If False, all variables returned as lower case. (default=False)
- Returns:
export_dict – A dictionary of the metadata for each variable of an output file
- Return type:
dict
- transfer_attributes_to_header(strict_names=False)
Transfer non-standard attributes in Meta to the MetaHeader object.
- Parameters:
strict_names (bool) – If True, produces an error if the MetaHeader object already has an attribute with the same name to be copied (default=False)
- Raises:
AttributeError – If strict_names is True and a global attribute would be updated.
- transfer_attributes_to_instrument(inst, strict_names=False)
Transfer non-standard attributes in Meta to Instrument object.
- Parameters:
inst (pysat.Instrument) – Instrument object to transfer attributes to
strict_names (bool) – If True, produces an error if the Instrument object already has an attribute with the same name to be copied (default=False)
- Raises:
ValueError – If inst type is not pysat.Instrument.
Note
pysat.files.io.load_netCDF and similar routines are only able to attach netCDF4 attributes to a Meta object. This routine identifies these attributes and removes them from the Meta object. Intent is to support simple transfers to the pysat.Instrument object.
Will not transfer names that conflict with pysat default attributes.
- var_case_name(name)
Provide stored name (case preserved) for case insensitive input.
- Parameters:
name (str or list) – Single or multiple variable name(s) using any capitalization scheme.
- Returns:
case_names – Maintains the same type as input, returning the stored name(s) of the meta object.
- Return type:
str or list
Note
If name is not found (case-insensitive check) then name is returned, as input. This function is intended to be used to help ensure the case of a given variable name is the same across the Meta object.
MetaLabels
- class pysat.MetaLabels(metadata=None, units=('units', <class 'str'>), name=('long_name', <class 'str'>), notes=('notes', <class 'str'>), desc=('desc', <class 'str'>), min_val=('value_min', (<class 'float'>, <class 'int'>)), max_val=('value_max', (<class 'float'>, <class 'int'>)), fill_val=('fill', (<class 'float'>, <class 'int'>, <class 'str'>)), **kwargs)
Store metadata labels for Instrument instance.
- Parameters:
units (tuple) – Units label name and value type(s) (default=(‘units’, str))
name (tuple) – Name label name and value type(s) (default=(‘long_name’, str))
notes (tuple) – Notes label name and value type(s) (default=(‘notes’, str))
desc (tuple) – Description label name and value type(s) (default=(‘desc’, str))
min_val (tuple) – Minimum value label name and value type(s) (default=(‘value_min’, (float, int)))
max_val (tuple) – Maximum value label name and value type(s) (default=(‘value_max’, (float, int)))
fill_val (tuple) – Fill value label name and value type(s) (default=(‘fill’, (float, int, str)))
kwargs (dict) – Dictionary containing optional label attributes, where the keys are the attribute names and the values are tuples containing the label name and value type
- meta
Coupled MetaData data object or NoneType
- Type:
pandas.DataFrame or NoneType
- units
String used to label units in storage (default=’units’)
- Type:
str
- name
String used to label long_name in storage (default=’long_name’)
- Type:
str
- notes
String used to label notes in storage (default=’notes’)
- Type:
str
- desc
String used to label variable descriptions in storage (default=’desc’)
- Type:
str
- min_val
String used to label typical variable value min limit in storage (default=’value_min’)
- Type:
str
- max_val
String used to label typical variable value max limit in storage (default=’value_max’)
- Type:
str
- fill_val
String used to label fill value in storage. The default follows the netCDF4 standards. (default=’fill’)
- Type:
str
- label_type
Dict with attribute names as keys and expected data types as values
- Type:
dict
- label_attrs
Dict with label names as keys and attribute names as values
- Type:
dict
- Raises:
TypeError – If meta data type is invalid
Note
Meta object preserves the case of variables and attributes as it first receives the data. Subsequent calls to set new metadata with the same variable or attribute will use case of first call. Accessing or setting data thereafter is case insensitive. In practice, use is case insensitive but the original case is preserved. Case preservation is built in to support writing files with a desired case to meet standards.
Supports any custom metadata values in addition to the expected metadata attributes (units, name, notes, desc, value_min, value_max, and fill). These base attributes may be used to programmatically access and set types of metadata regardless of the string values used for the attribute. String values for attributes may need to be changed depending upon the standards of code or files interacting with pysat.
Meta objects returned as part of pysat loading routines are automatically updated to use the same values of units, etc. as found in the pysat.Instrument object.
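As an illustration of label-based access (a minimal sketch; the variable name and metadata values are hypothetical):
import pysat

meta = pysat.Meta()
meta['dummy1'] = {'units': 'counts', 'long_name': 'Dummy counts'}

# Access metadata through the label attributes rather than raw strings,
# so the code keeps working even if the label strings are customized
meta['dummy1', meta.labels.units]   # -> 'counts'
meta['dummy1', meta.labels.name]    # -> 'Dummy counts'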
Initialize the MetaLabels class.
- default_values_from_attr(attr_name, data_type=None)
Retrieve the default values for each label based on their type.
- Parameters:
attr_name (str) – Label attribute name (e.g., max_val)
data_type (type or NoneType) – Type for the data values or None if not specified (default=None)
- Returns:
default_val – Sets NaN for all float values, -1 for all int values, and ‘’ for all str values except for ‘scale’, which defaults to ‘linear’, and None for any other data type
- Return type:
str, float, int, or NoneType
- Raises:
ValueError – For unknown attr_name
- default_values_from_type(val_type, data_type=None)
Retrieve the default values for each label based on their type.
- Parameters:
val_type (type) – Variable type for the value to be assigned to a MetaLabel
data_type (type or NoneType) – Type for the data values or None if not specified (default=None)
- Returns:
default_val – Sets NaN for all float values, -1 for all int values, and ‘’ for all str values, and None for any other data type
- Return type:
str, float, int, NoneType
- drop(names)
Remove data from MetaLabels.
- Parameters:
names (str or list-like) – Attribute or MetaData name(s)
- Raises:
AttributeError or KeyError – If any part of names is missing and cannot be dropped
- update(lattr, lname, ltype)
Update MetaLabels with a new label.
- Parameters:
lattr (str) – Attribute for this new label
lname (str) – MetaData name for this label
ltype (type) – Expected data type for this label
- Raises:
TypeError – If meta data type is invalid
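A minimal sketch of registering a custom label (the 'scale' attribute and 'ScaleTyp' name are illustrative):
import pysat

meta = pysat.Meta()
# Add a new label attribute 'scale', stored under the metadata name
# 'ScaleTyp' and expected to hold string values
meta.labels.update('scale', 'ScaleTyp', str)
meta['dummy1'] = {'ScaleTyp': 'linear'}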
MetaHeader
- class pysat.MetaHeader(header_data=None)
Stores global metadata.
- Parameters:
header_data (dict or NoneType) – Meta data to be assigned to the class. Keys denote the desired attribute names and values the metadata for that attribute. (default=None)
- global_attrs
List of global attribute names
- Type:
list
- <attrs>
Attributes with names corresponding to the values of global_attrs, may have any type
- to_dict()
Convert global attributes to a dictionary.
Initialize the MetaHeader class.
- drop(names)
Drop variables (names) from MetaHeader.
- Parameters:
names (list-like) – List of strings specifying the variable names to drop
- to_dict()
Convert the header data to a dictionary.
- Returns:
header_data – Global meta data where the keys are the attribute names and values the metadata for that attribute.
- Return type:
dict
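A minimal sketch of global metadata handling (the attribute names and values are illustrative):
import pysat

header = pysat.MetaHeader({'PI': 'J. Doe', 'Mission': 'Example'})
header.global_attrs     # -> ['PI', 'Mission']
header.to_dict()        # -> {'PI': 'J. Doe', 'Mission': 'Example'}
header.drop(['Mission'])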
Orbits
- class pysat.Orbits(inst, index=None, kind='local time', period=None)
Determine orbits on the fly and provide orbital data in .data.
- Parameters:
inst (pysat.Instrument) – Instrument object for which the orbits will be determined
index (str or NoneType) – Name of the data series to use for determining orbit breaks (default=None)
kind (str) – Kind of orbit, which specifies how orbital breaks are determined. Expects one of: 'local time', 'longitude', 'polar', or 'orbit'. (default='local time')
- local time: negative gradients in local time or breaks in inst.data.index
- longitude: negative gradients in longitude or breaks in inst.data.index
- polar: zero crossings in latitude or breaks in inst.data.index
- orbit: uses unique values of an orbit number
period (np.timedelta64 or NoneType) – length of time for orbital period, used to gauge when a break in the datetime index inst.index is large enough to consider it a new orbit (default=None)
- inst
- kind
- orbit_period
Pandas Timedelta that specifies the orbit period. Used instead of dt.timedelta to enable np.timedelta64 input. (default=97 min)
- Type:
pds.Timedelta
- num
Number of orbits in loaded data
- Type:
int
- orbit_index
Index of currently loaded orbit, zero indexed
- Type:
int
- Raises:
ValueError – If kind is unsupported
Note
Determines the locations of orbit breaks in the loaded data in inst.data and provides iteration tools and convenient orbit selection via inst.orbits[orbit_num]
This class should not be called directly by the user, it uses the interface provided by inst.orbits where inst = pysat.Instrument()
Examples
# Use orbit_info Instrument keyword to pass all Orbit kwargs
orbit_info = {'index': 'longitude', 'kind': 'longitude'}
vefi = pysat.Instrument(platform='cnofs', name='vefi', tag='dc_b',
                        clean_level=None, orbit_info=orbit_info)

# Set the instrument bounds
start = dt.datetime(2009, 1, 1)
stop = dt.datetime(2009, 1, 10)
vefi.bounds(start, stop)

# Load data
vefi.load(date=start)

# Iterate over orbits
for loop_vefi in vefi.orbits:
    print('Next available orbit ', loop_vefi['dB_mer'])

# Load fifth orbit of first day
vefi.load(date=start)
vefi.orbits[5]

# Equivalent but less convenient load
vefi.orbits.load(5)

# Manually iterate forwards to the orbit
vefi.orbits.next()

# Manually iterate backwards to the previous orbit
vefi.orbits.prev()
Initialize pysat.Instrument.orbits object.
- copy()
Provide a deep copy of object.
- Returns:
Copy of self
- Return type:
Orbits class instance
- property current
Retrieve current orbit number.
- Returns:
None if no orbit data. Otherwise, returns the orbit number, beginning with zero. The first and last orbit of a day is somewhat ambiguous: the first orbit for day n is generally also the last orbit on day n - 1. When iterating forward, the orbit will be labeled as the first orbit (0); when iterating backward, it will be labeled as the last orbit of the previous day.
- Return type:
int or NoneType
- load(orbit_num)
Load a particular orbit into .data for loaded day.
- Parameters:
orbit_num (int) – Orbit number, 1-indexed (1 to the number of orbits, or -1 to minus the number of orbits), with the sign denoting forward or backward indexing
- Raises:
ValueError – If index requested lies beyond the number of orbits
Note
A day of data must be loaded before this routine functions properly. If the last orbit of the day is requested, it will automatically be padded with data from the next day. The orbit counter will be reset to 1.
- next()
Load the next orbit into associated Instrument.data object.
- Raises:
RuntimeError – Placed in code that a user should never be able to reach
Note
Forms complete orbits across day boundaries. If no data loaded then the first orbit from the first date of data is returned.
- prev()
Load the previous orbit into associated Instrument.data object.
- Raises:
RuntimeError – Placed in code that a user should never be able to reach
Note
Forms complete orbits across day boundaries. If no data loaded then the last orbit of data from the last day is loaded.
Parameters
- class pysat._params.Parameters(path=None, create_new=False)
Stores user parameters used by pysat.
Also stores custom user parameters provided the keys don’t conflict with default pysat parameters.
- Parameters:
path (str) – If provided, the directory path will be used to load/store a parameters file with name ‘pysat_settings.json’ (default=None)
create_new (bool) – If True, a new parameters file is created. Will be created at path if provided. If not, file will be created in .pysat directory stored under the user’s home directory.
- data
pysat user settings dictionary
- Type:
dict
- defaults
Default parameters (keys) and values used by pysat that include {‘clean_level’: ‘clean’, ‘directory_format’: os.path.join(‘{platform}’, ‘{name}’, ‘{tag}’, ‘{inst_id}’), ‘ignore_empty_files’: False, ‘update_files’: True, ‘file_timeout’: 10, ‘user_modules’ : {}, ‘warn_empty_file_list’: False}
- Type:
dict
- file_path
Location of file used to store settings
- Type:
str
- non_defaults
List of pysat parameters (strings) that don’t have a defined default and are unaffected by self.restore_defaults()
- Type:
list
- Raises:
ValueError – The 'user_modules' parameter may not be set directly by the user. Please use the pysat.utils.registry module to modify the packages stored in 'user_modules'.
OSError – User provided path does not exist
Note
This class will look for a 'pysat_settings.json' file first in the current working directory and then in the '~/.pysat' directory under the user's home directory.
All pysat parameters are automatically stored whenever a parameter is assigned or modified. The default parameters and values tracked by this class are grouped by type below.
Values that map to the corresponding keywords on pysat.Instrument: clean_level, directory_format, ignore_empty_files, and update_files. See the Instrument docstring for more information on these keywords.
Values that map to internal pysat settings: file_timeout, user_modules, and warn_empty_file_list.
Stored pysat parameters without a working default value: data_dirs.
file_timeout - Time in seconds that pysat will wait to modify a busy file
user_modules - Stores information on modules registered by pysat
warn_empty_file_list - Raise a warning when no Instrument files are found
data_dirs - Directory(ies) where data are stored, in access order
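A minimal sketch of interacting with the stored parameters through pysat.params (the values shown are illustrative, and data_dirs must already be configured):
import pysat

# Read a stored setting; changes are written to 'pysat_settings.json'
# as soon as a value is assigned
print(pysat.params['data_dirs'])
pysat.params['file_timeout'] = 20

# Restore pysat defaults; custom keys and 'data_dirs' are left untouched
pysat.params.restore_defaults()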
Initialize Parameters object.
- clear_and_restart()
Clear all stored settings and set pysat defaults.
Note
pysat parameters without a default value are set to []
- restore_defaults()
Restore default pysat parameters.
Note
Does not modify any stored custom user keys or pysat parameters without a default value.
- store()
Store parameters using the filename specified in self.file_path.
Instrument Methods
The following methods support a variety of actions commonly needed by pysat.Instrument modules regardless of the data source.
General
Provides generalized routines for integrating instruments into pysat.
- pysat.instruments.methods.general.filename_creator(value, format_str=None, start_date=None, stop_date=None)
Create filenames as needed to support use of generated pysat data sets.
- Parameters:
value (slice) – Datetime slice, see _instrument.py, fname = self.files[date:(date + inc)]
format_str (str or NoneType) – File format template string (default=None)
start_date (datetime.datetime or NoneType) – First date supported (default=None)
stop_date (datetime.datetime or NoneType) – Last date supported (default=None)
- Returns:
Created filenames from format_str indexed by datetime
- Return type:
pandas.Series
- Raises:
NotImplementedError – This method is a stub to support development of a potential feature slated for a future release.
- pysat.instruments.methods.general.is_daily_file_cadence(file_cadence)
Evaluate file cadence to see if it is daily or greater than daily.
- Parameters:
file_cadence (dt.timedelta or pds.DateOffset) – pysat assumes a daily file cadence, but some instrument data files contain longer periods of time. This parameter allows the specification of regular file cadences greater than or equal to a day (e.g., weekly, monthly, or yearly). (default=dt.timedelta(days=1))
- Returns:
is_daily – True if the cadence is daily or less, False if the cadence is greater than daily
- Return type:
bool
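For example (a minimal sketch):
import datetime as dt
import pandas as pds
from pysat.instruments.methods import general as mm_gen

mm_gen.is_daily_file_cadence(dt.timedelta(days=1))       # -> True
mm_gen.is_daily_file_cadence(pds.DateOffset(months=1))   # -> False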
- pysat.instruments.methods.general.list_files(tag='', inst_id='', data_path='', format_str=None, supported_tags=None, file_cadence=datetime.timedelta(days=1), two_digit_year_break=None, delimiter=None)
Return a Pandas Series of every file for chosen Instrument data.
This routine provides a standard interface for pysat instrument modules.
- Parameters:
tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
inst_id (str) – Instrument ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
data_path (str) – Path to data directory. This input is nominally provided by pysat itself. (default=’’)
format_str (string or NoneType) – User specified file format. If None is specified, the default formats associated with the supplied tags are used. See Files.from_os format_str kwarg for more details. (default=None)
supported_tags (dict or NoneType) – Keys are inst_id, each containing a dict keyed by tag where the values are file format template strings. (default=None)
file_cadence (dt.timedelta or pds.DateOffset) – pysat assumes a daily file cadence, but some instrument data files contain longer periods of time. This parameter allows the specification of regular file cadences greater than or equal to a day (e.g., weekly, monthly, or yearly). (default=dt.timedelta(days=1))
two_digit_year_break (int or NoneType) – If filenames only store two digits for the year, then ‘1900’ will be added for years >= two_digit_year_break and ‘2000’ will be added for years < two_digit_year_break. If None, then four-digit years are assumed. (default=None)
delimiter (str or NoneType) – Delimiter string upon which files will be split (e.g., ‘.’). If None, filenames will be parsed presuming a fixed width format. (default=None)
- Returns:
out – A class containing the verified available files
- Return type:
pysat.Files.from_os : pysat._files.Files
Note
This function is intended to be invoked by pysat and not the end user.
Examples
import functools
from pysat.instruments.methods import general as mm_gen

fname = 'instrument_{year:04d}{month:02d}{day:02d}_v{version:02}.cdf'
supported_tags = {'tag_name': fname}
list_files = functools.partial(mm_gen.list_files,
                               supported_tags=supported_tags)
- pysat.instruments.methods.general.load_csv_data(fnames, read_csv_kwargs=None)
Load CSV data from a list of files into a single DataFrame.
- Parameters:
fnames (array-like) – Series, list, or array of filenames
read_csv_kwargs (dict or NoneType) – Dict of kwargs to apply to pds.read_csv. (default=None)
- Returns:
data – Data frame with data from all files in the fnames list
- Return type:
pds.DataFrame
See also
pds.read_csv
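A minimal sketch, assuming the listed CSV files exist and share a common time index in their first column (the filenames are hypothetical):
from pysat.instruments.methods import general as mm_gen

fnames = ['my_inst_2009001.csv', 'my_inst_2009002.csv']
data = mm_gen.load_csv_data(fnames,
                            read_csv_kwargs={'index_col': 0,
                                             'parse_dates': True})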
- pysat.instruments.methods.general.remove_leading_text(inst, target=None)
Remove leading text on variable names.
- Parameters:
inst (pysat.Instrument) – associated pysat.Instrument object
target (str, list of strings, or NoneType) – Leading string to remove. If None, the Instrument is returned unmodified. (default=None)
Testing
Standard functions for the test instruments.
- pysat.instruments.methods.testing.clean(self, test_clean_kwarg=None)
Pass through when asked to clean a test instrument.
- Parameters:
test_clean_kwarg (any) – Testing keyword. If these keywords contain ‘logger’, ‘warning’, or ‘error’, the message entered as the value to that key will be returned as a logging.WARNING, UserWarning, or ValueError, respectively. If the ‘change’ kwarg is set, the clean level will be changed to the specified value. (default=None)
- pysat.instruments.methods.testing.concat_data(self, new_data, **kwargs)
Concatenate data to self.data for extra time dimensions.
- Parameters:
new_data (xarray.Dataset or list of such objects) – New data objects to be concatenated
**kwargs (dict) – Optional keyword arguments passed to xr.concat
Note
Expects the extra time dimensions to have a variable name that starts with ‘time’, and no other dimensions to have a name that fits this format.
- pysat.instruments.methods.testing.create_files(inst, start, stop, freq='1D', use_doy=True, root_fname='pysat_testing_{year:04d}_{day:03d}.txt', version=False, content=None, timeout=None)
Create a file set using the year and day of year.
- Parameters:
inst (pysat.Instrument) – A test instrument, used to generate file path
start (dt.datetime) – The date for the first file to create
stop (dt.datetime) – The date for the last file to create
freq (str) – Frequency of file output. Codes correspond to pandas.date_range codes (default=’1D’)
use_doy (bool) – If True use Day of Year (doy), if False use day of month and month. (default=True)
root_fname (str) – The format of the file name to create. Supports standard pysat template variables ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘version’, ‘revision’, ‘cycle’. (default=’pysat_testing_{year:04d}_{day:03d}.txt’)
version (bool) – If True, iterate over version / revision / cycle. If False, ignore version / revision / cycle. (default=False)
content (str or NoneType) – Custom text to write to temporary files (default=None)
timeout (float or NoneType) – Time in seconds to lock the files being created. If None, no timeout is used. (default=None)
Examples
# Commands below create empty files located at `inst.files.data_path`,
# one per day, spanning 2008, where `year`, `month`, and `day`
# are filled in using the provided template string appropriately.
# The produced files are named like: 'pysat_testing_2008_01_01.txt'
import datetime as dt
inst = pysat.Instrument('pysat', 'testing')
root_fname = 'pysat_testing_{year:04d}_{month:02d}_{day:02d}.txt'
create_files(inst, dt.datetime(2008, 1, 1), dt.datetime(2008, 12, 31),
             root_fname=root_fname, use_doy=False)

# The command below uses the default values for `create_files`, which
# produces a daily set of files, labeled by year and day of year.
# The files are named like: 'pysat_testing_2008_001.txt'
create_files(inst, dt.datetime(2008, 1, 1), dt.datetime(2008, 12, 31))
- pysat.instruments.methods.testing.define_period()
Define the default periods for the fake data functions.
- Returns:
def_period – Dictionary of periods to use in test instruments
- Return type:
dict
Note
Local time and longitude slightly out of sync to simulate motion of Earth
- pysat.instruments.methods.testing.define_range()
Define the default ranges for the fake data functions.
- Returns:
def_range – Dictionary of ranges to use in test instruments
- Return type:
dict
- pysat.instruments.methods.testing.download(date_array, tag, inst_id, data_path='', user=None, password=None, test_download_kwarg=None)
Pass through when asked to download for a test instrument.
- Parameters:
date_array (array-like) – list of datetimes to download data for. The sequence of dates need not be contiguous.
tag (str) – Tag identifier used for particular dataset. This input is provided by pysat.
inst_id (str) – Instrument ID string identifier used for particular dataset. This input is provided by pysat.
data_path (str) – Path to directory to download data to. (default=’’)
user (string or NoneType) – User string input used for download. Provided by user and passed via pysat. If an account is required for downloads, this routine must raise an error when user is not supplied. (default=None)
password (string or NoneType) – Password for data download. (default=None)
test_download_kwarg (any) – Testing keyword (default=None)
- Raises:
ValueError – When user/password are required but not supplied
Warning
When no download support will be provided
Note
This routine is invoked by pysat and is not intended for direct use by the end user.
- pysat.instruments.methods.testing.generate_fake_data(t0, num_array, period=5820, data_range=[0.0, 24.0], cyclic=True)
Generate fake data over a given range.
- Parameters:
t0 (float) – Start time in seconds
num_array (array_like) – Array of time steps from t0. This is the index of the fake data
period (int) – The number of seconds per period. (default = 5820)
data_range (list) – For cyclic functions, the minimum and maximum data values cycled over one period. Not used for non-cyclic functions. (default=[0.0, 24.0])
cyclic (bool) – If True, assume that fake data is a cyclic function (e.g., longitude, slt) that will reset to data_range[0] once it reaches data_range[1]. If False, continue to monotonically increase. (default=True)
- Returns:
data – Array with fake data
- Return type:
array-like
- pysat.instruments.methods.testing.generate_times(fnames, num, freq='1s', start_time=None)
Construct list of times for simulated instruments.
- Parameters:
fnames (list) – List of filenames.
num (int) – Maximum number of times to generate. Data points will not go beyond the current day.
freq (str) – Frequency of temporal output, compatible with pandas.date_range (default=’1s’)
start_time (dt.timedelta or NoneType) – Offset time of start time in fractional hours since midnight UT. If None, set to 0. (default=None)
- Returns:
uts (array) – Array of integers representing uts for a given day
index (pds.DatetimeIndex) – The DatetimeIndex to be used in the pysat test instrument objects
date (datetime) – The requested date reconstructed from the fake file name
- pysat.instruments.methods.testing.init(self, test_init_kwarg=None)
Initialize the Instrument object with instrument specific values.
Runs once upon instantiation.
Shifts the time index of files by 5 minutes if mangle_file_dates is set to True at pysat.Instrument instantiation.
Creates a file list for a given range if the file_date_range keyword is set at instantiation.
- Parameters:
test_init_kwarg (any) – Testing keyword (default=None)
- pysat.instruments.methods.testing.initialize_test_meta(epoch_name, data_keys)
Initialize meta data for test instruments.
This routine should be applied to test instruments at the end of the load routine.
- Parameters:
epoch_name (str) – The variable name of the instrument epoch.
data_keys (list-like) – The data variable keys from the instrument.
- pysat.instruments.methods.testing.list_files(tag='', inst_id='', data_path='', format_str=None, file_date_range=None, test_dates=None, mangle_file_dates=False, test_list_files_kwarg=None)
Produce a fake list of files spanning three years.
- Parameters:
tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
inst_id (str) – Instrument ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
data_path (str) – Path to data directory. This input is nominally provided by pysat itself. (default=’’)
format_str (str or NoneType) – File format string. This is passed from the user at pysat.Instrument instantiation, if provided. (default=None)
file_date_range (pds.date_range) – File date range. The default mode generates a list of 3 years of daily files (1 year back, 2 years forward) based on the test_dates passed through below. Otherwise, accepts a range of files specified by the user. (default=None)
test_dates (dt.datetime or NoneType) – Pass the _test_date object through from the test instrument files
mangle_file_dates (bool) – If True, file dates are shifted by 5 minutes. (default=False)
test_list_files_kwarg (any) – Testing keyword (default=None)
- Return type:
Series of filenames indexed by file time
- pysat.instruments.methods.testing.list_remote_files(tag='', inst_id='', data_path='', format_str=None, start=None, stop=None, test_dates=None, user=None, password=None, mangle_file_dates=False, test_list_remote_kwarg=None)
Produce a fake list of files to simulate new files on a remote server.
Note
List spans three years and one month.
- Parameters:
tag (str) – Tag name used to identify particular data set. This input is nominally provided by pysat itself. (default=’’)
inst_id (str) – Instrument ID used to identify particular data. This input is nominally provided by pysat itself. (default=’’)
data_path (str) – Path to data directory. This input is nominally provided by pysat itself. (default=’’)
format_str (str or NoneType) – file format string (default=None)
start (dt.datetime or NoneType) – Starting time for file list. A None value will start 1 year before test_date (default=None)
stop (dt.datetime or NoneType) – Ending time for the file list. A None value will stop 2 years 1 month after test_date (default=None)
test_dates (dt.datetime or NoneType) – Pass the _test_date object through from the test instrument files
user (str or NoneType) – User string input used for download. Provided by user and passed via pysat. If an account is required for downloads, this routine must raise an error when user is not supplied. (default=None)
password (str or NoneType) – Password for data download. (default=None)
mangle_file_dates (bool) – If True, file dates are shifted by 5 minutes. (default=False)
test_list_remote_kwarg (any) – Testing keyword (default=None)
- Returns:
Filenames indexed by file time, see list_files for more info
- Return type:
pds.Series
- pysat.instruments.methods.testing.non_monotonic_index(index)
Adjust the index to be non-monotonic.
- Parameters:
index (pds.DatetimeIndex) – The index generated in an instrument test file.
- Returns:
new_index – A non-monotonic index
- Return type:
pds.DatetimeIndex
- pysat.instruments.methods.testing.non_unique_index(index)
Adjust the index to be non-unique.
- Parameters:
index (pds.DatetimeIndex) – The index generated in an instrument test file.
- Returns:
new_index – A non-unique index
- Return type:
pds.DatetimeIndex
- pysat.instruments.methods.testing.preprocess(self, test_preprocess_kwarg=None)
Perform standard preprocessing.
This routine is automatically applied to the Instrument object on every load by the pysat nanokernel (first in queue). Object modified in place.
- Parameters:
test_preprocess_kwarg (any) – Testing keyword (default=None)
Utilities
The utilities module contains functions used throughout the pysat package. This includes utilities for determining the available Instruments, loading files, et cetera.
Core Utilities
These utilities are available directly from the pysat.utils module.
- class pysat.utils._core.NetworkLock(*args, **kwargs)
Lock manager compatible with networked file systems.
Initialize lock manager compatible with networked file systems.
- Parameters:
*args (list reference) – References a list of input arguments
**kwargs (dict reference) – References a dict of input keyword argument
Note
See portalocker.utils.Lock for more details.
Examples
from pysat.utils import NetworkLock

with NetworkLock(file_to_be_written, 'w') as locked_file:
    locked_file.write('content')
- release()
Release the Lock from the file system.
- From portalocker docs:
On some networked filesystems it might be needed to force a os.fsync() before closing the file so it’s actually written before another client reads the file.
- pysat.utils._core.available_instruments(inst_loc=None)
Obtain basic information about instruments in a given subpackage.
- Parameters:
inst_loc (python subpackage or NoneType) – The location of the instrument subpackage (e.g., pysat.instruments) or None to list all registered instruments (default=None)
- Returns:
inst_info – Nested dictionary with ‘platform’, ‘name’, ‘inst_module’, ‘inst_ids_tags’, ‘inst_id’, and ‘tag’ with the tag descriptions given as the value for each unique dictionary combination.
- Return type:
dict
- pysat.utils._core.display_available_instruments(inst_loc=None, show_inst_mod=None, show_platform_name=None)
Display basic information about instruments in a given subpackage.
- Parameters:
inst_loc (python subpackage or NoneType) – The location of the instrument subpackage (e.g., pysat.instruments) or None to list all registered instruments (default=None)
show_inst_mod (boolean or NoneType) – Displays the instrument module if True, does not include it if False, and reverts to standard display based on inst_loc type if None. (default=None)
show_platform_name (boolean or NoneType) – Displays the platform and name if True, does not include it if False, and reverts to standard display based on inst_loc type if None. (default=None)
Note
Prints a user-friendly interface for available_instruments to standard out. Defaults to including the instrument module and not the platform/name values if inst_loc is an instrument module, and to including the platform/name values and not the instrument module if inst_loc is None (listing the registered instruments).
- pysat.utils._core.display_instrument_stats(inst_locs=None)
Display supported instrument stats.
- Parameters:
inst_locs (list of packages) – List of instrument library modules to inspect for pysat support. If None, report on default pysat package. (default=None)
- pysat.utils._core.fmt_output_in_cols(out_strs, ncols=3, max_num=6, lpad=None)
Format a string with desired output values in columns.
- Parameters:
out_strs (array-like) – Array like object containing strings to print
ncols (int) – Number of columns to print (default=3)
max_num (int) – Maximum number of out_strs members to print. Best display achieved if this number is divisible by 2 and ncols (default=6)
lpad (int or NoneType) – Left padding or None to use length of longest string + 1 (default=None)
- Returns:
output – String with desired data formatted in columns
- Return type:
str
- pysat.utils._core.generate_instrument_list(inst_loc, user_info=None)
Iterate through and classify instruments in a given subpackage.
- Parameters:
inst_loc (python subpackage) – The location of the instrument subpackage to test, e.g., ‘pysat.instruments’
user_info (dict or NoneType) – Nested dictionary with user and password info for instrument module name. If None, no user or password is assumed. (default=None) EX: user_info = {‘jro_isr’: {‘user’: ‘myname’, ‘password’: ‘email’}}
- Returns:
output – Dictionary with keys 'names', 'download', 'load_options', and 'no_download', each containing a list:
- 'names': list of platform_name combinations
- 'download': list of dicts containing 'inst_module', 'tag', and 'inst_id' for instruments with download routines
- 'load_options': list of dicts containing load and download options
- 'no_download': list of dicts containing 'inst_module', 'tag', and 'inst_id' for instruments without download routines
- Return type:
dict
Note
This routine currently supports classification of instruments for unit tests both in the core package and in separate instrument packages that use pysat.
- pysat.utils._core.get_mapped_value(value, mapper)
Adjust value using mapping dict or function.
- Parameters:
value (str) – MetaData variable name to be adjusted
mapper (dict or function) – Dictionary with old names as keys and new names as values, or a function to apply to all names
- Returns:
mapped_val – Adjusted MetaData variable name or NoneType if input value should stay the same
- Return type:
str or NoneType
- pysat.utils._core.listify(iterable)
Produce a flattened list of items from input that may not be iterable.
- Parameters:
iterable (iter-like) – An iterable object that will be wrapped within a list
- Returns:
An enclosing 1-D list of iterable if not already a list
- Return type:
list
Note
Does not accept dict_keys or dict_values as input.
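For example (a minimal sketch, assuming the core utilities are exposed from pysat.utils as noted above):
from pysat.utils import listify

listify('south')   # -> ['south']
listify(1.0)       # -> [1.0]
listify([1, 2])    # -> [1, 2]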
- pysat.utils._core.scale_units(out_unit, in_unit)
Determine the scaling factor between two units.
- Parameters:
out_unit (str) – Desired unit after scaling
in_unit (str) – Unit to be scaled
- Returns:
unit_scale – Scaling factor that will convert from in_units to out_units
- Return type:
float
Note
Accepted units include degrees (‘deg’, ‘degree’, ‘degrees’), radians (‘rad’, ‘radian’, ‘radians’), hours (‘h’, ‘hr’, ‘hrs’, ‘hour’, ‘hours’), lengths (‘m’, ‘km’, ‘cm’), volumes (‘m-3’, ‘cm-3’, ‘/cc’, ‘n/cc’, ‘km-3’, ‘m$^{-3}$’, ‘cm$^{-3}$’, ‘km$^{-3}$’), and speeds (‘m/s’, ‘cm/s’, ‘km/s’, ‘m s$^{-1}$’, ‘cm s$^{-1}$’, ‘km s$^{-1}$’, ‘m s-1’, ‘cm s-1’, ‘km s-1’). Can convert between degrees, radians, and hours or different lengths, volumes, or speeds.
Examples
import numpy as np
from pysat.utils import scale_units

two_pi = 2.0 * np.pi
scale = scale_units("deg", "RAD")
two_pi *= scale
two_pi   # will show 360.0
- pysat.utils._core.stringify(strlike)
Convert input into a str type.
- Parameters:
strlike (str or bytes) – Input values in str or byte form
- Returns:
strlike – If input is not string-like then the input type is retained.
- Return type:
str or input type
- pysat.utils._core.update_fill_values(inst, variables=None, new_fill_val=nan)
Update Instrument data so that the fill value is consistent with Meta.
- Parameters:
inst (pysat.Instrument) – Instrument object with data loaded
variables (str, list, or NoneType) – List of variables to update or None to update all (default=None)
new_fill_val (any) – New fill value to use (default=np.nan)
Note
On Windows OS, this function may not work for data variables that are also xarray coordinates.
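A minimal sketch, assuming the pysat test instrument (which includes a 'dummy1' variable) and that the core utilities are exposed from pysat.utils as noted above:
import numpy as np
import pysat

inst = pysat.Instrument('pysat', 'testing')
inst.load(2009, 1)

# Make the data fill values consistent with the Meta fill value
pysat.utils.update_fill_values(inst, variables='dummy1', new_fill_val=np.nan)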
Coordinates
Coordinate transformation functions for pysat.
- pysat.utils.coords.adjust_cyclic_data(samples, high=6.283185307179586, low=0.0)
Adjust cyclic values such as longitude to a different scale.
- Parameters:
samples (array_like) – Input array
high (float or int) – Upper boundary of the cyclic range (default=2 pi)
low (float or int) – Lower boundary of the cyclic range (default=0)
- Returns:
out_samples – Input array with values adjusted to fall within the specified cyclic range
- Return type:
array_like
- pysat.utils.coords.calc_solar_local_time(inst, lon_name=None, slt_name='slt', apply_modulus=True, ref_date=None)
Append solar local time to an instrument object.
- Parameters:
inst (pysat.Instrument) – Instrument class object to be updated
lon_name (str) – Name of the longitude data key (assumes data are in degrees)
slt_name (str) – Name of the output solar local time data key (default=’slt’)
apply_modulus (bool) – If True, SLT values are confined to [0, 24), if False they may be positive or negative based on the value of their universal time relative to that of the reference date ref_date. (default=True)
ref_date (dt.datetime or NoneType) – Reference initial date. If None, will use the date found at inst.date. Only valid if apply_modulus is True. (default=None)
Note
Updates Instrument data in column specified by slt_name, as well as Metadata
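A minimal sketch, assuming the pysat test instrument, which includes a 'longitude' variable:
import pysat
from pysat.utils import coords

inst = pysat.Instrument('pysat', 'testing')
inst.load(2009, 1)

# Append solar local time, computed from the 'longitude' variable
coords.calc_solar_local_time(inst, lon_name='longitude', slt_name='slt')
inst['slt']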
- pysat.utils.coords.establish_common_coord(coord_vals, common=True)
Create a coordinate array that is appropriate for multiple data sets.
- Parameters:
coord_vals (list-like) – A list of coordinate arrays of the same type: e.g., all geodetic latitude in degrees
common (bool) – True to include locations where all coordinate arrays cover, False to use the maximum location range from the list of coordinates (default=True)
- Returns:
out_coord – An array appropriate for the list of coordinate values
- Return type:
array-like
Note
Assumes that the supplied coordinates are distinct representations of the same value in the same units and range (e.g., longitude in degrees from 0-360).
- pysat.utils.coords.expand_xarray_dims(data_list, meta, dims_equal=False, exclude_dims=None)
Ensure that dimensions do not vary when concatenating data.
- Parameters:
data_list (list-like) – List of xr.Dataset objects with the same dimensions and variables
meta (pysat.Meta) – Metadata for the data in data_list
dims_equal (bool) – Assert that all xr.Dataset objects have the same dimensions if True, the Datasets in data_list may have differing dimensions if False. (default=False)
exclude_dims (list-like or NoneType) – Dimensions to exclude from evaluation or None (default=None)
- Returns:
out_list – List of xr.Dataset objects with the same dimensions and variables, and with dimensions that all have the same values and data padded when needed.
- Return type:
list-like
- pysat.utils.coords.update_longitude(inst, lon_name=None, high=180.0, low=-180.0)
Update longitude to the desired range.
- Parameters:
inst (pysat.Instrument) – Instrument class object to be updated
lon_name (str) – Name of the longitude data in inst
high (float) – Highest allowed longitude value (default=180.0)
low (float) – Lowest allowed longitude value (default=-180.0)
Note
Updates instrument data in column provided by lon_name
I/O
Input/Output utilities for pysat data.
- pysat.utils.io.add_netcdf4_standards_to_metadict(inst, in_meta_dict, epoch_name, check_type=None, export_nan=None)
Add metadata variables needed to meet SPDF ISTP/IACG NetCDF standards.
- Parameters:
inst (pysat.Instrument) – Object containing data and meta data
in_meta_dict (dict) – Metadata dictionary, can be obtained from inst.meta.to_dict().
epoch_name (str) – Name for epoch or time-index variable.
check_type (NoneType or list) – List of keys associated with meta_dict that should have the same data type as coltype. Passed to pysat.utils.io.filter_netcdf4_metadata. (default=None)
export_nan (NoneType or list) – Metadata parameters allowed to be NaN. Passed along to pysat.utils.io.filter_netcdf4_metadata. (default=None)
- Returns:
in_meta_dict – Input dictionary with additional information for standards.
- Return type:
dict
See also
filter_netcdf4_metadata
Removes unsupported SPDF ISTP/IACG variable metadata.
Note
Removes unsupported SPDF ISTP/IACG variable metadata.
For xarray inputs, converts datetimes to integers representing milliseconds since 1970. This does not include the main index, ‘time’.
- pysat.utils.io.apply_table_translation_from_file(trans_table, meta_dict)
Modify meta_dict by applying trans_table to metadata keys.
- Parameters:
trans_table (dict) – Mapping of metadata label used in a file to new value.
meta_dict (dict) – Dictionary with metadata information from a loaded file.
- Returns:
filt_dict – meta_dict after the mapping in trans_table applied.
- Return type:
dict
Note
The purpose of this function is to maintain default compatibility with meta.labels and existing code that writes and reads netcdf files via pysat while also changing the labels for metadata within the file.
- pysat.utils.io.apply_table_translation_to_file(inst, meta_dict, trans_table=None)
Translate labels in meta_dict using trans_table.
- Parameters:
inst (pysat.Instrument) – Instrument object with data to be written to file.
meta_dict (dict) – Output starting from Instrument.meta.to_dict() supplying attribute data.
trans_table (dict or NoneType) – Keyed by current metalabels containing a list of metadata labels to use within the returned dict. If None, a default translation using self.labels will be used except self.labels.fill_val will be mapped to [‘_FillValue’, ‘FillVal’, ‘fill’].
- Returns:
export_dict – A dictionary of the metadata for each variable of an output file.
- Return type:
dict
- Raises:
ValueError – If there is a duplicated variable label in the translation table
- pysat.utils.io.default_from_netcdf_translation_table(meta)
Create metadata translation table with minimal netCDF requirements.
- Parameters:
meta (pysat.Meta) – Meta instance to get appropriate default values for.
- Returns:
trans_table – Keyed by self.labels with a list of strings to be used when writing netcdf files.
- Return type:
dict
Note
The purpose of this function is to maintain default compatibility with meta.labels and existing code that writes and reads netcdf files via pysat while also changing the labels for metadata within the file.
- pysat.utils.io.default_to_netcdf_translation_table(inst)
Create metadata translation table with minimal netCDF requirements.
- Parameters:
inst (pysat.Instrument) – Instrument object to be written to file.
- Returns:
trans_table – Keyed by self.labels with a list of strings to be used when writing netcdf files.
- Return type:
dict
- pysat.utils.io.filter_netcdf4_metadata(inst, mdata_dict, coltype, remove=False, check_type=None, export_nan=None, varname='')
Filter metadata properties to be consistent with netCDF4.
- Parameters:
inst (pysat.Instrument) – Object containing data and metadata
mdata_dict (dict) – Dictionary equivalent to Meta object info
coltype (type or dtype) – Data type provided by pysat.Instrument._get_data_info. If boolean, int will be used instead.
remove (bool) – If True, remove metadata that should be the same type as coltype but is not. If False, recast the data instead. (default=False)
check_type (list or NoneType) – List of keys associated with meta_dict that should have the same data type as coltype. These will be removed from the filtered output if they differ. If None, this check will not be performed. (default=None)
export_nan (list or NoneType) – Metadata parameters allowed to be NaN. If None, assumes no Metadata parameters are allowed to be Nan. (default=None)
varname (str) – Variable name to be processed. Used for error feedback. (default=’’)
- Returns:
filtered_dict – Modified as needed for netCDf4
- Return type:
dict
Warning
- UserWarning
When data are removed due to conflict between value and type, and removal was not explicitly requested (remove is False).
Note
Metadata values that are NaN and not listed in export_nan are removed.
- pysat.utils.io.inst_to_netcdf(inst, fname, base_instrument=None, epoch_name=None, mode='w', zlib=False, complevel=4, shuffle=True, preserve_meta_case=False, check_type=None, export_nan=None, export_pysat_info=True, unlimited_time=True, meta_translation=None, meta_processor=None)
Store pysat data in a netCDF4 file.
- Parameters:
inst (pysat.Instrument) – Instrument object with loaded data to save
fname (str) – Output filename with full path
base_instrument (pysat.Instrument or NoneType) – Class used as a comparison, only attributes that are present with inst and not on base_instrument are written to netCDF. Using None assigns an unmodified pysat.Instrument object. (default=None)
epoch_name (str or NoneType) – Label in file for datetime index of inst. If None, uses ‘Epoch’ for pandas data formats, and uses ‘time’ for xarray formats.
mode (str) – Write (‘w’) or append (‘a’) mode. If mode=’w’, any existing file at this location will be overwritten. If mode=’a’, existing variables will be overwritten. (default=’w’)
zlib (bool) – Flag for engaging zlib compression, if True compression is used (default=False)
complevel (int) – An integer flag between 1 and 9 describing the level of compression desired. Ignored if zlib=False. (default=4)
shuffle (bool) – The HDF5 shuffle filter will be applied before compressing the data. This significantly improves compression. Ignored if zlib=False. (default=True)
preserve_meta_case (bool) – Flag specifying the case of the meta data variable strings. If True, then the variable strings within the MetaData object (which preserves case) are used to name variables in the written netCDF file. If False, then the variable strings used to access data from the pysat.Instrument object are used instead. (default=False)
check_type (list or NoneType) – List of keys associated with meta_dict that should have the same data type as coltype. These will be removed from the filtered output if they differ. If None, this check will default to include fill, min, and max values. (default=None)
export_nan (list or NoneType) – By default, the metadata variables where a value of NaN is allowed and written to the netCDF4 file is maintained by the Meta object attached to the pysat.Instrument object. A list supplied here will override the settings provided by Meta, and all parameters included will be written to the file. If not listed and a value is NaN then that attribute simply won’t be included in the netCDF4 file. (default=None)
export_pysat_info (bool) – Appends the platform, name, tag, and inst_id to the metadata if True. Otherwise these attributes are lost. (default=True)
unlimited_time (bool) – Flag specifying whether or not the epoch/time dimension should be unlimited; it is when the flag is True. (default=True)
meta_translation (dict or NoneType) – The keys in the input dict are used to map metadata labels for inst to one or more values used when writing the file. E.g., {meta.labels.fill_val: [‘FillVal’, ‘_FillValue’]} would result in both ‘FillVal’ and ‘_FillValue’ being used to store variable fill values in the netCDF file. Overrides use of inst._meta_translation_table.
meta_processor (function or NoneType) – If not None, a dict containing all of the metadata will be passed to meta_processor which should return a processed version of the input dict. If None and inst has a valid inst._export_meta_post_processing function then that function is used for meta_processor. (default=None)
Note
Depending on which kwargs are specified, the input class, inst, will be modified.
Stores 1-D data along dimension ‘Epoch’ - the date time index.
The name of the main variable column is used to prepend subvariable names within netCDF, var_subvar_sub
A netCDF4 dimension is created for each main variable column with higher order data; first dimension Epoch
The index organizing the data is stored as a dimension variable, and its long_name will be set to 'Epoch'.
from_netcdf uses the variable dimensions to reconstruct data structure
All attributes attached to instrument meta are written to netCDF attrs with the exception of 'Date_End', 'Date_Start', 'File', 'File_Date', 'Generation_Date', and 'Logical_File_ID'. These are defined within to_netCDF at the time the file is written, as per the adopted standard, SPDF ISTP/IACG Modified for NetCDF. Attributes 'Conventions' and 'Text_Supplement' are given default values if not present.
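A minimal sketch of writing loaded data to file, assuming the pysat test instrument and a writable output filename (the filename is illustrative):
import pysat
from pysat.utils import io

inst = pysat.Instrument('pysat', 'testing')
inst.load(2009, 1)

# Write the loaded data and metadata to a compressed netCDF4 file
io.inst_to_netcdf(inst, fname='pysat_testing_example.nc', zlib=True)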
- pysat.utils.io.load_netcdf(fnames, strict_meta=False, file_format='NETCDF4', epoch_name=None, epoch_unit='ms', epoch_origin='unix', pandas_format=True, decode_timedelta=False, combine_by_coords=True, meta_kwargs=None, meta_processor=None, meta_translation=None, drop_meta_labels=None, decode_times=None, strict_dim_check=True)
Load netCDF-3/4 file produced by pysat.
- Parameters:
fnames (str or array_like) – Filename(s) to load, will fail if None. (default=None)
strict_meta (bool) – Flag that checks if metadata across fnames is the same if True (default=False)
file_format (str) – file_format keyword passed to netCDF4 routine. Expects one of ‘NETCDF3_CLASSIC’, ‘NETCDF3_64BIT’, ‘NETCDF4_CLASSIC’, or ‘NETCDF4’. (default=’NETCDF4’)
epoch_name (str or NoneType) – Data key for epoch variable. The epoch variable is expected to be an array of integer or float values denoting time elapsed from an origin specified by epoch_origin with units specified by epoch_unit. This epoch variable will be converted to a DatetimeIndex for consistency across pysat instruments. If None, then epoch_name set by the load_netcdf_pandas or load_netcdf_xarray as appropriate. (default=None)
epoch_unit (str) – The pandas-defined unit of the epoch variable (‘D’, ‘s’, ‘ms’, ‘us’, ‘ns’). (default=’ms’)
epoch_origin (str or timestamp-convertable) – Origin of epoch calculation, following convention for pandas.to_datetime. Accepts timestamp-convertable objects, as well as two specific strings for commonly used calendars. These conversions are handled by pandas.to_datetime. If ‘unix’ (or POSIX) time; origin is set to 1970-01-01. If ‘julian’, epoch_unit must be ‘D’, and origin is set to beginning of Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC. (default=’unix’)
pandas_format (bool) – Flag specifying if data is stored in a pandas DataFrame (True) or xarray Dataset (False). (default=True)
decode_timedelta (bool) – Used for xarray data (pandas_format is False). If True, variables with unit attributes that are ‘timelike’ (‘hours’, ‘minutes’, etc) are converted to np.timedelta64. (default=False)
combine_by_coords (bool) – Used for xarray data (pandas_format is False) when loading a multi-file dataset. If True, uses xarray.combine_by_coords. If False, uses xarray.combine_nested. (default=True)
meta_kwargs (dict or NoneType) – Dict to specify custom Meta initialization or None to use Meta defaults (default=None)
meta_processor (function or NoneType) – If not None, a dict containing all of the loaded metadata will be passed to meta_processor which should return a filtered version of the input dict. The returned dict is loaded into a pysat.Meta instance and returned as meta. (default=None)
meta_translation (dict or NoneType) – Translation table used to map metadata labels in the file to those used by the returned meta. Keys are labels from file and values are labels in meta. Redundant file labels may be mapped to a single pysat label. If None, will use default_from_netcdf_translation_table. This feature is maintained for file compatibility. To disable all translation, input an empty dict. (default=None)
drop_meta_labels (list or NoneType) – List of variable metadata labels that should be dropped. Applied to metadata as loaded from the file. (default=None)
decode_times (bool or NoneType) – If True, variables with unit attributes that are ‘timelike’ (‘hours’, ‘minutes’, etc) are converted to np.timedelta64 by xarray. If False, then epoch_name will be converted to datetime using epoch_unit and epoch_origin. If None, will be set to False for backwards compatibility. For xarray only. (default=None)
strict_dim_check (bool) – Used for xarray data (pandas_format is False). If True, warn the user that the desired epoch is not present in xarray.dims. If False, no warning is raised. (default=True)
- Returns:
data (pandas.DataFrame or xarray.Dataset) – Class holding file data
meta (pysat.Meta) – Class holding file meta data
- Raises:
KeyError – If epoch/time dimension could not be identified.
ValueError – When attempting to load data with more than 2 dimensions or if strict_meta is True and meta data changes across files.
See also
load_netcdf_pandas, load_netcdf_xarray, pandas.to_datetime
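A minimal sketch of reading such a file back in, assuming 'pysat_testing_example.nc' was written by inst_to_netcdf as sketched above:
from pysat.utils import io

# pandas_format=True returns a DataFrame; the pandas epoch label is 'Epoch'
data, meta = io.load_netcdf('pysat_testing_example.nc', pandas_format=True,
                            epoch_name='Epoch')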
- pysat.utils.io.load_netcdf_pandas(fnames, strict_meta=False, file_format='NETCDF4', epoch_name='Epoch', epoch_unit='ms', epoch_origin='unix', meta_kwargs=None, meta_processor=None, meta_translation=None, drop_meta_labels=None)
Load netCDF-3/4 file produced by pysat in a pandas format.
- Parameters:
fnames (str or array_like) – Filename(s) to load
strict_meta (bool) – Flag that checks if metadata across fnames is the same if True (default=False)
file_format (str) – file_format keyword passed to netCDF4 routine. Expects one of ‘NETCDF3_CLASSIC’, ‘NETCDF3_64BIT’, ‘NETCDF4_CLASSIC’, or ‘NETCDF4’. (default=’NETCDF4’)
epoch_name (str or NoneType) – Data key for epoch variable. The epoch variable is expected to be an array of integer or float values denoting time elapsed from an origin specified by epoch_origin with units specified by epoch_unit. This epoch variable will be converted to a DatetimeIndex for consistency across pysat instruments. (default=’Epoch’)
epoch_unit (str) – The pandas-defined unit of the epoch variable (‘D’, ‘s’, ‘ms’, ‘us’, ‘ns’). (default=’ms’)
epoch_origin (str or timestamp-convertable) – Origin of epoch calculation, following convention for pandas.to_datetime. Accepts timestamp-convertable objects, as well as two specific strings for commonly used calendars. These conversions are handled by pandas.to_datetime. If ‘unix’ (or POSIX) time; origin is set to 1970-01-01. If ‘julian’, epoch_unit must be ‘D’, and origin is set to beginning of Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC. (default=’unix’)
meta_kwargs (dict or NoneType) – Dict to specify custom Meta initialization or None to use Meta defaults (default=None)
meta_processor (function or NoneType) – If not None, a dict containing all of the loaded metadata will be passed to meta_processor which should return a filtered version of the input dict. The returned dict is loaded into a pysat.Meta instance and returned as meta. (default=None)
meta_translation (dict or NoneType) – Translation table used to map metadata labels in the file to those used by the returned meta. Keys are labels from file and values are labels in meta. Redundant file labels may be mapped to a single pysat label. If None, will use default_from_netcdf_translation_table. This feature is maintained for file compatibility. To disable all translation, input an empty dict. (default=None)
drop_meta_labels (list or NoneType) – List of variable metadata labels that should be dropped. Applied to metadata as loaded from the file. (default=None)
- Returns:
data (pandas.DataFrame) – Class holding file data
meta (pysat.Meta) – Class holding file meta data
- Raises:
KeyError – If epoch/time dimension could not be identified.
ValueError – When attempting to load data with more than 2 dimensions, or if strict_meta is True and meta data changes across files, or if epoch/time dimension could not be identified.
- pysat.utils.io.load_netcdf_xarray(fnames, strict_meta=False, file_format='NETCDF4', epoch_name='time', epoch_unit='ms', epoch_origin='unix', decode_timedelta=False, combine_by_coords=True, meta_kwargs=None, meta_processor=None, meta_translation=None, drop_meta_labels=None, decode_times=False, strict_dim_check=True)
Load netCDF-3/4 file produced by pysat into an xarray Dataset.
- Parameters:
fnames (str or array_like) – Filename(s) to load.
strict_meta (bool) – Flag that checks if metadata across fnames is the same if True. (default=False)
file_format (str or NoneType) – file_format keyword passed to netCDF4 routine. Expects one of ‘NETCDF3_CLASSIC’, ‘NETCDF3_64BIT’, ‘NETCDF4_CLASSIC’, or ‘NETCDF4’. (default=’NETCDF4’)
epoch_name (str or NoneType) – Data key for epoch variable. The epoch variable is expected to be an array of integer or float values denoting time elapsed from an origin specified by epoch_origin with units specified by epoch_unit. This epoch variable will be converted to a DatetimeIndex for consistency across pysat instruments. (default=’time’)
epoch_unit (str) – The pandas-defined unit of the epoch variable (‘D’, ‘s’, ‘ms’, ‘us’, ‘ns’). (default=’ms’)
epoch_origin (str or timestamp-convertable) – Origin of epoch calculation, following convention for pandas.to_datetime. Accepts timestamp-convertable objects, as well as two specific strings for commonly used calendars. These conversions are handled by pandas.to_datetime. If ‘unix’ (or POSIX) time; origin is set to 1970-01-01. If ‘julian’, epoch_unit must be ‘D’, and origin is set to beginning of Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC. (default=’unix’)
decode_timedelta (bool) – If True, variables with unit attributes that are ‘timelike’ (‘hours’, ‘minutes’, etc) are converted to np.timedelta64. (default=False)
combine_by_coords (bool) – Used for xarray data (pandas_format is False) when loading a multi-file dataset. If True, uses xarray.combine_by_coords. If False, uses xarray.combine_nested. (default=True)
meta_kwargs (dict or NoneType) – Dict to specify custom Meta initialization or None to use Meta defaults (default=None)
meta_processor (function or NoneType) – If not None, a dict containing all of the loaded metadata will be passed to meta_processor which should return a filtered version of the input dict. The returned dict is loaded into a pysat.Meta instance and returned as meta. (default=None)
meta_translation (dict or NoneType) – Translation table used to map metadata labels in the file to those used by the returned meta. Keys are labels from file and values are labels in meta. Redundant file labels may be mapped to a single pysat label. If None, will use default_from_netcdf_translation_table. This feature is maintained for compatibility. To disable all translation, input an empty dict. (default=None)
drop_meta_labels (list or NoneType) – List of variable metadata labels that should be dropped. Applied to metadata as loaded from the file. (default=None)
decode_times (bool or NoneType) – If True, variables with unit attributes that are ‘timelike’ (‘hours’, ‘minutes’, etc) are converted to np.timedelta64 by xarray. If False, then epoch_name will be converted to datetime using epoch_unit and epoch_origin. If None, will be set to False for backwards compatibility. (default=None)
strict_dim_check (bool) – Used for xarray data (pandas_format is False). If True, warn the user that the desired epoch is not present in xarray.dims. If False, no warning is raised. (default=True)
- Returns:
data (xarray.Dataset) – Class holding file data
meta (pysat.Meta) – Class holding file meta data
- pysat.utils.io.meta_array_expander(meta_dict)
Expand meta arrays by storing each element with new incremented label.
If meta_dict[variable][‘label’] = [ item1, item2, …, itemn] then the returned dict will contain: meta_dict[variable][‘label0’] = item1, meta_dict[variable][‘label1’] = item2, and so on up to meta_dict[variable][‘labeln-1’] = itemn.
- Parameters:
meta_dict (dict) – Keyed by variable name with a dict as a value. Each variable dict is keyed by metadata name and the value is the metadata.
- Returns:
meta_dict – Input dict with expanded array elements.
- Return type:
dict
Note
pysat.Meta can not take array-like or list-like data.
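For example (a minimal sketch with a hypothetical variable and label):
from pysat.utils import io

mdict = {'var1': {'axis_labels': ['x', 'y']}}
io.meta_array_expander(mdict)
# -> {'var1': {'axis_labels0': 'x', 'axis_labels1': 'y'}}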
- pysat.utils.io.pysat_meta_to_xarray_attr(xr_data, pysat_meta, epoch_name)
Attach pysat metadata to xarray Dataset as attributes.
- Parameters:
xr_data (xarray.Dataset) – Xarray Dataset whose attributes will be updated.
pysat_meta (dict) – Output starting from Instrument.meta.to_dict() supplying attribute data.
epoch_name (str) – Label for datetime index information.
- pysat.utils.io.remove_netcdf4_standards_from_meta(mdict, epoch_name, labels)
Remove metadata from loaded file using SPDF ISTP/IACG NetCDF standards.
- Parameters:
mdict (dict) – Contains all of the loaded file’s metadata.
epoch_name (str) – Name for epoch or time-index variable. Use ‘’ if no epoch variable.
labels (Meta.labels) – Meta.labels instance.
- Returns:
mdict – File metadata with unnecessary netCDF4 SPDF information removed.
- Return type:
dict
See also
add_netcdf4_standards_to_metadict
Adds SPDF ISTP/IACG netCDF4 metadata.
Note
Removes metadata for epoch_name. Also removes metadata such as ‘Depend_*’, ‘Display_Type’, ‘Var_Type’, ‘Format’, ‘Time_Scale’, ‘MonoTon’, ‘calendar’, and ‘Time_Base’.
- pysat.utils.io.return_epoch_metadata(inst, epoch_name)
Create epoch or time-index metadata.
- Parameters:
inst (pysat.Instrument) – Instrument object with data and metadata.
epoch_name (str) – Data key for time-index or epoch data.
- Returns:
meta_dict – Dictionary with epoch metadata, keyed by metadata label.
- Return type:
dict
- pysat.utils.io.xarray_all_vars(data)
Extract all variable names, including dimensions and coordinates.
- Parameters:
data (xarray.Dataset) – Dataset to get all variables from.
- Returns:
all_vars – List of all data.data_vars, data.dims, and data.coords.
- Return type:
list
- pysat.utils.io.xarray_vars_no_time(data, time_label='time')
Extract all DataSet variables except time_label dimension.
- Parameters:
data (xarray.Dataset) – Dataset to get variables from.
time_label (str) – Label used within data for time information.
- Returns:
vars – All variables, dimensions, and coordinates, except for time_label.
- Return type:
list
- Raises:
ValueError – If time_label not present in data.
Files
Utilities for file management and parsing file names.
- pysat.utils.files.check_and_make_path(path, expand_path=False)
Check if path exists and create it if needed.
- Parameters:
path (str) – String specifying a directory path without any file names. All directories needed to create the full path will be created.
expand_path (bool) – If True, input path will be processed through os.path.expanduser (accounting for ~ and ~user constructs, if $HOME and user are known) and os.path.expandvars (accounting for environment variables)
- Returns:
made_dir – True, if new directory made. False, if path already existed.
- Return type:
bool
- Raises:
ValueError – If an invalid path is supplied.
RuntimeError – If the input path and internally constructed paths differ.
See also
os.path.expanduser, os.path.expandvars
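Examples
A short sketch; the directory below is hypothetical and is created only if it does not already exist:
import pysat
made_dir = pysat.utils.files.check_and_make_path('~/pysat_demo/level2', expand_path=True)
# made_dir is True when the directory had to be created, False if it already existed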
- pysat.utils.files.construct_searchstring_from_format(format_str, wildcard=False)
Parse a file format string and return a string formatted for searching.
Each variable in the string template is replaced with an appropriate number of ‘?’ based upon the provided length of the data.
- Parameters:
format_str (str) – Provides the naming pattern of the instrument files and the locations of date information so an ordered list may be produced. For example, instrument_{year:04d}{month:02d}{day:02d}_v{version:02d}.cdf
wildcard (bool) – If True, replaces each ‘?’ sequence that would normally be returned with a single ‘*’. (default=False)
- Returns:
out_dict – An output dict with the following keys: ‘search_string’ (format_str with the data to be parsed replaced by ‘?’), ‘keys’ (keys for data to be parsed), ‘type’ (type of data expected for each key to be parsed), ‘lengths’ (string length for data to be parsed), and ‘string_blocks’ (the fixed-width segments the filenames are broken into).
- Return type:
dict
- Raises:
ValueError – If a filename template isn’t provided in format_str
Note
The ‘?’ may be used to indicate a set number of characters for a variable part of the name that need not be extracted. For example, cnofs_cindi_ivm_500ms_{year:4d}{month:02d}{day:02d}_v??.cdf
A standards compliant filename can be constructed by adding the first element from string_blocks, then the first item in keys, and iterating that alternating pattern until all items are used.
This is the first function employed by pysat.Files.from_os.
If no type is supplied for datetime parameters, int will be used.
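Examples
A sketch using a hypothetical filename template:
import pysat
fmt = 'inst_{year:04d}{month:02d}{day:02d}_v{version:02d}.cdf'
out = pysat.utils.files.construct_searchstring_from_format(fmt)
# out['search_string'] should be 'inst_????????_v??.cdf'
wild = pysat.utils.files.construct_searchstring_from_format(fmt, wildcard=True)
# wild['search_string'] should be 'inst_*_v*.cdf'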
- pysat.utils.files.get_file_information(paths, root_dir='')
Retrieve system statistics for the input path(s).
- Parameters:
paths (str or list) – Full pathnames of files to get attribute information.
root_dir (str) – Common root path shared by all paths, if any. (default=’’)
- Returns:
file_info – Keyed by file attribute, which uses names that mirror or are expanded upon those used by os.stat. Each attribute maps to a list of values for each file in paths.
- Return type:
dict
See also
os.stat
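Examples
A sketch with hypothetical file paths; the returned keys mirror os.stat attribute names:
import pysat
info = pysat.utils.files.get_file_information(['inst_2009_01_01.cdf', 'inst_2009_01_02.cdf'], root_dir='/path/to/data')
# each value in info is a list holding one entry per input file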
- pysat.utils.files.parse_delimited_filenames(files, format_str, delimiter)
Extract specified info from a list of files using a delimiter.
Parses each filename using the delimiter. The function does not require every parsed item to be a variable, and more than one variable may be within a parsed section. Thus, the main practical difference from parse_fixed_width_filenames is greater support for the ‘*’ wildcard within format_str. Overuse of the ‘*’ wildcard increases the probability of false positive matches if there are multiple instrument files in the directory.
- Parameters:
files (list) – List of files, typically provided by pysat.utils.files.search_local_system_formatted_filename.
format_str (str) – Provides the naming pattern of the instrument files and the locations of date information so an ordered list may be produced. Supports all provided string formatting codes though only ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘version’, ‘revision’, and ‘cycle’ will be used for time and sorting information. For example, *_{year:4d}_{month:02d}_{day:02d}_*_v{version:02d}_*.cdf
delimiter (str) – Delimiter string upon which files will be split (e.g., ‘.’)
- Returns:
stored – Information parsed from filenames that includes: ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘version’, ‘revision’, and ‘cycle’, as well as any other user provided template variables. Also includes files, an input list of files, and format_str.
- Return type:
collections.OrderedDict
Note
The ‘*’ wildcard is supported when leading, trailing, or wholly contained between delimiters, such as ‘data_name-{year:04d}--{day:02d}.txt’, or ‘-{year:04d}*--{day:02d}’, where ‘-’ is the delimiter. There can not be a mixture of a template variable and ‘*’ without a delimiter in between, unless the ‘*’ occurs after the variables. The ‘*’ should not be used to replace the delimited character in the filename.
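Examples
A sketch using hypothetical filenames that follow the delimited template from the description above, with ‘_’ as the delimiter:
import pysat
files = ['inst_2009_01_01_a_v01_r00.cdf', 'inst_2009_01_02_a_v01_r00.cdf']
fmt = '*_{year:4d}_{month:02d}_{day:02d}_*_v{version:02d}_*.cdf'
stored = pysat.utils.files.parse_delimited_filenames(files, fmt, '_')
# stored['year'], stored['month'], stored['day'], and stored['version']
# should each hold one parsed value per input file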
- pysat.utils.files.parse_fixed_width_filenames(files, format_str)
Extract specified info from a list of files with a fixed name width.
- Parameters:
files (list) – List of files, typically provided by pysat.utils.files.search_local_system_formatted_filename.
format_str (str) – Provides the naming pattern of the instrument files and the locations of date information so an ordered list may be produced. Supports all provided string formatting codes though only ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘version’, ‘revision’, and ‘cycle’ will be used for time and sorting information. For example, instrument-{year:4d}_{month:02d}-{day:02d}_v{version:02d}.cdf, or *-{year:4d}_{month:02d}hithere{day:02d}_v{version:02d}.cdf
- Returns:
stored – Information parsed from filenames that includes: ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘version’, ‘revision’, and ‘cycle’, as well as any other user provided template variables. Also includes files, an input list of files, and format_str.
- Return type:
collections.OrderedDict
Note
The function uses the lengths of the fixed characters within format_str, as well as the supplied lengths for template variables, to determine where to parse out information. Thus, support for the wildcard ‘*’ is limited to locations before the first template variable.
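Examples
A sketch using hypothetical fixed-width filenames matching the documented template:
import pysat
files = ['instrument-2009_01-01_v01.cdf', 'instrument-2009_01-02_v01.cdf']
fmt = 'instrument-{year:4d}_{month:02d}-{day:02d}_v{version:02d}.cdf'
stored = pysat.utils.files.parse_fixed_width_filenames(files, fmt)
# stored['year'], stored['month'], stored['day'], and stored['version']
# should each hold one parsed value per input file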
- pysat.utils.files.process_parsed_filenames(stored, two_digit_year_break=None)
Create a Files pandas Series of filenames from a formatted dict.
- Parameters:
stored (collections.OrderedDict) – Ordered dictionary produced by parse_fixed_width_filenames or parse_delimited_filenames, containing date, time, version, and other information extracted from the filenames.
two_digit_year_break (int or NoneType) – If filenames only store two digits for the year, then ‘1900’ will be added for years >= two_digit_year_break and ‘2000’ will be added for years < two_digit_year_break. If None, then four-digit years are assumed. (default=None)
- Returns:
Series, indexed by datetime, with file strings
- Return type:
pds.Series
Note
If two files have the same date and time information in the filename then the file with the higher version/revision/cycle is used. Series returned only has one file per datetime. Version is required for this filtering, revision and cycle are optional.
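Examples
A sketch chaining the fixed-width parser with this routine; the filenames are hypothetical:
import pysat
files = ['inst-2009_01-01_v01.cdf', 'inst-2009_01-01_v02.cdf', 'inst-2009_01-02_v01.cdf']
fmt = 'inst-{year:4d}_{month:02d}-{day:02d}_v{version:02d}.cdf'
stored = pysat.utils.files.parse_fixed_width_filenames(files, fmt)
series = pysat.utils.files.process_parsed_filenames(stored)
# series is indexed by datetime; for 2009-01-01 only the v02 file should be kept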
- pysat.utils.files.search_local_system_formatted_filename(data_path, search_str)
Search the local file system for files matching the supplied search string.
- Parameters:
data_path (str) – Top level directory to search files for. This directory is provided by pysat to the instrument_module.list_files functions as data_path.
search_str (str) – String used to search for local files. For example, cnofs_cindi_ivm_500ms_????????_v??.cdf or inst-name-*-v??.cdf Typically this input is provided by files.construct_searchstring_from_format.
- Returns:
files – list of files matching the specified file format
- Return type:
list
Note
The use of ?s (1 ? per character) rather than the full wildcard * provides a more specific filename search string that limits the false positive rate.
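Examples
A sketch with a hypothetical data path; the search string would normally come from construct_searchstring_from_format:
import pysat
found = pysat.utils.files.search_local_system_formatted_filename(
    '/path/to/data', 'cnofs_cindi_ivm_500ms_????????_v??.cdf')
# found is a list of matching filenames, empty if nothing matches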
- pysat.utils.files.update_data_directory_structure(new_template, test_run=True, full_breakdown=False, remove_empty_dirs=False)
Update pysat data directory structure to match supplied template.
Translates all of pysat’s managed science files to a new directory structure. By default, pysat uses the template string stored in pysat.params[‘directory_format’] to organize files. This method makes it possible to transition an existing pysat installation so it works with the supplied new template.
- Parameters:
new_template (str) – New directory template string. The default value for pysat is os.path.join(‘{platform}’, ‘{name}’, ‘{tag}’, ‘{inst_id}’)
test_run (bool) – If True, a printout of all proposed changes will be made, but the directory changes will not be enacted. (default=True)
full_breakdown (bool) – If True, a full path for every file is printed to terminal. (default=False)
remove_empty_dirs (bool) – If True, all directories that had pysat.Instrument data moved to another location and are now empty are deleted. Traverses the directory chain up to the top-level directories in pysat.params[‘data_dirs’]. (default=False)
Note
After updating the data directory structure, users should nominally assign new_template as the directory format via
pysat.params['directory_format'] = new_template
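Examples
A sketch that previews a move to a flatter, hypothetical layout before enacting it:
import os
import pysat
new_template = os.path.join('{platform}', '{name}')
# Print the proposed changes without moving any files
pysat.utils.files.update_data_directory_structure(new_template, test_run=True)
# Once satisfied, enact the change and update the pysat default, e.g.
# pysat.utils.files.update_data_directory_structure(new_template, test_run=False)
# pysat.params['directory_format'] = new_template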
Registry
pysat user module registry utilities.
This module allows pysat to provide direct access to external or custom instrument modules by maintaining information about these instrument modules.
Examples
Instrument support modules must be registered before use. This may be done individually or for a collection of Instruments at once. For example, assume there is an implementation for myInstrument in the module my.package.myInstrument with platform and name attributes ‘myplatform’ and ‘myname’. Such an instrument may be registered with
registry.register(['my.package.myInstrument'])
The full module name “my.package.myInstrument” will be registered in pysat.params[‘user_modules’] and stored as a dict of dicts keyed by platform and name.
Once registered, subsequent calls to Instrument may use the platform and name string identifiers.
Instrument('myplatform', 'myname')
A full suite of instrument support modules may be registered at once using
# General form where my.package contains a collection of
# submodules to support Instrument data sets.
registry.register_by_module(my.package)
# Register published packages from pysat team
import pysatSpaceWeather
registry.register_by_module(pysatSpaceWeather.instruments)
import pysatNASA
registry.register_by_module(pysatNASA.instruments)
import pysatModels
registry.register_by_module(pysatModels.models)
- pysat.utils.registry.load_saved_modules()
Load registered pysat.Instrument modules.
- Returns:
instrument module strings are keyed by platform then name
- Return type:
dict of dicts
- pysat.utils.registry.register(module_names, overwrite=False)
Register a user pysat.Instrument module by name.
Enables instantiation of a third-party Instrument module using
inst = pysat.Instrument(platform, name, tag=tag, inst_id=inst_id)
- Parameters:
module_names (list-like of str) – specify package name and instrument modules
overwrite (bool) – If True, an existing registration will be updated with the new module information. (default=False)
- Raises:
ValueError – If a new module is input with a platform and name that is already associated with a registered module and the overwrite flag is set to False.
Warning
Registering a module that contains code other than pysat instrument files could result in unexpected consequences.
Note
Modules should be importable using
from my.package.name import my_instrument
Module names do not have to follow the pysat platform_name naming convention.
Currently registered modules may be found at
pysat.params['user_modules']
which is stored as a dict of dicts keyed by platform and name.
Examples
from pysat import Instrument
from pysat.utils import registry
registry.register(['my.package.name.myInstrument'])
testInst = Instrument(platform, name)
- pysat.utils.registry.register_by_module(module, overwrite=False)
Register all sub-modules attached to input module.
Enables instantiation of a third-party Instrument module using
inst = pysat.Instrument(platform, name)
- Parameters:
module (Python module) – Module with one or more pysat.Instrument support modules attached as sub-modules to the input module
overwrite (bool) – If True, an existing registration will be updated with the new module information. (default=False)
- Raises:
ValueError – If platform and name associated with a module are already registered
Note
Gets a list of sub-modules by using the __all__ attribute, defined in the module’s __init__.py
Examples
import pysat
import pysatModels
pysat.utils.registry.register_by_module(pysatModels.models)
- pysat.utils.registry.remove(platforms, names)
Remove module from registered user modules.
- Parameters:
platforms (list-like of str) – Platform identifiers to remove
names (list-like of str) – Name identifiers, paired with platforms, to remove. If the names element paired with the platform element is None, then all instruments under the specified platform will be removed. Should be the same type as platforms.
- Raises:
ValueError – If platform and/or name are not currently registered
Note
Current registered user modules available at pysat.params[‘user_modules’]
Examples
platforms = ['platform1', 'platform2']
names = ['name1', 'name2']
# Remove all instruments with platform == 'platform1'
registry.remove(['platform1'], [None])
# Remove all instruments with platform 'platform1' or 'platform2'
registry.remove(platforms, [None, None])
# Remove all instruments under 'platform1', and the individual instrument
# 'platform2', 'name2'
registry.remove(platforms, [None, 'name2'])
# Remove 'platform1', 'name1', as well as 'platform2', 'name2'
registry.remove(platforms, names)
- pysat.utils.registry.store()
Store current registry onto disk.
Time
Date and time handling utilities.
- pysat.utils.time.calc_freq(index)
Determine the frequency for a time index.
- Parameters:
index (array-like) – Datetime list, array, or Index
- Returns:
freq – Frequency string as described in Pandas Offset Aliases
- Return type:
str
Note
Calculates the minimum time difference and sets that as the frequency.
To reduce the amount of calculations done, the returned frequency is either in seconds (if no sub-second resolution is found) or nanoseconds.
See also
pds.offsets.DateOffset
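Examples
A short sketch using a regularly spaced pandas index:
import pandas as pds
import pysat
index = pds.date_range('2009-01-01', periods=10, freq='10s')
freq = pysat.utils.time.calc_freq(index)
# freq expresses the 10 second cadence as a seconds-based offset alias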
- pysat.utils.time.calc_res(index, use_mean=False)
Determine the resolution for a time index.
- Parameters:
index (array-like) – Datetime list, array, or Index
use_mean (bool) – Use the minimum time difference if False, use the mean time difference if True (default=False)
- Returns:
res_sec – Resolution value in seconds
- Return type:
float
- Raises:
ValueError – If index is too short to calculate a time resolution
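Examples
A short sketch using a regularly spaced pandas index:
import pandas as pds
import pysat
index = pds.date_range('2009-01-01', periods=10, freq='10s')
res_sec = pysat.utils.time.calc_res(index)
# res_sec should be 10.0 seconds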
- pysat.utils.time.create_date_range(start, stop, freq='D')
Create array of datetime objects using input freq from start to stop.
- Parameters:
start (dt.datetime or list-like of dt.datetime) – The beginning of the date range. Supports list, tuple, or ndarray of start dates.
stop (dt.datetime or list-like of dt.datetime) – The end of the date range. Supports list, tuple, or ndarray of stop dates.
freq (str) – The frequency of the desired output. Codes correspond to pandas date_range codes: ‘D’ daily, ‘M’ monthly, ‘s’ secondly
- Returns:
season – Range of dates over desired time with desired frequency.
- Return type:
pds.date_range
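Examples
A minimal sketch covering ten days at daily frequency:
import datetime as dt
import pysat
season = pysat.utils.time.create_date_range(dt.datetime(2009, 1, 1),
                                            dt.datetime(2009, 1, 10), freq='D')
# season contains one entry per day from 2009-01-01 through 2009-01-10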
- pysat.utils.time.create_datetime_index(year=None, month=None, day=None, uts=None)
Create a timeseries index using supplied date and time.
- Parameters:
year (array_like or NoneType) – Array of year values as np.int (default=None)
month (array_like or NoneType) – Array of month values as np.int. Leave None if using day for day of year. (default=None)
day (array_like or NoneType) – Array of number of days as np.int. If month=None then value interpreted as day of year, otherwise, day of month. (default=None)
uts (array-like or NoneType) – Array of UT seconds as np.float64 values (default=None)
- Return type:
Pandas timeseries index.
Note
Leap seconds have no meaning here.
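Examples
A sketch building an index from day-of-year and UT-second arrays:
import numpy as np
import pysat
years = np.full(3, 2009)
days = np.array([1, 1, 1])        # day of year, since month is omitted
uts = np.array([0.0, 1.5, 3.0])   # seconds of day
index = pysat.utils.time.create_datetime_index(year=years, day=days, uts=uts)
# index holds three times on 2009-01-01, offset by the supplied UT seconds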
- pysat.utils.time.datetime_to_dec_year(dtime)
Convert datetime timestamp to a decimal year.
- Parameters:
dtime (dt.datetime) – Datetime timestamp
- Returns:
year – Year with decimal containing time increments of less than a year
- Return type:
float
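Examples
A minimal sketch; a date near mid-year should yield a decimal part near 0.5:
import datetime as dt
import pysat
year = pysat.utils.time.datetime_to_dec_year(dt.datetime(2009, 7, 2))
# year should be roughly 2009.5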
- pysat.utils.time.filter_datetime_input(date)
Create a datetime object that only includes year, month, and day.
- Parameters:
date (NoneType, array-like, or datetime) – Single or sequence of datetime inputs
- Returns:
out_date – A NoneType input yields NoneType output, an array-like input yields a list of datetimes, and a datetime input yields a datetime. All datetime output excludes the sub-daily temporal increments (keeps only date information).
- Return type:
NoneType, datetime, or array-like
Note
Checks for timezone information not in UTC
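Examples
A minimal sketch showing removal of sub-daily information:
import datetime as dt
import pysat
out_date = pysat.utils.time.filter_datetime_input(dt.datetime(2009, 1, 1, 12, 34, 56))
# out_date should be dt.datetime(2009, 1, 1)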
- pysat.utils.time.freq_to_res(freq)
Convert a frequency string to a resolution value in seconds.
- Parameters:
freq (str) – Frequency string as described in Pandas Offset Aliases
- Returns:
res_sec – Resolution value in seconds
- Return type:
np.float64
See also
pds.offsets.DateOffset
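Examples
A minimal sketch:
import pysat
res_sec = pysat.utils.time.freq_to_res('10s')
# res_sec should be 10.0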
- pysat.utils.time.getyrdoy(date)
Return a tuple of year, day of year for a supplied datetime object.
- Parameters:
date (datetime.datetime) – Datetime object
- Returns:
year (int) – Integer year
doy (int) – Integer day of year
- Raises:
AttributeError – If input date does not have toordinal method
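Examples
A minimal sketch:
import datetime as dt
import pysat
year, doy = pysat.utils.time.getyrdoy(dt.datetime(2009, 2, 1))
# year should be 2009 and doy should be 32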
- pysat.utils.time.parse_date(str_yr, str_mo, str_day, str_hr='0', str_min='0', str_sec='0', century=2000)
Convert string dates to dt.datetime.
- Parameters:
str_yr (str) – String containing the year (2 or 4 digits)
str_mo (str) – String containing month digits
str_day (str) – String containing day of month digits
str_hr (str) – String containing the hour of day (default=’0’)
str_min (str) – String containing the minutes of hour (default=’0’)
str_sec (str) – String containing the seconds of minute (default=’0’)
century (int) – Century, only used if str_yr is a 2-digit year (default=2000)
- Returns:
out_date – datetime object
- Return type:
dt.datetime
- Raises:
ValueError – If any input results in an unrealistic datetime object value
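Examples
A sketch with a two-digit year, which is interpreted using the century keyword:
import pysat
out_date = pysat.utils.time.parse_date('09', '01', '02', str_hr='12')
# out_date should be dt.datetime(2009, 1, 2, 12, 0, 0) with the default century=2000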
- pysat.utils.time.today()
Obtain today’s date (UTC), with no hour, minute, second, etc.
- Returns:
today_utc – Today’s date in UTC
- Return type:
datetime
Testing
Utilities to perform common evaluations.
- pysat.utils.testing.assert_hasattr(obj, attr_name)
Provide useful info if object is missing a required attribute.
- Parameters:
obj (object) – Name of object to check
attr_name (str) – Name of required attribute that must be present in obj
- Raises:
AssertionError – If obj does not have attribute attr_name
- pysat.utils.testing.assert_isinstance(obj, obj_type)
Provide useful info if object is the wrong type.
- Parameters:
obj (object) – Name of object to check
obj_type (str) – Required type of object
- Raises:
AssertionError – If obj is not type obj_type
- pysat.utils.testing.assert_list_contains(small_list, big_list, test_nan=False, test_case=True)
Assert all elements of one list exist within the other list.
- Parameters:
small_list (list) – List whose values must all be present within big_list
big_list (list) – List that must contain all the values in small_list
test_nan (bool) – Test the lists for the presence of NaN values
test_case (bool) – Requires strings to be the same case when testing
- Raises:
AssertionError – If a small_list value is missing from big_list
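Examples
A minimal sketch:
import pysat
pysat.utils.testing.assert_list_contains(['a'], ['a', 'b', 'c'])
# passes silently; raises AssertionError if a small_list value were missing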
- pysat.utils.testing.assert_lists_equal(list1, list2, test_nan=False, test_case=True)
Assert that the lists contain the same elements.
- Parameters:
list1 (list) – Input list one
list2 (list) – Input list two
test_nan (bool) – Test the lists for the presence of NaN values
test_case (bool) – Requires strings to be the same case when testing
- Raises:
AssertionError – If a list1 value is missing from list2 or list lengths are unequal
Note
This test does not require that the lists have the same elements in the same order, and so is also a good test for keys.
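Examples
A minimal sketch; element order is not required to match:
import pysat
pysat.utils.testing.assert_lists_equal(['b', 'a'], ['a', 'b'])
# passes silently; an AssertionError is raised on mismatch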
- pysat.utils.testing.eval_bad_input(func, error, err_msg, input_args=None, input_kwargs=None)
Evaluate bad function or method input.
- Parameters:
func (function, method, or class) – Function, class, or method to be evaluated
error (class) – Expected error or exception
err_msg (str) – Expected error message
input_args (list or NoneType) – Input arguments or None for no input arguments (default=None)
input_kwargs (dict or NoneType) – Input keyword arguments or None for no input kwargs (default=None)
- Raises:
AssertionError – If unexpected error message is returned
Exception – If error or exception of unexpected type is returned, it is raised
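Examples
A sketch using a small hypothetical function so the expected error type and message are known:
import pysat
def divide(num, den):
    if den == 0:
        raise ValueError("denominator must be nonzero")
    return num / den
# passes silently because the raised error type and message fragment match
pysat.utils.testing.eval_bad_input(divide, ValueError, "must be nonzero",
                                   input_args=[1, 0])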
- pysat.utils.testing.eval_warnings(warns, check_msgs, warn_type=<class 'DeprecationWarning'>)
Evaluate warnings by category and message.
- Parameters:
warns (list) – List of warnings.WarningMessage objects
check_msgs (list) – List of strings containing the expected warning messages
warn_type (type or list-like) – Type or list-like for the warning messages (default=DeprecationWarning)
- Raises:
AssertionError – If warning category doesn’t match type or an expected message is missing
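Examples
A sketch that captures a DeprecationWarning and checks its message:
import warnings
import pysat
with warnings.catch_warnings(record=True) as warns:
    warnings.simplefilter("always")
    warnings.warn("old keyword is deprecated", DeprecationWarning)
pysat.utils.testing.eval_warnings(warns, ["old keyword is deprecated"])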
- pysat.utils.testing.nan_equal(value1, value2)
Determine if values are equal or are both NaN.
- Parameters:
value1 (scalar-like) – Value of any type that can be compared without iterating
value2 (scalar-like) – Another value of any type that can be compared without iterating
- Returns:
is_equal – True if both values are equal or NaN, False if they are not
- Return type:
bool
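Examples
A minimal sketch:
import numpy as np
import pysat
pysat.utils.testing.nan_equal(np.nan, np.nan)   # True
pysat.utils.testing.nan_equal(1.0, 1.0)         # True
pysat.utils.testing.nan_equal(1.0, 2.0)         # False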
Instrument Template
Template for a pysat.Instrument support file.
Modify this file as needed when adding a new Instrument to pysat.
This is a good area to introduce the instrument, provide background on the mission, operations, instrumentation, and measurements.
Also a good place to provide contact information. This text will be included in the pysat API documentation.
Properties
- platform
List platform string here
- name
List name string here
- tag
List supported tag strings here
- inst_id
List supported inst_id strings here
Note
Optional section, remove if no notes
Warning
Optional section, remove if no warnings
Two blank lines needed afterward for proper formatting
Examples
Example code can go here
- pysat.instruments.templates.template_instrument.clean(self)
Return platform_name data cleaned to the specified level.
Cleaning level is specified in inst.clean_level and pysat will accept user input for several strings. The clean_level is specified at instantiation of the Instrument object, though it may be updated to a more stringent level and re-applied after instantiation. The clean routine is applied after the preprocess (default) routine every time data is loaded.
Note
‘clean’ All parameters are good, suitable for scientific studies
‘dusty’ Most parameters are good, requires instrument familiarity
‘dirty’ There are data areas that have issues, use with caution
‘none’ No cleaning applied, routine not called in this case.
- pysat.instruments.templates.template_instrument.download(date_array, tag, inst_id, data_path=None, user=None, password=None, **kwargs)
Download platform_name data from the remote repository.
This routine is called as needed by pysat. It is not intended for direct user interaction.
- Parameters:
date_array (array-like) – list of datetimes to download data for. The sequence of dates need not be contiguous.
tag (str) – Tag identifier used for particular dataset. This input is provided by pysat. (default=’’)
inst_id (str) – Satellite ID string identifier used for particular dataset. This input is provided by pysat. (default=’’)
data_path (str or NoneType) – Path to directory to download data to. (default=None)
user (str or NoneType (OPTIONAL)) – User string input used for download. Provided by user and passed via pysat. If an account is required for downloads, this routine must raise an error when user is not supplied. (default=None)
password (str or NoneType (OPTIONAL)) – Password for data download. (default=None)
custom_keywords (placeholder (OPTIONAL)) – Additional keywords supplied by user when invoking the download routine attached to a pysat.Instrument object are passed to this routine. Use of custom keywords here is discouraged.
- pysat.instruments.templates.template_instrument.init(self)
Initialize the Instrument object with instrument specific values.
Runs once upon instantiation. Object modified in place. Use this to set the acknowledgements and references.
- pysat.instruments.templates.template_instrument.list_files(tag='', inst_id='', data_path='', format_str=None)
Produce a list of files corresponding to PLATFORM/NAME.
This routine is invoked by pysat and is not intended for direct use by the end user. Arguments are provided by pysat.
- Parameters:
tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
inst_id (str) – Instrument ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
data_path (str) – Full path to directory containing files to be loaded. This is provided by pysat. The user may specify their own data path at Instrument instantiation and it will appear here. (default=’’)
format_str (str) – String template used to parse the datasets filenames. If a user supplies a template string at Instrument instantiation then it will appear here, otherwise defaults to None. (default=None)
- Returns:
Series of filename strings, including the path, indexed by datetime.
- Return type:
pandas.Series
Examples
If a filename is SPORT_L2_IVM_2019-01-01_v01r0000.NC then the template is 'SPORT_L2_IVM_{year:04d}-{month:02d}-{day:02d}_' + 'v{version:02d}r{revision:04d}.NC'
Note
The returned Series should not have any duplicate datetimes. If there are multiple versions of a file the most recent version should be kept and the rest discarded. This routine uses the pysat.Files.from_os constructor, thus the returned files are up to pysat specifications.
Multiple data levels may be supported via the ‘tag’ input string. Multiple instruments via the inst_id string.
- pysat.instruments.templates.template_instrument.list_remote_files(tag, inst_id, user=None, password=None)
Return a Pandas Series of every file for chosen remote data.
This routine is intended to be used by pysat instrument modules supporting a particular NASA CDAWeb dataset.
- Parameters:
tag (str) – Denotes type of file to load. Accepted types are <tag strings>.
inst_id (str) – Specifies the satellite or instrument ID. Accepted types are <inst_id strings>.
user (str or NoneType) – Username to be passed along to resource with relevant data. (default=None)
password (str or NoneType) – User password to be passed along to resource with relevant data. (default=None)
Note
If defined, the expected return variable is a pandas.Series formatted for the Files class (pysat._files.Files) containing filenames and indexed by date and time
- pysat.instruments.templates.template_instrument.load(fnames, tag='', inst_id='', custom_keyword=None)
Load platform_name data and meta data.
This routine is called as needed by pysat. It is not intended for direct user interaction.
- Parameters:
fnames (array-like) – iterable of filename strings, full path, to data files to be loaded. This input is nominally provided by pysat itself.
tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself, which supplies ‘’ as the default tag unless one is specified by the user at Instrument instantiation. (default=’’)
inst_id (str) – Satellite ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
custom_keyword (type to be set) – Developers may include any custom keywords, with default values defined in the method signature. This is included here as a place holder and should be removed.
- Returns:
data (pds.DataFrame or xr.Dataset) – Data to be assigned to the pysat.Instrument.data object.
mdata (pysat.Meta) – Pysat Meta data for each data variable.
Note
Any additional keyword arguments passed to pysat.Instrument upon instantiation or via load that are defined above will be passed along to this routine.
When using pysat.utils.load_netcdf4 for xarray data, pysat will use decode_timedelta=False to prevent automated conversion of data to np.timedelta64 objects if the units attribute is time-like (‘hours’, ‘minutes’, etc). This can be added as a custom keyword if timedelta conversion is desired.
Examples
inst = pysat.Instrument('ucar', 'tiegcm')
inst.load(2019, 1)
- pysat.instruments.templates.template_instrument.preprocess(self)
Perform standard preprocessing.
This routine is automatically applied to the Instrument object on every load by the pysat nanokernel (first in queue). Object modified in place.
General Instruments
The following Instrument modules support I/O and analysis in pysat.
pysat_ndtesting
Produces fake instrument data for testing.
- pysat.instruments.pysat_ndtesting.load(fnames, tag='', inst_id='', sim_multi_file_right=False, sim_multi_file_left=False, root_date=None, non_monotonic_index=False, non_unique_index=False, start_time=None, num_samples=864, sample_rate='100s', test_load_kwarg=None, max_latitude=90.0, num_extra_time_coords=0)
Load the test files.
- Parameters:
fnames (list) – List of filenames.
tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
inst_id (str) – Instrument ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
sim_multi_file_right (bool) – Adjusts date range to be 12 hours in the future or twelve hours beyond root_date. (default=False)
sim_multi_file_left (bool) – Adjusts date range to be 12 hours in the past or twelve hours before root_date. (default=False)
root_date (NoneType) – Optional central date, uses _test_dates if not specified. (default=None)
non_monotonic_index (bool) – If True, time index will be non-monotonic (default=False)
non_unique_index (bool) – If True, time index will be non-unique (default=False)
start_time (dt.timedelta or NoneType) – Offset time of start time since midnight UT. If None, instrument data will begin at midnight. (default=None)
num_samples (int) – Maximum number of times to generate. Data points will not go beyond the current day. (default=864)
sample_rate (str) – Frequency of data points, using pandas conventions. (default=’100s’)
test_load_kwarg (any) – Keyword used for pysat unit testing to ensure that functionality for custom keywords defined in instrument support functions is working correctly. (default=None)
max_latitude (float) – Latitude simulated as max_latitude * cos(theta(t)), where theta is a linear periodic signal bounded by [0, 2 * pi). (default=90.0)
num_extra_time_coords (int) – Number of extra time coordinates to include. (default=0)
- Returns:
data (xr.Dataset) – Testing data
meta (pysat.Meta) – Testing metadata
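Examples
A sketch loading one day of simulated data; 2009-01-01 is assumed to fall within the instrument's simulated file range:
import pysat
inst = pysat.Instrument('pysat', 'ndtesting', num_samples=100)
inst.load(2009, 1)
# inst.data is an xarray Dataset with 100 samples for the requested day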
pysat_netcdf
General Instrument for loading pysat-written netCDF files.
Properties
- platform
‘pysat’, will be updated if file contains a platform attribute
- name
‘netcdf’, will be updated if file contains a name attribute
- tag
‘’, will be updated if file contains a tag attribute
- inst_id
‘’, will be updated if file contains an inst_id attribute
Note
Only tested against pysat created netCDF files
Examples
import pysat
# Load a test Instrument
inst = pysat.Instrument("pysat", "testing")
inst.load(date=inst.inst_module._test_dates[''][''])
# Create a NetCDF file
fname = "test_pysat_file_%Y%j.nc"
inst.to_netcdf4(fname=inst.date.strftime(fname))
# Load the NetCDF file
file_inst = pysat.Instrument(
"pysat", "netcdf", temporary_file_list=True, directory_format="./",
file_format="test_pysat_file_{year:04}{day:03}.nc")
file_inst.load(date=inst.inst_module._test_dates[''][''])
- pysat.instruments.pysat_netcdf.clean(self)
Clean the file data.
- pysat.instruments.pysat_netcdf.download(date_array, tag, inst_id, data_path=None)
Download data from the remote repository; not supported.
- Parameters:
date_array (array-like) – list of datetimes to download data for. The sequence of dates need not be contiguous.
tag (str) – Tag identifier used for particular dataset. This input is provided by pysat. (default=’’)
inst_id (str) – Satellite ID string identifier used for particular dataset. This input is provided by pysat. (default=’’)
data_path (str or NoneType) – Path to directory to download data to. (default=None)
- pysat.instruments.pysat_netcdf.init(self, pandas_format=True)
Initialize the Instrument object with instrument specific values.
- pysat.instruments.pysat_netcdf.load(fnames, tag='', inst_id='', strict_meta=False, file_format='NETCDF4', epoch_name=None, epoch_unit='ms', epoch_origin='unix', pandas_format=True, decode_timedelta=False, meta_kwargs=None, meta_processor=None, meta_translation=None, drop_meta_labels=None, decode_times=None)
Load pysat-created NetCDF data and meta data.
- Parameters:
fnames (array-like) – iterable of filename strings, full path, to data files to be loaded. This input is nominally provided by pysat itself.
tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
inst_id (str) – Instrument ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
strict_meta (bool) – Flag that checks if metadata across fnames is the same if True (default=False)
file_format (str) – file_format keyword passed to netCDF4 routine. Expects one of ‘NETCDF3_CLASSIC’, ‘NETCDF3_64BIT’, ‘NETCDF4_CLASSIC’, or ‘NETCDF4’. (default=’NETCDF4’)
epoch_name (str or NoneType) – Data key for epoch variable. The epoch variable is expected to be an array of integer or float values denoting time elapsed from an origin specified by epoch_origin with units specified by epoch_unit. This epoch variable will be converted to a DatetimeIndex for consistency across pysat instruments. (default=None)
epoch_unit (str) – The pandas-defined unit of the epoch variable (‘D’, ‘s’, ‘ms’, ‘us’, ‘ns’). (default=’ms’)
epoch_origin (str or timestamp-convertible) – Origin of epoch calculation, following the convention for pandas.to_datetime. Accepts timestamp-convertible objects, as well as two specific strings for commonly used calendars. These conversions are handled by pandas.to_datetime. If ‘unix’ (or POSIX) time, the origin is set to 1970-01-01. If ‘julian’, epoch_unit must be ‘D’, and the origin is set to the beginning of the Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC. (default=’unix’)
pandas_format (bool) – Flag specifying if data is stored in a pandas DataFrame (True) or xarray Dataset (False). (default=True)
decode_timedelta (bool) – Used for xarray data (pandas_format is False). If True, variables with unit attributes that are ‘timelike’ (‘hours’, ‘minutes’, etc) are converted to np.timedelta64. (default=False)
meta_kwargs (dict or NoneType) – Dict to specify custom Meta initialization or None to use Meta defaults (default=None)
meta_processor (function or NoneType) – If not None, a dict containing all of the loaded metadata will be passed to meta_processor which should return a filtered version of the input dict. The returned dict is loaded into a pysat.Meta instance and returned as meta. (default=None)
meta_translation (dict or NoneType) – Translation table used to map metadata labels in the file to those used by the returned meta. Keys are labels from file and values are labels in meta. Redundant file labels may be mapped to a single pysat label. If None, will use default_from_netcdf_translation_table. This feature is maintained for file compatibility. To disable all translation, input an empty dict. (default=None)
drop_meta_labels (list or NoneType) – List of variable metadata labels that should be dropped. Applied to metadata as loaded from the file. (default=None)
decode_times (bool or NoneType) – If True, variables with unit attributes that are ‘timelike’ (‘hours’, ‘minutes’, etc) are converted to np.timedelta64 by xarray. If False, then epoch_name will be converted to datetime using epoch_unit and epoch_origin. If None, will be set to False for backwards compatibility. For xarray only. (default=None)
- Returns:
data (pds.DataFrame or xr.Dataset) – Data to be assigned to the pysat.Instrument.data object.
mdata (pysat.Meta) – Pysat Meta data for each data variable.
- pysat.instruments.pysat_netcdf.preprocess(self)
Extract Instrument attrs from file attrs loaded to Meta.header.
Test Instruments
The following Instrument modules support unit and integration testing for packages that depend on pysat.
pysat_testing
Produces fake instrument data for testing.
- pysat.instruments.pysat_testing.load(fnames, tag='', inst_id='', sim_multi_file_right=False, sim_multi_file_left=False, root_date=None, non_monotonic_index=False, non_unique_index=False, start_time=None, num_samples=86400, test_load_kwarg=None, max_latitude=90.0)
Load the test files.
- Parameters:
fnames (list) – List of filenames.
tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
inst_id (str) – Instrument ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
sim_multi_file_right (bool) – Adjusts date range to be 12 hours in the future or twelve hours beyond root_date. (default=False)
sim_multi_file_left (bool) – Adjusts date range to be 12 hours in the past or twelve hours before root_date. (default=False)
root_date (NoneType) – Optional central date, uses _test_dates if not specified. (default=None)
non_monotonic_index (bool) – If True, time index will be non-monotonic (default=False)
non_unique_index (bool) – If True, time index will be non-unique (default=False)
start_time (dt.timedelta or NoneType) – Offset time of start time since midnight UT. If None, instrument data will begin at midnight. (default=None)
num_samples (int) – Maximum number of times to generate. Data points will not go beyond the current day. (default=86400)
test_load_kwarg (any) – Keyword used for pysat unit testing to ensure that functionality for custom keywords defined in instrument support functions is working correctly. (default=None)
max_latitude (float) – Latitude simulated as max_latitude * cos(theta(t)), where theta is a linear periodic signal bounded by [0, 2 * pi). (default=90.0)
- Returns:
data (pds.DataFrame) – Testing data
meta (pysat.Meta) – Metadata
pysat_testmodel
Produces fake instrument data for testing.
- pysat.instruments.pysat_testmodel.load(fnames, tag='', inst_id='', start_time=None, num_samples=96, test_load_kwarg=None)
Load the test files.
- Parameters:
fnames (list) – List of filenames.
tag (str) – Tag name used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
inst_id (str) – Instrument ID used to identify particular data set to be loaded. This input is nominally provided by pysat itself. (default=’’)
start_time (dt.timedelta or NoneType) – Offset time of start time since midnight UT. If None, instrument data will begin at midnight. (default=None)
num_samples (int) – Maximum number of times to generate. Data points will not go beyond the current day. (default=96)
test_load_kwarg (any) – Keyword used for pysat unit testing to ensure that functionality for custom keywords defined in instrument support functions is working correctly. (default=None)
- Returns:
data (xr.Dataset) – Testing data
meta (pysat.Meta) – Metadata
Test Constellations
The following Constellation modules support unit and integration testing for packages that depend on pysat.
Testing
Create a constellation with 5 testing instruments.
- pysat.constellations.testing.instruments
List of pysat.Instrument objects
- Type:
list
Note
Each instrument has a different sample size to test the common_index
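Examples
A sketch building this constellation; it assumes pysat.Constellation accepts the constellation support module via the const_module keyword:
import pysat
import pysat.constellations
const = pysat.Constellation(const_module=pysat.constellations.testing)
# const.instruments should contain five testing Instrument objects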
Single Test
Create a constellation with one testing instrument.
- pysat.constellations.single_test.instruments
List of pysat.Instrument objects
- Type:
list
Testing Empty
Create an empty constellation for testing.
- pysat.constellations.testing_empty.instruments
List of pysat.Instrument objects
- Type:
list