Basics
The core functionality of pysat is exposed through the pysat.Instrument object. The intent of the Instrument object is to offer a single interface for interacting with science data that is independent of measurement platform. The layer of abstraction presented by the Instrument object allows for things to occur in the background that can make science data analysis simpler and more rigorous.
To begin,
import pysat
The data directory pysat looks in for data (pysat_data_dir) needs to be set upon the first import,
pysat.utils.set_data_dir(path=path_to_existing_directory)
Note
A data directory must be set before any pysat.Instruments may be used or an error will be raised.
Basic Instrument Discovery
Support for each instrument in pysat is enabled by a suite of methods that interact with the particular files for that dataset and supply the data they contain in a pysat-compatible format. A particular dataset is identified using up to four parameters:
Identifier | Description |
platform | General platform instrument is on |
name | Name of the instrument |
tag | Label for a subset of total data |
sat_id | Label for instrument sub-group |
All supported pysat Instruments for v2.x are stored in the pysat.instruments submodule. A listing of all currently supported instruments is available via help,
help(pysat.instruments)
Each instrument listed will support one or more data sets for analysis. The submodules are named with the convention platform_name. To get a description of an instrument, along with the supported datasets, use help again,
help(pysat.instruments.dmsp_ivm)
Further, the dictionary:
pysat.instruments.dmsp_ivm.tags
is keyed by tag, with a description of each type of data the tag parameter selects. The dictionary:
pysat.instruments.dmsp_ivm.sat_ids
indicates which instrument or satellite ids (sat_id) support which tag. The combination of tag and sat_id selects the particular dataset a pysat.Instrument object will provide and interact with.
Instantiation
To create a pysat.Instrument object, select a platform, instrument name, and potentially a tag and sat_id, consistent with the desired data to be analyzed, from one of the supported instruments.
To work with plasma data from the Ion Velocity Meter (IVM) onboard the Defense Meteorological Satellite Program (DMSP) constellation, use:
dmsp = pysat.Instrument(platform='dmsp', name='ivm', tag='utd', sat_id='f12')
Behind the scenes pysat uses a python module named dmsp_ivm that understands how to interact with ‘utd’ data for ‘f12’.
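The platform_name convention can be made concrete with a short sketch. The helper `instrument_module_path` below is hypothetical and not part of pysat; it only builds the dotted module path implied by the text and does not import anything.

```python
def instrument_module_path(platform, name):
    """Return the dotted path of the module expected to support an instrument.

    Hypothetical helper for illustration; it follows the platform_name
    convention described above and performs no import.
    """
    return 'pysat.instruments.' + '_'.join((platform, name))


module_path = instrument_module_path('dmsp', 'ivm')
print(module_path)
```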
Download
Let’s download some data. DMSP data is hosted by the Madrigal database, a community resource for geospace data. The proper process for downloading DMSP and other Madrigal data is built into the open source tool madrigalWeb, which is invoked appropriately by pysat within the dmsp_ivm module. To get DMSP data specifically, all we have to do is invoke the .download() method attached to the DMSP object. Madrigal requires that users provide their name and email address as their username and password.
# set user and password for Madrigal
user = 'Firstname+Lastname'
password = 'email@address.com'
# define date range to download data
start = pysat.datetime(2001, 1, 1)
stop = pysat.datetime(2001, 1, 2)
# download data to local system
dmsp.download(start, stop, user=user, password=password)
The data is downloaded to pysat_data_dir/platform/name/tag/, in this case pysat_data_dir/dmsp/ivm/utd/. At the end of the download, pysat will update the list of files associated with DMSP.
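As a rough illustration of that directory layout, the sketch below assembles the expected location from its parts. The function `expected_data_path` and the example data directory are assumptions made for this example; pysat manages the real location itself.

```python
import os


def expected_data_path(data_dir, platform, name, tag):
    """Build data_dir/platform/name/tag/ as described in the text.

    Hypothetical helper for illustration only.
    """
    # the trailing empty string makes os.path.join end with a separator
    return os.path.join(data_dir, platform, name, tag, '')


# '/home/user/pysat_data' is a made-up stand-in for pysat_data_dir
path = expected_data_path('/home/user/pysat_data', 'dmsp', 'ivm', 'utd')
print(path)
```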
Some instruments support an improved download experience that ensures the local system is fully up to date compared to the data source. The command,
dmsp.download_updated_files()
will obtain the full set of files present on the server and compare the version and revision numbers for the server files with those on the local system. Any files missing or out of date on the local system are downloaded from the server. This command downloads, as needed, the entire dataset.
Note
Science data servers may not have the same reliability and bandwidth as commercial providers.
Load Data
Data is loaded into a pysat.Instrument object, in this case dmsp, using the .load method, which accepts a year and day of year, a date, or a filename.
import datetime
# load by year, day of year
dmsp.load(2001, 1)
# load by datetime
dmsp.load(date=datetime.datetime(2001, 1, 1))
# load by explicit filename
dmsp.load(fname='dms_ut_20010101_12.002.hdf5')
# load by filename, selected by position in the file list
dmsp.load(fname=dmsp.files[0])
# load by filename, selected by date from the file list
dmsp.load(fname=dmsp.files[datetime.datetime(2001, 1, 1)])
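For readers less used to the year and day of year form, the conversion to a calendar date can be sketched with the standard library. This assumes the usual convention that day 1 is 1 January; `date_from_year_doy` is a hypothetical helper, not a pysat function.

```python
import datetime as dt


def date_from_year_doy(year, doy):
    """Convert a year and day of year (1 = 1 January) to a datetime."""
    return dt.datetime(year, 1, 1) + dt.timedelta(days=doy - 1)


print(date_from_year_doy(2001, 1))    # 1 January 2001
print(date_from_year_doy(2001, 32))   # 1 February 2001
```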
When the pysat load routine runs it stores the instrument data into dmsp.data. pysat supports the use of two different data structures: a pandas DataFrame, a highly capable structure with labeled rows and columns, or an xarray Dataset for data sets with more dimensions. Either way, the full data structure is available at:
# all data
dmsp.data
providing full access to the underlying data library functionality. The type of data structure is flagged at the instrument level with the attribute inst.pandas_format, which is True if a DataFrame is returned by the corresponding instrument module load method.
In addition, convenience access to the data is also available at the instrument level.
# Convenience access
dmsp['ti']
# slicing
dmsp[0:10, 'ti']
# slicing by date time
dmsp[start:stop, 'ti']
# Convenience assignment
dmsp['ti'] = new_array
# exploit broadcasting, single value assigned to all times
dmsp['ti'] = single_value
# slicing
dmsp[0:10, 'ti'] = sub_array
# slicing by date time
dmsp[start:stop, 'ti'] = sub_array
See Instrument for more.
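The two slicing styles above mirror plain pandas indexing, and their semantics can be seen on a small synthetic DataFrame. This sketch assumes a pandas-backed Instrument; whether pysat forwards to exactly these pandas calls is an assumption, and the values are invented.

```python
import datetime as dt

import pandas as pd

# six hourly samples standing in for loaded instrument data
index = [dt.datetime(2001, 1, 1, hour) for hour in range(6)]
data = pd.DataFrame({'ti': [1000.0, 1010.0, 1020.0, 1030.0, 1040.0, 1050.0]},
                    index=index)

# positional slice: the end position is excluded
first_three = data['ti'].iloc[0:3]

# label (datetime) slice: both endpoints are included
start = dt.datetime(2001, 1, 1, 1)
stop = dt.datetime(2001, 1, 1, 3)
window = data.loc[start:stop, 'ti']

print(list(first_three))
print(list(window))
```

Note the difference in endpoint handling: pandas excludes the end of a positional slice but includes the end of a label slice.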
To load data over a season, pysat provides a convenience function that returns an array of dates over a season. The season need not be continuous.
import datetime as dt
import matplotlib.pyplot as plt
import numpy as np
import pandas
# create empty series to hold result
mean_ti = pandas.Series(dtype=float)
# get list of dates between start and stop
start = dt.datetime(2001, 1, 1)
stop = dt.datetime(2001, 1, 10)
date_array = pysat.utils.time.create_date_range(start, stop)
# iterate over season, calculate the mean Ion Temperature
for date in date_array:
    # load data into dmsp.data
    dmsp.load(date=date)
    # check if data present
    if not dmsp.empty:
        # isolate data to locations near geomagnetic equator
        idx, = np.where((dmsp['mlat'] < 5) & (dmsp['mlat'] > -5))
        # downselect data
        dmsp.data = dmsp[idx]
        # compute mean ion temperature using pandas functions and store
        mean_ti[dmsp.date] = dmsp['ti'].abs().mean(skipna=True)
# plot the result using pandas functionality
mean_ti.plot(title='Mean Ion Temperature near Magnetic Equator')
plt.ylabel(dmsp.meta['ti', dmsp.desc_label] + ' (' +
           dmsp.meta['ti', dmsp.units_label] + ')')
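To illustrate what a season that is not continuous looks like as a list of dates, here is a standard-library sketch. `daily_dates` is a hypothetical stand-in written for this example; it is not pysat's create_date_range and makes no claim about that function's signature.

```python
import datetime as dt


def daily_dates(starts, stops):
    """Return every day in each [start, stop] pair, endpoints included."""
    dates = []
    for start, stop in zip(starts, stops):
        n_days = (stop - start).days + 1
        dates.extend(start + dt.timedelta(days=i) for i in range(n_days))
    return dates


# two disjoint intervals forming a single "season"
season = daily_dates([dt.datetime(2001, 1, 1), dt.datetime(2001, 3, 1)],
                     [dt.datetime(2001, 1, 3), dt.datetime(2001, 3, 2)])
print(season)
```

A loop over `season` would then visit three days in January followed by two days in March, skipping everything in between.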
Note that the numpy.where call may be removed by using the convenience access to the attached pandas data object.
idx, = np.where((dmsp['mlat'] < 5) & (dmsp['mlat'] > -5))
dmsp.data = dmsp[idx]
# or, written against the pandas object directly
dmsp.data = dmsp.data.iloc[idx]
is equivalent to
dmsp.data = dmsp[(dmsp['mlat'] < 5) & (dmsp['mlat'] > -5)]
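The equivalence of positional selection through numpy.where and direct boolean selection can be checked on a plain pandas DataFrame. The data below are invented stand-ins for dmsp.data, and the sketch exercises pandas only, not pysat.

```python
import numpy as np
import pandas as pd

# small synthetic stand-in for dmsp.data; values are invented
data = pd.DataFrame({'mlat': [-20.0, -3.0, 0.5, 4.9, 12.0],
                     'ti': [900.0, 1000.0, 1100.0, 1200.0, 1300.0]})

mask = (data['mlat'] < 5) & (data['mlat'] > -5)

# positional selection via numpy.where
idx, = np.where(mask)
by_position = data.iloc[idx]

# direct boolean selection
by_mask = data[mask]

print(by_position.equals(by_mask))
```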
Clean Data
Before data is available in .data it passes through an instrument specific cleaning routine. The amount of cleaning is set by the clean_level keyword, provided at instantiation. The level defaults to ‘clean’.
dmsp = pysat.Instrument(platform='dmsp', name='ivm', tag='utd', sat_id='f12',
                        clean_level=None)
dmsp = pysat.Instrument(platform='dmsp', name='ivm', tag='utd', sat_id='f12',
                        clean_level='clean')
Four levels of cleaning may be specified,
clean_level | Result |
clean | Generally good data |
dusty | Light cleaning, use with care |
dirty | Minimal cleaning, use with caution |
none | No cleaning, use at your own risk |
The user provided cleaning level is stored on the Instrument object at dmsp.clean_level. The details of the cleaning will generally vary greatly between instruments.
Metadata
Metadata is also stored along with the main science data. pysat presumes a minimum default set of metadata that may be arbitrarily expanded. The default parameters are driven by the attributes required by public science data files, like those produced by the Ionospheric Connections Explorer (ICON).
Metadata | Description |
axis | Label for plot axes |
desc | Description of variable |
fill | Fill value for bad data points |
label | Label used for plots |
name | Name of variable, or long_name |
notes | Notes about variable |
min | Minimum valid value |
max | Maximum valid value |
scale | Axis scale, linear or log |
units | Variable units |
# all metadata
dmsp.meta.data
# variable metadata
dmsp.meta['ti']
# units using standard labels
dmsp.meta['ti'].units
# units using general labels
dmsp.meta['ti', dmsp.units_label]
# update units for ti
dmsp.meta['ti'] = {'units':'new_units'}
# update display name, long_name
dmsp.meta['ti'] = {'long_name':'Fancy Name'}
# add new meta data
dmsp.meta['new'] = {dmsp.units_label: 'fake',
                    dmsp.name_label: 'Display'}
The string values used within metadata to identify the parameters above are all attached to the instrument object as dmsp.*_label, for example dmsp.units_label, dmsp.min_label, and dmsp.notes_label.
All variables must have the same metadata parameters. If a new parameter is added for only one data variable, then the remaining data variables will get a null value for that metadata parameter.
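The null-fill behaviour can be pictured with a toy table holding one row per variable and one column per metadata parameter. This is a sketch in plain pandas under that assumption; the parameter name 'calibration' and its value are invented, and nothing here calls pysat.

```python
import pandas as pd

# toy metadata table: one row per variable, one column per parameter
meta = pd.DataFrame({'units': ['K', 'degrees']}, index=['ti', 'mlat'])

# add a new parameter for a single variable only
meta.loc['ti', 'calibration'] = 'v2'

print(meta)
# the variable that was not assigned gets a null entry
print(pd.isna(meta.loc['mlat', 'calibration']))
```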
Data may be assigned to the instrument, with or without metadata.
# assign data alone
dmsp['new_data'] = new_data
# assign data with metadata
# the data must be keyed under 'data'
# all other dictionary inputs are presumed to be metadata
dmsp['new_data'] = {'data': new_data,
                    dmsp.units_label: new_unit,
                    'new_meta_data': new_value}
# alter assigned metadata
dmsp.meta['new_data', 'new_meta_data'] = even_newer_value
The labels used for identifying metadata may be provided by the user at Instrument instantiation and do not need to conform with what is in the file:
dmsp = pysat.Instrument(platform='dmsp', name='ivm', tag='utd', sat_id='f12',
                        clean_level='dirty', units_label='new_units')
dmsp.load(2001, 1)
dmsp.meta['ti', 'new_units']
dmsp.meta['ti', dmsp.units_label]
While this feature doesn’t require explicit support on the part of an instrument module developer, code that does not use the metadata labels may not always work when a user invokes this functionality.
pysat’s metadata object is case insensitive but case preserving. Thus, if a particular Instrument uses ‘units’ for units metadata while a separate package that operates via pysat uses ‘Units’ or even ‘UNITS’, the code will still function:
# the following are all equivalent
dmsp.meta['TI', 'Long_Name']
dmsp.meta['Ti', 'long_Name']
dmsp.meta['ti', 'Long_NAME']
Note
While metadata access is case-insensitive, data access is case-sensitive.
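One way to picture "case insensitive but case preserving" is a mapping that looks keys up by their lower-cased form while remembering the spelling first used. The class below is an illustrative sketch only; it is not pysat's metadata implementation, and `CasePreservingDict` is a name chosen for this example.

```python
class CasePreservingDict:
    """Mapping with case-insensitive lookup that remembers original keys."""

    def __init__(self):
        self._store = {}  # lower-cased key -> (original key, value)

    def __setitem__(self, key, value):
        lowered = key.lower()
        # keep the first spelling seen for this key
        original = self._store.get(lowered, (key, None))[0]
        self._store[lowered] = (original, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def keys(self):
        return [original for original, _ in self._store.values()]


labels = CasePreservingDict()
labels['Long_Name'] = 'Ion Temperature'
labels['LONG_NAME'] = 'Ion Temperature (updated)'

print(labels['long_name'])
print(labels.keys())
```

Any spelling retrieves the same entry, while `keys()` still reports the spelling that was first stored.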