Data Loading

The pysat.Instrument.load() method takes care of a lot of the processing details needed to produce a scientifically useful data set. The image below provides an overview of this process.

../_images/pysat_load_flow_chart.png

Load Data

A single day (or file) may be loaded by the user into a pysat.Instrument object using the Instrument.load() method by specifying a year and day of year, date, or filename.

import pysat
import datetime as dt

# Set user and password for Madrigal
username = 'Firstname+Lastname'
password = 'email@address.com'

# Initialize the instrument, passing the username and password to the
# standard routines that need it
import pysatMadrigal as pysatMad
dmsp = pysat.Instrument(inst_module=pysatMad.instruments.dmsp_ivm, tag='utd',
                        inst_id='f12', user=username, password=password)

# Define date range to download data
start = dt.datetime(2001, 1, 1)
stop = dt.datetime(2001, 1, 2)

# Download data
dmsp.download(start, stop)

# Load by year, day of year
dmsp.load(2001, 1)

# Load by date
dmsp.load(date=start)

# Load by filename from string
dmsp.load(fname='dms_ut_20010101_12.002.hdf5')

# Load by filename in tag
dmsp.load(fname=dmsp.files[0])

# Load by filename in tag and specify date
dmsp.load(fname=dmsp.files[start])

When the pysat.Instrument.load() method runs, it stores the intrument data in the pysat.Instrument.data attribute,

# Display all data
dmsp.data

which maintains full access to the underlying data library functionality.

pysat supports the use of two different data structures. You can either use a pandas DataFrame, a highly capable class with labeled rows and columns, or an xarray DataSet for data sets with more dimensions. The type of data class is flagged using the attribute pysat.Instrument.pandas_format. This is set to True if a pandas.DataFrame is returned by the corresponding Instrument.load() method and False if a xarray.Dataset is returned.

Load Data Range

pysat also supports loading data from a range of files or file dates. Given the potential change in user expectation when supplying a list of filenames to load instead of loading using a range of dates, pysat has adopted a nomenclature to consistently distinguish between inclusive and exclusive bounds. Keywords in pysat with end_* are an exclusive bound, similar to slicing numpy.ndarray objects, while those with stop_* are an inclusive bound. The starting index is always inclusive.

Note

Keywords for date or filename ranges that begin with end are used as an exclusive terminating bound, while keywords that begin with stop are used as an inclusive bound.

Loading a range of data by year and day of year. Termination bounds are exclusive.

# Load by year, day of year from 2001, 1 up to but not including 2001, 3
dmsp.load(2001, 1, end_yr=2001, end_doy=3)

# The following two load commands are equivalent
dmsp.load(2001, 1, end_yr=2001, end_doy2=2)
dmsp.load(2001, 1)

Loading a range of data using datetime.datetime limits. Termination bounds are exclusive.

# Load by datetimes
dmsp.load(date=dt.datetime(2001, 1, 1),
          end_date=dt.datetime(2001, 1, 3))

# The following two load commands are equivalent
dmsp.load(date=dt.datetime(2001, 1, 1),
          end_date=dt.datetime(2001, 1, 2))
dmsp.load(date=dt.datetime(2001, 1, 1))

Loading a range of data using filenames. Termination bounds are inclusive.

# Load a single file
dmsp.load(fname='dms_ut_20010101_12.002.hdf5')

# Load by filename, from fname up to and including stop_fname
dmsp.load(fname='dms_ut_20010101_12.002.hdf5',
          stop_fname='dms_ut_20010102_12.002.hdf5')

# Load by filenames using the DMSP object to get valid filenames
dmsp.load(fname=dmsp.files[0], stop_fname=dmsp.files[1])

# Load by filenames. Includes data from 2001, 1 up to but not
# including 2001, 3
dmsp.load(fname=dmsp.files[dt.datetime(2001, 1, 1)],
          stop_fname=dmsp.files[dt.datetime(2001, 1, 2)])

For small size data sets, such as space weather indices, pysat also supports loading all data at once.

# F10.7 data
import pysatSpaceWeather
f107 = pysat.Instrument(inst_module=pysatSpaceWeather.instruments.sw_f107)

# Load all F10.7 solar flux data, from beginning to end.
f107.load()

Clean Data

Before data is available in pysat.Instrument.data it passes through an instrument specific cleaning routine. The amount of cleaning is set by the clean_level attribute, which may be specified at instantiation. The level defaults to 'clean'.

dmsp = pysat.Instrument(platform='dmsp', name='ivm', tag='utd',
                        inst_id='f12', clean_level=None)
dmsp = pysat.Instrument(platform='dmsp', name='ivm', tag='utd',
                        inst_id='f12', clean_level='clean')

Four levels of cleaning may be specified,

clean_level

Result

clean

Generally good data

dusty

Light cleaning, use with care

dirty

Minimal cleaning, use with caution

none

No cleaning, use at your own risk

The user provided cleaning level is can be retrieved or reset from the attribute Instrument.clean_level. The details of the cleaning will generally vary greatly between instruments. Many instruments provide only two levels of data: clean or none.

By default, pysat is configured to use 'clean' as the default value for clean_level. This setting may be updated using Parameters.