load method takes care of a lot of the processing details needed
to produce a scientifically useful data set. The image below provides an
overview of this process.
A single day (or file) may be loaded by the user into a pysat.Instrument object
.load method by specifying a year and day of year; date; or
import pysat import datetime as dt # Set user and password for Madrigal username = 'Firstname+Lastname' password = 'firstname.lastname@example.org' # Initialize the instrument, passing the username and password to the # standard routines that need it import pysatMadrigal as pysatMad dmsp = pysat.Instrument(inst_module=pysatMad.instruments.dmsp_ivm, tag='utd', inst_id='f12', user=username, password=password) # Define date range to download data start = dt.datetime(2001, 1, 1) stop = dt.datetime(2001, 1, 2) # Download data dmsp.download(start, stop) # Load by year, day of year dmsp.load(2001, 1) # Load by date dmsp.load(date=start) # Load by filename from string dmsp.load(fname='dms_ut_20010101_12.002.hdf5') # Load by filename in tag dmsp.load(fname=dmsp.files) # Load by filename in tag and specify date dmsp.load(fname=dmsp.files[start])
When the pysat load routine runs it stores the instrument data at:
# Display all data dmsp.data
which maintains full access to the underlying data library functionality.
pysat supports the use of two different data structures. You can either use a
a highly capable structure with labeled rows and columns, or
for data sets with more dimensions. The type of data structure is flagged at
the instrument level with the attribute
inst.pandas_format. This is set to
True if a DataFrame is returned by the corresponding instrument module load
Load Data Range¶
pysat also supports loading data from a range of files/file dates. Given the potential change in user expectation when supplying a list of filenames to load vs loading a range of dates, pysat has adopted a nomenclature to consistently distinguish between inclusive and exclusive bounds. Keywords in pysat with end_* are an exclusive bound, similar to slicing numpy arrays, while those with stop_* are an inclusive bound. The starting index is always inclusive.
Keywords for date or filename ranges that begin with end are used as an exclusive terminating bound, while keywords that begin with stop are used as an inclusive bound.
Loading a range of data by year and day of year. Termination bounds are exclusive.
# Load by year, day of year from 2001, 1 up to but not including 2001, 3 dmsp.load(2001, 1, end_yr=2001, end_doy=3) # The following two load commands are equivalent dmsp.load(2001, 1, end_yr=2001, end_doy2=2) dmsp.load(2001, 1)
Loading a range of data using datetimes. Termination bounds are exclusive.
# Load by datetimes dmsp.load(date=dt.datetime(2001, 1, 1), end_date=dt.datetime(2001, 1, 3)) # The following two load commands are equivalent dmsp.load(date=dt.datetime(2001, 1, 1), end_date=dt.datetime(2001, 1, 2)) dmsp.load(date=dt.datetime(2001, 1, 1))
Loading a range of data using filenames. Termination bounds are inclusive.
# Load a single file dmsp.load(fname='dms_ut_20010101_12.002.hdf5') # Load by filename, from fname up to and including stop_fname dmsp.load(fname='dms_ut_20010101_12.002.hdf5', stop_fname='dms_ut_20010102_12.002.hdf5') # Load by filenames using the DMSP object to get valid filenames dmsp.load(fname=dmsp.files, stop_fname=dmsp.files) # Load by filenames. Includes data from 2001, 1 up to but not # including 2001, 3 dmsp.load(fname=dmsp.files[dt.datetime(2001, 1, 1)], stop_fname=dmsp.files[dt.datetime(2001, 1, 2)])
For small size data sets, such as space weather indices, pysat also supports loading all data at once.
# F10.7 data import pysatSpaceWeather f107 = pysat.Instrument(inst_module=pysatSpaceWeather.instruments.sw_f107) # Load all F10.7 solar flux data, from beginning to end. f107.load()
Before data is available in
.data it passes through an instrument specific
cleaning routine. The amount of cleaning is set by the clean_level keyword,
provided at instantiation. The level defaults to ‘clean’.
dmsp = pysat.Instrument(platform='dmsp', name='ivm', tag='utd', inst_id='f12', clean_level=None) dmsp = pysat.Instrument(platform='dmsp', name='ivm', tag='utd', inst_id='f12', clean_level='clean')
Four levels of cleaning may be specified,
|clean||Generally good data|
|dusty||Light cleaning, use with care|
|dirty||Minimal cleaning, use with caution|
|none||No cleaning, use at your own risk|
The user provided cleaning level is can be retrieved or reset from the Instrument object attribute clean_level. The details of the cleaning will generally vary greatly between instruments. Many instruments provide only two levels of data: clean or none.
By default, pysat is configured to use
'clean' as the default value
for clean_level. This setting may be updated using Parameters.