Roadmap
The long-term vision for pysat is to make the package suitable for working with
any combination of data. Because pysat is intended to support the development
of highly robust and verifiable scientific analysis and processing packages,
each of its features must be delivered with high-quality unit tests and
documentation.
This document provides a broad, long-term vision for pysat. Specific tasks
associated with this roadmap may be found within the posted Issues and
Projects.
An item being on the roadmap does not necessarily mean that it will happen. During the implementation or testing periods we may discover issues that limit the feature.
Generality
Data support with pysat is currently focused on space-science data sets.
However, the features within the module also work well on other types of data.
Where appropriate, space-science specific features will be generalized for a
wider audience.
Data Support
The Instrument class currently supports both pandas.DataFrame and
xarray.Dataset formats, covering 1D time-series and multi-dimensional data that
can be loaded into memory. Supporting data sets too large for memory would
require that pysat integrate a data format such as Dask. To cover the needs of
any potential user, an ideal solution would be for pysat to implement a clear
public mechanism for users to add their own data formats. Commonalities
observed after integrating Dask, pandas, and xarray should provide a viable
path forward for this generalization.
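As an illustration only, one possible shape for such a public mechanism is a registry that maps a format name to a factory for an empty container. The sketch below is not part of pysat: the names `register_format` and `empty_data` are hypothetical, and a plain `dict` stands in for a real container such as a DataFrame.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a format name to a factory that builds an
# empty container for that format. None of these names exist in pysat.
_FORMAT_REGISTRY: Dict[str, Callable[[], Any]] = {}


def register_format(name: str, empty_factory: Callable[[], Any]) -> None:
    """Register a user-supplied data format under a unique name."""
    if name in _FORMAT_REGISTRY:
        raise ValueError(f"format {name!r} is already registered")
    _FORMAT_REGISTRY[name] = empty_factory


def empty_data(name: str) -> Any:
    """Return a new, empty container for the requested format."""
    try:
        return _FORMAT_REGISTRY[name]()
    except KeyError:
        raise KeyError(f"unknown data format {name!r}") from None


# A plain dict stands in for a real container such as a DataFrame.
register_format("dict_of_columns", dict)
```

A real design would likely need more than an empty-container factory (for example, hooks for concatenation and time indexing), which is where the commonalities between the integrated formats would matter.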
Multiple Data Sources
The Instrument class is designed to work on a single data source at a time. For
multiple data sources, pysat is developing a Constellation class that operates
on multiple Instrument objects and will include methods designed to assist in
merging multiple data sets. The Constellation class will be compatible with the
simpler Instrument object where possible. However, given the additional
complexity of working with multiple sources, this may not always be achievable.
Long term, we intend to provide functionality that can merge a Constellation
into a ‘live’ Instrument object for the greatest compatibility.
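To make the idea of merging concrete, the sketch below interleaves samples from two time-sorted sources into one time-ordered stream using only the Python standard library. It does not use the pysat API; the `Sample` tuple layout and the `merge_sources` helper are assumptions made for illustration, and a real merge would also need to reconcile metadata and differing variables.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, Tuple

# Assumed layout for this sketch: (time, source label, value).
Sample = Tuple[datetime, str, float]


def merge_sources(*sources: Iterable[Sample]) -> Iterator[Sample]:
    """Interleave already time-sorted sources into one time-ordered stream."""
    return heapq.merge(*sources, key=lambda sample: sample[0])


sat_a = [(datetime(2020, 1, 1, 0, 0), "a", 1.0),
         (datetime(2020, 1, 1, 0, 2), "a", 3.0)]
sat_b = [(datetime(2020, 1, 1, 0, 1), "b", 2.0)]

# The merged stream alternates between sources according to time.
merged = list(merge_sources(sat_a, sat_b))
```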
Metapackage
The minimal barriers to entry in open source software allow for a large variety
of packages, each with its own approach to a problem. A disadvantage of this
setup is that many of these packages have been developed without
interoperability in mind, presenting challenges when attempting to combine
these disparate packages towards a common goal. pysat provides versatility when
coupling to data sources, which may be used to connect these isolated packages
together. Once a package is connected to pysat, that functionality becomes
available to all packages that incorporate pysat as a source. The value and
functionality of this large-scale pysat metapackage grows with every new
connection, since each added package becomes available to all the others.
File Support
pysat currently tracks both data and metadata, can create netCDF4 files, and is
capable of maintaining compliance with NASA’s Space Physics Data Facility
(SPDF) formatting requirements for NASA satellite missions. Support for
creating different types of files, as well as a variety of file standards,
needs to be enhanced to support a broader array of research areas.
Data Iteration
pysat currently features orbit iteration, a feature that transparently provides
complete orbits (across day/file breaks) calculated in real time. A variety of
orbit types are supported, each of which maps to a method looking for a
particular signal in the data to trigger upon. However, the current variety of
orbit types is insufficient to address community needs. The underlying class is
capable of iterating over a wider variety of conditions, though this type of
functionality is not currently available to users. Improving access to this
area enables generalized real-time data pagination based upon custom
user-supplied conditions. Ensuring good performance under a variety of
conditions requires upgrading and generalizing the data caching in pysat as
well as the orbit iteration interface.
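The idea of pagination driven by a user-supplied condition can be sketched in plain Python. The example below is not pysat's orbit implementation: `paginate` is a hypothetical helper, the two lists stand in for daily files, and an orbit is assumed to begin at each negative-to-positive latitude crossing.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def paginate(files: Iterable[Iterable[T]],
             starts_new_page: Callable[[T, T], bool]) -> Iterator[List[T]]:
    """Yield pages of samples, ignoring the boundaries between files.

    A new page begins whenever starts_new_page(previous, current) is True.
    """
    page: List[T] = []
    for sample in chain.from_iterable(files):
        if page and starts_new_page(page[-1], sample):
            yield page
            page = []
        page.append(sample)
    if page:
        yield page


# Two "daily files" of latitude samples; an orbit here is assumed to begin
# at each negative-to-positive latitude crossing.
day_one = [-10.0, 5.0, 20.0, 5.0, -10.0]
day_two = [-20.0, -5.0, 10.0, 25.0]
orbits = list(paginate([day_one, day_two],
                       lambda prev, cur: prev < 0.0 <= cur))
```

The second page produced here contains samples from both lists, which mirrors the behaviour described above of returning complete orbits across file breaks. Any other two-argument condition could be supplied in place of the latitude crossing.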
Performance
While it is critical for scientific outputs to be correct, results that are equally correct but calculated more quickly make it easier for scientists to fully explore a data set. A benchmarking solution will be implemented and used to identify areas with slow performance that could be improved.
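The roadmap does not specify a benchmarking tool. As a minimal sketch of the approach, the standard-library `timeit` module can compare two implementations that give the same result. The `benchmark` helper and the two summation functions below are hypothetical stand-ins, not pysat code.

```python
import timeit
from typing import Callable, Dict


def benchmark(candidates: Dict[str, Callable[[], object]],
              number: int = 200, repeat: int = 5) -> Dict[str, float]:
    """Return the best observed time per call, in seconds, per candidate."""
    results = {}
    for label, func in candidates.items():
        # Take the minimum over repeats, since slower runs mostly reflect
        # interference from other processes.
        times = timeit.repeat(func, number=number, repeat=repeat)
        results[label] = min(times) / number
    return results


values = list(range(1000))


def loop_sum() -> int:
    total = 0
    for value in values:
        total += value
    return total


def builtin_sum() -> int:
    return sum(values)


# Speed only matters if the results agree.
assert loop_sum() == builtin_sum()
timings = benchmark({"loop": loop_sum, "builtin": builtin_sum})
```

Checking that the candidates agree before timing them reflects the priority stated above: correctness first, speed second.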
Testing
Unit tests confirming that pysat behaves as expected are fundamental to the
scientific goals of the project. While unit test coverage is high, a general
review of all the unit tests needs to be performed. In particular, unit tests
written early in the project need to be brought up to project standards. The
test suite also needs additional organization, as many files are too long.
Further, tests need to be expanded so that more combinations of features are
exercised together to ensure interoperability.
User Experience
Providing a consistent, versatile, and easy-to-use interface is a core feature
for pysat.
Documentation
Robust, accurate, consistent, comprehensive, and easy-to-understand
documentation is essential for any project presented to the community to build
upon. While great strides were made with the release of pysat v3.0, additional
review and expansion of examples and discussion would be helpful to users.
pysat Penumbra Modules
The development of analysis packages built on pysat has historically revealed
areas for improvement. Active engagement with these publicly developed packages
helps ensure that solutions are practical and responsive to community
requirements.