Merge branch 'refactor-plot-utils' into yohai-ds_scatter
* refactor-plot-utils: (22 commits)
  review comment.
  small rename
  stale requires a label (pydata#2701)
  Update indexing.rst (pydata#2700)
  add line break to message posted (pydata#2698)
  Config for closing stale issues (pydata#2684)
  to_dict without data (pydata#2659)
  Update asv.conf.json (pydata#2693)
  try no rasterio in py36 env (pydata#2691)
  Detailed report for testing.assert_equal and testing.assert_identical (pydata#1507)
  Hotfix for pydata#2662 (pydata#2678)
  Update README.rst (pydata#2682)
  Fix test failures with numpy=1.16 (pydata#2675)
  lint
  Back to map_dataarray_line
  Refactor out cmap_params, cbar_kwargs processing
  Refactor out colorbar making to plot.utils._add_colorbar
  flake8
  facetgrid refactor
  Refactor out utility functions.
  ...
dcherian committed Jan 24, 2019
2 parents 1d939af + 351a466 commit 57a6c64
Showing 28 changed files with 734 additions and 376 deletions.
58 changes: 58 additions & 0 deletions .github/stale.yml
@@ -0,0 +1,58 @@
# Configuration for probot-stale - https://github.com/probot/stale

# Number of days of inactivity before an Issue or Pull Request becomes stale
daysUntilStale: 700 # start with a large number and reduce shortly

# Number of days of inactivity before an Issue or Pull Request with the stale label is closed.
# Set to false to disable. If disabled, issues still need to be closed manually, but will remain marked as stale.
daysUntilClose: 30

# Issues or Pull Requests with these labels will never be considered stale. Set to `[]` to disable
exemptLabels:
- pinned
- security
- "[Status] Maybe Later"

# Set to true to ignore issues in a project (defaults to false)
exemptProjects: false

# Set to true to ignore issues in a milestone (defaults to false)
exemptMilestones: false

# Set to true to ignore issues with an assignee (defaults to false)
exemptAssignees: true

# Label to use when marking as stale
staleLabel: stale

# Comment to post when marking as stale. Set to `false` to disable
markComment: |
  In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity
  If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically
# Comment to post when removing the stale label.
# unmarkComment: >
# Your comment here.

# Comment to post when closing a stale Issue or Pull Request.
# closeComment: >
# Your comment here.

# Limit the number of actions per hour, from 1-30. Default is 30
limitPerRun: 1 # start with a small number


# Limit to only `issues` or `pulls`
# only: issues

# Optionally, specify configuration settings that are specific to just 'issues' or 'pulls':
# pulls:
# daysUntilStale: 30
# markComment: >
# This pull request has been automatically marked as stale because it has not had
# recent activity. It will be closed if no further activity occurs. Thank you
# for your contributions.

# issues:
# exemptLabels:
# - confirmed
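The thresholds in this configuration interact in a particular order. The following is a minimal, hypothetical Python sketch of that decision logic, assuming the semantics stated in the file's own comments: issues with an exempt label or an assignee are skipped, `daysUntilStale` idle days adds the `stale` label, a further `daysUntilClose` idle days closes the issue, and at most `limitPerRun` actions happen per run. `Issue`, `next_action` and `plan_run` are illustrative names, not part of probot-stale.

```python
from dataclasses import dataclass, field

# Values copied from the .github/stale.yml above.
# Hypothetical model only; this is not probot-stale's actual implementation.
DAYS_UNTIL_STALE = 700
DAYS_UNTIL_CLOSE = 30
EXEMPT_LABELS = frozenset({"pinned", "security", "[Status] Maybe Later"})
EXEMPT_ASSIGNEES = True
STALE_LABEL = "stale"
LIMIT_PER_RUN = 1


@dataclass
class Issue:
    """Hypothetical, simplified view of an issue or pull request."""
    days_inactive: int  # days since the last activity
    labels: set = field(default_factory=set)
    has_assignee: bool = False


def next_action(issue):
    """Return "mark", "close", or None (no action) for a single issue."""
    if issue.labels & EXEMPT_LABELS:
        return None  # exempt labels are never considered stale
    if EXEMPT_ASSIGNEES and issue.has_assignee:
        return None  # exemptAssignees: true skips assigned issues
    if STALE_LABEL in issue.labels:
        # Already marked: close after a further DAYS_UNTIL_CLOSE idle days.
        return "close" if issue.days_inactive >= DAYS_UNTIL_CLOSE else None
    return "mark" if issue.days_inactive >= DAYS_UNTIL_STALE else None


def plan_run(issues):
    """Collect (issue, action) pairs for one run, capped at LIMIT_PER_RUN."""
    actions = []
    for issue in issues:
        action = next_action(issue)
        if action is not None:
            actions.append((issue, action))
            if len(actions) >= LIMIT_PER_RUN:
                break
    return actions
```

With `limitPerRun: 1`, `plan_run` returns at most one action even when several issues qualify, which matches the cautious start described in the config comment.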
2 changes: 1 addition & 1 deletion .travis.yml
@@ -60,7 +60,7 @@ script:
- python --version
- python -OO -c "import xarray"
- if [[ "$CONDA_ENV" == "docs" ]]; then
conda install -c conda-forge sphinx sphinx_rtd_theme sphinx-gallery numpydoc;
conda install -c conda-forge --override-channels sphinx sphinx_rtd_theme sphinx-gallery numpydoc "gdal>2.2.4";
sphinx-build -n -j auto -b html -d _build/doctrees doc _build/html;
elif [[ "$CONDA_ENV" == "lint" ]]; then
pycodestyle xarray ;
86 changes: 26 additions & 60 deletions README.rst
@@ -9,49 +9,47 @@ xarray: N-D labeled arrays and datasets
   :target: https://coveralls.io/r/pydata/xarray
.. image:: https://readthedocs.org/projects/xray/badge/?version=latest
   :target: http://xarray.pydata.org/
.. image:: https://img.shields.io/pypi/v/xarray.svg
   :target: https://pypi.python.org/pypi/xarray/
.. image:: https://zenodo.org/badge/13221727.svg
   :target: https://zenodo.org/badge/latestdoi/13221727
.. image:: http://img.shields.io/badge/benchmarked%20by-asv-green.svg?style=flat
   :target: http://pandas.pydata.org/speed/xarray/
.. image:: https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A
   :target: http://numfocus.org
.. image:: https://img.shields.io/pypi/v/xarray.svg
   :target: https://pypi.python.org/pypi/xarray/

**xarray** (formerly **xray**) is an open source project and Python package
that makes working with labelled multi-dimensional arrays simple,
efficient, and fun!

Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called
"tensors") are an essential part of computational science.
They are encountered in a wide range of fields, including physics, astronomy,
geoscience, bioinformatics, engineering, finance, and deep learning.
In Python, NumPy_ provides the fundamental data structure and API for
working with raw ND arrays.
However, real-world datasets are usually more than just raw numbers;
they have labels which encode information about how the array values map
to locations in space, time, etc.
Xarray introduces labels in the form of dimensions, coordinates and
attributes on top of raw NumPy_-like arrays, which allows for a more
intuitive, more concise, and less error-prone developer experience.
The package includes a large and growing library of domain-agnostic functions
for advanced analytics and visualization with these data structures.

By introducing *dimensions*, *coordinates*, and *attributes* on top of raw
NumPy-like arrays, xarray is able to understand these labels and use them to
provide a more intuitive, more concise, and less error-prone experience.
Xarray also provides a large and growing library of functions for advanced
analytics and visualization with these data structures.
Xarray was inspired by and borrows heavily from pandas_, the popular data
analysis package focused on labelled tabular data.
Xarray can read and write data from most common labeled ND-array storage
formats and is particularly tailored to working with netCDF_ files, which were
the source of xarray's data model.
It is particularly tailored to working with netCDF_ files, which were the
source of xarray's data model, and integrates tightly with dask_ for parallel
computing.

.. _NumPy: http://www.numpy.org/
.. _NumPy: http://www.numpy.org
.. _pandas: http://pandas.pydata.org
.. _dask: http://dask.org
.. _netCDF: http://www.unidata.ucar.edu/software/netcdf

Why xarray?
-----------

Adding dimensions names and coordinate indexes to numpy's ndarray_ makes many
powerful array operations possible:
Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called
"tensors") are an essential part of computational science.
They are encountered in a wide range of fields, including physics, astronomy,
geoscience, bioinformatics, engineering, finance, and deep learning.
In Python, NumPy_ provides the fundamental data structure and API for
working with raw ND arrays.
However, real-world datasets are usually more than just raw numbers;
they have labels which encode information about how the array values map
to locations in space, time, etc.

Xarray doesn't just keep track of labels on arrays -- it uses them to provide a
powerful and concise interface. For example:

- Apply operations over dimensions by name: ``x.sum('time')``.
- Select values by label instead of integer location:
@@ -65,42 +63,10 @@ powerful array operations possible:
- Keep track of arbitrary metadata in the form of a Python dictionary:
``x.attrs``.
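The first bullet above says reductions take a dimension *name* rather than an axis number. A minimal pure-Python sketch of that idea, assuming nested lists as the array type, looks the name up in the tuple of dimension names and reduces over the resulting position. `sum_over` and its helpers are hypothetical illustrations, not xarray's implementation.

```python
def _add(a, b):
    """Element-wise sum of two equally shaped nested lists (or two scalars)."""
    if isinstance(a, list):
        return [_add(x, y) for x, y in zip(a, b)]
    return a + b


def _sum_axis(data, axis):
    """Sum a nested list over the positional axis ``axis``."""
    if axis == 0:
        total = data[0]
        for item in data[1:]:
            total = _add(total, item)  # _add builds new lists, so data is untouched
        return total
    return [_sum_axis(sub, axis - 1) for sub in data]


def sum_over(data, dims, dim):
    """Hypothetical: sum ``data`` over the dimension named ``dim``.

    ``dims`` holds one name per axis, e.g. ("time", "x"). Returns the
    reduced data and the names of the remaining dimensions.
    """
    axis = dims.index(dim)  # the name, not the caller, decides the axis
    remaining = tuple(d for d in dims if d != dim)
    return _sum_axis(data, axis), remaining
```

Because the caller never passes an axis number, transposing the data together with its `dims` tuple does not change which dimension gets summed.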

pandas_ provides many of these features, but it does not make use of dimension
names, and its core data structures are fixed dimensional arrays.

Why isn't pandas enough?
------------------------

pandas_ excels at working with tabular data. That suffices for many statistical
analyses, but physical scientists rely on N-dimensional arrays -- which is
where xarray comes in.

xarray aims to provide a data analysis toolkit as powerful as pandas_ but
designed for working with homogeneous N-dimensional arrays
instead of tabular data. When possible, we copy the pandas API and rely on
pandas's highly optimized internals (in particular, for fast indexing).

Why netCDF?
-----------

Because xarray implements the same data model as the netCDF_ file format,
xarray datasets have a natural and portable serialization format. But it is also
easy to robustly convert an xarray ``DataArray`` to and from a numpy ``ndarray``
or a pandas ``DataFrame`` or ``Series``, providing compatibility with the full
`PyData ecosystem <http://pydata.org/>`__.

Our target audience is anyone who needs N-dimensional labeled arrays, but we
are particularly focused on the data analysis needs of physical scientists --
especially geoscientists who already know and love netCDF_.

.. _ndarray: http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html
.. _pandas: http://pandas.pydata.org
.. _netCDF: http://www.unidata.ucar.edu/software/netcdf

Documentation
-------------

The official documentation is hosted on ReadTheDocs at http://xarray.pydata.org/
Learn more about xarray in its official documentation at http://xarray.pydata.org/

Contributing
------------
@@ -148,7 +114,7 @@ __ http://climate.com/
License
-------

Copyright 2014-2018, xarray Developers
Copyright 2014-2019, xarray Developers

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion asv_bench/asv.conf.json
@@ -40,7 +40,7 @@

// The Pythons you'd like to test against. If not provided, defaults
// to the current version of Python used to run `asv`.
"pythons": ["2.7", "3.6"],
"pythons": ["3.6"],

// The matrix of dependencies to test. Each key is the name of a
// package (in PyPI) and the values are version numbers. An empty
6 changes: 3 additions & 3 deletions ci/requirements-py36.yml
@@ -20,14 +20,14 @@ dependencies:
- scipy
- seaborn
- toolz
- rasterio
# - rasterio # xref #2683
- bottleneck
- zarr
- pseudonetcdf>=3.0.1
- eccodes
- cdms2
- pynio
- iris>=1.10
# - pynio # xref #2683
# - iris>=1.10 # xref #2683
- pydap
- lxml
- pip:
14 changes: 8 additions & 6 deletions doc/faq.rst
@@ -18,8 +18,9 @@ pandas is a fantastic library for analysis of low-dimensional labelled data -
if it can be sensibly described as "rows and columns", pandas is probably the
right choice. However, sometimes we want to use higher dimensional arrays
(`ndim > 2`), or arrays for which the order of dimensions (e.g., columns vs
rows) shouldn't really matter. For example, climate and weather data is often
natively expressed in 4 or more dimensions: time, x, y and z.
rows) shouldn't really matter. For example, the images of a movie can be
natively represented as an array with four dimensions: time, row, column and
color.

Pandas has historically supported N-dimensional panels, but deprecated them in
version 0.20 in favor of Xarray data structures. There are now built-in methods
@@ -39,9 +40,8 @@ if you were using Panels:
xarray ``Dataset``.

You can :ref:`read about switching from Panels to Xarray here <panel transition>`.
Pandas gets a lot of things right, but scientific users need fully multi-
dimensional data structures.

Pandas gets a lot of things right, but many science, engineering and complex
analytics use cases need fully multi-dimensional data structures.

How do xarray data structures differ from those found in pandas?
----------------------------------------------------------------
@@ -65,7 +65,9 @@ multi-dimensional data-structures.

That said, you should only bother with xarray if some aspect of data is
fundamentally multi-dimensional. If your data is unstructured or
one-dimensional, stick with pandas.
one-dimensional, pandas is usually the right choice: it has better performance
for common operations such as ``groupby`` and you'll find far more usage
examples online.


Why don't aggregations return Python scalars?
30 changes: 11 additions & 19 deletions doc/index.rst
@@ -5,29 +5,21 @@ xarray: N-D labeled arrays and datasets in Python
that makes working with labelled multi-dimensional arrays simple,
efficient, and fun!

Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called
"tensors") are an essential part of computational science.
They are encountered in a wide range of fields, including physics, astronomy,
geoscience, bioinformatics, engineering, finance, and deep learning.
In Python, NumPy_ provides the fundamental data structure and API for
working with raw ND arrays.
However, real-world datasets are usually more than just raw numbers;
they have labels which encode information about how the array values map
to locations in space, time, etc.

By introducing *dimensions*, *coordinates*, and *attributes* on top of raw
NumPy-like arrays, xarray is able to understand these labels and use them to
provide a more intuitive, more concise, and less error-prone experience.
Xarray also provides a large and growing library of functions for advanced
analytics and visualization with these data structures.
Xarray introduces labels in the form of dimensions, coordinates and
attributes on top of raw NumPy_-like arrays, which allows for a more
intuitive, more concise, and less error-prone developer experience.
The package includes a large and growing library of domain-agnostic functions
for advanced analytics and visualization with these data structures.

Xarray was inspired by and borrows heavily from pandas_, the popular data
analysis package focused on labelled tabular data.
Xarray can read and write data from most common labeled ND-array storage
formats and is particularly tailored to working with netCDF_ files, which were
the source of xarray's data model.
It is particularly tailored to working with netCDF_ files, which were the
source of xarray's data model, and integrates tightly with dask_ for parallel
computing.

.. _NumPy: http://www.numpy.org/
.. _NumPy: http://www.numpy.org
.. _pandas: http://pandas.pydata.org
.. _dask: http://dask.org
.. _netCDF: http://www.unidata.ucar.edu/software/netcdf

Documentation
2 changes: 1 addition & 1 deletion doc/indexing.rst
@@ -371,7 +371,7 @@ Vectorized indexing also works with ``isel``, ``loc``, and ``sel``:
ind = xr.DataArray([['a', 'b'], ['b', 'a']], dims=['a', 'b'])
da.loc[:, ind] # same as da.sel(y=ind)
These methods may and also be applied to ``Dataset`` objects
These methods may also be applied to ``Dataset`` objects

.. ipython:: python
12 changes: 11 additions & 1 deletion doc/io.rst
@@ -81,6 +81,16 @@ require external libraries and dicts can easily be pickled, or converted to
json, or geojson. All the values are converted to lists, so dicts might
be quite large.

To export just the dataset schema, without the data itself, use the
``data=False`` option:

.. ipython:: python

    ds.to_dict(data=False)

This can be useful for generating indices of dataset contents to expose to
search indices or other automated data discovery tools.
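As a sketch of that discovery use case, the schema-only dict can be flattened into a keyword index. This assumes the ``data=False`` output keeps each variable's ``dims`` and ``attrs`` (with ``dtype`` and ``shape`` in place of ``data``); the ``schema`` literal below is a hand-written stand-in for ``ds.to_dict(data=False)``, and ``build_index`` is a hypothetical helper, not part of xarray.

```python
# Hand-written stand-in for ds.to_dict(data=False); the layout
# (coords/data_vars entries with dims, attrs, dtype, shape) is assumed.
schema = {
    "coords": {
        "time": {"dims": ("time",), "attrs": {},
                 "dtype": "datetime64[ns]", "shape": (3,)},
    },
    "attrs": {"title": "example"},
    "dims": {"time": 3},
    "data_vars": {
        "temperature": {"dims": ("time",), "attrs": {"units": "K"},
                        "dtype": "float64", "shape": (3,)},
    },
}


def build_index(schema):
    """Hypothetical: map lower-cased keywords to the variables they describe.

    Keywords are variable names, dimension names and attribute values.
    """
    index = {}
    for section in ("coords", "data_vars"):
        for name, var in schema.get(section, {}).items():
            keywords = {name, *var.get("dims", ())}
            keywords.update(str(v) for v in var.get("attrs", {}).values())
            for keyword in keywords:
                index.setdefault(keyword.lower(), set()).add(name)
    return {keyword: sorted(names) for keyword, names in index.items()}
```

Since no array values are involved, an index like this stays small regardless of how large the underlying dataset is.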

.. _io.netcdf:

netCDF
@@ -665,7 +675,7 @@ To read a consolidated store, pass the ``consolidated=True`` option to
:py:func:`~xarray.open_zarr`::

ds = xr.open_zarr('foo.zarr', consolidated=True)

Xarray can't perform consolidation on pre-existing zarr datasets. This should
be done directly from zarr, as described in the
`zarr docs <https://zarr.readthedocs.io/en/latest/tutorial.html#consolidating-metadata>`_.
18 changes: 12 additions & 6 deletions doc/related-projects.rst
@@ -3,7 +3,7 @@
Xarray related projects
-----------------------

Here below is a list of several existing libraries that build
Here below is a list of existing open source projects that build
functionality upon xarray. See also section :ref:`internals` for more
details on how to build xarray extensions.

@@ -39,11 +39,16 @@ Geosciences

Machine Learning
~~~~~~~~~~~~~~~~
- `cesium <http://cesium-ml.org/>`_: machine learning for time series analysis
- `ArviZ <https://arviz-devs.github.io/arviz/>`_: Exploratory analysis of Bayesian models, built on top of xarray.
- `Elm <https://ensemble-learning-models.readthedocs.io>`_: Parallel machine learning on xarray data structures
- `sklearn-xarray (1) <https://phausamann.github.io/sklearn-xarray>`_: Combines scikit-learn and xarray (1).
- `sklearn-xarray (2) <https://sklearn-xarray.readthedocs.io/en/latest/>`_: Combines scikit-learn and xarray (2).

Other domains
~~~~~~~~~~~~~
- `ptsa <https://pennmem.github.io/ptsa_new/html/index.html>`_: EEG Time Series Analysis
- `pycalphad <https://pycalphad.org/docs/latest/>`_: Computational Thermodynamics in Python

Extend xarray capabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~
- `Collocate <https://github.com/cistools/collocate>`_: Collocate xarray trajectories in arbitrary physical dimensions
@@ -61,9 +66,10 @@ Visualization
- `hvplot <https://hvplot.pyviz.org/>`_ : A high-level plotting API for the PyData ecosystem built on HoloViews.
- `psyplot <https://psyplot.readthedocs.io>`_: Interactive data visualization with python.

Other
~~~~~
- `ptsa <https://pennmem.github.io/ptsa_new/html/index.html>`_: EEG Time Series Analysis
- `pycalphad <https://pycalphad.org/docs/latest/>`_: Computational Thermodynamics in Python
Non-Python projects
~~~~~~~~~~~~~~~~~~~
- `xframe <https://github.com/QuantStack/xframe>`_: C++ data structures inspired by xarray.
- `AxisArrays <https://github.com/JuliaArrays/AxisArrays.jl>`_ and
`NamedArrays <https://github.com/davidavdav/NamedArrays.jl>`_: similar data structures for Julia.

More projects can be found at the `"xarray" Github topic <https://github.com/topics/xarray>`_.
7 changes: 7 additions & 0 deletions doc/whats-new.rst
@@ -28,6 +28,8 @@ Breaking changes
Enhancements
~~~~~~~~~~~~

- Add ``data=False`` option to ``to_dict()`` methods. (:issue:`2656`)
By `Ryan Abernathey <https://github.com/rabernat>`_
- :py:meth:`~xarray.DataArray.coarsen` and
:py:meth:`~xarray.Dataset.coarsen` are newly added.
See :ref:`comput.coarsen` for details.
@@ -36,6 +38,11 @@ Enhancements
- Upsampling an array via interpolation with resample is now dask-compatible,
as long as the array is not chunked along the resampling dimension.
By `Spencer Clark <https://github.com/spencerkclark>`_.
- :py:func:`xarray.testing.assert_equal` and
:py:func:`xarray.testing.assert_identical` now provide a more detailed
report showing what exactly differs between the two objects (dimensions /
coordinates / variables / attributes) (:issue:`1507`).
By `Benoit Bovy <https://github.com/benbovy>`_.
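The idea behind the more detailed report can be sketched independently of xarray: compare two objects section by section and name each key that is missing or different. The function below is a hypothetical illustration operating on plain dicts (for example, schema dicts from ``to_dict(data=False)``); it is not how ``xarray.testing`` builds its message.

```python
def diff_report(a, b):
    """Hypothetical: list differences between two dict-shaped descriptions.

    Each section (dims, coords, data_vars, attrs) is compared key by key.
    """
    lines = []
    for section in ("dims", "coords", "data_vars", "attrs"):
        left, right = a.get(section, {}), b.get(section, {})
        for key in sorted(set(left) | set(right)):
            if key not in right:
                lines.append(f"{section}: {key!r} only in left")
            elif key not in left:
                lines.append(f"{section}: {key!r} only in right")
            elif left[key] != right[key]:
                lines.append(
                    f"{section}: {key!r} differs: {left[key]!r} != {right[key]!r}"
                )
    return lines
```

Reporting per section keeps a failure message short even when only one attribute differs between two otherwise large objects.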

Bug fixes
~~~~~~~~~