Merge branch 'refactor-plot-utils' into yohai-ds_scatter

* refactor-plot-utils: (22 commits)
  review comment.
  small rename
  stale requires a label (pydata#2701)
  Update indexing.rst (pydata#2700)
  add line break to message posted (pydata#2698)
  Config for closing stale issues (pydata#2684)
  to_dict without data (pydata#2659)
  Update asv.conf.json (pydata#2693)
  try no rasterio in py36 env (pydata#2691)
  Detailed report for testing.assert_equal and testing.assert_identical (pydata#1507)
  Hotfix for pydata#2662 (pydata#2678)
  Update README.rst (pydata#2682)
  Fix test failures with numpy=1.16 (pydata#2675)
  lint
  Back to map_dataarray_line
  Refactor out cmap_params, cbar_kwargs processing
  Refactor out colorbar making to plot.utils._add_colorbar
  flake8
  facetgrid refactor
  Refactor out utility functions.
  ...
yohai · Jan 24, 2019 · 57a6c64
2 parents 1d939af + 351a466
commit 57a6c64
Showing 28 changed files with 734 additions and 376 deletions.
diff --git a/.github/stale.yml b/.github/stale.yml
@@ -0,0 +1,58 @@
+# Configuration for probot-stale - https://github.com/probot/stale
+
+# Number of days of inactivity before an Issue or Pull Request becomes stale
+daysUntilStale: 700  # start with a large number and reduce shortly
+
+# Number of days of inactivity before an Issue or Pull Request with the stale label is closed.
+# Set to false to disable. If disabled, issues still need to be closed manually, but will remain marked as stale.
+daysUntilClose: 30
+
+# Issues or Pull Requests with these labels will never be considered stale. Set to `[]` to disable
+exemptLabels:
+  - pinned
+  - security
+  - "[Status] Maybe Later"
+
+# Set to true to ignore issues in a project (defaults to false)
+exemptProjects: false
+
+# Set to true to ignore issues in a milestone (defaults to false)
+exemptMilestones: false
+
+# Set to true to ignore issues with an assignee (defaults to false)
+exemptAssignees: true
+
+# Label to use when marking as stale
+staleLabel: stale
+
+# Comment to post when marking as stale. Set to `false` to disable
+markComment: |
+  In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity
+  If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically
+
+# Comment to post when removing the stale label.
+# unmarkComment: >
+#   Your comment here.
+
+# Comment to post when closing a stale Issue or Pull Request.
+# closeComment: >
+#   Your comment here.
+
+# Limit the number of actions per hour, from 1-30. Default is 30
+limitPerRun: 1  # start with a small number
+
+
+# Limit to only `issues` or `pulls`
+# only: issues
+
+# Optionally, specify configuration settings that are specific to just 'issues' or 'pulls':
+# pulls:
+#   daysUntilStale: 30
+#   markComment: >
+#     This pull request has been automatically marked as stale because it has not had
+#     recent activity. It will be closed if no further activity occurs. Thank you
+#     for your contributions.
+
+# issues:
+#   exemptLabels:
+#     - confirmed
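
A rough sketch of how the active settings in the stale.yml above interact, written as a simplified model rather than probot-stale's actual implementation. The function name `next_action`, its arguments, and its return values are hypothetical; only the constants are taken from the config. It assumes `last_activity` already includes the bot's own stale-marking comment.

```python
from datetime import datetime, timedelta

# Values copied from the .github/stale.yml added in this commit.
DAYS_UNTIL_STALE = 700
DAYS_UNTIL_CLOSE = 30
EXEMPT_LABELS = {"pinned", "security", "[Status] Maybe Later"}
EXEMPT_ASSIGNEES = True
STALE_LABEL = "stale"


def next_action(labels, has_assignee, last_activity, now):
    """Hypothetical helper: return 'skip', 'none', 'mark_stale' or 'close'."""
    labels = set(labels)
    # Exempt labels and (because exemptAssignees is true) assigned issues
    # are never considered.
    if labels & EXEMPT_LABELS or (EXEMPT_ASSIGNEES and has_assignee):
        return "skip"
    idle = now - last_activity
    if STALE_LABEL in labels:
        # Already marked: close after daysUntilClose further days of silence.
        return "close" if idle >= timedelta(days=DAYS_UNTIL_CLOSE) else "none"
    # Not yet marked: mark after daysUntilStale days of inactivity.
    return "mark_stale" if idle >= timedelta(days=DAYS_UNTIL_STALE) else "none"


now = datetime(2019, 1, 24)
print(next_action([], False, now - timedelta(days=701), now))
print(next_action(["stale"], False, now - timedelta(days=31), now))
```

With `limitPerRun: 1`, the real bot would additionally take at most one such action per hourly run; that throttle is not modelled here.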
diff --git a/.travis.yml b/.travis.yml
@@ -60,7 +60,7 @@ script:
   - python --version
   - python -OO -c "import xarray"
   - if [[ "$CONDA_ENV" == "docs" ]]; then
-      conda install -c conda-forge sphinx sphinx_rtd_theme sphinx-gallery numpydoc;
+      conda install -c conda-forge --override-channels sphinx sphinx_rtd_theme sphinx-gallery numpydoc "gdal>2.2.4";
       sphinx-build -n -j auto -b html -d _build/doctrees doc _build/html;
     elif [[ "$CONDA_ENV" == "lint" ]]; then
       pycodestyle xarray ;

diff --git a/README.rst b/README.rst
@@ -9,49 +9,47 @@ xarray: N-D labeled arrays and datasets
    :target: https://coveralls.io/r/pydata/xarray
 .. image:: https://readthedocs.org/projects/xray/badge/?version=latest
    :target: http://xarray.pydata.org/
-.. image:: https://img.shields.io/pypi/v/xarray.svg
-   :target: https://pypi.python.org/pypi/xarray/
-.. image:: https://zenodo.org/badge/13221727.svg
-  :target: https://zenodo.org/badge/latestdoi/13221727
 .. image:: http://img.shields.io/badge/benchmarked%20by-asv-green.svg?style=flat
   :target: http://pandas.pydata.org/speed/xarray/
-.. image:: https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A
-  :target: http://numfocus.org
+.. image:: https://img.shields.io/pypi/v/xarray.svg
+   :target: https://pypi.python.org/pypi/xarray/
 
 **xarray** (formerly **xray**) is an open source project and Python package
 that makes working with labelled multi-dimensional arrays simple,
 efficient, and fun!
 
-Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called
-"tensors") are an essential part of computational science.
-They are encountered in a wide range of fields, including physics, astronomy,
-geoscience, bioinformatics, engineering, finance, and deep learning.
-In Python, NumPy_ provides the fundamental data structure and API for
-working with raw ND arrays.
-However, real-world datasets are usually more than just raw numbers;
-they have labels which encode information about how the array values map
-to locations in space, time, etc.
+Xarray introduces labels in the form of dimensions, coordinates and
+attributes on top of raw NumPy_-like arrays, which allows for a more
+intuitive, more concise, and less error-prone developer experience.
+The package includes a large and growing library of domain-agnostic functions
+for advanced analytics and visualization with these data structures.
 
-By introducing *dimensions*, *coordinates*, and *attributes* on top of raw
-NumPy-like arrays, xarray is able to understand these labels and use them to
-provide a more intuitive, more concise, and less error-prone experience.
-Xarray also provides a large and growing library of functions for advanced
-analytics and visualization with these data structures.
 Xarray was inspired by and borrows heavily from pandas_, the popular data
 analysis package focused on labelled tabular data.
-Xarray can read and write data from most common labeled ND-array storage
-formats and is particularly tailored to working with netCDF_ files, which were
-the source of xarray's data model.
+It is particularly tailored to working with netCDF_ files, which were the
+source of xarray's data model, and integrates tightly with dask_ for parallel
+computing.
 
-.. _NumPy: http://www.numpy.org/
+.. _NumPy: http://www.numpy.org
 .. _pandas: http://pandas.pydata.org
+.. _dask: http://dask.org
 .. _netCDF: http://www.unidata.ucar.edu/software/netcdf
 
 Why xarray?
 -----------
 
-Adding dimensions names and coordinate indexes to numpy's ndarray_ makes many
-powerful array operations possible:
+Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called
+"tensors") are an essential part of computational science.
+They are encountered in a wide range of fields, including physics, astronomy,
+geoscience, bioinformatics, engineering, finance, and deep learning.
+In Python, NumPy_ provides the fundamental data structure and API for
+working with raw ND arrays.
+However, real-world datasets are usually more than just raw numbers;
+they have labels which encode information about how the array values map
+to locations in space, time, etc.
+
+Xarray doesn't just keep track of labels on arrays -- it uses them to provide a
+powerful and concise interface. For example:
 
 -  Apply operations over dimensions by name: ``x.sum('time')``.
 -  Select values by label instead of integer location:
@@ -65,42 +63,10 @@ powerful array operations possible:
 -  Keep track of arbitrary metadata in the form of a Python dictionary:
    ``x.attrs``.
 
-pandas_ provides many of these features, but it does not make use of dimension
-names, and its core data structures are fixed dimensional arrays.
-
-Why isn't pandas enough?
-------------------------
-
-pandas_ excels at working with tabular data. That suffices for many statistical
-analyses, but physical scientists rely on N-dimensional arrays -- which is
-where xarray comes in.
-
-xarray aims to provide a data analysis toolkit as powerful as pandas_ but
-designed for working with homogeneous N-dimensional arrays
-instead of tabular data. When possible, we copy the pandas API and rely on
-pandas's highly optimized internals (in particular, for fast indexing).
-
-Why netCDF?
------------
-
-Because xarray implements the same data model as the netCDF_ file format,
-xarray datasets have a natural and portable serialization format. But it is also
-easy to robustly convert an xarray ``DataArray`` to and from a numpy ``ndarray``
-or a pandas ``DataFrame`` or ``Series``, providing compatibility with the full
-`PyData ecosystem <http://pydata.org/>`__.
-
-Our target audience is anyone who needs N-dimensional labeled arrays, but we
-are particularly focused on the data analysis needs of physical scientists --
-especially geoscientists who already know and love netCDF_.
-
-.. _ndarray: http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html
-.. _pandas: http://pandas.pydata.org
-.. _netCDF: http://www.unidata.ucar.edu/software/netcdf
-
 Documentation
 -------------
 
-The official documentation is hosted on ReadTheDocs at http://xarray.pydata.org/
+Learn more about xarray in its official documentation at http://xarray.pydata.org/
 
 Contributing
 ------------
@@ -148,7 +114,7 @@ __ http://climate.com/
 License
 -------
 
-Copyright 2014-2018, xarray Developers
+Copyright 2014-2019, xarray Developers
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.

diff --git a/asv_bench/asv.conf.json b/asv_bench/asv.conf.json
@@ -40,7 +40,7 @@
 
     // The Pythons you'd like to test against.  If not provided, defaults
     // to the current version of Python used to run `asv`.
-    "pythons": ["2.7", "3.6"],
+    "pythons": ["3.6"],
 
     // The matrix of dependencies to test.  Each key is the name of a
     // package (in PyPI) and the values are version numbers.  An empty

diff --git a/ci/requirements-py36.yml b/ci/requirements-py36.yml
@@ -20,14 +20,14 @@ dependencies:
   - scipy
   - seaborn
   - toolz
-  - rasterio
+  # - rasterio  # xref #2683
   - bottleneck
   - zarr
   - pseudonetcdf>=3.0.1
   - eccodes
   - cdms2
-  - pynio
-  - iris>=1.10
+  # - pynio  # xref #2683
+  # - iris>=1.10    # xref #2683
   - pydap
   - lxml
   - pip:

diff --git a/doc/faq.rst b/doc/faq.rst
@@ -18,8 +18,9 @@ pandas is a fantastic library for analysis of low-dimensional labelled data -
 if it can be sensibly described as "rows and columns", pandas is probably the
 right choice.  However, sometimes we want to use higher dimensional arrays
 (`ndim > 2`), or arrays for which the order of dimensions (e.g., columns vs
-rows) shouldn't really matter. For example, climate and weather data is often
-natively expressed in 4 or more dimensions: time, x, y and z.
+rows) shouldn't really matter. For example, the images of a movie can be
+natively represented as an array with four dimensions: time, row, column and
+color.
 
 Pandas has historically supported N-dimensional panels, but deprecated them in
 version 0.20 in favor of Xarray data structures.  There are now built-in methods
@@ -39,9 +40,8 @@ if you were using Panels:
   xarray ``Dataset``.
 
 You can :ref:`read about switching from Panels to Xarray here <panel transition>`.
-Pandas gets a lot of things right, but scientific users need fully multi-
-dimensional data structures.
-
+Pandas gets a lot of things right, but many science, engineering and complex
+analytics use cases need fully multi-dimensional data structures.
 
 How do xarray data structures differ from those found in pandas?
 ----------------------------------------------------------------
@@ -65,7 +65,9 @@ multi-dimensional data-structures.
 
 That said, you should only bother with xarray if some aspect of data is
 fundamentally multi-dimensional. If your data is unstructured or
-one-dimensional, stick with pandas.
+one-dimensional, pandas is usually the right choice: it has better performance
+for common operations such as ``groupby`` and you'll find far more usage
+examples online.
 
 
 Why don't aggregations return Python scalars?

diff --git a/doc/index.rst b/doc/index.rst
@@ -5,29 +5,21 @@ xarray: N-D labeled arrays and datasets in Python
 that makes working with labelled multi-dimensional arrays simple,
 efficient, and fun!
 
-Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called
-"tensors") are an essential part of computational science.
-They are encountered in a wide range of fields, including physics, astronomy,
-geoscience, bioinformatics, engineering, finance, and deep learning.
-In Python, NumPy_ provides the fundamental data structure and API for
-working with raw ND arrays.
-However, real-world datasets are usually more than just raw numbers;
-they have labels which encode information about how the array values map
-to locations in space, time, etc.
-
-By introducing *dimensions*, *coordinates*, and *attributes* on top of raw
-NumPy-like arrays, xarray is able to understand these labels and use them to
-provide a more intuitive, more concise, and less error-prone experience.
-Xarray also provides a large and growing library of functions for advanced
-analytics and visualization with these data structures.
+Xarray introduces labels in the form of dimensions, coordinates and
+attributes on top of raw NumPy_-like arrays, which allows for a more
+intuitive, more concise, and less error-prone developer experience.
+The package includes a large and growing library of domain-agnostic functions
+for advanced analytics and visualization with these data structures.
+
 Xarray was inspired by and borrows heavily from pandas_, the popular data
 analysis package focused on labelled tabular data.
-Xarray can read and write data from most common labeled ND-array storage
-formats and is particularly tailored to working with netCDF_ files, which were
-the source of xarray's data model.
+It is particularly tailored to working with netCDF_ files, which were the
+source of xarray's data model, and integrates tightly with dask_ for parallel
+computing.
 
-.. _NumPy: http://www.numpy.org/
+.. _NumPy: http://www.numpy.org
 .. _pandas: http://pandas.pydata.org
+.. _dask: http://dask.org
 .. _netCDF: http://www.unidata.ucar.edu/software/netcdf
 
 Documentation

diff --git a/doc/indexing.rst b/doc/indexing.rst
@@ -371,7 +371,7 @@ Vectorized indexing also works with ``isel``, ``loc``, and ``sel``:
     ind = xr.DataArray([['a', 'b'], ['b', 'a']], dims=['a', 'b'])
     da.loc[:, ind]  # same as da.sel(y=ind)
 
-These methods may and also be applied to ``Dataset`` objects
+These methods may also be applied to ``Dataset`` objects
 
 .. ipython:: python
 

diff --git a/doc/io.rst b/doc/io.rst
@@ -81,6 +81,16 @@ require external libraries and dicts can easily be pickled, or converted to
 json, or geojson. All the values are converted to lists, so dicts might
 be quite large.
 
+To export just the dataset schema, without the data itself, use the
+``data=False`` option:
+
+.. ipython:: python
+
+    ds.to_dict(data=False)
+
+This can be useful for generating indices of dataset contents to expose to
+search indices or other automated data discovery tools.
+
 .. _io.netcdf:
 
 netCDF
@@ -665,7 +675,7 @@ To read a consolidated store, pass the ``consolidated=True`` option to
 :py:func:`~xarray.open_zarr`::
 
     ds = xr.open_zarr('foo.zarr', consolidated=True)
-    
+
 Xarray can't perform consolidation on pre-existing zarr datasets. This should
 be done directly from zarr, as described in the
 `zarr docs <https://zarr.readthedocs.io/en/latest/tutorial.html#consolidating-metadata>`_.
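
The `data=False` option documented in the doc/io.rst hunk above can be tried on a small dataset. This is a minimal sketch, assuming an xarray version that includes this change; the dataset contents and the variable name `t` are made up for illustration.

```python
import numpy as np
import xarray as xr

# A tiny illustrative dataset: one 2x3 variable, one coordinate, one attribute.
ds = xr.Dataset(
    {"t": (("x", "y"), np.zeros((2, 3)))},
    coords={"x": [10, 20]},
    attrs={"title": "demo"},
)

# Export only the schema: dims, attrs, dtype and shape, but no array values.
schema = ds.to_dict(data=False)
var = schema["data_vars"]["t"]
print(sorted(var))
```

Because the values are left out, the resulting dict stays small regardless of the array sizes, which is what makes it suitable for indexing dataset contents.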

diff --git a/doc/related-projects.rst b/doc/related-projects.rst
@@ -3,7 +3,7 @@
 Xarray related projects
 -----------------------
 
-Here below is a list of several existing libraries that build
+Here below is a list of existing open source projects that build
 functionality upon xarray. See also section :ref:`internals` for more
 details on how to build xarray extensions.
 
@@ -39,11 +39,16 @@ Geosciences
 
 Machine Learning
 ~~~~~~~~~~~~~~~~
-- `cesium <http://cesium-ml.org/>`_: machine learning for time series analysis
+- `ArviZ <https://arviz-devs.github.io/arviz/>`_: Exploratory analysis of Bayesian models, built on top of xarray.
 - `Elm <https://ensemble-learning-models.readthedocs.io>`_: Parallel machine learning on xarray data structures
 - `sklearn-xarray (1) <https://phausamann.github.io/sklearn-xarray>`_: Combines scikit-learn and xarray (1).
 - `sklearn-xarray (2) <https://sklearn-xarray.readthedocs.io/en/latest/>`_: Combines scikit-learn and xarray (2).
 
+Other domains
+~~~~~~~~~~~~~
+- `ptsa <https://pennmem.github.io/ptsa_new/html/index.html>`_: EEG Time Series Analysis
+- `pycalphad <https://pycalphad.org/docs/latest/>`_: Computational Thermodynamics in Python
+
 Extend xarray capabilities
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 - `Collocate <https://github.com/cistools/collocate>`_: Collocate xarray trajectories in arbitrary physical dimensions
@@ -61,9 +66,10 @@ Visualization
 - `hvplot <https://hvplot.pyviz.org/>`_ : A high-level plotting API for the PyData ecosystem built on HoloViews.
 - `psyplot <https://psyplot.readthedocs.io>`_: Interactive data visualization with python.
 
-Other
-~~~~~
-- `ptsa <https://pennmem.github.io/ptsa_new/html/index.html>`_: EEG Time Series Analysis
-- `pycalphad <https://pycalphad.org/docs/latest/>`_: Computational Thermodynamics in Python
+Non-Python projects
+~~~~~~~~~~~~~~~~~~~
+- `xframe <https://github.com/QuantStack/xframe>`_: C++ data structures inspired by xarray.
+- `AxisArrays <https://github.com/JuliaArrays/AxisArrays.jl>`_ and
+  `NamedArrays <https://github.com/davidavdav/NamedArrays.jl>`_: similar data structures for Julia.
 
 More projects can be found at the `"xarray" Github topic <https://github.com/topics/xarray>`_.
diff --git a/doc/whats-new.rst b/doc/whats-new.rst
@@ -28,6 +28,8 @@ Breaking changes
 Enhancements
 ~~~~~~~~~~~~
 
+- Add ``data=False`` option to ``to_dict()`` methods. (:issue:`2656`)
+  By `Ryan Abernathey <https://github.com/rabernat>`_
 - :py:meth:`~xarray.DataArray.coarsen` and
   :py:meth:`~xarray.Dataset.coarsen` are newly added.
   See :ref:`comput.coarsen` for details.
@@ -36,6 +38,11 @@ Enhancements
 - Upsampling an array via interpolation with resample is now dask-compatible,
   as long as the array is not chunked along the resampling dimension.
   By `Spencer Clark <https://github.com/spencerkclark>`_.
+- :py:func:`xarray.testing.assert_equal` and
+  :py:func:`xarray.testing.assert_identical` now provide a more detailed
+  report showing what exactly differs between the two objects (dimensions /
+  coordinates / variables / attributes)  (:issue:`1507`).
+  By `Benoit Bovy <https://github.com/benbovy>`_.
 
 Bug fixes
 ~~~~~~~~~