Why does groupby('time').median() rechunk a DataArray when groupby('time').mean() or max() does not? #8689

Answered by dcherian
ZZMitch asked this question in General

Medians, and quantiles in general, can only be calculated exactly if you sort or partition the array in memory. (Not totally true: you can trade off the amount of data held in memory against the number of passes over the data, but I'm having trouble finding the reference now. In any case, no one has implemented that for xarray/dask.)

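That means you have to rechunk, so that all the values going into each median sit together in memory. Dask does this for you and ends up rearranging the chunks along the other axes. mean() and max() don't need this because they can be computed per chunk and then combined.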
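To make the difference concrete, here is a minimal sketch in plain Python (standard library only, no xarray or dask). The `chunks` list is made-up data standing in for the chunks of an array along the reduced axis; it is an illustration of the general idea, not how dask implements any of these reductions.

```python
from statistics import mean, median

# Hypothetical data split into three "chunks" along the reduced axis.
chunks = [[1, 2, 9], [3, 4], [5, 6, 7, 8]]
all_values = [v for chunk in chunks for v in chunk]

# mean: each chunk only has to report (sum, count); the partial
# results combine into the exact answer without moving any data.
partials = [(sum(chunk), len(chunk)) for chunk in chunks]
combined_mean = sum(s for s, _ in partials) / sum(n for _, n in partials)

# max: each chunk reports its own max; the max of those is exact.
combined_max = max(max(chunk) for chunk in chunks)

# median: per-chunk medians do NOT combine into the true median.
median_of_medians = median(median(chunk) for chunk in chunks)
true_median = median(all_values)

print(combined_mean, mean(all_values))   # both agree
print(combined_max, max(all_values))     # both agree
print(median_of_medians, true_median)    # these differ
```

The exact median needs all the values along the reduced axis together, which is why the data has to be brought into one chunk first.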

In this case, since you know these are duplicates, you could just do groupby("time").first()?
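A minimal sketch of that suggestion, assuming a small made-up DataArray with duplicated time stamps. The names `values`, `da`, and `deduped` are illustrative, and the array here is an in-memory NumPy array rather than the dask-backed one from the question:

```python
import numpy as np
import xarray as xr

# Hypothetical data: each time stamp appears twice with identical values.
times = np.array(
    ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
    dtype="datetime64[ns]",
)
values = np.array(
    [[1.0, 2.0],
     [1.0, 2.0],
     [3.0, 4.0],
     [3.0, 4.0]]
)
da = xr.DataArray(values, dims=("time", "x"), coords={"time": times})

# Keep the first entry for each unique time stamp. Because the rows
# within a group are duplicates, this gives the same values a median would.
deduped = da.groupby("time").first()

print(deduped.sizes["time"])  # number of unique time stamps
```

Picking the first element of each group does not require sorting the values, so it avoids the requirement that makes median() rechunk.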

Answer selected by ZZMitch