ENH: len for groupby #533

Open
wants to merge 34 commits into
base: main
Changes from 6 commits
Commits
35a6a8a
fix some minor issues on formatting
RayJi01 Jun 9, 2023
d6b836a
Merge branch 'xprobe-inc:main' into main
RayJi01 Jun 15, 2023
5376759
Merge branch 'xprobe-inc:main' into main
RayJi01 Jun 16, 2023
e7c6224
first try on implement Len Operands
RayJi01 Jun 16, 2023
a7819c5
first try on implement Len Operands2
RayJi01 Jun 16, 2023
9d647f1
Merge branch 'main' into feature/len_for_groupby
mergify[bot] Jun 16, 2023
98f0a08
issues of reducer_index
RayJi01 Jun 21, 2023
f21a683
issues of reducer_index
RayJi01 Jun 21, 2023
6235355
Merge branch 'xprobe-inc:main' into main
RayJi01 Jun 21, 2023
ba93ce7
Fix tile
UranusSeven Jun 26, 2023
5dbbbb0
len method implemented with random test passed
RayJi01 Jun 28, 2023
4f9106f
try to solve chunk_size issues
RayJi01 Jun 29, 2023
59fc701
multiple chunks(chunk_size) implemented with UT and IT passed
RayJi01 Jun 29, 2023
6be18c2
Merge branch 'xprobe-inc:main' into main
RayJi01 Jun 29, 2023
c66a5b9
Merge branch 'main' into feature/len_for_groupby
mergify[bot] Jun 29, 2023
75edbdf
Merge branch 'main' into feature/len_for_groupby
mergify[bot] Jun 30, 2023
5be25cf
Merge branch 'xprobe-inc:main' into main
RayJi01 Jun 30, 2023
a73eb21
Merge branch 'main' into feature/len_for_groupby
mergify[bot] Jun 30, 2023
150b30c
Merge branch 'main' into feature/len_for_groupby
mergify[bot] Jul 1, 2023
2afbd92
Merge branch 'xprobe-inc:main' into main
RayJi01 Jul 3, 2023
d1b77e3
Merge branch 'xprobe-inc:main' into main
RayJi01 Jul 3, 2023
b899fe0
first try on implement Len Operands
RayJi01 Jun 16, 2023
a39c029
first try on implement Len Operands2
RayJi01 Jun 16, 2023
9710c4c
issues of reducer_index
RayJi01 Jun 21, 2023
472d03c
issues of reducer_index
RayJi01 Jun 21, 2023
b0c3f94
Fix tile
UranusSeven Jun 26, 2023
c356e0f
len method implemented with random test passed
RayJi01 Jun 28, 2023
3285a68
try to solve chunk_size issues
RayJi01 Jun 29, 2023
b2a647c
multiple chunks(chunk_size) implemented with UT and IT passed
RayJi01 Jun 29, 2023
04939bf
Merge branch 'main' into feature/len_for_groupby
mergify[bot] Jul 3, 2023
7178676
Merge remote-tracking branch 'origin/feature/len_for_groupby' into fe…
RayJi01 Jul 3, 2023
33da36e
Merge branch 'main' into feature/len_for_groupby
mergify[bot] Jul 4, 2023
1fd2dfd
Merge branch 'main' into feature/len_for_groupby
mergify[bot] Jul 4, 2023
0cf151b
Merge branch 'main' into feature/len_for_groupby
mergify[bot] Jul 5, 2023
3 changes: 2 additions & 1 deletion python/xorbits/_mars/dataframe/groupby/__init__.py
@@ -12,9 +12,9 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# noinspection PyUnresolvedReferences
from ..core import DataFrameGroupBy, GroupBy, SeriesGroupBy
from .len import groupby_len


def _install():
@@ -63,6 +63,7 @@ def _install():
        setattr(cls, "sem", lambda groupby, **kw: agg(groupby, "sem", **kw))
        setattr(cls, "nunique", lambda groupby, **kw: agg(groupby, "nunique", **kw))

        setattr(cls, "__len__", groupby_len)
        setattr(cls, "apply", groupby_apply)
        setattr(cls, "transform", groupby_transform)

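The `setattr(cls, "__len__", groupby_len)` line above relies on how Python resolves special methods: `len(obj)` looks up `__len__` on the type, not on the instance, so the method has to be installed on the class. A minimal sketch of that behaviour follows; `ToyGroupBy`, `toy_len` and `Bare` are hypothetical stand-ins, not xorbits types.

```python
class ToyGroupBy:
    """Hypothetical stand-in for a GroupBy class; not the xorbits type."""

    def __init__(self, keys):
        self.keys = list(keys)


def toy_len(groupby):
    # len() requires a non-negative integer result.
    return len(set(groupby.keys))


# Install the special method on the class, mirroring the setattr call in _install().
setattr(ToyGroupBy, "__len__", toy_len)

g = ToyGroupBy([0.1, 0.2, 0.1])
n = len(g)  # two distinct keys


class Bare:
    pass


# Setting __len__ on an instance is ignored by len(), which only checks the type.
b = Bare()
b.__len__ = lambda: 3
try:
    len(b)
    instance_level_works = True
except TypeError:
    instance_level_works = False
```

One consequence worth keeping in mind for `groupby_len`: whatever `.execute().fetch()` returns has to be usable as an integer, otherwise `len(grouped)` raises `TypeError`.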
84 changes: 84 additions & 0 deletions python/xorbits/_mars/dataframe/groupby/len.py
@@ -0,0 +1,84 @@
import numpy as np
import pandas as pd

from ... import opcodes
from ...core import OutputType
from ...core.operand import Operand, OperandStage
from ..operands import DataFrameOperandMixin


class GroupByLen(DataFrameOperandMixin, Operand):
    _op_type_ = opcodes.GROUPBY_LEN

    def __call__(self, groupby):
        return self.new_scalar([groupby])

    @classmethod
    def tile(cls, op: "GroupByLen"):
        in_groupby = op.inputs[0]

        # generate map chunks
        map_chunks = []
        for chunk in in_groupby.chunks:
            map_op = op.copy().reset_key()
            map_op.stage = OperandStage.map
            map_op.output_types = [OutputType.series]
            chunk_inputs = [chunk]

            map_chunks.append(map_op.new_chunk(chunk_inputs))

        # generate reduce chunks, we only need one reducer here.
        out_chunks = []
        reduce_op = op.copy().reset_key()
        reduce_op.output_types = [OutputType.scalar]
        reduce_op.stage = OperandStage.reduce
        out_chunks.append(reduce_op.new_chunk(map_chunks))

        # final wrap up:
        new_op = op.copy()
        params = op.outputs[0].params.copy()
        params["nsplits"] = ((np.nan,) * len(out_chunks),)
        params["chunks"] = out_chunks
        return new_op.new_scalar(op.inputs, **params)

    @classmethod
    def execute_map(cls, ctx, op):
        chunk = op.outputs[0]
        in_df_grouped = ctx[op.inputs[0].key]

        # the grouped object's .size() method yields one entry per unique key
        summary = in_df_grouped.size()
        sum_indexes = summary.index

        res = []
        for index in sum_indexes:
            res.append(index)

        # use a Series to carry every group key found in this chunk
        ctx[chunk.key, 1] = pd.Series(res)

    @classmethod
    def execute_reduce(cls, ctx, op: "GroupByLen"):
        chunk = op.outputs[0]
        input_idx_to_series = dict(op.iter_mapper_data(ctx))
        row_idxes = sorted(input_idx_to_series.keys())

        res = set()
        for row_index in row_idxes:
            row_series = input_idx_to_series.get(row_index, None)
            res.update(row_series)

        res_len = len(res)
        ctx[chunk.key] = res_len

    @classmethod
    def execute(cls, ctx, op: "GroupByLen"):
        if op.stage == OperandStage.map:
            cls.execute_map(ctx, op)
        elif op.stage == OperandStage.reduce:
            cls.execute_reduce(ctx, op)


def groupby_len(groupby):
    op = GroupByLen()
    return op(groupby).execute().fetch()

Codecov / codecov/patch: added lines #L14, #L18, #L21, #L23-L26, #L28, #L31-L35, #L38-L42, #L46-L47, #L50-L51, #L53, #L55, #L58, #L62-L64, #L66, #L68-L69, #L71-L72, #L77, #L79 and #L83-L84 of python/xorbits/_mars/dataframe/groupby/len.py were not covered by tests.
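The map and reduce stages in `len.py` can be illustrated with plain pandas, outside the Mars runtime. The sketch below is an assumption-laden model, not the operand itself: `map_unique_keys` and `reduce_count` are hypothetical names, and the two-way `iloc` split merely imitates chunking. Each "chunk" reports the group keys it contains, and a single reducer unions them and counts the result.

```python
import pandas as pd


def map_unique_keys(chunk: pd.DataFrame, by: str) -> pd.Series:
    # Map stage: the index of .size() holds one entry per group key in this chunk.
    return pd.Series(chunk.groupby(by).size().index)


def reduce_count(key_series_iter) -> int:
    # Reduce stage: union the keys reported by every chunk, then count them.
    keys = set()
    for key_series in key_series_iter:
        keys.update(key_series)
    return len(keys)


df = pd.DataFrame(
    {
        "a": ["a", "b", "a", "c"],
        "b": [0.1, 0.2, 0.3, 0.4],
        "c": ["aa", "bb", "cc", "aa"],
    }
)

# Hypothetical two-chunk split; key "a" appears in both chunks.
chunks = [df.iloc[:2], df.iloc[2:]]
n_groups = reduce_count(map_unique_keys(c, "a") for c in chunks)
```

Grouping by `"a"` is used here because the key `"a"` occurs in both chunks, so the example exercises the de-duplication in the reducer; summing per-chunk group counts would over-count in that case.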
15 changes: 15 additions & 0 deletions python/xorbits/_mars/dataframe/groupby/tests/test_groupby.py
@@ -525,3 +525,18 @@ def test_groupby_fill():
    assert len(r.chunks) == 4
    assert r.shape == (len(s1),)
    assert r.chunks[0].shape == (np.nan,)


def test_groupby_len(setup):
    df = md.DataFrame(
        {
            "a": ["a", "b", "a", "c"],
            "b": [0.1, 0.2, 0.3, 0.4],
            "c": ["aa", "bb", "cc", "aa"],
        }
    )

    grouped = df.groupby("b")

    num_groups = len(grouped)
    print(num_groups)
Original file line number Diff line number Diff line change
@@ -1886,3 +1886,16 @@ def test_series_groupby_rolling_agg(setup, window, min_periods, center, closed,
    mresult = mresult.execute().fetch()

    pd.testing.assert_series_equal(presult, mresult.sort_index())


def test_groupby_len(setup):
    df = md.DataFrame(
        {
            "a": ["a", "b", "a", "c"],
            "b": [0.1, 0.2, 0.3, 0.4],
            "c": ["aa", "bb", "cc", "aa"],
        }
    )
    grouped = df.groupby("b")

    print(len(grouped))
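Both new tests print the group count without asserting on it. As a hedged suggestion, the expected values can be taken from pandas on the same data; the sketch below uses only pandas, and the Mars-side assertion is shown as a comment because it is an assumption about how the test would be wired, not something run here.

```python
import pandas as pd

raw = pd.DataFrame(
    {
        "a": ["a", "b", "a", "c"],
        "b": [0.1, 0.2, 0.3, 0.4],
        "c": ["aa", "bb", "cc", "aa"],
    }
)

# Reference group counts from pandas for each candidate grouping column.
expected = {by: len(raw.groupby(by)) for by in ("a", "b", "c")}

# In the Mars tests this could become (assumption, not executed here):
#     assert len(md.DataFrame(raw).groupby("b")) == expected["b"]
```

Grouping by `"b"` alone never exercises duplicate keys, since all four values are distinct; columns `"a"` and `"c"` each contain a repeated key and would make a stricter check.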
2 changes: 2 additions & 0 deletions python/xorbits/_mars/opcodes.py
@@ -390,6 +390,7 @@
APPLYMAP = 742
PIVOT = 743
PIVOT_TABLE = 744
LEN = 745

FUSE = 801

@@ -434,6 +435,7 @@
GROUPBY_SORT_REGULAR_SAMPLE = 2037
GROUPBY_SORT_PIVOT = 2038
GROUPBY_SORT_SHUFFLE = 2039
GROUPBY_LEN = 2064

# parallel sorting by regular sampling
PSRS_SORT_REGULAR_SMAPLE = 2040
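This diff adds two constants, `LEN = 745` and `GROUPBY_LEN = 2064`, and in the hunks shown only `GROUPBY_LEN` is referenced (by `GroupByLen._op_type_`). Assuming opcode values are meant to be unique, a small check like the one below can catch accidental collisions when new constants are added. The `OPCODES` dict is a hand-copied excerpt of the values visible above, and `find_duplicate_values` is a hypothetical helper, not part of xorbits.

```python
from collections import defaultdict

# Hand-copied excerpt of the constants visible in the opcodes.py hunks.
OPCODES = {
    "APPLYMAP": 742,
    "PIVOT": 743,
    "PIVOT_TABLE": 744,
    "LEN": 745,
    "FUSE": 801,
    "GROUPBY_SORT_REGULAR_SAMPLE": 2037,
    "GROUPBY_SORT_PIVOT": 2038,
    "GROUPBY_SORT_SHUFFLE": 2039,
    "GROUPBY_LEN": 2064,
    "PSRS_SORT_REGULAR_SMAPLE": 2040,
}


def find_duplicate_values(constants):
    """Return {value: [names]} for every value bound to more than one name."""
    names_by_value = defaultdict(list)
    for name, value in constants.items():
        names_by_value[value].append(name)
    return {v: names for v, names in names_by_value.items() if len(names) > 1}


duplicates = find_duplicate_values(OPCODES)
```

Against a real module, the same helper could be fed `{k: v for k, v in vars(module).items() if k.isupper() and isinstance(v, int)}`.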