Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate quantile.pxd to pylibcudf #15874

Merged
merged 13 commits into from
Jun 6, 2024

Conversation

lithomas1
Copy link
Contributor

@lithomas1 lithomas1 commented May 28, 2024

Description

xref #15162

Migrate quantile.pxd to use pylibcudf APIs.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@lithomas1 lithomas1 added feature request New feature or request non-breaking Non-breaking change labels May 28, 2024
@github-actions github-actions bot added Python Affects Python cuDF API. CMake CMake build issue labels May 28, 2024
@vyasr vyasr added the pylibcudf Issues specific to the pylibcudf package label May 28, 2024
@lithomas1 lithomas1 marked this pull request as ready for review May 29, 2024 13:37
@lithomas1 lithomas1 requested a review from a team as a code owner May 29, 2024 13:37
@lithomas1 lithomas1 requested review from bdice and isVoid May 29, 2024 13:37
python/cudf/cudf/_lib/pylibcudf/quantiles.pyx Show resolved Hide resolved
python/cudf/cudf/_lib/pylibcudf/quantiles.pyx Outdated Show resolved Hide resolved
python/cudf/cudf/_lib/pylibcudf/quantiles.pyx Show resolved Hide resolved
python/cudf/cudf/_lib/pylibcudf/quantiles.pyx Outdated Show resolved Hide resolved
python/cudf/cudf/_lib/pylibcudf/quantiles.pyx Show resolved Hide resolved
python/cudf/cudf/_lib/pylibcudf/quantiles.pyx Show resolved Hide resolved
python/cudf/cudf/_lib/pylibcudf/types.pxd Outdated Show resolved Hide resolved
python/cudf/cudf/pylibcudf_tests/conftest.py Outdated Show resolved Hide resolved
python/cudf/cudf/pylibcudf_tests/conftest.py Outdated Show resolved Hide resolved
python/cudf/cudf/pylibcudf_tests/test_quantiles.py Outdated Show resolved Hide resolved
python/cudf/cudf/_lib/pylibcudf/quantiles.pyx Outdated Show resolved Hide resolved
python/cudf/cudf/_lib/quantiles.pyx Outdated Show resolved Hide resolved
@lithomas1 lithomas1 requested a review from a team as a code owner June 3, 2024 23:57
@lithomas1 lithomas1 requested a review from nvdbaranec June 3, 2024 23:57
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jun 3, 2024
@lithomas1
Copy link
Contributor Author

I dropped the memoryviews back in favor of directly typing as vector.

I originally wanted to do a memoryview since I thought it'd be better performance wise, but now I realize that Cython can probably figure that out itself when doing the assignment (hopefully it's not going through the Python sequence protocol for that).

@lithomas1 lithomas1 requested review from wence- and vyasr June 4, 2024 23:12
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the quantiles argument and how we should type it, here's what I expect Cython will do:

  • Typed memoryview: If the object in question exposes the buffer protocol, the function will receive a zero-copy view. If the object does not, it will error. This will work correctly for e.g. numpy arrays, but fail for lists/tuples/etc.
  • vector[double]: This will work for any iterable that contains doubles. Narrower types should be transparently upcasted, but that's worth testing (e.g. if you pass a tuple of ints, make sure that works).

So I think vector[double] is a good choice here. It should work fine from pure Python as well.

We seem to have lost the changes to the legacy _lib/quantiles.pyx though.

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor (non-blocking suggestion), but we should plumb backwards in the cudf-classic cython (as @vyasr noted)


Ignored if `is_input_sorted` is `Sorted.YES`
null_precedence: list, default None
A list of :py:class:`~cudf._lib.pylibcudf.types.NullOrder` enums,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps:

Suggested change
A list of :py:class:`~cudf._lib.pylibcudf.types.NullOrder` enums,
A list of :py:class:`~.types.NullOrder` enums,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind, I don't think sphinx is able to find the enum as a class generally - I'm getting errors where sphinx isn't able to find the reference.
(maybe since the enum doesn't have a docstring).

I changed it back to regular backtics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will definitely have to do some work on improving documentation of pylibcudf before the end. I'm hoping to find time between now and then to improve enum support in Cython in some of the edge cases that tend to bite us.

Comment on lines +21 to +28
@pytest.fixture(scope="module", params=[[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]])
def pa_col_data(request, numeric_pa_type):
return pa.array(request.param, type=numeric_pa_type)


@pytest.fixture(scope="module")
def plc_col_data(pa_col_data):
return plc.interop.from_arrow(pa_col_data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe combine these as we discussed in the other PR? Or you can wait for a follow-up, whatever you prefer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, I'm planning on addressing in followup.

@lithomas1
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 5f45803 into rapidsai:branch-24.08 Jun 6, 2024
70 checks passed
@lithomas1 lithomas1 deleted the pylibcudf-quantile branch June 6, 2024 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change pylibcudf Issues specific to the pylibcudf package Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants