-
Notifications
You must be signed in to change notification settings - Fork 884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON tree algorithm code reorg #16836
base: branch-24.10
Are you sure you want to change the base?
JSON tree algorithm code reorg #16836
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved trivial CMake changes
bitmask_type* validity; | ||
}; | ||
|
||
std::pair<cudf::detail::host_vector<uint8_t>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change ignore_vals
returned here to a bool vector? We can also move this change to #16545
EDIT: Similar suggestion for is_pruned
and is_str_column_all_nulls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, I've seen concerns raised in the past regarding the use of bool
vectors, and recommendations to use int8_t
/uint8_t
instead. I didn't think host_vector<bool>
would be plagued with the same problems as std::vector<bool>
, but I could be wrong.
Watching this conversation for @karthikeyann's opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mythrocks That's right. 💯
std::vector<bool>
stores each element as a bit (to save memory). so, iteration and writing to data pointer causes issues.
cudf::detail::host_vector
is based on thrust::host_vector
which are simple vectors. (without any specialization for bool). So, that should be suitable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thrust::host_vector<bool>
will use byte-packing (not bits).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The re-org looks good to me, FWIW. But I'm curious about @shrshi's suggestion regarding bool
vectors.
Description
This PR moves JSON host tree algorithms to separate file.
This code movement will help #16545 review easier.
The code is moved to new file and reorganized for code reuse.
Very long function
make_device_json_column
is split intoreduce_to_column_tree
callbuild_tree
scatter_offsets
No new functionality is added in this PR.
Checklist