Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize OrderBy operator by new layout #10949

Open
jinchengchenghh opened this issue Sep 9, 2024 · 3 comments
Open

Optimize OrderBy operator by new layout #10949

jinchengchenghh opened this issue Sep 9, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@jinchengchenghh
Copy link
Contributor

jinchengchenghh commented Sep 9, 2024

Description

With this benchmark #10041, I find addInput and addOutput expire most of the time.

addInput: addInputTiming: storeRows to RowContainer
noMoreInput: finishtiming: sort the data
getOutput: getOutputTiming: extract RowVector from RowContainer

The benchmark result shows:

OrderBy_no-payload_2_bigint_0.01k                            7.12s   140.50m
Total 7.25s Input 209.43ms Output 242.97ms Finish 75.55ms
OrderBy_no-payload_4_bigint_10k                            10.54ms     94.85
Total 40.09ms Input 10.61ms Output 381.28us Finish 1.00us
OrderBy_no-payloads_1_varchar_100k                         79.13ms     12.64
Total 106.43ms Input 24.12ms Output 4.50ms Finish 10.57us

So we need to optimize the addInput and getOutput.
Further more, the prefix sort has several limitations,

  1. Iterate all supported types to encode one data, performance will deteriorate as long as we support more types;
  2. Access all the data again (first access is in storeRows to RowContainer), https://github.com/facebookincubator/velox/blob/main/velox/exec/PrefixSort.cpp#L53

I have a proposal to optimize OrderBy operator when the key types are fully supported by prefix sort.

For addInput, save the RowVecotr to UnsafeRow like layout but with prefix.
UnsafeRow layout:

* Each tuple has three parts: [null-tracking bit set] [values] [variable length portion]
 *
 * The null-tracking bit set is aligned to 8-byte word boundaries. It stores one bit per field.
 *
 * In the `values` region, we store one 8-byte word per field. For fields that hold fixed-length
 * primitive types, such as long, double, or int, we store the value directly in the word. For
 * fields with non-primitive or variable-length values, we store a relative offset (w.r.t. the
 * base address of the row) that points to the beginning of the variable-length field, and length
 * (they are combined into a long).

Allocate the buffer by UnsafeRowFast::rowSize, (if this estimate expense is too heavy, we could use RowVector::estimateFlatSize) serialize the RowVector to a continuous buffer. Save all the rows to std::vector<BufferPtr> buffers and std::vector<size_t> offset for each BufferPtr.
We could add new part prefix.

Each tuple has six parts: [prefix][buffer-idx][row-number] [null-tracking bit set] [values] [variable length portion]
prefix will be all the key types. It is fixed width which length depending on the key encodedSize, now it is `numKeys * sizeOf<T> + 1`. For the variable length type such as String, if the prefix is equal, we will compare the actual record.

buffer-idx is the std::vector<BufferPtr> buffers index.

row-number is the row number of serialized rows.

For noMoreInput, sort the prefix.
For getOutput, get the rows from sorted prefix, we can get any data with buffer-idx(get startMemoryAddress) and row-number.

In this PR, we get the start memory address of serialized rows, and compute the start address of each row, and compute the fieldOffset to get the data address. const uint8_t* srcPtr = (memoryAddress + offsets[row] + fieldOffset);

#10797

This is a first proposal, need more detail, and I will help to implement it.

Spill supports will be next stage.
This optimization will be in SortBuffer, so ParquetWrite will also benefit from it.

@jinchengchenghh jinchengchenghh added the enhancement New feature or request label Sep 9, 2024
@jinchengchenghh
Copy link
Contributor Author

The layout will be

Each tuple has six parts: [prefix][vector-idx][row-number]
prefix, same as before
vector-idx stores the input to std::vector<VectorPtr> inputs_
row-number, same as before

store the VectorPtr and reconstruct the RowVector by row-number after sorting.
I may also try to support the first proposal and compare the performance by benchmark when I'm free.

@xiaoxmeng
Copy link
Contributor

@jinchengchenghh do you know how Spark java sort by operator works? Does it store a prefix to accelerate sort? Thanks!

@jinchengchenghh
Copy link
Contributor Author

In this comments #10385 (comment), I comment Spark sort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants