Optimize OrderBy operator by new layout #10949

jinchengchenghh · 2024-09-09T03:45:49Z

Description

With this benchmark #10041, I find addInput and addOutput expire most of the time.

addInput: addInputTiming: storeRows to RowContainer
noMoreInput: finishtiming: sort the data
getOutput: getOutputTiming: extract RowVector from RowContainer

The benchmark result shows:

OrderBy_no-payload_2_bigint_0.01k                            7.12s   140.50m
Total 7.25s Input 209.43ms Output 242.97ms Finish 75.55ms
OrderBy_no-payload_4_bigint_10k                            10.54ms     94.85
Total 40.09ms Input 10.61ms Output 381.28us Finish 1.00us
OrderBy_no-payloads_1_varchar_100k                         79.13ms     12.64
Total 106.43ms Input 24.12ms Output 4.50ms Finish 10.57us

So we need to optimize the addInput and getOutput.
Further more, the prefix sort has several limitations,

Iterate all supported types to encode one data, performance will deteriorate as long as we support more types;
Access all the data again (first access is in storeRows to RowContainer), https://github.com/facebookincubator/velox/blob/main/velox/exec/PrefixSort.cpp#L53

I have a proposal to optimize OrderBy operator when the key types are fully supported by prefix sort.

For addInput, save the RowVecotr to UnsafeRow like layout but with prefix.
UnsafeRow layout:

* Each tuple has three parts: [null-tracking bit set] [values] [variable length portion]
 *
 * The null-tracking bit set is aligned to 8-byte word boundaries. It stores one bit per field.
 *
 * In the `values` region, we store one 8-byte word per field. For fields that hold fixed-length
 * primitive types, such as long, double, or int, we store the value directly in the word. For
 * fields with non-primitive or variable-length values, we store a relative offset (w.r.t. the
 * base address of the row) that points to the beginning of the variable-length field, and length
 * (they are combined into a long).

Allocate the buffer by UnsafeRowFast::rowSize, (if this estimate expense is too heavy, we could use RowVector::estimateFlatSize) serialize the RowVector to a continuous buffer. Save all the rows to std::vector<BufferPtr> buffers and std::vector<size_t> offset for each BufferPtr.
We could add new part prefix.

Each tuple has six parts: [prefix][buffer-idx][row-number] [null-tracking bit set] [values] [variable length portion]
prefix will be all the key types. It is fixed width which length depending on the key encodedSize, now it is `numKeys * sizeOf<T> + 1`. For the variable length type such as String, if the prefix is equal, we will compare the actual record.

buffer-idx is the std::vector<BufferPtr> buffers index.

row-number is the row number of serialized rows.

For noMoreInput, sort the prefix.
For getOutput, get the rows from sorted prefix, we can get any data with buffer-idx(get startMemoryAddress) and row-number.

In this PR, we get the start memory address of serialized rows, and compute the start address of each row, and compute the fieldOffset to get the data address. const uint8_t* srcPtr = (memoryAddress + offsets[row] + fieldOffset);

#10797

This is a first proposal, need more detail, and I will help to implement it.

Spill supports will be next stage.
This optimization will be in SortBuffer, so ParquetWrite will also benefit from it.

The text was updated successfully, but these errors were encountered:

jinchengchenghh · 2024-09-11T01:06:28Z

The layout will be

Each tuple has six parts: [prefix][vector-idx][row-number]
prefix, same as before
vector-idx stores the input to std::vector<VectorPtr> inputs_
row-number, same as before

store the VectorPtr and reconstruct the RowVector by row-number after sorting.
I may also try to support the first proposal and compare the performance by benchmark when I'm free.

xiaoxmeng · 2024-09-12T06:36:13Z

@jinchengchenghh do you know how Spark java sort by operator works? Does it store a prefix to accelerate sort? Thanks!

jinchengchenghh · 2024-09-12T06:38:03Z

In this comments #10385 (comment), I comment Spark sort.

jinchengchenghh added the enhancement New feature or request label Sep 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize OrderBy operator by new layout #10949

Optimize OrderBy operator by new layout #10949

jinchengchenghh commented Sep 9, 2024 •

edited

Loading

jinchengchenghh commented Sep 11, 2024

xiaoxmeng commented Sep 12, 2024

jinchengchenghh commented Sep 12, 2024

Optimize OrderBy operator by new layout #10949

Optimize OrderBy operator by new layout #10949

Comments

jinchengchenghh commented Sep 9, 2024 • edited Loading

Description

jinchengchenghh commented Sep 11, 2024

xiaoxmeng commented Sep 12, 2024

jinchengchenghh commented Sep 12, 2024

jinchengchenghh commented Sep 9, 2024 •

edited

Loading