Skip to content
Aleksandar Vitorovic edited this page Jun 3, 2016 · 12 revisions

For the SQL interface, we provide a cost-based query optimizer. The optimizer parses SQL, and decides on the best join order and parallelism for each component. It aims to minimize latency and the number of machines used, and to maximize throughput. The optimizer automatically assigns operators to components, specifies the connection between components, and assigns the parallelism for each component. The key ideas are:

  • Universal producer-consumer balance (the speed balance holds among any two communicating machines) and
  • Application-level batching (we programmatically set the batch sizes and we find a sweet spot between latency and throughput).

The optimizer applies various optimizations on the query plan, which includes pushing down selections and projections, and common subexpression elimination. For convenience, we also provide versions of the optimizer in which a user can manually specify parallelism and/or join ordering.

The pre-bundled sql queries are available here. They use TPC-H scheme which is specified here. Of course, users can write their own queries (by modifying DIP_QUERY_NAME and DIP_SQL_ROOT in the configuration file) over their own schema (by modifying DIP_SCHEMA_PATH in the configuration file).

Implementation

We have implemented several query optimizers. Two major types are: INDEX and NAME optimizers. The INDEX ones are quite simple, and they add projection operators after query plan is fully created. The NAME ones are much more sophisticated. They support pushing up selections and projections. They also perform common subexpression elimination. That is, if only expressions are used downstream a component in the query plan, the component sends only expressions (rather than the all the corresponding fields). To do so, each component decides on its output scheme based on the fields/expressions that are needed downstream in the query plan. The NAME optimizers also use cardinality estimation to assign parallelism to components (CostParallelismAssigner class is responsible for this functionality).

There are two INDEX optimizers:

  • INDEX_SIMPLE: It generates lefty plans in the exactly the same order as specified in SQL query JOIN ON syntax. No projections except on the very last component.
  • INDEX_RULE_BUSH: It generates bushy plans using rules such as ensuring that a smaller relation comes earlier in the join order.

In both cases, parallelism is generated using the position of a component in the hierarchy, and not by using cardinality information.

There are several NAME optimizers:

  • NAME_MANUAL_PAR_LEFTY: Both query plan and parallelism is manually specified.
  • NAME_MANUAL_COST_LEFTY: Query plan is explicitly specified, as well as total parallelism for spouts.
  • NAME_RULE_LEFTY: Query plan is built such that the smallest component which can be joined is joined first. Total parallelism of sources also has to be specified.
  • NAME_COST_LEFTY: The optimizer starts from the data sources and adds the operators one after another, pushing selections and projections as close as possible to the data sources, and performing common subexpression elimination. Where possible, the optimizer collocates operators to components to minimize network transfers. Furthermore, it assigns the right parallelism to each component, such that a component is neither overloaded nor mostly idle. To find an optimal join order and component parallelism, the optimizer uses a dynamic programming (Selinger-style) algorithm, which requires cardinality estimation. The dynamic programming algorithm works as follows. For the subplans producing the same intermediate result, we prune all the subplans except the one requiring the smallest number of nodes. A user specifies the total parallelism of data sources.
  • NAME_MANUAL_BATCHING: Query plan is optimized for latency. Batch size for each component is set by the optimizer.

CostParallelismAssigner.PARALLELISM_PARENTS is a constant that defines how to set up parallelism of a component based on the parallelisms of the parent (upstream) components. Depending on your software/hardware architecture (and relative CPU and network costs), you might need to change this constant.

Supported features

Squall is mainly concerned with aggregation queries. It supports a wide range of SQL queries, and it also supports some features which are outside SQL Standard:

  • Specifying GROUP BY (for example in an aggregation) is possible not only on a column, but also on a Projection over a tuple
  • DISTINCT can operate on multiple fields

Currently, the SQL interface recognizes ANSI SQL syntax, and it instantiates hash joins with full-history semantics.