ElSeidy edited this page Dec 15, 2014 · 3 revisions

The following is a list of potential projects to extend Squall. If you are interested or have additional proposals, please send us your suggestions and get your name added to the Contributors list.

Front-End: The goal of this project is to make the Squall interface more interactive and to improve connectivity to other important systems. This project includes:

  • Adding more data sources, e.g., Kafka and Kestrel
  • Adding support for reading from HDFS
  • Allowing online processing to be stopped and resumed using the signals library
  • Adding support for Cassandra as the datastore
  • Allowing a user to specify annotations in the SQL generator, for example join-order annotations
  • Making it easy to manually edit the code of the generated query plan
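As a rough illustration of the stop-and-resume item above, here is a minimal, single-threaded sketch of an operator that buffers input while paused. It is written in Python for brevity and does not use Squall or the signals library; the class `PausableOperator` and the `PAUSE`/`RESUME` message names are hypothetical.

```python
from collections import deque


class PausableOperator:
    """Processes tuples immediately, or buffers them while paused."""

    def __init__(self, process):
        self.process = process      # callback applied to every tuple
        self.paused = False
        self.backlog = deque()      # tuples that arrived while paused

    def on_signal(self, signal):
        """Handle a control message; the message names are hypothetical."""
        if signal == "PAUSE":
            self.paused = True
        elif signal == "RESUME":
            self.paused = False
            # Drain the backlog in arrival order before taking new input.
            while self.backlog:
                self.process(self.backlog.popleft())
        else:
            raise ValueError(f"unknown signal: {signal!r}")

    def on_tuple(self, item):
        """Process the tuple now, or queue it if the operator is paused."""
        if self.paused:
            self.backlog.append(item)
        else:
            self.process(item)


seen = []
op = PausableOperator(seen.append)
op.on_tuple("a")
op.on_signal("PAUSE")
op.on_tuple("b")
op.on_tuple("c")
paused_view = list(seen)    # snapshot taken while paused
op.on_signal("RESUME")
op.on_tuple("d")
```

The backlog here is unbounded; a real deployment would presumably need back-pressure or a spill-to-disk policy for long pauses.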

Result Representation: This project (part of the front-end) aims to visually represent the continuously changing results. In addition to building a polished GUI, this project includes the following technical challenges:

  • A communication protocol between the last-level Squall operator and the visualization system.
  • A configurable rate for pushing result updates to the interface: updating too frequently generates heavy network traffic and may reduce the bandwidth available between the operators.
  • The visualization needs to be flexible enough to represent different data layouts and to support very large results.
  • The visualization should integrate seamlessly with the front-end. This includes the option to show a progress bar when the input data is static. Ideally, we would like to use the D3.js library for the visualizations.
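One possible way to bound the update rate mentioned above is to coalesce results and forward at most one snapshot per interval. The sketch below is an illustration in Python, not Squall code; `UpdateThrottler`, `min_interval`, and the `send` callback are assumed names, and timestamps are passed in explicitly so the behaviour is deterministic.

```python
class UpdateThrottler:
    """Forwards at most one result snapshot per `min_interval` seconds."""

    def __init__(self, min_interval, send):
        self.min_interval = min_interval
        self.send = send          # callback that ships a snapshot to the UI
        self.last_sent = None     # timestamp of the last forwarded snapshot
        self.pending = None       # newest snapshot not yet forwarded

    def offer(self, snapshot, now):
        """Called by the last-level operator whenever the result changes."""
        self.pending = snapshot   # older pending snapshots are overwritten
        if self.last_sent is None or now - self.last_sent >= self.min_interval:
            self.flush(now)

    def flush(self, now):
        """Send the newest pending snapshot, if there is one."""
        if self.pending is not None:
            self.send(self.pending)
            self.pending = None
            self.last_sent = now


sent = []
throttler = UpdateThrottler(min_interval=1.0, send=sent.append)
for t, count in [(0.0, 1), (0.2, 2), (0.9, 3), (1.0, 4), (1.5, 5)]:
    throttler.offer({"count": count}, now=t)
throttler.flush(now=2.0)    # e.g. triggered by a periodic timer
```

With a one-second interval, only the snapshots with counts 1, 4, and 5 reach the interface; the intermediate ones are dropped because a newer result superseded them before the interval elapsed.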

Approximate Query Processing: This project exploits approximate query processing techniques. The main idea is to reduce the number of processing units used while still guaranteeing result quality, i.e., the running results stay within certain error bounds of the exact answer. Achieving this requires implementing state-of-the-art distributed sketching techniques, as described in "Sketch-based Geometric Monitoring of Distributed Stream Queries" (VLDB 2013) and "Sketch-based Querying of Distributed Sliding-Window Data Streams" (VLDB 2012).
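To give a flavour of what "error bounds" and "distributed sketching" mean in practice, the following is a minimal Count-Min sketch in Python. It is a simpler, well-known structure chosen for illustration, not the algorithms from the two cited papers, and it is not part of Squall. The class name and the use of salted BLAKE2 hashes are assumptions of this sketch; the formal guarantee assumes pairwise-independent hash functions, for which the salted hash is only a practical stand-in.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate frequency counts in fixed memory.

    For non-negative updates, estimate(key) never underestimates the true
    count, and it exceeds it by more than epsilon * total with probability
    at most delta (assuming well-behaved hash functions).
    """

    def __init__(self, epsilon, delta):
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0

    def _index(self, row, key):
        # One salted hash per row stands in for independent hash functions.
        digest = hashlib.blake2b(
            str(key).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count
        self.total += count

    def estimate(self, key):
        return min(self.table[row][self._index(row, key)]
                   for row in range(self.depth))

    def merge(self, other):
        """Fold in a sketch built on another node with the same parameters."""
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions differ")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
        self.total += other.total


# Two nodes summarise their local streams, then the summaries are merged.
node_a = CountMinSketch(epsilon=0.01, delta=0.01)
node_b = CountMinSketch(epsilon=0.01, delta=0.01)
for key in ["x"] * 5 + ["y"] * 2:
    node_a.add(key)
for key in ["x"] * 3 + ["z"]:
    node_b.add(key)
node_a.merge(node_b)
```

Because merging is a cell-wise addition, each node can ship a small fixed-size summary rather than its raw tuples, which is the property that makes sketches attractive for distributed stream processing.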