
Setting up a virtual cluster using Wirbelsturm

Daniel Espino Timón edited this page Aug 2, 2015 · 1 revision

In some situations it is useful to run Squall in distributed mode without a real cluster. Wirbelsturm makes it very easy to set up a virtual cluster on your own machine.

Once you have followed the Wirbelsturm quickstart instructions, using Squall with Wirbelsturm is straightforward. However, a few tweaks make it considerably more convenient.

Setting up port forwarding

By default, submitting a Storm topology to the Wirbelsturm cluster requires you to ssh into one of the machines. It is much more convenient to forward port 6627 (the Nimbus Thrift port) from the Storm master to your host. To do this, open wirbelsturm.yaml, find the storm_master section, and edit it so that it looks like this:

  # Deploys Storm master machines running Storm's Nimbus and UI daemons.
  storm_master:
    # Unless you know what you are doing there should be only one Storm master (Nimbus).
    count: 1
    # Do not change the hostname prefix because it is used in the Hiera YAML configuration data.
    hostname_prefix: nimbus
    ip_range_start: 10.0.0.250
    node_role: storm_master
    providers:
      virtualbox:
        memory: 768
        forwarded_ports:
          - guest: 8080 # Storm UI
            host: 28080
          - guest: 6627 # Nimbus Thrift port, used for topology submission
            host: 6627
      aws:
        instance_type: m1.xlarge
        ami: ami-abc12345
        security_groups:
          - wirbelsturm

The important part is the addition of port 6627 to the list of forwarded ports.

After redeploying the virtual cluster, edit your local storm.yaml file and set nimbus.host to localhost. You can then submit Storm topologies directly from your machine with the storm client.
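A minimal client-side storm.yaml might look like the fragment below. This is a sketch under a few assumptions: the file is the one the storm client reads from ~/.storm/storm.yaml on the host, and the client is a Storm 0.9.x-era release that still uses nimbus.host (later Storm releases replaced it with nimbus.seeds). Because host port 6627 is forwarded to the same port on the Nimbus VM, the default Thrift port needs no change; it is listed only to make the connection to the forwarding rule explicit.

```yaml
# Assumed location: ~/.storm/storm.yaml on the host machine.
# Point the storm client at the Nimbus port forwarded to localhost.
nimbus.host: "localhost"
# Default Nimbus Thrift port; matches the host side of the 6627 forward.
nimbus.thrift.port: 6627
```

If you forward Nimbus to a different host port instead, change nimbus.thrift.port to that host port.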

Setting up the data directory

To test the default Squall configurations, the workers need access to the data directory. You could simply copy it to every worker VM, but it is much more convenient to edit the Vagrantfile and add the data folder to the list of synced folders.

For example, if the Squall repository is in /home/bob/squall/, search for these lines:

      c.vm.synced_folder "./", "/vagrant", disabled: true
      c.vm.synced_folder "shared/", "/shared"

and add a third line right after them, so that the block looks like this:

      c.vm.synced_folder "./", "/vagrant", disabled: true
      c.vm.synced_folder "shared/", "/shared"
      # Share Squall's test data with every VM, mounted at /data
      c.vm.synced_folder "/home/bob/squall/test/data", "/data"