Deimos Architecture

Deimos is the third of three libraries that add functionality on top of each other:

  • RubyKafka is the low-level Kafka client, providing APIs for producers, consumers and the client as a whole.
  • Phobos is a lightweight wrapper on top of RubyKafka that provides threaded consumers, a simpler way to write producers, and lifecycle management.
  • Deimos is a full-featured framework built on Phobos that provides schema integration (e.g. Avro), database integration, metrics, tracing, test helpers and other utilities.

Folder structure

As of May 12, 2020, these are the important files and folders for understanding how Deimos fits together:

  • lib/generators: Generators that create database migrations, e.g. for the DB Poller and DB Producer features.
  • lib/tasks: Rake tasks for starting consumers, DB Pollers, etc.
  • lib/deimos: Main Deimos code.
  • lib/deimos/deimos.rb: The bootstrap / startup code for Deimos. Also provides some global convenience methods and (for legacy purposes) the way to start the DB Producer.
  • lib/deimos/backends: The different plug-in producer backends - e.g. produce directly to Kafka, use the DB backend, etc.
  • lib/deimos/schema_backends: The different plug-in schema handlers, such as the various flavors of Avro (with/without schema registry etc.)
  • lib/deimos/metrics: The different plug-in metrics providers, e.g. Datadog.
  • lib/deimos/tracing: The different plug-in tracing providers, e.g. Datadog.
  • lib/deimos/utils: Utility classes for things not directly related to producing and consuming, such as the DB Poller, DB Producer, lag reporter, etc.
  • lib/deimos/config: Classes related to configuring Deimos.
  • lib/deimos/monkey_patches: Monkey patches to existing libraries. These should be removed in a future update.

Features

Producers and Consumers

Both producers and consumers include the SharedConfig module, which standardizes configuration like schema settings, topic, keys, etc.
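
For illustration, a minimal producer using these shared settings might look like the sketch below (the topic, schema and namespace are hypothetical):

```ruby
class WidgetProducer < Deimos::Producer
  topic 'widgets'                    # Kafka topic to publish to
  schema 'Widget'                    # schema used to encode payloads
  namespace 'com.my-company.widgets' # schema namespace
  key_config field: :id              # build the Kafka key from the payload's id field

  # Convenience wrapper around the class-level publish method.
  def self.send_widget(widget)
    publish({ id: widget.id, name: widget.name })
  end
end
```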

Consumers come in two flavors: Consumer and BatchConsumer. Both include BaseConsumer for shared functionality.
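
A sketch of the two flavors, assuming the per-message entry point is consume(payload, metadata), the batch entry point is consume_batch(payloads, metadata), and a Rails logger is available:

```ruby
class WidgetConsumer < Deimos::Consumer
  # Called once per decoded message; metadata includes the key, topic, etc.
  def consume(payload, metadata)
    Rails.logger.info("Got widget #{payload.inspect} with key #{metadata[:key]}")
  end
end

class WidgetBatchConsumer < Deimos::BatchConsumer
  # Called with an array of decoded messages at once.
  def consume_batch(payloads, metadata)
    Rails.logger.info("Got #{payloads.size} widgets from #{metadata[:topic]}")
  end
end
```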

Produced messages go to Kafka by default, but literally anything else can happen when your producer publishes, by swapping out the producer backend. A backend is just a class that inherits from Deimos::Backends::Base and implements a single method, execute.
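
As a sketch, a backend that only logs instead of publishing might look like this (the keyword arguments to execute are an assumption; the real signature is defined on Deimos::Backends::Base):

```ruby
module Deimos
  module Backends
    class LogOnly < Base
      # Receives the producer class and the batch of Deimos::Message objects.
      def self.execute(producer_class:, messages:)
        messages.each do |message|
          puts "Would have published #{message.inspect} for #{producer_class}"
        end
      end
    end
  end
end
```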

Producers have a complex workflow while processing the payload to publish. This is aided by the Deimos::Message class (not to be confused with the KafkaMessage class, which is an ActiveRecord used by the DB Producer feature, below).

Schemas

Schema backends are used to encode and decode payloads into different formats such as Avro. These are integrated with producers and consumers, as well as test helpers. These are a bit more involved than producer backends, and must define methods to do the following (see the sketch after this list):

  • encode a payload or key (for Avro, a key schema may be auto-generated when encoding a key)
  • decode a payload or key
  • validate that a payload is correct for encoding
  • coerce a payload into the given schema (e.g. turn ints into strings)
  • get a list of schema_fields in the configured schema, used when interacting with ActiveRecord
  • define a mock backend to use in place of the given backend during testing. Typically mock backends will validate values but not actually encode/decode them.
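
As an illustration only, a trivial JSON backend might look like the sketch below; the method names mirror the list above, but the real abstract methods and signatures live in lib/deimos/schema_backends/base.rb and may differ:

```ruby
require 'json'

module Deimos
  module SchemaBackends
    class PlainJson < Base
      def encode_payload(payload, schema: nil, topic: nil)
        payload.to_json
      end

      def decode_payload(payload, schema: nil)
        JSON.parse(payload)
      end

      def validate(payload, schema:)
        true # a real backend raises on payloads that don't match the schema
      end

      def schema_fields
        [] # a real backend returns the field definitions of the configured schema
      end

      def coerce_field(field, value)
        value # a real backend converts values to the types the schema expects
      end
    end
  end
end
```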

Configuration

Deimos uses the fig_tree gem (https://www.github.com/flipp_oss/fig_tree) for configuration.

The configuration definition for Deimos is in config/configuration.rb. In addition, there are methods in config/phobos_config.rb which translate to/from the Phobos configuration format and support the old phobos.yml method of configuration.
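
The DSL roughly follows the shape below; the settings shown are illustrative, and config/configuration.rb is the authoritative list:

```ruby
Deimos.configure do
  schema.backend :avro_local            # which schema backend plugin to use
  kafka.seed_brokers ['localhost:9092']

  consumer do
    class_name 'WidgetConsumer'
    topic 'widgets'
    group_id 'widget-group'
    schema 'Widget'
    namespace 'com.my-company.widgets'
    key_config field: :id
  end
end
```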

Metrics and Tracing

These are simpler than other plugins and must implement the expected methods (increment, gauge, histogram and time for metrics, and start, finish and set_error for tracing). These are used primarily in producers and consumers.
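
A sketch of a custom metrics provider, assuming the base class is Deimos::Metrics::Provider and that each method takes a metric name plus an options hash (the exact signatures may differ):

```ruby
class StdoutMetrics < Deimos::Metrics::Provider
  def increment(metric_name, options = {})
    puts "increment #{metric_name} #{options.inspect}"
  end

  def gauge(metric_name, count, options = {})
    puts "gauge #{metric_name}=#{count}"
  end

  def histogram(metric_name, count, options = {})
    puts "histogram #{metric_name}=#{count}"
  end

  def time(metric_name, options = {})
    start = Time.now
    result = yield
    puts "time #{metric_name} took #{Time.now - start}s"
    result
  end
end
```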

ActiveRecord Integration

Deimos provides an ActiveRecordConsumer and ActiveRecordProducer. These are relatively lightweight ways to save data to the database or read it from the database as part of app logic. They use features like the schema backends' coerce method to manage the differences between the given payload and the configured schema for the topic.
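
A sketch, assuming a Widget ActiveRecord model whose columns match the configured schema:

```ruby
class MyRecordConsumer < Deimos::ActiveRecordConsumer
  record_class Widget
  # Each consumed payload is coerced to the schema and saved as a Widget row.
end

class MyRecordProducer < Deimos::ActiveRecordProducer
  record_class Widget
  # Publishes payloads built from Widget records, e.g. via
  # MyRecordProducer.send_events(widgets) for a collection of Widget rows.
end
```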

Database Backend / Database Producer

This feature (which provides better performance and transaction guarantees) is powered by two components:

  • The db publish backend, which saves messages to the database rather than sending them to Kafka (see the configuration sketch after this list);
  • The DbProducer utility, which runs as a separate process, pulling messages from the database and sending them to Kafka.
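
Routing producers through this backend is a configuration change; a sketch of the assumed setting:

```ruby
Deimos.configure do
  producers.backend :db # save messages via the DB backend instead of publishing directly
end
```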

There are a set of utility classes that power the producer, which are largely copied from Phobos:

  • Executor takes a set of "runnable" things (which implement a start and stop method), puts them in a thread pool, and runs them all concurrently. It manages starting and stopping all threads when necessary.
  • SignalHandler wraps the Executor and handles SIGINT and SIGTERM signals to stop the executor gracefully.

In the case of this feature, the DbProducer is the runnable object; it can run several threads at once.
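
Conceptually, the pieces fit together roughly as below; the constructor arguments are illustrative rather than the real signatures, and in practice the Rake task does this wiring:

```ruby
thread_count = 2
producers = Array.new(thread_count) { Deimos::Utils::DbProducer.new }
executor = Deimos::Utils::Executor.new(producers)           # runs each producer in the thread pool
signal_handler = Deimos::Utils::SignalHandler.new(executor) # traps SIGINT/SIGTERM for graceful shutdown
signal_handler.run!
```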

On the database side, the ActiveRecord models that power this feature are:

  • KafkaMessage: The actual message, saved to the database. This message is already encoded by the producer, so it only has to be sent.
  • KafkaTopicInfo: Used for locking topics so that only one producer works on a topic at a time.

A Rake task (defined in deimos.rake) can be used to start the producer.

Database Poller

This feature (which periodically polls the database to send Kafka messages) primarily reuses other parts of Deimos and is therefore relatively small. The DbPoller class acts as a "runnable" and is used by an Executor (above). The PollInfo class is saved to the database to track each poller's progress.
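
Pollers are configured per producer; a sketch of the assumed configuration shape (setting names may differ slightly from config/configuration.rb):

```ruby
Deimos.configure do
  db_poller do
    producer_class 'WidgetProducer' # producer whose records should be polled and published
    run_every 60                    # polling interval in seconds (assumed unit)
  end
end
```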

A Rake task (defined in deimos.rake) can be used to start the pollers.

Other Utilities

The utils folder also contains the LagReporter (which sends metrics on lag) and the InlineConsumer, which can read data from a topic and directly pass it into a handler or save it to memory.