
Commit

Update the documentation
WardLT committed Jan 5, 2024
1 parent a757fbf commit ad38514
Showing 7 changed files with 61 additions and 54 deletions.
2 changes: 1 addition & 1 deletion colmena/task_server/globus.py
@@ -29,7 +29,7 @@ class GlobusComputeTaskServer(FutureBasedTaskServer):
`registers <https://funcx.readthedocs.io/en/latest/sdk.html#registering-functions>`_
the wrapped function with Globus Compute.
You must also provide a Globus Compute :class:`~globus_compute_sdk.client.Client`
-that the task server will use to authenticate with the web service.
+that the task server will use to authenticate with the web service.
The task server works using Globus Compute's :class:`~globus_compute_sdk.executor.Executor`
to communicate to the web service over a web socket.
93 changes: 47 additions & 46 deletions docs/design.rst
@@ -1,9 +1,8 @@
Design
======

-Colmena is a library for applications that steer
+Colmena is a library for building applications that steer
ensembles of simulations running on distributed computing resources.
We describe the concepts behind Colmena here.

Key Concepts
------------
@@ -19,20 +18,21 @@ delegates them to the Doer.
"Thinker": Planning Agent
+++++++++++++++++++++++++

-The "Thinker" is defines the strategy for a computational campaign.
+The "Thinker" defines the strategy for a computational campaign.
The strategy is expressed by a series of "agents" that identify
which computations to run and adapt to their results.
As `demonstrated in our optimization examples <how-to#creating-a-thinker-application>`_,
complex strategies are simple if broken into many agents.


-"Doer": task server
+"Doer": Task Server
+++++++++++++++++++

-The "Doer" server accepts tasks specification, deploys tasks on remote services
-and sends results back to the Thinker agent(s).
-Doers are interfaces to workflow engies, such as `Parsl <https://parsl-project.org>`_
-or `FuncX <https://funcx.org/>`_.
+The "Doer" server accepts task specifications from the Thinker,
+deploys tasks on remote services,
+and sends results back to the Thinker.
+Doers are interfaces to workflow engines, such as `Parsl <https://parsl-project.org>`_
+or `Globus Compute <https://funcx.org/>`_.

Implementation
--------------
@@ -47,7 +47,7 @@ Client

The "Thinker" process is a Python program that runs a separate thread for each agent.

-Agents are functions that define which computations to run by sending *task request*
+Agents are functions that define which computations to run by sending *task requests*
to a task server or reading *results* from a queue.
Results are returned in the order they are completed.
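The thread-per-agent design can be sketched with only the standard library. This is a stand-in for illustration: the ``agent`` decorator and class below mimic the shape of Colmena's ``BaseThinker`` machinery but are not its real API.

```python
# Stand-in sketch of the thread-per-agent pattern: methods tagged with a
# decorator are each launched in their own thread. Colmena's BaseThinker
# provides this machinery; all names here are illustrative assumptions.
from threading import Thread


def agent(func):
    func._is_agent = True  # tag the method so the runner can find it
    return func


class ToyThinker:
    def __init__(self):
        self.log = []

    @agent
    def submitter(self):
        # a real agent would send task requests to the task server here
        self.log.append('submitted task')

    @agent
    def receiver(self):
        # a real agent would block on the result queue here
        self.log.append('received result')

    def run(self):
        # launch every tagged method in its own thread, then wait for all
        threads = [Thread(target=getattr(self, name)) for name in dir(self)
                   if getattr(getattr(self, name), '_is_agent', False)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()


thinker = ToyThinker()
thinker.run()
print(sorted(thinker.log))  # ['received result', 'submitted task']
```

Because each agent runs in its own thread, a slow agent (e.g., one blocked waiting on results) does not stop the others from submitting work.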

@@ -109,8 +109,8 @@ Communication
+++++++++++++

Task requests and results are communicated between Thinker and Doer via queues.
-Thinkers submit a task request to one queue and receive results in a second as soon as possible it completes.
-Users can also denote tasks with a "topics" if there are tasks used by different agents.
+Thinkers submit a task request to one queue and receive results in a second as soon as it completes.
+Users can also denote tasks with a "topic" to separate tasks used by different agents.

The easiest-to-configure queue, :class:`~colmena.queue.python.PipeQueues`, is based on Python's multiprocessing Pipes.
Creating it requires no other services or configuration beyond the topics:
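The request/result pattern these queues implement can be illustrated with a standard-library stand-in. Note this is not the ``PipeQueues`` API itself; the tuple protocol ``(topic, method, args)`` is an assumption made for the sketch.

```python
# Stand-in for the Thinker/Doer queue pattern using stdlib queues.
# Colmena's PipeQueues plays this role in real applications; the tuple
# protocol below (topic, method, args) is an illustrative assumption.
from queue import Queue
from threading import Thread

requests: Queue = Queue()  # "outbound": Thinker -> Doer
results: Queue = Queue()   # "inbound":  Doer -> Thinker


def doer():
    """Pull task requests, run the named method, push results back."""
    while True:
        topic, method, args = requests.get()
        if method is None:  # shutdown sentinel
            break
        results.put((topic, method(*args)))


worker = Thread(target=doer)
worker.start()

# Thinker side: submit a request tagged with a topic, then wait on the result
requests.put(('simulate', pow, (2, 10)))
topic, value = results.get()
requests.put((None, None, None))  # stop the doer
worker.join()
print(topic, value)  # simulate 1024
```

Tagging each request with a topic is what lets several agents share the same pair of queues without stealing each other's results.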
@@ -145,51 +145,52 @@ by illustrating a typical :class:`~colmena.models.Result` object.
     "method": "reduce",
     "value": 2,
     "success": true,
-    "time_created": 1593498015.132477,
-    "time_input_received": 1593498015.13357,
-    "time_compute_started": 1593498018.856764,
-    "time_result_sent": 1593498018.858268,
-    "time_result_received": 1593498018.860002,
-    "time_running": 1.8e-05,
-    "time_serialize_inputs": 4.07e-05,
-    "time_deserialize_inputs": 4.28-05,
-    "time_serialize_results": 3.32e-05,
-    "time_deserialize_results": 3.30e-05,
+    "timestamp": {
+        "created": 1593498015.132477,
+        "input_received": 1593498015.13357,
+        "compute_started": 1593498018.856764,
+        "result_sent": 1593498018.858268,
+        "result_received": 1593498018.860002
+    },
+    "time": {
+        "running": 1.8e-05,
+        "serialize_inputs": 4.07e-05,
+        "deserialize_inputs": 4.28e-05,
+        "serialize_results": 3.32e-05,
+        "deserialize_results": 3.30e-05
+    }
}
-**Launching Tasks**: A client creates a task request at ``time_created`` and adds the the input
+**Launching Tasks**: A client creates a task request at ``timestamp.created`` and adds the input
specification (``method`` and ``inputs``) to an "outbound" Redis queue. The task request is formatted
-in the JSON format defined above with only the ``method``, ``inputs`` and ``time_created`` fields
-populated. The task inputs are then serialized (``time_serialize_inputs``) and send using
-the Redis Queue to the task server.
-The serialization method is communicated along with the inputs.
+in the JSON format defined above with only the ``method``, ``inputs`` and ``timestamp.created`` fields
+populated. The task inputs are then serialized (``time.serialize_inputs`` records the execution time)
+and passed via the queue to the Task Server.
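The serialization cost captured by ``time.serialize_inputs`` can be mimicked directly. This sketch assumes a pickle-based serializer, which may differ from what Colmena actually uses.

```python
# Timing the serialization of a task's inputs, analogous to the
# ``time.serialize_inputs`` field. A pickle-based serializer is assumed
# here purely for illustration.
import pickle
from time import perf_counter

inputs = ([1, 2], 2)  # example positional arguments for a "reduce" task

start = perf_counter()
payload = pickle.dumps(inputs)
serialize_time = perf_counter() - start

assert pickle.loads(payload) == inputs  # the payload round-trips losslessly
print(f"serialized {len(payload)} bytes in {serialize_time:.2e} s")
```

For small inputs this cost is microseconds, but it grows with payload size, which is why the records above track it separately from compute time.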

-**Task Routing**: The task server reads the task request from the outbound queue at ``time_input_received``
+**Task Routing**: The task server reads the task request from the outbound queue at ``timestamp.input_received``
and submits the task to the distributed workflow engine.
The method definitions in the task server denote on which resources they can run,
and Parsl chooses when and to which resource to submit tasks.

-**Computation**: A Parsl worker starts a task at ``time_compute_started``.
-The task inputs are deserialized (``time_deserialize_inputs``),
-the requested work is executed (``time_running``),
-and the results serialized (``time_serialize_results``).
+**Computation**: A Parsl worker starts a task at ``timestamp.compute_started``.
+The task inputs are deserialized (``time.deserialize_inputs``),
+the requested work is executed (``time.running``),
+and the results serialized (``time.serialize_results``).

**Result Communication**: The task server adds the result to the task specification (``value``) and
-sends it back to the client in an "inbound" queue at (``time_result_sent``).
+sends it back to the client in an "inbound" queue at ``timestamp.result_sent``.

**Result Retrieval**: The client retrieves the message from the inbound queue.
The result is deserialized (``time.deserialize_results``) and returned
-back to the client at ``time_result_received``.
-
-The overall efficiency of the task system can be approximated by comparing the ``time_running``, which
-denotes the actual time spent executing the task on the workers, to the difference between the ``time_created``
-and ``time_returned`` (i.e., the round-trip time).
-Comparing round-trip time and ``time_running`` captures both the overhead of the system and any time
-waiting in a queue for other tasks to complete and must be viewed carefully.
-
-The overhead specific to Colmena (i.e., and not Parsl) can be measured by assessing the communication time
-for the Redis queues.
-For example, the inbound queue can be assessed by comparing the ``time_created`` and ``time_input_received``.
-The communication times for Parsl can be measured only when the queue length is negligible
-through the differences between ``time_inputs_received`` and ``time_compute_started``.
-The communication times related to serialization are also stored (e.g., ``time_serialize_result``).
+back to the client at ``timestamp.result_received``.
+
+The overall efficiency of the task system can be approximated by comparing ``time.running``, which
+denotes the actual time spent executing the task on the workers, to the difference between ``timestamp.created``
+and ``timestamp.result_received`` (i.e., the round-trip time).
+
+The overhead specific to Colmena (i.e., not Parsl) can be measured by assessing the communication time for each step.
+For example, the inbound queue can be assessed by comparing ``timestamp.created`` and ``timestamp.input_received``.
+The communication times for Parsl can be measured through the differences between
+``timestamp.input_received`` and ``timestamp.compute_started``,
+provided the task does not wait for a worker to become available.
+The communication times related to serialization are also stored (e.g., ``time.serialize_results``).
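These diagnostics can be computed directly from a result record. The sketch below uses the timestamps from the example record earlier in this section; since the page is inconsistent about the exact key names (``timestamp`` vs. ``timestamps``), treat the dictionary layout as an assumption.

```python
# Computing the efficiency diagnostics described above from the example
# result record's values. The key names are an assumption based on the
# prose in this section.
ts = {
    "created": 1593498015.132477,
    "input_received": 1593498015.13357,
    "compute_started": 1593498018.856764,
    "result_sent": 1593498018.858268,
    "result_received": 1593498018.860002,
}
time_running = 1.8e-05

round_trip = ts["result_received"] - ts["created"]        # total wall time
overhead = round_trip - time_running                      # everything but compute
inbound = ts["input_received"] - ts["created"]            # Colmena queue cost
dispatch = ts["compute_started"] - ts["input_received"]   # engine dispatch time

print(f"round trip {round_trip:.4f} s, of which {overhead:.4f} s is overhead")
```

For this record nearly all of the round-trip time is dispatch, which is why the caveat about tasks waiting for a free worker matters when interpreting these numbers.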
2 changes: 2 additions & 0 deletions docs/examples.rst
Expand Up @@ -41,6 +41,8 @@ Papers where Colmena was used
[`ChemRxiv <https://doi.org/10.26434/chemrxiv-2022-8w9ft>`_]
- Zvyagin et al. "GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics."
[`bioRxiv <https://doi.org/10.1101/2022.10.10.511571>`_]
+- Dharuman et al. "Protein Generation via Genome-scale Language Models with Bio-physical Scoring"
+  [`SC-W '23 <https://dl.acm.org/doi/abs/10.1145/3624062.3626087>`_]

Open-Source Software
--------------------
4 changes: 2 additions & 2 deletions docs/how-to.rst
@@ -40,7 +40,7 @@ Methods that used Compiled Applications
Many Colmena applications launch tasks that use software written in languages besides Python.
Colmena provides the :class:`~colmena.models.ExecutableTask` class to help integrate these tasks into a Colmena application.

-The definition of an `ExectuableTask` is split into three parts:
+The definition of an ``ExecutableTask`` is split into three parts:

1. ``__init__``: create the shell command needed to launch your code and pass it to the initializer of the base class.
2. ``preprocess``: use method arguments to create the input files, command line arguments, or stdin needed to execute
@@ -71,7 +71,7 @@ then stores the result in stdout.
Some Task Server implementations execute the pre- and post-processing steps on separate resources
from the executable task to make more efficient use of the compute resources.

-See the `MPI with RADICAL Cybertools (RCT) <#>`_ example for a demonstration.
+See the `MPI example <https://github.com/exalearn/colmena/tree/master/demo_apps/mpi-with-rct>`_.

MPI Applications
................
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -26,7 +26,7 @@ Colmena provides a few main components to enable building thinking applications:

#. An extensible base class for building thinking applications with a dataflow-like programming model
#. A "Task Server" that provides a simplified interface to HPC-ready workflow systems
-#. A high-performance queuing system communicating between tasks server(s) from thinking applications
+#. A high-performance queuing system communicating to task servers from thinking applications

The `demo applications <https://github.com/exalearn/colmena/tree/master/demo_apps/optimizer-examples>`_
illustrate how to implement different thinking applications that solve optimization problems.
4 changes: 4 additions & 0 deletions docs/installation.rst
@@ -15,3 +15,7 @@ Installation on Windows

We recommend installing the Windows Subsystem for Linux in order to use Colmena from a Windows system.

+Development Environment
+-----------------------
+
+The Anaconda environment in the ``dev`` folder provides an environment capable of running the examples.
8 changes: 4 additions & 4 deletions docs/quickstart.rst
@@ -26,7 +26,7 @@ Translating our target function, *f(x)*, and search algorithm into Python yields:
return best_to_date + random() - 0.5
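Only the final ``return`` line of the search step survives in the excerpt above. A hedged reconstruction might look like the following; the function names and the objective *f* itself are assumptions, not the quickstart's actual code.

```python
# Sketch reconstructing the quickstart's two pieces: a target function f(x)
# and a random-walk search step. Only the ``return`` line in the excerpt
# comes from the original; every name here is an illustrative assumption.
from random import random


def f(x: float) -> float:
    # a hypothetical objective to maximize
    return -(x - 1) ** 2


def next_guess(best_to_date: float) -> float:
    # propose a new point near the best found so far (line from the excerpt)
    return best_to_date + random() - 0.5


guess = next_guess(0.0)
assert -0.5 <= guess < 0.5  # random() is in [0, 1)
```

The search simply perturbs the best point seen so far by a uniform step in [-0.5, 0.5), so repeated evaluations of *f* at successive guesses drift toward the optimum.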
1. Define Communication
-------------------------------
+-----------------------

.. image:: _static/overview.svg
:height: 100
@@ -85,7 +85,7 @@ it to only use up to 4 processes on a single machine:
config = Config(executors=[HighThroughputExecutor(max_workers=4)])
-The list of methods and resources are used to define the "task server":
+Build a task server by providing a list of methods and resources:

.. code-block:: python
@@ -98,7 +98,7 @@ Colmena provides a "BaseThinker" class to create steering applications.
These applications run multiple operations (called agents) that send tasks and receive results
from the task server.

-Our thinker has two agents that each are class methods marked with the ``@agent`` decorator:
+Our example thinker has two agents, each a class method marked with the ``@agent`` decorator:

.. code-block:: python
@@ -171,7 +171,7 @@ Accordingly, we call their ``.start()`` methods to launch them.

Launch the Colmena application by running it with Python: ``python multi-agent-thinker.py``

-The application will produce a prolific about of log messages, including:
+The application will produce log messages from many components, including:

1. Log items from the thinker that mark the agent which wrote them:

