
EASI Cookbook

A collection of notebooks and recipes that demonstrate larger data processing workflows in EASI.

Use the code in this repository as examples and guides to be adapted to your workflows.

Overview

We regularly see users struggle to move efficiently from a development notebook (that works on a small area) to a workflow that scales to a larger set of data or runs operationally. The main challenges we see are:

  • Efficient use of dask parameters tuned to the workflow (see the sketch below)
  • Resilient and cost-effective workflows
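
As a hedged illustration of tuning dask to a workflow, the sketch below sizes the cluster to the work and aligns chunk sizes with the data. The product name, extents and chunk/worker values are placeholders, not EASI defaults; EASI deployments may provision clusters via dask-gateway rather than LocalCluster.

# A minimal sketch of tuning dask for an ODC load. Chunk sizes and
# worker counts here are illustrative placeholders, not recommended values.
import datacube
from dask.distributed import Client, LocalCluster

# Size the cluster to the work; swap in dask-gateway on EASI if available.
cluster = LocalCluster(n_workers=4, threads_per_worker=2, memory_limit="8GiB")
client = Client(cluster)

dc = datacube.Datacube(app="tuning-example")
data = dc.load(
    product="example_product",                      # placeholder product name
    x=(148.0, 149.0), y=(-36.0, -35.0),
    time=("2021-01-01", "2021-12-31"),
    dask_chunks={"time": 1, "x": 2048, "y": 2048},  # align with native tiling
)
result = data.mean(dim="time").compute()            # work runs on the cluster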

Common patterns for your package and code

Common patterns that occur in each data processing workflow:

  1. Get work
    • Space, time, product and processing parameters
    • Select batching and tiling options
    • Output is a list of work to do (one item per batch)
  2. Do work
    • Launch processes, each of which handles one batch (see the sketch after this list)
    • Select optional dask configuration
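
A minimal sketch of this two-step pattern, assuming hypothetical helper names (get_work, do_work) and a simple tile-based batching scheme; adapt both steps to your own products and parameters.

# A hedged sketch of the "get work" / "do work" pattern. All names
# (get_work, do_work) and extents are hypothetical placeholders.
from itertools import product

def get_work(x_range, y_range, tile_size=1.0):
    """Get work: turn space/time/product parameters into a list of batches."""
    nx = int((x_range[1] - x_range[0]) / tile_size)
    ny = int((y_range[1] - y_range[0]) / tile_size)
    xs = [x_range[0] + i * tile_size for i in range(nx)]
    ys = [y_range[0] + j * tile_size for j in range(ny)]
    # Each batch is one tile; real workflows may group several tiles per batch.
    return [{"x": (x, x + tile_size), "y": (y, y + tile_size)} for x, y in product(xs, ys)]

def do_work(batch):
    """Do work: process one batch, optionally on its own dask cluster."""
    print(f"processing tile x={batch['x']} y={batch['y']}")
    # ... load data for the tile, compute, write outputs ...

batches = get_work(x_range=(148.0, 150.0), y_range=(-36.0, -34.0))
for batch in batches:      # or submit each batch to a separate worker
    do_work(batch)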

Common patterns for large workflows

There are three main patterns that can be explored. The best solution will likely depend on your workflow and requirements.

Jupyter Lab

Launch one dask cluster per process.
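
For example, each process can create its own cluster for the life of a batch and close it afterwards. A minimal sketch using dask.distributed's LocalCluster; EASI deployments may instead provision clusters via dask-gateway, and run_batch is a hypothetical entry point.

# Sketch: one short-lived dask cluster per process.
from dask.distributed import Client, LocalCluster

def run_batch(batch):
    """Hypothetical per-process entry point: the cluster lives only for this batch."""
    with LocalCluster(n_workers=2, threads_per_worker=2) as cluster, Client(cluster) as client:
        print(client.dashboard_link)
        # ... dask-backed load and compute for this batch ...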

Grid workflow

Tile the job spatially using ODC (Open Data Cube) grid-workflow code.
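
A minimal sketch of the classic ODC GridWorkflow API (datacube.api.GridWorkflow); the product name, CRS and grid parameters below are placeholders to adapt, not EASI defaults.

# Sketch of tiling a job with ODC's GridWorkflow.
import datacube
from datacube.api import GridWorkflow
from datacube.model import GridSpec
from datacube.utils.geometry import CRS

dc = datacube.Datacube(app="grid-example")
grid = GridSpec(crs=CRS("EPSG:3577"), tile_size=(100000.0, 100000.0), resolution=(-30, 30))
gw = GridWorkflow(dc.index, grid_spec=grid)

# List the tiles (cells) that intersect the query; each cell is one unit of work.
cells = gw.list_cells(product="example_product", time=("2021-01", "2021-06"))
for cell_index, tile in cells.items():
    data = GridWorkflow.load(tile)   # lazy load of one tile
    # ... compute and write outputs for this tile ...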

Argo

Group work into batches, each of which is run by a single Argo worker.

  • Can control the number of simultaneous Argo workers
  • If an Argo worker dies, the batch will be restarted, so ensure your code can skip work that was previously done (see the sketch after this list).
  • Each Argo worker can itself launch a dask cluster and a grid workflow, or any complex processing task.
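
For restart safety, the sketch below shows one way to make a batch idempotent: publish each output atomically and skip tiles whose outputs already exist. The paths and helper names (OUTPUT_DIR, process_tile, run_batch) are hypothetical placeholders.

# Sketch of a restart-safe batch step.
from pathlib import Path

OUTPUT_DIR = Path("/tmp/outputs")  # placeholder; often an S3 prefix in practice

def process_tile(tile_id):
    """Hypothetical per-tile computation that writes a single output file."""
    out = OUTPUT_DIR / f"{tile_id}.tif"
    tmp = out.with_name(out.name + ".partial")
    tmp.touch()        # stand-in for the real load/compute/write
    tmp.rename(out)    # atomic publish: the output only exists once complete

def run_batch(tile_ids):
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    for tile_id in tile_ids:
        if (OUTPUT_DIR / f"{tile_id}.tif").exists():
            continue   # done by a previous attempt; skip on restart
        process_tile(tile_id)

run_batch(["x10y22", "x10y23"])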

Contributing

Contributions are welcome.

A pre-commit hook is provided in bin/. For each committed notebook it will:

  1. Attempt to strip any AWS secrets.
  2. Render an HTML copy of the notebook (with outputs) into html/.
  3. Strip outputs from the notebook to reduce the size of the repository.

The apply_hooks.sh script creates a symlink to bin/pre-commit.

# Run this in your local repository
sh bin/apply_hooks.sh

For contributors:

  1. Apply the pre-commit hook.
  2. Run each updated notebook to populate the figures, tables and other outputs as you want them.
  3. Add a link into html/readme.md for each new notebook.
  4. Add an item to whats_new.md.
  5. git add and git commit.
  6. If everything looks ok, git push to your fork of this repository and create a pull request.
