ohsu-comp-bio/awesome-tes
Awesome GA4GH TES




Overview

The Global Alliance for Genomics and Health (GA4GH) is an international coalition formed to enable the responsible, voluntary, and secure sharing of genomic and health-related data. This awesome list collects resources, projects, tools, and standards from the GA4GH ecosystem that support the mission of enabling responsible data sharing.

Paper

The following resources are adapted from *The GA4GH Task Execution API: Enabling Easy Multi Cloud Task Execution*:

```bibtex
@article{kanitz2024ga4gh,
  title={The GA4GH Task Execution API: Enabling Easy Multi Cloud Task Execution},
  author={Kanitz, Alexander and McLoughlin, Matthew H and Beckman, Liam and Malladi, Venkat S and Ellrott, Kyle},
  journal={Computing in Science \& Engineering},
  year={2024},
  publisher={IEEE}
}
```

TES Ecosystem

A listing of available server, proxy, and client implementations that utilize the TES API.

| Type | Project | Description | Source |
|------|---------|-------------|--------|
| API | TES | OpenAPI definition of the specification | GitHub |
| API | TES | Conformance test runner | GitHub |
| Server | Funnel | TES server implementation for HPC/HTC systems, including AWS Batch, Google Cloud, Kubernetes, Slurm, GridEngine, and HTCondor | GitHub |
| Server | Pulsar | TES server implementation for the Galaxy/Pulsar federated distributed network | Docs |
| Server | TES-Azure | TES server implementation for Microsoft Azure | GitHub |
| Server | TESK | TES server implementation for Kubernetes/native cloud systems | GitHub |
| Proxy | proTES | Proxy service for injecting middleware into GA4GH TES requests | GitHub |
| Client | Cromwell | Workflow management system for executing workflows composed in the Workflow Description Language (WDL) DSL | Docs |
| Client | cwl-tes | Workflow management system for executing workflows composed in the Common Workflow Language (CWL) DSL | GitHub |
| Client | ELIXIR Cloud Components | Web Component library for interacting with TES services (and other GA4GH APIs) | Site |
| Client | Nextflow | Workflow management system for executing workflows composed in the Nextflow DSL | Site |
| Client | py-tes | Python client library for interacting with TES services | GitHub |
| Client | Snakemake | Workflow management system for executing workflows composed in the Snakemake DSL | Site |
| Client | Toil | Workflow management system for executing workflows composed in the Toil and CWL DSLs | Docs |

Design

Use Cases

Common TES use cases. The TES API wraps compute environments, providing a standard way of executing tasks. Researchers write and package their tasks and data in a workflow domain-specific language (DSL), then hand orchestration of the tasks over to a workflow management system, which can use TES clients to distribute tasks across different environments.

Alternatively, users can submit individual tasks to TES servers directly via command-line (CLI) or graphical user (GUI) interfaces. TES thus makes it easier for researchers to use a variety of compute environments seamlessly, and applications can support new compute environments by integrating with the TES API rather than developing a unique connection for each environment.
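As a concrete illustration of direct submission, the sketch below builds a minimal task document with Python's standard library. The field names follow the TES v1 task schema; the submission endpoint (`POST /ga4gh/tes/v1/tasks`) is defined by the spec, but the server URL shown in the comment is a placeholder.

```python
import json

# A minimal TES task document (field names follow the TES v1 schema).
task = {
    "name": "hello-tes",
    "executors": [
        {
            "image": "alpine",
            "command": ["echo", "hello from TES"],
        }
    ],
}

body = json.dumps(task)

# Submitting directly, without a workflow engine, is a plain HTTP POST:
#
#   POST {server}/ga4gh/tes/v1/tasks
#   Content-Type: application/json
#
# e.g. via curl or urllib.request; the response body contains the new task's "id".
print(body)
```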

Architecture

TES Execution Architecture. An outline of the separate layers found in current TES service implementations. The client talks to a server, which is responsible for allocating a worker node on HPC/HTC or cloud infrastructure. The TES worker is responsible for transferring inputs, running user code, capturing logs, and storing outputs.
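From the client's side, this architecture reduces to submitting a task and polling it until it reaches a terminal state. The sketch below shows that polling loop; `get_task_state` is a stub standing in for the TES "get task" call (`GET /ga4gh/tes/v1/tasks/{id}`), and the simulated state sequence is illustrative only. The state names themselves (`QUEUED`, `INITIALIZING`, `RUNNING`, `COMPLETE`, etc.) come from the TES specification.

```python
import itertools
import time

# Simulated server responses: a real server would report these states over HTTP.
_states = itertools.chain(
    ["QUEUED", "INITIALIZING", "RUNNING"], itertools.repeat("COMPLETE")
)

def get_task_state(task_id):
    """Stub for GET /ga4gh/tes/v1/tasks/{id}; a real client would issue an HTTP GET."""
    return next(_states)

# Terminal states defined by the TES spec: once reached, the task will not change.
TERMINAL = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}

def wait_for_task(task_id, poll_seconds=0.0):
    """Poll until the task reaches a terminal TES state, then return it."""
    while True:
        state = get_task_state(task_id)
        if state in TERMINAL:
            return state
        time.sleep(poll_seconds)

final = wait_for_task("task-1234")
print(final)  # COMPLETE
```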

Example TES Packet

An example TES task demonstrating the use of inputs, outputs, and logging.

Input/Output Definitions

```json
"inputs": [
  {
    "name": "input-genome-data",
    "url": "gs://genomics-bucket/input-data/genome-data.bam",
    "path": "/data/genome-data.bam",
    "type": "FILE"
  },
  {
    "name": "reference-genome",
    "url": "gs://genomics-bucket/reference/human-reference.fa",
    "path": "/data/human-reference.fa",
    "type": "FILE"
  }
],
"outputs": [
  {
    "name": "output-processed-data",
    "url": "gs://genomics-bucket/processed-data/processed-output.bam",
    "path": "/output/processed-output.bam",
    "type": "FILE"
  },
  {
    "name": "log-file",
    "url": "gs://genomics-bucket/logs/task-log.txt",
    "path": "/output/task-log.txt",
    "type": "FILE"
  }
],
```

Resource Definitions

```json
"resources": {
  "cpu_cores": 8,
  "ram_gb": 32,
  "disk_gb": 100,
  "preemptible": true,
  "zones": ["us-west1-a", "us-west1-b"],
  "backend_parameters": {
    "VmSize": "Standard_D64_v3"
  }
},
"volumes": ["/mnt/workdir"],
```
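A TES server translates these portable resource requests into whatever its backend understands. As an illustration only (real backends differ in their mappings), the sketch below converts the resource block above into a Slurm `sbatch` request using the standard `--cpus-per-task`, `--mem`, and `--tmp` options:

```python
# The resource block from the example above.
resources = {"cpu_cores": 8, "ram_gb": 32, "disk_gb": 100}

# Hypothetical mapping to Slurm flags; actual deployments may map differently.
sbatch = [
    "sbatch",
    f"--cpus-per-task={resources['cpu_cores']}",  # CPU cores per task
    f"--mem={resources['ram_gb']}G",              # RAM per node
    f"--tmp={resources['disk_gb']}G",             # minimum local scratch disk
]
print(" ".join(sbatch))
```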

Executor Command Lines

```json
"executors": [
  {
    "image": "bioinformatics/pipeline",
    "command": [
      "bash",
      "-c",
      "/tools/process-genome.sh /data/genome-data.bam /data/human-reference.fa /output/processed-output.bam"
    ],
    "stdout": "/output/task-log.txt",
    "stderr": "/output/task-error-log.txt",
    "workdir": "/mnt/workdir",
    "env": {
      "GENOME_ENV": "production",
      "MAX_THREADS": "8"
    }
  }
],
```

Note that `bash -c` executes only the single string that follows it; any further list elements would be bound to the positional parameters `$0`, `$1`, … rather than passed to the script, so the script's arguments must be part of that one string.
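To show how the executor fields fit together, the sketch below translates an executor entry into a container invocation. This is illustrative only: real TES workers handle I/O staging, logging, and volumes in backend-specific ways, and this mapping to `docker run` flags is an assumption, not how any particular server is implemented.

```python
# The executor entry from the example above (stdout/stderr capture omitted).
executor = {
    "image": "bioinformatics/pipeline",
    "command": [
        "bash",
        "-c",
        "/tools/process-genome.sh /data/genome-data.bam /data/human-reference.fa /output/processed-output.bam",
    ],
    "workdir": "/mnt/workdir",
    "env": {"GENOME_ENV": "production", "MAX_THREADS": "8"},
}

def to_docker_argv(ex):
    """Build an illustrative `docker run` argv from a TES executor entry."""
    argv = ["docker", "run", "--rm"]
    if "workdir" in ex:
        argv += ["--workdir", ex["workdir"]]
    for key, value in sorted(ex.get("env", {}).items()):
        argv += ["--env", f"{key}={value}"]
    argv.append(ex["image"])     # image name comes after the options...
    argv += ex["command"]        # ...followed by the command to run inside it
    return argv

print(" ".join(to_docker_argv(executor)))
```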

User-Provided Tags

```json
"tags": {
  "department": "bioinformatics",
  "project": "genome-analysis"
},
```

Logs Added by Server

```json
"logs": [
  {
    "start_time": "2023-12-25T00:00:00+00:00",
    "end_time": "2023-12-25T12:12:12+00:00",
    "logs": [
      {
        "start_time": "2023-12-25T00:00:01+00:00",
        "end_time": "2023-12-25T12:12:12+00:00",
        "exit_code": 0
      }
    ]
  }
]
```
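Because the server records RFC 3339 timestamps, clients can compute wall-clock duration directly. For example, the outer log entry above spans:

```python
from datetime import datetime

# Timestamps from the log entry above; fromisoformat handles the "+00:00" offset.
start = datetime.fromisoformat("2023-12-25T00:00:00+00:00")
end = datetime.fromisoformat("2023-12-25T12:12:12+00:00")

elapsed = end - start
print(int(elapsed.total_seconds()))  # 43932 seconds, i.e. 12h 12m 12s
```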
Full Packet Example

```json
{
  "inputs": [
    {
      "name": "input-genome-data",
      "url": "gs://genomics-bucket/input-data/genome-data.bam",
      "path": "/data/genome-data.bam",
      "type": "FILE"
    },
    {
      "name": "reference-genome",
      "url": "gs://genomics-bucket/reference/human-reference.fa",
      "path": "/data/human-reference.fa",
      "type": "FILE"
    }
  ],
  "outputs": [
    {
      "name": "output-processed-data",
      "url": "gs://genomics-bucket/processed-data/processed-output.bam",
      "path": "/output/processed-output.bam",
      "type": "FILE"
    },
    {
      "name": "log-file",
      "url": "gs://genomics-bucket/logs/task-log.txt",
      "path": "/output/task-log.txt",
      "type": "FILE"
    }
  ],
  "resources": {
    "cpu_cores": 8,
    "ram_gb": 32,
    "disk_gb": 100,
    "preemptible": true,
    "zones": ["us-west1-a", "us-west1-b"],
    "backend_parameters": {
      "VmSize": "Standard_D64_v3"
    }
  },
  "volumes": ["/mnt/workdir"],
  "executors": [
    {
      "image": "bioinformatics/pipeline",
      "command": [
        "bash",
        "-c",
        "/tools/process-genome.sh /data/genome-data.bam /data/human-reference.fa /output/processed-output.bam"
      ],
      "stdout": "/output/task-log.txt",
      "stderr": "/output/task-error-log.txt",
      "workdir": "/mnt/workdir",
      "env": {
        "GENOME_ENV": "production",
        "MAX_THREADS": "8"
      }
    }
  ],
  "tags": {
    "department": "bioinformatics",
    "project": "genome-analysis"
  },
  "logs": [
    {
      "start_time": "2023-12-25T00:00:00+00:00",
      "end_time": "2023-12-25T12:12:12+00:00",
      "logs": [
        {
          "start_time": "2023-12-25T00:00:01+00:00",
          "end_time": "2023-12-25T12:12:12+00:00",
          "exit_code": 0
        }
      ]
    }
  ]
}
```

Additional Resources

  • GA4GH: Global Alliance for Genomics and Health
  • DRS: Data Repository Service
  • TRS: Tool Registry Service
  • TES: Task Execution Service
  • WES: Workflow Execution Service
  • Implementations: Implementations of GA4GH Products

Contributing

If you're working with TES or would like to add additional programs here, please reach out or open a PR or issue. We'd love to hear about it!


Thanks to @dec0dOS for the [amazing-github-template](https://github.com/dec0dOS/amazing-github-template).