186 ecr cwl dag #192

Merged
merged 12 commits into from
Sep 10, 2024

Conversation

nikki-t
Collaborator

@nikki-t nikki-t commented Aug 27, 2024

Purpose

  • Demonstrate the use of ECR within an Airflow DAG so that the end user can use their own private ECR repository in the execution of a CWL DAG.

Proposed Changes

  • [ADD] Container image and Python script that facilitate the use of AWS ECR in CWL DAGs.

Issues

Testing

  • Deployed container image to GHCR and copied DAG definition to development Airflow installation.
  • Ran DAG: cwl_dag_ecr to confirm the DAG will pull the AWS ECR URI and complete successfully.

Logs (truncated to only include success details)

[2024-08-27, 13:38:48 UTC] {pod_manager.py:468} INFO - [base] + aws ecr get-login-password --region xxx
[2024-08-27, 13:38:49 UTC] {pod_manager.py:468} INFO - [base] + docker login --username AWS --password-stdin xxx.dkr.ecr.xxx.amazonaws.com
[2024-08-27, 13:38:49 UTC] {pod_manager.py:468} INFO - [base] WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
[2024-08-27, 13:38:49 UTC] {pod_manager.py:468} INFO - [base] Configure a credential helper to remove this warning. See
[2024-08-27, 13:38:49 UTC] {pod_manager.py:468} INFO - [base] https://docs.docker.com/engine/reference/commandline/login/#credentials-store
[2024-08-27, 13:38:49 UTC] {pod_manager.py:468} INFO - [base] Login Succeeded
[2024-08-27, 13:38:49 UTC] {pod_manager.py:468} INFO - [base] Logged into: xxx.dkr.ecr.xxx.amazonaws.com
...
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base] INFO /usr/share/cwl/venv/bin/cwl-runner 3.1.20240708091337
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base] Error: No such object: xxx.dkr.ecr.xxx.amazonaws.com/unity-nikki-1-dev-sps-busybox
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base] INFO ['docker', 'pull', 'xxx.dkr.ecr.xxx.amazonaws.com/unity-nikki-1-dev-sps-busybox']
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base] Using default tag: latest
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base] latest: Pulling from unity-nikki-1-dev-sps-busybox
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base] xxx: Pulling fs layer
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base] xxx: Verifying Checksum
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base] xxx: Download complete
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base] xxx: Pull complete
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base] Digest: sha256:xxx
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base] Status: Downloaded newer image for xxx.dkr.ecr.xxx.amazonaws.com/unity-nikki-1-dev-sps-busybox:latest
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base] xxx.dkr.ecr.xxx.amazonaws.com/unity-nikki-1-dev-sps-busybox:latest
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base] INFO [job echo_message_ecr.cwl] /scratch/z6o_f23f$ docker \
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base]     run \
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base]     -i \
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base]     --mount=type=bind,source=/scratch/z6o_f23f,target=/wyvDSa \
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base]     --mount=type=bind,source=/tmp/dpd5rnoj,target=/tmp \
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base]     --workdir=/wyvDSa \
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base]     --net=none \
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base]     --log-driver=none \
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base]     --user=0:0 \
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base]     --rm \
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base]     --cidfile=/tmp/aulc0ggb/20240827133851-877513.cid \
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base]     --env=TMPDIR=/tmp \
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base]     --env=HOME=/wyvDSa \
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base]     xxx.dkr.ecr.xxx.amazonaws.com/unity-nikki-1-dev-sps-busybox \
[2024-08-27, 13:38:51 UTC] {pod_manager.py:468} INFO - [base]     echo \
[2024-08-27, 13:38:52 UTC] {pod_manager.py:468} INFO - [base]     https://github.com/unity-sds/unity-sps-workflows/blob/186-ecr-cwl-dag/demos/echo_message_ecr.yaml > /scratch/z6o_f23f/echo_message.txt
[2024-08-27, 13:38:52 UTC] {pod_manager.py:468} INFO - [base] INFO [job echo_message_ecr.cwl] completed success
[2024-08-27, 13:38:52 UTC] {pod_manager.py:468} INFO - [base] {
[2024-08-27, 13:38:52 UTC] {pod_manager.py:468} INFO - [base]     "the_output": {
[2024-08-27, 13:38:52 UTC] {pod_manager.py:468} INFO - [base]         "location": "file:///scratch/echo_message.txt",
[2024-08-27, 13:38:52 UTC] {pod_manager.py:468} INFO - [base]         "basename": "echo_message.txt",
[2024-08-27, 13:38:52 UTC] {pod_manager.py:468} INFO - [base]         "class": "File",
[2024-08-27, 13:38:52 UTC] {pod_manager.py:468} INFO - [base]         "checksum": "sha1$78daed7b0a376de5fe9f87ef75249dc53b13f9fa",
[2024-08-27, 13:38:52 UTC] {pod_manager.py:468} INFO - [base]         "size": 98,
[2024-08-27, 13:38:52 UTC] {pod_manager.py:468} INFO - [base]         "path": "/scratch/echo_message.txt"
[2024-08-27, 13:38:52 UTC] {pod_manager.py:468} INFO - [base]     }
[2024-08-27, 13:38:53 UTC] {pod_manager.py:468} INFO - [base] INFO Final process status is success
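
The login sequence at the top of the log above (`aws ecr get-login-password` piped into `docker login`) can be sketched in Python as follows. This is only an illustration, not the code in this PR: the function name `ecr_login_commands` is hypothetical, and the sketch assumes the registry host follows the usual `<account>.dkr.ecr.<region>.amazonaws.com` layout so that the region can be read from the host name.

```python
import re

# Assumed ECR registry host layout: <account>.dkr.ecr.<region>.amazonaws.com
ECR_HOST = re.compile(
    r"^(?P<account>[^.]+)\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com$"
)


def ecr_login_commands(registry_host):
    """Return the two argument lists seen in the log for an ECR registry host.

    Hypothetical helper: the first command prints a password, the second
    reads it from stdin and logs Docker into the registry.
    """
    match = ECR_HOST.match(registry_host)
    if match is None:
        raise ValueError(f"not an ECR registry host: {registry_host!r}")
    get_password = [
        "aws", "ecr", "get-login-password", "--region", match.group("region"),
    ]
    docker_login = [
        "docker", "login", "--username", "AWS", "--password-stdin", registry_host,
    ]
    return get_password, docker_login
```

A caller would run the first command, capture its output, and feed that output to the second command on standard input, which matches the order of the two `+` lines in the log.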

@nikki-t
Collaborator Author

nikki-t commented Sep 5, 2024

Updated to centralize CWL DAG operations in a single container (ghcr.io/unity-sds/unity-sps/sps-docker-cwl) and Python script (cwl_dag.py).

Confirmed usage with ECR. DAG logs:

[2024-09-05, 20:31:44 UTC] {pod_manager.py:468} INFO - [base] Login Succeeded
...
[2024-09-05, 20:31:44 UTC] {pod_manager.py:468} INFO - [base] Logged into: xxx.dkr.ecr.us-west-2.amazonaws.com
...
[2024-09-05, 20:31:44 UTC] {pod_manager.py:468} INFO - [base] INFO ['docker', 'pull', 'xxx.dkr.ecr.us-west-2.amazonaws.com/unity-nikki-1-dev-sps-busybox']
[2024-09-05, 20:31:44 UTC] {pod_manager.py:468} INFO - [base] Using default tag: latest
[2024-09-05, 20:31:44 UTC] {pod_manager.py:468} INFO - [base] latest: Pulling from unity-nikki-1-dev-sps-busybox
...
[2024-09-05, 20:31:44 UTC] {pod_manager.py:468} INFO - [base] Status: Downloaded newer image for xxx.dkr.ecr.us-west-2.amazonaws.com/unity-nikki-1-dev-sps-busybox:latest
...
[2024-09-05, 20:31:44 UTC] {pod_manager.py:468} INFO - [base]     xxx.dkr.ecr.us-west-2.amazonaws.com/unity-nikki-1-dev-sps-busybox \
[2024-09-05, 20:31:44 UTC] {pod_manager.py:468} INFO - [base]     echo \
[2024-09-05, 20:31:45 UTC] {pod_manager.py:468} INFO - [base]     'Hello Unity' > /scratch/cl_41qdk/echo_message.txt
[2024-09-05, 20:31:45 UTC] {pod_manager.py:468} INFO - [base] INFO [job echo_message.cwl] completed success
...
[2024-09-05, 20:31:45 UTC] {pod_manager.py:468} INFO - [base] INFO Final process status is success

Confirmed usage without ECR. DAG logs:

[2024-09-05, 20:30:54 UTC] {pod_manager.py:468} INFO - [base] INFO ['docker', 'pull', 'busybox']
[2024-09-05, 20:30:55 UTC] {pod_manager.py:468} INFO - [base] Using default tag: latest
...
[2024-09-05, 20:30:55 UTC] {pod_manager.py:468} INFO - [base] Status: Downloaded newer image for busybox:latest
[2024-09-05, 20:30:55 UTC] {pod_manager.py:468} INFO - [base] docker.io/library/busybox:latest
...
[2024-09-05, 20:30:55 UTC] {pod_manager.py:468} INFO - [base]     busybox \
[2024-09-05, 20:30:55 UTC] {pod_manager.py:468} INFO - [base]     echo \
[2024-09-05, 20:30:56 UTC] {pod_manager.py:468} INFO - [base]     'Hello Unity' > /scratch/1llk9m62/echo_message.txt
[2024-09-05, 20:30:56 UTC] {pod_manager.py:468} INFO - [base] INFO [job echo_message.cwl] completed success
...
[2024-09-05, 20:30:56 UTC] {pod_manager.py:468} INFO - [base] INFO Final process status is success

Modifications included:

  • docker_cwl_entrypoint.sh now uses command line argument flags and only logs into ECR if -e contains an ECR login URI.
  • cwl_dag.py now includes a use_ecr Boolean Param; if it is set to True, the Docker image is overridden in the CWL.
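
As a rough illustration of the flag handling described in the list above, here is a Python rendering of the decision the entrypoint makes. The real entrypoint is a shell script, so this is a sketch and not its implementation. Only the `-e` flag is named in this PR; the `-w` and `-j` flags, the function names, and the host-name heuristic are assumptions made for the example.

```python
import argparse


def parse_entrypoint_flags(argv):
    """Parse entrypoint-style flags (sketch; only -e comes from this PR)."""
    parser = argparse.ArgumentParser(description="sketch of entrypoint flags")
    parser.add_argument("-w", dest="workflow", required=True)  # hypothetical flag
    parser.add_argument("-j", dest="job_args", default="{}")   # hypothetical flag
    parser.add_argument("-e", dest="ecr_login", default="")    # ECR login URI
    return parser.parse_args(argv)


def should_login_to_ecr(ecr_login):
    """Assumed heuristic: treat the value as an ECR login URI only if it
    looks like an ECR registry host."""
    return ".dkr.ecr." in ecr_login and ecr_login.endswith(".amazonaws.com")
```

With this shape, a run without `-e` (or with a value that is not an ECR host) skips the login step entirely, which corresponds to the "without ECR" log above.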

@nikki-t
Collaborator Author

nikki-t commented Sep 5, 2024

Pending questions include:

  1. Is it okay to use command line flags for arguments? Does this break any existing architecture?
  2. The reference to the Docker image needs to be updated with a new release. I am not sure how we want to handle this.

@nikki-t nikki-t marked this pull request as ready for review September 5, 2024 20:50
@LucaCinquini
Collaborator

LucaCinquini commented Sep 9, 2024

Hi @nikki-t: I think this PR is almost ready. To answer your questions:

  1. Yes
  2. We will update the basic Docker image

But can we make one further change? Can we assume (at least for now) that the URL of the Docker image to run is embedded in the CWL (yes, this exposes the AWS account number, but that seems to be OK)? And let's assume we only store the ECR base URI as a Docker variable. Consequently, can you please:

  • Comment out the part of cwl_dag.py where we override the dockerPull section (let's keep it commented out for future reference).
  • Change the Terraform deployment to store the basic ECR URI. I think this involves modules/terraform-unity-sps/airflow/main.tf and values.tmp.yaml.

Thanks!

@nikki-t
Collaborator Author

nikki-t commented Sep 10, 2024

Updated the code to use an Airflow variable to set the ECR login URL. If the use_ecr parameter is set to True, the cwl_dag.py script passes the ECR login URL to the docker_cwl_entrypoint.sh script, which then logs into ECR.

Deployed and tested:
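
A minimal sketch of that DAG-side logic, under stated assumptions: the function names and the variable name `ecr_uri` are hypothetical, and the variable lookup is injected as a callable so the example runs without Airflow installed. In a real DAG, `airflow.models.Variable.get` could presumably be passed as `get_variable`.

```python
def resolve_ecr_login(use_ecr, get_variable, variable_name="ecr_uri"):
    """Return the ECR login URL when use_ecr is true, else an empty string.

    get_variable is any callable mapping a variable name to its value;
    variable_name is a hypothetical name, not the one used in this PR.
    """
    if not use_ecr:
        return ""
    return get_variable(variable_name)


def entrypoint_arguments(base_args, ecr_login):
    """Append the -e flag only when an ECR login URL is present."""
    args = list(base_args)  # copy so the caller's list is not modified
    if ecr_login:
        args += ["-e", ecr_login]
    return args
```

For example, `entrypoint_arguments(base, resolve_ecr_login(True, lookup))` adds `-e` followed by the stored login URL, while passing `False` leaves the base arguments untouched, which mirrors cases A and B in the logs.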

A) Setting use_ecr to True:

[2024-09-10, 13:56:57 UTC] {cwl_dag.py:151} INFO - Use ECR: True
[2024-09-10, 13:56:57 UTC] {cwl_dag.py:160} INFO - ECR login: xxx.dkr.ecr.us-west-2.amazonaws.com
[2024-09-10, 13:56:57 UTC] {cwl_dag.py:162} INFO - CWL DAG arguments: {'message': 'Hello Unity'}
...
[2024-09-10, 13:59:54 UTC] {pod_manager.py:468} INFO - [base] + docker login --username AWS --password-stdin xxx.dkr.ecr.us-west-2.amazonaws.com
[2024-09-10, 13:59:54 UTC] {pod_manager.py:468} INFO - [base] + aws ecr get-login-password --region us-west-2
...
[2024-09-10, 13:59:54 UTC] {pod_manager.py:468} INFO - [base] Login Succeeded
[2024-09-10, 13:59:54 UTC] {pod_manager.py:468} INFO - [base] Logged into: xxx.dkr.ecr.us-west-2.amazonaws.com
[2024-09-10, 13:59:54 UTC] {pod_manager.py:468} INFO - [base] + echo 'Logged into: xxx.dkr.ecr.us-west-2.amazonaws.com'
...
[2024-09-10, 13:59:55 UTC] {pod_manager.py:468} INFO - [base] Error: No such object: busybox
[2024-09-10, 13:59:55 UTC] {pod_manager.py:468} INFO - [base] INFO ['docker', 'pull', 'busybox']
[2024-09-10, 13:59:56 UTC] {pod_manager.py:468} INFO - [base] Using default tag: latest
[2024-09-10, 13:59:56 UTC] {pod_manager.py:468} INFO - [base] latest: Pulling from library/busybox
[2024-09-10, 13:59:56 UTC] {pod_manager.py:468} INFO - [base] 3d1a87f2317d: Pulling fs layer
[2024-09-10, 13:59:56 UTC] {pod_manager.py:468} INFO - [base] 3d1a87f2317d: Verifying Checksum
[2024-09-10, 13:59:57 UTC] {pod_manager.py:468} INFO - [base] 3d1a87f2317d: Download complete
[2024-09-10, 13:59:57 UTC] {pod_manager.py:468} INFO - [base] 3d1a87f2317d: Pull complete
[2024-09-10, 13:59:57 UTC] {pod_manager.py:468} INFO - [base] Digest: sha256:34b191d63fbc93e25e275bfccf1b5365664e5ac28f06d974e8d50090fbb49f41
[2024-09-10, 13:59:57 UTC] {pod_manager.py:468} INFO - [base] Status: Downloaded newer image for busybox:latest
[2024-09-10, 13:59:57 UTC] {pod_manager.py:468} INFO - [base] docker.io/library/busybox:latest
...
[2024-09-10, 13:59:57 UTC] {pod_manager.py:468} INFO - [base]     busybox \
[2024-09-10, 13:59:57 UTC] {pod_manager.py:468} INFO - [base]     echo \
[2024-09-10, 13:59:58 UTC] {pod_manager.py:468} INFO - [base]     'Hello Unity' > /scratch/ji_rekxi/echo_message.txt
[2024-09-10, 13:59:58 UTC] {pod_manager.py:468} INFO - [base] INFO [job echo_message.cwl] completed success
...
[2024-09-10, 13:59:58 UTC] {pod_manager.py:468} INFO - [base] INFO Final process status is success

B) Setting use_ecr to False (default behavior):

[2024-09-10, 13:56:52 UTC] {cwl_dag.py:151} INFO - Use ECR: False
[2024-09-10, 13:56:52 UTC] {cwl_dag.py:162} INFO - CWL DAG arguments: {'message': 'Hello Unity'}
...
[2024-09-10, 13:59:45 UTC] {pod_manager.py:468} INFO - [base] INFO /usr/share/cwl/venv/bin/cwl-runner 3.1.20240708091337
[2024-09-10, 13:59:45 UTC] {pod_manager.py:468} INFO - [base] Error: No such object: busybox
[2024-09-10, 13:59:45 UTC] {pod_manager.py:468} INFO - [base] INFO ['docker', 'pull', 'busybox']
[2024-09-10, 13:59:46 UTC] {pod_manager.py:468} INFO - [base] Using default tag: latest
[2024-09-10, 13:59:46 UTC] {pod_manager.py:468} INFO - [base] latest: Pulling from library/busybox
[2024-09-10, 13:59:47 UTC] {pod_manager.py:468} INFO - [base] 3d1a87f2317d: Pulling fs layer
[2024-09-10, 13:59:47 UTC] {pod_manager.py:468} INFO - [base] 3d1a87f2317d: Download complete
[2024-09-10, 13:59:47 UTC] {pod_manager.py:468} INFO - [base] 3d1a87f2317d: Pull complete
[2024-09-10, 13:59:47 UTC] {pod_manager.py:468} INFO - [base] Digest: sha256:34b191d63fbc93e25e275bfccf1b5365664e5ac28f06d974e8d50090fbb49f41
[2024-09-10, 13:59:47 UTC] {pod_manager.py:468} INFO - [base] Status: Downloaded newer image for busybox:latest
[2024-09-10, 13:59:47 UTC] {pod_manager.py:468} INFO - [base] docker.io/library/busybox:latest
...
[2024-09-10, 13:59:47 UTC] {pod_manager.py:468} INFO - [base]     busybox \
[2024-09-10, 13:59:47 UTC] {pod_manager.py:468} INFO - [base]     echo \
[2024-09-10, 13:59:48 UTC] {pod_manager.py:468} INFO - [base]     'Hello Unity' > /scratch/ak8glaqi/echo_message.txt
[2024-09-10, 13:59:48 UTC] {pod_manager.py:468} INFO - [base] INFO [job echo_message.cwl] completed success
...
[2024-09-10, 13:59:48 UTC] {pod_manager.py:468} INFO - [base] INFO Final process status is success

Collaborator

@LucaCinquini LucaCinquini left a comment

Tested the change on a new deployment and logging into ECR worked.

@LucaCinquini LucaCinquini merged commit 17cee6b into develop Sep 10, 2024
2 checks passed
@LucaCinquini LucaCinquini deleted the 186-ecr-cwl-dag branch September 10, 2024 16:44