Redeploy ClimSim with only 50 workers, due to caching errors (#38)
* note climsim rate limit issue

* rewrite label fetching to support push events

* use yaml pipe for python script
cisaacstern committed Aug 2, 2023
1 parent 6b98f26 · commit 25ff6e5
Showing 2 changed files with 29 additions and 9 deletions.
.github/workflows/deploy.yaml: 26 additions & 9 deletions
@@ -29,15 +29,32 @@ jobs:
             echo prune=false >> $GITHUB_ENV
           fi
       - name: "Set max_num_workers based on PR label if present"
-        run: >
-          echo max_num_workers=`
-          echo '${{ toJSON(github.event.pull_request.labels.*.name) }}' |
-          python -c "import json, sys;
-          labels = json.loads(sys.stdin.read());
-          max_num_workers = [l.split(':')[-1] for l in labels if l.startswith('max_num_workers:')];
-          print((int(max_num_workers[0]) if max_num_workers else 1000));
-          "
-          ` >> $GITHUB_ENV
+        # This is a little complicated, but the only way I know to retrieve labels on both
+        # `pull_request` *and* `push` events (and we want the ability to do so in both cases).
+        # Adapted from the following (note question in comment there re: external prs):
+        # https://github.com/pangeo-forge/deploy-recipe-action/blob/256da2916b5f17f358c5e5b0442458645cadb9f0/action/deploy_recipe.py#L34-L68
+        shell: python3 {0}
+        run: |
+          import json
+          import os
+          import urllib.request
+
+          repository = os.environ["GITHUB_REPOSITORY"]
+          api_url = os.environ["GITHUB_API_URL"]
+          head_ref = os.environ["GITHUB_HEAD_REF"]
+          sha = os.environ["GITHUB_SHA"]
+
+          commit_sha = head_ref if head_ref else sha
+
+          pulls_url = "/".join([api_url, "repos", repository, "commits", commit_sha, "pulls"])
+          pulls_txt = urllib.request.urlopen(pulls_url).read()
+          pulls_json = json.loads(pulls_txt)
+
+          labels = [label["name"] for label in pulls_json[0]["labels"]]
+          max_num_workers = [l.split(":")[-1] for l in labels if l.startswith("max_num_workers:")]
+          max_num_workers = (max_num_workers[0] if max_num_workers else "1000")
+          with open(os.environ["GITHUB_ENV"], mode="a") as f:
+              f.write(f"max_num_workers={max_num_workers}")
       - name: "Deploy recipes"
         uses: "pangeo-forge/deploy-recipe-action@v0.1"
         with:
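For reference, here is a minimal standalone sketch of the label-parsing logic from the step above. The function name and the sample label lists are illustrative only, not part of the workflow:

def resolve_max_num_workers(labels: list[str], default: str = "1000") -> str:
    """Return the value of the first `max_num_workers:<N>` label, else the default."""
    matches = [l.split(":")[-1] for l in labels if l.startswith("max_num_workers:")]
    return matches[0] if matches else default

# A PR labeled `max_num_workers:50` (as in this redeploy) caps workers at 50;
# an unlabeled PR falls back to the default of 1000.
assert resolve_max_num_workers(["max_num_workers:50", "bug"]) == "50"
assert resolve_max_num_workers(["enhancement"]) == "1000"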
feedstock/climsim.py: 3 additions & 0 deletions
@@ -146,6 +146,9 @@ class OpenAndPreprocess(beam.PTransform):
     def expand(self, pcoll: beam.PCollection) -> beam.PCollection:
         return (
             pcoll
+            # FIXME: rate limiting on caching step is probably required to get this to run
+            # end-to-end, without globally capping workers at a low value for all stages,
+            # see discussion in: https://github.com/leap-stc/data-management/issues/36.
             | OpenURLWithFSSpec()
             | OpenWithXarray(
                 # FIXME: Get files to open without `copy_to_local=True`
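As the FIXME above notes, a longer-term fix would be to rate limit the caching step itself rather than globally capping workers for all stages. Here is a minimal sketch of what such client-side throttling could look like; the RateLimiter class, the throttled_fetch helper, and the 2 requests/second figure are all hypothetical, not pangeo-forge-recipes API:

import threading
import time

class RateLimiter:
    """Allow at most `rate` calls per second across threads (illustrative only)."""

    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_allowed = 0.0

    def wait(self) -> None:
        # Reserve the next available time slot under the lock, then sleep outside it.
        with self.lock:
            now = time.monotonic()
            delay = max(0.0, self.next_allowed - now)
            self.next_allowed = max(now, self.next_allowed) + self.min_interval
        time.sleep(delay)

# Hypothetical usage: throttle each cache fetch to ~2 requests/second, so the
# source server is protected without lowering max_num_workers for every stage.
limiter = RateLimiter(rate=2.0)

def throttled_fetch(url: str) -> bytes:
    limiter.wait()
    # ... fetch and cache the file here (e.g. with fsspec) ...
    return b""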
