[New hubs] Dandi #3879

Merged: 10 commits, Apr 2, 2024
18 changes: 17 additions & 1 deletion config/clusters/dandi/cluster.yaml
@@ -11,4 +11,20 @@ support:
   helm_chart_values_files:
     - support.values.yaml
     - enc-support.secret.values.yaml
-hubs: []
+hubs:
+  - name: staging
+    display_name: "MIT DANDI (staging)"
+    domain: staging.dandi.2i2c.cloud
+    helm_chart: basehub
+    helm_chart_values_files:
+      - common.values.yaml
+      - staging.values.yaml
+      - enc-staging.secret.values.yaml
+  - name: prod
+    display_name: "MIT DANDI"
+    domain: dandi.2i2c.cloud
+    helm_chart: basehub
+    helm_chart_values_files:
+      - common.values.yaml
+      - prod.values.yaml
+      - enc-prod.secret.values.yaml
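Each hub lists several values files. Assuming they are handed to Helm in the listed order, Helm's usual rule applies: later files override earlier ones, mappings are merged key by key, and lists are replaced wholesale. The sketch below imitates that rule in plain Python so the layering of `common.values.yaml` and `prod.values.yaml` is easier to follow. `merge_values` is a hypothetical helper written for illustration, not part of the deployer or of Helm.

```python
from copy import deepcopy


def merge_values(base, override):
    """Return a new dict with `override` layered on top of `base`.

    Nested mappings are merged key by key; any other value (including
    lists) in `override` replaces the one in `base`.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Fragments copied from common.values.yaml and prod.values.yaml in this PR.
common = {"jupyterhub": {"custom": {"homepage": {"templateVars": {"org": {
    "url": "https://dandiarchive.org/"}}}}}}
prod = {"jupyterhub": {"custom": {"homepage": {"templateVars": {"org": {
    "name": "MIT DANDI"}}}}}}

merged = merge_values(common, prod)
org = merged["jupyterhub"]["custom"]["homepage"]["templateVars"]["org"]
print(org)
```

Under these assumptions the `org` block seen by the prod hub carries both the shared `url` and the prod-only `name`.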
181 changes: 181 additions & 0 deletions config/clusters/dandi/common.values.yaml
@@ -0,0 +1,181 @@
nfs:
  enabled: true
  pv:
    enabled: true
    # from https://docs.aws.amazon.com/efs/latest/ug/mounting-fs-nfs-mount-settings.html
    mountOptions:
      - rsize=1048576
      - wsize=1048576
      - timeo=600
      - soft # We pick soft over hard, so NFS lockups don't lead to hung processes
      - retrans=2
      - noresvport
    serverIP: fs-0bf8c8fce5ca8695f.efs.us-east-2.amazonaws.com
    baseShareName: /
jupyterhub:
  hub:
    config:
      JupyterHub:
        authenticator_class: github
      Authenticator:
        enable_auth_state: true
        admin_users:
          - kabilar
          - aaronkanzer
          - asmacdo
          - satra
          - yarikoptic
          - waxlamp
  custom:
    2i2c:
      add_staff_user_ids_to_admin_users: true
      add_staff_user_ids_of_type: "github"
    jupyterhubConfigurator:
      enabled: false
    homepage:
      templateVars:
        org:
          logo_url: "https://raw.githubusercontent.com/dandi/artwork/3f287d3ae53154a66f8b50711549740719a23fdb/pics/dandi-logo.svg"
          url: "https://dandiarchive.org/"
        designed_by:
          name: 2i2c
          url: https://2i2c.org
        operated_by:
          name: 2i2c
          url: https://2i2c.org
        funded_by:
          name: "DANDI (MIT Brain)"
          url: "https://dandiarchive.org/"
  scheduling:
    userScheduler:
      enabled: true
  singleuser:
    profileList:
      - display_name: "DANDI (CPU)"
        description: "Default DANDI image with JupyterLab"
        default: true
        kubespawner_override:
          image: dandiarchive/dandihub:latest
          image_pull_policy: Always
          default_url: /lab
        profile_options: &profile_options_cpu
          requests:
            display_name: Resource Allocation
            choices:
              mem_3_7:
                display_name: 3.7 GB RAM, up to 3.7 CPUs
                kubespawner_override:
                  mem_guarantee: 3982682624
                  mem_limit: 3982682624
                  cpu_guarantee: 0.46875
                  cpu_limit: 3.75
                  node_selector:
                    node.kubernetes.io/instance-type: r5.xlarge
                default: true
              mem_7_4:
                display_name: 7.4 GB RAM, up to 3.7 CPUs
                kubespawner_override:
                  mem_guarantee: 7965365248
                  mem_limit: 7965365248
                  cpu_guarantee: 0.9375
                  cpu_limit: 3.75
                  node_selector:
                    node.kubernetes.io/instance-type: r5.xlarge
              mem_14_8:
                display_name: 14.8 GB RAM, up to 3.7 CPUs
                kubespawner_override:
                  mem_guarantee: 15930730496
                  mem_limit: 15930730496
                  cpu_guarantee: 1.875
                  cpu_limit: 3.75
                  node_selector:
                    node.kubernetes.io/instance-type: r5.xlarge
              mem_29_7:
                display_name: 29.7 GB RAM, up to 3.7 CPUs
                kubespawner_override:
                  mem_guarantee: 31861460992
                  mem_limit: 31861460992
                  cpu_guarantee: 3.75
                  cpu_limit: 3.75
                  node_selector:
                    node.kubernetes.io/instance-type: r5.xlarge
              mem_60_6:
                display_name: 60.6 GB RAM, up to 15.7 CPUs
                kubespawner_override:
                  mem_guarantee: 65094813696
                  mem_limit: 65094813696
                  cpu_guarantee: 7.86
                  cpu_limit: 15.72
                  node_selector:
                    node.kubernetes.io/instance-type: r5.4xlarge
              mem_121_2:
                display_name: 121.2 GB RAM, up to 15.7 CPUs
                kubespawner_override:
                  mem_guarantee: 130189627392
                  mem_limit: 130189627392
                  cpu_guarantee: 15.72
                  cpu_limit: 15.72
                  node_selector:
                    node.kubernetes.io/instance-type: r5.4xlarge
              mem_244_9:
                display_name: 244.9 GB RAM, up to 63.6 CPUs
                kubespawner_override:
                  mem_guarantee: 263005526016
                  mem_limit: 263005526016
                  cpu_guarantee: 31.8
                  cpu_limit: 63.6
                  node_selector:
                    node.kubernetes.io/instance-type: r5.16xlarge
              mem_489_9:
                display_name: 489.9 GB RAM, up to 63.6 CPUs
                kubespawner_override:
                  mem_guarantee: 526011052032
                  mem_limit: 526011052032
                  cpu_guarantee: 63.6
                  cpu_limit: 63.6
                  node_selector:
                    node.kubernetes.io/instance-type: r5.16xlarge
      - display_name: "DANDI Matlab (CPU)"
        description: "DANDI image with MATLAB. Requires you to bring your own license"
        kubespawner_override:
          image: dandiarchive/dandihub:latest-matlab
          image_pull_policy: Always
        profile_options: *profile_options_cpu
      - display_name: "DANDI (GPU)"
        description: "DANDI image with JupyterLab and GPU support"
        kubespawner_override:
          image: dandiarchive/dandihub:latest-gpu
          image_pull_policy: Always
          extra_resource_limits:
            nvidia.com/gpu: 1
        profile_options: &profile_options_gpu
          requests:
            display_name: Resource Allocation
            choices:
              gpu_1:
                display_name: 1 T4 GPU, ~4 CPUs, ~16GB of RAM
                kubespawner_override:
                  mem_guarantee: 14G
                  mem_limit: 16G
                  cpu_guarantee: 3
                  cpu_limit: 4
                  node_selector:
                    node.kubernetes.io/instance-type: g4dn.xlarge
                default: true
              gpu_2:
                display_name: 1 T4 GPU, ~8 CPUs, ~32GB of RAM
                kubespawner_override:
                  mem_guarantee: 29G
                  mem_limit: 32G
                  cpu_guarantee: 6
                  cpu_limit: 8
                  node_selector:
                    node.kubernetes.io/instance-type: g4dn.2xlarge
      - display_name: "DANDI Matlab (GPU)"
        description: "DANDI Matlab image with GPU support. Requires you to bring your own license."
        kubespawner_override:
          image: dandiarchive/dandihub:latest-gpu-matlab
          image_pull_policy: Always
          extra_resource_limits:
            nvidia.com/gpu: 1
        profile_options: *profile_options_gpu
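The CPU choices above follow a visible pattern: within an instance type, each smaller choice is a power-of-two fraction of the largest one, and the memory figures in the display names match the byte counts when divided by 2**30 (so the "GB" labels appear to be GiB). The sketch below reproduces that arithmetic. `scaled_choice` and `gib_label` are hypothetical helpers for illustration; how the numbers in this PR were actually generated is not stated here.

```python
GIB = 2**30


def scaled_choice(full_mem_bytes, full_cpu, divisor):
    """Build kubespawner overrides for 1/divisor of the largest choice.

    Memory guarantee and limit are equal; the CPU guarantee is scaled
    down while the CPU limit stays at the full value, mirroring the
    r5.xlarge entries in this file.
    """
    mem = full_mem_bytes // divisor
    return {
        "mem_guarantee": mem,
        "mem_limit": mem,
        "cpu_guarantee": full_cpu / divisor,
        "cpu_limit": full_cpu,
    }


def gib_label(mem_bytes):
    """Format a byte count as GiB with one decimal place."""
    return f"{mem_bytes / GIB:.1f}"


# Largest r5.xlarge choice in this file: 31861460992 bytes, 3.75 CPUs.
smallest = scaled_choice(31861460992, 3.75, 8)
print(smallest, gib_label(smallest["mem_guarantee"]))
```

Dividing the largest r5.xlarge choice by 8, 4 and 2 gives the `mem_3_7`, `mem_7_4` and `mem_14_8` values listed above.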
20 changes: 20 additions & 0 deletions config/clusters/dandi/enc-prod.secret.values.yaml
@@ -0,0 +1,20 @@
jupyterhub:
    hub:
        config:
            GitHubOAuthenticator:
                client_id: ENC[AES256_GCM,data:GHXwlK8xb1mqg8JyKV9MCveoBBY=,iv:dh4byjuzDeHxf9RRH2D/UVQc6O6eu1oq3s6mluA9aCI=,tag:HaqPA1l1fYy0OjowJNBnIg==,type:str]
                client_secret: ENC[AES256_GCM,data:H5y0hUlRVn+zreyLsi20qSFVdLx7HrQSyfPM9CIzqMiG3wTrvMV1lA==,iv:6yfWAMH01UP5DVW4DuM0MvDV4HkHRoH89Ib+41c8uYg=,tag:PZl90fWWxJ+hs1baYPag4w==,type:str]
sops:
    kms: []
    gcp_kms:
        - resource_id: projects/two-eye-two-see/locations/global/keyRings/sops-keys/cryptoKeys/similar-hubs
          created_at: "2024-03-28T20:59:26Z"
          enc: CiUA4OM7eEw+PG5huNxT8nSngA2L2N3GbBS5PMEaJXNQRFXgf2MCEkkAXoW3JkxVvTaBas/24VBQ+fyYZcfqhjDm5u4EP/mzsrVvXwC/xjMY/TrDyJWgBUMH17jCVmdqpRWEPrcPU3SFO+wiinn9xyjh
    azure_kv: []
    hc_vault: []
    age: []
    lastmodified: "2024-03-28T21:01:36Z"
    mac: ENC[AES256_GCM,data:0CPxn8WAaQNJ4HALOEKJgHFv9eZ3q9bVg8sd3e8rCnsG0FZdbt/qibGlCtF0ow4a5NUcoygNhVIndLVufKFNNVNvDl00qyg5ke78oZFWHg+3CzVjs4PQIvG4K1M0xwlVC1S8MZtcUBCCvyND5l/ahLVLxYJjDvXSg3R66NwECx0=,iv:l8T/ufDQeeDfAPMds5uJoC5KC4I5ePhuEs5Bd1gC4rw=,tag:lT5Q0bLzZJ+ZMQ8+xErZRA==,type:str]
    pgp: []
    unencrypted_suffix: _unencrypted
    version: 3.7.3
20 changes: 20 additions & 0 deletions config/clusters/dandi/enc-staging.secret.values.yaml
@@ -0,0 +1,20 @@
jupyterhub:
    hub:
        config:
            GitHubOAuthenticator:
                client_id: ENC[AES256_GCM,data:36XvfIO3qOI9RdfNAwZA9WdG+AE=,iv:OhawuU/IKP4n8VYWub3o60ljLCpAnUCR8U2rxS4R0CA=,tag:eQoJ1pRHMJd6QWyOE9yoLA==,type:str]
                client_secret: ENC[AES256_GCM,data:uIpmcAJQ9WsLYwe+/QRSyGCasK5bVTPfxn89umhmVvwR1qXDsmLXEQ==,iv:4DAtEjj+bHSU7T+sG7AZa9jlwdhsl9bg/OhNk7Dj8EE=,tag:28R2UJwU1+H6a7IjWi+UTA==,type:str]
sops:
    kms: []
    gcp_kms:
        - resource_id: projects/two-eye-two-see/locations/global/keyRings/sops-keys/cryptoKeys/similar-hubs
          created_at: "2024-03-27T16:04:45Z"
          enc: CiUA4OM7ePqxiqs9++GjbB2FDQnNdmoVHS7O7Zcl2aMFfp1a11/vEkkAXoW3Jve/+QrP/HLbBVHIvuGz722VBqrZXIRQLHzfh1UR91ke/JBGHYp8CNQoji4wDYGNiuT6QwtEB5QXJxY7SMlCkR5qmSFu
    azure_kv: []
    hc_vault: []
    age: []
    lastmodified: "2024-03-28T20:40:59Z"
    mac: ENC[AES256_GCM,data:50U5gW7d3eMUDKojfZECf2xpyrQsqjGlqz/03GRuaObsmx25rU+6UWV68p22orcjC5kT13DqXjWqvzClVOntP9dV8jIpCjSSc4wrKw9Yo0BoSUMZ59k0uLVfGzV/umiIcaDaum/JDkwn2Dl+dx19HBWYxhbECsPQ2ZyJZJso/ok=,iv:TOrtexsqE7kRQ8Q9g6HKbRIXz2m745JBCs0n+QwOdko=,tag:WXqmp3sf46fmypLDQzRC7Q==,type:str]
    pgp: []
    unencrypted_suffix: _unencrypted
    version: 3.7.3
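In both `enc-*.secret.values.yaml` files every secret leaf is wrapped as `ENC[AES256_GCM,data:...,iv:...,tag:...,type:...]`, while the top-level `sops` block holds key metadata. A rough pre-commit style check for that shape might look like the sketch below. It assumes the file has already been parsed into a dict, it ignores the `_unencrypted` key suffix that sops honours, and `plaintext_leaves` is a hypothetical helper, not a sops API. The ciphertexts in the example are placeholders.

```python
import re

# Shape of a sops-encrypted scalar as it appears in the files above.
ENC_PATTERN = re.compile(
    r"^ENC\[AES256_GCM,data:[^,]*,iv:[^,]*,tag:[^,]*,type:\w+\]$"
)


def plaintext_leaves(node, path=()):
    """Yield dotted paths of leaves that are not ENC[...] wrapped.

    The top-level `sops` metadata block is skipped because most of its
    fields are stored in clear text by design.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if not path and key == "sops":
                continue
            yield from plaintext_leaves(value, path + (str(key),))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from plaintext_leaves(item, path + (str(index),))
    elif not (isinstance(node, str) and ENC_PATTERN.match(node)):
        yield ".".join(path)


# Placeholder document with one encrypted and one leaked value.
doc = {
    "jupyterhub": {"hub": {"config": {"GitHubOAuthenticator": {
        "client_id": "ENC[AES256_GCM,data:AAAA,iv:BBBB,tag:CCCC,type:str]",
        "client_secret": "not-actually-encrypted",
    }}}},
    "sops": {"version": "3.7.3"},
}
leaks = list(plaintext_leaves(doc))
print(leaks)
```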
22 changes: 22 additions & 0 deletions config/clusters/dandi/prod.values.yaml
@@ -0,0 +1,22 @@
userServiceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::975050327108:role/dandi-prod

jupyterhub:
  ingress:
    hosts: [dandi.2i2c.cloud]
    tls:
      - hosts: [dandi.2i2c.cloud]
        secretName: https-auto-tls
  hub:
    config:
      GitHubOAuthenticator:
        oauth_callback_url: https://dandi.2i2c.cloud/hub/oauth_callback
  custom:
    homepage:
      templateVars:
        org:
          name: MIT DANDI
  singleuser:
    extraEnv:
      SCRATCH_BUCKET: s3://dandi-scratch/$(JUPYTERHUB_USER)
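The hub's domain appears in four places: `domain` in `cluster.yaml`, the ingress hosts, the TLS hosts, and the OAuth callback URL (whose path, `/hub/oauth_callback`, is the one OAuthenticator expects). A mismatch between them is an easy copy-paste mistake when cloning staging into prod. The sketch below checks the four against each other on an already-parsed values dict. `domain_problems` is a hypothetical helper for illustration, not part of the 2i2c deployer.

```python
from urllib.parse import urlsplit


def domain_problems(domain, values):
    """Return the names of settings that disagree with `domain`."""
    hub = values["jupyterhub"]
    ingress = hub["ingress"]
    problems = []
    if ingress["hosts"] != [domain]:
        problems.append("ingress.hosts")
    if any(domain not in entry["hosts"] for entry in ingress["tls"]):
        problems.append("ingress.tls")
    callback = urlsplit(
        hub["hub"]["config"]["GitHubOAuthenticator"]["oauth_callback_url"]
    )
    expected = ("https", domain, "/hub/oauth_callback")
    if (callback.scheme, callback.hostname, callback.path) != expected:
        problems.append("oauth_callback_url")
    return problems


# Values copied from prod.values.yaml in this PR.
prod_values = {"jupyterhub": {
    "ingress": {
        "hosts": ["dandi.2i2c.cloud"],
        "tls": [{"hosts": ["dandi.2i2c.cloud"], "secretName": "https-auto-tls"}],
    },
    "hub": {"config": {"GitHubOAuthenticator": {
        "oauth_callback_url": "https://dandi.2i2c.cloud/hub/oauth_callback",
    }}},
}}
print(domain_problems("dandi.2i2c.cloud", prod_values))
```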
22 changes: 22 additions & 0 deletions config/clusters/dandi/staging.values.yaml
@@ -0,0 +1,22 @@
userServiceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::975050327108:role/dandi-staging

jupyterhub:
  ingress:
    hosts: [staging.dandi.2i2c.cloud]
    tls:
      - hosts: [staging.dandi.2i2c.cloud]
        secretName: https-auto-tls
  hub:
    config:
      GitHubOAuthenticator:
        oauth_callback_url: https://staging.dandi.2i2c.cloud/hub/oauth_callback
  custom:
    homepage:
      templateVars:
        org:
          name: MIT DANDI (staging)
  singleuser:
    extraEnv:
      SCRATCH_BUCKET: s3://dandi-scratch-staging/$(JUPYTERHUB_USER)
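`SCRATCH_BUCKET` uses the Kubernetes `$(VAR)` syntax, which refers to another environment variable of the same container and is expanded when the pod starts; `JUPYTERHUB_USER` is set by JupyterHub for each user server. The sketch below imitates that substitution to show the per-user prefix each person ends up with. It is a simplification: it leaves unresolved references unchanged, as Kubernetes does, but does not handle the `$$(VAR)` escape. `expand_env_refs` is a hypothetical helper, and `alice` is an example username.

```python
import re

# Matches $(NAME) references in the style of Kubernetes env values.
_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_.-]*)\)")


def expand_env_refs(value, env):
    """Replace $(NAME) with env[NAME]; leave unknown names untouched."""
    return _REF.sub(lambda m: env.get(m.group(1), m.group(0)), value)


bucket = expand_env_refs(
    "s3://dandi-scratch-staging/$(JUPYTERHUB_USER)",
    {"JUPYTERHUB_USER": "alice"},
)
print(bucket)
```

Inside a user server the resulting value can be read from `os.environ["SCRATCH_BUCKET"]`.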
6 changes: 6 additions & 0 deletions config/clusters/dandi/support.values.yaml
@@ -30,3 +30,9 @@ grafana:
       - secretName: grafana-tls
         hosts:
           - grafana.dandi.2i2c.cloud
+
+cluster-autoscaler:
+  enabled: true
+  autoDiscovery:
+    clusterName: dandi
+  awsRegion: us-east-2
14 changes: 1 addition & 13 deletions eksctl/dandi.jsonnet
@@ -47,19 +47,7 @@ local notebookNodes = [
         availabilityZones: masterAzs,
     }
 ];
-local daskNodes = [
-    // Node definitions for dask worker nodes. Config here is merged
-    // with our dask worker node definition, which uses spot instances.
-    // A `node.kubernetes.io/instance-type` label is set to the name of the
-    // *first* item in instanceDistribution.instanceTypes, to match
-    // what we do with notebook nodes. Pods can request a particular
-    // kind of node with a nodeSelector
-    //
-    // A not yet fully established policy is being developed about using a single
-    // node pool, see https://github.com/2i2c-org/infrastructure/issues/2687.
-    //
-    { instancesDistribution+: { instanceTypes: ["r5.4xlarge"] }},
-];
+local daskNodes = [];


 {
1 change: 0 additions & 1 deletion terraform/aws/projects/dandi.tfvars
@@ -13,7 +13,6 @@ user_buckets = {
   },
 }

-
 hub_cloud_permissions = {
   "staging" : {
     bucket_admin_access : ["scratch-staging"],