Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: 1710: Add SELinuxChangePolicy to PodSpec #4843

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

jsafrane
Copy link
Member

@jsafrane jsafrane commented Sep 11, 2024

We found out that we need an explicit opt-out for mounting all volumes with -o context - all pods sharing the same volume must have the same label + must be all unprivileged (or all privileged).

There are many changes in the KEP because of that. It feels like a new KEP.

  • New API field: PodSecurityContext.SELinuxChangePolicy with values Recursive and MountOption. MountOption is the default one!
  • Since the KEP changes the default behavior, there are more details how to upgrade a cluster safely. Short summary
    1. A new feature gate SELinuxChangePolicy gets enabled in Kubernetes 1.N.
      • This enables the new field SELinuxChangePolicy.
      • SELinuxMount feature gate is still disabled.
      • Cluster admin can (easily?) list what Pods would break when SELinuxMount gets enabled in the next Kubernetes release and fix them (either don't mix privileged and unprivileged pods or set their SELinuxChangePolicy: Recursive).
      • A kubernetes distro may choose to alert / block upgrades to a version that has SELinuxMount enabled until the cluster admin fixes all problematic Pods.
    2. In the next Kubernetes upgrade, SELinuxMount gets enabled by default, but it does not break anything, because all problematic Pods were fixed before the upgrade.

WIP:

  • check PRR questionnaire + see what needs to be changed there
  • kube-state-metrics may need more thoughts, especially with all weird rules and feature gates around SELinux mount.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 11, 2024
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Sep 11, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 11, 2024
| AWS EBS CSI | true | `-o context=XYZ` | | 2) |
| AWS EBS CSI | unset or false | | `:Z` | 3) |
| NFS1 CSI | true | `-o context=XYZ,noshareacache` | | 4) |
| NFS2 CSI | unset or false | | | 5) |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this table can be removed and merged with "Volume Mounting" section.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I;m not sure, "Volume Mounting" describes general mount capabilities. This example shows how we're going to use these capabilities in kubelet.

// MountReadWriteOncePod mounts PersistentVolumes with access mode ReadWriteOncePod with `-o context` mount option.
// All other volumes are relabeled recursively by the container runtime.
// Mounting with `-o context` is instant and does not require the container runtime to recursively walk through the volume.
SELinuxChangePolicyMountReadWriteOncePod PodSELinuxChangePolicy = "MountReadWriteOncePod"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - for this option to work, Pod must have an explicit SELinux context specified, otherwise it will again default to recursive.

IMO - this default though makes sense, there isn't one single word that is going to capture all the necessary information.


I was thinking - how about we keep existing default and introduce a boolean field in securityContext. So something like:

securityContext.useSELInuxMountOption: true|false

So we keep existing behaviour for ReadWriteOncePod and other situations (if field is nil) but use this field if explicitly set to true. If this is explicitly set to false, then I guess we will have to use recursive chcon even for RWOP volumes. But that doesn't seem so bad.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have all allowed options as explicit values in the API. nil should mean either true or false, not something in between.

I'll add the wording about SELinux label in Pod.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a very short sentence about SELinuxLabel in PodSecurityContext / SecurityContext.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 13, 2024
We found out that we *need* an explicit opt-in for mounting all volumes
with `-o context` - all pods sharing the same volume must have the same
label + must be all unprivileged (or all privileged).

There are many changes in the KEP because of that. It feels like a new KEP.
@jsafrane
Copy link
Member Author

I split it into some small commits, but there is still a huge one with the API change. And I tried to capture what was already done and how we want to move forward.

Files on the volume already have the right SELinux context, no recursive relabeling happens there.
* When a `PodSecurityContext` specifies incomplete SELinux label (i.e. omits `SELinuxOptions.User`, `.Role` or `.Type`), kubelet fills the blanks from the system defaults provided by [ContainerLabels() from go-selinux bindings](https://github.com/opencontainers/selinux/blob/621ca21a5218df44259bf5d7a6ee70d720a661d5/go-selinux/selinux_linux.go#L770).
Introduce a new PodSpec field `Spec.SecurityContext.SELinuxChangePolicy` with the following values:
* `UseMountOptionForReadWriteOncePod`: mount RWOP volumes with `-o context` (see other conditions below), all other volumes are relabeled recursively by the container runtime.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option I was thinking is - UseMountOptionsSafely, it is perhaps as much big as current choice, but again this name is going to be just picking "okay" option among worst. :(

Make `MountOption` the default value. This way, we don't need
`UseMountOptionForReadWriteOncePod` value.
We need SELinuxChangePolicy field to be available befor `SELinuxMount`
feature gate is enabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants