Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(foreach): Remove shell invocation from foreach #136

Merged
merged 9 commits into from
Jun 17, 2024

Conversation

annettejanewilson
Copy link
Contributor

As discussed in issue 135, turbolift applies shell escaping inconsistently on the arguments in foreach. Upon consideration, I think there is no good reason to apply shell escaping at all. It makes turbolift more fragile, more confusing and ultimately less flexible.

Instead, do not use a shell at all and all these problems disappear. None of the examples in the README use this functionality. If users do need to invoke a shell, for example to do redirection, it is much more explicit and understandable for them if they do it themselves.

This PR:

  • Removes the use of a shell, instead directly executing a subprocess based on the arguments provided.
  • Removes custom handling of arguments. Users are directed to the standard GNU-style -- to indicate that everything else on the command-line is an argument, not an option. This makes the code substantially simpler.
  • Uses shellescape to format the printed command. (In retrospect I could separate this into another PR. Let me know if you prefer that.)

@rnorth
Copy link
Collaborator

rnorth commented Jun 3, 2024

Let's think about this, as it does look nicer - FYI @sledigabel

I need to backtrack and check why we used a shell in the first place. From rough memory I vaguely recall frustration with there being enough real-world situations where a shell was needed, but that could be mistaken. I'll check slack history, particularly back on the internal pre-go (bash!) prototype of turbolift.

@rnorth
Copy link
Collaborator

rnorth commented Jun 4, 2024

@sledigabel, @Dan7-7-7 and I had our regular scheduled catchup yesterday; we're kinda on the fence about this, partly because it would be a breaking change and requires ordinary users (who are not necessarily POSIX power users) to learn the -- option. We're not so fussed about the volume of code we have to maintain.

We also discussed, and looked up in Slack, the scenarios where shell has been desirable in the past; these are roughly in two categories:

  • compatibility with SDK version/environment managers. For me this is the number one priority, since tools like ASDF can drastically simplify working across many repos that each has its own version/environment specifics
  • intentional use of chained commands - e.g. patterns like turbolift foreach 'gh pr view --comments | grep "Closing temporarily" && gh pr reopen --comment "Reopening"'. I have a bad feeling we've already broken this in a recent release

Running some quick local tests with this branch I think we're safe-ish on the first but it would be a breaking change for the second:

> turbolift foreach -- which java
  OK   Reading campaign data (repos.txt)
  OK   Executing which java in work/someorg/somerepo

     Executing: which [java] in work/someorg/somerepo
     /Users/richardnorth/.asdf/shims/java
  OK   turbolift foreach completed (1 OK, 0 skipped)

^ Evidently the ASDF shims are still used

> cd work/someorg/somerepo
someorg/somerepo> asdf local java temurin-21.0.0+35.0.LTS
someorg/somerepo> cd -
> turbolift foreach -- java --version
  OK   Reading campaign data (repos.txt)
  OK   Executing java --version in work/someorg/somerepo

     Executing: java [--version] in work/someorg/somerepo
     openjdk 21 2023-09-19 LTS
     OpenJDK Runtime Environment Temurin-21+35 (build 21+35-LTS)
     OpenJDK 64-Bit Server VM Temurin-21+35 (build 21+35-LTS, mixed mode)
  OK   turbolift foreach completed (1 OK, 0 skipped)

^ ... and actually apply

> turbolift foreach -- which java && echo hello
  OK   Reading campaign data (repos.txt)
  OK   Executing which java in work/someorg/somerepo

     Executing: which [java] in work/someorg/somerepo
     /Users/richardnorth/.asdf/shims/java
  OK   turbolift foreach completed (1 OK, 0 skipped)

hello

^ using shell syntax behaves as might be expected. Not shown: wrapping a scriptlet in single quotes used to work, but now doesn't work in our release version.

> turbolift foreach -- sh -c "which java && echo hello"
  OK   Reading campaign data (repos.txt)
  OK   Executing sh -c 'which java && echo hello' in work/someorg/somerepo

     Executing: sh [-c which java && echo hello] in work/someorg/somerepo
     /Users/richardnorth/.asdf/shims/java
     hello
  OK   turbolift foreach completed (1 OK, 0 skipped)

^ workaround necessary to execute multiple commands, if one does not want to put them in an actual script file.

@rnorth
Copy link
Collaborator

rnorth commented Jun 4, 2024

One other thing we wondered: given now mandatory use of --, is it at least possible to make the command fail with a sensible and helpful error message if the user does not supply it?

@annettejanewilson
Copy link
Contributor Author

FYI I fully implemented this with a -c/--command option that would take a shell command as an option, then realised that it had very little value. It was just sugar that turned turbolift foreach sh -c 'command | with | pipes' into turbolift foreach -c 'command | with | pipes' but left you wondering about the details.

My feeling is that it's best to provide a simple and predictable building block that handles 90% of cases as-is and which doesn't get in the way when you want to do the complex ones: you can easily invoke a shell yourself because you can be confident that your arguments get through to it untouched.

This will need release notes, but I also know people are likely to be using this through brew and not seeing release notes. Do you think there's anything needed for the change in behaviour?

I see 2.4.1 was released yesterday. I was hoping we could resolve this before releasing that.

Here's a handy summary:

What works?2.4.02.4.1This branch
Can pass any valid shell script as a single argumentYesNoNo
Can pass simple command directly as args (no empty args, no spaces, no shell chars)YesYesYes
Can pass simple commands with spaces directly as args (no empty args, no shell chars)NoYesYes
Can pass any command directly as args (including empty args, arbitrary chars in args)NoNoYes
There is some way to pass an arbitrary shell scriptYes (single arg)NoYes (sh -c ...)

@annettejanewilson
Copy link
Contributor Author

annettejanewilson commented Jun 4, 2024

One other thing we wondered: given now mandatory use of --, is it at least possible to make the command fail with a sensible and helpful error message if the user does not supply it?

We could check that c.ArgsLenAtDash() is 0. It's -1 if -- doesn't appear at all, and positive if some number of non-option arguments appear before the --.

Note that it's not mandatory, just strongly recommended to avoid confusion.

@sledigabel
Copy link
Contributor

Note that it's not mandatory, just strongly recommended to avoid confusion.

I think in the current state of the PR it is mandatory in order to avoid the command options from being swallowed by Cobra since we're not disabling the parsing any longer. So it's either using -- or quoting the command.

Cobra is too clever for this so checking the arg length sounds like a good strategy.

@rnorth
Copy link
Collaborator

rnorth commented Jun 4, 2024

Yes, it can be avoided but quickly becomes needed if any flags are used for the command.

I was going to say we could emit a prominent warning, but most people when faced with a warning and an error only see the error.

Perhaps we should just make it mandatory, and bomb out with our own very clear message if it's not been used?

@annettejanewilson
Copy link
Contributor Author

annettejanewilson commented Jun 4, 2024

I've pushed a change to make -- mandatory. However, there are some caveats:

  • When you omit -- and include no options, you'll see "Error: Use -- to separate command" and brief usage instructions.
  • When you omit -- and do include options in your command, you'll see something like "Error: unknown shorthand flag: 'x' in -x" and brief usage instructions.
  • When you specify --help, you will see detailed usage instructions. These are very clear about the need for --.

The problem here is that the brief usage instructions shown when the arguments are incorrect do not include either the "Long" or the "Short" description of the command. See cobra #1084.

In all cases – brief or full – the usage instructions show:

Usage:
  turbolift foreach [--repos REPOFILE] -- COMMAND [ARGUMENT...] [flags]

It's just that only in the full usage instructions is there text to reinforce the need for --.

@annettejanewilson
Copy link
Contributor Author

Oh, lol, now I type that out I see that [flags] floating at the end. Let me fix that.

@annettejanewilson
Copy link
Contributor Author

Oh, I can't! It's put there by Cobra. Huh. Why does it put it at the end? Who puts flags at the end?

As discussed in [issue 135], turbolift applies shell escaping
inconsistently on the arguments in foreach. Upon consideration, I think
there is no good reason to apply shell escaping at all. It makes
turbolift more fragile, more confusing and ultimately less flexible.

Instead, do not use a shell at all and all these problems disappear.
None of the examples in the README use this functionality. If users do
need to invoke a shell, for example to do redirection, it is much more
explicit and understandable for them if they do it themselves.

This PR:

* Removes the use of a shell, instead directly executing a subprocess
  based on the arguments provided.
* Removes custom handling of arguments. Users are directed to the
  standard GNU-style `--` to indicate that everything else on the
  command-line is an argument, not an option. This makes the code
  substantially simpler.

[issue 135]: Skyscanner#135
Shell escape the arguments so you can tell when arguments have spaces in
them.
Since we are deliberately removing custom handling of arguments, users
will find that if they don't include a double hypen (`--`) separator
before their foreach command, any options they pass to their command
will be interpreted by turbolift and either result in errors, or worse,
being quietly omitted from their command (e.g. `-v`), which could be
surprising and frustrating!

To head this off, refuse immediately if `--` either doesn't appear on
the command-line, or appears after one or more arguments.
Make it clear that the double-hyphen separator `--` is mandatory and not
just a good idea.
Turns out Cobra doesn't wrap usage messages, so they look pretty ugly if
we don't pick a pretty conservative width and wrap by hand.
Turns out Cobra searches your usage line for the specific text "[flags]"
and if it doesn't find it, shoves it on the end. Gross. For that reason,
let's remove explicit callout of any flags and move it back to *before*
the double-hyphen.
@annettejanewilson
Copy link
Contributor Author

Okay, I'm happy with this again now. I've rebased again main (cleanly) but otherwise I haven't collapsed or rewritten any of my commits.

@annettejanewilson annettejanewilson changed the title Remove shell invocation from foreach fix(foreach): Remove shell invocation from foreach Jun 4, 2024
@annettejanewilson
Copy link
Contributor Author

Please let me know if you would prefer the invocation formatting in a separate PR. I think maybe that would make sense as it's not a necessary part of this fix, more a feature in itself, and it should probably be applied more broadly than just to foreach. I note that right now it seems like all the invocations get reported twice by different parts of the code - once in foreach and again in executor. Again, that might be better dealt with in its own PR. I already did the work to separate it into its own commit, so I think it could easily be dropped from this PR.

@sledigabel
Copy link
Contributor

I think it looks good @annettejanewilson. The formatting can stay in the same PR as far as I'm concerned, it's a nice addition to this refactor.

We can prob live with the error message triggered if the user omits the -- and adds an option. I understand it's because Cobra parses it before you can handle this kind of error? I think it's OK because despite the error handling saying the flag doesn't exist, the usage makes it very clear what the user needs to do.

Copy link
Contributor

@sledigabel sledigabel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, is there a way you could add a test for the shellescape as part of foreach since it's introduced in this PR?

Wrap braces around the logged command to delimit it clearly. Add unit
tests to cover argument formatting.
@annettejanewilson
Copy link
Contributor Author

LGTM overall, is there a way you could add a test for the shellescape as part of foreach since it's introduced in this PR?

Done.

sledigabel
sledigabel previously approved these changes Jun 4, 2024
Copy link
Contributor

@sledigabel sledigabel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work @annettejanewilson thanks

@annettejanewilson
Copy link
Contributor Author

annettejanewilson commented Jun 4, 2024

we're kinda on the fence about this

Ah, sorry, I missed this longer reply earlier as we had multiple messages cross over - I only saw the smaller one following it.

@rnorth Did I convince you with the table above? I would still argue this is a more coherent approach, but if we don't do this I think the status quo in 2.4.0 was more useful than where 2.4.1 ends up.

README.md Show resolved Hide resolved
@annettejanewilson
Copy link
Contributor Author

Suggested text for release notes:

Turbolift's foreach command no longer invokes a shell to interpret your command. This fixes some issues around passing an empty string as an argument and difficulty escaping shell characters. However, it also means that you can no longer directly pass turbolift a quoted script containing redirects or pipes. Instead, you will need to explicitly invoke a shell yourself. For example, if you previously used turbolift foreach 'script | with > redirects' you would now use turbolift foreach -- bash -c 'script | with > redirects'. We hope that the improved consistency and predictability makes up for needing to be a bit more explicit in this advanced usage.

Copy link
Collaborator

@Dan7-7-7 Dan7-7-7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the change might cause minor disruption for users but I think ultimately it's more consistent / less confusing this way.
Re-approving after my comment but will let @rnorth and @sledigabel have the final say.

@rnorth
Copy link
Collaborator

rnorth commented Jun 17, 2024

Let's merge - I agree with the approach. Thanks for this @annettejanewilson !

@rnorth rnorth enabled auto-merge (squash) June 17, 2024 14:05
@rnorth rnorth merged commit 6997ad4 into Skyscanner:main Jun 17, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants