Package registry #24

Open
hdgarrood opened this issue May 14, 2017 · 16 comments

@hdgarrood
Contributor

Following on from purescript/purescript#2526. I am thinking about the architecture of psc-package, and in particular, thinking about how it differs from Stackage in that Stackage is an extra layer in front of Hackage, and Hackage is a centralized package registry which provides:

  • uniqueness of names, i.e. it makes it easy for people to know which package you mean when you just say "profunctors" or "st" or something.
  • a canonical location for information such as dependencies and version bounds, which means that the work of package authors in tracking down version bounds can be reused by everyone, regardless of which package manager they are using
  • protection against packages going missing as a result of mutation of git tags or deletion of repositories.

I also think that having a centralised registry which is separate from curated package sets provides an important option for publishing packages for authors who might struggle to find time to keep their packages up to date; if the only option is submitting to a package set, I think we risk discouraging people from publishing their packages at all.

Another related issue that has just occurred to me: I think it's quite far from ideal that if someone were publishing their packages only through psc-package and also uploading them to Pursuit, the information about dependencies and bounds which would be passed to purs publish via --manifest psc-package.json on Pursuit would essentially be meaningless. Since the package author would not actually be using it in the course of developing their package, I expect in most cases it would quickly go out of date.

It is probably obvious by now that I would quite like to have a centralised package registry of some kind. However, I appreciate that this would amount to quite a lot of work. So I'm really opening this issue to ask: do you agree that it is worth addressing these issues by creating a centralised registry and modifying psc-package to use it, and if not, is that because of how much work it would be or because of something else?

@paf31
Contributor

paf31 commented May 14, 2017

I think of package-sets as sort of like a registry, albeit quite difficult to use. So let me try to summarize what I think of as the differences, and we can see if we can agree:

  • It's too difficult to publish to the main package set right now. I agree with this, and it would be nice to offer some sort of leaner workflow, but I would very much prefer that psc-package only use Git for pulling dependency info. So this would mean some sort of API which would validate a package and push the data into the Git repo, I'm guessing?
  • Packages can go missing. I think we can fix this without moving away from Git as the base, by simply forking repos as necessary, or even creating copies on some server somewhere.
  • We don't track version bound information. Since psc-package doesn't use version bounds, and I don't plan to ever use them, I don't think this matters for psc-package, although I agree it would be helpful to track it for the purposes of other package managers.

So I'm in favor of tracking this data. Some of it seems to belong in the package set, and some outside.

Unfortunately, I don't have time to work on it right now.

Let me ask though: given we already decided that psc-package was not going to be the blessed package management solution, are you trying to solve a particular problem with psc-package here, or just a general problem of tracking data for general purpose use?

@hdgarrood
Contributor Author

I guess I'm mainly trying to understand the ideas behind psc-package better, so that I can help work out how Pulp should use it, and I'm also trying to imagine what the purescript ecosystem would look like without Bower and trying to work out what I might be able to do to help there.

In addition to that, though, I do consider the things I've written to be problems for psc-package which I would maybe like to fix or investigate through separate projects.

Re git: if we can find some space on some server, wouldn't it be simpler to just have a tarball per package per version? Mirroring git repos won't be fun if the history has been rewritten since the last time you pulled from the upstream repo and a fast-forward merge is impossible. We also can't just delete and reclone the whole repo, as tags could have been mutated or removed.
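A minimal sketch of the "one tarball per package per version" idea, assuming an in-memory store in place of a real server. The class name `TarballStore`, its methods, and the package name and version used below are hypothetical and only illustrate the property that a published version is never overwritten.

```python
import hashlib


class TarballStore:
    """In-memory stand-in for a server holding one tarball per (package, version)."""

    def __init__(self):
        self._blobs = {}  # (name, version) -> tarball bytes

    def publish(self, name, version, tarball):
        key = (name, version)
        if key in self._blobs:
            # Published versions are never overwritten, so later tag mutation
            # or history rewrites upstream cannot change what users receive.
            raise ValueError(f"{name}@{version} is already published")
        self._blobs[key] = bytes(tarball)
        # Return a checksum that clients could record alongside the version.
        return hashlib.sha256(tarball).hexdigest()

    def fetch(self, name, version):
        return self._blobs[(name, version)]


# Illustrative package name, version, and contents (not real data).
store = TarballStore()
digest = store.publish("prelude", "3.0.0", b"example tarball bytes")
```

Because the store is keyed by name and version rather than by a mutable Git ref, nothing the upstream repository does after publishing affects what `fetch` returns.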

To clarify my position re ease of use: I agree that it would be nice to make it easier to publish packages, but that's not really what I mean here. My worry is that by only having package sets there will be a social effect that leads people to avoid creating and publishing packages at all. (Yes, of course there is bower, but I am starting to think that bower's lack of a proper solver means that ideally it wants replacing.)

@Pauan

Pauan commented May 15, 2017

@hdgarrood I get what you're saying, but as the user of libraries, I really like that package sets give a kind of guarantee of maintenance.

There's been plenty of packages on Pursuit which I wanted to use, but I couldn't because they hadn't been updated in months, because the author didn't maintain them. But they were still listed in Pursuit, leading me to believe that they would work (even though they don't).

I would rather have a small set of high-quality maintained libraries, rather than a large set of low-quality unmaintained libraries (which is what you get with other package managers like npm).

This does run the risk that a library which I am using will be dropped in a future package set, but if anything I consider that a good thing, because it means that people will bug the author to update the library (or fork the library).

Providing workarounds for unmaintained libraries just encourages more unmaintained libraries. Having a strict "must maintain" policy encourages maintenance. Yes it puts more pressure on library authors, but since there are far more library users than library authors, I think that's okay.

In my opinion, the library user experience is more important than the library author experience, because libraries are useless by themselves (they are only useful when used by another library or application). So there is a natural asymmetry which is biased toward library users.

So I think the only thing that is necessary is to have the ability for somebody to "take over" an unmaintained package. So that way if the author doesn't maintain the package, somebody else can.

@hdgarrood
Contributor Author

hdgarrood commented May 15, 2017

As someone who is both a library user and a library author, I very much disagree. If there was an expectation of maintenance, as there is with package sets, most of my libraries would never have been published. Even though they do often lag behind, I know that people find them useful.

I'm very strongly against any policy that puts unnecessary pressure on library authors or encourages library authors to bug library users. That's not at all the tone I want the PureScript library ecosystem to have.

We know from experience that the Stackage model (a package registry with no maintenance expectations, plus package sets with maintenance expectations for authors who want to commit to them) works well and scales. I also don't buy the argument that it will be hard to find packages that work well. It's easy: just don't search the full registry; instead, restrict your search to a package set. I know a tool that can search within a package set doesn't exist now, but one easily could.

@Pauan

Pauan commented May 15, 2017

@hdgarrood Just to be clear, when I say "bug the library author" I mean "file a bug report about updating the library", or "make a pull request updating the library", or "send a polite e-mail explaining the situation", that sort of thing. I don't mean waking them up at 2 AM to pester them.

@hdgarrood
Contributor Author

Ok good, thanks for clarifying. My position is unchanged, though.

@Pauan

Pauan commented May 15, 2017

What about using the npm registry for packages which don't want to commit to the package set? With peerDependencies you can ensure that there is only a single version of each package.

And we will need to use npm anyways, because there are some PureScript packages which use JavaScript packages. So that avoids needing to create a separate registry.

@hdgarrood
Contributor Author

That does seem like a good option, but I would rather not get into this discussion right now - for now, I am really just hoping to ascertain how psc-package should interact with a centralised registry, if at all, rather than details like which registry we use, whether we make our own, etc.

@paf31
Contributor

paf31 commented May 15, 2017

I don't have time for a full reply now, but I wanted to just clear up a couple of things:

  • There's no expectation of package maintenance with package sets (other than that you don't delete your tags).
  • Publishing is possible right now, it's just cumbersome. I get a steady stream of PRs on package-sets, but it would be nice if it were simpler to publish.
  • It's not meant to replace Bower, and I don't think it can. I've come to think of it as an extra layer on top of whatever mechanism you're already using for pulling in library dependencies, which makes it easier to freeze those for app developers.

I think we can implement a centralized package repository, but it fundamentally would be different from psc-package.

@hdgarrood
Contributor Author

Ok great, thanks. Incidentally, I can't remember if I've said this elsewhere, but having seen how Stackage operates, I expect that having version bounds would help with curation once package sets start to become a bit larger.

When/if you have time I'd be interested to hear about your views on Git as the base for psc-package too, in particular with respect to the issue of package availability, as I'm not sure I fully understand where you're coming from there.

@mostalive

(possibly dumb question) @paf31 imagine all the purescript packages you need to do something are in package-sets, what would you use bower for? (since js dependencies probably come from npm).

@hdgarrood I think Git can be great for availability, because it is inherently distributed. I can imagine e.g. specifying multiple locations for a package in package.json. It also looks like it is very easy to host a 'private repository' - clone the packages repository, add your private packages where merge conflicts are unlikely (e.g. at the bottom). I think in the long run it could grow the ecosystem - if I can easily split my application into a bunch of libraries in git repositories, and ensure they all build together, making the more generic ones available to everyone else is a matter of moving the git repository to github/bitbucket/gitlab and sending a pull request on 'package-sets' once it is far enough along to be interesting to someone else.
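A minimal sketch of the "multiple locations for a package" idea, assuming each location serves identical content. The function name, the example URLs, and the `fake_fetch` helper are hypothetical; the fetcher is passed in as a parameter so the sketch runs without a network.

```python
def fetch_from_first_available(locations, fetch):
    """Try each location in order; return (location, data) from the first that works."""
    errors = []
    for location in locations:
        try:
            return location, fetch(location)
        except OSError as exc:
            # Remember why this location failed and move on to the next one.
            errors.append(f"{location}: {exc}")
    raise OSError("all locations failed: " + "; ".join(errors))


# Hypothetical mirrors and a fake fetcher standing in for a real clone/download.
def fake_fetch(location):
    if "primary" in location:
        raise OSError("repository not found")
    return b"package contents"


used, data = fetch_from_first_available(
    ["https://primary.example/pkg.git", "https://mirror.example/pkg.git"],
    fake_fetch,
)
```

A fallback like this only helps when a location is unreachable; it does not by itself tell you whether two locations still agree on what a given tag contains.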

Maintaining community infrastructure is an expensive and not always thankful experience, from what I see in various communities. Less infrastructure = more time available to develop libraries, the compiler and other more interesting parts of the ecosystem.

@mostalive

mostalive commented May 24, 2017

FYI (sorry if this is the wrong place, could not find a more appropriate one at the moment) to make it a bit easier to add existing packages to the package set, I wrote a small shell script

https://gist.github.com/mostalive/54dbbf388f6ca58795d6ae37fef22890

that generates most of a packages.json snippet for you. (It currently produces one comma too many, and the snippet needs to be formatted manually when adding it to packages.json.)

I'd be happy to translate this to haskell as a subcommand of psc-package, but not sure it belongs there, and if so, under what name.

The other thing I found useful is a oneliner to extract dependencies from 'bower.json' for use in 'psc-package.json':

$ jq '.dependencies | keys' bower.json | sed s/purescript-//g
[
  "lists",
  "mmorph",
  "monoid",
  "prelude",
  "tailrec",
  "transformers",
  "tuples"
]
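For anyone without jq installed, the same extraction can be sketched in Python using only the standard library. The function name `psc_package_deps` and the example version ranges are hypothetical. One small difference from the sed command above: this strips `purescript-` only when it is a leading prefix of the name.

```python
import json

PREFIX = "purescript-"


def psc_package_deps(bower_json_text):
    """Return sorted dependency names from bower.json text, without the 'purescript-' prefix."""
    deps = json.loads(bower_json_text).get("dependencies", {})
    return sorted(
        name[len(PREFIX):] if name.startswith(PREFIX) else name
        for name in deps
    )


# Illustrative bower.json content; the version ranges are made up.
example = '{"dependencies": {"purescript-prelude": "^3.0.0", "purescript-lists": "^4.0.0"}}'
print(json.dumps(psc_package_deps(example), indent=2))
```

To run it on a real project, read the file first, e.g. `psc_package_deps(open("bower.json").read())`.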

@hdgarrood
Contributor Author

> @hdgarrood I think Git can be great for availability, because it is inherently distributed.

I know that Git is inherently distributed, but that doesn't actually address the availability issues I have described earlier in this thread and in purescript/purescript#2526. I am still not aware of any good way of handling a case where a package author rewrites the Git history between releases if we continue to use Git as the base.

@mostalive

mostalive commented May 24, 2017

I see (I did read that thread, there's a lot to take in). Oversimplifying things, I was thinking of a trade-off. Which has the greater chance:

  1. someone rewrites their git history, on purpose or by accident.
  2. a centralized piece of community infrastructure is down (this I've seen happen in more than just the Haskell community)

The chance of 1. increases with the size of the package set. At the same time so does 2. Why would anyone do 1? I can think of left-pad (https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/) as an example, but that is independent of the technical solution: if someone wants to remove their contribution for whatever reason, they should be able to (someone else can take it over / fork it / ...), I think.

What I then understand from your question at 2526 is: how can we limit the amount of work that has to be done by others when a contributed package gets removed or is broken? Or how can we prevent it?

In the case of the broken tag:
How about storing the tag and the commit hash for that tag and seeing if they match? It is probably possible to change a commit hash, but you'd have to make an effort to do it... (temporary resolution would be to fork the repository, set the tag as it was and send an (automated) message to the author to fix their package).
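A minimal sketch of the tag-plus-commit-hash check, assuming the package set records the commit each tag pointed at when it was added. The function name, the data layout, and the hashes are hypothetical; the resolver is passed in as a parameter, and a real one might shell out to `git ls-remote`.

```python
def find_moved_tags(pinned, resolve_tag):
    """Return {package: (expected, actual)} for every tag that no longer matches its pin.

    pinned maps package name -> (tag, commit hash recorded at publish time).
    resolve_tag(package, tag) returns the commit the tag points at now, or None if gone.
    """
    moved = {}
    for package, (tag, expected) in pinned.items():
        actual = resolve_tag(package, tag)
        if actual != expected:
            moved[package] = (expected, actual)
    return moved


# Hypothetical pins and current state; "lists" has had its tag moved.
pinned = {"prelude": ("v3.0.0", "aaa111"), "lists": ("v4.0.0", "bbb222")}
current = {("prelude", "v3.0.0"): "aaa111", ("lists", "v4.0.0"): "ccc333"}
moved = find_moved_tags(pinned, lambda pkg, tag: current.get((pkg, tag)))
```

Note that a check like this only detects a moved or deleted tag; it cannot recover the contents the tag originally pointed at.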

@hdgarrood
Contributor Author

This is not a trade off we are necessarily forced to make; if packages are distributed as .tar.gz files via a system like IPFS or BitTorrent, it's conceivable that everything apart from perhaps publishing new versions of packages would still work.

> if someone wants to remove their contribution for whatever reason, they should be able to (someone else can take it over / fork it / ... ) I think.

Certainly people should be able to say "I no longer have the inclination or time to maintain this" and we should have a process for allowing someone else to take over. But once version x.y.z of package A is published on a package registry it should be available indefinitely (except for in very, very unusual cases e.g. the contents of the package are likely to cause legal issues). If, as a package author, you're not comfortable with that, then don't publish your package.

> What I then understand from your question at 2526 is: how can we limit the amount of work that has to be done by others when a contributed package gets removed or is broken?

No, my question is how can we prevent this from happening in the first place, because it is entirely avoidable.

> How about storing the tag and the commit hash for that tag and seeing if they match?

This has already been suggested and people have already described why it won't work. In summary: we need to be able to reliably obtain package A at version x.y.z. If a package manager just fails with a checksum mismatch error during installation (and all a package manager could reasonably do in that scenario is to just fail with a checksum mismatch error), that's essentially useless from the point of view of the developer trying to install their project's dependencies.

@mostalive

Thank you for the detailed reply @hdgarrood . I'm mulling it over.

Rembane pushed a commit to Rembane/psc-package that referenced this issue May 26, 2018