Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix concurrent R/W access to link properties #87

Merged
merged 7 commits into from
May 18, 2024

Conversation

chocolatkey
Copy link
Member

@chocolatkey chocolatkey commented Mar 21, 2024

This addresses crashes in the go-toolkit when used in highly concurrent access to publications, due to simultaneous updates to a link's properties (which is a map[string]interface{}) in other parts of the go-toolkit such as the archive fetcher.

fatal error: concurrent map writes
{}
goroutine 1427362 [running]:
github.com/readium/go-toolkit/pkg/manifest.(*Properties).Add(...)
/go/pkg/mod/github.com/readium/go-toolkit@v0.0.0-20231006014237-221ff3b66b2e/pkg/manifest/properties.go:16
github.com/readium/go-toolkit/pkg/fetcher.(*entryResource).Link(0xc005e5a410)
/go/pkg/mod/github.com/readium/go-toolkit@v0.0.0-20231006014237-221ff3b66b2e/pkg/fetcher/fetcher_archive.go:112 +0x305

Unfortunately this ends up making the code more ugly due to the addition of a mutex (and thus conversion of the map to a struct with a map and mutex) as well a custom JSON marshalling needed to omit empty properties due to current limitations of go's stdlib json marshaller

@chocolatkey chocolatkey changed the base branch from main to content-iterator March 21, 2024 00:16
@mickael-menu
Copy link
Member

To understand the issue better, could you explain when and where link properties can be changed concurrently?

Ideally, the Link struct should be immutable, and its properties should not require changes after construction. Would computing the link properties in the ArchiveFetcher and entryResource initializers help?

@chocolatkey
Copy link
Member Author

chocolatkey commented Mar 22, 2024

@mickael-menu The example in the stack trace with the archive fetcher is technically solvable, you're correct, however, programs using the go-toolkit may want to modify the properties of the link after receiving them from the toolkit, for example, to add custom metadata to the link that was computed by their code. In this case, if the properties were immutable, they would have to create a new instance of the struct/property.
If we leave the map[string]interface{} type of the properties like it is, there's also a chance that anyone who uses the package will make the same mistake we made internally and concurrently read/write the map from their software. That's why I turned it into a thread-safe version, so neither we or external implementers can make that mistake. I think we should consider doing the same for the other map properties of the structures in the toolkit.
Making the entire Link struct (or others as well) truly immutable in the code would require significant refactoring of the code, and also make usage of the entire package significantly more difficult, as you could no longer e.g. access the struct properties like link.Href directly, there would have to be a function for each property. This would also increase the complexity of json marshalling/unmarshalling for the structs that do not yet have custom marshallers.

@mickael-menu
Copy link
Member

I think this discussion goes further than this particular PR, so I opened a new discussion. This should already answer some of your concerns regarding JSON marshalling and struct property accessors.

If it makes sense and you agree with my point, then for this data race issue we "just" need to refactor to not mutate the structs in the archive fetcher. In future PRs we could migrate to immutable list/map types, or not.

If you disagree with my take, let's talk about it on the discussion thread and during meetings.

@chocolatkey
Copy link
Member Author

@mickael-menu I'm working on getting the go-toolkit in a state for building and testing, and I'm coming back to getting this PR merged . While I think going with the immutability strategy is a good idea in the long-term, I actually realized it's not super easy to make the https://readium.org/webpub-manifest/properties#archive property not be modified in the archive fetcher because of how the resource link is implemented in this "old" copy of the kotlin toolkit it was original copied from.
In the current version of the kotlin-toolkit, you have the archive entry's properties getter which is from this new version of the resource interface, so you never have to actually modify the link, because those properties are isolated from it.
In the implementation of the go-toolkit, we're using the link, which is supplied from the webpub readingOrder, which in the case of a reflowable publication where this matters is from an EPUB, where the links' properties come from here. We're using that link to Get from the fetcher, and then accessing the link's properties here.

My overall question is, given the goals of not spending significant on a refactor of this (so the branch can be merged), and not incurring additional memory usage by e.g. making a complete copy of the Link to prevent mutations, what do you think is the best way to move forwards with this? Keeping this as it is for the moment and then going back to it later is also an option of course, while it's not ideal, the current implementation "works".

@mickael-menu
Copy link
Member

I'm concerned that if we don't get this right in this PR, the technical debt will continue to accumulate and we may never take the time to address this issue. Implementers could also rely on the mutability of the link properties.

You made a good point about the Kotlin changes and this could actually serve as inspiration for the solution. How about adding a Properties map in Resource to hold the archive info? We initially added it to Link.properties as a workaround to provide the compressed length to the positions service, but it was never really meant to be in the JSON manifest anyways.

@chocolatkey
Copy link
Member Author

chocolatkey commented May 17, 2024

@mickael-menu OK, that works for me. I've done so in the latest commit, let me know if this is what you were thinking

Copy link
Member

@mickael-menu mickael-menu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect thank you. If you could just remove the Properties.Add mutable APIs and that's ready to merge for me. 🙏

@chocolatkey
Copy link
Member Author

@mickael-menu I've removed all the mutex code from the properties, and reverted it to simply being a map[string]interface{} type. I've also removed the Add and Delete functions and made adjustments where necessary.

@chocolatkey chocolatkey merged commit f3c807d into content-iterator May 18, 2024
1 check passed
@chocolatkey chocolatkey deleted the fix-properties-map branch May 18, 2024 04:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants