Add support for hashing from a reader #159

Open · wants to merge 8 commits into base: feat/no-digest

Conversation

@Stebalien (Member)

fixes #141

The io import stuff is a bit annoying, but I can't find a better way to do this.

Remove the per-hasher digest type. Instead, store hash digests inside
the hashers and "borrow" them. In all cases, we're going to copy them into a
`Multihash<S>` anyway.

This:

1. Removes a bunch of code.
2. Means that hashers don't need to be generic over the size (unless
they actually support multiple sizes). This fixes the UX issue
introduced in the const generics PR.
3. Avoids some copying.

BREAKING CHANGE

1. `Hasher.digest` no longer exists. Users should use
`Code::SomeCode.digest` where possible (usage sketched below).
2. The hasher digests no longer exist.
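
For illustration, a rough sketch of how the two entry points fit together after this change (crate paths and the `Sha2_256` code are just examples; `digest_reader` follows the signature added in this PR's diff):

use multihash::{Code, MultihashDigest};
use std::fs::File;
use std::io;

fn main() -> io::Result<()> {
    // Hash an in-memory buffer: the per-code entry point that replaces Hasher::digest.
    let hash = Code::Sha2_256.digest(b"hello world");
    println!("{:02x?}", hash);

    // Hash from any io::Read without buffering the whole input first:
    // the new entry point added by this PR.
    let mut file = File::open("data.bin")?;
    let hash = Code::Sha2_256.digest_reader(&mut file)?;
    println!("{:02x?}", hash);
    Ok(())
}
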
quote!(Self::#ident => {
    let mut hasher = #hasher::default();
    io::copy(reader, &mut hasher)?;
    Ok(Multihash::wrap(#code, hasher.finalize()).unwrap())
Member:

Can you please add a comment explaining why it is OK to use unwrap() here (or use an expect() that makes the reason clear)?
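
One possible way to address this (a sketch, not the code in this PR; it assumes the compile-time size asserts discussed later in the thread):

quote!(Self::#ident => {
    let mut hasher = #hasher::default();
    io::copy(reader, &mut hasher)?;
    // Cannot panic: the derive statically asserts that every hasher's digest
    // fits into the size allocated by this Multihash type.
    Ok(Multihash::wrap(#code, hasher.finalize())
        .expect("digest size is checked at compile time"))
})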

fn digest(&self, input: &[u8]) -> Multihash<S>;
fn digest(&self, input: &[u8]) -> Multihash<S> {
    let mut input = input;
    self.digest_reader(&mut input).unwrap()
Member:

Same here, please add a comment or use expect() to make it clear why unwrap() is OK here.
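
A sketch of what that could look like for this default impl (relying on the fact that the io::Read impl for byte slices never returns an error):

fn digest(&self, input: &[u8]) -> Multihash<S> {
    let mut input = input;
    // Reading from an in-memory slice cannot fail, so the io::Result returned
    // by digest_reader is always Ok here.
    self.digest_reader(&mut input)
        .expect("reading from a slice never fails")
}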

src/multihash.rs (outdated)
@@ -49,7 +69,9 @@ pub trait MultihashDigest<const S: usize>:
/// let hash = Code::Sha3_256.wrap(&hasher.finalize());
/// println!("{:02x?}", hash);
/// ```
fn wrap(&self, digest: &[u8]) -> Multihash<S>;
fn wrap(&self, digest: &[u8]) -> Multihash<S> {
    Multihash::wrap((*self).into(), digest).unwrap()
Member:

Same here, please add a comment or use expect() to make it clear why unwrap() is OK here.

@vmx (Member) commented Nov 9, 2021

Regarding the io imports: this seems to work:

+++ b/derive/src/multihash.rs
@@ -238,7 +238,8 @@ pub fn multihash(s: Structure) -> TokenStream {
     let code_into_u64 = hashes.iter().map(|h| h.code_into_u64(&params));
     let code_from_u64 = hashes.iter().map(|h| h.code_from_u64());
     let code_digest = hashes.iter().map(|h| h.code_digest());
-    let code_reader = hashes.iter().map(|h| h.code_reader());
+    let code_reader_core = hashes.iter().map(|h| h.code_reader());
+    let code_reader_std = hashes.iter().map(|h| h.code_reader());
 
     quote! {
         /// A Multihash with the same allocated size as the Multihashes produces by this derive.
@@ -253,11 +254,22 @@ pub fn multihash(s: Structure) -> TokenStream {
                 }
             }
 
-            fn digest_reader<R: #io_path::Read>(&self, reader: &mut R) -> #io_path::Result<Multihash> {
-                use #io_path;
+            #[cfg(feature = "std")]
+            fn digest_reader<R: std::io::Read>(&self, reader: &mut R) -> std::io::Result<Multihash> {
+                use std::io;
                 use #mh_crate::Hasher;
                 match self {
-                    #(#code_reader,)*
+                    #(#code_reader_std,)*
+                    _ => unreachable!(),
+                }
+            }
+
+            #[cfg(not(feature = "std"))]
+            fn digest_reader<R: core2::io::Read>(&self, reader: &mut R) -> core2::io::Result<Multihash> {
+                use core2::io;
+                use #mh_crate::Hasher;
+                match self {
+                    #(#code_reader_core,)*
                     _ => unreachable!(),
                 }
             }

@Stebalien (Member Author)

@vmx so, that brings up an interesting question.

The first two are "safe" (except for the identity hash) because we know the digests will fit. But the last one isn't, as a digest could have an arbitrary size. This isn't a new issue, but it would be nice to fix it while we're changing these APIs.

Options:

  • Return a Result (a rough sketch follows below).
  • Remove the method. The user can always call Multihash::wrap(code.into(), digest).

Thoughts?
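
A rough sketch of the first option (assuming the crate's existing Error type):

fn wrap(&self, digest: &[u8]) -> Result<Multihash<S>, Error> {
    // Surface the size check instead of panicking when the digest does not
    // fit into the S bytes this Multihash type allocates.
    Multihash::wrap((*self).into(), digest)
}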

@Stebalien (Member Author)

For now I'm returning a result.

@vmx (Member) commented Nov 9, 2021

The first two are "safe" (except for the identity hash) because we know the digests will fit.

I'm not sure about this. In the default table we would know. But if someone defines their own table (and people really should), then they could define an allocation size that is smaller than what the hasher returns. Multihash::wrap() would then return an error, and the unwrap would lead to a panic.

This behaviour might be fine, but we need documentation for the panic cases.

@Stebalien (Member Author)

You know what, I think I have a good solution. I'm going to codegen tests.

@Stebalien (Member Author)

So, actually, what I can do is:

  1. Have hashers specify a maximum digest size (a possible shape is sketched below).
  2. Statically assert that the maximum digest size of all hashers is less than the size of the Multihash type. That's basically what we used to do anyways.
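
A possible shape for step 1 (the constant name and exact trait surface are assumptions, not the final API), roughly matching the borrow-the-digest design described at the top of this PR:

pub trait Hasher {
    /// Upper bound on the number of bytes finalize() can return; the derive
    /// can compare this constant against the size allocated by the Multihash type.
    const MAX_SIZE: usize;

    /// Feed more input into the hasher.
    fn update(&mut self, input: &[u8]);
    /// Borrow the digest computed so far.
    fn finalize(&mut self) -> &[u8];
}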

@vmx (Member) commented Jan 26, 2022

2. Statically assert that the maximum digest size of all hashers is less than the size of the Multihash type. That's basically what we used to do anyways.

@Stebalien what exactly do you mean by "statically assert"? At compile time, or at run-time via assert!()? If it's at compile time, could you give me a hint on how to do that? Then I'll try to implement it.

@Stebalien (Member Author)

I mean at compile time. I can't remember exactly what I was planning, but I think it was something like: when deriving MultihashDigest, emit some static assertions that Digest::SIZE <= target_multihash_size.

But I can't remember exactly how I intended to do it. Something like:

trait Digest {
    const SIZE: usize;
}

struct MyDigest;

impl Digest for MyDigest {
    const SIZE: usize = 20;
}

trait Assert<const T: bool> {
    fn assert() {}
}

impl Assert<true> for () {}

fn main() {
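    // Compiles only while the const generic argument evaluates to `true`;
    // when it is `false`, `()` does not implement `Assert<false>` and rustc
    // reports a (somewhat cryptic) unsatisfied trait bound error.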
    <() as Assert<{ MyDigest::SIZE < 64 }>>::assert();
    println!("Results:")
}

But I'm not sure if there's a way to get a better error.
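
For what it's worth, a plain const assertion gives a clearer message (a sketch reusing the example types above; it assumes Rust 1.57+, where panicking in const contexts is stable):

// Emitting one of these per hasher from the derive would fail the build with
// the given message whenever a digest cannot fit.
const _: () = assert!(
    MyDigest::SIZE < 64,
    "the digest size exceeds the size allocated by the Multihash type"
);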

@vmx (Member) commented Jan 26, 2022

@mriise On one hand I don't want to drag you into yet another const-generics-related discussion; on the other hand, I'm pretty sure you would have a good idea of how to nicely check at compile time that one size is smaller than another :)

@mriise (Contributor) commented Jan 27, 2022

Here is a playground link to what is IMO the best solution right now: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=de22846e5857a2b607e884e0589e9aae

The error it puts out is a bit confusing at first glance, but it's not too terrible; it might be even better with different ident names.

As for shrinking arrays: since it can fail at runtime (assuming we are shrinking to mh.size()), it can be covered by a resize function that does runtime checks for both growing and shrinking the backing array.
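
A hypothetical sketch of such a runtime-checked resize (names and signature are illustrative only, not the crate's API):

// Copy a digest of `len` bytes into a differently sized backing array,
// failing at runtime when it does not fit the new capacity.
fn resize_digest<const FROM: usize, const TO: usize>(
    digest: &[u8; FROM],
    len: usize,
) -> Option<[u8; TO]> {
    if len > FROM || len > TO {
        return None;
    }
    let mut out = [0u8; TO];
    out[..len].copy_from_slice(&digest[..len]);
    Some(out)
}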

Use config flags instead of introducing the new attribute `io_path`.
With some compile-time asserts we can make sure that certain `unwrap()`
calls won't panic.
@vmx (Member) commented Jan 27, 2022

I've pushed two new commits (I intentionally didn't rebase this PR for easier review. Once reviewed, I'll rebase and squash it into a single commit).

The first one is about the io_path; I hope my solution is a working replacement.

The second one adds those compile-time asserts. Thanks @mriise, your playground link was really useful.

@vmx requested a review from mriise on January 27, 2022.
@vmx (Member) commented Jan 27, 2022

@mriise I've added you as a reviewer in a few places where I'd appreciate your input. Please don't feel obliged to do a review, though.

@@ -253,11 +245,22 @@ pub fn multihash(s: Structure) -> TokenStream {
                 }
             }

-            fn digest_reader<R: #io_path::Read>(&self, reader: &mut R) -> #io_path::Result<Multihash> {
-                use #io_path;
+            #[cfg(feature = "std")]
Member Author:

This gets expanded in the importing crate, not in the multihash-derive crate. That means if the importing crate doesn't have an std feature, we'll use core2.

Member:

I changed it so that it is expanded at the multihash-derive level. Would that work?

@vmx (Member) commented Jan 27, 2022

While testing the std/core2 expansion, I found out that core2 only supports io::copy on nightly Rust. @Stebalien is that a problem?

Perhaps we should also check that the examples compile without the std feature. I discovered the issue by running:

cargo build --example custom_table --no-default-features --features derive,secure-hashes,alloc

@Stebalien (Member Author)

While testing the std/core2 expansion, I found out that core2 only supports io::copy on nightly Rust. @Stebalien is that a problem?

Hm. Yes, it is. Especially because the unstable API is being replaced. Maybe upstream would accept a stable version? There's no good reason it can't be supported on stable.
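
As a point of reference, a stable equivalent of io::copy is small. A sketch, assuming core2's Read/Write mirror std::io (the buffer size is arbitrary, and unlike std's copy this does not retry on Interrupted):

use core2::io::{Read, Result, Write};

// Pump bytes through a fixed stack buffer until the reader reports EOF.
fn copy<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> Result<u64> {
    let mut buf = [0u8; 512];
    let mut written = 0u64;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(written);
        }
        writer.write_all(&buf[..n])?;
        written += n as u64;
    }
}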

@vmx (Member) commented Jan 28, 2022

Maybe upstream would accept a stable version?

I've opened technocreatives/core2#17, let's see what happens.

@vmx (Member) commented Jan 28, 2022

Solving this properly will probably take a while. What do you folks think about releasing current master and leaving this one for the release after (especially since this change is just an addition, without breaking APIs)?

@Stebalien (Member Author)

Yeah, that makes sense.

@Leena8686 left a comment:

Add support for hashing from a reader please. Thanks
