Add support for hashing from a reader #159

Open · wants to merge 8 commits into base: feat/no-digest

Conversation

@Stebalien (Member)

fixes #141

The io import stuff is a bit annoying, but I can't find a better way to do this.

Remove the per-hasher digest type. Instead, store hash digests inside
the hashers and "borrow" them. In all cases, we're going to copy them into a
`Multihash<S>` anyway.

This:

1. Removes a bunch of code.
2. Means that hashers don't need to be generic over the size (unless
they actually support multiple sizes). This fixes the UX issue
introduced in the const generics PR.
3. Avoids some copying.

BREAKING CHANGE

1. `Hasher.digest` no longer exists. Users should use
`Code::SomeCode.digest` where possible (usage sketched below).
2. The hasher digests no longer exist.
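
For illustration, a rough sketch of how the two entry points fit together after this change (crate paths and the `Sha2_256` code are just examples; `digest_reader` follows the signature added in this PR's diff):

use multihash::{Code, MultihashDigest};
use std::fs::File;
use std::io;

fn main() -> io::Result<()> {
    // Hash an in-memory buffer: the per-code entry point that replaces Hasher::digest.
    let hash = Code::Sha2_256.digest(b"hello world");
    println!("{:02x?}", hash);

    // Hash from any io::Read without buffering the whole input first:
    // the new entry point added by this PR.
    let mut file = File::open("data.bin")?;
    let hash = Code::Sha2_256.digest_reader(&mut file)?;
    println!("{:02x?}", hash);
    Ok(())
}
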
quote!(Self::#ident => {
    let mut hasher = #hasher::default();
    io::copy(reader, &mut hasher)?;
    Ok(Multihash::wrap(#code, hasher.finalize()).unwrap())
Member:

Can you please add a comment explaining why it is OK to use unwrap() here (or use an expect() that makes the reason clear)?
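
One possible way to address this (a sketch, not the code in this PR; it assumes the compile-time size asserts discussed later in the thread):

quote!(Self::#ident => {
    let mut hasher = #hasher::default();
    io::copy(reader, &mut hasher)?;
    // Cannot panic: the derive statically asserts that every hasher's digest
    // fits into the size allocated by this Multihash type.
    Ok(Multihash::wrap(#code, hasher.finalize())
        .expect("digest size is checked at compile time"))
})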

fn digest(&self, input: &[u8]) -> Multihash<S>;
fn digest(&self, input: &[u8]) -> Multihash<S> {
    let mut input = input;
    self.digest_reader(&mut input).unwrap()
Member:

Same here, please add a comment or use expect() to make it clear why unwrap() is OK here.
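
A sketch of what that could look like for this default impl (relying on the fact that the io::Read impl for byte slices never returns an error):

fn digest(&self, input: &[u8]) -> Multihash<S> {
    let mut input = input;
    // Reading from an in-memory slice cannot fail, so the io::Result returned
    // by digest_reader is always Ok here.
    self.digest_reader(&mut input)
        .expect("reading from a slice never fails")
}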

src/multihash.rs (outdated)
@@ -49,7 +69,9 @@ pub trait MultihashDigest<const S: usize>:
/// let hash = Code::Sha3_256.wrap(&hasher.finalize());
/// println!("{:02x?}", hash);
/// ```
fn wrap(&self, digest: &[u8]) -> Multihash<S>;
fn wrap(&self, digest: &[u8]) -> Multihash<S> {
    Multihash::wrap((*self).into(), digest).unwrap()
Member:

Same here, please add a comment or use expect() to make it clear why unwrap() is OK here.

@vmx (Member) commented Nov 9, 2021

Regarding the io imports: this seems to work:

+++ b/derive/src/multihash.rs
@@ -238,7 +238,8 @@ pub fn multihash(s: Structure) -> TokenStream {
     let code_into_u64 = hashes.iter().map(|h| h.code_into_u64(&params));
     let code_from_u64 = hashes.iter().map(|h| h.code_from_u64());
     let code_digest = hashes.iter().map(|h| h.code_digest());
-    let code_reader = hashes.iter().map(|h| h.code_reader());
+    let code_reader_core = hashes.iter().map(|h| h.code_reader());
+    let code_reader_std = hashes.iter().map(|h| h.code_reader());
 
     quote! {
         /// A Multihash with the same allocated size as the Multihashes produces by this derive.
@@ -253,11 +254,22 @@ pub fn multihash(s: Structure) -> TokenStream {
                 }
             }
 
-            fn digest_reader<R: #io_path::Read>(&self, reader: &mut R) -> #io_path::Result<Multihash> {
-                use #io_path;
+            #[cfg(feature = "std")]
+            fn digest_reader<R: std::io::Read>(&self, reader: &mut R) -> std::io::Result<Multihash> {
+                use std::io;
                 use #mh_crate::Hasher;
                 match self {
-                    #(#code_reader,)*
+                    #(#code_reader_std,)*
+                    _ => unreachable!(),
+                }
+            }
+
+            #[cfg(not(feature = "std"))]
+            fn digest_reader<R: core2::io::Read>(&self, reader: &mut R) -> core2::io::Result<Multihash> {
+                use core2::io;
+                use #mh_crate::Hasher;
+                match self {
+                    #(#code_reader_core,)*
                     _ => unreachable!(),
                 }
             }

@Stebalien (Member Author)

@vmx so, that brings up an interesting question.

The first two are "safe" (except for the identity hash) because we know the digests will fit. But the last one isn't, as a digest could have an arbitrary size. This isn't a new issue, but it would be nice to fix it while we're changing these APIs.

Options:

  • Return a Result (a rough sketch follows below).
  • Remove the method. The user can always call Multihash::wrap(code.into(), digest).

Thoughts?
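
A rough sketch of the first option (assuming the crate's existing Error type):

fn wrap(&self, digest: &[u8]) -> Result<Multihash<S>, Error> {
    // Surface the size check instead of panicking when the digest does not
    // fit into the S bytes this Multihash type allocates.
    Multihash::wrap((*self).into(), digest)
}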

@Stebalien (Member Author)

For now I'm returning a result.

@vmx (Member) commented Nov 9, 2021

The first two are "safe" (except for the identity hash) because we know the digests will fit.

I'm not sure about this. In the default table we would know. But if someone defines their own table (and people really should), then they could define an allocation size that is smaller than what the hasher returns. Multihash::wrap() would then return an error, and the unwrap would lead to a panic.

This behaviour might be fine, but we need documentation for the panic cases.

@Stebalien (Member Author)

You know what, I think I have a good solution. I'm going to codegen tests.

@Stebalien (Member Author)

So, actually, what I can do is:

  1. Have hashers specify a maximum digest size (a possible shape is sketched below).
  2. Statically assert that the maximum digest size of all hashers is less than the size of the Multihash type. That's basically what we used to do anyways.
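
A possible shape for step 1 (the constant name and exact trait surface are assumptions, not the final API), roughly matching the borrow-the-digest design described at the top of this PR:

pub trait Hasher {
    /// Upper bound on the number of bytes finalize() can return; the derive
    /// can compare this constant against the size allocated by the Multihash type.
    const MAX_SIZE: usize;

    /// Feed more input into the hasher.
    fn update(&mut self, input: &[u8]);
    /// Borrow the digest computed so far.
    fn finalize(&mut self) -> &[u8];
}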

@vmx (Member) commented Jan 26, 2022

2. Statically assert that the maximum digest size of all hashers is less than the size of the Multihash type. That's basically what we used to do anyways.

@Stebalien what exactly do you mean by "statically assert"? At compile time, or at run-time via assert!()? If it's at compile time, could you give me a hint on how to do that? Then I'll try to implement it.

@Stebalien (Member Author)

I mean at compile time. I can't remember exactly what I was planning, but I think it was something like: when deriving MultihashDigest, emit some static assertions that Digest::SIZE <= target_multihash_size.

But I can't remember exactly how I intended to do it. Something like:

trait Digest {
    const SIZE: usize;
}

struct MyDigest;

impl Digest for MyDigest {
    const SIZE: usize = 20;
}

trait Assert<const T: bool> {
    fn assert() {}
}

impl Assert<true> for () {}

fn main() {
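    // Compiles only while the const generic argument evaluates to `true`;
    // when it is `false`, `()` does not implement `Assert<false>` and rustc
    // reports a (somewhat cryptic) unsatisfied trait bound error.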
    <() as Assert<{ MyDigest::SIZE < 64 }>>::assert();
    println!("Results:")
}

But I'm not sure if there's a way to get a better error.
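
For what it's worth, a plain const assertion gives a clearer message (a sketch reusing the example types above; it assumes Rust 1.57+, where panicking in const contexts is stable):

// Emitting one of these per hasher from the derive would fail the build with
// the given message whenever a digest cannot fit.
const _: () = assert!(
    MyDigest::SIZE < 64,
    "the digest size exceeds the size allocated by the Multihash type"
);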

@vmx (Member) commented Jan 26, 2022

@mriise On one hand I don't want to drag you into yet another const-generics-related discussion; on the other hand, I'm pretty sure you would have a good idea of how to nicely check at compile time that one size is smaller than another :)

@mriise (Contributor) commented Jan 27, 2022

Here is a playground link to what is IMO the best solution right now: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=de22846e5857a2b607e884e0589e9aae

The error it puts out is a bit confusing at first glance, but it's not too terrible; it might be even better with different ident names.

As for shrinking arrays: since it can fail at runtime (assuming we are shrinking to mh.size()), it can be covered by a resize function that does runtime checks for both growing and shrinking the backing array.
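
A hypothetical sketch of such a runtime-checked resize (names and signature are illustrative only, not the crate's API):

// Copy a digest of `len` bytes into a differently sized backing array,
// failing at runtime when it does not fit the new capacity.
fn resize_digest<const FROM: usize, const TO: usize>(
    digest: &[u8; FROM],
    len: usize,
) -> Option<[u8; TO]> {
    if len > FROM || len > TO {
        return None;
    }
    let mut out = [0u8; TO];
    out[..len].copy_from_slice(&digest[..len]);
    Some(out)
}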

Use config flags instead of introducing the new attribute `io_path`.
With some compile-time asserts we can make sure that certain `unwrap()`
calls won't panic.
@vmx (Member) commented Jan 27, 2022

I've pushed two new commits (I intentionally didn't rebase this PR for easier review. Once reviewed, I'll rebase and squash it into a single commit).

The first one is about the io_path; I hope my solution is a working replacement.

The second one adds those compile-time asserts. Thanks @mriise, your playground link was really useful.

@vmx requested a review from mriise on January 27, 2022.
@vmx (Member) commented Jan 27, 2022

@mriise I've added you as a reviewer in a few places where I'd appreciate your input. Please don't feel obliged to do a review, though.

@@ -253,11 +245,22 @@ pub fn multihash(s: Structure) -> TokenStream {
                 }
             }

-            fn digest_reader<R: #io_path::Read>(&self, reader: &mut R) -> #io_path::Result<Multihash> {
-                use #io_path;
+            #[cfg(feature = "std")]
Member Author:

This gets expanded in the importing crate, not in the multihash-derive crate. That means if the importing crate doesn't have an std feature, we'll use core2.

Member:

I changed it so that it is expanded at the multihash-derive level. Would that work?

@vmx (Member) commented Jan 27, 2022

While testing the std/core2 expansion, I found out that core2 only supports io::copy on nightly Rust. @Stebalien is that a problem?

Perhaps we should also check that the examples compile without the std feature. I discovered the issue by running:

cargo build --example custom_table --no-default-features --features derive,secure-hashes,alloc

@Stebalien (Member Author)

While testing the std/core2 expansion, I found out that core2 only supports io::copy on nightly Rust. @Stebalien is that a problem?

Hm. Yes, it is. Especially because the unstable API is being replaced. Maybe upstream would accept a stable version? There's no good reason it can't be supported on stable.
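
As a point of reference, a stable equivalent of io::copy is small. A sketch, assuming core2's Read/Write mirror std::io (the buffer size is arbitrary, and unlike std's copy this does not retry on Interrupted):

use core2::io::{Read, Result, Write};

// Pump bytes through a fixed stack buffer until the reader reports EOF.
fn copy<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> Result<u64> {
    let mut buf = [0u8; 512];
    let mut written = 0u64;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(written);
        }
        writer.write_all(&buf[..n])?;
        written += n as u64;
    }
}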

@vmx (Member) commented Jan 28, 2022

Maybe upstream would accept a stable version?

I've opened technocreatives/core2#17, let's see what happens.

@vmx (Member) commented Jan 28, 2022

Solving this properly will probably take a while. What do you folks think about releasing current master and leaving this one for the release after (especially since this change is just an addition, without breaking APIs)?

@Stebalien (Member Author)

Yeah, that makes sense.

@Leena8686 left a comment:

Add support for hashing from a reader please. Thanks
