Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: greatly simplify rusb usage #129

Merged
merged 8 commits into from
Dec 2, 2023
Merged

Conversation

prestwich
Copy link
Member

No description provided.

@prestwich prestwich added the bug Something isn't working label Nov 29, 2023
@prestwich prestwich self-assigned this Nov 29, 2023
@prestwich prestwich linked an issue Nov 29, 2023 that may be closed by this pull request
Copy link
Contributor

@DaniPopes DaniPopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@prestwich
Copy link
Member Author

leaving this open so that @naps62 and @xJonathanLEI can check it before merge 👍

@naps62
Copy link

naps62 commented Nov 30, 2023

Allright. this seems to fix both issues for me, although it still has one less serious wrinkle.

nevermind. there's still a panic if I try to instantiate a ledger twice while the device is locked (1st call fails with Err(UnexpectedNullResponse). Second call panics at hid.rs:143

Other than that, there's still a less serious wrinkle. the code below, without the sleep, fails 80% of the time on the second call (Err, not panic). adding the sleep makes it work more consistently.
updated my reproduction repo to showcase this

#[tokio::main]
async fn main() {
    // first call, works
    let _ = dbg!(call_ledger().await);

    // tokio::time::sleep(tokio::time::Duration::from_millis(10)).await;

    // second call, if done without sleep first, fails with `hid_error is not implemented yet`
    let _ = dbg!(call_ledger().await);
}

pub(crate) async fn call_ledger() -> Result<Ledger, LedgerError> {
    let path: String = "m/44'/60'/0'/0/0".into();
    Ledger::new(HDPath::Other(path), 1).await
}

including the sleep makes this work consistently.
without the sleep, the 2nd call will fail ~80% of the time:

[src/main.rs:11] call_ledger().await = Ok(
    LedgerEthereum {
        transport: Mutex {
            is_locked: false,
            has_waiters: false,
        },
        derivation: Other(
            "m/44'/60'/0'/0/0",
        ),
        chain_id: 1,
        address: ...........,
    },
)
[src/main.rs:12] call_ledger().await = Err(
    LedgerError(
        NativeTransportError(
            CantOpen(
                HidApiError {
                    message: "hid_error is not implemented yet",
                },
            ),
        ),
    ),
)

@xJonathanLEI
Copy link
Contributor

Will check whether this works on Android tomorrow and revert.

@prestwich
Copy link
Member Author

nevermind. there's still a panic if I try to instantiate a ledger twice while the device is locked (1st call fails with Err(UnexpectedNullResponse). Second call panics at hid.rs:143

Are you sure the line number is correct? that line is currently first_ledger(api).map(Self::from_device) which should not be capable of panicking

Other than that, there's still a less serious wrinkle. the code below, without the sleep, fails 80% of the time on the second call (Err, not panic). adding the sleep makes it work more consistently.

This sounds like it might be a race condition between instnatiating the new task and releasing the resource in the old task? I'll play around with it

@naps62
Copy link

naps62 commented Nov 30, 2023

not sure what I was on about with the panic thing. seems gone now, can't reproduce even pointing to the same rev. I'll assume it was still the sleep deprivation

2nd issue (two successive calls without waiting) still exists, although I can work with it already, if you feel you need to merge this and research that one later

thanks very much for the support btw 🙏

@xJonathanLEI
Copy link
Contributor

Just tested 9f86de4 on Android and it works perfectly. So far no issue found.

@xJonathanLEI
Copy link
Contributor

Actually, no. I made a mistake and tested the wrong code. It actually no longer compiles on Android. I will look into it.

@xJonathanLEI
Copy link
Contributor

Just sent #130 to fix the compilation error. It's trivial.

@xJonathanLEI
Copy link
Contributor

xJonathanLEI commented Dec 1, 2023

However, even with the compilation fix, it still doesn't work properly on Android. I'm testing this crate through ethers-signers so it might be hard to immediately see the cause though.

Specifically, dumping Ethereum addresses from paths works fine. However, when requesting a transaction signature, I'm getting this error:

Response too short. Expected at least 2 bytes. Got []

Again I'm not even sure whether ethers-signers or coins-ledger throws this error. Will need to look deeper into things.

(Just to make sure, I also tested against the current main, which works perfectly. So the issue does come from this PR.)

@naps62
Copy link

naps62 commented Dec 1, 2023

Response too short. Expected at least 2 bytes. Got []

I also get these randomly from time to time (even before this PR). have you tried multiple times, to make sure it's specifically from this commit, and not a fluke run?

@xJonathanLEI
Copy link
Contributor

Yeah I tried multiple times. In fact, I've never encountered this error even once before this PR.

@prestwich
Copy link
Member Author

is there an easy way for me to replicate your test environment without buying an android device @xJonathanLEI? Would be nice to check things locally 😅

@naps62
Copy link

naps62 commented Dec 1, 2023

I have an android, and am willing to do some debugging to narrow it down
If you have a way to replicate I can give it a try

@xJonathanLEI
Copy link
Contributor

@prestwich Usually it's possible to check compilation etc via NDK, but since this is hardware stuff it might be hard to do without the actual hardware 😅.

@naps62 Are you able to reproduce the same bug? I can make a temp repo/gist for this if that's helpful. To make it easier I will use coins-ledger directly for the repro.

@naps62
Copy link

naps62 commented Dec 1, 2023

Are you able to reproduce the same bug? I can make a temp repo/gist for this if that's helpful. To make it easier I will use coins-ledger directly for the repro.

I saw this bug a few times, but seems mostly at random, and never frequently enough to debug.
if it's as you say and it happens consistently on android, I'd be happy to try your repo

I guess there's a lot of variance depending on hardware in all this (so far I tested only on Linux)

@prestwich
Copy link
Member Author

Response too short. Expected at least 2 bytes. Got []

So for clarity, this indicates that the read from the hid device was successful, but no data was available. It actually indicates that the device returned a succesful APDU answer with 0 bytes. If there was not a succesful answer, or if reading of that answer had failed, we would expect a NativeTransportError

So it is strange to me that we're getting Ok(vec![]) when we're expecting a non-zero amount of data

@xJonathanLEI
Copy link
Contributor

Working on the reproducer and noticed that like dumping addresses, signing messages also works perfectly.

@xJonathanLEI
Copy link
Contributor

xJonathanLEI commented Dec 2, 2023

@naps62 The reproducer is now available: xJonathanLEI/coins-android-bug-repro.

@prestwich Turns out the error occurs when calling transport.exchange(), returning a ResponseTooShort([]) error. So it's not the case that the device returns an empty successful message as we thought before.

(Edit: I re-read your comment. Yes it's indeed the case that the native transport is successfully returning an empty answer. The error comes from from_answer() where the response gets parsed.)

Again note that this issue only occurs when signing a transaction but not a message. I originally coded the reproducer to sign a "hellow world" message but it ended up succeeding both on main and on this PR lol.

@prestwich
Copy link
Member Author

Alright, so i can repro without android. Will be diving into this today 🎉

@prestwich
Copy link
Member Author

okay I tracked this down to an off-by-one in the write buffer that was sending the device incorrect data. I am still unsure why this was significant only for certain instructions, but it seems to be fixed in the latest commit. Would you mind verifying on android for me? :D

@xJonathanLEI
Copy link
Contributor

@prestwich I updated rev to the current HEAD (9891bd9) and it works - the transaction to zero address can now be correctly signed. Haven't tested signing personal messages or dumping addresses but I guess those should work fine too.

@xJonathanLEI
Copy link
Contributor

Tested dumping addresses and signing a "hello world" message - both work perfectly. All good on my side now. Thanks!

@prestwich prestwich merged commit 87027bd into main Dec 2, 2023
6 checks passed
@prestwich
Copy link
Member Author

thanks y'all! I've just published at 0.9.2 and yanked 0.9.1 and 0.9.0 🎉

@naps62
Copy link

naps62 commented Dec 2, 2023

FYI, still transient errors with two consecutive invocations. but something I can live with, and probably debug myself later 👍 thanks for the release

@prestwich
Copy link
Member Author

I'll address the race condition in 0.9.3 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ledger: panic + poisoned mutex after new updated
4 participants