Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Orchestrator thread keepalive #524

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Orchestrator thread keepalive #524

wants to merge 11 commits into from

Conversation

cbrit
Copy link
Member

@cbrit cbrit commented Jun 8, 2023

No description provided.

@cbrit cbrit requested review from mvid and EricBolten June 8, 2023 17:57
@cbrit cbrit temporarily deployed to CI June 8, 2023 18:20 — with GitHub Actions Inactive
@cbrit cbrit temporarily deployed to CI June 8, 2023 18:20 — with GitHub Actions Inactive
@cbrit cbrit temporarily deployed to CI June 8, 2023 18:20 — with GitHub Actions Inactive
Copy link
Contributor

@EricBolten EricBolten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments and questions.

gas_adjustment: f64,
msg_batch_size: usize,
) {
while let Some(messages) = rx.recv().await {
while let Some(messages) = rx.lock().await.recv().await {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we creating a lock around the tokio receiver? The async channels should be able to be filled and drained without having to have synchronous locking.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It comes down to satisfying the borrow checker while passing in the receiver to run_sender inside of the loop inside of send_main_loop.

To get the compiler to let me pass the receiver into each loop iteration (each cosmos sender thread restart) I have to wrap the receiver in an atomic reference counter so that I can clone without making a duplicate of the receiver. However, Arc<T> doesn't have the DerefMut trait, and the recv() method requires that the value be a mutable reference. To get a mutable reference of the contained object in the Arc, I have to guarantee that one and only one mutable reference exists at a time, which is where the lock comes in.

@@ -105,6 +110,8 @@ pub async fn orchestrator_main_loop(
} else {
futures::future::join4(a, b, c, d).await;
}

Ok(())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is now returning a Result, should any of the other calls within this function use the ? operator as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for metrics, which still has a naked unwrap on the metrics server result, the design of each function is such that it should only ever stop if it panics, and I believe I've addressed the library-level panic scenarios.

@@ -408,14 +456,3 @@ pub async fn eth_signer_main_loop(
);
}
}

pub async fn check_for_eth(orchestrator_address: EthAddress, eth_client: EthClient) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was an existing issue around the orchestrator not functioning correctly if a tiny amount of eth had not been deposited into the delegate key wallet. Did that stem from this error, or is it a problem regardless and that's why this check existed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I removed this because it's a duplicate. There's another copy, the one that's actually used, in gravity_utils/connection_prep.rs

The only place not in gorc that it's used is when running the relayer


let a = send_main_loop(
&contact,
contact.clone(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of cloning contact rather than using the same instance as it was doing before? Is there a concern around multiple threads using it and account sequencing issues? I do wonder if cloning the object isn't just going to create multiple versions of e.g. previous account sequence state, rather than recreating a new Contact when it's needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contact object doesn't maintain account state, but actually queries the required tx args every time it builds a transaction. See send_messages, specifically the call to .get_message_args(...) is where this happens.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, in regards to the real purpose of your question: it's another borrow checker/lifetime thing.

@@ -1,41 +0,0 @@
[package]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there's also a go.mod for the test runner that we can drop as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The testnet directory in the root of the repo? Should I remove the whole thing?

@@ -304,7 +355,7 @@ pub async fn eth_signer_main_loop(
}

// sign the last unsigned valsets
match get_oldest_unsigned_valsets(&mut grpc_client, our_cosmos_address).await {
match get_oldest_unsigned_valsets(&mut grpc_client, cosmos_address.clone()).await {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the cosmos address should probably be immutable in most places we're using it, do we need to be cloning this object or can we adjust to using a reference?

Copy link
Member Author

@cbrit cbrit Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, turns out Address implements copy, so neither .clone() nor an explicit borrow are needed

@cbrit cbrit temporarily deployed to CI August 1, 2023 21:16 — with GitHub Actions Inactive
@cbrit cbrit temporarily deployed to CI August 1, 2023 21:16 — with GitHub Actions Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants