Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a way for a valid implementation of sv2 to handle unknown extensions #95

Open
Fi3 opened this issue Aug 28, 2024 · 23 comments
Open
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@Fi3
Copy link
Contributor

Fi3 commented Aug 28, 2024

Here https://github.com/stratum-mining/sv2-spec/blob/cc291562b729a6be28673940ffede3cd8c64f996/03-Protocol-Overview.md#34-protocol-extensions
the specs says:

Extensions MUST require version negotiation with the recipient of the message to check that the extension is supported before sending non-version-negotiation messages for it. This prevents the needlessly wasted bandwidth and potentially serious performance degradation of extension messages when the recipient does not support them.

But nowhere in the spec we say what how to achieve that.

@jakubtrnka proposed to add an universal extension error message, for example when a valid sv2 implementation receive a message for an unknown extension can answer with:

Frame { extension_type, msg_type: 0xff, msg_length: 0, payload: [] }

I think that this is good solution and that we should add it to the spec.

@Fi3 Fi3 added help wanted Extra attention is needed question Further information is requested labels Aug 28, 2024
@jakubtrnka
Copy link
Collaborator

That's what just crossed my mind at that point. It's one way to do it.

But let's go over what the problem was again, shall we?

I guess the problem is "I don't know if the counterpary does support my extension number X"

So what possibilities do we have?

  1. specs says it must do some kind of version negotiation. That implicitly says we should assume the peer doesn't support my extension X unless It receives a positive acknowledgement. So do we even need kind of negative acknowledgement?
  2. Do we even need this clause in the spec? Extension MUST require version negotiation.... Why can't the extensions be fully generic and left up to the implementers? Why do we have to say so vaguely "you must implement some mechanism" but not say how exactly? Doesn't make any sense to me. If anything, then I'd write is as a recommendation, not a requirement.
  3. Perhaps there might be some explicit mechanism specified in the base specification. And here I can think of several mechanisms:
    1. Every extension would mandatorily implement 2 messages: LoadExtension{id == 0}, LoadExtensionSuccess { id == 0xff } (or any numbers) - assume a negative until Success arrives. The spec would describe a mandatory behavior: Party that receives message from an unknown extension would have to ignore the message. It must not respond in any way or raise any error and affect the operation of other extensions. If it does support given extension, it must send a Success response message (msg_id == 0xff). Perhaps the LoadExtension msg would have some kind of extension signature in the payload - to ensure there is not a collision in the namespaces (naively a version number, or fully generic string, or empty string for optimistic engineers)
    2. Same as (1) but also add an Error message somewhere - introduces a 3rd state of "pending". So an extension can be either "supported", "unsupported", or with unknown support because it awaits a response. Seems unnecessary to me.
    3. If the extensions are "official" and listed in the specification, then there might be a general unified mechanism. For example "Tell me what all extensions you do support, in which versions". This would require some centrally managed list of extensions. I don't think it would work well in long term.

@jakubtrnka
Copy link
Collaborator

there might be other approaches.

@jakubtrnka
Copy link
Collaborator

Perhaps another policy. If the counterparty responds with anything else than LoadExtensionSuccess or LoadExtension with the same (or compatible) signature, then it would automatically mean negative acknowledgement - because clearly the counterparty did not follow the protocol and does something unexpected/unknown.
Then the party might terminate the session

@Fi3
Copy link
Contributor Author

Fi3 commented Aug 28, 2024

That's what just crossed my mind at that point. It's one way to do it.

But let's go over what the problem was again, shall we?

I guess the problem is "I don't know if the counterpary does support my extension number X"

So what possibilities do we have?

1. specs says it must do some kind of version negotiation. That implicitly says we should assume the peer **doesn't** support my extension X unless It receives a positive acknowledgement. So do we even need kind of negative acknowledgement?

2. Do we even need this clause in the spec? `Extension MUST require version negotiation...`. Why can't the extensions be fully generic and left up to the implementers? Why do we have to say so vaguely "you must implement some mechanism" but not say how exactly? Doesn't make any sense to me. If anything, then I'd write is as a recommendation, not a requirement.

3. Perhaps there might be some explicit mechanism specified in the base specification. And here I can think of several mechanisms:
   
   1. Every extension would mandatorily implement 2 messages: `LoadExtension{id == 0}`, `LoadExtensionSuccess { id == 0xff }` (or any numbers) - assume a negative until Success arrives. The spec would describe a mandatory behavior: Party that receives message from an unknown extension would have to ignore the message. It must not respond in any way or raise any error and affect the operation of other extensions. If it **does** support given extension, it must send a Success response message (msg_id == 0xff). Perhaps the LoadExtension msg would have some kind of extension signature in the payload - to ensure there is not a collision in the namespaces (naively a version number, or fully generic string, or empty string for optimistic engineers)
   2. Same as (1) but also add an Error message somewhere - introduces a 3rd state of "pending". So an extension can be either "supported", "unsupported", or with unknown support because it awaits a response. Seems unnecessary to me.
   3. If the extensions are "official" and listed in the specification, then there might be a general unified mechanism. For example "Tell me what all extensions you do support, in which versions". This would require some centrally managed list of extensions. I don't think it would work well in long term.
  1. nack is preferable imo so that we dont' have to introduce arbitrary timeouts to know if ext is supported or not
  2. if we want to have a nack for unknown ext I think that the only way is to define it at sv2 spec level
  3. I prefer the second for same reason listed above, about (III) I don't like it, I think that eventually most used ext should go in spec and in SRI as well, but if every ext must be spec and use the general unified mechanism creating ext will be harder.

@Fi3
Copy link
Contributor Author

Fi3 commented Aug 28, 2024

I really like the solution of having an universal extension error message, is very very simple, and also powerful:

  1. super easy to know if peer support ext
  2. if ext is supported you can use if needed an extension specific mechanism to version negotiation and extension setup

@jakubtrnka
Copy link
Collaborator

  1. o so that we dont' have to introduce arbitrary timeouts to know if ext is supp

I'm not sure what you mean. Why would you introduce any timeout?

The rule is simple - if you don't receive positive ACK, it means you're not allowed to proceed with the extension or otherwise your messages would be ignored.

It would also be implementation-wise simpler. If the extension handler is missing entirely and messages belonging to it are discarded, then it's already a correct behavior.
And if somebody doesn't follow the rules and start sending random messages back then the counterparty knows it NACKed the extension and must ignore their messages or terminate the tcp-session.

On the other hand if you introduce the Error. What if the counterparty doesn't respond with the error? What does it say? It's some undefined state, some "pending" and would probably be treated the same way as if Error had been received or at least very similarly.

@jakubtrnka
Copy link
Collaborator

jakubtrnka commented Aug 28, 2024

I think you're looking for some synchronous mechanism to get clear yes/no response from the counterparty, I think it's just an illusion.

It will always be an asynchronous system, since you're operating on a single shared persistent connection.

You can theoretically send the introductory message. And wait two hours. Then you receive the ACK, stating "ok - proceed with whatever you wanted to do with this ext". Or you receive the NACK, stating "nope". And in the meantime what? Some kind of maybe? What to do with that information anyway?

Or should we mandate an immediate response? What is even an "immediate"? Is that below 1 second?

just thinking out loud

@Fi3
Copy link
Contributor Author

Fi3 commented Aug 28, 2024

yep at the end worst cases are the same, but a nack is arguably faster most of the time resepect than not saying anything. If you receive a nack 99% of the time you receive it right away and you don't have to wait where if you don't receive anything 100% of the times you will have to wait a time t before knowing that ext is not supported. (I'm talking of the case where ext is not supported, if the ext is supported you just receive the ext specific ack)

@jakubtrnka
Copy link
Collaborator

I'll have to think about it.

But could you please provide some example where you need an immediate and explicit No answer?

Note you cannot rely on the fact that after sending the introductory message the next incoming would be the response. You'll always have to defer handling the response for later. And in the meantime the extension is not active.
So the question boils down to "do we need to distinguish the "pending" and "no" state somehow? And if yes would you give me an example?

From my experience with designing protocols, I always want to reduce the number of possible states to the minimum. Because the implementations do happen to be buggy. Countless of times I ended up in some kind of strange state that was supposed to last only few seconds, waiting for acknowledgement that would never arrive.

@jakubtrnka
Copy link
Collaborator

I don't feel comfortable designing stuff based on assumption that "it should work in 99 % times".

@Fi3
Copy link
Contributor Author

Fi3 commented Aug 28, 2024

My point was that 99% work faster 1% work at the same speed of not having the nack message.

With NACK

send activate_extension
wait activate_extension_ok timer:
    if timer.finished -> extension not active
    if activate_extension_ok -> extension active
    if universal nack -> extension not active

If remote do not support extension:
99% of the time remote is not busy or whatever and send nack
1% of the time remote take longer or not answer and after timer you can either resend message, do nothing, close connection

Without NACK

send activate_extension
wait activate_extension_ok timer:
    if timer.finished -> extension not active
    if activate_extension_ok -> extension active

If remote do not support extension:
100% of the time you wait timer and then you say that remote do not support ext, but could be also that is busy or whatever

@jakubtrnka
Copy link
Collaborator

Well, this is not exactly what I meant.

I was wondering if there is some use case in which you would need this logic? Putting a timer and synchronously test for support.

typical pattern I use is like

struct State {
  ext1: false,
  ext2: false,
...
}

let ext1_handler = Ext1::handler();
let ext2_handler = Ext2::handler();

connection.send(activate_ext1_msg);
connection.send(activate_ext2_msg);

loop {
  incoming = connection.receive();
  match (incoming.extension, incoming.type) {
    (0, msg) => ordinary_mining.handle(msg), // ordinary mining msg
    (Ext1, ACK) => { state.ext1 = true; ext1_handler.spawn_some_associated_task(); } // mark as active and start some associated task needed for extension1.
    (Ext2, ACK) => { state.ext2 = true; ext2_handler.spawn_some_associated_task(); } // the same but for extension2
    (Ext1, msg) if state.ext1 => ext1_handler.handle(msg), // handle extension1 message only if extension is active.
    (Ext2, msg) if state.ext2 => ext2_handler.handle(msg),
    unexpected  => { warn!("Unexpected message {unexpected}"); } // message not captured by any handler
  }
}

I se no need for NACK message

@Fi3
Copy link
Contributor Author

Fi3 commented Aug 28, 2024

I see your point. For example I could have an extension that modify how the Mining protocol work that you could model with something like that:

Mining Protocol downstream <-> proxy support ext and can work with it or without<-> upstream can either support the ext or not

If the proxy must know before opening the downstream connection if upstream support ext or not you could need a timer. Just thinking out loud, I know that the above example is not very convincing.

@Fi3
Copy link
Contributor Author

Fi3 commented Aug 28, 2024

looking at your example:

struct State {
  ext1: false,
  ext2: false,
...
}

let ext1_handler = Ext1::handler();
let ext2_handler = Ext2::handler();

connection.send(activate_ext1_msg);
connection.send(activate_ext2_msg);

loop {
  incoming = connection.receive();
  match (incoming.extension, incoming.type) {
    (0, msg) => ordinary_mining.handle(msg), // ordinary mining msg
    (Ext1, ACK) => { state.ext1 = true; ext1_handler.spawn_some_associated_task(); } // mark as active and start some associated task needed for extension1.
    (Ext2, ACK) => { state.ext2 = true; ext2_handler.spawn_some_associated_task(); } // the same but for extension2
    (Ext1, msg) if state.ext1 => ext1_handler.handle(msg), // handle extension1 message only if extension is active.
    (Ext2, msg) if state.ext2 => ext2_handler.handle(msg),
    unexpected  => { warn!("Unexpected message {unexpected}"); } // message not captured by any handler
  }
}

Seems that the univarsal nack could create some issue btw. You would need to activate ext1 and ext2 synchronously otherwise you don't know which extension the upstream is nacking, this look like a big disadvantage for the universal nack

@jakubtrnka
Copy link
Collaborator

jakubtrnka commented Aug 29, 2024

That wouldn't be a problem. The suggested universal NACK is meant per extension. Having one single shared universal NACK would be quite silly.

Handling the NACKs would then look like this.

loop {
  incoming = connection.receive();
  match (incoming.extension, incoming.type) {
    (0, msg) => ordinary_mining.handle(msg), // ordinary mining msg
    (Ext1, ACK) => { state.ext1 = true; ext1_handler.spawn_some_associated_task(); }
    (Ext2, ACK) => { state.ext2 = true; ext2_handler.spawn_some_associated_task(); }
    (Ext1, NACK) => { /* some reaction to the NACK from ext1 */ } 
    (Ext2, NACK) => { /* some reaction to the NACK from ext2 */ } 
    (Ext1, msg) if state.ext1 => ext1_handler.handle(msg), // handle extension1 message only if extension is active.
    (Ext2, msg) if state.ext2 => ext2_handler.handle(msg),
    unexpected  => { warn!("Unexpected message {unexpected}"); } // message not captured by any handler
  }
}

But I still don't know what what exact logic should be done in the handler.
How to handle for example NACK after an ACK?

@Fi3
Copy link
Contributor Author

Fi3 commented Aug 29, 2024

A nack after an ack should be invalid you just close the connection, I don't see any possible case where you will have a nack after an ack for the same ext

@jakubtrnka
Copy link
Collaborator

jakubtrnka commented Aug 29, 2024

I see your point. For example I could have an extension that modify how the Mining protocol work that you could model with something like that:

Mining Protocol downstream <-> proxy support ext and can work with it or without<-> upstream can either support the ext or not

If the proxy must know before opening the downstream connection if upstream support ext or not you could need a timer. Just thinking out loud, I know that the above example is not very convincing.

I'd be careful about some wild extension architectures where one extension influences behavior of a different extension. And where some ordering is required.

I can imagine using an extension for some additional control, to pass additional information to make the mining smoother. All you need is simple yes response. Until you have it, don't count on that.

Like passing telemetry, detailed miner monitoring. Perhaps expected hashrate changes (in case of proxy, for example).

The only use case for the NACK message is if you need to draw some immediate conclusion from not supporting an extension.
In my opinion this opens up an easy way to build to all kinds of bad patterns.

An example of bad pattern IMO would be: For some reason, you don't like NewMiningJob message. You implemented an extension with MySuperNiceJob message and want to communicate early on the connection that you want your special message instead of NewMiningJob.

There is nothing wrong with making an extension with MySuperNiceJob. But you still have to be able to work with ordinary NewMiningJob. You establish session normally and then switch to MySuperNiceJob if both parties support it. So there is no need to wait until we know if the counterparty supports it.

On the other hand, if you absolutely can't use NewMiningJob. Then you may implement the entire mining protocol from scratch within your extension to fit your needs and not use the extension 0 at all.

that's the way I look at it.

@Fi3
Copy link
Contributor Author

Fi3 commented Aug 29, 2024

Ok so you propose to just remove the version negotiation part for extension from the spec and leave everything as it is?

@jakubtrnka
Copy link
Collaborator

If I had to make some decision now (do we have to?), then I'd write following into the specification as a recommendation: Reserve message_id == 0 in each extension for the negotiation message (let's call it NegMsg). and message_id == 255 as an ACK.

Payload in NegMsg must be sufficiently unique to the extension being used.
Any response for NegMsg other than ACK counts as NACK.
Sessions where no ACK has been received or where NACK has been received should ignore all extension messages.

But it's not so easy to say just like that.

Imagine I implement extension 5. You also implement extension 5 (due to bad coincidence or miscommunication). I think our apps should gracefully stop exchanging random messages that they do not understand right at the beginning. My app sends to yours NegMsg with some payload. Your App reads the payload and decides it's some garbage, doesn't understand that. So it won't respond. My app won't issue any further message. Moreover it won't affect mining. Won't raise any errors.

@rrybarczyk
Copy link
Contributor

rrybarczyk commented Sep 5, 2024

I understand where @jakubtrnka is coming from suggesting that introducing an explicit NACK may cause latency, however I think the benefits of an explicit NACK outweigh the potential latency. This is because, the setting of a protocol extension does not occur often.

I think a good option is an explicit NACK with a graceful fallback as it provides clear feedback that ensures that the sender knows if an extension is supported or not and helps remove the gray area of "Am I waiting on an ACK or does the other device not support the extension and it is just never going to respond?" Either way, we need some sort of timeout due to potential message delays from the connection potentially prioritizing other messages, but this way we get a clear ACK or NACK.

Here is my understanding of the explicit NACK scenarios with a graceful fallback. I am using the arbitrary ActivateExtension message name:

          ┌─────────────────────────────────────┐
          │ Scenario 1: Extension Supported     │
          └─────────────────────────────────────┘

  ┌───────────────────────────┐             ┌───────────────────────────┐
  │         Client            │             │         Server            │
  └───────────────────────────┘             └───────────────────────────┘
               │                                    │
               │                                    │
      Send 'ActivateExtension' ────────────────────>│
               │                                    │
   [Wait for ACK/NACK]                              │
   (Timeout starts)                                 │
               │                                    │
               │<─────────────── Respond with 'ACK' (Supported)
               │                                    │
   Stop Timeout                                     │
   Proceed with extension                           │
   functionality                                    │


          ┌──────────────────────────────────────────┐
          │    Scenario 2: Extension Not Supported   │
          └──────────────────────────────────────────┘

  ┌───────────────────────────┐             ┌───────────────────────────┐
  │         Client            │             │         Server            │
  └───────────────────────────┘             └───────────────────────────┘
               │                                    │
    Send 'ActivateExtension' ──────────────────────>│
               │                                    │
   [Wait for ACK/NACK]                              │
   (Timeout starts)                                 │
               │                                    │
               │<─────────────── Respond with 'NACK' (Not Supported)
               │                                    │
   Stop Timeout                                     │
   Gracefully fallback to                           │
   default protocol behavior                        │


          ┌──────────────────────────────────────────┐
          │     Scenario 3: No Response (Timeout)    │
          └──────────────────────────────────────────┘

  ┌───────────────────────────┐             ┌───────────────────────────┐
  │         Client            │             │         Server            │
  └───────────────────────────┘             └───────────────────────────┘
               │                                    │
    Send 'ActivateExtension' ──────────────────────>│
               │                                    │
   [Wait for ACK/NACK]                              │
   (Timeout starts)                                 │
               │                                    │
               │  (No response)                     │
               │                                    │
   Timeout expires                                  │
   Gracefully fallback to                           │
   default protocol behavior                        │

The combination of explicit NACK with a graceful fallback would allow mining operations to continue even if optional extensions aren’t supported. Miners and pools could rely on "more immediate" feedback (I say "more immediate" as there can be potential message queuing delays in the channel for either option) from explicit NACKs to know which extensions they can use, while gracefully falling back to core mining functionality (no protocol extension, or even perhaps the previous extension that was being used if there was one, see my comment below), ensuring that mining work isn’t disrupted by unsupported features.

@rrybarczyk
Copy link
Contributor

Additionally, here is the NACK message proposed by @Fi3:

Frame { extension_type, msg_type: 0xff, msg_length: 0, payload: [] }

I think we could improve it to be something like:

Frame { extension_type, msg_type: 0xff, msg_length: X, payload: [error_code, reason_string] }

This is a more flexible NACK in that it can contain additional details about why the extension is not supported, such as version mismatch. This gives the sender the opportunity to adjust its behavior, perhaps by retrying with a different version. I also think its flexibility may be better for scalability as the protocol grows and more extensions are added . However, this does introduce more complexity compared to @Fi3's original suggestion since each extension might need to define its own specific NACK message structure, requiring more implementation effort.

@rrybarczyk
Copy link
Contributor

rrybarczyk commented Sep 5, 2024

We should also think about handling protocol extension changes. What happens if:

  1. We are using a specific extension and the upstream wants to completely override the current extension in favor of a new one (either a different version or totally different extension all together)?
  2. We are using a specific extension and the upstream wants to add in another extension?
  3. We are using an extension and the upstream wants to stop using all extensions all together?

@Fi3
Copy link
Contributor Author

Fi3 commented Sep 6, 2024

We should also think about handling protocol extension changes. What happens if:

1. We are using a specific extension and the upstream wants to completely override the current extension in favor of a new one (either a different version or totally different extension all together)?

2. We are using a specific extension and the upstream wants to add in another extension?

3. We are using an extension and the upstream wants to stop using all extensions all together?

Any role can initialize an extensions not only upstream (which one will do that is extension specific). I don not think that we need an universal mechanism to do that. Every extension can implement a specific way to "close the extension". So I would say that:

  1. Upstream send downstream the specific extension message to stop ext. When ready upstream initialize new extension.
  2. It just initialize another ext, using one ext do not prevent to use others.
  3. It just send messages for stopping every extension.

Is better to have a specific way to stop extension since:

  1. every extension that can be stopped maybe want to do that in a different way
  2. an extension could not carry any state so do not need to be initialized or stopped. Like the one proposed by @jakubtrnka echo ext where you get your message back.

Whatever we decide we should include it in the spec, as suggestion for who write a new extension.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants