Optimize cluster election when nodes initiate elections at the same epoch #1009
base: unstable
Conversation
…poch

If multiple primary nodes go down at the same time, their replicas will initiate elections at the same time, and there is a certain probability that the replicas will initiate their elections in the same epoch. In the current election mechanism, only one replica can eventually get enough votes in a given epoch; the other replica fails to win due to the insufficient majority, its election times out, and we wait for the retry, which results in a long failover time. If another node has already won the election in the failover epoch, we can assume that my election has failed and retry as soon as possible.

Signed-off-by: Binbin <binloveplay1314@qq.com>
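The retry condition described above can be sketched as a small, self-contained C fragment. This is an illustration, not the actual cluster_legacy.c patch: the struct and function names here (ClusterState, ClusterNode, electionLostToSender) are hypothetical, though the field names mirror the real failover_auth_time, failover_auth_epoch, and configEpoch fields, and the comparison uses ">=" as settled later in the review discussion.

```c
#include <stdint.h>

/* Simplified stand-ins for the real cluster structures (hypothetical). */
typedef struct {
    uint64_t failover_auth_epoch; /* epoch of our in-progress election */
    long long failover_auth_time; /* when our election started; 0 = none pending */
} ClusterState;

typedef struct {
    uint64_t configEpoch; /* config epoch just claimed by the sender */
} ClusterNode;

/* Returns 1 if the sender's newly claimed config epoch proves our pending
 * election has already been lost, so the caller can schedule a retry
 * immediately instead of waiting for the election timeout. */
int electionLostToSender(const ClusterState *cluster, const ClusterNode *sender) {
    if (cluster->failover_auth_time == 0) return 0; /* no election in progress */
    /* ">=" rather than "==": "myself" can never win an election held in a
     * past epoch, so any winner at or above our epoch means we lost. */
    return sender->configEpoch >= cluster->failover_auth_epoch;
}
```

A caller in the packet-processing path would invoke this when a sender claiming to be primary bumps its config epoch, and reset the election timer on a hit.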
src/cluster_legacy.c
Outdated
@@ -3113,6 +3113,17 @@ int clusterProcessPacket(clusterLink *link) {
         if (sender_claims_to_be_primary && sender_claimed_config_epoch > sender->configEpoch) {
             sender->configEpoch = sender_claimed_config_epoch;
             clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG | CLUSTER_TODO_FSYNC_CONFIG);
+
+            if (server.cluster->failover_auth_time && sender->configEpoch == server.cluster->failover_auth_epoch) {
@PingXie Not sure if I am doing it (the timing or the conditions) right here.
Yeah I think this fix makes sense. I wonder if we should check "greater than or equal to" instead? There is no way "myself" can win an election in the past.
I am worried about the case where multiple shards fail over at the same time and each runs its election in its own epoch. In that case the sender's epoch may be greater than my own failover epoch and yet still be correct, because it belongs to a different election. Something like #1018.
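The disagreement over "==" versus ">=" can be made concrete with a hypothetical side-by-side of the two candidate conditions (the helper names are mine, not from the patch). Under "==", a winner from another shard announcing a higher epoch leaves our pending election alone; under ">=", it cancels our election and triggers an immediate retry.

```c
#include <stdint.h>

/* Hypothetical helpers contrasting the two candidate conditions.
 * my_epoch is our pending failover_auth_epoch; sender_epoch is the
 * config epoch a claimed primary just announced. */
int electionLostEq(uint64_t my_epoch, uint64_t sender_epoch) {
    return sender_epoch == my_epoch; /* only a same-epoch winner aborts us */
}

int electionLostGe(uint64_t my_epoch, uint64_t sender_epoch) {
    return sender_epoch >= my_epoch; /* any winner at or above our epoch aborts us */
}
```

With our election pending at epoch 10 and a cross-shard winner announcing epoch 11, the "==" form keeps our election running while the ">=" form aborts it; both forms abort on an exact epoch-10 match.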
There is no way "myself" can win an election in the past.
This makes sense to me. OK, I am changing it and then I will re-test it.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@             Coverage Diff              @@
##           unstable    #1009      +/-   ##
============================================
- Coverage     70.61%   70.60%    -0.01%
============================================
  Files           114      114
  Lines         61664    61668        +4
============================================
- Hits          43541    43540        -1
- Misses        18123    18128        +5
Signed-off-by: Binbin <binloveplay1314@qq.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great bug, @enjoy-binbin! The fix LGTM overall.
Co-authored-by: Ping Xie <pingxie@outlook.com> Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>