Optimize cluster election when nodes initiate elections at the same epoch #1009

enjoy-binbin · 2024-09-10T08:15:35Z

If multiple primary nodes go down at the same time, their replica nodes will
initiate elections at the same time. There is a certain probability that
the replicas will initiate their elections in the same epoch.

In our current election mechanism, only one replica node can eventually
get enough votes in a given epoch. The other replica node will fail to win
because it cannot reach a majority, and then its election will time out and
we will wait for the retry, which results in a long failure time.

If another node has won the election in the failover epoch, we can assume
that my election has failed and we can retry as soon as possible.
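The idea described above can be sketched in isolation. This is a minimal illustration, not the patch itself: `election_state` and `abandon_election_if_lost` are hypothetical names, the struct only borrows the two field names visible in the diff, and clearing `failover_auth_time` to allow an immediate retry is an assumption about how the early retry could be triggered.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the candidate-side election state.
 * The field names mirror those visible in the diff, but this struct is not
 * the real cluster state. */
typedef struct {
    int64_t failover_auth_time;   /* 0 means no election is scheduled or running */
    uint64_t failover_auth_epoch; /* epoch in which this replica asked for votes */
} election_state;

/* Returns 1 and clears the election if a primary was seen claiming the
 * epoch this replica is campaigning in, otherwise returns 0. */
static int abandon_election_if_lost(election_state *st, uint64_t sender_config_epoch) {
    if (st->failover_auth_time == 0) return 0;                    /* not campaigning */
    if (sender_config_epoch != st->failover_auth_epoch) return 0; /* different epoch */
    /* Assumption: clearing the time lets the caller schedule a new attempt
     * right away instead of waiting for the election timeout. */
    st->failover_auth_time = 0;
    return 1;
}

/* Small walk-through of the intended behavior. */
void election_example(void) {
    election_state st = {1000, 7};
    assert(abandon_election_if_lost(&st, 6) == 0); /* older epoch: keep waiting */
    assert(abandon_election_if_lost(&st, 7) == 1); /* someone else won epoch 7 */
    assert(st.failover_auth_time == 0);            /* free to retry */
}
```

The sketch uses the exact-match comparison from the first version of the diff; the review thread discusses whether the comparison should be relaxed.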


enjoy-binbin · 2024-09-10T08:16:53Z

src/cluster_legacy.c

@@ -3113,6 +3113,17 @@ int clusterProcessPacket(clusterLink *link) {
        if (sender_claims_to_be_primary && sender_claimed_config_epoch > sender->configEpoch) {
            sender->configEpoch = sender_claimed_config_epoch;
            clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG | CLUSTER_TODO_FSYNC_CONFIG);
+
+            if (server.cluster->failover_auth_time && sender->configEpoch == server.cluster->failover_auth_epoch) {


enjoy-binbin · 2024-09-10

@PingXie Not sure if I am doing it (the time or the conditions) right here.

PingXie · 2024-09-13

Yeah I think this fix makes sense. I wonder if we should check "greater than or equal to" instead? There is no way "myself" can win an election in the past.

enjoy-binbin · 2024-09-13

I am worried that when multiple shards fail over, they each elect in their own epoch. In this case, the sender's epoch may be > my own failover epoch, but that is correct (it is a different epoch). Something like #1018.

enjoy-binbin · 2024-09-13

> There is no way "myself" can win an election in the past.

This makes sense to me. OK, I am changing it and then I will re-test it.
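The two conditions weighed in this thread differ only when the sender's epoch is strictly greater than the candidate's failover epoch. A minimal sketch of that difference follows; `lost_election_eq` and `lost_election_ge` are hypothetical helper names, and the parameters stand in for `failover_auth_time`, `failover_auth_epoch`, and the sender's `configEpoch` from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Strict variant: only an exact epoch match counts as a lost election. */
static int lost_election_eq(int64_t auth_time, uint64_t auth_epoch, uint64_t sender_epoch) {
    return auth_time != 0 && sender_epoch == auth_epoch;
}

/* Relaxed variant: any sender epoch at or above ours counts as a lost election. */
static int lost_election_ge(int64_t auth_time, uint64_t auth_epoch, uint64_t sender_epoch) {
    return auth_time != 0 && sender_epoch >= auth_epoch;
}

/* Walks through the cases where the two variants agree and where they differ. */
void compare_variants(void) {
    /* Same epoch: both variants give up the election. */
    assert(lost_election_eq(1000, 7, 7) == 1);
    assert(lost_election_ge(1000, 7, 7) == 1);
    /* Sender in a later epoch: only the relaxed variant gives up. */
    assert(lost_election_eq(1000, 7, 8) == 0);
    assert(lost_election_ge(1000, 7, 8) == 1);
    /* Sender in an earlier epoch: neither variant gives up. */
    assert(lost_election_eq(1000, 7, 6) == 0);
    assert(lost_election_ge(1000, 7, 6) == 0);
    /* No election in progress: neither variant fires. */
    assert(lost_election_ge(0, 7, 8) == 0);
}
```

The second case is the one the thread is about: a later epoch may belong to another shard's election, yet this candidate still cannot win an epoch that is already in the past.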

codecov · 2024-09-10T08:30:08Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.60%. Comparing base (09def3c) to head (0d13f5d).

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #1009      +/-   ##
============================================
- Coverage     70.61%   70.60%   -0.01%     
============================================
  Files           114      114              
  Lines         61664    61668       +4     
============================================
- Hits          43541    43540       -1     
- Misses        18123    18128       +5

Files with missing lines	Coverage Δ
src/cluster_legacy.c	`86.05% <100.00%> (-0.21%)`	⬇️

... and 12 files with indirect coverage changes


PingXie · 2024-09-13T06:38:10Z

This is a great bug, @enjoy-binbin! The fix LGTM overall.


enjoy-binbin requested a review from PingXie September 10, 2024 08:15


enjoy-binbin added the run-extra-tests label (Run extra tests on this PR; runs all tests from daily except valgrind and RESP) Sep 10, 2024

fix build and fix format

cea9267

Signed-off-by: Binbin <binloveplay1314@qq.com>

PingXie reviewed Sep 13, 2024


enjoy-binbin and others added 3 commits September 13, 2024 15:08

Apply suggestions from code review

41034bd

Co-authored-by: Ping Xie <pingxie@outlook.com> Signed-off-by: Binbin <binloveplay1314@qq.com>

fix format

0270e71

Signed-off-by: Binbin <binloveplay1314@qq.com>

Merge remote-tracking branch 'upstream/unstable' into epoch_timeout

0d13f5d

Signed-off-by: Binbin <binloveplay1314@qq.com>
