PBM-815: physical restore + logical PITR #844

Merged · 23 commits merged into main on Jul 5, 2023
Conversation

@dAdAbird (Member) commented on Jul 4, 2023

Add an oplog replay stage during the restore data post-processing. All these steps are done while the PSMDB cluster is down and isn't accessible from the outside world.

We cannot start a full replica set at this stage without exposing it to external users. So the oplog replay is done only on the "primary" node while each shard runs as a single-node replica set. The data will be propagated to the rest of the nodes during the cluster start. This leads to:
1. PITR for a physical backup is applied only to the primary node; the data is later copied to the other nodes during cluster start (initial sync).
2. PITR for sharded collections works only for writes, not for the creation of sharded collections. Therefore, **whenever a sharded collection is created, a new full physical backup is needed**. A logical backup covers this case, since the full replica set is restored rather than a single node as with a physical one.

The `--base-snapshot` flag must always be used with a physical base snapshot. For example: `pbm restore --time=2023-07-03T16:23:56 -w --base-snapshot=2023-07-03T16:18:09Z`. Without `--base-snapshot`, PBM will always look for a logical backup, even if there is no logical one or a physical one is more recent.

Distributed transactions

The old way of syncing all distributed transactions between the shards won't work, since no DB is available during a physical restore, and doing such a sync over the remote storage would be way too slow.
The new algorithm (applied to logical restores as well, for code consistency):

By looking at just the transactions in the oplog, we can't tell which shards
were participating in them. But we can assume that if there is a
commitTransaction on at least one shard, then the transaction is committed
everywhere. Otherwise, the transaction either won't be in the oplog at all,
or every shard would have an abortTransaction. So we just treat distributed
transactions as non-distributed: apply the ops once a commit message for the
txn is encountered.
It might happen that by the end of the oplog there are some distributed txns
without commit messages. We should commit such transactions only if the data is
full (all prepare statements observed) and the txn was committed by at least
one other shard. For that, each shard saves the last 100 distributed
transactions it committed, so other shards can check whether they should commit
their leftovers. We store the last 100 because prepare statements and commits
might be separated by other oplog events, so several commit messages might be
cut away on some shards while present on the other(s). Given that oplog events
of distributed txns are more or less aligned in [cluster]time, checking the
last 100 should be more than enough.
If a transaction is larger than 16MB, it will be split into several prepare
messages. So it might happen that one shard committed the txn while another
hasn't observed all the prepare messages by the end of the oplog. In such a
case, we should report it in the logs and in describe-restore.
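
A minimal Go sketch of the end-of-oplog decision described above; all names (`Txn`, `resolveLeftover`, etc.) are hypothetical and do not correspond to the actual PBM code:

```go
package txnsketch

// Txn is a hypothetical view of a distributed transaction as
// reconstructed from one shard's oplog slice (not an actual PBM type).
type Txn struct {
	ID          string
	AllPrepares bool // all prepare messages observed: the txn data is full
	CommitSeen  bool // commitTransaction encountered in this shard's oplog
}

// committedElsewhere reports whether any other shard lists the txn
// among the last 100 distributed transactions it committed.
func committedElsewhere(id string, othersLast100 [][]string) bool {
	for _, last100 := range othersLast100 {
		for _, committedID := range last100 {
			if committedID == id {
				return true
			}
		}
	}
	return false
}

// resolveLeftover decides the fate of a txn left without a commit
// message by the end of this shard's oplog.
func resolveLeftover(t Txn, othersLast100 [][]string) string {
	switch {
	case t.CommitSeen:
		// treated as non-distributed: ops were applied on commit
		return "committed"
	case !committedElsewhere(t.ID, othersLast100):
		// no shard committed it: leave as is (left_uncommitted)
		return "left_uncommitted"
	case !t.AllPrepares:
		// committed on another shard, but a >16MB txn was split and some
		// prepare messages were cut away: report in logs/describe-restore
		return "partial"
	default:
		// full txn, committed elsewhere: commit it (shard_uncommitted)
		return "commit"
	}
}
```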

It also adds some restore stats for distributed transactions (into the restore metadata):
`partial` - the number of transactions that were applied on other shards
but can't be applied on this one, since not all prepare messages got into
the oplog (shouldn't happen).
`shard_uncommitted` - the number of transactions uncommitted before the sync.
The transaction is full, but there is no commit message in the oplog of this
shard.
`left_uncommitted` - the number of transactions that remain uncommitted after the sync.
The transaction is full, but there is no commit message in the oplog of any shard.
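
For illustration, the counters could be stored roughly like this; the struct is a hypothetical sketch, only the field names come from the list above:

```go
// TxnStat sketches the distributed-transaction counters written into
// the restore metadata (hypothetical struct, not the actual PBM type).
type TxnStat struct {
	// Partial: applied on other shards but not applicable here, since
	// not all prepare messages got into the oplog (shouldn't happen).
	Partial int `bson:"partial" json:"partial"`

	// ShardUncommitted: full txns with no commit message in this
	// shard's oplog before the sync.
	ShardUncommitted int `bson:"shard_uncommitted" json:"shard_uncommitted"`

	// LeftUncommitted: full txns with no commit message in any shard's
	// oplog; they remain uncommitted after the sync.
	LeftUncommitted int `bson:"left_uncommitted" json:"left_uncommitted"`
}
```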

@defbin (Member) commented on Jul 5, 2023

In cli/cli.go:170, remove the "pitrestore" event for the log filter.

dAdAbird and others added 5 commits on July 5, 2023
Co-authored-by: Dmytro Zghoba <dmytro.zghoba@percona.com>
@dAdAbird merged commit 3f5ccc4 into main on Jul 5, 2023
24 of 28 checks passed
@dAdAbird deleted the PBM-815_phys_n_pitr branch on July 5, 2023
minottic pushed a commit to paulscherrerinstitute/percona-backup-mongodb that referenced this pull request Jul 12, 2023
minottic pushed a commit to paulscherrerinstitute/percona-backup-mongodb that referenced this pull request Oct 26, 2023