Clustering issues #318

Closed
5 tasks done
Jotschi opened this issue Feb 28, 2018 · 8 comments

@Jotschi
Contributor

Jotschi commented Feb 28, 2018

Re-Sync errors:

  • https://www.prjhub.com/#/issues/9914
    A bogus delta sync is executed and fails when new files (e.g. new verticle types) have been added.
    This should not affect clustering, since a full sync is executed as a fallback.
    The issue will be fixed with the next OrientDB release (2.2.34).
  • Syncing restarted node fails with cluster not found error orientechnologies/orientdb#8127
    Sync issue will be fixed with OrientDB 2.2.34
  • https://www.prjhub.com/#/issues/9860
    Delta sync was failing in some cases. The issue has been fixed with 2.2.33 (already included in Gentics Mesh).
  • https://www.prjhub.com/#/issues/9947
    In some cases the "master" election did not finish. OrientDB switched from one instance to another.
    Issue will be fixed with OrientDB 2.2.34
    The issue is also covered by Gentics Mesh clustering tests.
  • https://www.prjhub.com/#/issues/9950
    When recovering from a split-brain situation, one instance in the cluster is usually forced to back up its data so that the other node in the cluster becomes the new "source of truth". In some cases the wrong backup was used for these nodes. Additionally, there seems to be a Gentics Mesh bug that causes the OrientDB database to be modified before the instance joins the cluster. This is bad because it alters the version of the DB, so OrientDB may treat it as a recent change and choose that database as "the latest" one.
    Somewhat odd workaround: Delete the data/graphdb folder before joining a cluster. A full sync is then executed, and no complex mechanism is involved in electing the "latest" db.
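
The workaround in the last item could be scripted roughly as follows. This is only a minimal sketch: the function name `reset_graphdb` and the `mesh_home` parameter are hypothetical, and it assumes that `data/graphdb` is relative to the Gentics Mesh installation directory and that the instance is stopped while the folder is removed.

```python
import shutil
from pathlib import Path


def reset_graphdb(mesh_home: Path) -> bool:
    """Remove <mesh_home>/data/graphdb so the node starts without a local database.

    Hypothetical helper: `mesh_home` is assumed to be the installation
    directory that contains the `data` folder. Run it only while the
    instance is stopped.

    Returns True if a directory was removed, False if there was nothing to delete.
    """
    graphdb = mesh_home / "data" / "graphdb"
    if not graphdb.is_dir():
        return False
    # Delete the whole local graph database; the node is then expected to
    # receive a full sync when it joins the cluster.
    shutil.rmtree(graphdb)
    return True
```

Calling it a second time on the same directory simply returns False, so it is safe to run before every join.
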
Jotschi added this to the 1.0.0 milestone Mar 2, 2018
@cschockaert

Can you explain what the issues on prjhub are?
I cannot see the details.

@Jotschi
Contributor Author

Jotschi commented Apr 3, 2018

@cschockaert I added some information and new issues.

@clems159

Hi again, so does 0.19.0 fix all these issues?

@Jotschi
Contributor Author

Jotschi commented Apr 30, 2018

@clems159 The last issue is still open. The other issues have been resolved.

@clems159

OK. If I remember correctly, when we tried master / master the nodes were stuck in an infinite loop trying to determine which one is the master, so the instance never restarted.

Is this related to the split-brain situation?

@Jotschi
Contributor Author

Jotschi commented Apr 30, 2018

@clems159 When setting up a mesh cluster it is important to only initialize one Gentics Mesh instance.
https://getmesh.io/docs/beta/clustering.html#_setup
The other one must be empty in order to join the cluster. Maybe you started a cluster with two instances and OrientDB got stuck in a replication loop.

I don't think that the issue you observed is related to the split brain issue. Could you perhaps try again and provide logs if you encounter issues?
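
A quick pre-flight check for the "must be empty" requirement could look like the sketch below. It assumes that "empty" means the graph database folder (`data/graphdb`, as mentioned earlier in this issue) is missing or has no entries; that interpretation and the function name `is_empty_instance` are assumptions, not part of Gentics Mesh.

```python
from pathlib import Path


def is_empty_instance(graphdb_dir: Path) -> bool:
    """Return True if the graph database folder is missing or has no entries.

    Hypothetical helper: treating a missing or empty folder as an
    "empty" instance is an assumption made for this sketch.
    """
    if not graphdb_dir.exists():
        return True
    if not graphdb_dir.is_dir():
        raise NotADirectoryError(str(graphdb_dir))
    # next() with a default avoids listing the whole directory.
    return next(graphdb_dir.iterdir(), None) is None
```

A deployment script could refuse to start a joining node when this returns False for that node's folder.
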

@Jotschi
Contributor Author

Jotschi commented Oct 18, 2018

The last issue was fixed by an OrientDB upgrade some time ago.

Jotschi closed this as completed Oct 18, 2018
@cschockaert

Agreed.
We are running the latest getmesh version in 1 master / x replicas mode and have not encountered this situation again.
