Understanding ArcadeDB local graph traversal performance #1110
-
I've been looking at graph databases for a while, and ArcadeDB has appeared on my radar repeatedly. One thing I am trying to understand better to place it in the landscape of graph-capable graph databases is the indexing and access strategies. For this, I am trying to understand some things.
To me that implies that you can perform extremely fast graph traversals in ArcadeDB since there is never a need for large linear scans or 𝒪 log(n) index scans of global edge lists - once you have found your starting vertex, you perform a local scan of its presumed sparse edge list, where you can recover all the information of the edges pointed to in constant time, thus also find the vertex at the other end in constant time, perform selection/updates on it, and so on. So any graph traversal operations/edge-and-connected-vertex-properties selections that only touches a small section of the larger graph should run very quickly in comparison to a query that runs on most of the graph? |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 3 replies
-
That's 100% accurate :-) The O(1) time when you retrieve any record by its ID (RID -> RecordID) is because the RID it's encoded the position of the record in the file. For example, the RID So in order to find the physical location where the record is stored, you can apply this simple math: At this point you can just start reading the record at offset You can look in the code at here: int pageId = (int) (rid.getPosition() / maxRecordsInPage);
int positionInPage = (int) (rid.getPosition() % maxRecordsInPage); |
Beta Was this translation helpful? Give feedback.
That's 100% accurate :-)
The O(1) time when you retrieve any record by its ID (RID -> RecordID) is because the RID it's encoded the position of the record in the file. For example, the RID
#12:1000000
means file number 12 (they are mapped in the schema.json by number) and position in file = 1,000,000. This position is not the actual byte where the record starts. By default, the buckets are created with a 64KB page size, and the maximum number of records per page is set to 2048 (by default).So in order to find the physical location where the record is stored, you can apply this simple math:
1,000,000 / 2,048 = 488.28125
. That means page 488, and location on page =576 (1,000,000 mod 2048)
.…