
Architecture

Architecture Diagram

Design Goal

Provide a modular, simple software solution that allows clients to store files and end users to retrieve them via the Filecoin storage provider network.

Components

| Component | Responsibility | External Dependencies |
| --- | --- | --- |
| Deal Preparation Service | Accept requests from clients to prepare datasets | - |
| Deal Preparation Worker | Scan the dataset folder, split the dataset into chunks, prepare CAR files and calculate CommP | - |
| Deal Making Service | Wrap around the Boost Client CLI to send out stateless deals | Boost Client CLI and its dependencies |
| Deal Tracking Service | Update deal states by looking at on-chain messages | lotus for chain head notification |
| Replication Service | Replicate prepared datasets to Filecoin storage providers | - |
| Ranking Service | Subscribe to provider metrics published by Estuary/Filrep via Pando | Pando integration with Estuary/Filrep |
| HTTP Hosting Service | Host the generated CAR files | - |

Modularize

With the default configuration, all services/modules run on a single host and use the same database. However, all of the above components can run on different hosts as long as they connect to a database that contains the collections they need.

The software can also be used in a more targeted way to offer only part of its features. The table below summarizes which components to enable for each end-user scenario if the user does not want to run all modules.

| Scenario | Components to enable | Details |
| --- | --- | --- |
| As a client, I just need to prepare some CAR files from a dataset | Deal Preparation Service, Deal Preparation Worker | CAR files will be generated. dataCid, pieceCid and the IPLD DAG structure will be saved in the database |
| As a client, I have some CAR files on hand and just need to replicate them with selected storage providers | Deal Making Service, Deal Tracking Service, Replication Service | The client can also use the Ranking Service to filter and select storage providers. pieceCid and dataCid will be generated if the client does not provide them |

Database

All services above use MongoDB for storage. By default, a local MongoDB service ships with the software and starts automatically. More experienced users can set up their own database and provide a connection string in the config.

Use the configuration below to start the bundled database with the service

[database]
enabled = true
path = "./.singularity/database"
bind = "127.0.0.1"
port = 37000

Deal Preparation Service

Purpose

The entry point of the whole service; it handles user requests to prepare deals and talks to a MongoDB database to manage the state of each request.

Configuration

[connection]
database = "mongodb://127.0.0.1:37000"
[deal_preparation_service]
enabled = true
bind = "127.0.0.1"
port = 3001
ipfs_api = "http://127.0.0.1:5001"

Deal Preparation

The service performs simple validation of the request and stores it in the database. The preparation request is then fulfilled in three steps by the Deal Preparation Workers

  1. Scanning - scan the dataset to understand the folder structure and split it into chunks
  2. Generation - for each chunk of the dataset, generate CAR files and compute CommP
  3. Indexing - generate the index DAG and pin it to the IPFS network

Whenever the service receives a preparation request, it inserts an entry into the ScanningRequest database collection so that a worker can poll for the task later.

Database Model ScanningRequest

{
    id: string, // auto generated
    name: string, // dataset name, provided by the client
    path: string, // path of the dataset, provided by the client
    minSize: string, // min size of the CAR archive, default to 0.6 * deal size
    maxSize: string, // max size of the CAR archive, default to 0.9 * deal size
    workerId: string, // worker assigned to this task
    status: string, // active, paused, removed, completed
    indexCid: string
}

API - Create dataset preparation request

Request

POST /preparation
{
    "name": "dataset_name",
    "path": "/mnt/dataset/path",
    "dealSize": "32 GiB", --> must be power of 2.
    "minSize": "20 GiB", --> optional
    "maxSize": "30 GiB" --> optional
}

Response

{
    "id": "xxxxx"
}
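
For illustration, the request above can be submitted with any HTTP client. Below is a minimal TypeScript sketch (Node 18+ with built-in fetch), assuming the default bind address and port from the configuration above:

// Sketch: submit the preparation request shown above to the Deal Preparation Service.
// Host and port follow the example configuration; adjust them for your deployment.
async function createPreparation(): Promise<string> {
  const response = await fetch('http://127.0.0.1:3001/preparation', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: 'dataset_name',
      path: '/mnt/dataset/path',
      dealSize: '32 GiB'
    })
  });
  const { id } = await response.json();
  return id; // id of the newly created preparation request
}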

API - Pause/Resume/Remove dataset preparation request

Once a preparation request is created, it can be paused, resumed and removed.

  • Paused means all dataset scanning and generation stop; the request can be resumed later
  • Removed means the entry is removed from the database

POST /preparation/:id
{
    "state": "paused|active|removed"
}

API - Get all dataset preparation requests

Request

GET /preparations

Response

[
    {
        "id": "id",
        "name": "dataset_name",
        "path": "path",
        "dealSize": "32 GiB",
        "minSize": "20 GiB",
        "maxSize": "30 GiB",
        "scanningStatus": "active",
        "generationCompleted": 10,
        "generationInprogress": 2,
        "generationTotal": 50
    }
]

API - Get details for a preparation request

Request

GET /preparation/:id

Response

{
        "id": "id",
        "name": "dataset_name",
        "path": "path",
        "dealSize": "32 GiB",
        "minSize": "20 GiB",
        "maxSize": "30 GiB",
        "scanningStatus": "active",
        "generationCompleted": 10,
        "generationInprogress": 2,
        "generationTotal": 50,
        "generationRequests": [
            {
                "id": "id",
                "index": 0,
                "workerId": "xxx",
                "status": "active",
                "dataCid": "bafy",
                "pieceCid": "bafy",
                "pieceSize": 1024
            }
            ...
        ]
}

API - Get file list for a specific generation request

Request

GET /generation/:id

Response

{
    "id": "id",
    "name": "name",
    "path": "path",
    "index": 0,
    "fileList": [
        {
            "path": "/path/to/file1.mp4",
            "name": "file1.mp4",
            "size": 4096,
            "start": 0,
            "end": 2048
        },
        ...
    ],
    "workerId": "xxx",
    "state": "active",
    "dataCid": "bafy",
    "pieceCid": "bafy",
    "pieceSize": 1024
}

Worker Health Check Maintenance

The service periodically checks the database for expired workerIds; any requests assigned to expired workers have their workerId cleared so another worker can pick up the task.

Deal Preparation Worker

Purpose

Poll for work to scan datasets and generate CAR files.

Configuration

[connection]
database = "mongodb://127.0.0.1:37000"
[deal_preparation_worker]
enabled = true
parallelism = 2
out_dir = "./.singularity/cars"

Report Health

The worker periodically reports health to the database. If a worker fails to report health for a certain period, its entry is removed by the orchestrating service so its tasks can be reassigned.

Database Model HealthCheck

{
    workerId: string, // uuid generated during worker startup
    updatedAt: Date
}
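
The sketch below illustrates one way a worker could report its heartbeat using the official MongoDB Node.js driver. The connection string matches the example configuration; the database name and the 5-second interval are assumptions.

import { MongoClient } from 'mongodb';
import { randomUUID } from 'crypto';

// Sketch: upsert this worker's HealthCheck entry on a fixed interval.
// The database name and interval are assumptions, not the actual implementation.
async function reportHealth(): Promise<void> {
  const workerId = randomUUID(); // uuid generated during worker startup
  const client = new MongoClient('mongodb://127.0.0.1:37000');
  await client.connect();
  const healthCheck = client.db('singularity').collection('HealthCheck');
  setInterval(async () => {
    await healthCheck.updateOne(
      { workerId },
      { $set: { workerId, updatedAt: new Date() } },
      { upsert: true }
    );
  }, 5000);
}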

Scanning Request

The worker periodically looks for a ScanningRequest that

  1. has not been completed
  2. does not have an assigned worker

To scan the dataset (see the sketch after this list)

  1. perform a Glob pattern match and get all files in sorted order
  2. iterate through all the files and keep accumulating file sizes into a chunk
  3. once the size of the chunk is between minSize and maxSize, start a new chunk
  4. if the size of a file is too large to put into a chunk, split the file to hit the chunk minSize
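
A simplified TypeScript sketch of the chunking rule above; sizes are in bytes, the fields follow the GenerationRequest model below, and the actual implementation may differ:

interface FileEntry {
  path: string;
  name: string;
  size: number;
  start: number;
  end: number; // both start and end being non-zero indicates a partial file
}

// Accumulate files (already in sorted order) into chunks whose sizes land between
// minSize and maxSize (assumes 0 < minSize <= maxSize); a file too large for the
// remaining room is split into partial entries carrying start/end offsets.
function splitIntoChunks(
  files: { path: string; name: string; size: number }[],
  minSize: number,
  maxSize: number
): FileEntry[][] {
  const chunks: FileEntry[][] = [];
  let current: FileEntry[] = [];
  let currentSize = 0;
  const flush = () => {
    if (current.length > 0) {
      chunks.push(current);
      current = [];
      currentSize = 0;
    }
  };
  for (const file of files) {
    if (currentSize + file.size <= maxSize) {
      current.push({ ...file, start: 0, end: 0 }); // whole file fits in this chunk
      currentSize += file.size;
    } else {
      let offset = 0; // split the file across one or more chunks
      while (offset < file.size) {
        const take = Math.min(file.size - offset, maxSize - currentSize);
        current.push({ ...file, start: offset, end: offset + take });
        currentSize += take;
        offset += take;
        if (currentSize >= minSize) flush();
      }
    }
    if (currentSize >= minSize) flush(); // chunk is big enough, start a new one
  }
  flush(); // leftover files form the final, possibly undersized, chunk
  return chunks;
}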

Caveat: due to IPFS chunking, the resulting CAR files may be smaller than expected if there are duplicate chunks, i.e. duplicate files or duplicated file content.

Once scanned, each chunk has its own list of file paths. The file list of each chunk is saved to the GenerationRequest database table so another worker can pick up the generation request later.

Database Model GenerationRequest

{
    id: string, // auto generated
    name: string, // dataset name
    path: string, // dataset path
    index: number, // the index of the chunk for that dataset
    fileList: { // list of files to include in this chunk
        path: string,
        name: string,
        size: number,
        start: number, 
        end: number // both start and end being non zero indicates partial file
    }[],
    workerId: string, // assigned worker
    status: string, // active, paused, removed, completed
    dataCid: string, // root cid of ipld
    pieceCid: string, // commp
    pieceSize: number // deal size
}

Generation Request

The worker periodically looks for a GenerationRequest that

  1. has not been completed
  2. does not have an assigned worker

To generate CAR files and calculate CommP (a streaming sketch follows this list)

  1. Build IPLD DAG representing the unixfs structure of the file list. Reference code
  2. Calculate CommP using code from stream-commp
  3. To save some disk IO, the above two can be accomplished in one go with streaming
  4. Save the dataCid, pieceCid, pieceSize to database
  5. The CAR files will be generated under out_dir with filename <cid>.car. Those files will be picked up by the HTTP hosting server
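
A minimal sketch of the single-pass streaming in step 3, assuming a readable CAR stream from the DAG-building step and the stream-commp utility available on PATH; parsing CommP and piece size out of the tool's output is left to the caller:

import { spawn } from 'child_process';
import { createWriteStream } from 'fs';
import { PassThrough } from 'stream';

// Sketch: tee the CAR bytes so they are written to disk and piped through
// stream-commp in one pass, avoiding a second read of the data.
function writeCarAndCommp(carStream: NodeJS.ReadableStream, outPath: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const commp = spawn('stream-commp');
    let output = '';
    commp.stdout.on('data', chunk => (output += chunk));
    commp.stderr.on('data', chunk => (output += chunk));
    commp.on('close', code =>
      code === 0 ? resolve(output) : reject(new Error(`stream-commp exited with code ${code}`))
    );
    const tee = new PassThrough();
    tee.pipe(createWriteStream(outPath)); // the CAR file lands under out_dir as <cid>.car
    tee.pipe(commp.stdin);                // the same bytes feed the CommP calculation
    carStream.pipe(tee);
  });
}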

Indexing Request

Purpose

Store the index and metadata in IPFS.

Design

The goal of the index is to allow end users to visit individual files or folders with an experience similar to browsing a local file system. The design needs to address a few things

  1. If a folder or file is split across different deals, it needs to be reassembled
  2. The user should be able to search for the files they want by filename and path
  3. Avoid dependency on a centralized or single service endpoint

To achieve the above goals, we leverage IPFS as storage for the file-structure metadata and IPNS to provide a fixed path at which to look up that metadata. The metadata is structured similarly to UnixFS, but with the DAG-CBOR codec.

DataModel {
  __type: DataType,
  __size: Number,
  __source: SourceInfo[],
  // And other subfiles
}
SourceInfo {
    cid: string,
    selector: string,
    offset: Number,
    length: Number
}

When all CAR files are generated from the original dataset, the software assembles the DAG and pins the object to the IPFS network. The client can then utilize DNSLink to publish the object in their DNS records.
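
For example, publishing via DNSLink amounts to adding a TXT record on the _dnslink subdomain that points at the pinned metadata object (the domain and CID below are placeholders):

_dnslink.my.dataset.com.  TXT  "dnslink=/ipfs/bafyxxx"

The metadata then resolves under /ipns/my.dataset.com through any IPFS gateway.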

Once published, the metadata can be queried with a human-friendly IPFS path or URL via an IPFS Gateway.

Client Experience

Once the dataset has been prepared, the client can retrieve the IPFS path for the metadata object.

singularity preparation status

This will return an IPFS link, i.e. /ipfs/bafyxxx

Retrieval

The retrieval follows the steps below (a reassembly sketch follows this list)

  1. The user supplies the domain and the path of the folder or file to retrieve, i.e. singularity retrieve my.dataset.com/path/to/file.mp4
  2. The software looks up the IPFS DAG object under /ipns/my.dataset.com/path/to/file.mp4; the data contains the type of the object as well as a list of SourceInfo
  3. If the SourceInfo contains a single entry, the item was not split and can be retrieved using the dataCid and the data model selector
  4. If the SourceInfo contains multiple entries, the item has been split. Download each entry and reassemble them using the offset, length and file size information.
  5. To map a dataCid to the actual storage provider that stores the data, use cid.contact built by Protocol Labs.
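
A sketch of the reassembly in step 4, assuming each SourceInfo entry has already been downloaded to a local file and carries the offset and length described above (RetrievedPart is a hypothetical helper type):

import { createReadStream, createWriteStream } from 'fs';
import { pipeline } from 'stream/promises';

// A downloaded part of a split file; offset/length mirror the SourceInfo fields,
// localPath is wherever the retrieval step placed the bytes.
interface RetrievedPart {
  localPath: string;
  offset: number;
  length: number;
}

// Stitch the parts back together in offset order. Assumes the parts cover the
// file contiguously and that outputPath does not already exist.
async function reassemble(parts: RetrievedPart[], outputPath: string): Promise<void> {
  const ordered = [...parts].sort((a, b) => a.offset - b.offset);
  for (const part of ordered) {
    await pipeline(
      createReadStream(part.localPath, { start: 0, end: part.length - 1 }),
      createWriteStream(outputPath, { flags: 'a' }) // append each part in order
    );
  }
}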

Future Extension

User space file system mount (FUSE) will be possible as the above data structure provides sufficient data for POSIX readdir, getattr and read.

Deal Making Service

Purpose

Send proposals over libp2p with an HTTP link for the storage provider to download and import.

Implementation

It may be a simple wrapper around the Boost Client CLI. The API below may change depending on the final implementation of the Boost Client CLI. Once a deal is made, a proposalCid is returned.

Configuration

[deal_making_service]
enabled = true
other_boost_client_cli_options = "such as lotus connection"

API - make a deal

Note, this is just an example and will depend on the outcome of boost client.

Request

POST /deal
{
    "wallet" : "f1xxxxxx",
    "keyPath" : "path/to/private/key",
    "verified" : true,
    "url" : "http://example.com/bafyxxxx.car",
    "pieceSize": 256,
    "pieceCid": "bafyxxxxx",
    "dataCid": "bafyxxxxxx",
    "provider": "f01234"
}

Response

{
    "proposalCid": "bafyxxxxxx"
}

Deal Tracking Service

Purpose

Periodically update the status for each proposal_cid and deal_id

Implementation Approach 1

Subscribe to lotus daemon chain notifications and update deal states according to the deal state spec. Updated deal states are stored in the database. The transitions (sketched after this list) are

  • Unpublished -> Published, triggered by PublishStorageDeals
  • Published -> Deleted, if sealing takes too long
  • Published -> Active, after successful PreCommit and ProveCommit
  • Active -> Deleted, if expired or terminated
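
The transitions can be sketched as a pure function; the event names below are shorthand for the on-chain triggers listed above, and deriving them from lotus chain notifications is not shown:

// Sketch of the deal state machine; any combination not listed leaves the state unchanged.
type DealStatus = 'unpublished' | 'published' | 'active' | 'deleted';
type DealEvent =
  | 'publish_storage_deals'  // PublishStorageDeals message observed on chain
  | 'sealing_timeout'        // sealing takes too long
  | 'prove_commit'           // successful PreCommit and ProveCommit
  | 'expired_or_terminated'; // deal expired or was terminated

function applyTransition(state: DealStatus, event: DealEvent): DealStatus {
  if (state === 'unpublished' && event === 'publish_storage_deals') return 'published';
  if (state === 'published' && event === 'sealing_timeout') return 'deleted';
  if (state === 'published' && event === 'prove_commit') return 'active';
  if (state === 'active' && event === 'expired_or_terminated') return 'deleted';
  return state;
}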

Implementation Approach 2

  • Use filfox API to get deal IDs for a specific client.
  • Use Glif nodes to check deal status for those published deals

Configuration

[connection]
database = "mongodb://127.0.0.1:37000"
full_node_api = "http://127.0.0.1:1234"
full_node_token = "token"
[deal_tracking_service]
enabled = true

Database Models DealState

{
    datasetId: string,
    client: string,
    provider: string,
    proposalCid: string,
    label: string,
    dealId: number,
    sectorId: number,
    activation: number,
    expiration: number,
    state: string // proposed, published, active, slashed
}
{
    lastHeight: number // to track which height to resume when coming back online
}

Replication Service

Purpose

Entry point for the client to start replicating a prepared dataset to Filecoin storage providers. Ensure each chunk of the dataset is replicated according to the client's requirements.

Configuration

[connection]
database = "mongodb://127.0.0.1:37000"
[replication_service]
enabled = true
bind = "127.0.0.1"
port = 3002
http_url_prefix = "http://example.com" # This will send out deal with URL http://example.com/bafyxxxx.car

Implementation

Stands up an API that accepts replication requests

API - create replicate request for a dataset

Request

POST /replication
{
    "datasetId": "id",
    "minReplicas": 3,
    "criteria": "provider in (f0100, f0101, f0102) && success_rate > 0.5",
    "client": "client",
}

API - create replicate request for a car file

Request

POST /replication
{
    "path": "/mnt/dataset/bafy.car",
    "dataCid": "bafy...",
    "pieceCid": "bafy...",
    "pieceSize": 1024,
    "minReplicas": 3,
    "criteria": "provider in (f0100, f0101, f0102) && success_rate > 0.5",
    "client": "client",
}

API - update replicate request

Request

POST /replication/:id
{
    "minReplicas": 3,
    "criteria": "provider in (f0100, f0101, f0102) && success_rate > 0.5",
    "client": "client",
    "state": "paused|active|removed"
}

API - get all replicate requests

Request

GET /replications

Response

[
    {
        "id": "id",
        "datasetId": "id",
        "minReplicas": 3,
        "client": "client",
        "criteria": "provider in (f0100, f0101, f0102) && success_rate > 0.5",
        "status": "active",
        "replicationFactor": 1.5,
        "replicationCompleted": 10,
        "replicationInProgress": 2,
        "replicationTotal": 50
    }
]

API - get replicate request details

Request

GET /replication/:id

Response

{
    "id": "id",
    "datasetId": "id",
    "minReplicas": 3,
    "client": "client",
    "criteria": "provider in (f0100, f0101, f0102) && success_rate > 0.5",
    "status": "active",
    "replicationFactor": 1.5,
    "replicationCompleted": 10,
    "replicationInProgress": 2,
    "replicationTotal": 50,
    "replicationDetails": [
        {
            "generationRequestId": "id",
            "index": 0,
            "dataCid": "bafy",
            "pieceCid": "bafy",
            "pieceSize": 1024,
            "deals": [
                {
                    "client" : "client",
                    "provider": "provider",
                    "proposalCid": "proposalCid",
                    "dataCid" : "dataCid",
                    "dealId": "dealId",
                    "sectorId": "sectorId",
                    "activation" : 123123,
                    "expiration" : 123123,
                    "state": "published"
                }
                ...
            ]
            ...
        }
    ]
}

Database Model ReplicationRequest

{
    id: string,
    datasetId: string,
    minReplicas: number,
    criteria: string,
    client: string,
    status: string // active, paused, removed, completed
}

Implementation

Periodically look at the replication request table. For each replication request (a sketch follows this list):

  • Find the corresponding prepared dataset chunks in the GenerationRequest table
  • Look up deals for each chunk in the DealState table and find chunks that do not meet the minReplicas requirement
  • Find storage providers in the ProviderMetric table that satisfy the criteria
  • Send deals to those storage providers using the Deal Making Service
  • Apply exponential backoff if a storage provider rejects the deal
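
A sketch of one pass of this loop; the Deps functions are hypothetical stand-ins for the database queries and service calls named above, not real APIs of the software:

// Dependencies the loop needs; these are placeholders for illustration only.
interface Deps {
  getActiveReplicationRequests(): Promise<{ datasetId: string; minReplicas: number; criteria: string }[]>;
  getGenerationRequests(datasetId: string): Promise<{ dataCid: string; pieceCid: string; pieceSize: number }[]>;
  countDeals(pieceCid: string): Promise<number>;      // from the DealState table
  rankingLookup(criteria: string): Promise<string[]>; // provider IDs from the Ranking Service
  makeDeal(provider: string, chunk: { dataCid: string; pieceCid: string; pieceSize: number }): Promise<void>;
}

// One pass of the replication loop described above.
async function replicateOnce(deps: Deps): Promise<void> {
  for (const request of await deps.getActiveReplicationRequests()) {
    for (const chunk of await deps.getGenerationRequests(request.datasetId)) {
      const existing = await deps.countDeals(chunk.pieceCid);
      if (existing >= request.minReplicas) continue;
      const providers = await deps.rankingLookup(request.criteria);
      for (const provider of providers.slice(0, request.minReplicas - existing)) {
        // Send the deal via the Deal Making Service; rejected deals should be
        // retried later with exponential backoff (not shown).
        await deps.makeDeal(provider, chunk);
      }
    }
  }
}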

Implementation with bidbot route

Forward the replication request to the bidbot auctioneer. It is not yet clear what the API will look like, but it should be an API where we can send the HTTP URL along with other options, such as the minimum number of replicas, and get a jobId in the response. We can later query the API again with the jobId to get the status of the replication request.

Ranking Service

Purpose

Periodically get the latest stats of storage providers from Pando and report deal making stats back to Pando.

Integration with Filrep/Estuary

No longer needed, as Filrep will be publishing data to Pando; only integration with Pando is required.

Integration with Pando

Open Questions

Pando, as a pub-sub service based on libp2p, does not care about the data model of the underlying payload. It is up to the data provider and consumer to agree on a protocol for how to interpret the transmitted data. That means

  1. As a consumer of data from Estuary or Filrep, we need them to publish a document detailing how to consume their published data
  2. As a provider of data to Pando, we need to document how our data should be consumed and help consumers integrate with it
  3. As this software gets distributed to large clients, each client will have its own peer ID. Each consumer will need to discover and trust those peer IDs. Since any peer can publish data to Pando, there needs to be a way to certify that a peer is trustworthy. One way is to have an organization serve as an authority that verifies the identity of large clients and publishes the whitelisted peer IDs. Consumers then fetch the list of peer IDs every day and trust the data coming from those peers.

Implementation

Consume data from Pando, and update the corresponding database.

Database Model ProviderMetric

Unknown, as we have not yet seen any data published by Estuary/Filrep via Pando or how to consume it. Below is a tentative structure

{
    provider: string,
    dataProvider: string,
    metricName: string,
    metricValue: mixed
}

Configuration

[ranking_service]
enabled = true

API - lookup storage providers with criteria

Request

POST /ranking/lookup
{
    "limit": 100,
    "criteria": [
        ["filrep.success_rate", "gt" 0.5],
        ["filrep.available", true],
        ["id", "in", "f0111", "f0222"]
    ]
}

Response

[
    {
        "id": "f0111",
        "filrep.success_rate": 0.6
    }
]
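
One possible way to translate the criteria tuples into a MongoDB filter over the provider metric documents; the operator names ("gt", "in", bare value) follow the request example above, while the mapping itself is an assumption:

type Criterion = [string, ...unknown[]];

// Sketch: build a MongoDB filter from the criteria list used by POST /ranking/lookup.
function toMongoFilter(criteria: Criterion[]): Record<string, unknown> {
  const filter: Record<string, unknown> = {};
  for (const [field, ...rest] of criteria) {
    if (rest.length === 1) {
      filter[field] = rest[0];                // ["filrep.available", true]
    } else if (rest[0] === 'gt') {
      filter[field] = { $gt: rest[1] };       // ["filrep.success_rate", "gt", 0.5]
    } else if (rest[0] === 'in') {
      filter[field] = { $in: rest.slice(1) }; // ["id", "in", "f0111", "f0222"]
    }
  }
  return filter;
}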

HTTP Hosting Service

Purpose

A simple HTTP server that hosts the CAR files

[http_hosting_service]
enabled = true
bind = "0.0.0.0"
port = 80
car_dir = ".singularity/cars/"
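
The service can be as small as a static file server over car_dir. A minimal sketch with Node's built-in http module, using the bind address, port and directory from the configuration above:

import { createServer } from 'http';
import { createReadStream, existsSync } from 'fs';
import { join, basename } from 'path';

const carDir = './.singularity/cars/';

// Sketch: serve <cid>.car files from car_dir; anything else returns 404.
createServer((req, res) => {
  const name = basename((req.url ?? '/').split('?')[0]); // basename() avoids path traversal
  const file = join(carDir, name);
  if (!file.endsWith('.car') || !existsSync(file)) {
    res.writeHead(404);
    res.end('not found');
    return;
  }
  res.writeHead(200, { 'Content-Type': 'application/vnd.ipld.car' });
  createReadStream(file).pipe(res);
}).listen(80, '0.0.0.0');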

Client CLI

Purpose

A lightweight CLI tool that wraps the APIs above

Configuration

[connection]
deal_preparation_service = "http://127.0.0.1:3001"
replication_service = "http://127.0.0.1:3002"

Command - examples

$ singularity preparation|prep start --name <name> --path <path> --deal-size 2GiB
Accepted. dataset-id: 1234
$ singularity preparation pause <dataset_id>
$ singularity preparation resume <dataset_id>
$ singularity preparation remove <dataset_id>
$ singularity preparation list
$ singularity preparation status <dataset_id>

$ singularity replication|rep start --dataset-id <dataset_id> --minReplicas <number> --criteria <criteria>
Accepted. replication request-id: 5678
$ singularity replication update <request_id>
$ singularity replication pause <request_id>
$ singularity replication resume <request_id>
$ singularity replication remove <request_id>
$ singularity replication list
$ singularity replication status <request_id>

$ singularity provider lookup --id id1,id2,id3 --criteria <criteria>

Retrieval CLI

Purpose

A CLI tool that connects to libp2p and can be used to download and merge files

Configuration

[connection]
full_node_api = "https://api.node.glif.io"
full_node_token = ""

Command - Retrieve file by path

singularity retrieve my.dataset.com/a/video/file.mp4 --output . [--provider f0xxxx]

Scalability

Deal Preparation

Deal preparation can be scaled by increasing the number of deal preparation workers. Currently, with AMD SHA extensions, it takes about 4 minutes to prepare a 32 GiB CAR file. After scaling out, the bottleneck becomes disk speed. Below is an estimate of deal preparation speed depending on the hardware spec.

| Storage Type | IO | # of workers | TB per day |
| --- | --- | --- | --- |
| Single SATA HDD | 200 MB/s | 1 | 8 TB |
| RAID 10 with 8 HDDs | 800 MB/s | 3 | 36 TB |
| SAN network | up to 20 GB/s | 60 | 800 TB |

Replication

Since all deals are made with the Boost client, each deal is lightweight and does not need additional nodes to scale out. The bottleneck is the data transfer from the client's HTTP server to the storage provider. The software provides a built-in HTTP server, but the client can substitute any other HTTP server, including hosting the data on S3. Below is an estimate of deals made per day depending on the Internet speed. This estimate is an upper bound and the actual rate may be significantly lower.

| Internet Speed | Deal making per day |
| --- | --- |
| 1 Gbps | 8 TB |
| 10 Gbps | 80 TB |
| 200 Gbps | 1.6 PB |
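
As a rough sanity check of these estimates:

1 Gbps = 0.125 GB/s x 86,400 s/day ≈ 10.8 TB/day of raw transfer capacity

so the 8 TB/day figure corresponds to roughly 75% sustained utilization of the link, and the 10 Gbps and 200 Gbps rows scale the same calculation linearly.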

Testing Plan

Phase 1 - component testing

We shall start testing while developing the software.

Deal Preparation

  • Test deal preparation with files of different sizes, from 1 byte to 1 TiB.
  • Test deal preparation with folders containing different files of different sizes
  • Test deal preparation with files of different types, including hidden files
  • Test deal preparation with files that do not have read access
  • Test deal preparation with files that are changed during preparation
  • Test deal preparation with remote mounted file system such as AWS S3
  • Verify computed CIDs
  • Verify computed Metadata DAG

Deal Tracking

  • Test deal tracking with existing client addresses
  • Test deal tracking with deal state changing to published, active, slashed

Reputation Service

  • Test reputation service with data from Pando

Bidbot Route

  • Test integration with Bidbot

Deal Making Service

  • Test the deal making API to confirm deals are made with the corresponding storage providers

Replication Service

  • Test deal making with selected storage providers (SPX or enterprise wg)
  • Monitor deal making with deals in different states, including failed and slashed.

Retrieval

  • Test retrieval CLI with sealed deals and published index metadata
  • Test retrieval of a single file, a single folder, a whole dataset, a split large file, and a split large folder

Phase 2 - end to end testing

In this phase, we will onboard a selected dataset (~100 TiB) with selected storage providers (SPX or enterprise WG). All components will be involved during this phase of testing to make sure they integrate well with each other.

Phase 3 - modularization testing

In this phase, we will start deploying the solution to multiple nodes to achieve horizontal scalability. We will also set up different components on different nodes to validate connectivity and configuration, and validate the modularization of the software as described in Modularize.

Phase 4 - scalability testing

In this phase, we will deploy the solution at a larger scale and demonstrate onboarding a PiB of data within 30 days with selected storage providers (SPX or enterprise WG).
