Draft specification for a file format that groups multiple point cloud files to be treated as a single dataset.
Inspired by GDAL's VRTs and PDAL's tindex.
Goals:
- load VPC in QGIS as a single point cloud layer (rather than each file as a separate map layer)
- run processing tools on a VPC as input
- simple format that's easy to read/write
- allow referencing both local and remote datasets
- allow working with both indexed (e.g. COPC or EPT) and non-indexed (e.g. LAS/LAZ) datasets
Non-goals:
- support other kinds of data than (georeferenced) point clouds
We are using STAC API ItemCollection as the file format and expecting these STAC extensions:
Note: ItemCollection not the same thing as a STAC Collection. An ItemCollection is essentially a single JSON file representing a GeoJSON FeatureCollection containing STAC Items. A STAC Collection is also a JSON file, but with a different structure and extra metadata, but more importantly, it only links to other standalone JSON files (STAC items) which is impractical for our use case as we strongly prefer to have the whole virtual point cloud definition in a single file (for easy manipulation).
We use .vpc
extension (to allow easy format recognition based on the extension).
Normally a virtual point cloud file will not provide any links to other STAC entities (e.g. to a parent, to a root or to self) because often it will be created ad-hoc.
Why STAC:
- it is a good fit into the larger ecosystem of data catalogs, avoiding creation of a new format
- supported natively by PDAL as well (readers.stac)
- search endpoint on STAC API servers returns the same ItemCollection that we use, so a search result can be fed directly as input
- the ItemCollection file is an ordinary GeoJSON and other clients can consume it (to at least show boundaries of individual files)
Each referenced file can have its own CRS and it is defined through the "projection" STAC extension - either using proj:epsg
or proj:wkt2
or proj:projjson
. It is recommended that a single virtual point cloud only references files in the same CRS.
Please note that proj:epsg
only allows a single EPSG code to be specified. When working with a compound CRS that has an EPSG code for both horizontal and vertical CRS (often written as EPSG:1234+5678
where EPSG:1234
is horizontal and EPSG:5678
vertical), but there is no EPSG code for the compound CRS, this can't be encoded in proj:epsg
and it is preferrable to use an alternative encoding (e.g. proj:wkt2
).
The STAC pointcloud extension defines optional pc:statistics
entry with statistics for each attribute. There is a limitation that currently it does not define how to store distinct values where it makes sense and their counts (e.g. Classification attribute) - see stac-extensions/pointcloud#5.
The format requires that boundaries of referenced files are defined in WGS 84 (since GeoJSON requires that) - either as a simple 2D or 3D bounding box or as a boundary geometry (polygon / multi-polygon). In addition to that, the "projection" STAC extension allows that the bounding box or boundary geometry can be specified in native coordinate reference system - this is strongly recommended and if proj:bbox
or proj:geometry
are present, they will be used instead of their WGS 84 equivalents.
It is also strongly recommended to provide bounding box in 3D rather than just 2D bounding box (3D viewers can correctly place the expected box in the 3D space rather than guessing the vertical range).
Overviews are useful for client software to show preview of the point cloud when zoomed out, without having to open all individual files and only rely on overviews (the same idea as with overviews of raster layers). Overviews are an optional feature, they do not need to be provided.
In a simple case, there would be a single overview point cloud for a whole VPC - using thinned original data (e.g. every 1000-th point) and merged from individual data files to a single file.
In case of very large point clouds, using a single overview point cloud may not be enough, and it may be useful to have a hierarchy of overviews with a tiling scheme (e.g. 1 overview at level 0, 4 overviews at level 1, 16 overviews at level 2 - each level having different density).
Overviews are defined as extra assets of STAC items in addition to the actual data file - it is expected that they have overview
role defined. An overview point cloud may be referenced from multiple STAC items.
In case of a hierarchy of overviews, each STAC item may have multiple overview assets linked (e.g. one for level 0, one for level 1, one for level 2). STAC protocol does not provide a way to distinguish which overview is at what level (no place to write spacing between points or density), it is up to the client software to collect all referenced overviews and query their properties.
A sample of encoding overview point cloud in addition to the actual data:
"assets": {
"data": {
"href": "./mydata_54_1.copc.laz",
"roles": [
"data"
]
},
"overview": {
"href": "./mydata-overview.copc.laz",
"roles": [
"overview"
]
}
}