From 8afcc89bd5e9a52b98af950fa64033a1cde7746e Mon Sep 17 00:00:00 2001 From: pablo-gar Date: Mon, 13 Mar 2023 12:26:38 -0700 Subject: [PATCH] [docs] Updates readme, adds more contextual and onboarding information with links to doc-site. Co-authored-by: Aaron Wolen Co-authored-by: Emanuele Bezzi Co-authored-by: John Kerl --- README.md | 89 ++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 68 insertions(+), 21 deletions(-) diff --git a/README.md b/README.md index 3aff399353..34e72110fd 100644 --- a/README.md +++ b/README.md @@ -7,45 +7,92 @@ # TileDB-SOMA -SOMA -- _stack of matrices, annotated_ -- is a flexible, extensible, and open-source API enabling access -to data in a variety of formats. +[SOMA](https://github.com/single-cell-data/SOMA/tree/main) – for “Stack Of Matrices, Annotated” – is a flexible, extensible, and open-source API enabling access to data in a variety of formats. The driving use case of SOMA is for single-cell data in the form of annotated matrices where observations are frequently cells and features are genes, proteins, or genomic regions. The TileDB-SOMA package is a C++ library with APIs in Python and R, using [TileDB Embedded](https://github.com/TileDB-Inc/TileDB) to implement the [SOMA specification](https://github.com/single-cell-data/SOMA/blob/main/abstract_specification.md). -# Community +Get started on using TileDB-SOMA: -* Please join the [TileDB Slack community](https://tiledb-community.slack.com/join/shared_invite/zt-ndq1ipwl-QcithaWG6j1BImtuQGSpag#/shared-invite/email) with dedicated channel `#genomics` -* Please join the [CZI Slack community](https://cziscience.slack.com/), with dedicated -channel `#cell-census-users` +* [Quick start](#quick-start). +* Python [documentation](https://tiledbsoma.readthedocs.io/en/latest/python-api.html). +* R [documentation](https://single-cell-data.github.io/TileDB-SOMA/). -# Documentation +## What can TileDB-SOMA do? -* [Python API](https://tiledbsoma.readthedocs.io/en/latest/python-api.html) -* [R API](https://single-cell-data.github.io/TileDB-SOMA) -* [Example Python notebooks](https://github.com/single-cell-data/TileDB-SOMA/tree/main/apis/python/notebooks) +Intended to be used for single-cell data, TileDB-SOMA provides Python and R APIs to allow for storage and data access patterns at scale and for larger-than-memory operations: -# Installation -* [Python](https://github.com/single-cell-data/TileDB-SOMA/blob/main/apis/python/README.md) -* [R](https://github.com/single-cell-data/TileDB-SOMA/blob/main/apis/r/README.md) +* Create and write large volumes of data. +* Open and read data at low latency, locally and from the cloud. +* Query and access interconnected arrays efficiently and at low latency. -# Issues and contacts +TileDB-SOMA provides interoperability with existing single-cell toolkits: -* Any/all questions, comments, and concerns are welcome at the [GitHub new-issue page](https://github.com/single-cell-data/TileDB-SOMA/issues/new/choose) -- or, you can also browse [existing issues](https://github.com/single-cell-data/TileDB-SOMA/issues) -* If you believe you have found a security issue, in lieu of filing an issue please responsibly disclose it by contacting [security@tiledb.com](mailto:security@tiledb.com) +* Load and create [anndata](https://anndata.readthedocs.io/en/latest/) objects. +* Load and create [Seurat](https://satijalab.org/seurat/) objects. *Coming soon*. -# Branches +TileDB-SOMA provides interoperability with existing Python or R data structures: + +* From Python create PyArrow objects, SciPy sparse matrices, NumPy arrays, and pandas data frames. +* From R create R Arrow objects, sparse matrices (via the [Matrix](https://cran.r-project.org/package=Matrix) package), and standard data frames and (dense) matrices. + + +## Community + +* Please join the [TileDB Slack community](https://czi.co/science-slack) with dedicated channel `#genomics`. +* Please join the [CZI Slack community](https://cziscience.slack.com/join/shared_invite/zt-czl1kp2v-sgGpY4RxO3bPYmFg2XlbZA#/shared-invite/email), with dedicated +channel `#cell-census-users`. + + +## Quick start + +### Documentation + +The TileDB-SOMA doc-site ([Python](https://tiledbsoma.readthedocs.io/en/latest/python-api.html)|[R](https://single-cell-data.github.io/TileDB-SOMA/)), contains the reference documentation and tutorials. + +Reference documentation can also be accessed directly from Python `help(tiledsoma)` or R `help(package = "tiledbsoma")`. + +### Main SOMA Objects + +The capabilities of TileDB-SOMA lay on the different read, access, and query patterns that each of the main implementations of SOMA objects provide: + +* `DenseNDArray` is a dense, N-dimensional array, with offset (zero-based) integer indexing on each dimension. +* `SparseNDArray` is the same as `DenseNDArray` but sparse, and supports point indexing (disjoint index access). +* `DataFrame` is a multi-column table with a user-defined columns names and value types, with support for point indexing. +* `Collection` is a persistent container of named SOMA objects. +* `Experiment` is a class that represents a single-cell experiment. It always contains two objects: + * `obs`: a `DataFrame` with primary annotations on the observation axis. + * `ms`: a `Collection` of measurements, each composed of `X` matrices and axis annotation matrices or data frames (e.g. `var`, `varm`, `obsm`, etc). + +### APIs quick start + +* [Python quick start](https://github.com/single-cell-data/TileDB-SOMA/wiki/Python-quick-start) +* [R quick start](https://github.com/single-cell-data/TileDB-SOMA/wiki/R-quick-start) + +## Who is using SOMA? + +* [CZ CELLxGENE Discover](https://cellxgene.cziscience.com/) to build the [Cell Census](https://github.com/chanzuckerberg/cell-census/) which provides efficient access and querying to a corpus containing nearly 50 million cells, compiled from 700+ datasets. + +If you are interested in listing any projects here please contact us at [soma@chanzuckerberg.com](mailto:soma@chanzuckerberg.com). + + +## Issues and contacts + +* Any/all questions, comments, and concerns are welcome at the [GitHub new-issue page](https://github.com/single-cell-data/TileDB-SOMA/issues/new/choose) -- or, you can also browse [existing issues](https://github.com/single-cell-data/TileDB-SOMA/issues). +* If you believe you have found a security issue, in lieu of filing an issue please responsibly disclose it by contacting [security@tiledb.com](mailto:security@tiledb.com). + +## Branches This branch, `main`, implements the [updated specfication](https://github.com/single-cell-data/SOMA/blob/main/abstract_specification.md). Please also see the `main-old` branch which implements the [original specification](https://github.com/single-cell-data/TileDB-SOMA/blob/main-old/spec/specification.md). -# Developer information +## Developer information -* [TileDB-SOMA wiki](https://github.com/single-cell-data/TileDB-SOMA/wiki) -* [Build instructions for developers](libtiledbsoma/README.md) +* [TileDB-SOMA wiki](https://github.com/single-cell-data/TileDB-SOMA/wiki). +* [Build instructions for developers](libtiledbsoma/README.md). -# Code of Conduct +## Code of Conduct All participants in TileDB spaces are expected to adhere to high standards of professionalism in all interactions. This repository is governed by the