arquero doc edits
mbostock committed Dec 5, 2023
1 parent 80be4be commit abf2594
Showing 1 changed file (docs/lib/arquero.md) with 33 additions and 46 deletions.

# Arquero

[Arquero](https://uwdata.github.io/arquero/) is a JavaScript library for “query processing and transformation of array-backed data tables.” Arquero (currently version ${aq.version}) is available by default as `aq` in Markdown, but you can import it explicitly like so:

```js echo
import * as aq from "npm:arquero";
```

Following the documentation website’s [introduction](https://uwdata.github.io/arquero/), let’s create a table of the average hours of sunshine per month, from [usclimatedata.com](https://usclimatedata.com/).

```js echo
const dt = aq.table({
  "Seattle": [69, 108, 178, 207, 253, 268, 312, 281, 221, 142, 72, 52],
  "Chicago": [135, 136, 187, 215, 281, 311, 318, 283, 226, 193, 113, 106],
  "San Francisco": [165, 182, 251, 281, 314, 330, 300, 272, 267, 243, 189, 156]
});
```

Arquero is column-oriented: each column is an array of values of a given type (here, numbers representing hours of sunshine per month). But an Arquero table is also iterable and, as such, its contents can be displayed with [Inputs.table](/lib/inputs#table).

```js echo
Inputs.table(dt, {maxWidth: 640})
```
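To make “column-oriented” and “iterable” concrete, the following plain-JavaScript sketch (it does not use Arquero) pivots a column-oriented object, shaped like the argument to `aq.table`, into the row objects that iterating a table produces. The `columnsToRows` helper is hypothetical and only illustrates the idea; it is not Arquero’s implementation.

```js
// Hypothetical helper (not part of Arquero): turn an object of equal-length
// column arrays into an array of row objects, one per index.
function columnsToRows(columns) {
  const names = Object.keys(columns);
  const numRows = names.length > 0 ? columns[names[0]].length : 0;
  const rows = [];
  for (let i = 0; i < numRows; ++i) {
    const row = {};
    for (const name of names) row[name] = columns[name][i];
    rows.push(row);
  }
  return rows;
}

// The same sunshine columns passed to aq.table above (January to December).
const sunshineColumns = {
  "Seattle": [69, 108, 178, 207, 253, 268, 312, 281, 221, 142, 72, 52],
  "Chicago": [135, 136, 187, 215, 281, 311, 318, 283, 226, 193, 113, 106],
  "San Francisco": [165, 182, 251, 281, 314, 330, 300, 272, 267, 243, 189, 156]
};

const sunshineRows = columnsToRows(sunshineColumns);
console.log(sunshineRows.length); // 12
console.log(sunshineRows[0]); // January row: Seattle 69, Chicago 135, San Francisco 165
```

Per-column arrays suit operations that scan one column at a time, such as aggregates, while row objects suit consumers like Inputs.table that display one record per row.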

An Arquero table can also be used to make charts with [Observable Plot](./plot):

```js echo
Plot.plot({
  width: Math.min(width, 640),
  x: {tickFormat: Plot.formatMonth()},
  y: {grid: true, label: "Hours of sunshine ☀️ per month"},
  marks: [
    Plot.ruleY([0]),
    Plot.lineY(dt, {y: "Seattle", marker: true, stroke: "red"}),
    Plot.lineY(dt, {y: "Chicago", marker: true, stroke: "turquoise"}),
    Plot.lineY(dt, {y: "San Francisco", marker: true, stroke: "orange"})
  ]
})
```

Arquero supports a range of data transformation tasks, including filter, sample, aggregation, window, join, and reshaping operations. For example, the following code derives, for each month, the difference in hours of sunshine between Seattle and Chicago, then sorts the months by that difference in descending order.

```js echo
const diffs = dt.derive({
  month: (d) => aq.op.row_number(),
  diff: (d) => d.Seattle - d.Chicago
})
  .select("month", "diff")
  .orderby(aq.desc("diff"));

display(Inputs.table(diffs, {maxWidth: 640}));
```
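To check the result by hand, here is a plain-JavaScript sketch (no Arquero) of the same pipeline: compute a 1-based month number and the Seattle minus Chicago difference for each row, then sort by that difference in descending order. It only mirrors the logic of `derive`, `select`, and `orderby`; the variable names are illustrative.

```js
// Plain-JavaScript sketch of the derive/select/orderby pipeline above.
const seattleHours = [69, 108, 178, 207, 253, 268, 312, 281, 221, 142, 72, 52];
const chicagoHours = [135, 136, 187, 215, 281, 311, 318, 283, 226, 193, 113, 106];

// derive + select: keep only month (1-based, like aq.op.row_number()) and diff
const monthlyDiffs = seattleHours.map((hours, i) => ({
  month: i + 1,
  diff: hours - chicagoHours[i]
}));

// orderby(aq.desc("diff")): sort a copy so the largest difference comes first
const sortedDiffs = [...monthlyDiffs].sort((a, b) => b.diff - a.diff);

console.log(sortedDiffs[0]); // { month: 8, diff: -2 }
console.log(sortedDiffs[sortedDiffs.length - 1]); // { month: 1, diff: -66 }
```

With this data every difference is negative, so “largest first” means the month in which Seattle trails Chicago by the fewest hours (August, by 2 hours).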

Is Seattle more correlated with San Francisco or Chicago?

```js echo
const correlations = dt.rollup({
  corr_sf: aq.op.corr("Seattle", "San Francisco"),
  corr_chi: aq.op.corr("Seattle", "Chicago")
});

display(Inputs.table(correlations, {maxWidth: 640}));
```
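`aq.op.corr` computes the Pearson product-moment correlation between two columns. As a sketch of that formula in plain JavaScript (the `pearson` function below is our own illustration, not Arquero’s code), the co-variation of the two series is divided by the square root of the product of their individual variations:

```js
// Illustrative Pearson correlation: sum of co-deviations divided by the
// square root of the product of the sums of squared deviations.
function pearson(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; ++i) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

const seattleSunshine = [69, 108, 178, 207, 253, 268, 312, 281, 221, 142, 72, 52];
const chicagoSunshine = [135, 136, 187, 215, 281, 311, 318, 283, 226, 193, 113, 106];
const sfSunshine = [165, 182, 251, 281, 314, 330, 300, 272, 267, 243, 189, 156];

const corrSf = pearson(seattleSunshine, sfSunshine);
const corrChi = pearson(seattleSunshine, chicagoSunshine);
console.log(corrSf.toFixed(3), corrChi.toFixed(3));
```

Normalizing by n or n − 1 makes no difference here, since the factor cancels in the ratio; the result always lies between −1 and 1.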

We can aggregate statistics per city. The following code reshapes (or “folds”) the data into two columns, _city_ and _sun_, groups the rows by city, and shows the output as objects:

```js echo
dt.fold(aq.all(), {as: ["city", "sun"]})
  .groupby("city")
  .rollup({
    min: (d) => aq.op.min(d.sun), // functional form of aq.op.min("sun")
    max: (d) => aq.op.max(d.sun),
    avg: (d) => aq.op.average(d.sun),
    med: (d) => aq.op.median(d.sun),
    // functional forms permit flexible table expressions
    skew: ({sun: s}) => (aq.op.mean(s) - aq.op.median(s)) / aq.op.stdev(s) || 0
  })
  .objects()
```
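The fold, groupby, and rollup stages can be sketched in plain JavaScript (no Arquero) to make each step explicit: fold turns the wide table into one `{city, sun}` record per value, groupby collects the values for each city, and rollup reduces each group to summary statistics. This is an illustration under assumptions: the helper names are hypothetical, and `sampleStdev` uses the sample standard deviation (n − 1 denominator), which we assume matches `aq.op.stdev`.

```js
// Plain-JavaScript sketch of fold, groupby, and rollup (not Arquero's implementation).
const cityColumns = {
  "Seattle": [69, 108, 178, 207, 253, 268, 312, 281, 221, 142, 72, 52],
  "Chicago": [135, 136, 187, 215, 281, 311, 318, 283, 226, 193, 113, 106],
  "San Francisco": [165, 182, 251, 281, 314, 330, 300, 272, 267, 243, 189, 156]
};

// fold: wide to long, one {city, sun} record per (column, value) pair
const folded = Object.entries(cityColumns).flatMap(([city, values]) =>
  values.map((sun) => ({city, sun}))
);

// groupby("city"): collect sun values per city; a Map keeps first-seen order
const sunByCity = new Map();
for (const {city, sun} of folded) {
  if (!sunByCity.has(city)) sunByCity.set(city, []);
  sunByCity.get(city).push(sun);
}

function meanOf(values) {
  return values.reduce((a, b) => a + b, 0) / values.length;
}

function medianOf(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Sample standard deviation (n - 1 denominator); assumed to match aq.op.stdev.
function sampleStdev(values) {
  const m = meanOf(values);
  const ss = values.reduce((acc, v) => acc + (v - m) ** 2, 0);
  return Math.sqrt(ss / (values.length - 1));
}

// rollup: reduce each group to one summary record
const citySummaries = [...sunByCity].map(([city, sun]) => ({
  city,
  min: Math.min(...sun),
  max: Math.max(...sun),
  avg: meanOf(sun),
  med: medianOf(sun),
  // nonparametric skew; "|| 0" maps NaN (e.g., all values equal) to 0
  skew: (meanOf(sun) - medianOf(sun)) / sampleStdev(sun) || 0
}));

console.log(citySummaries[0]); // Seattle: min 52, max 312, avg 180.25, med 192.5, negative skew
```

`Math.min(...sun)` spreads the array into function arguments, which is fine for twelve values but can hit engine argument limits on very large arrays; a reduce-based minimum avoids that.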