Releases: X-DataInitiative/SCALPEL-Flattening

First public release

17 Oct 11:33
edf4ce1
Pre-release

Some cleanup has been done. Statistics is deprecated but not yet removed. Documentation work is ongoing.

performance improvement

28 Mar 11:28
ef71e51

  1. Delete useless cache.
  2. Rebuild the configuration structure based on years/months.
  3. Rebuild output strategies.
  4. Add a config template and update the README.

integration-pureconfig-release 1.1

18 Sep 14:43
e6df962

This release is based on Spark 2.3.0. It contains the following:

  1. integration of PureConfig.
  2. flatten fall cohort 2014 2016.

Flattening by month

23 Jan 16:45
a4a2295

The main feature is the ability to join the tables by month, which avoids memory problems.
The main changes:
• When converting tables from csv to parquet, add the possibility to partition by a column
• Add joinByYearAndMonth method in order to partition by month
• Add some config parameters:
-- partition_column: column used to partition the single table (optional)
-- monthly_partition: yes or no, whether to join by month
• Change sameAs definition so that two dataframes that differ only in column ordering are considered the same
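
The idea behind joining by month can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses plain Python lists of dicts in place of Spark DataFrames, and the function name `join_by_year_and_month`, its parameters, and the sample column names are hypothetical, not the project's actual `joinByYearAndMonth` signature. It shows why slicing the central table by (year, month) bounds the amount of joined data held at once: each slice is joined and handed to the caller before the next one is built.

```python
from collections import defaultdict
from datetime import date


def join_by_year_and_month(central, other, key, date_col):
    """Inner-join `central` with `other` on `key`, one (year, month) slice at a time.

    Hypothetical sketch: rows are dicts, not Spark rows. Yields
    ((year, month), joined_rows) so the caller can write out each slice
    before the next one is materialized.
    """
    # Bucket the central table's rows by the year and month of their date.
    slices = defaultdict(list)
    for row in central:
        d = row[date_col]
        slices[(d.year, d.month)].append(row)

    # Index the other table by join key once, so each slice does cheap lookups.
    lookup = defaultdict(list)
    for row in other:
        lookup[row[key]].append(row)

    # Join slice by slice, in chronological order.
    for period in sorted(slices):
        joined = [
            {**right, **left}  # on a column-name clash, the central row wins
            for left in slices[period]
            for right in lookup.get(left[key], [])
        ]
        yield period, joined


if __name__ == "__main__":
    central = [
        {"patient_id": 1, "event_date": date(2014, 1, 5)},
        {"patient_id": 2, "event_date": date(2014, 2, 1)},
        {"patient_id": 1, "event_date": date(2014, 2, 9)},
    ]
    other = [
        {"patient_id": 1, "code": "a"},
        {"patient_id": 2, "code": "b"},
    ]
    for period, rows in join_by_year_and_month(central, other, "patient_id", "event_date"):
        print(period, len(rows))
```

In a Spark setting the same pattern would presumably filter each table on the year and month, join the filtered tables, and write each result before moving on; that mapping is an assumption, since these notes do not show the implementation.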

Fall data flattening validation

01 Jun 13:25
  • Ran at the CNAM on 29/05/2017