Releases · huggingface/datasets

11 Sep 13:50

albertvillanova

3.0.0

3505ed9

3.0.0 Latest

Latest

Dataset Features

Use Polars functions in .map()

Allow Polars as valid output type by @psmyth94 in #6762

Example:

>>> from datasets import load_dataset
>>> ds = load_dataset("lhoestq/CudyPokemonAdventures", split="train").with_format("polars")
>>> cols = [pl.col("content").str.len_bytes().alias("length")]
>>> ds_with_length = ds.map(lambda df: df.with_columns(cols), batched=True)
>>> ds_with_length[:5]
shape: (5, 5)
┌─────┬───────────────────────────────────┬───────────────────────────────────┬───────────────────────┬────────┐
│ idx ┆ title                             ┆ content                           ┆ labels                ┆ length │
│ --- ┆ ---                               ┆ ---                               ┆ ---                   ┆ ---    │
│ i64 ┆ str                               ┆ str                               ┆ str                   ┆ u32    │
╞═════╪═══════════════════════════════════╪═══════════════════════════════════╪═══════════════════════╪════════╡
│ 0   ┆ The Joyful Adventure of Bulbasau… ┆ Bulbasaur embarked on a sunny qu… ┆ joyful_adventure      ┆ 180    │
│ 1   ┆ Pikachu's Quest for Peace         ┆ Pikachu, with his cheeky persona… ┆ peaceful_narrative    ┆ 138    │
│ 2   ┆ The Tender Tale of Squirtle       ┆ Squirtle took everyone on a memo… ┆ gentle_adventure      ┆ 135    │
│ 3   ┆ Charizard's Heartwarming Tale     ┆ Charizard found joy in helping o… ┆ heartwarming_story    ┆ 112    │
│ 4   ┆ Jolteon's Sparkling Journey       ┆ Jolteon, with his zest for life,… ┆ celebratory_narrative ┆ 111    │
└─────┴───────────────────────────────────┴───────────────────────────────────┴───────────────────────┴────────┘

Support NumPy 2
- Allow numpy-2.1 and test it without audio extra by @albertvillanova in #7118

Cache Changes

Use huggingface_hub cache by @lhoestq in #7105
- use the huggingface_hub cache for files downloaded from HF, by default at ~/.cache/huggingface/hub
- cached datasets (Arrow files) will still be reloaded from the datasets cache, by default at ~/.cache/huggingface/datasets

Breaking changes

Remove deprecated code by @albertvillanova in #6996
- removed deprecated arguments like use_auth_token, fs or ignore_verifications
Remove beam by @albertvillanova in #6987
- removed deprecated apache beam datasets support
Remove metrics by @albertvillanova in #6983
- remove deprecated load_metric, please use the evaluate library instead
Remove tasks by @albertvillanova in #6999
- remove deprecated task argument in load_dataset() .prepare_for_task() method, datasets.tasks module

General improvements and bug fixes

Improved the tutorial by adding a link for loading datasets by @AmboThom in #7042
Automatically create cache_dir from cache_file_name by @ringohoffman in #7096
remove more script docs by @lhoestq in #7104
Fix args of feature docstrings by @albertvillanova in #7103
Temporarily pin numpy<2.1 to fix CI by @albertvillanova in #7114
Fix ConnectionError for gated datasets and unauthenticated users by @albertvillanova in #7110
Install transformers with numpy-2 CI by @albertvillanova in #7119
don't mention the script if trust_remote_code=False by @severo in #7120
Fix typed examples iterable state dict by @lhoestq in #7121
Rename LargeList.dtype to LargeList.feature by @albertvillanova in #7106
Fix wrong SHA in CI tests of HubDatasetModuleFactoryWithParquetExport by @albertvillanova in #7125
Disable implicit token in CI by @albertvillanova in #7126
Test get_dataset_config_info with non-existing/gated/private dataset by @albertvillanova in #7124
fix streaming from arrow files by @fschlatt in #7083

New Contributors

@AmboThom made their first contribution in #7042
@fschlatt made their first contribution in #7083

Full Changelog: 2.21.0...3.0.0

Contributors

severo, albertvillanova, and 5 other contributors

Assets 2

14 Aug 08:08

albertvillanova

2.21.0

a1b5a32

2.21.0

Features

Support pyarrow large_list by @albertvillanova in #7019

Support Polars round trip:

import polars as pl
from datasets import Dataset

df1 = pl.from_dict({"col_1": [[1, 2], [3, 4]]}
df2 = Dataset.from_polars(df).to_polars()
assert df1.equals(df2)

What's Changed

Use HF_HUB_OFFLINE instead of HF_DATASETS_OFFLINE by @Wauplin in #6968
packaging: Remove useless dependencies by @daskol in #6971
Fix resuming arrow format by @lhoestq in #6964
Fix webdataset pickling by @lhoestq in #6972
Set temporary numpy upper version < 2.0.0 to fix CI by @albertvillanova in #6975
Fix regression for pandas < 2.0.0 in JSON loader by @albertvillanova in #6978
Ensure compatibility with numpy 2.0.0 by @KennethEnevoldsen in #6976
Remove underlines between badges by @novialriptide in #6966
Update docs on trust_remote_code defaults to False by @albertvillanova in #6981
Improve skip take shuffling and distributed by @lhoestq in #6965
Fix tests using hf-internal-testing/librispeech_asr_dummy by @albertvillanova in #6998
Fix dump of bfloat16 torch tensor by @lhoestq in #7002
minor fix for bfloat16 by @lhoestq in #7003
Fix incorrect rank value in data splitting by @yzhangcs in #6994
less script docs by @lhoestq in #6993
Fix CI by temporarily pinning ruff < 0.5.0 by @albertvillanova in #7007
Support ruff 0.5.0 in CI by @albertvillanova in #7009
Fix WebDatasets KeyError for user-defined Features when a field is missing in an example by @ProGamerGov in #7004
[Streaming] retry on requests errors by @lhoestq in #6963
Re-enable raising error from huggingface-hub FutureWarning in CI by @albertvillanova in #7011
Skip faiss tests on Windows to avoid running CI for 360 minutes by @albertvillanova in #7014
Support fsspec 2024.6.1 by @albertvillanova in #7017
Persist IterableDataset epoch in workers by @lhoestq in #6710
Fix casting list array to fixed size list by @albertvillanova in #7021
Remove dead code for pyarrow < 15.0.0 by @albertvillanova in #7023
Fix check_library_imports by @lhoestq in #7026
Missing line from previous pr by @lhoestq in #7027
Fix ci by @lhoestq in #7028
Add decorator as explicit test dependency by @albertvillanova in #7043
Mark tests that require librosa by @albertvillanova in #7044
Unblock NumPy 2.0 by @NeilGirdhar in #6991
Fix tensorflow min version depending on Python version by @albertvillanova in #7045
Support librosa and numpy 2.0 for Python 3.10 by @albertvillanova in #7046
add checkpoint and resume title in docs by @lhoestq in #7050
Update load_hub.mdx by @severo in #7057
Add batching to IterableDataset by @lappemic in #7054
Avoid calling http_head for non-HTTP URLs by @albertvillanova in #7062
Fix load_dataset for data_files with protocols other than HF by @matstrand in #6862
Add batch method to Dataset class by @lappemic in #7064
Fix doc generation when NamedSplit is used as parameter default value by @albertvillanova in #7036
Fix CI by temporarily marking test_convert_to_parquet as expected to fail by @albertvillanova in #7074
add split argument to Generator by @piercus in #7015
Update required soxr version from pre-release to release by @albertvillanova in #7075
Fix CI test_convert_to_parquet by @albertvillanova in #7078
Fix prepare_single_hop_path_and_storage_options by @albertvillanova in #7068
Set load_from_disk path type as PathLike by @albertvillanova in #7081
Fix push_to_hub by not calling create_branch if branch exists by @albertvillanova in #7069
feat: support non streamable arrow file binary format by @kmehant in #7025
Support HTTP authentication in non-streaming mode by @albertvillanova in #7082
chore: fix typos in docs by @hattizai in #7034
Fix CI for metrics by @albertvillanova in 83e5c05

New Contributors

@novialriptide made their first contribution in #6966
@yzhangcs made their first contribution in #6994
@ProGamerGov made their first contribution in #7004
@NeilGirdhar made their first contribution in #6991
@matstrand made their first contribution in #6862
@lappemic made their first contribution in #7054
@piercus made their first contribution in #7015
@kmehant made their first contribution in #7025
@hattizai made their first contribution in #7034

Full Changelog: 2.20.0...2.21.0

Contributors

piercus, matstrand, and 13 other contributors

Assets 2

13 Jun 14:57

albertvillanova

2.20.0

98fdc9e

2.20.0

Important

Remove default trust_remote_code=True by @lhoestq in #6954
- datasets with a python loading script now require passing trust_remote_code=True to be used

Datasets features

[Resumable IterableDataset] Add IterableDataset state_dict by @lhoestq in #6658

checkpoint and resume an iterable dataset (e.g. when streaming):

>>> iterable_dataset = Dataset.from_dict({"a": range(6)}).to_iterable_dataset(num_shards=3)
>>> for idx, example in enumerate(iterable_dataset):
...     print(example)
...     if idx == 2:
...         state_dict = iterable_dataset.state_dict()
...         print("checkpoint")
...         break
>>> iterable_dataset.load_state_dict(state_dict)
>>> print(f"restart from checkpoint")
>>> for example in iterable_dataset:
...     print(example)

Returns:

{'a': 0}
{'a': 1}
{'a': 2}
checkpoint
restart from checkpoint
{'a': 3}
{'a': 4}
{'a': 5}

General improvements and bug fixes

Add docs about the CLI by @albertvillanova in #6831
Remove token arg from CLI examples by @albertvillanova in #6839
Allow deleting a subset/config from a no-script dataset by @albertvillanova in #6820
Fix line-endings in tests on Windows by @albertvillanova in #6857
Fix CI by temporarily pinning huggingface-hub < 0.23.0 by @albertvillanova in #6861
Fix dataset name for community Hub script-datasets by @albertvillanova in #6855
Update tqdm >= 4.66.3 to fix vulnerability by @albertvillanova in #6870
Fix download for dict of dicts of URLs by @albertvillanova in #6871
Set dev version by @albertvillanova in #6873
Shorten long logs by @lhoestq in #6875
Support jax 0.4.27 in CI tests by @albertvillanova in #6885
Close gzipped files properly by @lhoestq in #6893
Make CLI convert_to_parquet not raise error if no rights to create script branch by @albertvillanova in #6902
Fix YAML error in README files appearing on GitHub by @albertvillanova in #6898
Document that to_json defaults to JSON Lines by @albertvillanova in #6895
Require Pillow >= 9.4.0 to avoid AttributeError when loading image dataset by @albertvillanova in #6883
Create function to convert to parquet by @albertvillanova in #6878
Update features.py to avoid bfloat16 unsupported error by @skaulintel in #6607
Fix decoding multi part extension by @lhoestq in #6904
Use pandas ujson in JSON loader to improve performance by @albertvillanova in #6874
Update requests >=2.32.1 to fix vulnerability by @albertvillanova in #6909
Fix wrong type hints in data_files by @albertvillanova in #6910
Remove dead code for non-dict data_files from packaged modules by @albertvillanova in #6911
Support fsspec 2024.5.0 by @albertvillanova in #6921
Remove torchaudio remnants from code by @albertvillanova in #6922
[WebDataset] Add .pth support for torch tensors by @lhoestq in #6920
Unpin hfh by @lhoestq in #6876
Preserve JSON column order and support list of strings field by @albertvillanova in #6914
[WebDataset] Support compressed files by @lhoestq in #6931
update ci user by @lhoestq in #6933
Revert ci user by @lhoestq in #6934
Fix NonMatchingSplitsSizesError/ExpectedMoreSplits when passing data_dir/data_files in no-code Hub datasets by @albertvillanova in #6925
Set dev version by @albertvillanova in #6944
Update yanked version of minimum requests requirement by @albertvillanova in #6945
Re-enable import sorting disabled by flake8:noqa directive when using ruff linter by @albertvillanova in #6946
Update dataset_dict.py by @Arunprakash-A in #6932
Update process.mdx: Code Listings Fixes by @FadyMorris in #6928
Fix small typo by @marcenacp in #6955
update docs on N-dim arrays by @lhoestq in #6956
Fix typos in docs by @albertvillanova in #6957
Validate config name and data_files in packaged modules by @albertvillanova in #6915
Add support for categorical/dictionary types by @EthanSteinberg in #6892
feat(ci): add trufflehog secrets detection by @McPatate in #6960
Better error handling in dataset_module_factory by @Wauplin in #6959
Move info_utils errors to exceptions module by @albertvillanova in #6952
fix(ci): remove unnecessary permissions by @McPatate in #6962

New Contributors

@skaulintel made their first contribution in #6607
@Arunprakash-A made their first contribution in #6932
@FadyMorris made their first contribution in #6928
@marcenacp made their first contribution in #6955
@EthanSteinberg made their first contribution in #6892
@McPatate made their first contribution in #6960

Full Changelog: 2.19.0...2.20.0

Contributors

EthanSteinberg, albertvillanova, and 7 other contributors

Assets 2

03 Jun 05:26

albertvillanova

2.19.2

7290bbd

2.19.2

Bug fixes

Make CLI convert_to_parquet not raise error if no rights to create script branch by @albertvillanova in #6902
Require Pillow >= 9.4.0 to avoid AttributeError when loading image dataset by @albertvillanova in #6883
Update requests >=2.32.1 to fix vulnerability by @albertvillanova in #6909
Fix NonMatchingSplitsSizesError/ExpectedMoreSplits when passing data_dir/data_files in no-code Hub datasets by @albertvillanova in #6925

Full Changelog: 2.19.1...2.19.2

Contributors

albertvillanova

Assets 2

06 May 09:40

albertvillanova

2.19.1

bb2664c

2.19.1

Bug fixes

Fix download for dict of dicts of URLs by @albertvillanova in #6871

Full Changelog: 2.19.0...2.19.1

Contributors

albertvillanova

Assets 2

19 Apr 08:46

albertvillanova

2.19.0

0d3c746

2.19.0

Dataset Features

Add Polars compatibility by @psmyth94 in #6531

convert to a Polars dataframe using .to_polars();

import polars as pl
from datasets import load_dataset
ds = load_dataset("DIBT/10k_prompts_ranked", split="train")
ds.to_polars() \
    .groupby("topic") \
    .agg(pl.len(), pl.first()) \
    .sort("len", descending=True)

Use Polars formatting to return Polars objects when accessing a dataset:
```
ds = ds.with_format("polars")
ds[:10].group_by("kind").len()
```

Add fsspec support for to_json, to_csv, and to_parquet by @alvarobartt in #6096

Save on HF in any file format:

ds.to_json("hf://datasets/username/my_json_dataset/data.jsonl")
ds.to_csv("hf://datasets/username/my_csv_dataset/data.csv")
ds.to_parquet("hf://datasets/username/my_parquet_dataset/data.parquet")

Add mode parameter to Image feature by @mariosasko in #6735
- Set images to be read in a certain mode like "RGB"
```
dataset = dataset.cast_column("image", Image(mode="RGB"))
```
Add CLI function to convert script-dataset to Parquet by @albertvillanova in #6795
- run command to open a PR in script-based dataset to convert it to Parquet:
```
datasets-cli convert_to_parquet <dataset_id>
```
Add Dataset.take and Dataset.skip by @lhoestq in #6813
- same as IterableDataset.take and IterableDataset.skip
```
ds = ds.take(10)  # take only the first 10 examples
```

General improvements and bug fixes

Bump huggingface-hub lower version to 0.21.2 by @albertvillanova in #6713
fix CastError pickling by @lhoestq in #6712
Expand no-code dataset info with datasets-server info by @mariosasko in #6714
Fix sliced ConcatenationTable pickling with mixed schemas vertically by @lhoestq in #6715
Fix concurrent script loading with force_redownload by @lhoestq in #6718
get_dataset_default_config_name docstring by @lhoestq in #6723
Deprecate Beam API and download from HF GCS bucket by @mariosasko in #6474
Deprecate Pandas builder by @mariosasko in #6730
Using a registry instead of calling globals for fetching feature types by @psmyth94 in #6727
Update torch_formatter.py by @VarunNSrivastava in #6402
Improve default patterns resolution by @mariosasko in #6704
Transpose images with EXIF Orientation tag by @mariosasko in #6739
Fix missing download_config in get_data_patterns by @lhoestq in #6742
Allow null values in dict columns by @mariosasko in #6743
Fix fsspec tqdm callback by @lhoestq in #6749
chore(deps): bump fsspec by @shcheklein in #6747
Fix offline mode with single config by @lhoestq in #6741
Remove deprecated code by @Wauplin in #6761
fixing the issue 6755(small typo) by @JINO-ROHIT in #6767
remove_columns/rename_columns doc fixes by @mariosasko in #6772
Fix CI by @mariosasko in #6780
rename datasets-server to dataset-viewer by @severo in #6785
Install dependencies with uv in CI by @mariosasko in #6779
Fix cache conflict in _check_legacy_cache2 by @lhoestq in #6792
Fix typo in docs (upload CLI) by @Wauplin in #6802
fix DatasetBuilder._split_generators incomplete type annotation by @JonasLoos in #6799
#6791 Improve type checking around FAISS by @Dref360 in #6803
Fix --repo-type order in cli upload docs by @lhoestq in #6804
Fix hf-internal-testing/dataset_with_script commit SHA in CI test by @albertvillanova in #6806
Fix cache path to snakecase for CachedDatasetModuleFactory and Cache by @izhx in #6754
Multithreaded downloads by @lhoestq in #6794
Remove os.path.relpath in resolve_patterns by @mariosasko in #6815
Extract data on the fly in packaged builders by @mariosasko in #6784
add allow_primitive_to_str and allow_decimal_to_str instead of allow_number_to_str by @Modexus in #6811
Support indexable objects in Dataset.__getitem__ by @mariosasko in #6817
Make convert_to_parquet CLI command create script branch by @albertvillanova in #6809
Fix parquet export infos by @lhoestq in #6822

New Contributors

@VarunNSrivastava made their first contribution in #6402
@shcheklein made their first contribution in #6747
@JINO-ROHIT made their first contribution in #6767
@JonasLoos made their first contribution in #6799
@izhx made their first contribution in #6754
@Modexus made their first contribution in #6811

Full Changelog: 2.18.0...2.19.0

Contributors

severo, shcheklein, and 12 other contributors

Assets 2

01 Mar 21:00

lhoestq

2.18.0

ca8409a

2.18.0

Dataset features

Make JSON builder support an array of strings by @albertvillanova in #6696
Base parquet batch_size on parquet row group size by @lhoestq in #6701
- Faster cold start for streaming
Change default compression argument for JsonDatasetWriter by @Rexhaif in #6659
Automatic Conversion for uint16/uint32 to Compatible PyTorch Dtypes by @mohalisad in #6660
fsspec: support fsspec>=2023.12.0 glob changes by @pmrowla in #6687
- Support latest fsspec up to 2024.2.0

General improvements and bug fixes

Fix for Incorrect ex_iterable used with multi num_worker by @kq-chen in #6582
- Previously using PyTorch DDP and num_workers could lead to incorrect shards assignments to workers and cause errors
Fix imagefolder dataset url by @mariosasko in #6683
Improve error message for gated datasets on load by @lewtun in #6684
Updated Quickstart Notebook link by @Codeblockz in #6685
Update the print message for chunked_dataset in process.mdx by @gzbfgjf2 in #6693
Faster xlistdir by @mariosasko in #6698
Update GitHub Actions to Node 20 by @albertvillanova in #6682
Update release instructions by @albertvillanova in #6681
Pass through information about location of cache directory. by @stridge-cruxml in #6677
Allow SplitDict setitem to replace existing SplitInfo by @lhoestq in #6665
Update ruff by @lhoestq in #6706
Silence ruff deprecation messages by @mariosasko in #6707
fix: show correct package name to install biopython by @BioGeek in #6662
Fix data_files when passing data_dir by @lhoestq in #6705
Release: 2.18.0 by @lhoestq in #6708

New Contributors

@Codeblockz made their first contribution in #6685
@gzbfgjf2 made their first contribution in #6693
@stridge-cruxml made their first contribution in #6677
@pmrowla made their first contribution in #6687
@BioGeek made their first contribution in #6662
@Rexhaif made their first contribution in #6659
@mohalisad made their first contribution in #6660
@kq-chen made their first contribution in #6582

Full Changelog: 2.17.1...2.18.0

Contributors

BioGeek, pmrowla, and 10 other contributors

Assets 2

19 Feb 09:58

albertvillanova

2.17.1

5d22682

2.17.1

Bug Fixes

Revert the changes in arrow_writer.py from #6636 by @bryant1410 in #6664
Remove deprecated verbose parameter from CSV builder by @albertvillanova in #6672

Full Changelog: 2.17.0...2.17.1

Contributors

bryant1410 and albertvillanova

Assets 2

09 Feb 10:09

albertvillanova

2.17.0

7063357

2.17.0

Dataset Features

[WebDataset] Audio support and bug fixes by @lhoestq in #6573
Add concurrent loading of shards to datasets.load_from_disk by @kkoutini in #6464
Support data_dir parameter in push_to_hub by @albertvillanova in #6634
Support push_to_hub without org/user to default to logged-in user by @albertvillanova in #6629
Allow concatenation of datasets with mixed structs by @Dref360 in #6587

General improvements and bug fixes

Fix parallel downloads for datasets without scripts by @lhoestq in #6551
Fix imagefolder with one image by @lhoestq in #6556
Fix tests based on datasets that used to have scripts by @lhoestq in #6574
remove eli5 test by @lhoestq in #6583
[IterableDataset] Fix drop_last_batchin map after shuffling or sharding by @lhoestq in #6575
Support standalone yaml by @lhoestq in #6557
Drop redundant None guard. by @xkszltl in #6596
fix os.listdir return name is empty string by @d710055071 in #6581
Fix CI: pyarrow 15, pandas 2.2 and sqlachemy by @lhoestq in #6617
Dedicated RNG object for fingerprinting by @mariosasko in #6606
Migrate from setup.cfg to pyproject.toml by @mariosasko in #6619
keep more info in DatasetInfo.from_merge #6585 by @JochenSiegWork in #6586
Read GeoParquet files using parquet reader by @weiji14 in #6508
Use schema metadata only if it matches features by @lhoestq in #6616
Raise error on bad split name by @lhoestq in #6626
Disable tqdm bars in non-interactive environments by @mariosasko in #6627
Add with_rank param to Dataset.filter by @mariosasko in #6608
Bump max range of dill to 0.3.8 by @ringohoffman in #6630
Fix filelock: use current umask for filelock >= 3.10 by @lhoestq in #6631
Faster webdataset streaming by @lhoestq in #6578
Multi gpu docs by @lhoestq in #6550
dataset viewer requires no-script by @severo in #6633
Make split slicing consistent with list slicing by @mariosasko in #5891
Do not use Parquet exports if revision is passed by @albertvillanova in #6555
Make CLI test support multi-processing by @albertvillanova in #6628
Fix reload cache with data dir by @lhoestq in #6632
Fix array cast/embed with null values by @mariosasko in #6283
Faster column validation and reordering by @psmyth94 in #6636
Better multi-gpu example by @lhoestq in #6646
Fix missing info when loading some datasets from Parquet export by @lhoestq in #6635
Minor multi gpu doc improvement by @lhoestq in #6649
Document usage of hfh cli instead of git by @lhoestq in #6648

New Contributors

@xkszltl made their first contribution in #6596
@kkoutini made their first contribution in #6464
@JochenSiegWork made their first contribution in #6586
@weiji14 made their first contribution in #6508
@ringohoffman made their first contribution in #6630
@psmyth94 made their first contribution in #6636

Full Changelog: 2.16.1...2.17.0

Contributors

severo, xkszltl, and 10 other contributors

Assets 2

30 Dec 16:46

lhoestq

2.16.1

7b2bcd7

2.16.1

Bug fixes

Fix dl_manager.extract returning FileNotFoundError by @lhoestq in #6543
- Fix bug causing FileNotFoundError when passing a relative directory as cache_dir to load_dataset
Fix custom configs from script by @lhoestq in #6544
- Fix bug when loading a dataset with a loading script using custom arguments would fail
- e.g. load_dataset("ted_talks_iwslt", language_pair=("ja", "en"), year="2015")

Full Changelog: 2.16.0...2.16.1

Contributors

lhoestq

Assets 2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dataset Features

Cache Changes

Breaking changes

General improvements and bug fixes

New Contributors

Contributors

Features

What's Changed

New Contributors

Contributors

Important

Datasets features

General improvements and bug fixes

New Contributors

Contributors

Bug fixes

Contributors

Bug fixes

Contributors

Dataset Features

General improvements and bug fixes

New Contributors

Contributors

Dataset features

General improvements and bug fixes

New Contributors

Contributors

Bug Fixes

Contributors

Dataset Features

General improvements and bug fixes

New Contributors

Contributors

Bug fixes

Contributors

Releases: huggingface/datasets

3.0.0

Dataset Features

Cache Changes

Breaking changes

General improvements and bug fixes

New Contributors

Contributors

2.21.0

Features

What's Changed

New Contributors

Contributors

2.20.0

Important

Datasets features

General improvements and bug fixes

New Contributors

Contributors

2.19.2

Bug fixes

Contributors

2.19.1

Bug fixes

Contributors

2.19.0

Dataset Features

General improvements and bug fixes

New Contributors

Contributors

2.18.0

Dataset features

General improvements and bug fixes

New Contributors

Contributors

2.17.1

Bug Fixes

Contributors

2.17.0

Dataset Features

General improvements and bug fixes

New Contributors

Contributors

2.16.1

Bug fixes

Contributors