Releases: Hk669/bpetokenizer

v1.2.1

06 Jun 18:34
de22772

What's Changed

  • feat: starttime-endtime added with the throughput on verbose by @Hk669 in #10
  • Updates for the pretrained tokenizers. by @Hk669 in #11
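A minimal sketch of the new verbose timing, assuming a train(text, vocab_size, verbose) call shape suggested by these notes; the corpus string and vocab_size are placeholders, not verified API.

    # Sketch (assumed API): with verbose=True the trainer is now expected
    # to print the start time, end time, and throughput of the merge loop.
    from bpetokenizer import BPETokenizer

    tokenizer = BPETokenizer()
    corpus = "a small placeholder corpus repeated enough to learn merges " * 50
    tokenizer.train(corpus, vocab_size=300, verbose=True)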

Full Changelog: v1.2.0...v1.2.1

v1.2.0

05 Jun 15:15
c7513f3

What's Changed

  • deprecated the version check when loading by @Hk669 in #8

Full Changelog: v1.0.4...v1.2.0

v1.0.4

05 Jun 15:01
e5d5e43

What's Changed

  • feat: from_pretrained enabled with wi17k_base by @Hk669 in #6
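A hedged sketch of loading the wi17k_base tokenizer named in the PR above; the encode/decode round trip is an assumption about the API, only the from_pretrained("wi17k_base") entry point comes from these notes.

    # Sketch: load the bundled pretrained tokenizer enabled in this release.
    from bpetokenizer import BPETokenizer

    tokenizer = BPETokenizer.from_pretrained("wi17k_base")
    ids = tokenizer.encode("hello world")   # assumed round-trip methods
    print(ids)
    print(tokenizer.decode(ids))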

Full Changelog: v1.0.32...v1.0.4

v1.0.32

29 May 15:00

Full Changelog: v1.0.31...v1.0.32

  • Added a min_frequency hyperparameter to control which pairs are merged, avoiding extra vocab entries.
  • The default is set to 2.
  • Made some changes to the tests.
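A sketch of the hyperparameter described above; the keyword name min_frequency comes from these notes, while the rest of the call is assumed.

    # Sketch: pairs seen fewer than min_frequency times are assumed to be
    # skipped during merging, keeping rare pairs out of the vocab.
    from bpetokenizer import BPETokenizer

    tokenizer = BPETokenizer()
    text = "low low lower lowest newer newest " * 20
    tokenizer.train(text, vocab_size=280, min_frequency=2)  # 2 is the stated default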

v1.0.31

29 May 08:42

Full Changelog: v1.0.3...v1.0.31

  • Added a token-visibility feature that lets developers view how the tokens are split, as well as the text chunks produced by the split pattern.
  • Added more samples.
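These notes don't name the entry point for the visibility feature; the sketch below guesses it rides on a verbose flag during encoding, so treat the whole call as an assumption.

    # Hypothetical sketch: verbose output is assumed to show the text
    # chunks produced by the split pattern and the tokens of each chunk.
    from bpetokenizer import BPETokenizer

    tokenizer = BPETokenizer()
    tokenizer.train("some sample training text " * 20, vocab_size=270)
    ids = tokenizer.encode("tokenization example", verbose=True)  # assumed flag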

v1.0.3

28 May 08:28

Added a mode parameter to the save and load methods so developers can save and load the tokenizer's vocab and merges in their desired format.
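A sketch of the mode parameter, assuming "json" is one accepted value; the file name produced and the load call shape are assumptions.

    # Sketch: mode is assumed to pick the serialization format; "json"
    # and the resulting my_tokenizer.json file name are assumptions.
    from bpetokenizer import BPETokenizer

    tokenizer = BPETokenizer()
    tokenizer.train("sample text to build a tiny vocab " * 20, vocab_size=270)
    tokenizer.save("my_tokenizer", mode="json")

    restored = BPETokenizer()
    restored.load("my_tokenizer.json", mode="json")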

Full Changelog: v1.0.21...v1.0.3

v1.0.2

27 May 20:43

Build working correctly, ensuring the upload to PyPI works.

v1.0.10

27 May 17:44

Testing the automatic PyPI package upload.

v1.0.1

27 May 17:28

First release.

Adds the following functionality:

  • BPETokenizer: can be used to build your own tokenizer for an LLM.
  • Tokenizer: a base class that handles saving and loading the tokenizer's vocab and merges.
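A minimal end-to-end sketch of the two classes described above, assuming the usual train/encode/decode surface; the corpus and vocab_size are placeholders.

    # Sketch: build a tokenizer with BPETokenizer (which extends the
    # Tokenizer base class that persists the vocab and merges).
    from bpetokenizer import BPETokenizer

    tokenizer = BPETokenizer()
    tokenizer.train("a small corpus to learn byte-pair merges from " * 30,
                    vocab_size=300)

    ids = tokenizer.encode("byte-pair")
    print(tokenizer.decode(ids))  # expected to round-trip to "byte-pair"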