Precompile Python HuggingFace's Tokenizers=0.10.3 #65
This package is not currently available in the Compute Canada wheelhouse. The latest version there is:
/cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic/tokenizers-0.10.1+computecanada-cp38-cp38-linux_x86_64.whl
CC software list: https://docs.computecanada.ca/wiki/Available_Python_wheels
Attempting to install in a virtualenv requires Rust and Cargo (crates), and results in an error with rust/1.41.0.
$ module load StdEnv/2018.3
$ module load python/3.8.0
$ module load rust/1.41.0
$ . ~/venv/test-tokenizers/bin/activate
$ pip install tokenizers==0.10.3
[..]
Caused by:
process didn't exit successfully: `rustc --edition=2018 --crate-name bitvec /home/fieldsa/.cargo/registry/src/github.com-1ecc6299db9ec823/bitvec-0.19.5/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 --cfg 'feature="alloc"' --cfg 'feature="std"' -C metadata=e573222695fb9170 -C extra-filename=-e573222695fb9170 --out-dir /tmp/pip-install-a5ji38el/tokenizers_a44e33f99b4942a5b298511ad70ef886/target/release/deps -L dependency=/tmp/pip-install-a5ji38el/tokenizers_a44e33f99b4942a5b298511ad70ef886/target/release/deps --extern funty=/tmp/pip-install-a5ji38el/tokenizers_a44e33f99b4942a5b298511ad70ef886/target/release/deps/libfunty-2289090f5a439874.rmeta --extern radium=/tmp/pip-install-a5ji38el/tokenizers_a44e33f99b4942a5b298511ad70ef886/target/release/deps/libradium-72e277b2ee5f2108.rmeta --extern tap=/tmp/pip-install-a5ji38el/tokenizers_a44e33f99b4942a5b298511ad70ef886/target/release/deps/libtap-31bb11a449977869.rmeta --extern wyz=/tmp/pip-install-a5ji38el/tokenizers_a44e33f99b4942a5b298511ad70ef886/target/release/deps/libwyz-342e26516d1da351.rmeta --cap-lints allow` (exit code: 1)
warning: build failed, waiting for other jobs to finish...
error: build failed
cargo rustc --lib --manifest-path Cargo.toml --features pyo3/extension-module --release --verbose -- --crate-type cdylib
error: cargo failed with code: 101
----------------------------------------
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
Thus, providing this package in CC CVMFS may require a request upstream to Compute Canada. Alternatively, compiling the package locally will require a custom EasyBlock for this particular version. |
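Before retrying the build, it may help to confirm which `rustc` is actually on the PATH after the `module load` step. The following is only a minimal sketch: it compares the active compiler against 1.41.0, the version that failed in the log above. The helper names are hypothetical, and the sketch does not claim to know which Rust version tokenizers 0.10.3 actually needs.

```python
import re
import shutil
import subprocess

# The rustc version from the failing build log above (module rust/1.41.0).
FAILED_WITH = (1, 41, 0)


def parse_rustc_version(text):
    """Extract (major, minor, patch) from the output of `rustc --version`."""
    match = re.search(r"rustc (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized rustc version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_rustc_version():
    """Return the version of the rustc found on PATH, or None if unavailable."""
    if shutil.which("rustc") is None:
        return None
    result = subprocess.run(
        ["rustc", "--version"], capture_output=True, text=True
    )
    if result.returncode != 0:
        return None
    return parse_rustc_version(result.stdout)


if __name__ == "__main__":
    version = installed_rustc_version()
    if version is None:
        print("no usable rustc found on PATH")
    elif version <= FAILED_WITH:
        print("rustc", version, "is not newer than the version that failed")
    else:
        print("rustc", version, "is newer than the version that failed")
```

A newer compiler is not a guarantee that the build succeeds; this only rules out accidentally building with the same toolchain again.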
If I install latest nightly build of rust from rustup (
Some forums suggest this is due to shared objects compiled on Debian vs. CentOS, while another suggests using RHEL8. Compute Canada addresses the GLIBC_2.18 issue on the following page: https://docs.computecanada.ca/wiki/Installing_software_in_your_home_directory#Installing_binary_packages :
Shared object (during build):
|
Thanks @fieldsa, I'll take a look at the link you provided. |
Sam, using the StdEnv/2020 module with the latest Rust module seemed to work for me.
#!/bin/bash |
That sounds a bit like black magic ;) but I'll give this a try. Thanks |
I gave @ddamoursNRC's script a try and it fails when I try to use it. It gives the same GLIBC_2.18 error:
$ python -c 'import tokenizers'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/gpfs/projects/DT/mtp/models/WMT2020/opt/miniconda3/envs/tokenizers-0.10.3/lib/python3.10/site-packages/tokenizers/__init__.py", line 79, in <module>
    from .tokenizers import (
ImportError: /lib64/libc.so.6: version `GLIBC_2.18' not found (required by /gpfs/projects/DT/mtp/models/WMT2020/opt/miniconda3/envs/tokenizers-0.10.3/lib/python3.10/site-packages/tokenizers/tokenizers.abi3.so) |
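The ImportError above means the compiled extension references a glibc symbol version (2.18) newer than what the host's `/lib64/libc.so.6` provides. As a rough way to check this before importing, here is a minimal sketch that scans a shared object for `GLIBC_x.y` version strings and compares the highest one with the host's glibc. The function names are hypothetical, and scanning raw bytes is a heuristic; a tool such as `objdump -T` gives a more precise view of the versioned symbols.

```python
import platform
import re


def parse_version(text):
    """Turn a dotted version string such as '2.18' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def required_glibc(path):
    """Return the highest GLIBC_x.y version string found in a binary, or None.

    Heuristic: looks for the literal version names in the file's bytes.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    found = re.findall(rb"GLIBC_(\d+(?:\.\d+)+)", data)
    if not found:
        return None
    return max(parse_version(item.decode("ascii")) for item in found)


def host_glibc():
    """Return the host glibc version as a tuple, or None if it is unknown."""
    name, version = platform.libc_ver()
    if name != "glibc" or not version:
        return None
    return parse_version(version)


def glibc_is_sufficient(path):
    """Return True/False, or None when either version cannot be determined."""
    needed = required_glibc(path)
    available = host_glibc()
    if needed is None or available is None:
        return None
    return available >= needed
```

Calling `glibc_is_sufficient()` on the `tokenizers.abi3.so` path from the traceback would be expected to return `False` on a host whose glibc is older than 2.18, which matches the error shown.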
This may be a great opportunity to test Singularity on Trixie, leveraging the latest gcc when running miniconda, Rust, and the tokenizer. Is there any interest in giving it a try? |
Alternatively, without using containers, it may be possible to use a login shell without loading CVMFS to run miniconda and custom-compiled libraries. The downside is that you wouldn't be able to use the CC CVMFS-provided modules and Python wheels. However, if there are compatibility issues because the code is much newer than the OS libraries, this could help minimize GLIBC version conflicts by compiling all software components locally against the same version. If the OS version is simply too old, Singularity containers may be the best approach. |
@SamuelLarkin - was this issue resolved, and were you able to run Tokenizers? |
Hi,
I'm trying to install HuggingFace's Tokenizers=0.10.3 with
pip install tokenizers==0.10.3
and it fails. If I try to install version 0.10.1, it succeeds because pip
finds a version built by Compute Canada. Based on this output, I would like tokenizers==0.10.3+computecanada for Python-3.8.
Thanks
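To make the request concrete, the wheel filename quoted in this thread can be split into its parts to show where the `+computecanada` local version label and the Python 3.8 (`cp38`) tags come from. This is a minimal sketch that assumes the standard wheel filename layout (distribution, version, optional build tag, Python tag, ABI tag, platform tag); `WheelName` and `parse_wheel_filename` are hypothetical names for illustration.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # Drop the optional build tag, which sits right after the version.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    return WheelName(*parts)


wheel = parse_wheel_filename(
    "tokenizers-0.10.1+computecanada-cp38-cp38-linux_x86_64.whl"
)
print(wheel.version)      # 0.10.1+computecanada
print(wheel.python_tag)   # cp38
```

A 0.10.3 wheel built the same way would be expected to differ only in the version component.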