python's scipy package broken on Trixie #68

Open
SamuelLarkin opened this issue Nov 17, 2021 · 5 comments
@SamuelLarkin
Collaborator

Hi,
I want to use simAlign-0.3 and I created a conda environment to do so but the computeCanada package is missing GLIBCXX_3.4.21.

source /gpfs/projects/DT/mtp/models/WMT2020/opt/miniconda3/bin/activate

name: simAlign-0.3
channels:
  - pytorch
  - anaconda
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_llvm
  - attrs=21.2.0=pyhd8ed1ab_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - blas=1.0=mkl
  - ca-certificates=2021.10.8=ha878542_0
  - colorama=0.4.4=pyh9f0ad1d_0
  - cudatoolkit=10.1.243=h036e899_9
  - cudnn=7.6.5.32=hc0a50b0_1
  - cxxfilt=0.3.0=py37hcd2ae1e_0
  - decorator=5.1.0=pyhd8ed1ab_0
  - importlib-metadata=4.8.2=py37h89c1867_0
  - importlib_metadata=4.8.2=hd8ed1ab_0
  - iniconfig=1.1.1=pyh9f0ad1d_0
  - ipython=7.29.0=py37h6531663_2
  - jedi=0.18.0=py37h89c1867_3
  - ld_impl_linux-64=2.36.1=hea4e1c9_2
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=11.2.0=h1d223b6_11
  - libnsl=2.0.0=h7f98852_0
  - libstdcxx-ng=11.2.0=he4da1e4_11
  - libuv=1.42.0=h7f98852_0
  - libzlib=1.2.11=h36c2ea0_1013
  - llvm-openmp=12.0.1=h4bd325d_1
  - matplotlib-inline=0.1.3=pyhd8ed1ab_0
  - mkl=2021.4.0=h8d4b97c_729
  - mkl-service=2.4.0=py37h5e8e339_0
  - mkl_fft=1.3.1=py37h3e078e5_1
  - mkl_random=1.2.2=py37h219a48f_0
  - more-itertools=8.11.0=pyhd8ed1ab_0
  - nccl=2.11.4.1=h8b44402_0
  - ncurses=6.2=h58526e2_4
  - numpy=1.21.2=py37h20f2e39_0
  - numpy-base=1.21.2=py37h79a1101_0
  - nvidia-apex=0.1=py37h519209e_4
  - openssl=3.0.0=h7f98852_2
  - packaging=21.0=pyhd8ed1ab_0
  - parso=0.8.2=pyhd8ed1ab_0
  - pexpect=4.8.0=pyh9f0ad1d_2
  - pickleshare=0.7.5=py_1003
  - pip=21.3.1=pyhd8ed1ab_0
  - pluggy=1.0.0=py37h89c1867_2
  - prompt-toolkit=3.0.22=pyha770c72_0
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - py=1.11.0=pyh6c4a22f_0
  - pygments=2.10.0=pyhd8ed1ab_0
  - pyparsing=3.0.6=pyhd8ed1ab_0
  - pytest=6.2.5=py37h89c1867_1
  - python=3.7.12=hf930737_100_cpython
  - python_abi=3.7=2_cp37m
  - pytorch=1.10.0=py3.7_cpu_0
  - pytorch-mutex=1.0=cpu
  - pyyaml=6.0=py37h5e8e339_3
  - readline=8.1=h46c0cb4_0
  - setuptools=59.1.1=py37h89c1867_0
  - six=1.16.0=pyh6c4a22f_0
  - sqlite=3.36.0=h9cd32fc_2
  - tbb=2021.4.0=h4bd325d_1
  - tk=8.6.11=h27826a3_1
  - toml=0.10.2=pyhd8ed1ab_0
  - tqdm=4.62.3=pyhd8ed1ab_0
  - traitlets=5.1.1=pyhd8ed1ab_0
  - typing_extensions=4.0.0=pyha770c72_0
  - wcwidth=0.2.5=pyh9f0ad1d_2
  - wheel=0.37.0=pyhd8ed1ab_1
  - xz=5.2.5=h516909a_1
  - yaml=0.2.5=h516909a_0
  - zipp=3.6.0=pyhd8ed1ab_0
  - zlib=1.2.11=h36c2ea0_1013
  - pip:
    - certifi==2021.10.8+computecanada
    - charset-normalizer==2.0.7+computecanada
    - click==8.0.3+computecanada
    - filelock==3.4.0
    - idna==3.3+computecanada
    - joblib==1.1.0+computecanada
    - networkx==2.4+computecanada
    - regex==2020.11.13+computecanada
    - requests==2.26.0+computecanada
    - sacremoses==0.0.46+computecanada
    - scikit-learn==0.23.0+computecanada
    - scipy==1.4.1+computecanada
    - simalign==0.3
    - threadpoolctl==3.0.0+computecanada
    - tokenizers==0.10.1+computecanada
    - transformers==4.3.2
    - urllib3==1.26.7+computecanada
variables:
  TRANSFORMERS_CACHE: /gpfs/projects/DT/mtp/models/transformers
prefix: /gpfs/projects/DT/mtp/WMT20/opt/miniconda3/envs/simAlign-0.3
ipython
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.29.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from simalign import SentenceAligner
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-daf0d028f9f2> in <module>
----> 1 from simalign import SentenceAligner

/gpfs/projects/DT/mtp/WMT20/opt/miniconda3/envs/simAlign-0.3/lib/python3.7/site-packages/simalign/__init__.py in <module>
----> 1 from .simalign import EmbeddingLoader, SentenceAligner

/gpfs/projects/DT/mtp/WMT20/opt/miniconda3/envs/simAlign-0.3/lib/python3.7/site-packages/simalign/simalign.py in <module>
      6
      7 import numpy as np
----> 8 from scipy.stats import entropy
      9 from scipy.sparse import csr_matrix
     10 from sklearn.preprocessing import normalize

/gpfs/projects/DT/mtp/WMT20/opt/miniconda3/envs/simAlign-0.3/lib/python3.7/site-packages/scipy/__init__.py in <module>
    154     # This makes "from scipy import fft" return scipy.fft, not np.fft
    155     del fft
--> 156     from . import fft

/gpfs/projects/DT/mtp/WMT20/opt/miniconda3/envs/simAlign-0.3/lib/python3.7/site-packages/scipy/fft/__init__.py in <module>
     74 from __future__ import division, print_function, absolute_import
     75
---> 76 from ._basic import (
     77     fft, ifft, fft2, ifft2, fftn, ifftn,
     78     rfft, irfft, rfft2, irfft2, rfftn, irfftn,

/gpfs/projects/DT/mtp/WMT20/opt/miniconda3/envs/simAlign-0.3/lib/python3.7/site-packages/scipy/fft/_basic.py in <module>
----> 1 from scipy._lib.uarray import generate_multimethod, Dispatchable
      2 import numpy as np
      3
      4
      5 def _x_replacer(args, kwargs, dispatchables):

/gpfs/projects/DT/mtp/WMT20/opt/miniconda3/envs/simAlign-0.3/lib/python3.7/site-packages/scipy/_lib/uarray.py in <module>
     25     from uarray import _Function
     26 else:
---> 27     from ._uarray import *
     28     from ._uarray import _Function
     29

/gpfs/projects/DT/mtp/WMT20/opt/miniconda3/envs/simAlign-0.3/lib/python3.7/site-packages/scipy/_lib/_uarray/__init__.py in <module>
    112 """
    113
--> 114 from ._backend import *
    115
    116 __version__ = '0.5.1+5.ga864a57.scipy'

/gpfs/projects/DT/mtp/WMT20/opt/miniconda3/envs/simAlign-0.3/lib/python3.7/site-packages/scipy/_lib/_uarray/_backend.py in <module>
     13 import inspect
     14 import functools
---> 15 from . import _uarray  # type: ignore
     16 import copyreg  # type: ignore
     17 import atexit

ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /gpfs/projects/DT/mtp/WMT20/opt/miniconda3/envs/simAlign-0.3/lib/python3.7/site-packages/scipy/_lib/_uarray/_uarray.cpython-37m-x86_64-linux-gnu.so)

The error comes from scipy, which is a Compute Canada package:
scipy==1.4.1+computecanada
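
To confirm which libstdc++ actually lacks the symbol version, one rough check is to scan the library file for the GLIBCXX tags it advertises (the same idea as running strings on it and grepping for GLIBCXX). This is only a sketch: the helper names are made up for illustration, and the conda path in the usage part assumes the env's libstdcxx-ng package installed its own lib/libstdc++.so.6.

```python
import re
from pathlib import Path

# Matches version tags such as GLIBCXX_3.4.21 embedded in the library file.
_GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(library_path):
    """Return the sorted GLIBCXX version tuples found in a libstdc++ file."""
    data = Path(library_path).read_bytes()
    found = {
        tuple(int(part) for part in match.group(1).decode().split("."))
        for match in _GLIBCXX_TAG.finditer(data)
    }
    return sorted(found)


def provides(library_path, required):
    """True if the library advertises the required version, e.g. "3.4.21"."""
    wanted = tuple(int(part) for part in required.split("."))
    return wanted in glibcxx_versions(library_path)


if __name__ == "__main__":
    # The first path comes from the ImportError; the second is an assumption
    # about where the conda env keeps its own copy of libstdc++.
    candidates = (
        "/lib64/libstdc++.so.6",
        "/gpfs/projects/DT/mtp/WMT20/opt/miniconda3/envs/simAlign-0.3/lib/libstdc++.so.6",
    )
    for candidate in candidates:
        if Path(candidate).exists():
            print(candidate, provides(candidate, "3.4.21"))
```

If the system library reports False and the env's copy reports True, the loader is picking up /lib64/libstdc++.so.6 instead of the one conda shipped.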

@fieldsa
Collaborator

fieldsa commented Nov 17, 2021

Did you try this command yet?

pip install --upgrade git+https://github.com/cisnlp/simalign.git#egg=simalign

When using the CC CVMFS stack on Trixie, mixing Anaconda/Miniconda with Python/pip packages installed from the CC wheelhouse is not recommended: binary Python packages under the /cvmfs software path use RPATH to point at their own CC glibc libraries.

See: https://docs.computecanada.ca/wiki/Anaconda/en

While it may work when the versions happen to be compatible, there are many cases where it will not, due to library incompatibility. Setting LD_LIBRARY_PATH could break Compute Canada binary software, as it can override the glibc path.

The error occurs because RPATH is being overridden, even though the gcc library paths are set up correctly by the CC stack and would otherwise work.

Prefer conda over pip whenever possible:

It's best to let conda try to find a package first, though I presume you need pip since conda channels don't currently have simAlign.

https://www.anaconda.com/blog/using-pip-in-a-conda-environment

  • Problem 1:

If conda selects a specific version, pip may conflict with that choice unless every required library is installed through conda.

The dependency graph resolved by pip may uninstall and reinstall modules such as scipy and numpy, even where conda would not have chosen those versions (conda picks versions to maintain compatibility).

This is why the suggestion is to install with conda first and use a minimal pip install at the end for any remaining packages. Avoid following that with another conda install or a pip reinstall of a conflicting package; unfortunately, pip sometimes decides to do the latter on its own. If you need to upgrade or add further conda packages, it is better to re-create the env in conda and then redo the pip installs.
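
One way to see which packages pip pulled from the CC wheelhouse is to look for the +computecanada local version label in the pip section of the env export (or in pip freeze output). A minimal sketch, where the function name is hypothetical and the label default is simply the one visible in the export above:

```python
def wheelhouse_packages(requirement_lines, label="computecanada"):
    """Return (name, version) pairs whose local version label equals `label`.

    Accepts lines from `pip freeze` or from the pip section of a conda env
    export, e.g. "    - scipy==1.4.1+computecanada".
    """
    hits = []
    for raw in requirement_lines:
        # Drop YAML list markers and surrounding whitespace.
        line = raw.strip().lstrip("- ").strip()
        if "==" not in line:
            continue  # conda entries use single "=" separators
        name, version = line.split("==", 1)
        public, _, local = version.partition("+")
        if local == label:
            hits.append((name, public))
    return hits


if __name__ == "__main__":
    sample = [
        "    - scikit-learn==0.23.0+computecanada",
        "    - scipy==1.4.1+computecanada",
        "    - simalign==0.3",
    ]
    for name, version in wheelhouse_packages(sample):
        print(name, version)
```

Every package it lists was built against the CC stack rather than the conda env's own libraries, so each one is a candidate for the same kind of loader error.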

  • Problem 2:

That said, the pip and python commands come from the correct bin PATH inside your env, so it's hard to see exactly why pip is getting wheels from the CVMFS, though I've seen it happen as my own user as well.
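
One thing worth ruling out: pip reads PIP_CONFIG_FILE and any PIP_<OPTION> environment variables regardless of which pip binary is first in PATH, so settings exported by a loaded module would survive conda activate. Whether the CC stack on Trixie sets any of these is an assumption I have not verified; the sketch below only lists what is set (the function name is hypothetical).

```python
import os


def pip_environment_overrides(environ=None):
    """Return the PIP_* environment variables that pip would pick up."""
    environ = os.environ if environ is None else environ
    return {
        key: value
        for key, value in sorted(environ.items())
        if key.startswith("PIP_")
    }


if __name__ == "__main__":
    overrides = pip_environment_overrides()
    if not overrides:
        print("no PIP_* variables are set")
    for key, value in overrides.items():
        print(f"{key}={value}")
```

Running pip config list inside the activated env gives a complementary view of the settings pip loaded from its configuration files.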

Proposed alternative solutions:

  • Workaround 1: build hard-to-find pkg from source

As an alternative, it is possible to bypass wheels when installing into conda, which keeps the conda pip command from finding the CC wheelhouse builds (scipy, etc.). By default, pip prefers wheel files whenever it finds them available.

[fieldsa@hn2 ~]$ echo $EBEXTSLISTPYTHON 
setuptools-41.0.1,pip-18.1,virtualenv-16.6.2,wheel-0.33.4,gnureadline-8.0.0

# eliminate the wheel module from plugin list:
[fieldsa@hn2 ~]$ export EBEXTSLISTPYTHON=setuptools-41.0.1,pip-18.1,virtualenv-16.6.2,gnureadline-8.0.0 
# perform a source only compile of the package in pip (provided conda doesn't find anything)
[fieldsa@hn2 ~]$ pip install <pkg> --no-binary :all:
  • Workaround 2:
    The other option is to bypass conda entirely by using a virtualenv with CC python/3.8:
$ exit
$ ssh trixie
$ module load python/3.8
$ virtualenv ~/venv/new-env-py38
$ source ~/venv/new-env-py38/bin/activate  # activate the env so pip installs into it
$ pip install -r requirements.txt

This may not work as well when conda provides other libraries or tool integrations that are not available as module files on the cluster. In that case, some C/C++/Fortran dependency libraries may need to be compiled from source manually into a module file loaded at runtime, or made available with the eb command in a custom modules path.
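
Since the point of this workaround is to keep conda out of the picture, a quick sanity check from inside Python can show whether a virtualenv is active and whether a conda env is still activated on top of it. This is a sketch with a hypothetical helper; it relies on sys.real_prefix (set by older virtualenv releases), sys.base_prefix, and the CONDA_PREFIX variable that conda activate exports.

```python
import os
import sys


def describe_interpreter(prefix, base_prefix, environ, real_prefix=None):
    """Summarize whether a virtualenv and/or a conda env is in play."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    in_virtualenv = real_prefix is not None or prefix != base_prefix
    conda_active = bool(environ.get("CONDA_PREFIX"))
    return {
        "virtualenv": in_virtualenv,
        "conda_active": conda_active,
        "mixed": in_virtualenv and conda_active,
    }


if __name__ == "__main__":
    report = describe_interpreter(
        sys.prefix,
        getattr(sys, "base_prefix", sys.prefix),
        os.environ,
        real_prefix=getattr(sys, "real_prefix", None),
    )
    print(report)
```

A "mixed" result of True suggests a conda env is still activated in the shell that started the virtualenv, which is the combination this workaround is trying to avoid.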

@SamuelLarkin
Collaborator Author

Hi,
I should have documented how I created that conda environment; conda does pretty much all of the work.

Trixie

conda \
  create \
    --yes \
    --name=simAlign-0.3 \
    --channel=pytorch \
    --channel=conda-forge \
    --channel=anaconda \
    cudatoolkit=10.1 \
    cudnn \
    ipython \
    nccl \
    nvidia-apex \
    python=3.7 \
    pytorch

conda activate simAlign-0.3

pip \
  install \
    simalign==0.3 \
    tokenizers==0.10.1 \
    transformers==4.3.2

conda env config vars set TRANSFORMERS_CACHE=/gpfs/projects/DT/mtp/models/transformers

conda env export > $CONDA_PREFIX/conda.env.export.yaml

What I could try is pip install --no-deps.
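
A sketch of what that could look like when driven from Python, assuming the dependencies pip would otherwise pull in (the traceback shows simalign importing scipy and sklearn) are installed through conda beforehand. The helper name is hypothetical, and I have not verified that this avoids the wheelhouse builds.

```python
import sys


def no_deps_install_command(requirements, python=sys.executable):
    """Build a pip command that installs only the named packages.

    --no-deps tells pip not to install package dependencies, so it will
    not fetch its own scipy or scikit-learn builds.
    """
    return [python, "-m", "pip", "install", "--no-deps", *requirements]


if __name__ == "__main__":
    # Same pins as the pip install step above.
    command = no_deps_install_command(
        ["simalign==0.3", "tokenizers==0.10.1", "transformers==4.3.2"]
    )
    print(" ".join(command))
```

Passing the resulting list to subprocess.run(command, check=True) would perform the install with the interpreter of the active env.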

@nrcfieldsa

nrcfieldsa commented Apr 8, 2022

Another approach to separate dependencies from the base system is to use a container.

A pre-made singularity container that includes miniconda can be used to install the conda environment and pip packages as you detailed above, but as a separate entity that doesn't rely on the system python or scipy version.

Container is located at path: /home/fieldsa/singularity/test_simAlign-0.3.sif
Derived from: /home/fieldsa/singularity/miniconda_4.8.2.def

Testing suggests it works with the example from: https://github.com/cisnlp/simalign

Singularity> . activate simAlign-0.3
(simAlign-0.3) Singularity> ipython
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.32.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from simalign import SentenceAligner
In [2]: 
In [2]: # making an instance of our model.
In [3]: # You can specify the embedding model and all alignment settings in the constructor.
In [4]: myaligner = SentenceAligner(model="bert", token_type="bpe", matching_methods="mai")
2022-04-08 10:43:46,265 - simalign.simalign - INFO - Initialized the EmbeddingLoader with model: bert-base-multilingual-cased
In [5]: 
In [5]: # The source and target sentences should be tokenized to words.
In [6]: src_sentence = ["This", "is", "a", "test", "."]
In [7]: trg_sentence = ["Das", "ist", "ein", "Test", "."]
In [8]: 
In [8]: # The output is a dictionary with different matching methods.
In [9]: # Each method has a list of pairs indicating the indexes of aligned words (The alignments are zero-indexed).
In [10]: alignments = myaligner.get_word_aligns(src_sentence, trg_sentence)
In [11]: 
In [11]: for matching_method in alignments:
    ...:         print(matching_method, ":", alignments[matching_method])
    ...: 
mwmf : [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
inter : [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
itermax : [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]

In [12]: exit
(simAlign-0.3) Singularity> exit
exit
INFO:    Cleaning up image...

To run a script from a file, instead of interactively, using the singularity container:

[fieldsa@cn101 tmp]$ singularity exec --fakeroot test_simAlign-0.3.sif /bin/bash -s <<<". activate simAlign-0.3; python test_simAlign.py"
INFO:    Convert SIF file to sandbox...
2022-04-08 10:49:33,235 - simalign.simalign - INFO - Initialized the EmbeddingLoader with model: bert-base-multilingual-cased
mwmf : [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
inter : [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
itermax : [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
INFO:    Cleaning up image...
[fieldsa@cn101 tmp]$ 
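
The contents of test_simAlign.py are not shown here; a script along the following lines, which just replays the interactive session above, would produce that kind of output. Treat it as a reconstruction: format_alignments and main are names chosen for illustration.

```python
def format_alignments(alignments):
    """Render each matching method and its index pairs as one line of text."""
    return [f"{method} : {pairs}" for method, pairs in alignments.items()]


def main():
    try:
        # Imported lazily so the formatting helper works without simalign.
        from simalign import SentenceAligner
    except ImportError as exc:
        print("simalign is not importable in this environment:", exc)
        return

    # Same settings as the interactive session above.
    aligner = SentenceAligner(model="bert", token_type="bpe", matching_methods="mai")

    # The source and target sentences should be tokenized to words.
    src_sentence = ["This", "is", "a", "test", "."]
    trg_sentence = ["Das", "ist", "ein", "Test", "."]

    # The result is a dictionary keyed by matching method.
    alignments = aligner.get_word_aligns(src_sentence, trg_sentence)
    for line in format_alignments(alignments):
        print(line)


if __name__ == "__main__":
    main()
```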

@SamuelLarkin
Collaborator Author

Does that imply that we are able to create our own singularity.sif? I thought we had to be root to do that.

@nrcfieldsa

You should be able to, using commands similar to:

$ cd /var/singularity/extfs; module --force purge # needed since GPFS and NFS not supported for container builds, use CentOS 7 libs
$ # singularity pull miniconda3.sif docker://conda/miniconda3-centos7 #could use standard image vs. Definition file
$ singularity build --fakeroot miniconda_4.8.2.sif miniconda_4.8.2.def 
$ singularity build --sandbox test_simalign miniconda_4.8.2.sif
$ singularity shell --fakeroot --writable test_simalign/
   # at this point, type conda commands to setup env
   # once done, exit the Singularity> bash shell prompt and save as a .sif file
$ singularity build --fakeroot test_simAlign-0.3.sif test_simalign/

The --fakeroot feature of singularity 3.x lets users build and update containers with regular privileges; it does not require root login or sudo access to run the singularity command (as long as you are willing to work in a sandbox dir).

The contents of the miniconda_4.8.2.def file:

[fieldsa@hn2 singularity]$ cat miniconda_4.8.2.def
Bootstrap: library
From: ubuntu:18.04

%post
    apt-get -y update
    apt-get install -y wget

    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh -b -p miniconda
    export PATH=/miniconda/bin:$PATH
    rm Miniconda3-latest-Linux-x86_64.sh
    conda update conda

%environment
    export PATH=/miniconda/bin:$PATH

%labels
    Author NadjaKry

%help
    This is a Miniconda container based on Ubuntu 18.04.
    The latest version of Miniconda on the time of build is installed.

However, as I noted, there are generic miniconda3 containers in the Docker repository, and you can also pull a Docker image with singularity and run it directly. Such as:
