
Chapter 4 - Inference Error. Pinpointed error, however don't know how to solve #141

Open
Ice-Citron opened this issue Jun 27, 2024 · 6 comments

Comments

@Ice-Citron

Information

The problem arises in chapter:

  • Introduction
  • Text Classification
  • Transformer Anatomy
  • [HERE] Multilingual Named Entity Recognition
  • Text Generation
  • Summarization
  • Question Answering
  • Making Transformers Efficient in Production
  • Dealing with Few to No Labels
  • Training Transformers from Scratch
  • Future Directions

Describe the bug

The error occurs if you run the exemplar Colab code, push the resulting model to your Hugging Face Hub, and then try to run it. When attempting inference, it outputs the error: "Can't load tokenizer using from_pretrained, please update its configuration: tokenizers.AddedToken() got multiple values for keyword argument 'special'".

To Reproduce

Steps to reproduce the behavior:

  1. Run the exemplar Colab code (https://colab.research.google.com/github/nlp-with-transformers/notebooks/blob/main/04_multilingual-ner.ipynb) up to:
    Screenshot 2024-06-27 at 11 16 59 PM

  2. Then visit the Hugging Face Hub once the model has been trained and pushed, use your personal Inference API, and this tokenizer error can be seen:
    Screenshot 2024-06-27 at 11 17 38 PM

  3. The same also applies if you try to load the model from the Hub: the error occurs at the tokenizer stage, and more precisely, I believe, in "special_tokens_map.json".
    Screenshot 2024-06-27 at 11 18 28 PM

  4. However, this can be worked around if I instead pass the "mask_token" special token as an extra kwarg, as recommended by GPT-4:
    Screenshot 2024-06-27 at 11 20 17 PM

from transformers import AutoTokenizer, AutoModelForTokenClassification

# Manually specify special tokens if the default configuration is problematic
special_tokens_dict = {
    "mask_token": {
        "content": "<mask>",
        "single_word": False,
        "lstrip": True,
        "rstrip": False,
        "normalized": True,
        "special": True,
        "__type": "AddedToken",
    }
}
tokenizer = AutoTokenizer.from_pretrained(
    "shng2025/xlm-roberta-base-finetuned-panx-de", use_fast=True, **special_tokens_dict
)
model = AutoModelForTokenClassification.from_pretrained(
    "shng2025/xlm-roberta-base-finetuned-panx-de"
)
  • It seems the model can be loaded if I just declare the "mask_token" explicitly, but I don't know what is causing this error in general.

Expected behavior

I was expecting that once I fine-tuned the model by running the exemplar code and pushed it to the Hub, the model could easily be run from the Inference API. You can also check my code in my personal notebook: https://colab.research.google.com/drive/1F5L_vL1o6WC3DxGWDF_g6ZPKTJ7dcmxR#scrollTo=orgQubxKVrNX

However, the same error occurred when I ran the exemplar code directly, so I think it's likely caused by changes made to the library after this book was published. As mentioned, the model is still runnable if I pass "mask_token" as a **kwarg. But this is very strange, and I would love to know what's causing this error, as I am still learning.

@Ice-Citron (Author)

Very funny. I actually managed to get it working now: I simply deleted "mask_token" from the special_tokens_map.json file and it worked. I'm still not sure why it worked before and doesn't now. If possible, could someone point me to the relevant config changes? It would also be best for this code to be updated.

Screenshot 2024-06-27 at 11 35 24 PM
Screenshot 2024-06-27 at 11 35 35 PM

  • I will keep this issue open for now, so that people who hit the exact same error in the future can find the solution.
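For anyone who wants to apply the same fix programmatically, here is a minimal sketch (the `strip_mask_token` helper and the local repo path are my own, hypothetical names; adjust the path to wherever you cloned the model repo):

```python
import json

def strip_mask_token(path):
    """Remove the serialized 'mask_token' entry from a special_tokens_map.json
    file, which is the entry that trips up newer tokenizers versions."""
    with open(path) as f:
        special_tokens = json.load(f)
    special_tokens.pop("mask_token", None)  # drop the offending entry if present
    with open(path, "w") as f:
        json.dump(special_tokens, f, indent=2)

# Hypothetical local checkout of the fine-tuned model repo; adjust as needed.
# strip_mask_token("xlm-roberta-base-finetuned-panx-de/special_tokens_map.json")
```

After editing the file, commit and push it back to the Hub so the Inference API picks up the change.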

@Ice-Citron (Author)

Did further digging. I checked others who trained this model on Hugging Face, especially repos created within the last week. Those who trained in TensorFlow still seem able to run inference automatically, but people who used PyTorch, like me, faced the same issue. So I believe this error is PyTorch-only for now; I will try to resolve it and make a pull request once debugged.

Screenshot 2024-06-27 at 11 56 02 PM

@Ice-Citron (Author)

Sorry, I think I found the error: it comes from not updating my libraries. I will update, retrain the models (spending $0.60 of compute credits), and test this hypothesis again tomorrow.

@Ice-Citron (Author)

I managed to get it to work. Basically, the install file in the Colab notebook (if you're using Colab) is faulty: its installation requirements pull in older versions of the libraries when newer ones exist, and it is the newer libraries that work. You can try running this after the default installation cell:

#%%capture
!pip install transformers==4.41.2
!pip install datasets==2.20.0

!pip install pyarrow==16.0
!pip install requests==2.32.3

!pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0

!pip install importlib-metadata

!pip install accelerate -U

And you can mainly just try and refer to my file too: https://colab.research.google.com/drive/1F5L_vL1o6WC3DxGWDF_g6ZPKTJ7dcmxR#scrollTo=r1SReYWcdRjZ
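To confirm the newer pins actually took effect in the runtime (Colab sometimes needs a restart before upgraded packages are picked up), here is a small sketch using only the standard library; the `installed_versions` helper is my own name for illustration:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Return a {package: version-string} map, with None for anything missing."""
    out = {}
    for pkg in packages:
        try:
            out[pkg] = version(pkg)
        except PackageNotFoundError:
            out[pkg] = None  # package not installed in this environment
    return out

# The packages the fix above pins; compare against what pip reported installing.
print(installed_versions(["transformers", "datasets", "pyarrow", "requests", "torch"]))
```

If `transformers` or `datasets` still shows the old version here, restart the runtime and re-run the check.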

@Ice-Citron (Author)

I will leave this here in case anyone encounters the same bug. The bug is caused by this code block:

!git clone https://github.com/nlp-with-transformers/notebooks.git
%cd notebooks
from install import *
install_requirements()

because it installs older, deprecated versions of the libraries, causing these bugs. That's my hypothesis at least.

@Ice-Citron (Author)

To the authors: you should update requirements.txt and install.py to bring them up to date. That's all for now.
