NotImplementedError #10

Closed
xueliu8617112 opened this issue Jul 10, 2022 · 5 comments · Fixed by #11 or #13

Comments

@xueliu8617112

Hello, I ran into a problem when running the command "python train.py --dataset=smnist --batch_size=16 --max_epochs=100 --lr=1e-2 --n_blocks=6 --d_model=128 --norm_type=layer". The class SMnistDataset is based on the class SequenceDataset. __len__ is not implemented in SequenceDataset, but the function train_val_split, which divides the data into a training set and a validation set, needs len().
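
For readers following along, the failure mode described above can be sketched as follows. This is a minimal illustration, not the repository's code: the bodies of `SequenceDataset` and `train_val_split` are hypothetical stand-ins written in plain Python, and `ToyDataset`, `val_prop`, and `seed` are invented for the example.

```python
import random


class SequenceDataset:
    """Hypothetical stand-in for the base class named in the report."""

    def __len__(self) -> int:
        # Subclasses are expected to override this.
        raise NotImplementedError

    def __getitem__(self, index: int):
        raise NotImplementedError


class ToyDataset(SequenceDataset):
    """Hypothetical subclass that does implement __len__."""

    def __init__(self, n_items: int) -> None:
        self.items = list(range(n_items))

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, index: int):
        return self.items[index]


def train_val_split(dataset, val_prop: float = 0.1, seed: int = 42):
    """Hypothetical split helper returning (train_indices, val_indices)."""
    n = len(dataset)  # raises NotImplementedError if __len__ is not overridden
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val = int(n * val_prop)
    return indices[n_val:], indices[:n_val]
```

Calling `train_val_split(ToyDataset(100))` succeeds, while passing a dataset that inherits the base `__len__` unchanged raises `NotImplementedError`, which matches the symptom in the title of this issue.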
@TariqAHassan
Owner

TariqAHassan commented Jul 10, 2022

Hey @xueliu8617112

Thanks for reporting this. I pushed (what I hope will be) a fix.
Feel free to try again with the branch linked to above, and let me know if that change resolves things.

@xueliu8617112
Author

@TariqAHassan thanks for replying! I have tested again, and it seems another problem occurs, this time with multiprocessing.
image

@TariqAHassan
Owner

Hey @xueliu8617112

Thanks for flagging this too. I have only trained this model on a single GPU, whereas it looks like you're doing multi-GPU training?

At any rate, yes, pickling lambdas will fail. I've tried to fix this issue by replacing them in #13.
I hope those changes fix this issue. Please let me know if they do not.
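
As a rough illustration of why lambdas are a problem here: worker processes may need to pickle the objects handed to them, and the standard `pickle` module serializes functions by name, which a lambda does not have. The sketch below uses only the standard library; `can_pickle`, `double_lambda`, and `double_partial` are hypothetical names for this example, and it does not reproduce the actual changes made in #13.

```python
import operator
import pickle
from functools import partial


def can_pickle(obj) -> bool:
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda has no importable name, so pickle cannot look it up.
double_lambda = lambda x: x * 2

# One possible replacement: a partial over a named, importable function.
double_partial = partial(operator.mul, 2)

print(can_pickle(double_lambda))   # lambda: not picklable
print(can_pickle(double_partial))  # partial of a named function: picklable
```

A named module-level function works equally well as a replacement; the partial is used here only because it keeps the example short.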

TariqAHassan reopened this Jul 11, 2022
@TariqAHassan
Owner

Also, note that the repo this one is based on (https://github.com/srush/annotated-s4/) originally had some bugs in its implementation of S4, which have been gradually fixed by one of the creators of S4, among others. (S4 is a very challenging model to implement.) Over the coming weeks I will be applying those fixes to this PyTorch implementation (see #9).

You should still obtain decent performance using the model as it currently exists, but performance will improve as more and more of the bugs are ironed out.

@xueliu8617112
Author

xueliu8617112 commented Jul 18, 2022

Hi @TariqAHassan, sorry for the late reply. I have 1 GPU (RTX 3090, 24 GB) and 64 CPUs. I tested again and found that the run hangs during Sanity Checking (I think this is due to multiprocessing). When I set num_workers = 1, it works.
image
