Slow tasks #509

Closed
guiscaranse opened this issue Jul 26, 2022 · 5 comments
Labels
bug Something isn't working

Comments

guiscaranse commented Jul 26, 2022

Please answer these questions before submitting your issue. Thanks!

Gorse version
0.4.5

Describe the bug
My tasks have been taking several days to complete, and sometimes they never complete. Is this normal, or am I doing something wrong?
[Image: Screenshot 2022-07-26 at 14 17 41]

Additional context

  • Items: 724173
  • Number of User Labels: 2570647
  • Number of Item Labels: 2571813
  • Master RAM: 6 GB
  • Master CPU cores: 2
  • Number of tasks on master (config.yml): 8

guiscaranse added the bug label Jul 26, 2022

@zhenghaoz
Collaborator

There are two problems:

  1. Gorse can't run multiple tasks concurrently, so a slow task blocks the tasks that follow it. This will be addressed in the next version.
  2. It seems that you have too many labels. Are they really needed? How many labels does each user or item have? I think you should review your dataset.
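
One way to answer the "how many labels per item" question before changing anything is to summarize the dataset offline. The sketch below is only an illustration: the `items` dictionary and the `label_stats` helper are hypothetical names, and it assumes the items and their labels fit in memory as a plain mapping rather than being read from Gorse itself.

```python
from statistics import mean, median

# Hypothetical in-memory dataset: item ID -> list of labels.
items = {
    "item_1": ["action", "sci-fi", "2021"],
    "item_2": ["action", "drama"],
    "item_3": ["drama"],
}


def label_stats(items):
    """Summarize labels per item and the size of the label vocabulary."""
    # Count each label once per item, even if it is listed twice.
    counts = [len(set(labels)) for labels in items.values()]
    distinct = {label for labels in items.values() for label in labels}
    return {
        "items": len(items),
        "distinct_labels": len(distinct),
        "mean_labels_per_item": mean(counts),
        "median_labels_per_item": median(counts),
        "max_labels_per_item": max(counts),
    }


stats = label_stats(items)
print(stats)
```

A distinct-label count that is several times larger than the item count, as in the numbers reported above, is the kind of result that would suggest reviewing the labels.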

@guiscaranse
Author

> It seems that you have too many labels. Are they really needed? How many labels does each user or item have? I think you should review your dataset.

@zhenghaoz each item has around 40 tags. Is that a lot? What would be a reasonable number of tags from a performance standpoint?

@zhenghaoz
Collaborator

Are these tags shared by multiple items?
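
A quick way to check whether tags are shared is to count, for each tag, how many items carry it. This is a minimal sketch under the assumption that the dataset is available as an in-memory mapping; `items`, `label_item_counts`, and `shared_fraction` are hypothetical names, not part of Gorse.

```python
from collections import Counter

# Hypothetical in-memory dataset: item ID -> list of labels.
items = {
    "item_1": ["action", "sci-fi", "2021"],
    "item_2": ["action", "drama"],
    "item_3": ["drama"],
}


def label_item_counts(items):
    """Count, for each label, the number of distinct items that carry it."""
    counter = Counter()
    for labels in items.values():
        counter.update(set(labels))  # set() avoids double counting within an item
    return counter


def shared_fraction(items):
    """Return the fraction of labels that appear on more than one item."""
    counts = label_item_counts(items)
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n > 1)
    return shared / len(counts)


counts = label_item_counts(items)
fraction = shared_fraction(items)
print(counts.most_common(3), fraction)
```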

@guiscaranse
Author

They are. I started sampling the tags and reduced them by 95%, and training is now much faster (it went from 5 days to 1 hour). Closing this issue :)
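
The thread does not say how the tags were sampled. As one possible reading, the sketch below keeps a random 5% of the label vocabulary and drops every other label from each item before the data is loaded. The `sample_labels` helper, the `keep_fraction` and `seed` parameters, and the in-memory `items` mapping are all assumptions made for illustration.

```python
import random


def sample_labels(items, keep_fraction=0.05, seed=0):
    """Keep a random subset of the label vocabulary; drop all other labels."""
    vocabulary = sorted({label for labels in items.values() for label in labels})
    if not vocabulary:
        return {item_id: [] for item_id in items}
    # Keep at least one label, and never more than the vocabulary holds.
    keep_count = min(len(vocabulary), max(1, round(len(vocabulary) * keep_fraction)))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    kept = set(rng.sample(vocabulary, keep_count))
    return {
        item_id: [label for label in labels if label in kept]
        for item_id, labels in items.items()
    }


# Hypothetical dataset: 50 items, each with 4 labels drawn from 100 distinct labels.
items = {
    f"item_{i}": [f"tag_{(i * 4 + j) % 100}" for j in range(4)]
    for i in range(50)
}
reduced = sample_labels(items, keep_fraction=0.05)
remaining = {label for labels in reduced.values() for label in labels}
print(len(remaining))
```

Random sampling treats all tags equally. Keeping the most frequently shared tags instead would be an alternative design, at the cost of biasing the vocabulary toward popular tags.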

@sethmills21

@guiscaranse how long does it take to load your dataset? I'm still struggling to load a lot of data over here too.
