Use blas to evaluate several samples #61

Merged: @oysteijo merged 9 commits into master from blas_forward on Dec 20, 2023

Conversation

@oysteijo (Owner) commented on Dec 7, 2023

This will address #60.

There are still some issues.

  • There is no support for softmax activation functions, as we have to calculate the exp() sum for each sample. (How to solve this?)
  • I'm afraid the workmem used to store the temporary matrices can in some cases get big if there are a lot of samples. This might lead to a stack overflow. How do I fix this? I do not want to heap-allocate and then free inside the function. Maybe I can pre-allocate and pass in a pointer to that memory?
  • This, of course, only works when compiled with USE_CBLAS. Can this be fixed in any way?
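
One possible way to approach the last point is to keep the BLAS call behind the `USE_CBLAS` guard and provide a plain-loop fallback that computes the same product. The sketch below is only an illustration: `dense_forward_batch` and its row-major layout are assumptions, not functions or conventions taken from this repository.

```c
#include <assert.h>
#include <stddef.h>
#ifdef USE_CBLAS
#include <cblas.h>
#endif

/* Hypothetical helper, not part of the project's API.
 * X: n_samples x n_in  (row-major, one sample per row)
 * W: n_in x n_out      (row-major)
 * b: n_out             (bias)
 * Y: n_samples x n_out (row-major, output)
 * Computes Y = X * W + b for all samples in one go. */
void dense_forward_batch(int n_samples, int n_in, int n_out,
                         const float *X, const float *W,
                         const float *b, float *Y)
{
    assert(n_samples > 0 && n_in > 0 && n_out > 0);

    /* Seed every output row with the bias so the product accumulates onto it. */
    for (int s = 0; s < n_samples; s++)
        for (int j = 0; j < n_out; j++)
            Y[(size_t)s * n_out + j] = b[j];

#ifdef USE_CBLAS
    /* Y = 1.0 * X * W + 1.0 * Y, a single matrix-matrix product. */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n_samples, n_out, n_in,
                1.0f, X, n_in, W, n_out, 1.0f, Y, n_out);
#else
    /* Portable fallback: the same accumulation with plain loops. */
    for (int s = 0; s < n_samples; s++)
        for (int k = 0; k < n_in; k++) {
            const float x = X[(size_t)s * n_in + k];
            for (int j = 0; j < n_out; j++)
                Y[(size_t)s * n_out + j] += x * W[(size_t)k * n_out + j];
        }
#endif
}
```

The fallback will likely be slower than an optimized BLAS, but it keeps the batched interface available in builds without `USE_CBLAS`.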

@oysteijo (Owner, Author) commented

OK! I've committed a change so that the static memory has a fixed size, regardless of how many samples are passed in.
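
A fixed-size static buffer usually implies evaluating the samples in chunks that fit into it. The following is a minimal sketch of that idea under stated assumptions: the names `predict_in_chunks`, `chunk_fn`, `demo_eval_chunk` and the two `WORK_MAX_*` sizes are hypothetical and are not taken from the merged commit.

```c
#include <assert.h>
#include <stddef.h>

#define WORK_MAX_SAMPLES 64   /* assumed number of samples per chunk */
#define WORK_MAX_WIDTH   128  /* assumed width of the widest layer */

/* Hypothetical callback that evaluates one chunk of n samples,
 * using `work` as scratch space for temporary matrices. */
typedef void (*chunk_fn)(int n, int n_in, int n_out,
                         const float *in, float *out, float *work);

/* Walks over all samples in chunks of at most WORK_MAX_SAMPLES, so the
 * scratch buffer never grows with the number of samples. The buffer is
 * static, which makes this function non-reentrant. */
void predict_in_chunks(int n_samples, int n_in, int n_out,
                       const float *in, float *out, chunk_fn eval_chunk)
{
    static float work[WORK_MAX_SAMPLES * WORK_MAX_WIDTH];

    assert(n_in <= WORK_MAX_WIDTH && n_out <= WORK_MAX_WIDTH);
    for (int first = 0; first < n_samples; first += WORK_MAX_SAMPLES) {
        int n = n_samples - first;
        if (n > WORK_MAX_SAMPLES)
            n = WORK_MAX_SAMPLES;
        eval_chunk(n, n_in, n_out,
                   in + (size_t)first * n_in,
                   out + (size_t)first * n_out, work);
    }
}

/* Toy stand-in for a real forward pass: doubles every input value,
 * going through the scratch buffer to show how it would be used. */
void demo_eval_chunk(int n, int n_in, int n_out,
                     const float *in, float *out, float *work)
{
    assert(n_in == n_out);
    for (int i = 0; i < n * n_in; i++)
        work[i] = 2.0f * in[i];
    for (int i = 0; i < n * n_out; i++)
        out[i] = work[i];
}
```

The trade-off is that a static buffer avoids both stack growth and heap allocation, at the cost of thread safety; passing the workspace in from the caller would be an alternative.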

Now we just need a solution for the softmax activation functions.

@oysteijo changed the title from "Use blas to evaluate several samples - first commit" to "Use blas to evaluate several samples" on Dec 19, 2023
@oysteijo (Owner, Author) commented

Getting closer. I think I might want to add this neuralnet_predict_batch() into neuralnet.c.

@oysteijo merged commit 83d67b6 into master on Dec 20, 2023
@oysteijo deleted the blas_forward branch on December 20, 2023 17:57
@oysteijo (Owner, Author) commented

Merged this w/o pushing neuralnet_predict_batch() into neuralnet.c.
