
Sigmoid with residue #869

Closed

Conversation

opfromthestart
Contributor

I don't know if this is something that should be in dfdx, but it is useful for my use case, where I need my model to return probabilities between 0 and 1 and each input is independent of all the others. With a normal sigmoid I get vanishing gradients, so I made this.

@opfromthestart
Contributor Author

Right now it enforces a minimum gradient of 0.0001; this should maybe be configurable.

@coreylowman
Owner

Since this can be accomplished by doing (x.negate().exp() + 1.0).recip(), I'm inclined not to merge this. I know that this specialized kernel will be more efficient, but I think this is niche enough that I don't want to add it to core.
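For reference, a minimal scalar sketch of that composition (plain Rust rather than the dfdx tensor API; the function name is illustrative):

```rust
// Forward pass of sigmoid written as the composition above:
// (exp(-x) + 1).recip() == 1 / (1 + e^(-x))
fn sigmoid_composed(x: f64) -> f64 {
    ((-x).exp() + 1.0).recip()
}

fn main() {
    for x in [-4.0_f64, 0.0, 4.0] {
        println!("sigmoid({x}) = {}", sigmoid_composed(x));
    }
}
```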

Thanks for the PR!

@opfromthestart
Contributor Author

opfromthestart commented Oct 27, 2023

This is not entirely right. While the forward pass is the same, the backward pass is different: it makes sure that the gradients through the function are always at least some small epsilon. As in, if x=100000000, sigmoid(x)=1, which should make sigmoid_dx(x)=0, but instead it is 0.00001. I should definitely refactor the forward pass to use that composition, however.
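A scalar sketch of that distinction, assuming the backward pass simply clamps the derivative and using an illustrative epsilon of 1e-4 (the function names here are hypothetical, not the PR's actual kernel code):

```rust
/// Standard sigmoid derivative: s(x) * (1 - s(x)).
/// For large |x| this underflows toward 0, which is the vanishing-gradient problem.
fn sigmoid_dx(x: f64) -> f64 {
    let s = 1.0 / (1.0 + (-x).exp());
    s * (1.0 - s)
}

/// Residue-style backward pass: clamp the gradient so it never drops below
/// eps, keeping some learning signal even when the forward output has
/// saturated at 0 or 1.
fn sigmoid_dx_residue(x: f64, eps: f64) -> f64 {
    sigmoid_dx(x).max(eps)
}

fn main() {
    let x = 100_000_000.0;
    println!("plain:   {}", sigmoid_dx(x));               // prints 0
    println!("residue: {}", sigmoid_dx_residue(x, 1e-4)); // prints 0.0001
}
```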
