
Dynamic dimensions in neural network layers? #755

Closed
bplevin36 opened this issue Apr 29, 2023 · 1 comment · Fixed by #854
Labels
new feature New feature or request

Comments

@bplevin36

The new dynamic dimension support for Tensors is great! Are there any plans to support it for Modules as well, to allow for e.g. a Linear layer where the input size is a const generic but the output size is dynamic?

I tried hacking this together myself, but ran into problems with allocation. It seems like the existing Device APIs are pretty strongly angled towards being able to allocate a Module based purely on its type. I can probably implement allocation and Module on my type manually, but it would be great to have a more ergonomic workaround.

@coreylowman (Owner)

Glad you like it! Yeah, aligning the existing nn layers to support both compile-time & run-time dimensions would complicate things quite a bit. It's definitely possible; for example, we could have Linear defined as:

struct Linear<In: Dim, Out: Dim, E: Dtype, D: DeviceStorage> {
    // In/Out can be either Const<N> or usize, since both implement Dim
    weight: Tensor<(In, Out), E, D>,
    bias: Tensor<(Out,), E, D>,
}

and then you could create type aliases like this:

type ConstLinear<const I: usize, const O: usize, E, D> = Linear<Const<I>, Const<O>, E, D>;
type DynLinear<E, D> = Linear<usize, usize, E, D>;

As far as instantiation, you're right that the existing method assumes sizes known at compile time. We'd likely have to do something similar to what tensors do, where we have one version of creation for compile-time shapes (dev.zeros()) and a separate version for runtime sizes (dev.zeros_like(&(3, 5))).
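
For a rough sense of what that could look like, here's a minimal sketch. The constructor names below are made up (not dfdx API); it assumes the Linear struct sketched above and the ZerosTensor trait that backs dev.zeros() / dev.zeros_like():

use dfdx::prelude::*;

// Hypothetical constructors, not actual dfdx API. The const version can rely
// entirely on the type, while the dynamic one has to take the sizes as values.
fn const_linear<const I: usize, const O: usize, E: Dtype, D: DeviceStorage + ZerosTensor<E>>(
    dev: &D,
) -> Linear<Const<I>, Const<O>, E, D> {
    Linear {
        weight: dev.zeros(), // shape (Const<I>, Const<O>) inferred from the field type
        bias: dev.zeros(),
    }
}

fn dyn_linear<E: Dtype, D: DeviceStorage + ZerosTensor<E>>(
    dev: &D,
    in_dim: usize,
    out_dim: usize,
) -> Linear<usize, usize, E, D> {
    Linear {
        weight: dev.zeros_like(&(in_dim, out_dim)), // runtime shape passed as a value
        bias: dev.zeros_like(&(out_dim,)),
    }
}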

I think my main concern would be the increased complexity in adding these, both from an internals perspective and from an external usability perspective.

The other part of this is that, currently in the deep learning ecosystem, most training/inference libraries redefine neural network types in their own libraries; huggingface does this all over the place. As far as dfdx goes, hopefully people can define their own nn layers in a similar way. So if someone wanted to implement linear/transformer/etc. with runtime shapes outside of dfdx, that should be possible (see the sketch below).
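
As a purely illustrative sketch of that: MyDynLinear and its forward below are hypothetical user code, not dfdx API; it uses concrete f32/Cpu types to keep the trait bounds simple, and leaves out bias handling and Module/optimizer integration:

use dfdx::prelude::*;

// Hypothetical user-defined layer, outside of dfdx: a linear layer whose
// sizes are only known at run time.
struct MyDynLinear {
    weight: Tensor<(usize, usize), f32, Cpu>, // (in, out), both runtime values
}

impl MyDynLinear {
    // (batch, in) -> (batch, out); the inner dims are checked at run time
    fn forward(&self, x: Tensor<(usize, usize), f32, Cpu>) -> Tensor<(usize, usize), f32, Cpu> {
        x.matmul(self.weight.clone())
    }
}

fn main() {
    let dev = Cpu::default();
    let (in_dim, out_dim): (usize, usize) = (512, 1000);
    let layer = MyDynLinear {
        weight: dev.zeros_like(&(in_dim, out_dim)), // zeros just for brevity
    };
    let x: Tensor<(usize, usize), f32, Cpu> = dev.zeros_like(&(8usize, in_dim));
    let _y = layer.forward(x); // shape (8, 1000), verified at run time
}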

Thoughts?
