
Gradient descent in C? #35

Open
cpennington opened this issue Aug 4, 2017 · 9 comments

@cpennington

Why did you choose to write the gradient descent code in C, rather than using the library you used for the other matrix computations? Would you get a speedup by doing the descent in hblas?

@HuwCampbell
Owner

HuwCampbell commented Aug 4, 2017

In a word: fusion; or rather, the lack of it.

I had a version using hmatrix, but profiling showed it was taking up a large proportion of the runtime. I believe that was because it couldn't unroll the loops and work on one value at a time. The C rewrite was a good deal faster, and I have a benchmark on it in the suite (though I can't remember the speed-up right now).

HBLAS might do it better, but again it's mostly a fusion issue. One might also do better by using SIMD aggressively.
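Roughly, the difference looks something like this; just a sketch for illustration, not Grenade's actual code, and the function names here are made up:

```haskell
import qualified Numeric.LinearAlgebra as LA
import qualified Data.Vector.Storable as VS

-- Unfused: `LA.scale rate g` allocates a full temporary matrix before
-- the element-wise subtraction ever starts.
descentHMatrix :: Double -> LA.Matrix Double -> LA.Matrix Double -> LA.Matrix Double
descentHMatrix rate w g = w - LA.scale rate g

-- Fused: one pass over the flattened weights, no intermediate allocation,
-- which is essentially what the C routine does by hand.
descentFused :: Double -> VS.Vector Double -> VS.Vector Double -> VS.Vector Double
descentFused rate = VS.zipWith (\wi gi -> wi - rate * gi)
```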

@cpennington
Author

Ah, ok. I'm about this close (holds fingers close together) to trying to make an accelerate backend/branch/fork (though I'm not sure how much work that would take) to get fusion/GPU/SIMD for "free". Is that something you'd be interested in, if I could make it work?

(Unrelatedly, I've also got some outstanding changes to make various things instances of NFData so that you can better control parallelism, and Num so that you can add gradients together. I'm not sure if either one is worth bringing upstream, though).
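To make the NFData point concrete, this is roughly the kind of thing it enables; a sketch only, with `backprop` as a stand-in name for whatever produces a layer's gradient:

```haskell
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Force each example's gradient to normal form on its own spark, so the
-- work really happens in parallel instead of being deferred as thunks.
gradientsInParallel :: NFData grad => (example -> grad) -> [example] -> [grad]
gradientsInParallel backprop = parMap rdeepseq backprop
```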

@HuwCampbell
Owner

I would be interested (especially if there are benchmarks). In Grenade, for most networks, most of the runtime is matrix-matrix multiplications, which is pretty much what you want. I know CUDA/cuDNN would be faster, but I'm not sure how well accelerate handles the tasks we need.

If you're using LSTMs, the one thing that would probably give the biggest easy improvement is proper minibatching. Matrix-matrix multiplications with BLAS are far more efficient than n matrix-vector multiplications; 50 examples packed into a matrix run in about the same time as 5 pushed through one at a time as vectors, for instance.
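The shape of it, as a rough hmatrix sketch rather than anything Grenade does today (function names invented for illustration):

```haskell
import qualified Numeric.LinearAlgebra as LA

-- n separate matrix-vector products, one per example.
forwardOneByOne :: LA.Matrix Double -> [LA.Vector Double] -> [LA.Vector Double]
forwardOneByOne w = map (w LA.#>)

-- One matrix-matrix product over the whole minibatch, examples as columns.
forwardBatched :: LA.Matrix Double -> [LA.Vector Double] -> LA.Matrix Double
forwardBatched w xs = w LA.<> LA.fromColumns xs
```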

As for Num and NFData instances, that sounds reasonable, and I have also thought about adding them. The main reasons I didn't just make them Num and call it a day were efficiency and API usage, but I'm happy to look at anything you've come up with.

I added the updateGradients function to the UpdateLayer class so one could efficiently add gradients before an update, but it's clunky.
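For a sense of what adding gradients amounts to, here's a hedged sketch for a single fully connected layer; the `FCGradient` type and its fields are invented for illustration and aren't Grenade's actual types:

```haskell
import qualified Numeric.LinearAlgebra as LA

-- Gradient of one fully connected layer: a weight part and a bias part.
data FCGradient = FCGradient
  { gradWeights :: LA.Matrix Double
  , gradBiases  :: LA.Vector Double
  }

-- Element-wise sum of two gradients, e.g. accumulated over a minibatch
-- before applying a single weight update.
addGradients :: FCGradient -> FCGradient -> FCGradient
addGradients (FCGradient w1 b1) (FCGradient w2 b2) =
  FCGradient (w1 + w2) (b1 + b2)
```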

Thanks for the issue :)

@cpennington
Author

So, I've started poking at an accelerate backend. I think I'm going to have to get a fair way into it before I figure out what the speed change is, though. I'll let you know what I see.

@HuwCampbell
Owner

I'm at ICML at the moment, and have spoken with a few people who are interested in helping out with this effort. I might also talk to Trevor (who wrote accelerate) at the next meetup to see if he has any advice.

@cpennington
Author

Neat. I'm happy to put what I have so far up on a branch... It's a bit fragmented, but as a first stab I'm trying to replicate im2col in order to test out the benchmarks.

My main dev laptop isn't CUDA-friendly, so I won't be able to test the upper limits. Also, I suspect a lot of the improvement will come once you're actually stacking multiple layers together and the fusion starts kicking in. In the project that's motivating all of this work, I've noticed that the garbage collector is quite active in general.
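For anyone following along, the rough idea behind im2col, as a throwaway list-based sketch rather than the Accelerate version I'm actually writing: flatten every kernel-sized patch of the input into one row, so the convolution becomes a single matrix multiply against the flattened kernels.

```haskell
-- Each output row is one flattened kRows x kCols patch of the image,
-- taken at every valid top-left position (stride 1, no padding).
im2col :: Int -> Int -> [[Double]] -> [[Double]]
im2col kRows kCols image =
  [ concat [ take kCols (drop c row) | row <- take kRows (drop r image) ]
  | r <- [0 .. length image - kRows]
  , c <- [0 .. length (head image) - kCols]
  ]
```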

@theunixman

If there's anything I can do to help with an accelerate backend, let me know. I was about to take a look myself.

@HuwCampbell
Owner

I chatted with Trevor today, and he is also interested in getting this working.

@unhammer

Just noticed this, figured it should be linked from here since it seems relevant: #38
