
Implement pieces of grenade using Accelerate #38

Status: Open. Wants to merge 11 commits into base: master
14 changes: 14 additions & 0 deletions .travis.yml
@@ -1,5 +1,19 @@
# NB: don't set `language: haskell` here

dist: trusty

addons:
apt:
sources:
- llvm-toolchain-trusty-4.0
packages:
- libllvm4.0
- libllvm4.0-dbg
- lldb-4.0
- llvm-4.0
- llvm-4.0-dev
- llvm-4.0-runtime

# The following enables several GHC versions to be tested; often it's enough to test only against the last release in a major GHC version. Feel free to omit lines listing versions you don't need or want to test against.
env:
- CABALVER=1.22 GHCVER=7.10.3
78 changes: 58 additions & 20 deletions README.md
@@ -1,5 +1,4 @@
Grenade
=======
# Grenade

[![Build Status](https://api.travis-ci.org/HuwCampbell/grenade.svg?branch=master)](https://travis-ci.org/HuwCampbell/grenade)
[![Hackage page (downloads and API reference)][hackage-png]][hackage]
@@ -47,8 +46,7 @@ type Shakespeare
'[ 'D1 40, 'D1 80, 'D1 40, 'D1 40, 'D1 40 ]
```

Design
------
## Design

Networks in Grenade can be thought of as heterogeneous lists of layers, where
their type includes not only the layers of the network, but also the shapes of
@@ -78,8 +76,7 @@ outputs a three dimensional (`D3`) 24x24x10 image. The last item in the list is
one dimensional (`D1`) with 10 values, representing the categories of the MNIST
data.
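The design can be sketched in a few lines of self-contained Haskell. The names below (`Shape`, `Layer`, `Net`, `:~>`) are illustrative stand-ins, not grenade's real API: a network is a GADT indexed by the list of shapes flowing through it, so composing layers with mismatched shapes is rejected at compile time.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

import GHC.TypeLits (Nat)

-- Toy shape kind: sizes live at the type level.
data Shape = D1 Nat | D2 Nat Nat

-- Toy layers, each typed by its input and output shape.
data Layer (i :: Shape) (o :: Shape) where
  Logit          :: Layer s s             -- pointwise, shape-preserving
  FullyConnected :: Layer ('D1 i) ('D1 o) -- weights omitted in this sketch

-- A network is a heterogeneous list of layers; its type records every
-- intermediate shape, so a mismatched composition fails to compile.
data Net (ss :: [Shape]) where
  NNil  :: Net '[s]
  (:~>) :: Layer i o -> Net (o ': ss) -> Net (i ': o ': ss)
infixr 5 :~>

depth :: Net ss -> Int
depth NNil      = 0
depth (_ :~> n) = 1 + depth n

-- A 784 -> 10 skeleton; changing any shape in the list is a type error.
toyNet :: Net '[ 'D1 784, 'D1 10, 'D1 10 ]
toyNet = FullyConnected :~> Logit :~> NNil

main :: IO ()
main = print (depth toyNet)
```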

Usage
-----
## Usage

To perform back propagation, one can call the eponymous function
```haskell
@@ -102,8 +99,7 @@ easy in downstream code. If the shapes of a network are not specified correctly
and a layer cannot sensibly perform the operation between two shapes, it
will result in a compile-time error.

Composition
-----------
## Composition

Networks and Layers in Grenade are easily composed at the type level. As a `Network`
is an instance of `Layer`, one can use a trained Network as a small component in a
@@ -125,24 +121,24 @@ See the [MNIST](https://github.com/HuwCampbell/grenade/blob/master/examples/main
example, which has been overengineered to contain both residual-style learning and
inception-style convolutions.

Generative Adversarial Networks
-------------------------------
## Generative Adversarial Networks

As Grenade is purely functional, one can compose its training functions in flexible
ways. The [GAN-MNIST](https://github.com/HuwCampbell/grenade/blob/master/examples/main/gan-mnist.hs)
example displays an interesting, type-safe way of writing a generative adversarial
training function in 10 lines of code.

Layer Zoo
---------
## Layer Zoo

Grenade layers are normal Haskell data types which are instances of `Layer`, so
it's easy to build one's own layers in downstream code. We do however provide a decent set
of layers, including convolution, deconvolution, pooling, pad, crop, logit, relu,
elu, tanh, and fully connected.
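To illustrate the "layers are plain data types" point, here is a toy logit (sigmoid) layer. The `ToyLayer` class below is a simplified stand-in for grenade's `Layer` class, not its real signature: it just maps vectors forward and propagates gradients back.

```haskell
-- A simplified stand-in for grenade's Layer class, for illustration only.
class ToyLayer l where
  forward  :: l -> [Double] -> [Double]
  -- backward takes the original forward input and the incoming gradient.
  backward :: l -> [Double] -> [Double] -> [Double]

-- Logit applies the sigmoid pointwise; its derivative is s * (1 - s).
data Logit = Logit

sigmoid :: Double -> Double
sigmoid x = 1 / (1 + exp (negate x))

instance ToyLayer Logit where
  forward _ = map sigmoid
  backward _ inp grad =
    zipWith (\x g -> let s = sigmoid x in s * (1 - s) * g) inp grad

main :: IO ()
main = do
  print (forward Logit [0])      -- [0.5]
  print (backward Logit [0] [1]) -- [0.25]
```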

Build Instructions
------------------
## Build Instructions

### Mafia

Grenade is most easily built with the [mafia](https://github.com/ambiata/mafia)
script that is located in the repository. You will also need the `lapack` and
`blas` libraries and development tools. Once you have all that, Grenade can be
@@ -160,16 +156,59 @@ and the tests run using:

Grenade builds with ghc 7.10 and 8.0.

Thanks
------
### Stack

Grenade also supports [stack](https://docs.haskellstack.org). You can build
the whole project with

```
stack build
```

and run the tests using:

```
stack test grenade
```

and run the benchmarks using:

```
stack bench grenade
```

## Windows build

This recipe is for Stack 1.4.0; it has been tested and works.

1) Set up GHC via Stack:

> stack setup

2) Download OpenBLAS from http://www.openblas.net/ and unzip it somewhere.

3) In Stack's MSYS2 console, i.e. C:\Users\{User}\AppData\Local\Programs\stack\x86_64-windows\msys2-{version}\msys2_shell.bat, build and install OpenBLAS:

> cd /.../OpenBLAS
> pacman -Sy
> pacman -S make perl gcc-fortran
> make clean
> make
> make install

4) Then, in a normal Windows console (fill in your user name and versions, and check whether the paths differ on your machine):

> stack install --flag hmatrix:openblas --extra-include-dirs=C:\Users\{User}\AppData\Local\Programs\stack\x86_64-windows\msys2-20150512\opt\OpenBLAS\include --extra-lib-dirs=C:\Users\{User}\AppData\Local\Programs\stack\x86_64-windows\msys2-20150512\opt\OpenBLAS\bin --extra-lib-dirs=C:\Users\{User}\AppData\Local\Programs\stack\x86_64-windows\msys2-20150512\usr\lib\gcc\x86_64-pc-msys\6.3.0\


## Thanks
Writing a library like this has been on my mind for a while now, but a big shout
out must go to [Justin Le](https://github.com/mstksg), whose
[dependently typed fully connected network](https://blog.jle.im/entry/practical-dependent-types-in-haskell-1.html)
inspired me to get cracking, gave many ideas for the type level tools I
needed, and was a great starting point for writing this library.

Performance
-----------
## Performance
Grenade is backed by hmatrix, BLAS, and LAPACK, with critical functions optimised
in C. Using the im2col trick popularised by Caffe, it should be sufficient for
many problems.
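For illustration, the im2col idea can be sketched on plain Haskell lists (this sketch is unrelated to grenade's optimised C implementation): every kernel-sized patch of the image is unrolled into one row of a matrix, after which a convolution reduces to a single matrix multiplication against the flattened kernel.

```haskell
-- Illustrative im2col on lists of lists: unroll each kernel-sized patch,
-- moved by the given strides, into a row of the result.
im2colList :: Int -> Int -> Int -> Int -> [[Double]] -> [[Double]]
im2colList kRows kCols sRows sCols img =
  [ concatMap (take kCols . drop c) (take kRows (drop r img))
  | r <- [0, sRows .. rows - kRows]
  , c <- [0, sCols .. cols - kCols]
  ]
  where
    rows = length img
    cols = length (head img)

main :: IO ()
main =
  -- 2x2 kernel, stride 1, over a 3x3 image: four patches become four rows.
  mapM_ print (im2colList 2 2 1 1 [[1,2,3],[4,5,6],[7,8,9]])
```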
@@ -181,8 +220,7 @@ threaded.
Training 15 generations over Kaggle's 41000 sample MNIST training set on a single
core took around 12 minutes, achieving 1.5% error rate on a 1000 sample holdout set.

Contributing
------------
## Contributing
Contributions are welcome.

[hackage]: http://hackage.haskell.org/package/grenade
74 changes: 52 additions & 22 deletions bench/bench.hs
@@ -3,9 +3,18 @@
import Criterion.Main

import Grenade
import Grenade.Accelerate as GA

import Grenade.Layers.Internal.Convolution
import qualified Grenade.Layers.Internal.Convolution.Accelerate as CA
import Grenade.Layers.Internal.Pooling
import qualified Grenade.Layers.Internal.Pooling.Accelerate as PA

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Z(..), (:.)(..))
import Data.Array.Accelerate.Interpreter as I
import Data.Array.Accelerate.LLVM.Native as LN
--import Data.Array.Accelerate.LLVM.PTX as LP

import Numeric.LinearAlgebra

@@ -14,28 +23,49 @@ main = do
x :: S ('D2 60 60 ) <- randomOfShape
y :: S ('D3 60 60 1) <- randomOfShape

defaultMain [
bgroup "im2col" [ bench "im2col 3x4" $ whnf (im2col 2 2 1 1) ((3><4) [1..])
, bench "im2col 28x28" $ whnf (im2col 5 5 1 1) ((28><28) [1..])
, bench "im2col 100x100" $ whnf (im2col 10 10 1 1) ((100><100) [1..])
]
, bgroup "col2im" [ bench "col2im 3x4" $ whnf (col2im 2 2 1 1 3 4) ((6><4) [1..])
, bench "col2im 28x28" $ whnf (col2im 5 5 1 1 28 28) ((576><25) [1..])
, bench "col2im 100x100" $ whnf (col2im 10 10 1 1 100 100) ((8281><100) [1..])
]
, bgroup "poolfw" [ bench "poolforwards 3x4" $ whnf (poolForward 1 3 4 2 2 1 1) ((3><4) [1..])
, bench "poolforwards 28x28" $ whnf (poolForward 1 28 28 5 5 1 1) ((28><28) [1..])
, bench "poolforwards 100x100" $ whnf (poolForward 1 100 100 10 10 1 1) ((100><100) [1..])
]
, bgroup "poolbw" [ bench "poolbackwards 3x4" $ whnf (poolBackward 1 3 4 2 2 1 1 ((3><4) [1..])) ((2><3) [1..])
, bench "poolbackwards 28x28" $ whnf (poolBackward 1 28 28 5 5 1 1 ((28><28) [1..])) ((24><24) [1..])
, bench "poolbackwards 100x100" $ whnf (poolBackward 1 100 100 10 10 1 1 ((100><100) [1..])) ((91><91) [1..])
]
, bgroup "padcrop" [ bench "pad 2D 60x60" $ whnf (testRun2D Pad) x
, bench "pad 3D 60x60" $ whnf (testRun3D Pad) y
, bench "crop 2D 60x60" $ whnf (testRun2D' Crop) x
, bench "crop 3D 60x60" $ whnf (testRun3D' Crop) y
]
defaultMain
[ bgroup "linear algebra"
[ bgroup "im2col"
[ bench "im2col 3x4" $ nf (im2col 2 2 1 1) ((3><4) [1..])
, bench "im2col 28x28" $ nf (im2col 5 5 1 1) ((28><28) [1..])
, bench "im2col 100x100" $ nf (im2col 10 10 1 1) ((100><100) [1..])
]
, bgroup "col2im"
[ bench "col2im 3x4" $ nf (col2im 2 2 1 1 3 4) ((6><4) [1..])
, bench "col2im 28x28" $ nf (col2im 5 5 1 1 28 28) ((576><25) [1..])
, bench "col2im 100x100" $ nf (col2im 10 10 1 1 100 100) ((8281><100) [1..])
]
, bgroup "poolfw"
[ bench "poolforwards 3x4" $ nf (poolForward 1 3 4 2 2 1 1) ((3><4) [1..])
, bench "poolforwards 28x28" $ nf (poolForward 1 28 28 5 5 1 1) ((28><28) [1..])
, bench "poolforwards 100x100" $ nf (poolForward 1 100 100 10 10 1 1) ((100><100) [1..])
]
, bgroup "poolbw"
[ bench "poolbackwards 3x4" $ nf (poolBackward 1 3 4 2 2 1 1 ((3><4) [1..])) ((2><3) [1..])
, bench "poolbackwards 28x28" $ nf (poolBackward 1 28 28 5 5 1 1 ((28><28) [1..])) ((24><24) [1..])
, bench "poolbackwards 100x100" $ nf (poolBackward 1 100 100 10 10 1 1 ((100><100) [1..])) ((91><91) [1..])
]
, bgroup "padcrop"
[ bench "pad 2D 60x60" $ nf (testRun2D Pad) x
, bench "pad 3D 60x60" $ nf (testRun3D Pad) y
, bench "crop 2D 60x60" $ nf (testRun2D' Crop) x
, bench "crop 3D 60x60" $ nf (testRun3D' Crop) y
]
]
, bgroup "accelerate"
[ bgroup name
[ bgroup "im2col"
[ bench "im2col 3x4" $ nf (run . CA.im2col 2 2 1 1) (A.use $ A.fromList (Z :. 3 :. 4) [1..])
, bench "im2col 28x28" $ nf (run . CA.im2col 5 5 1 1) (A.use $ A.fromList (Z :. 28 :. 28) [1..])
, bench "im2col 100x100" $ nf (run . CA.im2col 10 10 1 1) (A.use $ A.fromList (Z :. 100 :. 100) [1..])
]
]
| (name, run) <-
[ ("interpreter", I.run)
, ("llvm-native", LN.run)
--, ("llvm-ptx", LP.run)
]
]
]


19 changes: 19 additions & 0 deletions grenade.cabal
@@ -50,6 +50,8 @@ library
, text == 1.2.*
, singletons >= 2.1 && < 2.4
, vector >= 0.11 && < 0.13
, accelerate == 1.0.*
, accelerate-io == 1.0.*

ghc-options:
-Wall
@@ -62,12 +64,19 @@ library

exposed-modules:
Grenade
Grenade.Accelerate
Grenade.Core
Grenade.Core.Accelerate
Grenade.Core.Layer
Grenade.Core.Layer.Accelerate
Grenade.Core.LearningParameters
Grenade.Core.LearningParameters.Accelerate
Grenade.Core.Matrix.Accelerate
Grenade.Core.Network
Grenade.Core.Network.Accelerate
Grenade.Core.Runner
Grenade.Core.Shape
Grenade.Core.Shape.Accelerate

Grenade.Layers
Grenade.Layers.Concat
@@ -89,9 +98,12 @@ library
Grenade.Layers.Trivial

Grenade.Layers.Internal.Convolution
Grenade.Layers.Internal.Convolution.Accelerate
Grenade.Layers.Internal.Pad
Grenade.Layers.Internal.Pooling
Grenade.Layers.Internal.Pooling.Accelerate
Grenade.Layers.Internal.Update
Grenade.Layers.Internal.Update.Accelerate

Grenade.Recurrent

@@ -129,15 +141,18 @@ test-suite test

other-modules: Test.Hedgehog.Compat
Test.Hedgehog.Hmatrix
Test.Hedgehog.Accelerate
Test.Hedgehog.TypeLits

Test.Grenade.Network
Test.Grenade.Layers.Convolution
Test.Grenade.Layers.FullyConnected
Test.Grenade.Layers.FullyConnected.Accelerate
Test.Grenade.Layers.Nonlinear
Test.Grenade.Layers.PadCrop
Test.Grenade.Layers.Pooling
Test.Grenade.Layers.Internal.Convolution
Test.Grenade.Layers.Internal.Convolution.Accelerate
Test.Grenade.Layers.Internal.Pooling
Test.Grenade.Layers.Internal.Reference

@@ -163,6 +178,7 @@ test-suite test
, ad
, reflection
, vector
, accelerate


benchmark bench
@@ -181,6 +197,9 @@ benchmark bench
, criterion == 1.1.*
, grenade
, hmatrix
, accelerate
, accelerate-llvm-native
--, accelerate-llvm-ptx

benchmark bench-lstm
type: exitcode-stdio-1.0
1 change: 1 addition & 0 deletions src/Grenade/Accelerate.hs
@@ -0,0 +1 @@
module Grenade.Accelerate where
13 changes: 13 additions & 0 deletions src/Grenade/Core/Accelerate.hs
@@ -0,0 +1,13 @@
module Grenade.Core.Accelerate (
module Grenade.Core.Layer.Accelerate
, module Grenade.Core.LearningParameters.Accelerate
, module Grenade.Core.Network.Accelerate
, module Grenade.Core.Shape.Accelerate
, module Grenade.Core.Matrix.Accelerate
) where

import Grenade.Core.Layer.Accelerate
import Grenade.Core.LearningParameters.Accelerate
import Grenade.Core.Network.Accelerate
import Grenade.Core.Shape.Accelerate
import Grenade.Core.Matrix.Accelerate