Distributed layers #1270

Draft · wants to merge 5 commits into main

Conversation

angeloskath (Member)

Adds linear layers that allow training and inference of a model sharded across several devices. The main things added are:

  • float16/bfloat16 reductions for MPI (a small reduction sketch follows this list)
  • AllToShardedLinear and its quantized sibling
  • ShardedToAllLinear and its quantized sibling
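
As a rough illustration of the first item, here is a minimal sketch of a half-precision MPI reduction, which this PR enables. It assumes the existing mx.distributed.init/all_sum API and simply uses a bfloat16 array:

import mlx.core as mx

# Hedged sketch: with this PR, MPI reductions should also accept
# float16/bfloat16 arrays instead of only 32-bit types.
group = mx.distributed.init()
x = mx.ones((4,), dtype=mx.bfloat16) * group.rank()
y = mx.distributed.all_sum(x, group=group)  # elementwise sum across all ranks
print(group.rank(), y)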

Simply swapping a model's linear layers for the above yields a model that works out of the box with distributed inference and training.
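
For illustration, here is a hedged sketch of what that swap could look like in a small MLP. The layer names come from the list above; the constructor arguments (input_dims, output_dims, bias, group) are assumed from the conversion snippet further down and may differ from the final API:

import mlx.core as mx
import mlx.nn as nn

group = mx.distributed.init()

class ShardedMLP(nn.Module):
    def __init__(self, dims, hidden_dims):
        super().__init__()
        # Each rank keeps only its slice of the hidden features ...
        self.up = nn.AllToShardedLinear(dims, hidden_dims, False, group)
        # ... and the second layer combines the sharded partial results
        # back into full-size activations on every rank.
        self.down = nn.ShardedToAllLinear(hidden_dims, dims, False, group)

    def __call__(self, x):
        return self.down(nn.relu(self.up(x)))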

I am starting this as a draft so that we can iterate a bit on the design. The downside of the design above is that we have yet another linear layer to think about when implementing LoRA and friends, or weird new quantizations, for instance. Perhaps it would be better to give the above layers an internal linear layer, so that model surgery that swaps linear layers would still work out of the box (sketched below).
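
A hypothetical sketch of that alternative, with an inner nn.Linear holding the local weight shard; the name and the sharding arithmetic here are illustrative, not the PR's implementation:

import mlx.core as mx
import mlx.nn as nn

class ShardedLinearWrapper(nn.Module):  # hypothetical name
    def __init__(self, input_dims, output_dims, bias=True, group=None):
        super().__init__()
        self.group = group or mx.distributed.init()
        n = self.group.size()
        # The inner Linear holds only this rank's slice of the output
        # features, so surgery that looks for nn.Linear (e.g. LoRA
        # swapping) still finds a plain linear layer to replace.
        self.linear = nn.Linear(input_dims, output_dims // n, bias=bias)

    def __call__(self, x):
        return self.linear(x)

The review comments below refer to the following conversion snippet from the PR: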

sl = cls(input_dims, output_dims, False, group)
# The multiplication with 1.0 forces a copy, perhaps change to
# something better when available.
sl.weight = linear_layer.weight[r * step : (r + 1) * step] * 1
Review comment (Member):

Is it possible the input buffer could be donated so we'd still hold on to the memory?

If so, maybe another option is to do sl.weight[...] = ..., which would force the copy since it's a slice update?
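
A minimal sketch of that suggestion (assuming sl.weight is already allocated with the shard's shape; whether Ellipsis or a full slice is used is a detail):

# Writing into the already-allocated sharded weight is a slice update,
# which materializes a copy instead of keeping a view that could alias
# (or be donated from) linear_layer.weight.
sl.weight[:] = linear_layer.weight[r * step : (r + 1) * step]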

angeloskath (Member Author):

Nice! That does sound better actually!

@awni (Member) commented Jul 17, 2024

I kind of like this design. It's all quite simple and easy to follow, and we have a lot of control over how to shard the model (as in ml-explore/mlx-examples#890). We could possibly find a way to reduce the code needed to add a new custom linear-like layer, but the simplicity is nice and I wouldn't want to give that up.
