Releases: huggingface/tgi-gaudi

v2.0.5: Llava multi-card support

07 Sep 18:17

Llava multi-card support

Tested models and configurations

Model    BF16    FP8    Single Card    Multi-Cards
Llama2-7B
Llama2-70B
Llama3-8B
Llama3-70B
Llama3.1-8B
Llama3.1-70B
CodeLlama-13B
Mixtral-8x7B
Mistral-7B
Llava-v1.6-Mistral-7B

Full Changelog: v2.0.4...v2.0.5

v2.0.4: SynapseAI v1.17.0

27 Aug 13:42
fde061c

SynapseAI v1.17.0

The codebase is validated with SynapseAI 1.17.0 and optimum-habana 1.13.1.

Tested models and configurations

Model    BF16    FP8    Single Card    Multi-Cards
Llama2-7B
Llama2-70B
Llama3-8B
Llama3-70B
Llama3.1-8B
Llama3.1-70B
CodeLlama-13B
Mixtral-8x7B
Mistral-7B
Llava-v1.6-Mistral-7B

Highlights

  • Add support for vision-language models

Full Changelog: v2.0.1...v2.0.4

v2.0.1: SynapseAI v1.16.0

24 Jun 09:58

SynapseAI v1.16.0

The codebase is validated with SynapseAI 1.16.0 and optimum-habana 1.12.0.

Tested configurations

  • Llama2 7B BF16 / FP8 on 1xGaudi2
  • Llama2 70B BF16 / FP8 on 8xGaudi2
  • Falcon 180B BF16 / FP8 on 8xGaudi2
  • Mistral 7B BF16 / FP8 on 1xGaudi2
  • Mixtral 8x7B BF16 / FP8 on 1xGaudi2

Highlights

  • Add support for grammar feature
  • Add support for Habana Flash Attention
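The grammar feature constrains generated text to a given format. A minimal sketch of a request body is shown here, assuming the payload shape of the upstream text-generation-inference `/generate` endpoint (a `grammar` entry with `type` and `value` under `parameters`). The helper name `build_grammar_request`, the prompt, and the schema are hypothetical and only illustrate the structure; check the server's API documentation for the exact fields your version accepts.

```python
import json


def build_grammar_request(prompt: str, schema: dict, max_new_tokens: int = 64) -> dict:
    """Build a /generate request body that asks for JSON-constrained output.

    Assumption: the server accepts the upstream TGI payload shape, where
    `parameters.grammar` holds a `type` ("json") and a JSON Schema `value`.
    """
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "grammar": {"type": "json", "value": schema},
        },
    }


# Hypothetical schema: the model must answer with an object holding one string field.
example_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}

payload = build_grammar_request("Name the capital of France.", example_schema)
body = json.dumps(payload)  # string to send as the HTTP request body
```

The body can then be sent with any HTTP client as a POST with `Content-Type: application/json`.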

Full Changelog: v2.0.0...v2.0.1

v2.0.0: SynapseAI v1.15.0

13 May 12:33
1a8c7d0

SynapseAI v1.15.0

The codebase is validated with SynapseAI 1.15.0 and optimum-habana 1.11.1.

Tested configurations

  • Llama2 70B BF16 / FP8 on 8xGaudi2

Highlights

  • Add support for FP8 precision

Full Changelog: v1.2.1...v2.0.0

v1.2.1: SynapseAI v1.14.0

19 Mar 07:43
d752317

SynapseAI v1.14.0

The codebase is validated with SynapseAI 1.14.0 and optimum-habana 1.10.4.

Tested configuration

  • Llama2 70B BF16 on 8xGaudi2

Highlights

  • Add support for continuous batching on Intel Gaudi
  • Add batch size bucketing
  • Add sequence bucketing for prefill operation
  • Optimize concatenate operation
  • Add speculative scheduling
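The batch size and sequence bucketing items above can be illustrated with a minimal sketch. The general idea, on hardware that compiles a graph per input shape, is to round shapes up to a small set of bucket sizes so that fewer distinct shapes reach the device. The function names and bucket values below are hypothetical and are not taken from the tgi-gaudi codebase.

```python
def round_up(value: int, multiple: int) -> int:
    """Round `value` up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


# Hypothetical bucket sizes, chosen only for illustration.
BATCH_BUCKET_SIZE = 8
SEQ_BUCKET_SIZE = 128


def bucketed_shape(batch_size: int, seq_len: int) -> tuple[int, int]:
    """Return the padded (batch, sequence) shape a prefill step would use."""
    return (
        round_up(batch_size, BATCH_BUCKET_SIZE),
        round_up(seq_len, SEQ_BUCKET_SIZE),
    )


# Requests with different raw shapes collapse onto the same padded shape.
shapes = {bucketed_shape(b, s) for b, s in [(5, 300), (7, 350), (8, 384)]}
```

Here all three raw shapes map to the single padded shape `(8, 384)`. The cost is some wasted computation on padding, so bucket sizes trade padding overhead against the number of distinct shapes.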

Full Changelog: v1.2.0...v1.2.1