
Add NPU Engine #31

Open · wants to merge 11 commits into base: main
6 changes: 5 additions & 1 deletion .gitignore
@@ -11,4 +11,8 @@ scripts/*.ps1
scripts/*.sh
**/dist
**/build
*.log
*.log
benchmark/
modelTest/
nc_workspace/
debug_openai_history.txt
11 changes: 11 additions & 0 deletions README.md
@@ -35,6 +35,7 @@ Run local LLMs on iGPU, APU and CPU (AMD, Intel, and Qualcomm (Coming Soon)). E
* Onnxruntime CPU Models [Link](./docs/model/onnxruntime_cpu_models.md)
* Ipex-LLM Models [Link](./docs/model/ipex_models.md)
* OpenVINO-LLM Models [Link](./docs/model/openvino_models.md)
* NPU-LLM Models [Link](./docs/model/npu_models.md)

## Getting Started

@@ -56,12 +57,14 @@ Run local LLMs on iGPU, APU and CPU (AMD, Intel, and Qualcomm (Coming Soon)). E
- **CUDA:** `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda]`
- **IPEX:** `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop`
- **OpenVINO:** `$env:ELLM_TARGET_DEVICE='openvino'; pip install -e .[openvino]`
- **NPU:** `$env:ELLM_TARGET_DEVICE='npu'; pip install -e .[npu]`
- **With Web UI**:
- **DirectML:** `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml,webui]`
- **CPU:** `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu,webui]`
- **CUDA:** `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda,webui]`
- **IPEX:** `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop; pip install -r requirements-webui.txt`
- **OpenVINO:** `$env:ELLM_TARGET_DEVICE='openvino'; pip install -e .[openvino,webui]`
- **NPU:** `$env:ELLM_TARGET_DEVICE='npu'; pip install -e .[npu,webui]`

- **Linux**

@@ -77,12 +80,14 @@ Run local LLMs on iGPU, APU and CPU (AMD, Intel, and Qualcomm (Coming Soon)). E
- **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda]`
- **IPEX:** `ELLM_TARGET_DEVICE='ipex' python setup.py develop`
- **OpenVINO:** `ELLM_TARGET_DEVICE='openvino' pip install -e .[openvino]`
- **NPU:** `ELLM_TARGET_DEVICE='npu' pip install -e .[npu]`
- **With Web UI**:
- **DirectML:** `ELLM_TARGET_DEVICE='directml' pip install -e .[directml,webui]`
- **CPU:** `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu,webui]`
- **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda,webui]`
- **IPEX:** `ELLM_TARGET_DEVICE='ipex' python setup.py develop; pip install -r requirements-webui.txt`
- **OpenVINO:** `ELLM_TARGET_DEVICE='openvino' pip install -e .[openvino,webui]`
- **NPU:** `ELLM_TARGET_DEVICE='npu' pip install -e .[npu,webui]`

### Launch OpenAI API Compatible Server

@@ -142,6 +147,9 @@ It is an interface that allows you to download and deploy OpenAI API compatible

# OpenVINO
ellm_server --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'openvino' --device 'gpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'

# NPU
ellm_server --model_path 'microsoft/Phi-3-mini-4k-instruct' --backend 'npu' --device 'npu' --port 5555 --served_model_name 'microsoft/Phi-3-mini-4k-instruct'
```
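Once `ellm_server` is running, any OpenAI-compatible client can talk to it. A minimal sketch, assuming the server exposes the standard `/v1/chat/completions` route on the `--port` given above (the helper only builds the request, so it can be inspected without a live server):

```python
import json

# Sketch of an OpenAI-compatible chat request against a local ellm_server
# instance. Assumes the standard /v1/chat/completions route on the port
# passed via --port (5555 in the examples above). This builds the request
# offline; sending it is left to the caller.

def build_chat_request(base_url: str, model: str, prompt: str) -> tuple[str, dict]:
    """Return the endpoint URL and JSON payload for a chat completion."""
    url = f"{base_url}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return url, payload

if __name__ == "__main__":
    url, payload = build_chat_request(
        "http://localhost:5555",
        "microsoft/Phi-3-mini-4k-instruct",
        "What is an NPU?",
    )
    print(url)
    print(json.dumps(payload))
    # To actually send it: requests.post(url, json=payload, timeout=60)
```

The `served_model_name` passed at launch is what the client must send in the `model` field.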

## Prebuilt OpenAI API Compatible Windows Executable (Alpha)
@@ -161,6 +169,9 @@ _Powershell/Terminal Usage (Use it like `ellm_server`)_:

# OpenVINO
.\ellm_api_server.exe --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'openvino' --device 'gpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'

# NPU
.\ellm_api_server.exe --model_path 'microsoft/Phi-3-mini-4k-instruct' --backend 'npu' --device 'npu' --port 5555 --served_model_name 'microsoft/Phi-3-mini-4k-instruct'
```

## Acknowledgements
15 changes: 15 additions & 0 deletions docs/model/npu_models.md
@@ -0,0 +1,15 @@
# Model Powered by NPU-LLM

## Verified Models
Verified models can be found in the EmbeddedLLM NPU-LLM model collection:
* EmbeddedLLM NPU-LLM Model collections: [link](https://huggingface.co/collections/EmbeddedLLM/npu-llm-66d692817e6c9509bb8ead58)

| Model | Model Link |
| --- | --- |
| Phi-3-mini-4k-instruct | [link](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
| Phi-3-mini-128k-instruct | [link](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) |
| Phi-3-medium-4k-instruct | [link](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct) |
| Phi-3-medium-128k-instruct | [link](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct) |

## Contribution
We welcome contributions to the verified model list.
3 changes: 3 additions & 0 deletions requirements-npu.txt
@@ -0,0 +1,3 @@
intel-npu-acceleration-library
torch>=2.4
transformers>=4.42
2 changes: 1 addition & 1 deletion requirements-webui.txt
@@ -1 +1 @@
gradio~=4.36.1
gradio~=4.43.0
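`~=` is PEP 440's compatible-release operator: `gradio~=4.43.0` accepts any `4.43.x >= 4.43.0` but not `4.44.0`, which is why the pin itself had to move from `4.36.1` to `4.43.0`. A tiny stdlib illustration of that rule for three-part pins (real resolution is pip's job, not this sketch's):

```python
def compatible(version: str, pin: str) -> bool:
    """Check PEP 440 '~=' for three-part pins: >= pin, same first two parts."""
    v = tuple(int(x) for x in version.split("."))
    p = tuple(int(x) for x in pin.split("."))
    return v[:2] == p[:2] and v >= p

if __name__ == "__main__":
    print(compatible("4.43.2", "4.43.0"))  # patch bump: accepted
    print(compatible("4.36.1", "4.43.0"))  # old pin: rejected
    print(compatible("4.44.0", "4.43.0"))  # minor bump: rejected
```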
9 changes: 9 additions & 0 deletions setup.py
@@ -54,6 +54,10 @@ def _is_openvino() -> bool:
    return ELLM_TARGET_DEVICE == "openvino"


def _is_npu() -> bool:
    return ELLM_TARGET_DEVICE == "npu"


class ELLMInstallCommand(install):
    def run(self):
        install.run(self)
@@ -186,6 +190,8 @@ def get_requirements() -> List[str]:
        requirements = _read_requirements("requirements-ipex.txt")
    elif _is_openvino():
        requirements = _read_requirements("requirements-openvino.txt")
    elif _is_npu():
        requirements = _read_requirements("requirements-npu.txt")
    else:
        raise ValueError("Unsupported platform, please use CUDA, ROCm, Neuron, or CPU.")
    return requirements
@@ -204,6 +210,8 @@ def get_ellm_version() -> str:
        version += "+ipex"
    elif _is_openvino():
        version += "+openvino"
    elif _is_npu():
        version += "+npu"
    else:
        raise RuntimeError("Unknown runtime environment")
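The branches added across setup.py all follow one pattern: `ELLM_TARGET_DEVICE` selects both a requirements file and a `+suffix` on the version string. A standalone sketch of that dispatch — names mirror the diff, but the filenames for targets other than `ipex`, `openvino`, and `npu` are assumed to follow the same pattern and are not confirmed by this PR:

```python
import os

# Illustrative dispatch table: ELLM_TARGET_DEVICE picks the requirements
# file and the local-version suffix. This mirrors the if/elif chain in
# setup.py; it is a sketch, not the project's actual build code.

_TARGETS = {
    "directml": "requirements-directml.txt",  # assumed filename
    "cpu": "requirements-cpu.txt",            # assumed filename
    "cuda": "requirements-cuda.txt",          # assumed filename
    "ipex": "requirements-ipex.txt",
    "openvino": "requirements-openvino.txt",
    "npu": "requirements-npu.txt",
}

def requirements_file(target: str) -> str:
    """Map an ELLM_TARGET_DEVICE value to its requirements file."""
    try:
        return _TARGETS[target]
    except KeyError:
        raise ValueError(f"Unsupported target device: {target!r}")

def versioned(base: str, target: str) -> str:
    """Append the device suffix, e.g. versioned('0.1.0', 'npu') -> '0.1.0+npu'."""
    if target not in _TARGETS:
        raise RuntimeError("Unknown runtime environment")
    return f"{base}+{target}"

if __name__ == "__main__":
    target = os.environ.get("ELLM_TARGET_DEVICE", "npu")
    print(requirements_file(target), versioned("0.1.0", target))
```

Keeping both lookups keyed off the same table is what makes adding a backend a three-line change in the real setup.py.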

@@ -256,6 +264,7 @@ def get_ellm_version() -> str:
        "cuda": ["onnxruntime-genai-cuda==0.3.0rc2"],
        "ipex": [],
        "openvino": [],
        "npu": [],
    },
    dependency_links=dependency_links,
    entry_points={