
Technix

Problem Statement

Our problem statement is “Running GenAI on Intel AI Laptops and Simple LLM Inference on CPU and fine-tuning of LLM Models using Intel® OpenVINO™.” The challenge lies in efficiently running Generative AI applications and performing LLM inference on Intel AI Laptops and CPUs while maintaining high performance without specialized hardware. Additionally, fine-tuning LLM models with Intel® OpenVINO™ for real-time applications requires addressing computational efficiency and resource constraints.

Objective

This project leverages Intel® OpenVINO™ to optimize and execute GenAI and LLM inference on Intel AI Laptops' CPUs, minimizing the reliance on GPUs and enabling efficient, high-performance AI deployment in consumer-grade environments. By fine-tuning LLM models with OpenVINO™, we aim to enhance the performance and accessibility of AI applications. Specifically, we have developed a text generation chatbot using TinyLlama/TinyLlama-1.1B-Chat-v1.0 to showcase these capabilities.

Team Members and Contribution

  • Rahul Biju (Team Leader): CPU Inference
  • Nandakrishnan A: Model Optimization and Quantization
  • Nandana S Nair: Project Report
  • Krishna Sagar P: Project Report
  • Rahul Zachariah: User Interface Implementation

Running locally

1. Clone the repository.

git clone https://github.com/Rahul-Biju-03/Technix.git

2. Move into the project directory.

cd Technix

3. Install the required libraries listed in requirements.txt.

pip install -r requirements.txt

4. (Optional) Run it in a virtual environment.

  • Download and install virtualenv.

pip install virtualenv

  • Create the virtual environment in Python 3.

virtualenv -p path\to\your\python.exe test_env

  • Activate the test environment.

For Windows:

test_env\Scripts\Activate

For Unix:

source test_env/bin/activate

5. Converting and Quantizing the TinyLlama Model with OpenVINO

  • This script converts the TinyLlama model from its original format to ONNX and then quantizes it with OpenVINO for optimized performance.
python Conversion_and_Optimisation.py
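
For reference, a minimal Python sketch of the same flow (export to ONNX, convert to OpenVINO IR, compress weights) is shown below. It is an illustration under assumptions, not the repository's exact script: it assumes optimum, openvino, and nncf are installed, and the output paths are placeholders.

# Illustrative sketch only; file and directory names are placeholders.
from optimum.onnxruntime import ORTModelForCausalLM
import openvino as ov
import nncf

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# 1. Export the PyTorch checkpoint to ONNX via Optimum.
onnx_model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
onnx_model.save_pretrained("tinyllama_onnx")

# 2. Read the exported ONNX graph into an OpenVINO model.
ov_model = ov.convert_model("tinyllama_onnx/model.onnx")

# 3. Compress the weights to INT8 with NNCF and save the IR.
ov_model = nncf.compress_weights(ov_model)
ov.save_model(ov_model, "tinyllama_int8/openvino_model.xml")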

6. Benchmarking the Original and Quantized TinyLlama Models with OpenVINO

  • This script benchmarks the performance and memory usage of the original TinyLlama model against the quantized version using OpenVINO, including model size calculations and inference time measurements.
python CPU_INFERENCE.py
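
For reference, a minimal sketch of the kind of comparison such a benchmark makes (wall-clock generation time of the original PyTorch model versus an OpenVINO build) is shown below; it assumes transformers and optimum-intel are installed, and the prompt and token count are placeholders, not the script's actual values.

# Illustrative sketch only; not the repository's CPU_INFERENCE.py.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer("What is OpenVINO?", return_tensors="pt")

def timed_generate(model):
    # Time a single greedy generation pass on the CPU.
    start = time.perf_counter()
    output = model.generate(**prompt, max_new_tokens=64)
    return time.perf_counter() - start, output

pytorch_model = AutoModelForCausalLM.from_pretrained(model_id)
ov_model = OVModelForCausalLM.from_pretrained(model_id, export=True)

for name, mdl in [("PyTorch", pytorch_model), ("OpenVINO", ov_model)]:
    seconds, output = timed_generate(mdl)
    print(f"{name}: {seconds:.2f} s for {output.shape[-1]} total tokens")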

7. TinyLlama Chatbot with Gradio Interface

  • This script sets up a TinyLlama chatbot with a Gradio interface, including preprocessing and postprocessing functions for improved text handling.
python Chatbot.py
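
For reference, a minimal sketch of a Gradio chat front end over the OpenVINO model is shown below; it is an illustration under assumptions, not the repository's Chatbot.py (the helper function, generation settings, and pre/post-processing here are placeholders).

# Illustrative sketch only; assumes gradio, transformers, and optimum-intel are installed.
import gradio as gr
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

def chat(message, history):
    # Wrap the user message in TinyLlama's chat template, generate a reply,
    # and strip the prompt tokens from the decoded output.
    messages = [{"role": "user", "content": message}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
    return reply.strip()

gr.ChatInterface(chat).launch()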

Chatbot Interface

Below are two images illustrating the chatbot interface on a mobile device.


Demo

Chatbot.Demo.Video.mp4
