# LLM-Chatbot

This repository contains the implementation of a Large Language Model (LLM) chatbot powered by TinyLlama and optimized with Intel® OpenVINO™ to enhance performance on Intel AI laptops. The project minimizes dependency on GPUs and runs efficiently on CPUs, ensuring smooth and responsive interactions.

## Objective

This project leverages Intel® OpenVINO™ to optimize and execute GenAI and LLM inference on the CPUs of Intel AI laptops, minimizing reliance on GPUs and enabling efficient, high-performance AI deployment in consumer-grade environments. By optimizing LLMs with OpenVINO™, we aim to enhance the performance and accessibility of AI applications. Specifically, we have developed a text-generation chatbot using `TinyLlama/TinyLlama-1.1B-Chat-v1.0` to showcase these capabilities.

## Running locally

1. Clone the repository.

   ```shell
   git clone https://github.com/Rahul-Biju-03/Technix.git
   ```

2. Move into the project directory.

   ```shell
   cd Technix
   ```

3. Install the required libraries from `requirements.txt`.

   ```shell
   pip install -r requirements.txt
   ```

4. (Optional) Run the project in a virtual environment.

   - Download and install virtualenv:

     ```shell
     pip install virtualenv
     ```

   - Create the virtual environment with your Python 3 interpreter:

     ```shell
     virtualenv -p path\to\your\python.exe test_env
     ```

   - Activate the environment.

     On Windows:

     ```shell
     test_env\Scripts\Activate
     ```

     On Unix:

     ```shell
     source test_env/bin/activate
     ```

5. Convert and quantize the TinyLlama model with OpenVINO.

   - This script converts the TinyLlama model from its original format to ONNX and then quantizes it with OpenVINO for optimized performance.

   ```shell
   python Conversion_and_Optimisation.py
   ```
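For readers curious what the conversion step involves, here is a minimal sketch of such a pipeline using the `optimum-intel` integration. This is an illustration, not the repository's actual `Conversion_and_Optimisation.py`: recent `optimum-intel` versions export PyTorch checkpoints directly to OpenVINO IR (rather than going through a separate ONNX file), and the output directory names below are assumptions.

```python
# Hypothetical sketch of a TinyLlama conversion + 8-bit weight-compression
# pipeline with optimum-intel. Directory names are illustrative assumptions.
from pathlib import Path

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

def output_dirs(base="models"):
    """Return (fp32_dir, int8_dir) paths where the exported models are saved."""
    base = Path(base)
    return base / "tinyllama-ov-fp32", base / "tinyllama-ov-int8"

def convert_and_quantize():
    # Heavy imports stay inside the function so importing the module is cheap.
    from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
    from transformers import AutoTokenizer

    fp32_dir, int8_dir = output_dirs()
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    # export=True converts the original checkpoint to OpenVINO IR on the fly.
    model = OVModelForCausalLM.from_pretrained(MODEL_ID, export=True)
    model.save_pretrained(fp32_dir)
    tokenizer.save_pretrained(fp32_dir)

    # 8-bit weight compression yields a smaller, faster model for CPU inference.
    quantized = OVModelForCausalLM.from_pretrained(
        MODEL_ID,
        export=True,
        quantization_config=OVWeightQuantizationConfig(bits=8),
    )
    quantized.save_pretrained(int8_dir)
    tokenizer.save_pretrained(int8_dir)

if __name__ == "__main__":
    convert_and_quantize()
```

Keeping the full-precision and quantized models in separate directories makes the benchmarking step straightforward, since each can be loaded independently.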

6. Benchmark the original and quantized TinyLlama models with OpenVINO.

   - This script benchmarks the performance and memory usage of the original TinyLlama model against the quantized version, including model size calculations and inference time measurements.

   ```shell
   python CPU_INFERENCE.py
   ```
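A benchmark of this kind typically measures two things: disk footprint and average generation latency. The sketch below illustrates the idea; it is not the repository's `CPU_INFERENCE.py`, and the model directory names are assumptions carried over from the conversion step above.

```python
# Hypothetical sketch of a size + latency comparison between a full-precision
# and a quantized model. Directory names are illustrative assumptions.
import time
from pathlib import Path

def dir_size_mb(path):
    """Total size of all files under `path`, in megabytes."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file()) / 1e6

def mean_latency(fn, runs=5):
    """Average wall-clock duration of `fn()` over `runs` calls, in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

def benchmark(model_dir, prompt="What is OpenVINO?"):
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = OVModelForCausalLM.from_pretrained(model_dir)
    inputs = tokenizer(prompt, return_tensors="pt")
    latency = mean_latency(lambda: model.generate(**inputs, max_new_tokens=32))
    print(f"{model_dir}: {dir_size_mb(model_dir):.1f} MB, {latency:.2f} s/run")

if __name__ == "__main__":
    for d in ("models/tinyllama-ov-fp32", "models/tinyllama-ov-int8"):
        benchmark(d)
```

Averaging over several runs smooths out first-call overheads such as model warm-up and caching, which would otherwise skew a single-shot measurement.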

7. Launch the TinyLlama chatbot with a Gradio interface.

   - This script sets up a TinyLlama chatbot with a Gradio interface, including preprocessing and postprocessing functions for improved text handling.

   ```shell
   python Chatbot.py
   ```
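The structure of such a script might look like the sketch below: small pre- and post-processing helpers wrapped around the quantized model, served through Gradio's `ChatInterface`. This is an illustration rather than the repository's actual `Chatbot.py`; the model directory and helper names are assumptions.

```python
# Hypothetical sketch of a Gradio chatbot around the quantized TinyLlama
# model. The model path and helper names are illustrative assumptions.
def preprocess(message: str) -> str:
    """Collapse runs of whitespace before building the prompt."""
    return " ".join(message.split())

def postprocess(text: str, prompt: str) -> str:
    """Strip the echoed prompt and surrounding whitespace from the generation."""
    return text[len(prompt):].strip() if text.startswith(prompt) else text.strip()

def main():
    import gradio as gr
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    model_dir = "models/tinyllama-ov-int8"  # assumed output of the conversion step
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = OVModelForCausalLM.from_pretrained(model_dir)

    def reply(message, history):
        prompt = preprocess(message)
        inputs = tokenizer(prompt, return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=128)
        text = tokenizer.decode(output[0], skip_special_tokens=True)
        return postprocess(text, prompt)

    gr.ChatInterface(reply, title="TinyLlama Chatbot").launch()

if __name__ == "__main__":
    main()
```

Stripping the echoed prompt in `postprocess` matters for decoder-only models like TinyLlama, whose `generate` output includes the input tokens at the start of the sequence.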

## Chatbot Interface

Below are two images illustrating the chatbot interface on a mobile device.


## Demo

A demo video is included in the repository: `Chatbot.Demo.Video.mp4`