
LLM-Chatbot

This repository contains the implementation of a Large Language Model (LLM) chatbot powered by TinyLlama and optimized with Intel® OpenVINO™ to enhance performance on Intel AI laptops. The project aims to minimize dependence on GPUs and to run efficiently on CPUs, ensuring smooth and responsive interactions.

Objective

This project leverages Intel® OpenVINO™ to optimize and execute GenAI and LLM inference on the CPUs of Intel AI laptops, minimizing reliance on GPUs and enabling efficient, high-performance AI deployment in consumer-grade environments. By optimizing LLMs with OpenVINO™, we aim to enhance the performance and accessibility of AI applications. Specifically, we developed a text-generation chatbot using TinyLlama/TinyLlama-1.1B-Chat-v1.0 to showcase these capabilities.

Running locally

1. Clone the repository.

git clone https://github.com/Nandan-03/LLM-Chatbot.git

2. Move into the project directory.

cd LLM-Chatbot

3. Install the required libraries from the requirements.txt file.

pip install -r requirements.txt

4. (Optional) Run the project in a virtual environment.

  • Download and install virtualenv.
pip install virtualenv
  • Create the virtual environment with Python 3.
virtualenv -p path\to\your\python.exe test_env
  • Activate the test environment.

For Windows:

test_env\Scripts\Activate

For Unix:

source test_env/bin/activate

5. Converting and Quantizing the TinyLlama Model with OpenVINO

  • This script converts the TinyLlama model from its original format to ONNX and then quantizes it with OpenVINO for optimized performance.
python Conversion_and_Optimisation.py
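
For reference, here is a minimal sketch of this step using Hugging Face Optimum Intel. The model ID is the one used by this project, but the output directory name and the 8-bit weight-quantization setting are assumptions, and Optimum Intel exports straight to OpenVINO IR rather than stopping at ONNX, so treat this as an illustration of the flow rather than the exact contents of Conversion_and_Optimisation.py.

# Sketch: export TinyLlama to OpenVINO IR with 8-bit weight quantization.
# Requires: pip install optimum[openvino]
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# export=True converts the original checkpoint to OpenVINO IR;
# load_in_8bit=True quantizes the weights to 8-bit during export.
model = OVModelForCausalLM.from_pretrained(model_id, export=True, load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# "tinyllama_ov_int8" is an assumed output directory name.
model.save_pretrained("tinyllama_ov_int8")
tokenizer.save_pretrained("tinyllama_ov_int8")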

6. Benchmarking the Original and Quantized TinyLlama Models with OpenVINO

  • This script benchmarks the performance and memory usage of the original TinyLlama model against the quantized version using OpenVINO, including model size calculations and inference time measurements.
python CPU_INFERENCE.py
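
As a rough illustration of what such a benchmark measures, the sketch below compares per-prompt inference time and the quantized model's on-disk size. The directory name and prompt are assumptions, and CPU_INFERENCE.py may measure more (e.g. peak memory usage):

# Sketch: compare CPU inference latency of the original vs. quantized model.
import os
import time
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.intel import OVModelForCausalLM

def dir_size_mb(path):
    # Sum the sizes of all files under `path`, in megabytes.
    total = sum(os.path.getsize(os.path.join(root, f))
                for root, _, files in os.walk(path) for f in files)
    return total / (1024 ** 2)

def time_generation(model, tokenizer, prompt, max_new_tokens=64):
    # Time a single generation of up to `max_new_tokens` tokens.
    inputs = tokenizer(prompt, return_tensors="pt")
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=max_new_tokens)
    return time.perf_counter() - start

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
quantized_dir = "tinyllama_ov_int8"  # assumed output of the previous step

tokenizer = AutoTokenizer.from_pretrained(model_id)
original = AutoModelForCausalLM.from_pretrained(model_id)
quantized = OVModelForCausalLM.from_pretrained(quantized_dir)

prompt = "Explain what OpenVINO does in one sentence."
print(f"Original:  {time_generation(original, tokenizer, prompt):.2f} s")
print(f"Quantized: {time_generation(quantized, tokenizer, prompt):.2f} s")
print(f"Quantized model size: {dir_size_mb(quantized_dir):.1f} MB")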

7. TinyLlama Chatbot with Gradio Interface

  • This script sets up a TinyLlama chatbot with a Gradio interface, including preprocessing and postprocessing functions for improved text handling.
python Chatbot.py
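
The core of such a script can be sketched with gr.ChatInterface. The model directory is an assumption carried over from the conversion step, and the actual Chatbot.py adds its own pre- and post-processing on top of this:

# Sketch: serve the quantized TinyLlama behind a Gradio chat UI.
import gradio as gr
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_dir = "tinyllama_ov_int8"  # assumed output of the conversion step
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = OVModelForCausalLM.from_pretrained(model_dir)

def reply(message, history):
    # history arrives as (user, assistant) pairs from previous turns;
    # rebuild it in the format TinyLlama's chat template expects.
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=256,
                            do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:],
                            skip_special_tokens=True)

gr.ChatInterface(reply, title="TinyLlama Chatbot").launch()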

Chatbot Interface

Below are two images illustrating the chatbot interface on a mobile device.


Demo

Chatbot.Demo.Video.mp4
