
LLM-Chatbot

This repository contains the implementation of a Large Language Model (LLM) chatbot powered by TinyLlama and optimized with Intel® OpenVINO™ to enhance performance on Intel AI laptops. The project aims to minimize dependence on GPUs and to run efficiently on CPUs, ensuring smooth and responsive interactions.

Objective

This project leverages Intel® OpenVINO™ to optimize and execute GenAI and LLM inference on the CPUs of Intel AI laptops, minimizing reliance on GPUs and enabling efficient, high-performance AI deployment in consumer-grade environments. By optimizing LLMs with OpenVINO™, we aim to enhance the performance and accessibility of AI applications. Specifically, we developed a text-generation chatbot using TinyLlama/TinyLlama-1.1B-Chat-v1.0 to showcase these capabilities.

Running locally

1. Clone the repository.

git clone https://github.com/Nandan-03/LLM-Chatbot.git

2. Move into the project directory.

cd LLM-Chatbot

3. Install the required libraries from the requirements.txt file.

pip install -r requirements.txt

4. (Optional) Run the project in a virtual environment.

  • Download and install virtualenv.
pip install virtualenv
  • Create the virtual environment with Python 3.
virtualenv -p path\to\your\python.exe test_env
  • Activate the test environment.

For Windows:

test_env\Scripts\Activate

For Unix:

source test_env/bin/activate

5. Converting and Quantizing the TinyLlama Model with OpenVINO

  • This script converts the TinyLlama model from its original format to ONNX and then quantizes it with OpenVINO for optimized performance.
python Conversion_and_Optimisation.py
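
For reference, here is a minimal sketch of this step using Hugging Face Optimum Intel. The model ID is the one used by this project, but the output directory name and the 8-bit weight-quantization setting are assumptions, and Optimum Intel exports straight to OpenVINO IR rather than stopping at ONNX, so treat this as an illustration of the flow rather than the exact contents of Conversion_and_Optimisation.py.

# Sketch: export TinyLlama to OpenVINO IR with 8-bit weight quantization.
# Requires: pip install optimum[openvino]
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# export=True converts the original checkpoint to OpenVINO IR;
# load_in_8bit=True quantizes the weights to 8-bit during export.
model = OVModelForCausalLM.from_pretrained(model_id, export=True, load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# "tinyllama_ov_int8" is an assumed output directory name.
model.save_pretrained("tinyllama_ov_int8")
tokenizer.save_pretrained("tinyllama_ov_int8")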

6. Benchmarking the Original and Quantized TinyLlama Models with OpenVINO

  • This script benchmarks the performance and memory usage of the original TinyLlama model against the quantized version using OpenVINO, including model size calculations and inference time measurements.
python CPU_INFERENCE.py
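
As a rough illustration of what such a benchmark measures, the sketch below compares per-prompt inference time and the quantized model's on-disk size. The directory name and prompt are assumptions, and CPU_INFERENCE.py may measure more (e.g. peak memory usage):

# Sketch: compare CPU inference latency of the original vs. quantized model.
import os
import time
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.intel import OVModelForCausalLM

def dir_size_mb(path):
    # Sum the sizes of all files under `path`, in megabytes.
    total = sum(os.path.getsize(os.path.join(root, f))
                for root, _, files in os.walk(path) for f in files)
    return total / (1024 ** 2)

def time_generation(model, tokenizer, prompt, max_new_tokens=64):
    # Time a single generation of up to `max_new_tokens` tokens.
    inputs = tokenizer(prompt, return_tensors="pt")
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=max_new_tokens)
    return time.perf_counter() - start

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
quantized_dir = "tinyllama_ov_int8"  # assumed output of the previous step

tokenizer = AutoTokenizer.from_pretrained(model_id)
original = AutoModelForCausalLM.from_pretrained(model_id)
quantized = OVModelForCausalLM.from_pretrained(quantized_dir)

prompt = "Explain what OpenVINO does in one sentence."
print(f"Original:  {time_generation(original, tokenizer, prompt):.2f} s")
print(f"Quantized: {time_generation(quantized, tokenizer, prompt):.2f} s")
print(f"Quantized model size: {dir_size_mb(quantized_dir):.1f} MB")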

7. TinyLlama Chatbot with Gradio Interface

  • This script sets up a TinyLlama chatbot with a Gradio interface, including preprocessing and postprocessing functions for improved text handling.
python Chatbot.py
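
The core of such a script can be sketched with gr.ChatInterface. The model directory is an assumption carried over from the conversion step, and the actual Chatbot.py adds its own pre- and post-processing on top of this:

# Sketch: serve the quantized TinyLlama behind a Gradio chat UI.
import gradio as gr
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_dir = "tinyllama_ov_int8"  # assumed output of the conversion step
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = OVModelForCausalLM.from_pretrained(model_dir)

def reply(message, history):
    # history arrives as (user, assistant) pairs from previous turns;
    # rebuild it in the format TinyLlama's chat template expects.
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=256,
                            do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:],
                            skip_special_tokens=True)

gr.ChatInterface(reply, title="TinyLlama Chatbot").launch()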

Chatbot Interface

Below are two images illustrating the chatbot interface on a mobile device.


Demo

Chatbot.Demo.Video.mp4
