
Technix

Problem Statement

Our problem statement is “Running GenAI on Intel AI Laptops and Simple LLM Inference on CPU and fine-tuning of LLM Models using Intel® OpenVINO™.” The challenge lies in efficiently running Generative AI applications and performing LLM inference on Intel AI Laptops and CPUs while maintaining high performance without specialized hardware. Additionally, fine-tuning LLM models with Intel® OpenVINO™ for real-time applications requires addressing computational efficiency and resource constraints.

Objective

This project leverages Intel® OpenVINO™ to optimize and execute GenAI and LLM inference on Intel AI Laptops' CPUs, minimizing the reliance on GPUs and enabling efficient, high-performance AI deployment in consumer-grade environments. By fine-tuning LLM models with OpenVINO™, we aim to enhance the performance and accessibility of AI applications. Specifically, we have developed a text generation chatbot using TinyLlama/TinyLlama-1.1B-Chat-v1.0 to showcase these capabilities.

Team Members and Contribution

  • Rahul Biju (Team Leader): CPU Inference
  • Nandakrishnan A: Model Optimization and Quantization
  • Nandana S Nair: Project Report
  • Krishna Sagar P: Project Report
  • Rahul Zachariah: User Interface Implementation

Running locally

1. Clone the repository.

git clone https://github.com/Rahul-Biju-03/Technix.git

2. Move into the project directory.

cd Technix

3. Install the required libraries listed in requirements.txt.

pip install -r requirements.txt

4. (Optional) Run it in a virtual environment.

  • Download and install virtualenv.

pip install virtualenv

  • Create the virtual environment in Python 3.

virtualenv -p path\to\your\python.exe test_env

  • Activate the test environment.

For Windows:

test_env\Scripts\Activate

For Unix:

source test_env/bin/activate

5. Converting and Quantizing the TinyLlama Model with OpenVINO

  • This script converts the TinyLlama model from its original format to ONNX and then quantizes it with OpenVINO for optimized performance.
python Conversion_and_Optimisation.py
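
For reference, a minimal Python sketch of the same flow (export to ONNX, convert to OpenVINO IR, compress weights) is shown below. It is an illustration under assumptions, not the repository's exact script: it assumes optimum, openvino, and nncf are installed, and the output paths are placeholders.

# Illustrative sketch only; file and directory names are placeholders.
from optimum.onnxruntime import ORTModelForCausalLM
import openvino as ov
import nncf

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# 1. Export the PyTorch checkpoint to ONNX via Optimum.
onnx_model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
onnx_model.save_pretrained("tinyllama_onnx")

# 2. Read the exported ONNX graph into an OpenVINO model.
ov_model = ov.convert_model("tinyllama_onnx/model.onnx")

# 3. Compress the weights to INT8 with NNCF and save the IR.
ov_model = nncf.compress_weights(ov_model)
ov.save_model(ov_model, "tinyllama_int8/openvino_model.xml")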

6. Benchmarking the Original and Quantized TinyLlama Models with OpenVINO

  • This script benchmarks the performance and memory usage of the original TinyLlama model against the quantized version using OpenVINO, including model size calculations and inference time measurements.
python CPU_INFERENCE.py
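
For reference, a minimal sketch of the kind of comparison such a benchmark makes (wall-clock generation time of the original PyTorch model versus an OpenVINO build) is shown below; it assumes transformers and optimum-intel are installed, and the prompt and token count are placeholders, not the script's actual values.

# Illustrative sketch only; not the repository's CPU_INFERENCE.py.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer("What is OpenVINO?", return_tensors="pt")

def timed_generate(model):
    # Time a single greedy generation pass on the CPU.
    start = time.perf_counter()
    output = model.generate(**prompt, max_new_tokens=64)
    return time.perf_counter() - start, output

pytorch_model = AutoModelForCausalLM.from_pretrained(model_id)
ov_model = OVModelForCausalLM.from_pretrained(model_id, export=True)

for name, mdl in [("PyTorch", pytorch_model), ("OpenVINO", ov_model)]:
    seconds, output = timed_generate(mdl)
    print(f"{name}: {seconds:.2f} s for {output.shape[-1]} total tokens")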

7. TinyLlama Chatbot with Gradio Interface

  • This script sets up a TinyLlama chatbot with a Gradio interface, including preprocessing and postprocessing functions for improved text handling.
python Chatbot.py
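
For reference, a minimal sketch of a Gradio chat front end over the OpenVINO model is shown below; it is an illustration under assumptions, not the repository's Chatbot.py (the helper function, generation settings, and pre/post-processing here are placeholders).

# Illustrative sketch only; assumes gradio, transformers, and optimum-intel are installed.
import gradio as gr
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

def chat(message, history):
    # Wrap the user message in TinyLlama's chat template, generate a reply,
    # and strip the prompt tokens from the decoded output.
    messages = [{"role": "user", "content": message}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
    return reply.strip()

gr.ChatInterface(chat).launch()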

Chatbot Interface

Below are two images illustrating the chatbot interface on a mobile device.


Demo

Chatbot.Demo.Video.mp4
