
Curated-List-of-Generative-AI-Tools

This repo contains a curated list of tools for generative AI.

Tools

  • LitGPT

Pretrain, finetune, evaluate, and deploy 20+ LLMs on your own data.

LitGPT is a command-line tool designed to easily finetune, pretrain, evaluate, and deploy 20+ LLMs on your own data. It features highly-optimized training recipes for the world's most powerful open-source large language models (LLMs).


⚡ LitGPT is a hackable implementation of state-of-the-art open-source large language models released under the Apache 2.0 license.

https://github.com/Lightning-AI/litgpt

https://www.youtube.com/watch?v=PDuzbj5MhoQ&t=485s&ab_channel=FahdMirza

Zero to LitGPT: Getting Started with Pretraining, Finetuning, and Using LLMs

https://github.com/Lightning-AI/litgpt/blob/main/tutorials/0_to_litgpt.md
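For a sense of the workflow, here is a minimal sketch of LitGPT's high-level Python API (available in recent litgpt releases; treat the exact method names as version-dependent):

```python
# Hedged sketch of LitGPT's high-level Python API (present in recent litgpt
# releases; the `litgpt finetune|pretrain|serve` CLI covers the other workflows).
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")  # downloads the checkpoint on first use
print(llm.generate("What do Llamas eat?", max_new_tokens=50))
```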

  • OpenAGI

    Making the development of autonomous human-like agents accessible to all.

    OpenAGI aims to make human-like agents accessible to everyone, paving the way towards open agents and, eventually, AGI for everyone.

    https://github.com/aiplanethub/openagi/

    https://openagi.aiplanet.com/

  • PyRIT

    Python Risk Identification Tool for generative AI (PyRIT)

    It is an open access automation framework to empower security professionals and machine learning engineers to proactively find risks in their generative AI systems.

    https://github.com/Azure/PyRIT

  • LLM OS

Andrej Karpathy's concept sketch of an operating system built around an LLM. Specs:

  • LLM: OpenAI GPT-4 Turbo 256 core (batch size) processor @ 20Hz (tok/s)

  • RAM: 128Ktok

  • Filesystem: Ada002


https://twitter.com/karpathy/status/1723140519554105733

  • LangChain Templates

    A collection of easily deployable reference architectures for a wide variety of tasks.

    https://python.langchain.com/docs/templates

  • LangServe

    A library for deploying LangChain chains as a REST API.

    https://www.langchain.com/langserve

  • LangSmith

    A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework, and seamlessly integrates with LangChain.

    https://www.langchain.com/langsmith

  • LangChain Expression Language (LCEL)

    A declarative way to compose chains. LCEL was designed from day one to support putting prototypes into production with no code changes, from the simplest "prompt + LLM" chain to the most complex chains (people have successfully run LCEL chains with hundreds of steps in production).

    https://python.langchain.com/docs/expression_language/cookbook
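As an illustration, a minimal LCEL chain might look like the following (assuming the langchain-openai package is installed and OPENAI_API_KEY is set):

```python
# A minimal LCEL chain: prompt, model, and parser composed with the "|" operator.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Tell me a short joke about {topic}")
model = ChatOpenAI(model="gpt-3.5-turbo")
chain = prompt | model | StrOutputParser()

print(chain.invoke({"topic": "bears"}))
```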

  • setfit

    Efficient few-shot learning with Sentence Transformers

    https://github.com/huggingface/setfit
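A hedged sketch of few-shot training with SetFit (this uses the pre-1.0 SetFitTrainer API, which setfit 1.x renamed to Trainer; the sst2 subset and column mapping are illustrative):

```python
# Few-shot text classification with SetFit: train on 16 labeled examples.
from datasets import load_dataset
from setfit import SetFitModel, SetFitTrainer

train_ds = load_dataset("sst2", split="train[:16]")  # tiny few-shot subset
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    column_mapping={"sentence": "text", "label": "label"},
)
trainer.train()
print(model.predict(["a gripping, well-acted film", "a tedious mess"]))
```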

  • MLflow

    Build better models and generative AI apps on a unified, end-to-end, open-source MLOps platform. MLflow is an open-source framework for tracking ML experiments, packaging ML code for training pipelines, and capturing models logged from experiments. It enables data scientists to iterate quickly during model development while keeping experiments and training pipelines reproducible.

https://github.com/mlflow/mlflow

https://mlflow.org/
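A minimal experiment-tracking sketch with MLflow (run `mlflow ui` afterwards to browse the logged run):

```python
# Log a parameter and a metric series to an MLflow run.
import mlflow

with mlflow.start_run(run_name="demo"):
    mlflow.log_param("learning_rate", 1e-3)
    for step, loss in enumerate([0.9, 0.5, 0.3]):
        mlflow.log_metric("loss", loss, step=step)
```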

  • BentoML

    BentoML is a framework for building reliable, scalable, and cost-efficient AI applications. It comes with everything you need for model serving, application packaging, and production deployment, and focuses on ML in production.

    By design, BentoML is agnostic to the experimentation platform and the model development environment. It is best suited to managing your "finalized" models: the sets of models that yield the best outcomes from your periodic training pipelines and are meant to run in production.

    BentoML integrates natively with MLflow. Users can port models logged with MLflow Tracking into BentoML for high-performance model serving, and can combine MLflow projects and pipelines with BentoML's model deployment workflow.

    https://github.com/bentoml/BentoML

    https://bentoml.com/
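A hedged sketch of a BentoML service using the 1.2-style decorator API (earlier 1.x releases used bentoml.Service with runners instead):

```python
# Minimal BentoML service sketch (1.2-style decorator API, an assumption
# about your installed version).
import bentoml


@bentoml.service
class Echo:
    @bentoml.api
    def echo(self, text: str) -> str:
        return text

# Serve locally with: bentoml serve service:Echo
```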

  • agency-swarm

    An open-source agent orchestration framework built on top of the OpenAI Assistants API.

    https://github.com/VRSEN/agency-swarm

  • moondream

    A tiny vision language model that kicks ass and runs anywhere.

    https://github.com/vikhyat/moondream

  • TaskingAI

    TaskingAI is an open-source framework for deploying LLM applications.

    https://github.com/TaskingAI/TaskingAI

  • The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable. Kubeflow is the cloud-native platform for machine learning operations: pipelines, training, and deployment.

    https://github.com/kubeflow/kubeflow

    https://www.kubeflow.org/

  • The Triton Inference Server provides an optimized cloud and edge inferencing solution. Triton is open-source inference-serving software that streamlines AI inferencing, enabling teams to deploy any AI model from multiple deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more.

    Triton supports inference across cloud, data center, edge, and embedded devices on NVIDIA GPUs, x86 and ARM CPUs, and AWS Inferentia, and delivers optimized performance for many query types, including real-time, batched, ensembles, and audio/video streaming. Triton Inference Server is part of NVIDIA AI Enterprise, a software platform that accelerates the data science pipeline and streamlines the development and deployment of production AI.

    https://github.com/triton-inference-server/server

    https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

  • PyTriton

    PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.

    https://github.com/triton-inference-server/pytriton/

    https://resources.nvidia.com/en-us-ai-inference-large-language-models/

  • Flowise AI

Open-source visual UI tool to build customized LLM orchestration flows and AI agents.

https://flowiseai.com/

https://github.com/FlowiseAI/Flowise

  • BitNet

    Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch.

    https://github.com/kyegomez/BitNet

  • Ray is an open-source unified compute framework that makes it easy to scale AI and Python workloads — from reinforcement learning to deep learning to tuning, and model serving.

    https://github.com/ray-project/ray

    https://www.ray.io/

  • Llama Coder

    Llama Coder replaces Copilot with a more powerful, locally running AI.

    https://github.com/ex3ndr/llama-coder

  • Code Llama

    Code Llama is a family of large language models for code, based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks. Multiple flavors cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct), with 7B, 13B, and 34B parameters each.

    All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. The 7B and 13B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama was developed by fine-tuning Llama 2 using a higher sampling of code and, as with Llama 2, considerable safety mitigations were applied to the fine-tuned versions. For detailed information on model training, architecture and parameters, evaluations, and responsible AI and safety, refer to the research paper. Output generated by code generation features of the Llama Materials, including Code Llama, may be subject to third-party licenses, including, without limitation, open-source licenses.

    https://github.com/facebookresearch/codellama

  • Tabby

    A self-hosted AI coding assistant.

    https://github.com/TabbyML/tabby

  • LlamaIndex

    LlamaIndex is a data framework for LLM-based applications to ingest, structure, and access private or domain-specific data. It's available in Python and TypeScript.

    https://github.com/jerryjliu/llama_index

    https://docs.llamaindex.ai/en/stable/

    https://llamahub.ai/

    https://github.com/run-llama/llama-lab

  • SWE-Agent

    SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.29% of bugs in the SWE-bench evaluation set and takes just 1.5 minutes to run.


https://github.com/princeton-nlp/SWE-agent

  • RAGFlow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLMs (Large Language Models) with truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.

![image](https://github.com/ParthaPRay/Curated-List-of-Generative-AI-Tools/assets/1689639/48642478-45b2-4913-a7de-020583419f0a)


 ![image](https://github.com/ParthaPRay/Curated-List-of-Generative-AI-Tools/assets/1689639/6b1c533d-4700-431a-a9ed-0abb6e90af0a)

 ![image](https://github.com/ParthaPRay/Curated-List-of-Generative-AI-Tools/assets/1689639/0d358ab1-8694-49d2-af0c-3eab0358e344)



https://github.com/infiniflow/ragflow?tab=readme-ov-file

  • Open Interpreter

A natural-language interface for computers.

Open Interpreter lets LLMs run code (Python, JavaScript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter after installing.

This provides a natural-language interface to your computer's general-purpose capabilities:

  • Create and edit photos, videos, PDFs, etc.
  • Control a Chrome browser to perform research
  • Plot, clean, and analyze large datasets

https://github.com/OpenInterpreter/open-interpreter

  • AutoCodeRover

    AutoCodeRover is a fully automated approach for resolving GitHub issues (bug fixing and feature addition) where LLMs are combined with analysis and debugging capabilities to prioritize patch locations ultimately leading to a patch.


    https://github.com/nus-apr/auto-code-rover

  • MetaVoice-1B

    MetaVoice-1B is a 1.2B-parameter base model trained on 100K hours of speech for TTS (text-to-speech). It has been built with the following priorities:

    • Emotional speech rhythm and tone in English, with no hallucinations
    • Zero-shot cloning for American and British voices from 30s of reference audio
    • Support for (cross-lingual) voice cloning with fine-tuning; as little as 1 minute of training data has worked for Indian speakers
    • Support for long-form synthesis

    https://github.com/metavoiceio/metavoice-src

  • ChainLit

    Chainlit is an open-source async Python framework which allows developers to build scalable Conversational AI or agentic applications.

    https://github.com/Chainlit/chainlit

    https://docs.chainlit.io/
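A minimal Chainlit app sketch (save as app.py and start it with `chainlit run app.py`):

```python
# Echo every user message back through the chat UI.
import chainlit as cl


@cl.on_message
async def main(message: cl.Message):
    await cl.Message(content=f"You said: {message.content}").send()
```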

  • LightLLM

    LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

    https://github.com/ModelTC/lightllm

  • LiteLLM

    An open source library to simplify LLM completion + embedding calls

    https://litellm.ai/

    https://github.com/BerriAI/litellm
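A minimal LiteLLM sketch: one completion() call that works across providers (assuming OPENAI_API_KEY is set; swapping the model string switches providers):

```python
from litellm import completion

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, world"}],
)
print(response.choices[0].message.content)
```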

  • FastChat

    An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

    https://github.com/lm-sys/FastChat

  • seemore

    From scratch implementation of a vision language model in pure PyTorch

    HuggingFace Community Blog that walks through this: https://huggingface.co/blog/AviSoori1x/seemore-vision-language-model

    In this simple implementation of a vision language model (VLM), there are 3 main components.

    • Image Encoder to extract visual features from images. In this case I use a from-scratch implementation of the original vision transformer used in CLIP. This is a popular choice in many modern VLMs; one notable exception is the Fuyu series of models from Adept, which passes patchified images directly to the projection layer.

    • Vision-Language Projector - Image embeddings are not of the same shape as the text embeddings used by the decoder, so we need to 'project', i.e., change the dimensionality of, the image features extracted by the image encoder to match what's observed in the text embedding space. Image features thus become 'visual tokens' for the decoder. This could be a single layer or an MLP; I've used an MLP because it's worth showing.

    • A decoder-only language model. This is the component that ultimately generates text. In my implementation I've deviated a bit from what you see in LLaVA etc. by incorporating the projection module into my decoder. Typically this is not done, and the architecture of the decoder (usually an already pretrained model) is left untouched.

    https://github.com/AviSoori1x/seemore

    The scaled dot product self attention implementation is borrowed from Andrej Karpathy's makemore (https://github.com/karpathy/makemore). Also, the decoder is an autoregressive character-level language model, just like in makemore. Now you see where the name 'seemore' came from :)

  • OnnxStream

    Lightweight inference library for ONNX files, written in C++. It can run SDXL on a Raspberry Pi Zero 2, but also Mistral 7B on desktops and servers.

    https://github.com/vitoplantamura/OnnxStream

  • PEFT

    Fine-tuning large pretrained models is often prohibitively costly due to their scale. Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of large pretrained models to various downstream applications by only fine-tuning a small number of (extra) model parameters instead of all the model's parameters. This significantly decreases the computational and storage costs. Recent state-of-the-art PEFT techniques achieve performance comparable to fully fine-tuned models.

    https://github.com/huggingface/peft

    https://huggingface.co/docs/peft
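A minimal LoRA sketch with PEFT (gpt2 and the target module name are illustrative choices):

```python
# Wrap a pretrained model so that only the small low-rank adapter
# matrices are trainable.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], lora_dropout=0.05)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a fraction of a percent is trainable
```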

  • SEC Insights

    Empower your organization's business intelligence with SEC Insights, a real-world full-stack application built on LlamaIndex.

    https://github.com/run-llama/sec-insights

    https://www.secinsights.ai/

  • AutoTrain Advanced

    AutoTrain Advanced: faster and easier training and deployment of state-of-the-art machine learning models. AutoTrain Advanced is a no-code solution that allows you to train machine learning models in just a few clicks. Note that you must upload data in the correct format for a project to be created; for help with data formats and pricing, check out the documentation.

    https://github.com/huggingface/autotrain-advanced

  • Ludwig

    Ludwig is a low-code framework for building custom AI models like LLMs and other deep neural networks.

    https://github.com/ludwig-ai/ludwig

    http://ludwig.ai/

  • Genmo AI

    Free animation video maker

    https://www.genmo.ai/

  • Kaiber AI

    Discover the artist within you. Turn text, videos, photos, and music into stunning videos with our advanced AI generation engine.

    https://kaiber.ai/

  • VectorShift

    The no-code AI automations platform. An integrated framework of no-code, low-code, and out-of-the-box generative AI solutions for building AI search engines, assistants, chatbots, and automations.

    https://vectorshift.ai/

  • AutoQuant

It allows you to quantize your models in five different formats:

  • GGUF: perfect for inference on CPUs (and LM Studio)
  • GPTQ/EXL2: fast inference on GPUs
  • AWQ: super fast inference on GPUs with vLLM (https://github.com/vllm-project/vllm)
  • HQQ: extreme quantization with decent 2-bit and 3-bit models

https://github.com/qwopqwop200/AutoQuant

https://colab.research.google.com/drive/1b6nqC7UZVt8bx4MksX7s656GXPM-eWw4?usp=sharing

https://colab.research.google.com/drive/1P646NEg33BZy4BfLDNpTz0V0lwIU3CHu

  • Krea AI

    Real-Time AI Art Generation

    1: Text to Image, 2: Image to Image, 3: Upscaling, 4: AI Patterns, 5: Logo Illusion

    https://www.krea.ai/

  • PixVerse AI

    Create breathtaking videos with AI. Transform your ideas into stunning visuals with our powerful video creation platform.

    https://pixverse.ai/

  • mamba - state space model

    Mamba is a new state space model architecture showing promising performance on information-dense data such as language modeling, where previous subquadratic models fall short of Transformers. It is based on the line of progress on structured state space models, with an efficient hardware-aware design and implementation in the spirit of FlashAttention.

    https://github.com/state-spaces/mamba

  • Stable Cascade

    This is the official codebase for Stable Cascade. We provide training & inference scripts, as well as a variety of different models you can use.

    https://github.com/Stability-AI/StableCascade

  • OpenCodeInterpreter

    Integrating Code Generation with Execution and Refinement

    image

    https://opencodeinterpreter.github.io/

    https://huggingface.co/collections/m-a-p/opencodeinterpreter-65d312f6f88da990a64da456

  • TensorRT-LLM

    TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines. It also includes a backend for integration with the NVIDIA Triton Inference Server; a production-quality system to serve LLMs. Models built with TensorRT-LLM can be executed on a wide range of configurations going from a single GPU to multiple nodes with multiple GPUs (using Tensor Parallelism and/or Pipeline Parallelism).

    https://github.com/NVIDIA/TensorRT-LLM/

    https://nvidia.github.io/TensorRT-LLM/

    Older repo, not used now, for transformer-related optimization, including BERT and GPT: https://github.com/NVIDIA/FasterTransformer

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

    https://github.com/oobabooga/text-generation-webui

  • Portkey's AI Gateway

    It is the interface between your app and hosted LLMs. It streamlines API requests to OpenAI, Anthropic, Mistral, Llama 2, Anyscale, Google Gemini, and more with a unified API.

    A Blazing Fast AI Gateway. Route to 100+ LLMs with 1 fast & friendly API.

    https://github.com/portkey-ai/gateway

    https://portkey.ai/

  • Groq

    An extremely fast inference platform for LLMs, built on Groq's Language Processing Unit (LPU). It is intended for inference only, not for training or fine-tuning.

    https://groq.com/
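A hedged sketch of the Groq Python SDK, which mirrors the OpenAI chat interface (the model name is an assumption and rotates over time):

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment
chat = client.chat.completions.create(
    model="mixtral-8x7b-32768",  # illustrative; check Groq's current model list
    messages=[{"role": "user", "content": "Why does inference latency matter?"}],
)
print(chat.choices[0].message.content)
```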

  • llama-cpp-python

    Python bindings for llama.cpp. Simple Python bindings for @ggerganov's llama.cpp library. This package provides:

    • Low-level access to the C API via a ctypes interface
    • High-level Python API for text completion
    • OpenAI-like API and an OpenAI-compatible web server
    • LangChain and LlamaIndex compatibility
    • Local Copilot replacement
    • Function calling support
    • Vision API support
    • Multiple models

    https://github.com/abetlen/llama-cpp-python

    https://llama-cpp-python.readthedocs.io/en/latest/
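A minimal llama-cpp-python sketch (the GGUF path is a placeholder for a checkpoint you have downloaded):

```python
# Load a local GGUF model and complete a prompt.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf", n_ctx=2048)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```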

  • Gemma.cpp

    gemma.cpp is a lightweight, standalone C++ inference engine for the Gemma foundation models from Google.

For additional information about Gemma, see https://ai.google.dev/gemma. Model weights, including gemma.cpp-specific artifacts, are available on Kaggle: https://www.kaggle.com/models/google/gemma

https://github.com/google/gemma.cpp

  • Pandas-AI

PandasAI is a Python library that makes it easy to ask questions of your data (CSV, XLSX, PostgreSQL, MySQL, BigQuery, Databricks, Snowflake, etc.) in natural language. It helps you explore, clean, and analyze your data using generative AI.

https://docs.pandas-ai.com/en/latest/

https://github.com/Sinaptik-AI/pandas-ai
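A hedged PandasAI sketch using the SmartDataframe API from recent releases (the LLM wrapper import path has moved between versions):

```python
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

df = pd.DataFrame({"country": ["US", "UK", "FR"], "gdp_trillions": [21.4, 2.8, 2.7]})
sdf = SmartDataframe(df, config={"llm": OpenAI(api_token="sk-...")})  # key is a placeholder
print(sdf.chat("Which country has the highest GDP?"))
```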

  • Auto Data

Auto Data is a library designed for quick and effortless creation of datasets in JSON format, tailored for fine-tuning Large Language Models (LLMs).

It currently supports the ChatGPT API only.

https://github.com/Itachi-Uchiha581/Auto-Data

  • Cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

    cleanlab helps you clean data and labels by automatically detecting issues in a ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models.

    https://cleanlab.ai/

    https://github.com/cleanlab/cleanlab

  • LlamaHub

    Get your RAG application rolling in no time. Mix and match our Data Loaders and Agent Tools to build custom RAG apps or use our LlamaPacks as a starting point for your retrieval use cases.

    https://github.com/run-llama/llama_index

    https://llamahub.ai/

  • FlagEmbedding

    Retrieval and Retrieval-augmented LLMs.

    FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently:

    • Long-Context LLM: Activation Beacon
    • Fine-tuning of LM : LM-Cocktail
    • Dense Retrieval: BGE-M3, LLM Embedder, BGE Embedding
    • Reranker Model: BGE Reranker
    • Benchmark: C-MTEB

    https://github.com/FlagOpen/FlagEmbedding

    https://huggingface.co/BAAI/bge-base-en-v1.5
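A minimal FlagEmbedding sketch: embed a query and a passage with a BGE model and score them by inner product (BGE embeddings are normalized, so this acts as a cosine-style similarity):

```python
from FlagEmbedding import FlagModel

model = FlagModel("BAAI/bge-base-en-v1.5")
q = model.encode(["what is retrieval augmented generation?"])
p = model.encode(["RAG augments an LLM's context with retrieved documents."])
print(q @ p.T)  # similarity score
```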

  • AssemblyAI

    With a single API call, get access to AI models built on the latest AI breakthroughs to transcribe and understand audio and speech data securely at large scale.

    https://github.com/AssemblyAI/assemblyai-python-sdk

    CookBook:

    https://github.com/AssemblyAI/cookbook
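A minimal sketch of the AssemblyAI Python SDK (the API key and audio URL are placeholders):

```python
# Transcribe an audio file or URL with one call.
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"
transcript = aai.Transcriber().transcribe("https://example.com/audio.mp3")
print(transcript.text)
```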

  • quanto

    A PyTorch quantization toolkit.

    https://github.com/huggingface/quanto

  • pi-genai-stack

    Run 🦙 @ollama and 🐬 TinyDolphin, 🦙 TinyLlama and other small LLMs on a Raspberry Pi 5 with @docker #Compose

    The stack provides development environments to experiment with Ollama and 🦜🔗 LangChain without installing anything:

    • Python dev environment (available)
    • JavaScript dev environment (available)

    https://github.com/bots-garden/pi-genai-stack
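As a hedged illustration of the kind of experiment the stack enables, here is how a local Ollama server can be queried through LangChain's community integration (the model name is an assumption):

```python
from langchain_community.llms import Ollama

llm = Ollama(base_url="http://localhost:11434", model="tinyllama")
print(llm.invoke("Explain what a Raspberry Pi is in one sentence."))
```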

  • iter

    🔁 Code iteration tool running on Groq

    https://github.com/freuk/iter

    https://www.youtube.com/watch?v=m1qnOKXGSAk&t=10s&ab_channel=MervinPraison

  • outlines

    Outlines 〰 is a Python library that allows you to use Large Language Models in a simple and robust way (with structured generation). It is built by .txt and is already used in production by many companies.

    OpenAI is supported, but the true power of Outlines 〰 is unleashed with the open-source models available via the Transformers, llama.cpp, exllama2, and mamba_ssm libraries. If you want to build and maintain an integration with another library, get in touch.

    Structured Text Generation

    - Outlines 〰 is a library for neural text generation. You can think of it as a more flexible replacement for the generate method in the transformers library.

    - Outlines 〰 helps developers structure text generation to build robust interfaces with external systems. It provides generation methods that guarantee the output will match a regular expression or follow a JSON schema.

    - Outlines 〰 provides robust prompting primitives that separate prompting from execution logic, leading to simple implementations of few-shot generation, ReAct, meta-prompting, agents, etc.

    - Outlines 〰 is designed as a library that is compatible with the broader ecosystem, not a replacement for it. It uses as few abstractions as possible, and generation can be interleaved with control flow, conditionals, custom Python functions, and calls to other libraries.

    - Outlines 〰 is compatible with every autoregressive model. It interfaces with models only via the next-token logits.
    

    https://github.com/outlines-dev/outlines

    https://outlines-dev.github.io/outlines/
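A hedged Outlines sketch (0.x-era API): constrain generation so the output is guaranteed to match a regular expression:

```python
import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")
generator = outlines.generate.regex(model, r"(yes|no)")
print(generator("Is water wet? Answer yes or no:"))
```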

  • agentkit

Starter kit to build constrained agents with Next.js, FastAPI, and LangChain.

AgentKit is a LangChain-based starter kit developed by BCG X to build agent apps. Developers can use AgentKit to:

- Quickly experiment with constrained agent architectures via a beautiful UI
- Build a full-stack, chat-based agent app that can scale to a production-grade MVP

https://agentkit.infra.x.bcg.com/

https://github.com/BCG-X-Official/agentkit

  • OpenSora


    Open-Sora: Democratizing Efficient Video Production for All

    Open-Sora is an initiative dedicated to efficiently producing high-quality video and making the models, tools, and contents accessible to all. By embracing open-source principles, Open-Sora not only democratizes access to advanced video generation techniques, but also offers a streamlined and user-friendly platform that simplifies the complexities of video production. With Open-Sora, we aim to inspire innovation, creativity, and inclusivity in the realm of content creation.

    https://github.com/hpcaitech/Open-Sora

  • Dramatron

Dramatron uses existing, pre-trained large language models to generate long, coherent text and could be useful for authors for co-writing theatre scripts and screenplays. Dramatron uses hierarchical story generation for consistency across the generated text. Starting from a log line, Dramatron interactively generates character descriptions, plot points, location descriptions, and dialogue. These generations provide human authors with material for compilation, editing, and rewriting.

Dramatron is conceived as a writing tool and as a source of inspiration and exploration for writers. To evaluate Dramatron’s usability and capabilities, we engaged 15 playwrights and screenwriters in two-hour long user study sessions to co-write scripts alongside Dramatron.


One concrete illustration of how Dramatron can be utilised by creative communities is how one playwright staged 4 heavily edited and rewritten scripts co-written alongside Dramatron. In the public theatre show, Plays by Bots, a talented cast of experienced actors with improvisational skills gave meaning to Dramatron scripts through acting and interpretation.

Dramatron uses large language models to generate coherent scripts and screenplays.

https://colab.research.google.com/github/deepmind/dramatron/blob/main/colab/dramatron.ipynb

https://deepmind.github.io/dramatron

https://github.com/google-deepmind/dramatron


Router

  • OpenRouter

A unified interface for LLMs: select from more than 100 LLMs and route among them dynamically.

https://openrouter.ai/

LLM Evaluation


  • giskard

🐢 Evaluation & Testing framework for LLMs and ML models.

https://github.com/Giskard-AI/giskard

https://www.youtube.com/watch?v=ZPX3W77h_1E&ab_channel=Underfitted

RAG Evaluation

  • RAGAS

Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM's context. Existing tools and frameworks help you build these pipelines, but evaluating and quantifying pipeline performance can be hard; this is where Ragas (RAG Assessment) comes in. Ragas provides tools based on the latest research for evaluating LLM-generated text, giving you insights about your RAG pipeline, and can be integrated with your CI/CD for continuous performance checks.

https://github.com/explodinggradients/ragas
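A hedged Ragas sketch: score a tiny evaluation set on two standard metrics (the metrics themselves are LLM-judged, so an LLM API key is required; column names follow the Ragas docs):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

data = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["Paris is the capital and largest city of France."]],
})
print(evaluate(data, metrics=[faithfulness, answer_relevancy]))
```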

  • MIRAGE

MIRAGE (Medical Information Retrieval-Augmented Generation Evaluation) is a benchmark. This repository contains a comprehensive dataset and benchmark results aimed at evaluating Retrieval-Augmented Generation (RAG) systems for medical question answering (QA). The MedRAG toolkit is used to evaluate existing solutions for the various RAG components on MIRAGE.

https://github.com/Teddy-XiongGZ/MIRAGE

  • fastRAG

Efficient Retrieval Augmentation and Generation Framework. fastRAG is a research framework for efficient and optimized retrieval augmented generative pipelines, incorporating state-of-the-art LLMs and Information Retrieval. fastRAG is designed to empower researchers and developers with a comprehensive tool-set for advancing retrieval augmented generation.

https://github.com/IntelLabs/fastRAG

Graph-Related Tools for RAG and GraphRAG

  • graspologic

    Python package for graph statistics.

    A graph, or network, provides a mathematically intuitive representation of data with some sort of relationship between items. For example, a social network can be represented as a graph by considering all participants in the social network as nodes, with connections representing whether each pair of individuals in the network are friends with one another. Naively, one might apply traditional statistical techniques to a graph, which neglects the spatial arrangement of nodes within the network and is not utilizing all of the information present in the graph. In this package, we provide utilities and algorithms designed for the processing and analysis of graphs with specialized graph statistical algorithms.

    https://github.com/microsoft/graspologic

    https://microsoft.github.io/graspologic/latest/index.html

Web & Desktop Apps

  • BionicGPT

    BionicGPT is an on-premise replacement for ChatGPT, offering the advantages of generative AI while maintaining strict data confidentiality.

    https://github.com/bionic-gpt/bionic-gpt

  • Reor

    The hypothesis of the project is that AI tools for thought should run models locally by default. Reor stands on the shoulders of the giants Llama.cpp, Transformers.js, and LanceDB to enable both LLMs and embedding models to run locally. (Connecting to OpenAI-compatible APIs like Oobabooga is also supported.)

Connect your data sources, set up some data views (i.e. SQL scripts), configure a GPT Assistant, publish a Custom GPT in the ChatGPT store, and share it with your users, employees, or customers!

  • QAnything

    Question and Answer based on Anything.

    QAnything(Question and Answer based on Anything) is a local knowledge base question-answering system designed to support a wide range of file formats and databases, allowing for offline installation and use.

With QAnything, you can simply drop any locally stored file of any format and receive accurate, fast, and reliable answers.

Currently supported formats include: PDF (pdf), Word (docx), PPT (pptx), XLS (xlsx), Markdown (md), Email (eml), TXT (txt), Image (jpg, jpeg, png), CSV (csv), Web links (html), and more formats coming soon…

Architecture:


Use with: https://huggingface.co/netease-youdao/Qwen-7B-QAnything

https://github.com/netease-youdao/QAnything

Local LLM Running Tools

Reference: https://www.youtube.com/watch?v=MKnj-qsWNrw&ab_channel=FahdMirza


  • The open-source language model computer

    The 01 Project is building an open-source ecosystem for AI devices.

Our flagship operating system can power conversational devices like the Rabbit R1, Humane Pin, or Star Trek computer.

We intend to become the GNU/Linux of this space by staying open, modular, and free.

The 01 exposes a speech-to-speech websocket at localhost:10001.

If you stream raw audio bytes to / in LMC format, you will receive its response in the same format.

Inspired in part by Andrej Karpathy's LLM OS, we run a code-interpreting language model, and call it when certain events occur at your computer's kernel.

The 01 wraps this in a voice interface:


https://github.com/OpenInterpreter/01

https://youtu.be/YxiNUST6gU4?si=e_jvAbLL5N6QDrVU

Fine Tuning Tools

  • LLaMA Factory

    Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)

    https://github.com/hiyouga/LLaMA-Factory

  • unsloth

    Up to 5x faster QLoRA fine-tuning with 60% less memory. Fine-tune Mistral and Llama models 2-5x faster with up to 70% less memory!

    https://github.com/unslothai/unsloth

  • TRL

    TRL - Transformer Reinforcement Learning. Full stack transformer language models with reinforcement learning. trl is a full stack library where we provide a set of tools to train transformer language models and stable diffusion models with Reinforcement Learning, from the Supervised Fine-tuning step (SFT), Reward Modeling step (RM) to the Proximal Policy Optimization (PPO) step. The library is built on top of the transformers library by 🤗 Hugging Face. Therefore, pre-trained language models can be directly loaded via transformers. At this point, most of decoder architectures and encoder-decoder architectures are supported. Refer to the documentation or the examples/ folder for example code snippets and how to run these tools.


    https://github.com/huggingface/trl

    https://huggingface.co/docs/trl/main/en/index

    A starting point could be: https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py
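A minimal SFT sketch with TRL's SFTTrainer, in the spirit of the sft.py example above (exact constructor arguments are version-dependent):

```python
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train[:200]")
trainer = SFTTrainer(
    model="facebook/opt-350m",        # a string model name is accepted here
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
)
trainer.train()
```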

  • Axolotl

    Axolotl is a tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures.

    https://github.com/OpenAccess-AI-Collective/axolotl

  • AutoTrain Advanced

    AutoTrain Advanced: faster and easier training and deployments of state-of-the-art machine learning models.

    https://github.com/huggingface/autotrain-advanced

    https://huggingface.co/autotrain

External List of Tools on ML

This repository contains a curated list of awesome open-source libraries that will help you deploy, monitor, version, scale, and secure your production machine learning.

https://github.com/EthicalML/awesome-production-machine-learning

Open Source LLMs for live chat

Rise of AI


Tools

  • 🐶 Bark

🔊 Text-Prompted Generative Audio Model

https://colab.research.google.com/drive/1eJfA2XUa-mXwdMy7DoYKVYHI1iTd9Vkt?usp=sharing

https://github.com/suno-ai/bark

https://app.suno.ai/

Innovations to be Fueled by Generative AI

Generative AI is revolutionizing various sectors, offering a wide array of innovations and capabilities. Let's delve into each of these critical technologies:

  • Artificial General Intelligence (AGI): This refers to a machine's ability to understand, learn, and apply intellectual skills at a level equal to or surpassing human intelligence. AGI remains a theoretical concept but represents the ultimate goal of many AI research endeavors.

  • AI Engineering: This is about creating a systematic approach to developing, maintaining, and supporting AI systems in enterprise environments. It ensures that AI applications are scalable, sustainable, and effectively integrated into existing business processes.

  • Autonomic Systems: These are systems capable of self-management, adapting to changes in their environment while maintaining their objectives. They are autonomous, learn from interactions, and make decisions based on their programming and experiences.

  • Cloud AI Services: These services provide tools for building AI models, APIs for existing services, and middleware support. They enable the development, deployment, and operation of machine learning models as cloud-based services, making AI more accessible and scalable.

  • Composite AI: This involves integrating various AI techniques to enhance learning efficiency and broaden the scope of knowledge representations. It addresses a wider range of business problems more effectively by combining different AI approaches.

  • Computer Vision: This technology focuses on interpreting and understanding visual information from the physical world. It involves capturing, processing, and analyzing images and videos to extract meaningful insights.

  • Data-centric AI: This approach emphasizes improving training data quality to enhance AI outcomes. It deals with data quality, privacy, and scalability, focusing on the data used in AI systems rather than just the algorithms.

  • Edge AI: This refers to AI systems implemented at the 'edge' of networks, such as in IoT devices, rather than centralized in cloud-based systems. It's crucial for real-time processing in applications like autonomous vehicles and medical diagnostics.

  • Intelligent Applications: These applications adapt and respond autonomously to interactions with people and other machines, learning from these interactions to improve their responses and actions.

  • Model Operationalization (ModelOps): This focuses on managing the entire lifecycle of AI models, including development, deployment, monitoring, and governance. It's essential for maintaining the effectiveness and integrity of AI systems.

  • Operational AI Systems (OAISys): These systems facilitate the orchestration, automation, and scaling of AI applications in enterprise settings, encompassing machine learning, deep neural networks, and generative AI.

  • Prompt Engineering: This involves crafting inputs for AI models to guide the responses they generate. It's particularly relevant for generative AI models where the input significantly influences the output.

  • Smart Robots: These are autonomous, often mobile robots equipped with AI, capable of performing physical tasks independently.

  • Synthetic Data: This is data generated through algorithms or simulations, used as an alternative to real-world data for training AI models. It's particularly useful in situations where real data is scarce, expensive, or sensitive.

Each of these technologies contributes to the rapidly evolving landscape of generative AI, pushing the boundaries of what's possible and opening up new opportunities across various industries.

Foundation Model

A foundation model is an AI model that is trained on broad and extensive datasets, allowing it to be applied across a wide range of use cases. These models have become instrumental in the field of artificial intelligence and have powered various applications, including chatbots and generative AI. The term "foundation model" was popularized by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI).

The term "foundation model," as coined by the Stanford Institute for Human-Centered Artificial Intelligence's (HAI) Center for Research on Foundation Models (CRFM) in August 2021, refers to a class of AI models that have been meticulously designed to be adaptable powerhouses in the realm of artificial intelligence. These models are characterized by their extensive training on diverse data using self-supervision at scale, making them versatile and capable of tackling a wide range of tasks. The term was chosen with great care to emphasize their intended function, which is to serve as the foundational building blocks for diverse AI applications. Unlike narrower terms like "large language model" or "self-supervised model," "foundation model" underscores their adaptability and applicability across various domains, thereby avoiding misconceptions about their capabilities and training methods. In essence, foundation models represent a groundbreaking approach to AI development, offering boundless potential for innovation and problem-solving across different fields and modalities.

Key points about foundation models:

  • General-Purpose Technology: Foundation models are designed to be general-purpose technologies that can support a diverse range of applications. They are versatile and can be adapted to various tasks.

  • Resource-Intensive Development: Building foundation models can be highly resource-intensive, with significant costs involved. Some of the most advanced models require substantial investments in data collection and computational power, often costing hundreds of millions of dollars.

  • Examples Across Modalities: Foundation models are not limited to text-based applications. They have been developed for various modalities, including images (e.g., DALL-E and Flamingo), music (e.g., MusicGen), robotic control (e.g., RT-2), and more. This broadens their applicability.

  • Diverse Fields of Application: Foundation models are being developed and applied in a wide range of fields, including astronomy, radiology, robotics, genomics, music composition, coding, mathematics, and others. They are seen as transformative in AI development across multiple domains.

  • Definitions and Regulation: The term "foundation model" was coined by the CRFM, and various definitions have emerged as governments and regulatory bodies aim to provide legal frameworks for these models. In the U.S., a foundation model is defined as having broad data, self-supervision, and tens of billions of parameters. The European Union and the United Kingdom have their own definitions with some subtle distinctions.

  • Personalization: Foundation models are not inherently capable of handling specific personal concepts. Methods have been developed to augment these models with personalized information or concepts without requiring a full retraining of the model. This personalization can be achieved for various tasks, such as image retrieval or text-to-image generation.

  • Opportunities and Risks: Foundation models offer tremendous opportunities in various fields, including language processing, vision, robotics, and more. However, they also come with risks, including concerns about inequity, misuse, economic and environmental impacts, and ethical considerations. The widespread use of foundation models has raised questions about the concentration of economic and political power.

Large-scale Language Models

Large-scale language models (LLMs) are distinguished by their comprehensive language comprehension and generation abilities. These models are trained on vast data sets, learning billions of parameters, and require significant computational power for both training and operation. Typically structured as artificial neural networks, predominantly transformers, LLMs are trained through self-supervised and semi-supervised learning methods.

Functioning as autoregressive language models, LLMs process input text and iteratively predict subsequent words or tokens. Until 2020, fine-tuning was the sole approach for tailoring these models to specific tasks. However, larger models like GPT-3 have demonstrated that prompt engineering can achieve comparable results. LLMs are believed to assimilate knowledge of syntax, semantics, and "ontology" from human language data, but they also inherit any inaccuracies and biases present in these data sources.

Prominent examples of LLMs include OpenAI's GPT series (such as GPT-3.5 and GPT-4 used in ChatGPT), Google's PaLM (utilized in Bard), Meta's LLaMA, along with BLOOM, Ernie 3.0 Titan, and Anthropic's Claude 2.

We present the comparative list of LLMs below. Training cost is presented in petaFLOP-days, where 1 petaFLOP-day = 1 petaFLOP/sec × 1 day = 8.64E19 FLOP (see the sanity-check snippet after the table).

| Model | Release | Developer | # Parameters | Corpus size | Training cost (petaFLOP-days) | License | Comments |
|---|---|---|---|---|---|---|---|
| GPT-1 | Jun-18 | OpenAI | 117 million | | | | First GPT model, decoder-only transformer |
| BERT | Oct-18 | Google | 340 million | 3.3 billion words | 9 | Apache 2.0 | An early and influential language model, but encoder-only and thus not built to be prompted or generative |
| XLNet | Jun-19 | Google | ~340 million | 33 billion words | | | An alternative to BERT; designed as encoder-only |
| GPT-2 | Feb-19 | OpenAI | 1.5 billion | 40GB (~10 billion tokens) | | MIT | General-purpose model based on the transformer architecture |
| GPT-3 | May-20 | OpenAI | 175 billion | 300 billion tokens | 3640 | Proprietary | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022 |
| GPT-Neo | Mar-21 | EleutherAI | 2.7 billion | 825 GiB | | MIT | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3 |
| GPT-J | Jun-21 | EleutherAI | 6 billion | 825 GiB | 200 | Apache 2.0 | GPT-3-style language model |
| Megatron-Turing NLG | Oct-21 | Microsoft and Nvidia | 530 billion | 338.6 billion tokens | | Restricted web access | Standard architecture but trained on a supercomputing cluster |
| Ernie 3.0 Titan | Dec-21 | Baidu | 260 billion | 4 TB | | Proprietary | Chinese-language LLM. Ernie Bot is based on this model |
| Claude | Dec-21 | Anthropic | 52 billion | 400 billion tokens | | Beta | Fine-tuned for desirable behavior in conversations |
| GLaM (Generalist Language Model) | Dec-21 | Google | 1.2 trillion | 1.6 trillion tokens | 5600 | Proprietary | Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3 |
| Gopher | Dec-21 | DeepMind | 280 billion | 300 billion tokens | 5833 | Proprietary | Further developed into the Chinchilla model |
| LaMDA (Language Models for Dialog Applications) | Jan-22 | Google | 137 billion | 1.56T words, 168 billion tokens | 4110 | Proprietary | Specialized for response generation in conversations |
| GPT-NeoX | Feb-22 | EleutherAI | 20 billion | 825 GiB | 740 | Apache 2.0 | Based on the Megatron architecture |
| Chinchilla | Mar-22 | DeepMind | 70 billion | 1.4 trillion tokens | 6805 | Proprietary | Reduced-parameter model trained on more data. Used in the Sparrow bot. Often cited for its neural scaling law |
| PaLM (Pathways Language Model) | Apr-22 | Google | 540 billion | 768 billion tokens | 29250 | Proprietary | Aimed to reach the practical limits of model scale |
| OPT (Open Pretrained Transformer) | May-22 | Meta | 175 billion | 180 billion tokens | 310 | Non-commercial research | GPT-3 architecture with some adaptations from Megatron |
| YaLM 100B | Jun-22 | Yandex | 100 billion | 1.7TB | | Apache 2.0 | English-Russian model based on Microsoft's Megatron-LM |
| Minerva | Jun-22 | Google | 540 billion | 38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server | | Proprietary | LLM trained for solving "mathematical and scientific questions using step-by-step reasoning". Based on the PaLM model, further trained on mathematical and scientific data |
| BLOOM | Jul-22 | Large collaboration led by Hugging Face | 175 billion | 350 billion tokens (1.6TB) | | Responsible AI | Essentially GPT-3 but trained on a multilingual corpus (30% English, excluding programming languages) |
| Galactica | Nov-22 | Meta | 120 billion | 106 billion tokens | Unknown | CC-BY-NC-4.0 | Trained on scientific text and modalities |
| AlexaTM (Teacher Models) | Nov-22 | Amazon | 20 billion | 1.3 trillion | | Proprietary | Bidirectional sequence-to-sequence architecture |
| LLaMA (Large Language Model Meta AI) | Feb-23 | Meta | 65 billion | 1.4 trillion | 6300 | Non-commercial research | Trained on a large 20-language corpus to aim for better performance with fewer parameters. Researchers from Stanford University trained a fine-tuned model based on LLaMA weights, called Alpaca |
| GPT-4 | Mar-23 | OpenAI | Exact number unknown | Unknown | Unknown | Proprietary | Available for ChatGPT Plus users and used in several products |
| Cerebras-GPT | Mar-23 | Cerebras | 13 billion | | 270 | Apache 2.0 | Trained with the Chinchilla formula |
| Falcon | Mar-23 | Technology Innovation Institute | 40 billion | 1 trillion tokens, from RefinedWeb (filtered web text corpus) plus some "curated corpora" | 2800 | Apache 2.0 | |
| BloombergGPT | Mar-23 | Bloomberg L.P. | 50 billion | 363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general-purpose datasets | | Proprietary | LLM trained on financial data from proprietary sources, that "outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks" |
| PanGu-Σ | Mar-23 | Huawei | 1.085 trillion | 329 billion tokens | | Proprietary | |
| OpenAssistant | Mar-23 | LAION | 17 billion | 1.5 trillion tokens | | Apache 2.0 | Trained on crowdsourced open data |
| Jurassic-2 | Mar-23 | AI21 Labs | Exact size unknown | Unknown | | Proprietary | Multilingual |
| PaLM 2 | May-23 | Google | 340 billion | 3.6 trillion tokens | 85000 | Proprietary | Used in the Bard chatbot |
| Llama 2 | Jul-23 | Meta | 70 billion | 2 trillion tokens | | Llama 2 license | Successor of LLaMA |
| Claude 2 | Jul-23 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Used in the Claude chatbot |
| Falcon 180B | Sep-23 | Technology Innovation Institute | 180 billion | 3.5 trillion tokens | | Falcon 180B TII license | |
| Mistral 7B | Sep-23 | Mistral AI | 7.3 billion | Unknown | | Apache 2.0 | |
| OpenHermes-15B | Sep-23 | Nous Research | 13 billion | Unknown | Unknown | MIT | |
| Claude 2.1 | Nov-23 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Used in the Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages |
| Grok-1 | Nov-23 | x.AI | Unknown | Unknown | Unknown | Proprietary | Used in the Grok chatbot. Grok-1 has a context length of 8,192 tokens and has access to X (Twitter) |
| Gemini | Dec-23 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Multimodal model, comes in three sizes. Used in the Bard chatbot |
| Mixtral 8x7B | Dec-23 | Mistral AI | 46.7B total, 12.9B parameters per token | Unknown | Unknown | Apache 2.0 | Mixture-of-experts model, outperforms GPT-3.5 and Llama 2 70B on many benchmarks. All weights were released via torrent |
| Phi-2 | Dec-23 | Microsoft | 2.7B | 1.4T tokens | Unknown | Proprietary | So-called small language model, that "matches or outperforms models up to 25x larger", trained on "textbook-quality" data based on the paper "Textbooks Are All You Need". Model training took "14 days on 96 A100 GPUs" |
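As a quick sanity check of the petaFLOP-day unit used in the table:

```python
# 1 petaFLOP-day = 1e15 FLOP/s sustained for one day (86,400 s).
peta_flops = 1e15
seconds_per_day = 24 * 60 * 60
print(peta_flops * seconds_per_day)  # 8.64e+19 FLOP
```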

Evaluating Models

Evaluating a generative AI model involves a multifaceted assessment that encompasses several critical aspects. Firstly, assessing the quality of the model involves scrutinizing the accuracy and relevance of its generated output. However, with the increasing complexity of these models, their behavior can sometimes become unpredictable, potentially leading to outputs that may not always be reliable. Secondly, evaluating the model's robustness is essential, focusing on its ability to handle a wide range of inputs effectively. A pressing concern in the evaluation process is the presence of biases in AI models, which can inadvertently surface due to the inherent biases in the human-generated data used for training. Addressing these biases and navigating the ethical considerations surrounding AI technology are formidable challenges that the AI community must actively address and mitigate.


Emerging LLM App Stack

The emerging tech stack for LLMs represents a rapidly evolving ecosystem of tools and platforms that empower developers to build and deploy LLM-based applications. With the continuous growth and innovation in the LLM field, it's crucial to highlight the tooling available to complement these models.

One essential component in the LLM app stack is "Playgrounds." Playgrounds serve as user-friendly interfaces that allow developers to experiment with LLM-based applications. They provide an entry point for individuals to interact with LLMs, such as generating text based on prompts or transcribing audio files. These browser-based interfaces often come equipped with the necessary resources, such as GPU access, making them accessible for experimentation.

In terms of app hosting, developers have several options. Local hosting, while cost-effective during the development phase, is limited to individual use and may not scale well for production applications. Self-hosting offers more control over data privacy and application management but comes with significant GPU costs and quality considerations.

Emerging app hosting products like Vercel, Steamship, Streamlit, and Modal are simplifying the deployment of LLM applications. Vercel, for instance, streamlines front-end deployment, allowing developers to quickly deploy AI apps using pre-built templates. Steamship focuses on building AI agents powered by LLMs for problem-solving and automation. Streamlit, an open-source Python library, enables developers to create web front-ends for LLM projects without prior front-end experience. Modal abstracts complexities related to cloud deployment, improving the feedback loop between local development and cloud execution.

The common theme among these emerging tools is their ability to abstract complex technologies, allowing developers to focus on their code and applications. As the AI landscape evolves rapidly, these tools play a crucial role in reducing the time and effort required for building and deploying LLM applications, making them invaluable resources for developers in this dynamic field.


ML Workflow

The classical ML workflow involves a series of meticulously defined steps, beginning with problem definition and data preparation, followed by feature engineering, data splitting, model selection, training, hyperparameter tuning, and evaluation. Once the model demonstrates satisfactory performance, it is deployed into a production environment, where it is continuously monitored and maintained. This process is characterized by its emphasis on manual intervention at each stage, requiring substantial expertise in data science and machine learning. The workflow is iterative, with feedback from model monitoring being used to refine and improve the model, particularly in response to challenges like data drift.


LLM Workflow

In contrast, the LLM workflow, as exemplified by technologies like GPT-3, represents a shift towards utilizing pre-trained models. These models are accessible through REST API endpoints provided by organizations like OpenAI, allowing a wide range of users to leverage advanced ML capabilities without the need for extensive ML expertise. This approach democratizes access to powerful machine learning tools, enabling not just ML practitioners but also developers and less technical users to benefit from the models' capabilities. The LLM workflow is particularly notable for its real-time application, and architectures like Retrieval Augmented Generation (RAG) play a crucial role in maintaining information freshness and contextuality, thereby enhancing the models' effectiveness in tasks like question answering and summarization. This shift from building and training models from scratch to utilizing pre-trained models represents a significant transformation in the field of machine learning, broadening the scope and accessibility of these technologies.


LLMops Landscape

The landscape of Large Language Model Operations, commonly referred to as LLMops, is a dynamic and evolving realm, distinct from the more traditional Machine Learning Operations (MLops). LLMops involves a set of tools and infrastructure specifically tailored to the implementation of generative AI use cases. This distinction arises from the fundamental differences between generative AI and predictive AI applications.

In MLops (Machine Learning Operations), the focus is on systems of prediction, where machine learning models perform objective-focused tasks, often providing recommendations, classifications, or predictions. On the other hand, LLMops pertains to systems of creation, where generative AI applications produce open-ended or qualitative content, such as generating marketing copy in a company's voice.

Several factors differentiate MLops from LLMops:

  • Transfer Learning: Generative AI products often begin with pre-trained foundation models, which are then customized for specific use cases. This process is typically easier than creating predictive ML models from scratch, involving data gathering, annotation, training, and hyperparameter tuning.

  • Compute Management: Training and running large language models are computationally intensive tasks. LLMs, even when leveraging pre-trained models, demand significant computational resources for inference compared to predictive ML models.

  • Feedback Loops: Predictive ML models often produce clear performance metrics, making evaluation straightforward. In contrast, generative AI models produce qualitative output, which can be challenging to assess. Techniques like Reinforcement Learning from Human Feedback (RLHF) or reinforcement learning from AI feedback (RLAIF) are used to fine-tune generative models.

Despite these differences, there are areas of convergence between LLMops and MLops in the enterprise context. Both share concerns related to data privacy, model governance, and model security. Ensuring data privacy and handling software code in prompts or fine-tuning LLMs require careful consideration. Model governance is challenging for both predictive ML and generative AI, as complex models are difficult to explain and track. Model security is crucial for protecting data sets and models from potential threats.

The current LLMOps landscape includes various tools and solutions across categories like vector databases, prompt engineering, and model monitoring. Many of these tools have emerged recently, reflecting the growing interest in generative AI. Efficiency in inference infrastructure has become a critical differentiator, with solutions like Run:AI and Deci AI addressing compute optimization challenges.

Areas warranting more focus in the LLMops ecosystem include privacy, model security, and model governance. Enterprises often face challenges in these aspects when deploying generative AI products, and building trust and reliability in LLMs will be a significant competitive advantage.

In conclusion, the LLMops landscape is a rapidly evolving field with its own set of tools and considerations. While distinct from MLops, it shares common concerns and challenges in the enterprise context. As generative AI continues to gain traction, LLMops will play a crucial role in enabling the deployment of powerful AI capabilities. Existing players and startups are navigating this space to leverage their strengths and compete in the emerging generative AI landscape.

image: LLMops/MLops tech stack for generative AI

image: LLMops market map

Retrieval Augmented Generation (RAG)

Large Language Models (LLMs) like GPT-3 have revolutionized the field of natural language processing with their ability to generate human-like text. However, despite their impressive capabilities, these models have inherent limitations, particularly in accessing external, up-to-date information or specific data that is not within their training set. To address these challenges, the concept of Retrieval Augmented Generation (RAG) has been introduced. RAG combines the generative power of LLMs with the precision of a retrieval system. This approach significantly enhances the performance of LLMs, making them more contextually aware and factually accurate. In an era where AI is increasingly utilized across various fields, the accuracy and relevance of the information provided by these models are of paramount importance. RAG, therefore, emerges as a critical component in the evolution of AI, ensuring that interactions with these models are not only natural and human-like but also informative and reliable.

Implementing a Retrieval Augmented Generation system involves integrating several key components, each contributing to the efficiency and effectiveness of the final system. The core element is the Large Language Model, which is responsible for generating human-like responses. Complementing this is the Vector Store, a specialized database that holds embeddings of textual data, enabling rapid and accurate information retrieval. The Vector Store Retriever acts as a search engine, fetching relevant documents by comparing vector similarities. Before any data can be stored or retrieved, it must be converted into a compatible format through an Embedder, which transforms text into vector representations. The process begins with a user's query or statement, captured by the Prompt, setting the stage for retrieval and generation. The Document Loader plays a crucial role in importing and processing large volumes of data, while the Document Chunker breaks this data into manageable segments. Finally, the User Input tool captures the initial query from the end-user, triggering the entire RAG process.
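
A minimal sketch of the indexing side of this pipeline appears below, using sentence-transformers as the Embedder and a plain NumPy array as a stand-in Vector Store; the chunking strategy and model name are illustrative assumptions.

```python
# RAG indexing sketch: chunk documents, embed the chunks, and keep the
# vectors for later similarity search. A real system would use a
# dedicated vector database instead of an in-memory array.
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 200) -> list[str]:
    """Split a document into fixed-size word chunks (Document Chunker)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model

# Documents would normally come from a Document Loader
documents = ["Paris is the capital of France.",
             "Retrieval Augmented Generation grounds LLM answers."]
chunks = [c for doc in documents for c in chunk(doc)]
index = np.asarray(embedder.encode(chunks))  # one vector per chunk
```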

image: Retrieval Augmented Generation with LangChain (Deci AI)

The RAG system is designed to augment LLMs with contextually relevant and factually accurate information, ensuring high-quality, relevant content generation. It comprises several subsystems, each fulfilling a specific function within the overall process. These subsystems are the Index, Retrieval, and Augment systems.

  • Index System: This is where data preparation and organization occur. It involves loading and chunking documents, converting them into vector representations, and storing these embeddings for future retrieval.
  • Retrieval System: In this phase, the system fetches the most pertinent information in response to a user's query. It captures the query, transforms it into a vector, and conducts a vector search to find the most relevant documents.
  • Augment System: This subsystem enhances the input prompt for the LLM with the retrieved context. It merges the initial prompt with the retrieved information, providing a rich and informed input for the LLM, which then generates an appropriate response (see the sketch below).

RAG systems represent a significant advancement in AI, merging the creative and intuitive aspects of generative models with the precision and knowledge base of retrieval systems. This synergy not only improves the quality of generated content but also extends the applicability of LLMs across a wider range of tasks, making them more practical and useful in real-world scenarios.
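
Continuing in the same vein, here is a minimal sketch of the Retrieval and Augment subsystems; the final `llm` call is a hypothetical placeholder for whichever hosted or local model the system uses.

```python
# RAG retrieval and augmentation sketch: embed the query, run a cosine
# similarity search over the indexed chunks, and merge the retrieved
# context into the prompt. The final model call is a placeholder.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model
chunks = ["Paris is the capital of France.",
          "RAG augments LLM prompts with retrieved context."]
index = np.asarray(embedder.encode(chunks))

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query (Retrieval System)."""
    q = np.asarray(embedder.encode([query]))[0]
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# Augment System: merge retrieved context into the user's prompt
query = "What is the capital of France?"
context = "\n".join(retrieve(query))
prompt = (f"Answer using only the context below.\n\n"
          f"Context:\n{context}\n\nQuestion: {query}")
# answer = llm(prompt)  # hypothetical call to any LLM client
```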

Hugging Face models on AWS AI Accelerators

image

Source: https://youtu.be/66JUlAA8nOU

Developer Tools

Forbes presents a technology stack leveraging various tools, models, and frameworks for developing generative AI.

image

As of December 2023, the most used tool sets in generative AI development are shown below.

image

Chatbots

  1. ChatGPT - ChatGPT by OpenAI is a large language model that interacts in a conversational way.
  2. Bing Chat - A conversational AI language model powered by Microsoft Bing.
  3. Bard - An experimental AI chatbot by Google, powered by the LaMDA model.
  4. Character.AI - Character.AI lets you create characters and chat with them.
  5. ChatPDF - Chat with any PDF.
  6. ChatSonic - An AI-powered assistant that enables text and image creation.

References

  1. https://en.wikipedia.org/wiki/Generative_artificial_intelligence
  2. https://en.wikipedia.org/wiki/Large_language_model
  3. https://github.com/steven2358/awesome-generative-ai
  4. https://www.turing.com/resources/generative-ai-tools
  5. https://aimagazine.com/top10/top-10-generative-ai-tools
  6. https://www.linkedin.com/pulse/generative-ai-landscape-2023-florian-belschner/
  7. https://www.forbes.com/sites/konstantinebuhler/2023/04/11/ai-50-2023-generative-ai-trends/?sh=3e21848d7c0e
  8. https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2023-gartner-hype-cycle
  9. https://www.aitidbits.ai/p/most-used-tools
  10. https://clickup.com/blog/ai-tools/
  11. https://www.linkedin.com/pulse/aiaa-alternative-intelligence-alien-augmented-data-azamat-abdoullaev/
  12. https://www.analyticsvidhya.com/blog/2023/09/evaluation-of-generative-ai-models-and-search-use-case/
  13. https://blog.gopenai.com/a-deep-dive-into-a16z-emerging-llm-app-stack-playgrounds-and-app-hosting-bf2c9fe7cf18
  14. https://www.linkedin.com/pulse/emerging-architectures-large-language-models-data-science-dojo/
  15. https://www.insightpartners.com/ideas/llmops-mlops-what-you-need-to-know/
  16. https://deci.ai/blog/retrieval-augmented-generation-using-langchain/
  17. https://www.linkedin.com/pulse/impact-llms-evolving-data-ml-stack-apoorva-pandhi-gnxcc/
