
LLMOps: Feature | Inference | Finetune - 3-Pipeline Architecture for an LLM-Based RAG Application on Tech News

Transform your private LLM into an expert by curating a dataset with state-of-the-art GPT-4 and then fine-tuning Llama2 7B on it.


Application Overview

Every morning at 9 AM CST, the feature pipeline fetches the latest articles from the Aylien tech news API; chunks, embeds, and loads them into the Vectara database; and uses OpenAI GPT-4 to create a curated Q/A set for finetuning. The inference pipeline serves the application by querying vectors from the Vectara database and synthesizing answers with your privately hosted Llama2 7B. Once a good-sized Q/A set has been generated from diverse articles, the fine-tuning pipeline runs every month, and CI/CD deploys the updated model weights into production. Six months down the line, the small Llama2 is your new expert, specialized in understanding tech and its jargon.
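The chunk/embed/load step of that daily run can be sketched as below. This is a minimal, illustrative stand-in: `chunk_text` is a hypothetical helper, and in the real pipeline LlamaIndex and Vectara handle chunking and embedding server-side.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split an article into overlapping character windows before indexing.

    Overlap keeps sentences that straddle a boundary retrievable from
    either side. Hypothetical helper -- the actual pipeline delegates
    chunking/embedding to LlamaIndex + Vectara.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

# In the actual feature pipeline, each chunk would then be sent to
# Vectara (e.g. via LlamaIndex's Vectara integration), which embeds
# and stores it server-side.
```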

Objective

Automated pipelines for the feature store, inference, and finetuning that make your compact, private LLM (Llama2 7B) an expert on technology. GPT-4 is leveraged to curate a Q/A dataset, which is then used to finetune Llama2; down the line, Q/A extraction is stopped once Llama2 produces optimal responses on its own.
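The GPT-4 Q/A curation step might look like the sketch below. The prompt wording and the `parse_qa_pairs`/`save_qa_csv` helpers are assumptions for illustration; the repo's actual prompt and parsing may differ.

```python
import csv

def build_qa_prompt(article: str, n_pairs: int = 5) -> str:
    # Hypothetical prompt sent to GPT-4; the repo's wording may differ.
    return (
        f"Generate {n_pairs} question/answer pairs about the article below.\n"
        "Format each pair as 'Q: ...' on one line and 'A: ...' on the next.\n\n"
        f"Article:\n{article}"
    )

def parse_qa_pairs(response: str) -> list[tuple[str, str]]:
    """Pair up 'Q:'/'A:' lines from the model's reply."""
    questions, answers = [], []
    for line in response.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            questions.append(line[2:].strip())
        elif line.startswith("A:"):
            answers.append(line[2:].strip())
    return list(zip(questions, answers))

def save_qa_csv(pairs: list[tuple[str, str]], path: str) -> None:
    # The feature app persists pairs to a CSV on the Beam storage volume.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["question", "answer"])
        writer.writerows(pairs)
```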

Architecture Diagram

[Architecture diagram]

Tech-Stack


  • Llama-Index: framework for ingestion and RAG orchestration
  • Vectara: vector database for storing and querying embeddings
  • Beam: infrastructure with GPUs and a storage volume
  • Llama2: LLM for RAG and finetuning
  • Streamlit: chatbot user interface
  • Quantexa: data source for tech news articles
  • OpenAI: GPT-4 for creating the Q/A dataset for finetuning

How to Setup

There are three apps in this project, all of which are deployed on Beam.

  • Feature app: runs on a scheduler every day at 9 AM. It contains an ETL job that grabs news articles from the Aylien API, then uses Vectara to chunk, embed, and load them into the vector store. The second part uses GPT-4 to generate 5 question/answer pairs per article and saves them to a Beam storage volume as a CSV file.
  • Inference app: on a Streamlit chat UI, the user asks a question, which is sent to Vectara to embed and search for relevant chunks across all the news articles by cosine similarity. Llama2 7B is hosted for inference on Beam as a REST API, which is then called to synthesize the final response.
  • Training app: also deployed as a scheduler, run monthly. It uses PEFT LoRA and the Hugging Face transformers library for finetuning; the parameters and prompter are the same as those used for Alpaca.
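Since the training app reuses the Alpaca prompter, the prompt template can be sketched as below. The field wording follows the public Alpaca format; the LoRA settings in the trailing comment are typical Alpaca-LoRA defaults, not values confirmed from this repo.

```python
def alpaca_prompt(instruction: str, input_text: str = "", output: str = "") -> str:
    """Alpaca-style prompt used to format each Q/A pair for finetuning."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            f"### Response:\n{output}"
        )
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{output}"
    )

# Typical Alpaca-LoRA PEFT settings (illustrative, not confirmed from this
# repo): r=8, lora_alpha=16, lora_dropout=0.05,
# target_modules=["q_proj", "v_proj"].
```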

To set up, obtain the following tokens and API keys, then create a .env file as below and save it for all the pipelines:

AYLIEN_USERNAME=
AYLIEN_PASSWORD=
AYLIEN_APPID=

OPENAI_API=
VECTARA_CUSTOMER_ID=
VECTARA_CORPUS_ID=
VECTARA_API_KEY=

Beam_key=

HF_key=
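The pipelines can read this file with python-dotenv, or with a minimal stdlib loader like the hypothetical sketch below:

```python
import os

def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file into os.environ.

    Minimal stand-in for python-dotenv's load_dotenv(); blank lines
    and '#' comments are skipped.
    """
    loaded = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip()
    os.environ.update(loaded)
    return loaded
```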

Deploy each app one by one on Beam from WSL. You need a Beam account first; detailed installation guidance is here: https://docs.beam.cloud/getting-started/installation

  • Feature pipeline: beam deploy app.py:FeaturePipeline
  • Inference pipeline: here the LLM is deployed as a REST API. cd into the llama2 folder and run beam deploy app.py:generate
  • Training pipeline: beam deploy app.py:train_model

Then, from inside the inference pipeline folder, start the chatbot UI with: streamlit run app.py
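Inside the inference app, the chunks retrieved from Vectara and the user's question are combined into a single prompt for the Llama2 endpoint. A sketch of that composition step follows; the function name and prompt wording are assumptions, not taken from the repo.

```python
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Compose retrieved Vectara chunks and the user question into
    the prompt POSTed to the Llama2 7B REST API on Beam."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# The Streamlit app would then send the composed prompt to the deployed
# generate endpoint and render the reply in the chat window.
```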

Medium article: https://medium.com/@sarmadafzalj/rag-v-fine-tuning-why-not-harness-the-power-of-both-91c49a4744da

Reach out to me

