
LLMOps: Feature | Inference | Finetune - 3-Pipeline Architecture for an LLM-Based RAG Application on Tech News

Transform your private LLM into an expert by curating a dataset with state-of-the-art GPT-4 and then fine-tuning Llama2 7B on it.


Application Overview

Every morning at 9 AM CST, the feature pipeline fetches the latest articles from the Aylien tech news API; chunks, embeds, and loads them into the Vectara database; and uses OpenAI GPT-4 to create a curated Q/A set for finetuning. The inference pipeline serves the application by querying vectors from the Vectara database and synthesizing answers with your privately hosted Llama2 7B. Once a good-sized Q/A set has been generated from diverse articles, the fine-tuning pipeline runs every month, and CI/CD deploys the updated model weights into production. Six months down the line, the small Llama2 is your new expert, specialized in understanding tech and its jargon.
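The chunk/embed/load step of that daily run can be sketched as below. This is a minimal, illustrative stand-in: `chunk_text` is a hypothetical helper, and in the real pipeline LlamaIndex and Vectara handle chunking and embedding server-side.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split an article into overlapping character windows before indexing.

    Overlap keeps sentences that straddle a boundary retrievable from
    either side. Hypothetical helper -- the actual pipeline delegates
    chunking/embedding to LlamaIndex + Vectara.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

# In the actual feature pipeline, each chunk would then be sent to
# Vectara (e.g. via LlamaIndex's Vectara integration), which embeds
# and stores it server-side.
```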

Objective

Automated pipelines for the feature store, inference, and finetuning that make your compact, private LLM (Llama2 7B) an expert on technology. GPT-4 is leveraged to curate a Q/A dataset, which is then used to finetune Llama2; down the line, Q/A extraction is stopped once Llama2 produces optimal responses on its own.
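The GPT-4 Q/A curation step might look like the sketch below. The prompt wording and the `parse_qa_pairs`/`save_qa_csv` helpers are assumptions for illustration; the repo's actual prompt and parsing may differ.

```python
import csv

def build_qa_prompt(article: str, n_pairs: int = 5) -> str:
    # Hypothetical prompt sent to GPT-4; the repo's wording may differ.
    return (
        f"Generate {n_pairs} question/answer pairs about the article below.\n"
        "Format each pair as 'Q: ...' on one line and 'A: ...' on the next.\n\n"
        f"Article:\n{article}"
    )

def parse_qa_pairs(response: str) -> list[tuple[str, str]]:
    """Pair up 'Q:'/'A:' lines from the model's reply."""
    questions, answers = [], []
    for line in response.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            questions.append(line[2:].strip())
        elif line.startswith("A:"):
            answers.append(line[2:].strip())
    return list(zip(questions, answers))

def save_qa_csv(pairs: list[tuple[str, str]], path: str) -> None:
    # The feature app persists pairs to a CSV on the Beam storage volume.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["question", "answer"])
        writer.writerows(pairs)
```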

Architecture Diagram

[Architecture diagram]

Tech-Stack


  • Llama-Index: framework for ingestion and RAG orchestration
  • Vectara: vector database for storing and querying embeddings
  • Beam: infrastructure with GPUs and a storage volume
  • Llama2: LLM for RAG and finetuning
  • Streamlit: chatbot user interface
  • Quantexa: data source for tech news articles
  • OpenAI: GPT-4 for creating the Q/A dataset for finetuning

How to Setup

There are three apps in this project, all of which are deployed on Beam.

  • Feature app: runs on a scheduler every day at 9 AM. It contains an ETL job that grabs news articles from the Aylien API, then uses Vectara to chunk, embed, and load them into the vector store. The second part uses GPT-4 to generate 5 question/answer pairs per article and saves them to a Beam storage volume as a CSV file.
  • Inference app: on a Streamlit chat UI, the user asks a question, which is sent to Vectara to embed and search for relevant chunks across all the news articles by cosine similarity. Llama2 7B is hosted for inference on Beam as a REST API, which is then called to synthesize the final response.
  • Training app: also deployed as a scheduler, run monthly. It uses PEFT LoRA and the Hugging Face transformers library for finetuning; the parameters and prompter are the same as those used for Alpaca.
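Since the training app reuses the Alpaca prompter, the prompt template can be sketched as below. The field wording follows the public Alpaca format; the LoRA settings in the trailing comment are typical Alpaca-LoRA defaults, not values confirmed from this repo.

```python
def alpaca_prompt(instruction: str, input_text: str = "", output: str = "") -> str:
    """Alpaca-style prompt used to format each Q/A pair for finetuning."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            f"### Response:\n{output}"
        )
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{output}"
    )

# Typical Alpaca-LoRA PEFT settings (illustrative, not confirmed from this
# repo): r=8, lora_alpha=16, lora_dropout=0.05,
# target_modules=["q_proj", "v_proj"].
```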

To set up, obtain the following tokens and API keys, then create a .env file as below and save it for all the pipelines:

AYLIEN_USERNAME=
AYLIEN_PASSWORD=
AYLIEN_APPID=

OPENAI_API=
VECTARA_CUSTOMER_ID=
VECTARA_CORPUS_ID=
VECTARA_API_KEY=

Beam_key=

HF_key=
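The pipelines can read this file with python-dotenv, or with a minimal stdlib loader like the hypothetical sketch below:

```python
import os

def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file into os.environ.

    Minimal stand-in for python-dotenv's load_dotenv(); blank lines
    and '#' comments are skipped.
    """
    loaded = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip()
    os.environ.update(loaded)
    return loaded
```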

Deploy each app one by one on Beam from WSL. You need a Beam account first; detailed installation guidance is here: https://docs.beam.cloud/getting-started/installation

  • Feature pipeline: beam deploy app.py:FeaturePipeline
  • Inference pipeline: here the LLM is deployed as a REST API. cd into the llama2 folder and run beam deploy app.py:generate
  • Training pipeline: beam deploy app.py:train_model

Then, from inside the inference pipeline folder, start the chatbot UI with: streamlit run app.py
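Inside the inference app, the chunks retrieved from Vectara and the user's question are combined into a single prompt for the Llama2 endpoint. A sketch of that composition step follows; the function name and prompt wording are assumptions, not taken from the repo.

```python
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Compose retrieved Vectara chunks and the user question into
    the prompt POSTed to the Llama2 7B REST API on Beam."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# The Streamlit app would then send the composed prompt to the deployed
# generate endpoint and render the reply in the chat window.
```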

Medium article: https://medium.com/@sarmadafzalj/rag-v-fine-tuning-why-not-harness-the-power-of-both-91c49a4744da

Reach out to me

