LLMOps: Feature | Inference | Finetune - a 3-Pipeline Architecture for an LLM-Based RAG Application on Tech News
Transform your private LLM into a domain expert by curating a dataset with state-of-the-art GPT-4 and then fine-tuning Llama 2 7B on it.
Automated pipelines for the feature store, inference, and fine-tuning make your compact, private LLM (Llama 2 7B) an expert on technology news. GPT-4 is leveraged to curate a Q/A dataset, which is then used to fine-tune Llama 2; down the line, Q/A extraction is stopped once Llama 2 produces optimal responses on its own.
There are three apps in total in this project, each deployed on Beam.
- Feature app: runs on a schedule every day at 9 AM. It contains an ETL job that grabs news articles from the Aylien API, then uses Vectara to chunk, embed, and load them into the vector store. The second part uses GPT-4 to generate 5 question/answer pairs from each article and saves them to a Beam storage volume as a CSV file.
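The Q/A-extraction half of the feature app can be sketched as below. The function and field names here are illustrative, not the repo's actual module; the GPT-4 call itself (via the OpenAI API) is omitted, and only the prompt construction, response parsing, and CSV serialization are shown:

```python
import csv
import io

def build_qa_prompt(article_text: str, n_pairs: int = 5) -> str:
    """Prompt sent to GPT-4 asking for question/answer pairs for one article."""
    return (
        f"Generate {n_pairs} question and answer pairs from the article below.\n"
        "Format each pair as:\nQ: <question>\nA: <answer>\n\n"
        f"Article:\n{article_text}"
    )

def parse_qa_response(response_text: str) -> list[tuple[str, str]]:
    """Parse 'Q: ... / A: ...' lines from the model's reply into (q, a) tuples."""
    pairs, question = [], None
    for line in response_text.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append((question, line[2:].strip()))
            question = None
    return pairs

def pairs_to_csv(pairs: list[tuple[str, str]]) -> str:
    """Serialize the pairs as CSV text, ready to write to the Beam storage volume."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["question", "answer"])
    writer.writerows(pairs)
    return buf.getvalue()
```

In the real pipeline, `build_qa_prompt` would be sent to GPT-4 and the reply fed through `parse_qa_response` before appending the rows to the CSV on the Beam volume.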
- Inference app: on a Streamlit chat UI, the user asks a question, which goes as input to Vectara to embed and search for the most relevant chunks across all the news articles by cosine similarity. Llama 2 7B is hosted for inference on Beam as a RESTful API, which is then called to synthesize the final response.
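Vectara performs the embedding and search server-side, but the cosine-similarity retrieval it implements amounts to the following minimal sketch (plain Python, illustrative only):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_chunks(query_vec: list[float],
                 chunks: list[tuple[str, list[float]]],
                 k: int = 3) -> list[str]:
    """Return the k chunk texts whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in chunks]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:k]]
```

The retrieved chunks are stuffed into the prompt that is sent to the Llama 2 endpoint for the final answer.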
- Training app: deployed as a scheduler that runs monthly. It uses PEFT LoRA and the Hugging Face Transformers library for fine-tuning; the hyperparameters and prompter are the same as those used for Alpaca.
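Since the prompter follows Alpaca, the prompt template looks like the standard Alpaca one below. The LoRA values in the dict are the typical alpaca-lora defaults, not values read from this repo, so treat them as assumptions and confirm against the training script:

```python
from typing import Optional

# Typical alpaca-lora LoRA hyperparameters (assumed; check the training app).
LORA_CONFIG = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "v_proj"],
}

def alpaca_prompt(instruction: str, context: Optional[str] = None) -> str:
    """Build a prompt in the Alpaca template, with or without an input field."""
    if context:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```

During fine-tuning, each GPT-4-generated question becomes the instruction (with the retrieved article text as the input) and the answer becomes the target response.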
To set up, you need to get the following tokens and API keys, then create a .env file as below and save it for all the pipelines:
```
AYLIEN_USERNAME=
AYLIEN_PASSWORD=
AYLIEN_APPID=
OPENAI_API=
VECTARA_CUSTOMER_ID=
VECTARA_CORPUS_ID=
VECTARA_API_KEY=
Beam_key=
HF_key=
```
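Each pipeline reads these values at startup; `python-dotenv`'s `load_dotenv()` is the usual way to do it, and a minimal stdlib-only equivalent looks like this (a sketch, not the repo's actual loader):

```python
import os

def load_env(path: str = ".env") -> None:
    """Load KEY=value lines from a .env file into os.environ.

    Mirrors what python-dotenv's load_dotenv() does; existing environment
    variables are not overwritten.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```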
Deploy each app one by one on Beam (on WSL). You need a Beam account first; detailed installation guidance is here: https://docs.beam.cloud/getting-started/installation
- Feature pipeline: `beam deploy app.py:FeaturePipeline`
- Inference pipeline: here we have to deploy the LLM as a REST API. `cd` into the llama2 folder and run `beam deploy app.py:generate`
- Training pipeline: `beam deploy app.py:train_model`
Then, inside the inference pipeline, you can start the chatbot UI with: `streamlit run app.py`
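Once the `generate` endpoint is deployed, the chat UI calls it over HTTP. A sketch of building that request follows; the auth scheme and JSON field names here are assumptions, so check the endpoint URL printed by `beam deploy` and the keys your `generate` app actually expects:

```python
import json
import urllib.request

def build_generate_request(beam_key: str, prompt: str) -> tuple[dict, bytes]:
    """Headers and JSON body for the deployed Llama 2 endpoint.

    The 'Basic' auth scheme and the 'prompt' field name are assumptions;
    adjust to match your Beam app.
    """
    headers = {
        "Authorization": f"Basic {beam_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"prompt": prompt}).encode()
    return headers, body

def call_llama2(endpoint_url: str, beam_key: str, prompt: str) -> str:
    """POST the prompt to the Beam REST API and return the generated text."""
    headers, body = build_generate_request(beam_key, prompt)
    req = urllib.request.Request(endpoint_url, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        # The 'response' key is likewise an assumption about the reply shape.
        return json.load(resp).get("response", "")
```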
Medium article: https://medium.com/@sarmadafzalj/rag-v-fine-tuning-why-not-harness-the-power-of-both-91c49a4744da
- Author: Sarmad Afzal
- LinkedIn: https://www.linkedin.com/in/sarmadafzal/
- GitHub: https://github.com/sarmadafzalj
- YouTube: https://www.youtube.com/@sarmadafzalj