In this repo you will find three folders:

- `slides`: the slides of the presentation;
- `data`: the data and the R script that loads it into the database;
- `chat`: a simple chat interface built with Svelte that lets you talk with your documents.
- To make everyone's life easier, I did something that you SHOULD never do in a real project: I put the database credentials in the code. This is a huge security flaw, but I did it to make it easier for you to run the code. Please, never do this in a real project.
- Keep in mind that every time you run something with MY KEY you are using my money. If you can afford it, please swap it for a key of your own. If not, use it, but be aware that I can run out of money and might not be able to get back to Portugal 🇵🇹
On the database side, the similarity search is done by this Postgres function:

```sql
drop function if exists public.match_maker (
  query_embedding vector (1536),
  similarity_threshold float,
  match_count int
);

create or replace function public.match_maker (
  query_embedding vector (1536),
  similarity_threshold float,
  match_count int
) returns table (id bigint, text text, similarity float)
language sql as $$
  SELECT
    id,
    text,
    1 - (schedule.embedding <=> query_embedding) AS similarity
  FROM schedule
  WHERE 1 - (schedule.embedding <=> query_embedding) > similarity_threshold
  ORDER BY schedule.embedding <=> query_embedding
  LIMIT match_count;
$$;
```
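To make the score concrete: pgvector's distance operators (`<->` is Euclidean distance, `<=>` is cosine distance) return a distance, and the function converts it into a similarity with `1 - distance`. Here is a minimal plain-JavaScript sketch of the cosine version (the vectors are tiny made-up examples; real OpenAI embeddings have 1536 dimensions):

```javascript
// Cosine distance between two vectors, as pgvector's `<=>` computes it.
function cosineDistance(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Similarity as the SQL function computes it: 1 - distance.
const similarity = (a, b) => 1 - cosineDistance(a, b);

console.log(similarity([1, 0], [1, 0])); // identical vectors → 1
console.log(similarity([1, 0], [0, 1])); // orthogonal vectors → 0
```

A similarity of 1 means the texts point in the same direction in embedding space; the threshold just cuts off rows that are too far from the question.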
To run the chat, you will need Node.js installed on your machine. If you don't have it, you can download it here.

After you have Node.js installed, run the following commands in your terminal:

```shell
cd chat
npm install
npm run dev
```

After that, you will be able to access the chat in your browser.
To load the data into the database, you will need R installed on your machine. If you don't have it, you can download it here.

After you have R installed, run the following commands in your terminal:

```shell
cd data
Rscript data.R
```

Alternatively, you can open the `data.R` file in RStudio and run the script there.

After that, the data will be in the database and you will be able to chat with your documents.
You can actually use any language to make the API call to OpenAI and load the data into the database.
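For example, here is a minimal JavaScript sketch of the embeddings call that `data.R` makes. The endpoint and model name are OpenAI's public ones; the function names and the `OPENAI_API_KEY` environment variable are illustrative assumptions.

```javascript
// Build the HTTP request for OpenAI's embeddings endpoint.
// Assumes your key is in the OPENAI_API_KEY environment variable.
function buildEmbeddingRequest(text) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: "text-embedding-ada-002", input: text }),
  };
}

// The actual network call (not executed here).
async function getEmbedding(text) {
  const res = await fetch(
    "https://api.openai.com/v1/embeddings",
    buildEmbeddingRequest(text)
  );
  const json = await res.json();
  return json.data[0].embedding; // an array of 1536 numbers for ada-002
}
```

Store the returned array in a `vector(1536)` column and the rest of the pipeline stays the same.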
You can use any database you want. You just need to change the connection string in the `data.R` file and learn how to do a similarity search on your database. If you are using PostgreSQL, you can use the pgvector extension (which this project already uses) to do so.
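Whatever store you pick, the search itself boils down to: score every row against the query embedding, keep the rows above the threshold, sort by similarity, and return the top matches. Here is a hedged in-memory JavaScript sketch of what `match_maker` does (the rows and embeddings are made up; real ones have 1536 dimensions):

```javascript
// Cosine similarity between two vectors.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score, filter by threshold, sort descending, and take the top matches,
// mirroring the SQL function's WHERE / ORDER BY / LIMIT.
function matchMaker(rows, queryEmbedding, similarityThreshold, matchCount) {
  return rows
    .map((row) => ({
      ...row,
      similarity: cosineSimilarity(row.embedding, queryEmbedding),
    }))
    .filter((row) => row.similarity > similarityThreshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, matchCount);
}

const rows = [
  { id: 1, text: "Opening talk", embedding: [1, 0] },
  { id: 2, text: "Lunch break", embedding: [0, 1] },
  { id: 3, text: "Closing talk", embedding: [0.9, 0.1] },
];
console.log(matchMaker(rows, [1, 0], 0.5, 2).map((r) => r.id)); // → [1, 3]
```

A real database does the same thing with an index instead of a full scan, which is why it stays fast on large tables.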
You can use any other API that does the same thing. You just need to change the API call in the `data.R` file. There are several other embedding APIs that do the same thing.
You can use any other frontend framework you want. You just need to change the code in the `chat` folder.
The OpenAI API is pretty much language agnostic, so you can use any language you want.
It will cost you more money. Be aware of that.
Always double-check the results. AI is not perfect and it can make mistakes. It is also worth spending some time refining your initial prompt to get better results. For example, it's always a good idea to have the model identify the page of the document the answer comes from, so that you can double-check it.
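For instance, here is a sketch of a prompt that asks the model to cite the page each answer comes from. The wording and the chunk format are illustrative assumptions, not the prompt used in the `chat` folder.

```javascript
// Build a grounded prompt from retrieved chunks, tagging each chunk
// with its page number so the model can cite it back.
function buildPrompt(question, chunks) {
  const context = chunks.map((c) => `[page ${c.page}] ${c.text}`).join("\n");
  return [
    "Answer using only the context below.",
    "After the answer, cite the page number(s) you used, like (p. 3).",
    "If the answer is not in the context, say you don't know.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const prompt = buildPrompt("When is lunch?", [
  { page: 2, text: "Lunch is served at 12:30 in the main hall." },
]);
console.log(prompt);
```

When the model answers "(p. 2)", you can open page 2 and verify the claim yourself.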
I would not recommend it. This is a simple example and it does a lot of things you should not do in a real project, though those are easy to fix. There are three main things you should be aware of:

- The model can ruin your reputation. We have all seen GPT go off the rails and say things that are not true. You should always double-check the results, and you cannot do that if the model is talking directly with your readers.
- People might start using the model for things that are not in the documents. Every time I see an LLM on a website, I ask it to write a Python script to solve my homework. It usually does. You should be aware of that.
- The price of the API can get out of control. If you have a lot of readers, you might not be able to afford it.