diff --git a/.github/workflows/pytest.yaml b/.github/workflows/pytest.yaml index 93d0f0832..21d2e1985 100644 --- a/.github/workflows/pytest.yaml +++ b/.github/workflows/pytest.yaml @@ -64,6 +64,7 @@ jobs: run: make test-api-unit env: LFAI_RUN_REPEATER_TESTS: true + DEV: true integration: runs-on: ai-ubuntu-big-boy-8-core diff --git a/packages/api/chart/values.yaml b/packages/api/chart/values.yaml index 65b397e46..4c217ba8a 100644 --- a/packages/api/chart/values.yaml +++ b/packages/api/chart/values.yaml @@ -25,6 +25,8 @@ api: value: "*.toml" - name: DEFAULT_EMBEDDINGS_MODEL value: "text-embeddings" + - name: DEV + value: "false" - name: PORT value: "8080" - name: SUPABASE_URL diff --git a/packages/api/values/registry1-values.yaml b/packages/api/values/registry1-values.yaml index d269c6415..4bd35ee39 100644 --- a/packages/api/values/registry1-values.yaml +++ b/packages/api/values/registry1-values.yaml @@ -16,6 +16,8 @@ api: value: "*.toml" - name: DEFAULT_EMBEDDINGS_MODEL value: "###ZARF_VAR_DEFAULT_EMBEDDINGS_MODEL###" + - name: DEV + value: "###ZARF_VAR_DEV###" - name: PORT value: "8080" - name: SUPABASE_URL diff --git a/packages/api/values/upstream-values.yaml b/packages/api/values/upstream-values.yaml index 6d867260e..ef2dcdad9 100644 --- a/packages/api/values/upstream-values.yaml +++ b/packages/api/values/upstream-values.yaml @@ -14,6 +14,8 @@ api: value: "*.toml" - name: DEFAULT_EMBEDDINGS_MODEL value: "###ZARF_VAR_DEFAULT_EMBEDDINGS_MODEL###" + - name: DEV + value: "###ZARF_VAR_DEV###" - name: PORT value: "8080" - name: SUPABASE_URL diff --git a/packages/api/zarf.yaml b/packages/api/zarf.yaml index 4fa6c59f2..92b3c8123 100644 --- a/packages/api/zarf.yaml +++ b/packages/api/zarf.yaml @@ -16,6 +16,9 @@ variables: description: "Flag to expose the OpenAPI schema for debugging." - name: DEFAULT_EMBEDDINGS_MODEL default: "text-embeddings" + - name: DEV + default: "false" + description: "Flag to enable development endpoints." 
 components: - name: leapfrogai-api diff --git a/src/leapfrogai_api/README.md b/src/leapfrogai_api/README.md index eec4dd0c6..214c986a9 100644 --- a/src/leapfrogai_api/README.md +++ b/src/leapfrogai_api/README.md @@ -56,3 +56,72 @@ See the ["Access" section of the DEVELOPMENT.md](../../docs/DEVELOPMENT.md#acces ### Tests See the [tests directory documentation](../../tests/README.md) for more details. + +### Reranking Configuration + +The LeapfrogAI API includes a Retrieval-Augmented Generation (RAG) pipeline for question answering over vector store content. This section describes how to configure its reranking options. All RAG configuration is managed through the `/leapfrogai/v1/rag/configure` API endpoint, a development feature that is only registered when the API is started with `DEV=true` (Zarf variable `DEV`, default `"false"`). + +#### 1. Enabling/Disabling Reranking + +Reranking reorders the retrieved results by their relevance to the query, which can improve the quality of the context passed to the model. You can enable or disable it with the `enable_reranking` parameter: + +* **Enable Reranking:** Send a PATCH request to `/leapfrogai/v1/rag/configure` with the following JSON payload: + +```json +{ + "enable_reranking": true +} +``` + +* **Disable Reranking:** Send a PATCH request with: + +```json +{ + "enable_reranking": false +} +``` + +#### 2. Selecting a Reranking Model + +Multiple reranking models are supported, each with different performance characteristics. Choose a model with the `ranking_model` parameter. Only the `flashrank` dependencies are installed by default (`rerankers[flashrank]`); other models may require additional Python dependencies, which are described in the [rerankers library documentation](https://github.com/AnswerDotAI/rerankers). + +* **Supported Models:** Options include (but are not limited to) `flashrank`, `rankllm`, `cross-encoder`, and `colbert`. Refer to the [rerankers library documentation](https://github.com/AnswerDotAI/rerankers) for a complete list and details on their capabilities.
+ +* **Model Selection:** Use a PATCH request to `/leapfrogai/v1/rag/configure` with the desired model. Reranking must be enabled for the model to be used; because PATCH only updates the fields you send, `enable_reranking` can be omitted if it is already `true`: + +```json +{ + "enable_reranking": true, + "ranking_model": "rankllm" +} +``` + +#### 3. Adjusting the Number of Results Before Reranking (`rag_top_k_when_reranking`) + +This parameter sets how many top results are retrieved from the vector database *before* reranking begins; after reranking, the list is truncated to the `k` requested by the caller. A higher value gives the reranker more candidates to choose from but increases processing time. A lower value may exclude relevant results before the reranker ever sees them. This setting only applies when reranking is enabled. + +* **Configuration:** Use a PATCH request to `/leapfrogai/v1/rag/configure` to set this value (150 is only an example): + +```json +{ + "enable_reranking": true, + "ranking_model": "flashrank", + "rag_top_k_when_reranking": 150 +} +``` + +#### 4. Retrieving the Current RAG Configuration + +To check the current RAG configuration (including reranking status, model, and `rag_top_k_when_reranking`), send a GET request to `/leapfrogai/v1/rag/configure`. The response is a JSON object containing the current settings. + +#### 5. Example Configuration Flow + +1. **Initial Setup:** By default, reranking is enabled with the `flashrank` model and a `rag_top_k_when_reranking` value of 100. + +2. **Experiment with Models:** Test different reranking models (`rankllm`, `colbert`, etc.) by changing the `ranking_model` parameter and observing the impact on response quality. Install any extra dependencies these models need first. + +3. **Fine-tuning:** Once you identify a suitable model, tune `rag_top_k_when_reranking` to balance response quality against response time, monitoring both as you adjust it. + +4. **Disabling Reranking:** If needed, disable reranking by setting `"enable_reranking": false`.
+ +Remember to always consult the [rerankers library documentation](https://github.com/AnswerDotAI/rerankers) for information on supported models and their specific requirements. The API documentation provides further details on request formats and potential error responses. diff --git a/src/leapfrogai_api/backend/rag/query.py b/src/leapfrogai_api/backend/rag/query.py index e5e0decce..bd0ae9bf6 100644 --- a/src/leapfrogai_api/backend/rag/query.py +++ b/src/leapfrogai_api/backend/rag/query.py @@ -1,11 +1,15 @@ """Service for querying the RAG model.""" +from rerankers.results import RankedResults from supabase import AClient as AsyncClient from langchain_core.embeddings import Embeddings from leapfrogai_api.backend.rag.leapfrogai_embeddings import LeapfrogAIEmbeddings from leapfrogai_api.data.crud_vector_content import CRUDVectorContent -from leapfrogai_api.typedef.vectorstores.search_types import SearchResponse +from leapfrogai_api.typedef.rag.rag_types import ConfigurationSingleton +from leapfrogai_api.typedef.vectorstores.search_types import SearchResponse, SearchItem from leapfrogai_api.backend.constants import TOP_K +from leapfrogai_api.utils.logging_tools import logger +from rerankers import Reranker # Allows for overwriting type of embeddings that will be instantiated embeddings_type: type[Embeddings] | type[LeapfrogAIEmbeddings] | None = ( @@ -22,7 +26,10 @@ def __init__(self, db: AsyncClient) -> None: self.embeddings = embeddings_type() async def query_rag( - self, query: str, vector_store_id: str, k: int = TOP_K + self, + query: str, + vector_store_id: str, + k: int = TOP_K, ) -> SearchResponse: """ Query the Vector Store. @@ -36,11 +43,70 @@ async def query_rag( SearchResponse: The search response from the vector store. """ + logger.debug("Beginning RAG query...") + # 1. Embed query vector = await self.embeddings.aembed_query(query) # 2. 
 Perform similarity search + _k: int = k + if ConfigurationSingleton.get_instance().enable_reranking: + # Use the user-specified top-k value unless reranking. When reranking, + # fetch rag_top_k_when_reranking initial results instead, then narrow the + # list down to the k the user requested after reranking. + _k = ConfigurationSingleton.get_instance().rag_top_k_when_reranking + + crud_vector_content = CRUDVectorContent(db=self.db) - return await crud_vector_content.similarity_search( - query=vector, vector_store_id=vector_store_id, k=k + results = await crud_vector_content.similarity_search( + query=vector, vector_store_id=vector_store_id, k=_k ) + + # 3. Rerank results + if ( + ConfigurationSingleton.get_instance().enable_reranking + and len(results.data) > 0 + ): + ranker = Reranker(ConfigurationSingleton.get_instance().ranking_model) + ranked_results: RankedResults = ranker.rank( + query=query, + docs=[result.content for result in results.data], + doc_ids=[result.id for result in results.data], + ) + results = rerank_search_response(results, ranked_results) + # Narrow down the results to the top-k value specified by the user + results.data = results.data[0:k] + + logger.debug("Ending RAG query...") + + return results + + +def rerank_search_response( + original_response: SearchResponse, ranked_results: RankedResults +) -> SearchResponse: + """ + Reorder the SearchResponse based on reranked results. + + Args: + original_response (SearchResponse): The original search response. + ranked_results (RankedResults): The reranker output, in ranked order. + + Returns: + SearchResponse: A new SearchResponse whose items are reordered and annotated with rank and score.
+ """ + # Map each id to its original SearchItem + id_to_item = {item.id: item for item in original_response.data} + + # Reorder the original SearchItems to follow the reranked order, recording rank and score + ranked_items = [] + for content in ranked_results.results: + if content.document.doc_id in id_to_item: + item: SearchItem = id_to_item[content.document.doc_id] + item.rank = content.rank + item.score = content.score + ranked_items.append(item) + + # Create a new SearchResponse with the reranked items + ranked_response = SearchResponse(data=ranked_items) + + return ranked_response diff --git a/src/leapfrogai_api/main.py b/src/leapfrogai_api/main.py index 85822f7f3..f9b3682d4 100644 --- a/src/leapfrogai_api/main.py +++ b/src/leapfrogai_api/main.py @@ -14,6 +14,7 @@ from leapfrogai_api.routers.leapfrogai import models as lfai_models from leapfrogai_api.routers.leapfrogai import vector_stores as lfai_vector_stores from leapfrogai_api.routers.leapfrogai import count as lfai_token_count +from leapfrogai_api.routers.leapfrogai import rag as lfai_rag from leapfrogai_api.routers.openai import ( assistants, audio, @@ -81,6 +82,8 @@ async def validation_exception_handler(request, exc): app.include_router(messages.router) app.include_router(runs_steps.router) app.include_router(lfai_vector_stores.router) +if os.environ.get("DEV", "false").lower() == "true": + app.include_router(lfai_rag.router) app.include_router(lfai_token_count.router) app.include_router(lfai_models.router) # This should be at the bottom to prevent it preempting more specific runs endpoints diff --git
 a/src/leapfrogai_api/routers/leapfrogai/rag.py b/src/leapfrogai_api/routers/leapfrogai/rag.py new file mode 100644 index 000000000..3b61b616e --- /dev/null +++ b/src/leapfrogai_api/routers/leapfrogai/rag.py @@ -0,0 +1,56 @@ +"""LeapfrogAI endpoints for RAG.""" + +from fastapi import APIRouter +from leapfrogai_api.typedef.rag.rag_types import ( + ConfigurationSingleton, + ConfigurationPayload, +) +from leapfrogai_api.routers.supabase_session import Session +from leapfrogai_api.utils.logging_tools import logger + +router = APIRouter(prefix="/leapfrogai/v1/rag", tags=["leapfrogai/rag"]) + + +@router.patch("/configure") +async def configure(session: Session, configuration: ConfigurationPayload) -> None: + """ + Configures the RAG settings at runtime. + + Args: + session (Session): The database session. + configuration (ConfigurationPayload): The settings to update; fields left unset keep their current values. + """ + + # We set the class variable to update the configuration globally + ConfigurationSingleton._instance = ConfigurationSingleton.get_instance().model_copy( + update=configuration.model_dump(exclude_none=True) + ) + + +@router.get("/configure") +async def get_configuration(session: Session) -> ConfigurationPayload: + """ + Retrieves the current RAG configuration. + + Args: + session (Session): The database session. + + Returns: + ConfigurationPayload: The current RAG configuration.
+ """ + + instance = ConfigurationSingleton.get_instance() + + # Create a new dictionary with only the relevant attributes + config_dict = { + key: value + for key, value in instance.__dict__.items() + if not key.startswith("_") # Exclude private attributes + } + + # Create a new ConfigurationPayload instance with the filtered dictionary + new_configuration = ConfigurationPayload(**config_dict) + + logger.info(f"The current configuration is {new_configuration}") + + return new_configuration diff --git a/src/leapfrogai_api/routers/leapfrogai/vector_stores.py b/src/leapfrogai_api/routers/leapfrogai/vector_stores.py index 09f8f4a77..5251440c1 100644 --- a/src/leapfrogai_api/routers/leapfrogai/vector_stores.py +++ b/src/leapfrogai_api/routers/leapfrogai/vector_stores.py @@ -33,9 +33,7 @@ async def search( """ query_service = QueryService(db=session) return await query_service.query_rag( - query=query, - vector_store_id=vector_store_id, - k=k, + query=query, vector_store_id=vector_store_id, k=k ) diff --git a/src/leapfrogai_api/typedef/rag/__init__.py b/src/leapfrogai_api/typedef/rag/__init__.py new file mode 100644 index 000000000..65c2e26cd --- /dev/null +++ b/src/leapfrogai_api/typedef/rag/__init__.py @@ -0,0 +1,3 @@ +from .rag_types import ( + ConfigurationSingleton as ConfigurationSingleton, +) diff --git a/src/leapfrogai_api/typedef/rag/rag_types.py b/src/leapfrogai_api/typedef/rag/rag_types.py new file mode 100644 index 000000000..17fe6601c --- /dev/null +++ b/src/leapfrogai_api/typedef/rag/rag_types.py @@ -0,0 +1,40 @@ +from typing import Optional + +from pydantic import BaseModel, Field + + +class ConfigurationSingleton: + """Singleton manager for ConfigurationPayload.""" + + _instance = None + + @classmethod + def get_instance(cls): + if cls._instance is None: + cls._instance = ConfigurationPayload() + cls._instance.enable_reranking = True + cls._instance.rag_top_k_when_reranking = 100 + cls._instance.ranking_model = "flashrank" + return
 cls._instance + + +class ConfigurationPayload(BaseModel): + """Request and response payload for RAG configuration.""" + + enable_reranking: Optional[bool] = Field( + default=None, + examples=[True, False], + description="Enables reranking for RAG queries", + ) + # More model info can be found here: + # https://github.com/AnswerDotAI/rerankers?tab=readme-ov-file + # https://pypi.org/project/rerankers/ + ranking_model: Optional[str] = Field( + default=None, + description="The model to use for reranking. Some options may require additional Python dependencies.", + examples=["flashrank", "rankllm", "cross-encoder", "colbert"], + ) + rag_top_k_when_reranking: Optional[int] = Field( + default=None, + description="The number of results retrieved from the vector store before reranking", + ) diff --git a/src/leapfrogai_api/typedef/vectorstores/search_types.py b/src/leapfrogai_api/typedef/vectorstores/search_types.py index d8d2a2d13..ea69df1fe 100644 --- a/src/leapfrogai_api/typedef/vectorstores/search_types.py +++ b/src/leapfrogai_api/typedef/vectorstores/search_types.py @@ -1,3 +1,5 @@ +from typing import Optional + from pydantic import BaseModel, Field @@ -25,6 +27,14 @@ class SearchItem(BaseModel): similarity: float = Field( ..., description="Similarity score of this item to the query."
) + rank: Optional[int] = Field( + default=None, + description="The rank of this search item after ranking has occurred.", + ) + score: Optional[float] = Field( + default=None, + description="The score of this search item after ranking has occurred.", + ) class SearchResponse(BaseModel): diff --git a/src/leapfrogai_api/utils/logging_tools.py b/src/leapfrogai_api/utils/logging_tools.py new file mode 100644 index 000000000..aa2448288 --- /dev/null +++ b/src/leapfrogai_api/utils/logging_tools.py @@ -0,0 +1,12 @@ +import os +import logging +from dotenv import load_dotenv + +load_dotenv() + +logging.basicConfig( + level=os.getenv("LFAI_LOG_LEVEL", logging.INFO), + format="%(name)s: %(asctime)s | %(levelname)s | %(filename)s:%(lineno)s >>> %(message)s", +) + +logger = logging.getLogger(__name__) diff --git a/src/leapfrogai_evals/pyproject.toml b/src/leapfrogai_evals/pyproject.toml index 1974da81a..9726c51c0 100644 --- a/src/leapfrogai_evals/pyproject.toml +++ b/src/leapfrogai_evals/pyproject.toml @@ -8,7 +8,7 @@ version = "0.13.1" dependencies = [ "deepeval == 1.3.0", - "openai == 1.42.0", + "openai == 1.45.0", "tqdm == 4.66.5", "python-dotenv == 1.0.1", "seaborn == 0.13.2", @@ -16,7 +16,8 @@ dependencies = [ "huggingface-hub == 0.24.6", "anthropic ==0.34.2", "instructor ==1.4.3", - "pyPDF2 == 3.0.1" + "pyPDF2 == 3.0.1", + "python-dotenv == 1.0.1" ] requires-python = "~=3.11" readme = "README.md" diff --git a/tests/integration/api/test_rag_files.py b/tests/integration/api/test_rag_files.py index 45f832418..7520ddbcc 100644 --- a/tests/integration/api/test_rag_files.py +++ b/tests/integration/api/test_rag_files.py @@ -1,9 +1,13 @@ import os +from typing import Optional + +import requests from openai.types.beta.threads.text import Text import pytest from tests.utils.data_path import data_path -from tests.utils.client import client_config_factory +from leapfrogai_api.typedef.rag.rag_types import ConfigurationPayload +from tests.utils.client import client_config_factory, 
 get_leapfrogai_api_url_base def make_test_assistant(client, model, vector_store_id): @@ -77,3 +81,66 @@ def test_rag_needle_haystack(): for a in message_content.annotations: print(a.text) + + +def configure_rag( + enable_reranking: bool, + ranking_model: str, + rag_top_k_when_reranking: int, +): + """ + Configures the RAG settings. + + Args: + enable_reranking: Whether to enable reranking. + ranking_model: The ranking model to use. + rag_top_k_when_reranking: The number of results to retrieve before reranking. + """ + url = f"{get_leapfrogai_api_url_base()}/leapfrogai/v1/rag/configure" + configuration = ConfigurationPayload( + enable_reranking=enable_reranking, + ranking_model=ranking_model, + rag_top_k_when_reranking=rag_top_k_when_reranking, + ) + + try: + response = requests.patch(url, json=configuration.model_dump()) + response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx) + print("RAG configuration updated successfully.") + except requests.exceptions.RequestException as e: + print(f"Error configuring RAG: {e}") + + +def get_rag_configuration() -> Optional[ConfigurationPayload]: + """ + Retrieves the current RAG configuration. + + The API base URL is resolved with + get_leapfrogai_api_url_base(). + + Returns: + The RAG configuration, or None if there was an error.
+ """ + url = f"{get_leapfrogai_api_url_base()}/leapfrogai/v1/rag/configure" + + try: + response = requests.get(url) + response.raise_for_status() + config = ConfigurationPayload.model_validate_json(response.text) + print(f"Current RAG configuration: {config}") + return config + except requests.exceptions.RequestException as e: + print(f"Error getting RAG configuration: {e}") + return None + + +@pytest.mark.skipif( + os.environ.get("LFAI_RUN_NIAH_TESTS") != "true", + reason="LFAI_RUN_NIAH_TESTS envvar was not set to true", +) +def test_rag_needle_haystack_with_reranking(): + configure_rag(True, "flashrank", 100) + config_result = get_rag_configuration() + assert config_result is not None + assert config_result.enable_reranking is True + test_rag_needle_haystack() diff --git a/tests/pytest/leapfrogai_api/test_api.py b/tests/pytest/leapfrogai_api/test_api.py index 724b0dc58..ec6460fda 100644 --- a/tests/pytest/leapfrogai_api/test_api.py +++ b/tests/pytest/leapfrogai_api/test_api.py @@ -32,6 +32,7 @@ ) TEXT_INPUT_LEN = len(TEXT_INPUT) + ######################### ######################### @@ -147,6 +148,7 @@ def test_routes(): "/openai/v1/files": ["POST"], "/openai/v1/assistants": ["POST"], "/leapfrogai/v1/count/tokens": ["POST"], + "/leapfrogai/v1/rag/configure": ["GET", "PATCH"], } openai_routes = [ @@ -196,10 +198,14 @@ def test_routes(): ] actual_routes = app.routes - for route in actual_routes: - if hasattr(route, "path") and route.path in expected_routes: - assert route.methods == set(expected_routes[route.path]) - del expected_routes[route.path] + for expected_route in expected_routes: + matching_routes = {expected_route: []} + for actual_route in actual_routes: + if hasattr(actual_route, "path") and expected_route == actual_route.path: + matching_routes[actual_route.path].extend(actual_route.methods) + assert set(expected_routes[expected_route]) <= set( + matching_routes[expected_route] + ) for route, name, methods in openai_routes: found = False @@ -214,8 
+220,6 @@ def test_routes(): break assert found, f"Missing route: {route}, {name}, {methods}" - assert len(expected_routes) == 0 - def test_healthz(): """Test the healthz endpoint.""" @@ -535,3 +539,55 @@ def test_token_count(dummy_auth_middleware): assert "token_count" in response_data assert isinstance(response_data["token_count"], int) assert response_data["token_count"] == len(input_text) + + +@pytest.mark.skipif( + os.environ.get("LFAI_RUN_REPEATER_TESTS") != "true" + or os.environ.get("DEV") != "true", + reason="LFAI_RUN_REPEATER_TESTS and DEV envvars must both be set to true", +) +def test_configure(dummy_auth_middleware): + """Test the RAG configuration endpoints.""" + with TestClient(app) as client: + rag_configuration_request = { + "enable_reranking": True, + "ranking_model": "rankllm", + "rag_top_k_when_reranking": 50, + } + response = client.patch( + "/leapfrogai/v1/rag/configure", json=rag_configuration_request + ) + assert response.status_code == 200 + + response = client.get("/leapfrogai/v1/rag/configure") + assert response.status_code == 200 + response_data = response.json() + assert "enable_reranking" in response_data + assert "ranking_model" in response_data + assert "rag_top_k_when_reranking" in response_data + assert isinstance(response_data["enable_reranking"], bool) + assert isinstance(response_data["ranking_model"], str) + assert isinstance(response_data["rag_top_k_when_reranking"], int) + assert response_data["enable_reranking"] is True + assert response_data["ranking_model"] == "rankllm" + assert response_data["rag_top_k_when_reranking"] == 50 + + # Update only some of the configs to see if the existing ones persist + rag_configuration_request = {"ranking_model": "flashrank"} + response = client.patch( + "/leapfrogai/v1/rag/configure", json=rag_configuration_request + ) + assert response.status_code == 200 + + response = client.get("/leapfrogai/v1/rag/configure") + assert response.status_code == 200 + response_data = response.json() + assert
"enable_reranking" in response_data + assert "ranking_model" in response_data + assert "rag_top_k_when_reranking" in response_data + assert isinstance(response_data["enable_reranking"], bool) + assert isinstance(response_data["ranking_model"], str) + assert isinstance(response_data["rag_top_k_when_reranking"], int) + assert response_data["enable_reranking"] is True + assert response_data["ranking_model"] == "flashrank" + assert response_data["rag_top_k_when_reranking"] == 50
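The reranking path in `query_rag` over-fetches `rag_top_k_when_reranking` candidates, reorders them by the reranker's output, and then truncates the list to the caller's `k`. The following is a minimal, dependency-free sketch of that reorder-and-truncate step, not the PR's code: `Item` and `Ranked` are hypothetical stand-ins for `SearchItem` and the entries of `RankedResults.results`, and unlike `rerank_search_response` it sorts by rank explicitly rather than relying on the reranker's output order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical stand-in for SearchItem."""

    id: str
    content: str
    rank: Optional[int] = None
    score: Optional[float] = None


@dataclass
class Ranked:
    """Hypothetical stand-in for one reranker result."""

    doc_id: str
    rank: int
    score: float


def rerank_and_truncate(items: list[Item], ranked: list[Ranked], k: int) -> list[Item]:
    """Reorder items to follow the reranker, record rank/score, keep the top k."""
    by_id = {item.id: item for item in items}
    reordered: list[Item] = []
    # Sort defensively by rank (1 = most relevant) instead of trusting input order.
    for result in sorted(ranked, key=lambda r: r.rank):
        item = by_id.get(result.doc_id)
        if item is None:
            continue  # ignore ids that were never retrieved
        item.rank = result.rank
        item.score = result.score
        reordered.append(item)
    # Truncate to the k results the caller asked for.
    return reordered[:k]


candidates = [Item("a", "alpha"), Item("b", "beta"), Item("c", "gamma")]
ranking = [Ranked("b", 3, 0.1), Ranked("c", 1, 0.9), Ranked("a", 2, 0.5)]
top = rerank_and_truncate(candidates, ranking, k=2)
print([item.id for item in top])  # ['c', 'a']
```

Like the PR's helper, this mutates the original items in place; copying them first would be the alternative if callers still need the pre-rerank objects.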
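A bare truthiness check such as `if os.environ.get("DEV"):` enables the development endpoints for any non-empty value, including the chart's default string `"false"`. A minimal sketch of a stricter check, assuming `"true"` (case-insensitive) is the only value that should enable them; `dev_endpoints_enabled` is a hypothetical helper name:

```python
import os
from typing import Mapping, Optional


def dev_endpoints_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when DEV is the string "true" (case-insensitive)."""
    source = os.environ if env is None else env
    # A missing DEV variable is treated the same as DEV="false".
    return source.get("DEV", "false").strip().lower() == "true"


# bool("false") is True, which is why a bare truthiness check is not enough.
print(dev_endpoints_enabled({"DEV": "false"}))  # False
print(dev_endpoints_enabled({"DEV": "true"}))  # True
```

In `main.py` this would read `if dev_endpoints_enabled(): app.include_router(lfai_rag.router)`; the workflow's `DEV: true` reaches the process as the string `"true"`, so CI behavior is unchanged.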
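The PATCH handler merges the request into the current settings and skips fields that were left as `None`. That is what lets the second half of `test_configure` change only `ranking_model` while `enable_reranking` and `rag_top_k_when_reranking` persist. The same semantics, sketched with plain dictionaries (`apply_patch` is a hypothetical name; the PR does this with a Pydantic copy-with-update of the singleton):

```python
from typing import Any, Optional


def apply_patch(current: dict[str, Any], patch: dict[str, Optional[Any]]) -> dict[str, Any]:
    """Return a new config in which non-None patch fields override current ones."""
    updates = {key: value for key, value in patch.items() if value is not None}
    return {**current, **updates}


defaults = {"enable_reranking": True, "ranking_model": "flashrank", "rag_top_k_when_reranking": 100}
updated = apply_patch(defaults, {"ranking_model": "rankllm", "enable_reranking": None})
print(updated)
# {'enable_reranking': True, 'ranking_model': 'rankllm', 'rag_top_k_when_reranking': 100}
```

One consequence of this design is that a client cannot reset a field back to `null` through the endpoint; omitting a field and sending it as `null` behave the same.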
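The README describes the endpoint in terms of raw PATCH and GET requests. A minimal standard-library sketch of building those requests; `build_rag_configure_request` is a hypothetical helper, the base URL is whatever your deployment exposes (the chart sets `PORT` to 8080), and sending the token as a bearer `Authorization` header is an assumption about how your deployment authenticates:

```python
import json
import urllib.request
from typing import Any, Optional


def build_rag_configure_request(
    base_url: str,
    payload: Optional[dict[str, Any]] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build a PATCH (with payload) or GET (without) request for the RAG config endpoint."""
    url = f"{base_url.rstrip('/')}/leapfrogai/v1/rag/configure"
    headers = {"Accept": "application/json"}
    if token is not None:
        # Assumption: bearer-token auth; adapt to your deployment's auth scheme.
        headers["Authorization"] = f"Bearer {token}"
    if payload is None:
        return urllib.request.Request(url, headers=headers, method="GET")
    headers["Content-Type"] = "application/json"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="PATCH")


req = build_rag_configure_request(
    "http://localhost:8080", {"enable_reranking": True, "ranking_model": "flashrank"}
)
print(req.get_method(), req.full_url)
```

Sending is then `urllib.request.urlopen(req)`, and the GET response body can be parsed with `json.load`.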