Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DataStax Astra DB vector store driver #1022

Closed

Conversation

hemidactylus
Copy link
Contributor

@hemidactylus hemidactylus commented Jul 26, 2024

Describe your changes

Adds an AstraDBVectorStoreDriver class, plus:

  • mkdocs.yaml
  • changelog
  • a new end-to-end example on the docs ("query a web page" with Astra DB for vector store)
  • a specific new extra in the pyproject.yaml
  • unit test
  • integration test (requires secrets/connection params to an Astra database)

Notes

I can run the integration testing on my machine (all green). Depending on Griptape's policy on integration testing, I can share secrets to a dedicated CI database or similar, just let me know if that is a valid option.

Also, I noticed that the number of returned vectors in the query method is generally called count (as I did and as other drivers do); but at least one example in the docs seems to refer to it as top_n. Is it preferred that the method accepts both parameter names, perhaps?

Issue ticket number and link

Closes #1021


📚 Documentation preview 📚: https://griptape--1022.org.readthedocs.build//1022/

@collindutter
Copy link
Member

Thanks for the PR @hemidactylus! Will review this in detail next week.

@collindutter collindutter self-requested a review July 26, 2024 22:12
Copy link

codecov bot commented Jul 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

📢 Thoughts on this report? Let us know!

@hemidactylus
Copy link
Contributor Author

Thank you so much, @collindutter ! Meanwhile I should have addressed the test coverage issues reported above. Cheers!

Copy link
Member

@collindutter collindutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Thank you for putting the time into creating consistent documentation and passing tests -- it is greatly appreciated :)

griptape/drivers/vector/astra_db_vector_store_driver.py Outdated Show resolved Hide resolved
griptape/drivers/vector/astra_db_vector_store_driver.py Outdated Show resolved Hide resolved
griptape/drivers/vector/astra_db_vector_store_driver.py Outdated Show resolved Hide resolved
griptape/drivers/vector/astra_db_vector_store_driver.py Outdated Show resolved Hide resolved
griptape/drivers/vector/astra_db_vector_store_driver.py Outdated Show resolved Hide resolved
- simplify the 'query webpage Astra' markdown page
- add env.vars to the CI action yaml
- remove Griptape version detection
- remove mistakenly-added logging level setting
- moved the "indexing" value to a class attribute
- reinstated `dimensions` as a required parameter
- removed kwargs warnings from methods (and implications thereof)
- replaced implicit-bool checks with "is None" checks in a few places
- (testing) move sample docs and Entries to being pytest fixtures
- (testing) move utility function `_descore_entry` into a class method
@hemidactylus
Copy link
Contributor Author

(replying about the renaming of the mock_output parameter)

Sounds good to me -- renamed.

Only, I had to take out the Factory layer: the mock_output has to be a function, so I went for

default=lambda chunk: [0, 1]

(alternatively, I guess default=Factory(lambda: lambda chunk: [0, 1]) would work as well - but I suppose it's not necessary since the "innermost" lambda does not look like a 'dangerous' mutable thing. I may be mistaken though)

@collindutter
Copy link
Member

Great work again @hemidactylus! And thanks for working with me through the requested changes.

Just need to do some testing on a live AstraDb instance and then we'll be good to merge 🥳

@hemidactylus
Copy link
Contributor Author

That's great news @collindutter ! Do you want me to assist you with the (actually very simple) setup for integration testing? Please do let me know!

@collindutter
Copy link
Member

I appreciate the offer! I should be good, but will reach out if I have any questions.

@collindutter
Copy link
Member

@hemidactylus I tested it out, everything worked great! I'm recreating the PR in our repo so that integration tests can run with our secrets, but all the commits from your fork are still present.

Thanks again for the excellent integration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a vector store driver for DataStax Astra DB
2 participants