Explore S3 caching options #15

Open
mjuric opened this issue Dec 12, 2019 · 0 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

@mjuric (Member) commented Dec 12, 2019

We've talked about a use case where archives decide to keep datasets internally but put up an S3 API facade for remote access with AXS. For example, imagine the data physically residing at IPAC and MAST but being analyzed at TACC. The question then is whether access to the datasets can be transparently cached where AXS is running, for faster repeated access.

Option 1: Spark seems to have recently added support for caching remote datasets through the Delta cache. It's not clear to me whether this is broadly available or a Databricks-only feature. This should be the first thing to investigate.

Option 2: Another approach may be to have AXS access the files through a caching layer. I looked at S3 caching options and found there are many. Example:

(See also the list of additional projects at the bottom of the s3fs-fuse README.)
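To make Option 2 more concrete, here is a minimal Python sketch of a read-through cache: the first read of an object goes to the remote S3 endpoint and is saved to local disk, and later reads of the same object are served locally. It is not based on any of the projects mentioned above. `ReadThroughCache` and the `fetch(bucket, key)` callable (a stand-in for a real S3 GET, e.g. via boto3) are hypothetical names, and the sketch assumes archived objects are immutable, so it does no invalidation or eviction.

```python
import hashlib
import os
import tempfile
from typing import Callable


class ReadThroughCache:
    """Hypothetical on-disk read-through cache in front of a remote object store.

    `fetch` is an assumed callable (bucket, key) -> bytes standing in for an
    S3 GET; it is only invoked on a cache miss. Objects are assumed immutable.
    """

    def __init__(self, fetch: Callable[[str, str], bytes], cache_dir: str):
        self.fetch = fetch
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
        self.hits = 0
        self.misses = 0

    def _path(self, bucket: str, key: str) -> str:
        # Hash bucket/key so arbitrary S3 keys map to safe local filenames.
        digest = hashlib.sha256(f"{bucket}/{key}".encode()).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def get(self, bucket: str, key: str) -> bytes:
        path = self._path(bucket, key)
        if os.path.exists(path):
            # Cache hit: serve the object from local disk.
            self.hits += 1
            with open(path, "rb") as f:
                return f.read()
        # Cache miss: fetch remotely, then store locally.
        self.misses += 1
        data = self.fetch(bucket, key)
        # Write to a temp file and rename, so readers never see a partial object.
        fd, tmp = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
        return data


if __name__ == "__main__":
    # Hypothetical in-memory "remote archive" standing in for an S3 endpoint.
    remote = {("archive-bucket", "catalog/part-0000.parquet"): b"example bytes"}

    def fake_fetch(bucket: str, key: str) -> bytes:
        return remote[(bucket, key)]

    with tempfile.TemporaryDirectory() as d:
        cache = ReadThroughCache(fake_fetch, d)
        cache.get("archive-bucket", "catalog/part-0000.parquet")  # miss: fetched remotely
        cache.get("archive-bucket", "catalog/part-0000.parquet")  # hit: served from disk
        print(cache.hits, cache.misses)  # prints "1 1"
```

A real deployment would also need size-bounded eviction and, if an archive ever rewrites objects, some freshness check (for example comparing S3 ETags); handling those concerns is what the existing caching projects would provide.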

Opening this issue so we don't forget about this use case.

(@dennyglee, @zecevicp, any thoughts/ideas/comments?)
