We've talked about a use case where archives keep datasets internally but put up an S3 API facade for remote access with AXS. E.g., imagine the data physically resides at IPAC and MAST but is being analyzed at TACC. The question then is whether accesses to the datasets can be transparently cached where AXS is running, for faster repeated access.
Option 1: Spark seems to have recently added support for caching remote datasets through the Delta cache. It's not clear to me whether this is broadly available or a Databricks-only thing; this should be the first thing to investigate.
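For reference, here's a minimal sketch of what this might look like if it is the Databricks Delta cache (a.k.a. disk cache). The `spark.databricks.io.cache.enabled` setting is a Databricks runtime knob, and the bucket/path names are made up for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("axs-delta-cache").getOrCreate()

# Databricks-only, as far as I can tell: enable the Delta/disk cache,
# which transparently keeps copies of remote Parquet data on the
# workers' local SSDs.
spark.conf.set("spark.databricks.io.cache.enabled", "true")

# Hypothetical S3 facade exposed by the archive.
df = spark.read.parquet("s3a://archive-facade/catalogs/ztf/")

df.count()  # first read pulls from S3 and populates the local cache
df.count()  # repeated reads should now be served from the workers' disks
```

If this mechanism really is Databricks-only, it wouldn't help AXS deployments at TACC, which is exactly what needs investigating.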
Option 2: Another way to do this may be to have AXS access the files through a caching layer. I looked at S3 caching options and found there are many. One example is s3fs-fuse
(and see the list of more projects at the bottom of the s3fs-fuse README); a rough sketch of this mount-based approach is below.
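Concretely, the idea would be to mount the archive's S3 facade with s3fs-fuse using its `use_cache` option, then point AXS/Spark at the mount point. This is only a sketch under assumed names (the bucket, mount point, cache directory, and catalog path are all hypothetical):

```python
# Assumed setup, done outside of Spark: mount the archive's S3 facade
# with a local file cache, e.g.
#   s3fs archive-facade /mnt/archive -o use_cache=/scratch/s3fs-cache
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("axs-s3fs-cache").getOrCreate()

# Read through the FUSE mount; s3fs keeps fetched objects under
# /scratch/s3fs-cache, so repeated scans are served from local disk
# instead of going back over the WAN.
df = spark.read.parquet("/mnt/archive/catalogs/ztf/")
df.count()  # first pass downloads and caches; later passes stay local
```

One caveat with this approach: on a multi-node deployment, every executor node would need the mount, and the s3fs cache is per-node, so each worker pays the first-read cost independently.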
Opening this issue so we don't forget about this use case.
(@dennyglee, @zecevicp, any thoughts/ideas/comments?)