Enable Spark fetching mechanism (optional) and improve reading/writing of index and checksum files. #72

Merged
pspoerri merged 6 commits into main from add_spark_fetch
Sep 13, 2023

Conversation

pspoerri
Contributor

This pull-request implements several improvements:

  • Improve reading/writing of index and checksum files:
    • Use Java DataInput and DataOutput streams to read and write Longs.
    • Avoid the extra allocation of a serializer/deserializer.
    • Accumulate the array lengths before writing the index files.
  • Enable the Spark fetching mechanism (optional): implement the Spark fetching algorithm as an option and use spark.storage.decommission.fallbackStorage.path to read shuffle data. S3-Shuffle writes directly to the Spark fallback storage path.
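
For readers unfamiliar with the index-file change, here is a minimal sketch of the idea, not the code from this pull request. It assumes the conventional Spark shuffle index layout (cumulative offsets stored as big-endian longs, starting at 0); the class and method names `IndexFileSketch`, `writeIndex`, and `readIndex` are hypothetical, and an in-memory byte array stands in for the file on the storage backend.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class IndexFileSketch {

    // Accumulates per-partition lengths into offsets first, then writes
    // every offset as an 8-byte big-endian long via DataOutputStream.
    static byte[] writeIndex(long[] partitionLengths) {
        long[] offsets = new long[partitionLengths.length + 1];
        for (int i = 0; i < partitionLengths.length; i++) {
            offsets[i + 1] = offsets[i] + partitionLengths[i];
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(offsets.length * Long.BYTES);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (long offset : offsets) {
                out.writeLong(offset);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the offsets back with DataInputStream; no serializer is involved.
    static long[] readIndex(byte[] data) {
        long[] offsets = new long[data.length / Long.BYTES];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < offsets.length; i++) {
                offsets[i] = in.readLong();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Three partitions of 10, 0 and 25 bytes.
        long[] offsets = readIndex(writeIndex(new long[] {10, 0, 25}));
        System.out.println(Arrays.toString(offsets)); // prints [0, 10, 10, 35]
    }
}
```

With this layout, the byte range of partition `i` in the data file is `offsets[i]` up to `offsets[i + 1]`, so a reader only needs two adjacent longs per block.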

Signed-off-by: Pascal Spörri <psp@zurich.ibm.com>
- Uses Java DataInput and DataOutput streams to write Longs.
- Avoids extra allocation of serializer/deserializer.
- Accumulate array lengths before writing the index files.

Signed-off-by: Pascal Spörri <psp@zurich.ibm.com>
This commit implements the Spark fetching algorithm and makes use of
the `spark.storage.decommission.fallbackStorage.path` to read shuffle
data.

Signed-off-by: Pascal Spörri <psp@zurich.ibm.com>
Signed-off-by: Pascal Spörri <psp@zurich.ibm.com>
@pspoerri pspoerri changed the title Enable Spark fetching mechanism. Enable Spark fetching mechanism (optional) and improve reading/writing of index and checksum files. Sep 13, 2023
@pspoerri pspoerri force-pushed the add_spark_fetch branch 3 times, most recently from cf5840f to e44b587 Compare September 13, 2023 09:27
Signed-off-by: Pascal Spörri <psp@zurich.ibm.com>
Signed-off-by: Pascal Spörri <psp@zurich.ibm.com>
@pspoerri pspoerri merged commit faf5a45 into main Sep 13, 2023
17 checks passed
@pspoerri pspoerri deleted the add_spark_fetch branch September 13, 2023 12:13