fix(c/driver/sqlite): Wrap bulk ingests in a single begin/commit txn #910
Fixes #466 by having a single begin/commit txn for ingesting tables, instead of committing once per row.
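To illustrate the idea (not the driver's actual C code), here is a minimal sketch using Python's standard `sqlite3` module. The table name `flights`, its columns, and the helper `ingest_rows` are hypothetical and chosen only for this example. The connection is opened in autocommit mode so that the explicit BEGIN/COMMIT pair is the only transaction boundary, which mirrors wrapping the whole ingest in one transaction rather than committing after every row.

```python
import sqlite3


def ingest_rows(conn, rows):
    """Insert all rows inside one explicit transaction (hypothetical helper)."""
    conn.execute("BEGIN")
    try:
        # All inserts share the single transaction opened above.
        conn.executemany("INSERT INTO flights (id, carrier) VALUES (?, ?)", rows)
        conn.execute("COMMIT")
    except Exception:
        # Undo any partially ingested rows, then re-raise the error.
        conn.execute("ROLLBACK")
        raise


# isolation_level=None disables the module's implicit transaction handling,
# so the BEGIN/COMMIT statements above are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE flights (id INTEGER, carrier TEXT)")

rows = [(i, "UA") for i in range(1000)]
ingest_rows(conn, rows)

row_count = conn.execute("SELECT COUNT(*) FROM flights").fetchone()[0]
print(row_count)
```

With a per-row commit, SQLite has to finish a full transaction for every insert. Batching the inserts under one commit pays that cost once, and a failure partway through leaves the table unchanged.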
Testing
R
I first installed the SQLite R driver and measured the time it takes to bulk ingest the nycflights13 dataset, noting the row count of 336776:
I also measured the time it takes to execute the query:
As a follow-up, I noticed that the query takes significantly longer (~30 minutes) to execute on my XPS 15 9520:
Compared to my MacBook Air (1.711345 minutes):
Both machines were running the same version of R, 4.3.1 (Beagle Scout).
After making my changes, I ran R CMD INSTALL . --preclean in arrow-adbc/r/adbcdrivermanager. I also installed the following R packages:
I then ran the following commands to confirm that no build or compile issues showed up as R packaged my changes:
Noting that the file I made changes to showed up as a vendored file:
After packaging and installing my changes, I ran through the same bulk ingest commands for the nycflights13 dataset and verified that the table contained the same number of rows as the previous run, also noting the speedup from 1.7 minutes to 0.2 seconds: