Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace one insert call per row with one insert call for multiple rows #1502

Conversation

tristanvuong2021
Copy link
Contributor

@tristanvuong2021 tristanvuong2021 commented Feb 26, 2024

The addBinding method executes one insert per row. When the number of rows is large, the performance suffers. For 800 metrics with one measurement each and the same reporting set for each metric, insertion takes 4.2 seconds locally. With the change in this PR, insertion takes 1.4 seconds.

The refactoring is done this way because R2DBC doesn't automically transform the inserts into a single batch insert. See pgjdbc/r2dbc-postgresql#527.

@wfa-reviewable
Copy link

This change is Reviewable

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @tristanvuong2021)


src/main/kotlin/org/wfanet/measurement/reporting/deploy/v2/postgres/writers/CreateMetrics.kt line 216 at r2 (raw file):

        )

        val tempMetricsCurIndex = metricsCurIndex

I'm not a fan of this approach. There should be a better way to handle this. I know that the JDBC driver has a reWriteBatchedInserts option that basically does this for you, but I'm not sure if this applies for R2DBC. You could give that a shot.

If we really need to handle it ourselves, we should do the rewriting in the common-jvm statement builder (basically changing how add works).

Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @SanjayVas)


src/main/kotlin/org/wfanet/measurement/reporting/deploy/v2/postgres/writers/CreateMetrics.kt line 216 at r2 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

I'm not a fan of this approach. There should be a better way to handle this. I know that the JDBC driver has a reWriteBatchedInserts option that basically does this for you, but I'm not sure if this applies for R2DBC. You could give that a shot.

If we really need to handle it ourselves, we should do the rewriting in the common-jvm statement builder (basically changing how add works).

For parameterized statements, this is the only method. For statements that aren't parameterized, there is a batch connection.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @tristanvuong2021)


src/main/kotlin/org/wfanet/measurement/reporting/deploy/v2/postgres/writers/CreateMetrics.kt line 216 at r2 (raw file):

Previously, tristanvuong2021 (Tristan Vuong) wrote…

For parameterized statements, this is the only method. For statements that aren't parameterized, there is a batch connection.

Did you try setting reWriteBatchedInserts in the connection options? For the underlying JDBC driver, that option is intended to basically automatically rewrite the multiple insert statements into a single one.

Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @SanjayVas)


src/main/kotlin/org/wfanet/measurement/reporting/deploy/v2/postgres/writers/CreateMetrics.kt line 216 at r2 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Did you try setting reWriteBatchedInserts in the connection options? For the underlying JDBC driver, that option is intended to basically automatically rewrite the multiple insert statements into a single one.

That isn't one of the connection options for postgres r2dbc

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @tristanvuong2021)


src/main/kotlin/org/wfanet/measurement/reporting/deploy/v2/postgres/writers/CreateMetrics.kt line 216 at r2 (raw file):

Previously, tristanvuong2021 (Tristan Vuong) wrote…

That isn't one of the connection options for postgres r2dbc

It's not one of the Option constants, but the options are just strings. The question is whether anything actually reads that option in the R2DBC implementation (e.g. does it end up calling some code from the JDBC impl)

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @tristanvuong2021)


src/main/kotlin/org/wfanet/measurement/reporting/deploy/v2/postgres/writers/CreateMetrics.kt line 216 at r2 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

It's not one of the Option constants, but the options are just strings. The question is whether anything actually reads that option in the R2DBC implementation (e.g. does it end up calling some code from the JDBC impl)

Ah, perhaps this does indeed not do anything for the R2DBC version. Please reference pgjdbc/r2dbc-postgresql#527 in a comment indicating that we need to do this because R2DBC does not support rewriting batched inserts.

Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @SanjayVas)


src/main/kotlin/org/wfanet/measurement/reporting/deploy/v2/postgres/writers/CreateMetrics.kt line 216 at r2 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Ah, perhaps this does indeed not do anything for the R2DBC version. Please reference pgjdbc/r2dbc-postgresql#527 in a comment indicating that we need to do this because R2DBC does not support rewriting batched inserts.

Done.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @tristanvuong2021)


src/main/kotlin/org/wfanet/measurement/reporting/deploy/v2/postgres/writers/CreateMetrics.kt line 59 at r2 (raw file):

class CreateMetrics(private val requests: List<CreateMetricRequest>) :
  PostgresWriter<List<Metric>>() {
  private data class WeightedMeasurementsAndStatementComponents(

This is still pretty hard to read. Is there any way to factor things out to make this more readable? I'm thinking something like creating a Postgres-specific builder for multi-valued bound statements. Usage example:

multiValueBoundStatement(paramCount = 3, 
  """
  INSERT INTO Metrics (Foo, Bar, Baz) VALUES (${MultiValueBoundStatement.PARAM_LIST_PLACEHOLDER})
  """
) {
  for (item in items) {
    addBinding {
      bind(paramIndex = 0, item.foo)
      bind(paramIndex = 1, item.bar)
      bind(paramIndex = 2, item.baz)
    }
  }
}

Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion (waiting on @SanjayVas)


src/main/kotlin/org/wfanet/measurement/reporting/deploy/v2/postgres/writers/CreateMetrics.kt line 59 at r2 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

This is still pretty hard to read. Is there any way to factor things out to make this more readable? I'm thinking something like creating a Postgres-specific builder for multi-valued bound statements. Usage example:

multiValueBoundStatement(paramCount = 3, 
  """
  INSERT INTO Metrics (Foo, Bar, Baz) VALUES (${MultiValueBoundStatement.PARAM_LIST_PLACEHOLDER})
  """
) {
  for (item in items) {
    addBinding {
      bind(paramIndex = 0, item.foo)
      bind(paramIndex = 1, item.bar)
      bind(paramIndex = 2, item.baz)
    }
  }
}

Done.

@tristanvuong2021 tristanvuong2021 marked this pull request as ready for review March 5, 2024 23:58
Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @tristanvuong2021)


MODULE.bazel line 132 at r3 (raw file):

    repo_name = "wfa_common_jvm",
)
archive_override(

Until you have a version, add a DO_NOT_SUBMIT referencing the source PR.

@tristanvuong2021 tristanvuong2021 force-pushed the tristanvuong-replace-separate-inserts-with-1-large-insert-per-table branch from bd7fe76 to 3f165a1 Compare March 11, 2024 17:43
Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 6 files reviewed, 1 unresolved discussion (waiting on @SanjayVas)


MODULE.bazel line 132 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Until you have a version, add a DO_NOT_SUBMIT referencing the source PR.

Done.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @tristanvuong2021)

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 6 files at r3, 2 of 2 files at r4, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @tristanvuong2021)

@tristanvuong2021 tristanvuong2021 merged commit 746b7a6 into main Mar 13, 2024
4 checks passed
@tristanvuong2021 tristanvuong2021 deleted the tristanvuong-replace-separate-inserts-with-1-large-insert-per-table branch March 13, 2024 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants