Inefficient statement-level batching #527
Comments
(I can try to create another reproduction including wireshark output if you need it, but it should be pretty easy to reproduce) |
I think it is running the bindings one-by-one here: r2dbc-postgresql/src/main/java/io/r2dbc/postgresql/PostgresqlStatement.java Lines 223 to 229 in a32a679
|
The performance you experience pretty much depends on the input variance. Parametrized queries use the extended flow to prepare a query on the server side and then run the prepared query with the bindings provided to the statement. Queries that do not change and are called with the same parameter types should typically yield a similar performance profile as inline-parametrized (static) statements. The catch with I'm actually wondering why the JDBC driver would yield a better performance profile as it should work exactly the same way. Both drivers have the same amount of information and should be able to have a similar throughput. Maybe @davecramer has some more insights. |
Good point. I guess batch statements mostly make sense for updates where you don't care about the result, so that might be the difference? Also, there's a difference in API: JDBC only allows consuming the data from the first statement, while r2dbc seems to allow consumption of all results? |
I haven't looked at this in a while. But if @jrudolph is correct we send all the requests at once which minimizes the round trips. |
Yes, this is true. I need to think about proposals on how we could optimize here. Right now, I can imagine that pipelining would work if the fetch size is zero (fetch everything) so we can request all data (send all the requests within a small number of packets) at once without employing a multi-round trip conversation that takes backpressure into account. |
If it matters, in my opinion it's most important to optimize this for batch inserts. For inserts you don't care about the result value. I guess it's always |
What we (pgjdbc) do is rewrite the inserts into one insert using multivalue writes ie
becomes
|
Interesting. R2DBC drivers refrain from rewriting SQL in any form as we do not want to get ourselves into SQL parsing (sigh) or rewriting business. In any case, a multi-value insert can be passed into the driver directly because it is SQL after all. |
We don't parse it either. This is one case where we rewrite it though. |
I understand the examples above with literal values, but could you give some hints of how to do that with bind parameters? Would it be just one bind with very many bind parameters? If that is the case, the number of bind parameters would also depend on how many rows to insert, which probably isn't great for prepared statement caching? Or would that be used with only one bind parameter and the array syntax you showed? |
|
Prepared statement caching doesn't buy you that much in Postgres. I expect 1 round trip, which would be more than made up for by doing the inserts all in one statement
Taking the example from https://www.postgresql.org/docs/current/dml-insert.html
would require 9 bind parameters for 3 rows, 12 for 4, and so on? Or did I misunderstand the syntax? Thanks for the hint, but it feels like an extreme workaround for such a common use case as batch inserts. I hope it can be solved in a better way in the driver itself. |
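To make the parameter arithmetic above concrete, here is a minimal sketch of how a multi-row INSERT with bind placeholders could be generated in user space. The class `MultiRowInsert` and its methods are hypothetical helpers, not part of r2dbc-postgresql or pgjdbc, and the `products` table with its three columns is only assumed from the linked PostgreSQL documentation page. The sketch assumes PostgreSQL-style numbered placeholders (`$1`, `$2`, ...).

```java
import java.util.StringJoiner;

/** Hypothetical helper: builds a multi-row INSERT with numbered placeholders. */
final class MultiRowInsert {

    private MultiRowInsert() {
    }

    /** One bind parameter is needed per column per row. */
    static int parameterCount(int columns, int rows) {
        return Math.multiplyExact(columns, rows);
    }

    /**
     * Builds "INSERT INTO t (a, b) VALUES ($1, $2), ($3, $4), ...".
     * Table and column names are concatenated as-is, so they must come
     * from trusted code, never from user input.
     */
    static String buildSql(String table, String[] columns, int rows) {
        if (columns.length == 0 || rows <= 0) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        StringBuilder sql = new StringBuilder("INSERT INTO ")
                .append(table)
                .append(" (")
                .append(String.join(", ", columns))
                .append(") VALUES ");
        int next = 1; // placeholders are numbered across all rows
        for (int r = 0; r < rows; r++) {
            if (r > 0) {
                sql.append(", ");
            }
            StringJoiner group = new StringJoiner(", ", "(", ")");
            for (int c = 0; c < columns.length; c++) {
                group.add("$" + next++);
            }
            sql.append(group);
        }
        return sql.toString();
    }
}
```

For three columns and three rows this produces `INSERT INTO products (product_no, name, price) VALUES ($1, $2, $3), ($4, $5, $6), ($7, $8, $9)`, i.e. 9 bind parameters, and 12 for four rows. Because the SQL text changes with the row count, each distinct batch size is a distinct statement, which is the caching concern raised earlier in the thread.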
which is why the JDBC driver does it for you |
@davecramer I'm coming up against this issue now. Any chance you could point out the code in the JDBC driver that does this rewriting? In my case, we are trying to do a batch insert of around 100 rows at a time and it seems we're really feeling the effects of the DB round trip. I'm wondering if, in our case, we can do something like this in user-space even if the driver doesn't support it out of the box. |
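One possible shape for such a user-space workaround is sketched below. It is not the pgjdbc rewriting code and not a driver feature: `BatchRows`, `chunk`, and `flatten` are hypothetical names, and the chunk size is an arbitrary assumption. The idea is to split the rows into chunks and flatten each chunk into a single list of values, ordered row by row, so that it lines up with the placeholders of one multi-row INSERT.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for preparing rows for a multi-row INSERT. */
final class BatchRows {

    private BatchRows() {
    }

    /** Splits rows into consecutive chunks of at most chunkSize elements. */
    static <T> List<List<T>> chunk(List<T> rows, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, rows.size());
            chunks.add(new ArrayList<>(rows.subList(start, end)));
        }
        return chunks;
    }

    /**
     * Flattens rows into one list of bind values in row-major order:
     * row 0 column 0, row 0 column 1, ..., row 1 column 0, and so on.
     */
    static List<Object> flatten(List<Object[]> rows, int columns) {
        List<Object> values = new ArrayList<>(rows.size() * columns);
        for (Object[] row : rows) {
            if (row.length != columns) {
                throw new IllegalArgumentException("row has " + row.length
                        + " values, expected " + columns);
            }
            for (Object value : row) {
                values.add(value);
            }
        }
        return values;
    }
}
```

The flattened values could then be bound in order on a single statement created from the multi-row SQL, for example with the zero-based `Statement.bind(int, Object)` from the R2DBC SPI, so that each chunk costs one statement execution instead of one per row. Whether this actually beats statement-level batching would need to be measured for the workload in question.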
Using Statement.add() and bind to execute batches is significantly slower than using Connection.createBatch(). Using bindings should be faster than recreating individual textual SQL statements and sending them over the connection.
As far as I have debugged it with Wireshark, it looks like Connection.createBatch will send all the (mostly duplicated) SQL statements in a single packet back-to-back. On the other hand, statement batches do not send all of the bind/execute etc. commands in one go, but each statement is run one by one. Compare this to how JDBC statement-level batching works, where all the bind and execute commands are also sent back-to-back in one go. For this reason (but not only), batched JDBC statements are also quite a bit faster than using r2dbc.
(In akka/akka-persistence-r2dbc@2ea4121 we were considering actually going to connection-level batching, but issuing statements without bind (and creating individual textual SQL statements) seems too big of a risk / anti-pattern to work around this issue in the driver.)