Disable Object Reuse By Default #1382
- Enabling object reuse by default has the potential to cause unforeseen bugs in user code.
- When a user decides to pass a `Stream<ClickHouseRecord>` back as a `List<ClickHouseRecord>`, all objects by default will be pointing back to the same object. This can be extremely jarring for users who aren't aware objects are being reused.
- Added a `reuseObjects()` method to quickly enable object reuse when appropriate. This allows the user to decide when memory efficiency is a goal.
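To make the aliasing problem described above concrete, here is a minimal sketch in plain Java. It does not use the real client: `Row`, `collectWithReuse`, and `collectWithoutReuse` are hypothetical stand-ins, and the shared instance only imitates what a reader with object reuse enabled might do.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ObjectReuseDemo {
    // Hypothetical stand-in for a record type; NOT the real ClickHouseRecord.
    static final class Row {
        long id;
    }

    // Imitates object reuse: one instance is mutated and emitted for every row.
    static List<Row> collectWithReuse(int n) {
        Row shared = new Row();
        return IntStream.range(0, n)
                .mapToObj(i -> { shared.id = i; return shared; })
                .collect(Collectors.toList());
    }

    // Imitates reuse being disabled: every row gets its own instance.
    static List<Row> collectWithoutReuse(int n) {
        return IntStream.range(0, n)
                .mapToObj(i -> { Row r = new Row(); r.id = i; return r; })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> reused = collectWithReuse(3);
        List<Row> fresh = collectWithoutReuse(3);
        // Every element of 'reused' is the same object holding the last row.
        System.out.println(reused.get(0).id + " " + reused.get(1).id + " " + reused.get(2).id); // 2 2 2
        System.out.println(fresh.get(0).id + " " + fresh.get(1).id + " " + fresh.get(2).id);    // 0 1 2
    }
}
```

With reuse, the collected list has the right length but every element reports the values of the last row, which is the surprise this PR is trying to avoid.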
- Decided it was too hard to understand in tests
- Hooray copy paste
I use this library within Kotlin code, and Kotlin Collections makes it very easy to go from
Thanks for your contribution @rickysaltzer! Besides memory efficiency, object creation also slows things down: the CI failure happened because the tests couldn't all finish within 15 minutes. I think the proposed change would significantly affect all reads, so I wouldn't suggest doing that. Have you tried:

    // Copy the current values out of the (possibly reused) record into a new array.
    public Object[] extractValues() {
        int size = size();
        Object[] arr = new Object[size];
        for (int i = 0; i < size; i++) {
            arr[i] = getValue(i).asObject();
        }
        return arr;
    }
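As a rough illustration of how the helper suggested above could be used, the sketch below snapshots each row out of a reused record before storing it. `ReusedRecord`, `fill`, and `readAll` are hypothetical names invented for this example; the real `ClickHouseRecord` API differs (its `getValue(i)` returns a value wrapper, as the snippet above shows).

```java
import java.util.ArrayList;
import java.util.List;

public class ExtractValuesDemo {
    // Hypothetical mutable record that a reader refills for every row.
    static final class ReusedRecord {
        private final Object[] values;

        ReusedRecord(int size) { values = new Object[size]; }

        int size() { return values.length; }

        Object getValue(int i) { return values[i]; }

        // Overwrites the record in place; assumes 'row' has exactly size() elements.
        void fill(Object[] row) { System.arraycopy(row, 0, values, 0, values.length); }

        // Same shape as the suggested helper: copy the current values into a new array.
        Object[] extractValues() {
            int size = size();
            Object[] arr = new Object[size];
            for (int i = 0; i < size; i++) {
                arr[i] = getValue(i);
            }
            return arr;
        }
    }

    // Reads all rows through ONE shared record, storing a snapshot of each row.
    static List<Object[]> readAll(Object[][] rows) {
        List<Object[]> result = new ArrayList<>();
        if (rows.length == 0) {
            return result;
        }
        ReusedRecord record = new ReusedRecord(rows[0].length);
        for (Object[] row : rows) {
            record.fill(row);                   // the record now holds this row only
            result.add(record.extractValues()); // the snapshot survives the next fill
        }
        return result;
    }
}
```

The trade-off is the one discussed in this thread: the record itself is still reused, but each stored row costs one array allocation, and the caller has to know to take the snapshot.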
I think the underlying question is: should we be silently corrupting data returned from ClickHouse? Because that is exactly what is happening if a user decides to pass the

I think in general it's bad API design to rely on a user to read the code implementation (as I had to do) and call

Can we not simply enable object reuse by default for tests? I think it might be presumptuous of us to assume it would slow down users' code significantly, because it depends entirely on what they're doing with the Java API. Are they streaming hundreds of millions of rows? Then maybe they should consider object reuse. Are they simply performing a large aggregation that returns a few hundred or a few thousand rows? Then object reuse might not be so significant. Take Apache Flink's API, for example, a streaming platform that is meant for extreme scale and load. They disable
@zhicwu It would be worthwhile to brainstorm alternative changes to the API with @rickysaltzer and @mzitnik. Data integrity is a top priority. Let's try to find a balance here.
Thanks again @rickysaltzer for the input! Your points are indeed valid and well-reasoned. However, I would like to emphasize that it's important to consider the differences in memory efficiency and performance between a small library optimized for a single JVM and a distributed middleware like Flink. As you know, we have multiple APIs to choose from, each with its own characteristics. JDBC is a well-known and mature option, while R2DBC is asynchronous and gaining popularity, although the driver has not been thoroughly tested yet. On the other hand, the Java client provides better performance and lower memory usage than the others. If performance and memory usage are not a concern, why not stick with JDBC? Anyway, I think what we're trying to do here is improve the Java client API to minimize unintended side effects.
Thanks for your response. I do very much appreciate the efficiency we're trying to maintain, especially when it comes to higher-level APIs leveraging this one. That being said, I think coming up with an elegant solution to this issue is warranted.
A few comments here