
[jdbc] Enhance useragent to indicate Apache Spark #1890

Merged: 5 commits into ClickHouse:main on Oct 30, 2024

Conversation

BentsiLeviav (Contributor):

Many users reach ClickHouse through Apache Spark (or another framework) via this JDBC connector.
This PR adds framework-detection functionality: it inspects the current stack trace, infers the framework in use from a closed list of tracked frameworks, and adds an indication of it to the user agent string.

@@ -417,10 +410,23 @@ protected String getDefaultUserAgent() {
return config.getClientName();
}

protected String getAdditionalFrameworkUserAgent() {
Contributor:

Wouldn't this check be run every time someone makes a request?

BentsiLeviav (Contributor Author), Oct 28, 2024:

It would, but that is intentional.
A user can write a Java application that uses Spark with JDBC, but might also open plain JDBC connections (without any framework). We want to classify each request and ensure we capture the right use case.
Do you happen to have another way to implement this that gets the behavior I described?

for (StackTraceElement ste : Thread.currentThread().getStackTrace()) {
for (String framework : frameworks) {
if (ste.toString().contains(framework)) {
inferredFrameworks.add(String.format("(%s)", framework));
Contributor:

Is the idea to collect all frameworks? Can you provide an example?

Contributor:

Or should we break once we have detected one of them?

BentsiLeviav (Contributor Author):

IMHO we don't want to limit ourselves to a single-framework use case; a nested or co-shared use case is possible. From a complexity point of view it is roughly the same either way, so I would prefer that we not limit this.

Contributor:

I think we need to exit the loop once a framework is detected.
And maybe remember the result so we don't search again.

BentsiLeviav (Contributor Author):

@chernser
You must search every time. A request in an application doesn't necessarily go through the framework (you can write a Spark application, work with Spark over JDBC, and later open a regular JDBC connection and query ClickHouse).

In addition, could you elaborate on why we need to exit the loop once a framework is detected? What if a user combines several frameworks that we want to detect? As I said, from a complexity point of view there isn't much difference (the FRAMEWORKS_TO_INFER list is going to be small), so what is the rationale behind limiting this ability?

protected String getAdditionalFrameworkUserAgent() {
List<String> frameworks = List.of("apache.spark");
Set<String> inferredFrameworks = new LinkedHashSet<>();
for (StackTraceElement ste : Thread.currentThread().getStackTrace()) {
Contributor:

The current-thread stack trace approach would not work with asynchronous tasks (although we have moved away from that default behavior), because a task may run with a completely independent stack trace.

BentsiLeviav (Contributor Author):

Do you have any suggestions for that? If not, we'd better have something rather than nothing.
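For reference, the detection approach quoted above can be completed into a self-contained sketch. This is an illustration of the technique under discussion, not the merged implementation; the class name and the printed base string are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FrameworkDetection {
    // Closed list of tracked framework package prefixes (mirrors FRAMEWORKS_TO_INFER).
    private static final List<String> FRAMEWORKS = List.of("apache.spark");

    // Scans the calling thread's stack trace and returns a user-agent suffix
    // such as "(apache.spark)" for each tracked framework found, or "" if none.
    static String getAdditionalFrameworkUserAgent() {
        Set<String> inferred = new LinkedHashSet<>();
        for (StackTraceElement ste : Thread.currentThread().getStackTrace()) {
            for (String framework : FRAMEWORKS) {
                if (ste.toString().contains(framework)) {
                    inferred.add(String.format("(%s)", framework));
                }
            }
        }
        return String.join(" ", inferred);
    }

    public static void main(String[] args) {
        // Run outside Spark, so no tracked framework appears in the stack trace
        // and the suffix is empty.
        System.out.println("suffix=[" + getAdditionalFrameworkUserAgent() + "]");
    }
}
```

Because a LinkedHashSet is used, each framework is reported once, in the order first seen, even if it appears in many stack frames.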

Contributor:

We could look for framework-specific classes on startup; frameworks usually have some.

BentsiLeviav (Contributor Author):

I might have the Apache Spark jar on the classpath without actually using it, and with that approach it would still be classified as Apache Spark usage.

@Paultagoras and @mzitnik suggested moving this logic to the JDBC client initialization (where there are no async tasks), which solves this issue.
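The shape of that suggestion - run detection once, synchronously, while the client is being constructed - can be sketched roughly as follows. The class name, field layout, and base user-agent string are hypothetical, not the driver's actual internals.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ClientInitSketch {
    private static final List<String> FRAMEWORKS = List.of("apache.spark");

    // Computed once in the constructor, on the caller's thread, so later
    // asynchronous tasks never need to scan a (possibly unrelated) stack trace.
    private final String userAgent;

    public ClientInitSketch(String baseUserAgent) {
        String suffix = detectFrameworks();
        this.userAgent = suffix.isEmpty() ? baseUserAgent : baseUserAgent + " " + suffix;
    }

    private static String detectFrameworks() {
        Set<String> inferred = new LinkedHashSet<>();
        for (StackTraceElement ste : Thread.currentThread().getStackTrace()) {
            for (String f : FRAMEWORKS) {
                if (ste.toString().contains(f)) {
                    inferred.add("(" + f + ")");
                }
            }
        }
        return String.join(" ", inferred);
    }

    public String getUserAgent() {
        return userAgent;
    }

    public static void main(String[] args) {
        // Constructed outside Spark, so the base user agent is left unchanged.
        System.out.println(new ClientInitSketch("clickhouse-jdbc/0.7.0").getUserAgent());
    }
}
```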

chernser (Contributor):

Good day, @BentsiLeviav!
This feature is only useful for clients that use ClickHouse Cloud.
Other users should have an option to turn it off.

Would you please also tell the story behind this feature? What would we like to find out?

Thanks!

BentsiLeviav (Contributor Author):

@chernser
This feature is meant to help us better monitor and understand the real usage of the JDBC driver.
We want to know exactly which technology/framework our users use.
As of today, the JDBC driver is the underlying connection for many frameworks/technologies such as:

  • Apache Spark
  • DBeaver
  • Tableau

For some of these frameworks, we can't customize the product name. Therefore, I added this feature to infer which framework is used (out of a closed list we define) and to add it to the user agent metadata.

As discussed in this PR, I have moved the implementation to the JDBC layer (to avoid any async tasks) and minimized the overhead by making sure the logic runs only on client creation.

@@ -152,6 +172,8 @@ public ClickHouseConnection connect(String url, Properties info) throws SQLExcep
if (!acceptsURL(url)) {
return null;
}
if (!url.toLowerCase().contains("disable-frameworks-detection"))
Contributor:

Not every case passes properties through the URL; a Properties object is often used instead.
I suggest doing this check in com.clickhouse.jdbc.internal.ClickHouseConnectionImpl#ClickHouseConnectionImpl(com.clickhouse.jdbc.internal.ClickHouseJdbcUrlParser.ConnectionInfo) - that constructor is called right after the URL is parsed for properties, so ConnectionInfo contains all of them.

chernser (Contributor):

@BentsiLeviav thank you for the explanation!

chernser (Contributor) left a comment:

Please do

@chernser chernser changed the title Enhance useragent to indicate Apache Spark [jdbc] Enhance useragent to indicate Apache Spark Oct 30, 2024
@chernser chernser merged commit e04e44f into ClickHouse:main Oct 30, 2024
30 of 59 checks passed
4 participants