[jdbc] Enhance useragent to indicate Apache Spark #1890
Conversation
@@ -417,10 +410,23 @@ protected String getDefaultUserAgent() {
        return config.getClientName();
    }

    protected String getAdditionalFrameworkUserAgent() {
Wouldn't this check be run every time someone makes a request?
It would, but this is what we wanted.
A user can write a Java application that uses Spark with JDBC, but might also use plain JDBC (without any framework) elsewhere. We want to classify each request and make sure we capture the right use case.
Do you happen to have another way to implement this to get the behavior I described?
for (StackTraceElement ste : Thread.currentThread().getStackTrace()) {
    for (String framework : frameworks) {
        if (ste.toString().contains(framework)) {
            inferredFrameworks.add(String.format("(%s)", framework));
Is the idea to collect all frameworks? Can you provide an example?
Or should we break once we detect one of them?
IMHO we don't want to limit ourselves to a single-framework use case. A nested or co-shared use case might be possible. From a complexity point of view it is about the same either way, so I would prefer not to limit this.
I think we need to exit the loop when a framework is detected.
And maybe remember it so we don't search again.
@chernser
You must search every time. Not every request in an application goes through the framework (you can write a Spark application, work with Spark JDBC, and later open a regular JDBC connection and query ClickHouse).
In addition, could you elaborate on why we need to exit the loop when a framework is detected? What if a user uses a combination of frameworks that we want to detect? As I said, from a complexity point of view there isn't much difference (the list of FRAMEWORKS_TO_INFER is going to be limited), so what is the rationale behind limiting this ability?
clickhouse-http-client/src/main/java/com/clickhouse/client/http/ClickHouseHttpConnection.java (outdated; conversation resolved)
protected String getAdditionalFrameworkUserAgent() {
    List<String> frameworks = List.of("apache.spark");
    Set<String> inferredFrameworks = new LinkedHashSet<>();
    for (StackTraceElement ste : Thread.currentThread().getStackTrace()) {
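Pieced together, the detection logic under discussion works out to roughly the following self-contained sketch (the class name and the exact user-agent formatting are illustrative, not the PR's final code):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FrameworkDetector {
    // Closed list of framework package markers to look for (only
    // "apache.spark" appears in this excerpt of the PR).
    private static final List<String> FRAMEWORKS_TO_INFER = List.of("apache.spark");

    // Scans the current thread's stack trace and returns every tracked
    // framework that appears in it, formatted for a user-agent string.
    // A LinkedHashSet deduplicates repeated frames while keeping order.
    public static String getAdditionalFrameworkUserAgent() {
        Set<String> inferredFrameworks = new LinkedHashSet<>();
        for (StackTraceElement ste : Thread.currentThread().getStackTrace()) {
            for (String framework : FRAMEWORKS_TO_INFER) {
                if (ste.toString().contains(framework)) {
                    inferredFrameworks.add(String.format("(%s)", framework));
                }
            }
        }
        return String.join(" ", inferredFrameworks);
    }

    public static void main(String[] args) {
        // Outside Spark, no framework marker appears in the stack trace,
        // so this returns an empty string.
        System.out.println("[" + getAdditionalFrameworkUserAgent() + "]");
    }
}
```

Note that because the loop collects into a set rather than returning on the first hit, multiple frameworks on the same stack would all be reported, which is the behavior BentsiLeviav argues for below.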
Taking the current thread's stack trace would not work with asynchronous tasks (though we have moved away from that default behavior), because a task may have a completely independent stack trace.
Do you have any suggestions for that? If not, it's better to have something rather than nothing.
We could look for specific classes at startup; frameworks usually ship some.
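The startup probe suggested here could be sketched with Class.forName, checking whether a class the framework is known to ship is loadable. The probe class org.apache.spark.SparkContext is an assumption for illustration; the PR does not name one:

```java
public class FrameworkClassProbe {
    // Returns true if the named class is on the classpath. Passing
    // initialize=false avoids running the class's static initializers.
    // Probing org.apache.spark.SparkContext is an illustrative choice,
    // not something specified in this PR.
    public static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, FrameworkClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // In a plain JVM without Spark on the classpath this prints false.
        System.out.println(isClassPresent("org.apache.spark.SparkContext"));
    }
}
```

As the next comment points out, this approach only proves the jar is present, not that the framework is actually driving the request.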
I might have the Apache Spark jar loaded without using it, and with that approach it would still be classified as Apache Spark usage.
@Paultagoras and @mzitnik suggested moving this logic to the JDBC client initialization (where there are no async tasks), which solves this issue.
Good day, @BentsiLeviav! Would you please also tell the story behind this feature? What would we like to find out? Thanks!
@chernser
For some of these frameworks, we can't customize the product name. Therefore, I added this feature to infer which framework is used (out of a closed list we define) and add it to the user-agent metadata. As discussed in this PR, I have moved the implementation to the JDBC layer (to avoid any async tasks) and minimized the overhead by making sure the logic runs only on client creation.
@@ -152,6 +172,8 @@ public ClickHouseConnection connect(String url, Properties info) throws SQLExcep
    if (!acceptsURL(url)) {
        return null;
    }
    if (!url.toLowerCase().contains("disable-frameworks-detection"))
Not every case passes properties through the URL; often a Properties object is used instead.
I suggest doing this check in com.clickhouse.jdbc.internal.ClickHouseConnectionImpl#ClickHouseConnectionImpl(com.clickhouse.jdbc.internal.ClickHouseJdbcUrlParser.ConnectionInfo): that constructor is called right after the URL is parsed for properties, so ConnectionInfo contains all of them.
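Reading the opt-out flag from parsed connection properties rather than substring-matching the raw URL, as suggested here, might look like the sketch below. The property name disable-frameworks-detection comes from the diff above; the helper class and the exact lookup location are assumptions for illustration:

```java
import java.util.Properties;

public class DetectionFlag {
    // Decides whether framework detection should run, based on parsed
    // connection properties instead of the raw URL string. The property
    // name is taken from the PR diff; treating any value other than
    // "true" as enabled is an assumed default.
    public static boolean isDetectionEnabled(Properties props) {
        return !Boolean.parseBoolean(props.getProperty("disable-frameworks-detection", "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("disable-frameworks-detection", "true");
        System.out.println(isDetectionEnabled(props)); // disabled by property
    }
}
```

This way the flag works identically whether the user supplied it in the JDBC URL or in the Properties object passed to connect(), since both end up in the parsed properties.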
@BentsiLeviav thank you for the explanation!
Please do
Many users use Apache Spark (or any other framework) via this JDBC connector.
This PR adds "framework detection" functionality. It looks at the current stack trace, infers the framework in use from a closed list of tracked frameworks, and adds an indication of it to the user-agent string.
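To illustrate the end result (the exact user-agent format is not shown in this excerpt, so the base agent string and separator here are assumptions): the inferred framework tag is simply appended to the default agent string.

```java
public class UserAgentExample {
    // Hypothetical assembly of the final user-agent string; the real
    // driver's product name and version fields may differ.
    public static String buildUserAgent(String defaultAgent, String frameworkTag) {
        return frameworkTag.isEmpty() ? defaultAgent : defaultAgent + " " + frameworkTag;
    }

    public static void main(String[] args) {
        // With Spark detected, the tag is appended; without it, the
        // default agent string is returned unchanged.
        System.out.println(buildUserAgent("ClickHouse-JDBC/0.6", "(apache.spark)"));
        System.out.println(buildUserAgent("ClickHouse-JDBC/0.6", ""));
    }
}
```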