QoS / Kafka Topic Prioritization w/ ClickHouse Kafka Connect Sink #337
-
Our use case puts Kafka in front of ClickHouse as an intermediate step for our events, to provide backpressure and avoid causing performance issues on the ClickHouse cluster for other transactions. We are currently running self-managed ClickHouse and Kafka clusters. We have no issues publishing our events to Kafka, but we are curious whether we can use the CH sink (https://www.confluent.io/hub/clickhouse/clickhouse-kafka-connect) to move these over to CH. We have one tricky requirement: certain events/topics need to take priority over others, sort of like quality of service. Do you know how we could approach using the sink to satisfy this requirement? We were thinking of maybe running multiple instances of the sink with different configurations to cover the different priorities/QoS across the events/topics. As a fallback, we could certainly write our own custom Kafka consumer, but we are trying to avoid that complexity if possible. Any thoughts or ideas?
Replies: 10 comments 12 replies
-
Hmm - in the sense that high priority messages in the queue need to be inserted before any others? Or something else?
-
Basically, would it be possible to process certain topics at a higher throughput than the lower-priority topics? Our thought was configuring multiple CH Kafka Connect sinks and giving the higher-priority topics more partitions/tasks/threads for greater throughput to CH. Is this possible, and if so, can you share how to configure the CH Kafka Connect sinks to accomplish it?
-
We have about 5-6 TB/day of traffic in production. So, what might be a good number of workers to use? We can start with two connectors, high and low priority.
The following is our configuration, so I am guessing tasks.max is at least one of the settings we need to change:

```properties
connector.class=com.clickhouse.kafka.connect.ClickHouseSinkConnector
tasks.max=1
topics=<topic_name>
ssl=true
jdbcConnectionProperties=?sslmode=STRICT
security.protocol=SSL
hostname=<hostname>
database=<database_name>
password=<password>
ssl.truststore.location=/tmp/kafka.client.truststore.jks
port=8443
value.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
exactlyOnce=true
username=default
schemas.enable=false
```
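As a rough sanity check on the sizing question, the daily volume converts to a per-second rate like this. The ~1 KiB average message size is purely an assumed figure for illustration; the real value depends on the events:

```python
# Back-of-envelope sizing for the 5-6 TB/day figure above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def daily_tb_to_mb_per_sec(tb_per_day: float) -> float:
    """Convert a daily volume in TB (decimal) to an average rate in MB/s."""
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY

rate_mb_s = daily_tb_to_mb_per_sec(6)  # upper end of the 5-6 TB/day range
# Assumed ~1 KiB average message size -- replace with a measured value.
msgs_per_sec = rate_mb_s * 1_000_000 / 1024

print(f"~{rate_mb_s:.1f} MB/s, ~{msgs_per_sec:,.0f} messages/s")
```

At 6 TB/day this works out to roughly 69 MB/s sustained, which is where the "~70MB/second" figure later in the thread comes from.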
On Sat, Mar 2, 2024 at 1:01 AM Paultagoras wrote:

> Well there's kind of a few different ways to do this, I think. You can have multiple instances of the connector running at once, covering multiple topics, so the simplest way would be to have:
>
> - High Priority Connector instance (for all of the High Priority Topics) w/ some high number of workers (it's customizable)
> - Low Priority Connector instance (for everything else) w/ some lower number of workers
>
> It kind of depends on how much data are you thinking about - do you have an estimate in mind?
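To make the high/low priority split concrete, here is a minimal sketch of the two connector instances as separate config files. The names, topic lists, and task counts are hypothetical placeholders, not recommendations; the remaining keys (hostname, credentials, converters, etc.) would mirror the configuration shown earlier in this thread:

```properties
# High priority connector instance (topic names are hypothetical)
name=clickhouse-sink-high-priority
connector.class=com.clickhouse.kafka.connect.ClickHouseSinkConnector
topics=orders,payments
# roughly 1:1 with the total partition count of the high priority topics
tasks.max=12
```

```properties
# Low priority connector instance
name=clickhouse-sink-low-priority
connector.class=com.clickhouse.kafka.connect.ClickHouseSinkConnector
topics=clickstream,audit
tasks.max=3
```

Because each connector gets its own consumer group and task pool, the high-priority instance simply has more parallel capacity into ClickHouse.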
-
Good question on # of messages. I have an email out to the ops team on this, but we might have to gather it as part of load testing.
Thanks for sharing these docs. I will take a look at those, and you are right, we will need to figure out the Confluent/Kafka config first (e.g., partitions), which will inform things like tasks.max.
On Sun, Mar 3, 2024 at 4:16 AM Paultagoras wrote:

> Interesting, so roughly 70MB/second? Do you know how many messages that would be, roughly?
> tasks.max is definitely one - the general recommendation is 1:1 with Partitions:Tasks. Another is to set consumer.override.max.poll.records=5000 to better batch records (the default is 500).
> We actually happen to have a few different docs around this sort of thing - https://clickhouse.com/blog/measure-visaualize-minimize-kafka-latency-clickhouse and https://clickhouse.com/docs/en/integrations/kafka/clickhouse-kafka-connect-sink#tuning-performance come to mind, but I'm sure there are others as well. Might be worth taking a peek, though; they also have links to other docs (like from Confluent) about Kafka Connect and optimizing connectors that are worthwhile to read.
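Putting those two suggestions into the connector config might look like the following sketch. Note that the consumer.override.* prefix only takes effect if the Connect worker allows client overrides (connector.client.config.override.policy=All in the worker config); the numbers are starting points to load-test, not tested recommendations:

```properties
# one task per topic partition, per the 1:1 recommendation
tasks.max=12
# fetch larger batches per poll than the 500-record default
consumer.override.max.poll.records=5000
```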
-
Got it, so we will need to configure multiple connectors if we want some with exactly-once (EO) delivery and some with at-least-once (AO).
On Wed, Mar 6, 2024 at 12:46 PM Paultagoras wrote:

> EO is pretty use-case specific, since in most cases "at-least-once" is more than enough. That said, enabling it happens at the connector level and applies to all topics that connector handles.
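Since exactlyOnce is a connector-level flag, mixing delivery guarantees means one connector per guarantee. A sketch, with hypothetical topic names:

```properties
# Connector A: exactly-once for the critical topics
topics=payments
exactlyOnce=true
```

```properties
# Connector B: at-least-once (the connector default) for everything else
topics=clickstream
exactlyOnce=false
```

This composes naturally with the high/low priority split discussed above, since both already require separate connector instances.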
-
I really appreciate all your help. We have been researching guidance on partitions and tasks on the Confluent/Kafka side, which is lining up with what you have suggested.
The whole point of fronting CH with Kafka is to implement backpressure in case we get a really high number of events and don't want to overwhelm CH. Is there some way to get the current load or number of connections/transactions on CH? If there is, we could use that value to throttle the messages getting processed in Kafka before sending to CH.
I know this can't be managed in the sink connector, but we are planning on writing our own streaming service to work with the sink.
-
Makes sense. Can the Kafka Connect sink support batch or asynchronous inserts on the CH side?
On Mon, Mar 11, 2024 at 5:44 AM Paultagoras wrote:

> CH itself can handle a large volume of data; it's generally more a question of batches - large groups, rather than frequent small inserts. That's why I was curious about how many messages you expected 🙂
> As far as connections/transactions - I believe if you go to the instance details page, it has the various metrics around that (or leads to more, depending).
-
Actually, after further thought, it seems batch inserts are most likely what we are looking for versus async. That is, if we can control when the Kafka Connect sink will flush records to CH, based on either # of events or time. Is that possible?
On Thu, Mar 14, 2024 at 9:22 AM Paultagoras wrote:

> You could either enable it on the user, or pass it as a ClickHouse setting - see https://clickhouse.com/docs/en/optimize/asynchronous-inserts#enabling-asynchronous-inserts
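For the "pass it as a ClickHouse setting" route, one option (assuming your connector version exposes the clickhouseSettings option; verify against the sink docs for your release) is to forward the async-insert flags with the inserts:

```properties
# Assumption: clickhouseSettings takes comma-separated key=value pairs
# that are applied to each insert issued by the sink.
clickhouseSettings=async_insert=1,wait_for_async_insert=1
```

The alternative Paul mentions, enabling async_insert on the ClickHouse user itself, needs no connector changes at all.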
-
Nice, fetch-min-bytes <https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html#fetch-min-bytes> looks to be what we need on # of events. I am not sure about fetch-max-wait-ms <https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html#fetch-max-wait-ms>, as these are still really low values. Certain events/topics won't see as many messages per second as others, so we could have scenarios where a fairly small batch is sitting there and we would ideally want to wait for more records to accumulate for that topic, but not past a max time, maybe 3-5 minutes. This is obviously much longer than the ms values of fetch-max-wait-ms, so wondering how we might be able to support that?
On Fri, Mar 15, 2024 at 8:48 AM Paultagoras wrote:

> So fetch-max-wait-ms defaults to 500ms (half a second), so you would want to set it to maybe 1000ms or 2000ms (you can play around with the settings; I would just be careful of heartbeat.interval.ms <https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html#heartbeat-interval-ms> because fetches that take too long might cause issues).
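Expressed as consumer overrides on the connector, the two settings discussed above would look something like this. The values are illustrative; as cautioned above, pushing the wait time much higher risks interacting badly with the consumer's heartbeat/session timeouts, which is why multi-minute accumulation windows point toward a custom consumer instead:

```properties
# wait until at least ~1 MB of records has accumulated per fetch...
consumer.override.fetch.min.bytes=1048576
# ...but never wait longer than 2 seconds (the broker default is 500 ms)
consumer.override.fetch.max.wait.ms=2000
```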
-
You're the man, Paul, and you have been a huge help! We will test this out and let you know how it goes. Unfortunately, we got a lot of snow out here in Denver, so I am going to do some shoveling and may not get to this until next week. If that is the case, have a great weekend!
On Fri, Mar 15, 2024 at 9:15 AM Paultagoras wrote:

> Correcting my answer slightly - I believe heartbeat and poll are separate threads. I think for your use case, it would be worth setting fetch-min-bytes and fetch-max-wait-ms to a higher value and seeing if that suits 🙂