ClickHouse can pull data from Kafka through the Kafka table engine (KafkaEngine), with the connection specified in the DDL: [1]
kafka_broker_list = 'host:port',
kafka_topic_list = 'topic1,topic2,...',
kafka_group_name = 'group_name',
kafka_format = 'data_format'[,]
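For example, a complete table definition might look like the sketch below; the table name, columns, topic, consumer group, and format are placeholders rather than values from any real setup.
-- Minimal sketch of a Kafka engine table; all values are placeholders.
CREATE TABLE kafka_queue
(
    ts DateTime,
    message String
) ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'host:port',
    kafka_topic_list = 'topic1',
    kafka_group_name = 'group_name',
    kafka_format = 'JSONEachRow';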
However, Kafka services on public clouds usually require Kerberos (or other SASL authentication) for stronger security, and on the ClickHouse side this part is not configured in the DDL but in a configuration file.
Configuring Kerberos for ClickHouse with a single Kafka cluster
If the ClickHouse cluster only accesses a single Kerberos-enabled Kafka cluster, it is enough to add the following to the configuration file [2][3]:
<clickhouse>
    <kafka>
        <sasl_username>username</sasl_username>
        <sasl_password>password</sasl_password>
        <security_protocol>sasl_ssl</security_protocol>
        <sasl_mechanisms>PLAIN</sasl_mechanisms>
    </kafka>
</clickhouse>
For the full list of configurable parameters, see the librdkafka configuration reference; librdkafka is the underlying Kafka library used by ClickHouse.
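Incidentally, the snippet above authenticates with SASL/PLAIN over SSL; for GSSAPI (Kerberos proper), the same <kafka> block takes the sasl_kerberos_* settings instead. A sketch based on the Kerberos example in the ClickHouse docs [3], with the keytab path and principal as placeholders:
<clickhouse>
    <kafka>
        <!-- Kerberos (GSSAPI) variant; keytab and principal are placeholders -->
        <security_protocol>sasl_plaintext</security_protocol>
        <sasl_kerberos_keytab>/home/kafkauser/kafkauser.keytab</sasl_kerberos_keytab>
        <sasl_kerberos_principal>kafkauser/kafkahost@EXAMPLE.COM</sasl_kerberos_principal>
    </kafka>
</clickhouse>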
It is worth mentioning that the settings from the DDL can also be placed in the configuration file, for example [4]:
<clickhouse>
    <kafka_broker_list>host:port</kafka_broker_list>
    <kafka_topic_list>topic1,topic2,...</kafka_topic_list>
    <kafka>
        <sasl_username>username</sasl_username>
        <sasl_password>password</sasl_password>
        <security_protocol>sasl_ssl</security_protocol>
        <sasl_mechanisms>PLAIN</sasl_mechanisms>
    </kafka>
</clickhouse>
Configuring Kerberos for ClickHouse with multiple Kafka clusters
What if ClickHouse needs to access several different Kafka clusters, each with Kerberos enabled?
This is where ClickHouse's named collections come in. In short, the settings that need to be overridden are bundled into a named collection; when that collection is referenced in SQL, its contents override the corresponding default settings (this requires enabling allow_named_collection_override_by_default).
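As a rough sketch of enabling that override (the placement as a top-level entry in the server configuration is an assumption here; check the named collections documentation for your ClickHouse version):
<clickhouse>
    <!-- Assumed placement: let named-collection keys override other settings by default -->
    <allow_named_collection_override_by_default>1</allow_named_collection_override_by_default>
</clickhouse>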
With that in place, we can configure the two clusters like this [5]:
<clickhouse>
    <named_collections>
        <the_first_kafka>
            <kafka>
                <sasl_username>username</sasl_username>
                <sasl_password>password</sasl_password>
                <security_protocol>sasl_ssl</security_protocol>
                <sasl_mechanisms>PLAIN</sasl_mechanisms>
            </kafka>
        </the_first_kafka>
        <the_second_kafka>
            <kafka_broker_list>host:port</kafka_broker_list>
            <kafka_topic_list>topic1,topic2,...</kafka_topic_list>
            <kafka_group_name>group_name</kafka_group_name>
            <kafka>
                <sasl_username>username</sasl_username>
                <sasl_password>password</sasl_password>
                <security_protocol>sasl_ssl</security_protocol>
                <sasl_mechanisms>PLAIN</sasl_mechanisms>
            </kafka>
        </the_second_kafka>
    </named_collections>
</clickhouse>
Then specify the named collection when creating the Kafka engine table in the DDL:
CREATE TABLE kafka_test
(
    ...
) ENGINE = Kafka(the_second_kafka)
SETTINGS
    kafka_format = 'JSON';
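For comparison, the_first_kafka only carries the authentication block, so the broker list, topic, and consumer group would still have to come from the DDL (or from the global configuration shown earlier); a sketch, reusing the placeholder values from above:
CREATE TABLE kafka_test_first
(
    ...
) ENGINE = Kafka(the_first_kafka)
SETTINGS
    kafka_broker_list = 'host:port',
    kafka_topic_list = 'topic1',
    kafka_group_name = 'group_name',
    kafka_format = 'JSON';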
[1] https://clickhouse.com/docs/en/engines/table-engines/integrations/kafka
[2] https://clickhouse.com/docs/en/integrations/kafka/kafka-table-engine#2-configure-clickhouse
[3] https://clickhouse.com/docs/en/engines/table-engines/integrations/kafka#kafka-kerberos-support
[4] https://github.com/ClickHouse/ClickHouse/issues/28703#issuecomment-1241852550
[5] https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-adjusting-librdkafka-settings/#different-configurations-for-different-tables