
Sentry Backend Cloud-Native Middleware Practice: ClickHouse PaaS, Powering the Snuba Event Analytics Engine


ClickHouse PaaS Cloud-Native Multi-Tenant Platform (Altinity.Cloud)

Official site: https://altinity.cloud

PaaS Architecture Overview



Designing a high-performance, low-cost distributed middleware service with cloud-native orchestration, multi-cloud deployment, automated operations, elastic scaling, and self-healing, while also providing enterprise-grade capabilities such as tenant isolation, permission management, and operation auditing, is genuinely hard.

Delivered to users in a SaaS model.

Architecture Overview of the Sentry Snuba Big-Data Event Analytics Engine

Snuba is a service that provides a rich data model, fast ingestion consumers, and a query optimizer on top of ClickHouse. It serves as the engine for searching and aggregating Sentry event data.

Data is stored entirely in ClickHouse tables and materialized views. It is ingested through input streams (currently only Kafka topics) and can be queried either with point-in-time queries or with streaming queries (subscriptions).

Documentation: https://getsentry.github.io/snuba/architecture/overview.html

Kubernetes ClickHouse Operator

What Is a Kubernetes Operator?

A Kubernetes Operator is a method of packaging, deploying, and managing a Kubernetes application. We use the Kubernetes API and the kubectl tool to deploy and manage applications on Kubernetes.
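To make the pattern concrete: once an operator such as the ClickHouse operator used below is installed, managing an entire cluster comes down to applying and editing a single custom resource. A minimal sketch (illustrative only; the resource name "demo" is made up, and the field layout assumes the Altinity/RadonDB ClickHouseInstallation schema):

# Declare a 1-shard, 1-replica ClickHouse cluster as a custom resource (illustrative sketch)
cat <<'EOF' | kubectl apply -f -
apiVersion: clickhouse.radondb.com/v1
kind: ClickHouseInstallation
metadata:
  name: demo
spec:
  configuration:
    clusters:
      - name: demo-nodes
        layout:
          shardsCount: 1
          replicasCount: 1
EOF
# The operator reconciles this resource into StatefulSets, Services and ConfigMaps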

https://kubernetes.io/zh-cn/docs/concepts/extend-kubernetes/operator/

Altinity Operator for ClickHouse

Altinity is the industry-leading open-source vendor of the ClickHouse Operator.

Altinity: https://altinity.com/
GitHub: https://github.com/Altinity/clickhouse-operator
YouTube: https://www.youtube.com/@Altinity

Of course, companies and cloud vendors almost never open-source this kind of multi-tenant-isolated ClickHouse middleware PaaS platform.

RadonDB ClickHouse

https://github.com/radondb/radondb-clickhouse-operator
https://github.com/radondb/radondb-clickhouse-kubernetes

This is a customization of altinity-clickhouse-operator by the cloud vendor QingCloud, with some optimizations for quickly deploying production clusters.

Helm + Operator: Quickly Deploying a ClickHouse Cluster in the Cloud

Cloud-Native Lab Environment

VKE K8S Cluster: a Vultr managed cluster (v1.23.14)
KubeSphere v3.3.1: visual cluster management, a full-stack Kubernetes container-cloud PaaS solution
Longhorn 1.14: cloud-native distributed block storage for Kubernetes

Deploy clickhouse-operator

Here we use the Operator customized by RadonDB.

Customize the following two parameters in values.operator.yaml:
# Have the operator watch ClickHouse deployments in all namespaces of the cluster
watchAllNamespaces: true
# Enable Prometheus metrics monitoring for the operator
enablePrometheusMonitor: true
Deploy the operator with helm:
cd vip-k8s-paas/10-cloud-native-clickhouse

# Deploy into kube-system
helm install clickhouse-operator ./clickhouse-operator -f values.operator.yaml -n kube-system

kubectl -n kube-system get po | grep clickhouse-operator
# clickhouse-operator-6457c6dcdd-szgpd       1/1     Running   0          3m33s

kubectl -n kube-system get svc | grep clickhouse-operator
# clickhouse-operator-metrics   ClusterIP      10.110.129.244     8888/TCP    4m18s

kubectl api-resources | grep clickhouse
# clickhouseinstallations            chi          clickhouse.radondb.com/v1              true         ClickHouseInstallation
# clickhouseinstallationtemplates    chit         clickhouse.radondb.com/v1              true         ClickHouseInstallationTemplate
# clickhouseoperatorconfigurations   chopconf     clickhouse.radondb.com/v1              true         ClickHouseOperatorConfiguration
Deploy clickhouse-cluster

Here we use the clickhouse-cluster Helm charts customized by RadonDB to quickly deploy a cluster with 2 shards + 2 replicas + 3 ZooKeeper nodes.

Customize values.cluster.yaml:
clickhouse:
  clusterName: snuba-ck-nodes
  shardscount: 2
  replicascount: 2
...
zookeeper:
  install: true
  replicas: 3
Deploy clickhouse-cluster with helm:
kubectl create ns cloud-clickhouse
helm install clickhouse ./clickhouse-cluster -f values.cluster.yaml -n cloud-clickhouse

kubectl get po -n cloud-clickhouse
# chi-clickhouse-snuba-ck-nodes-0-0-0   3/3     Running   5 (6m13s ago)   16m
# chi-clickhouse-snuba-ck-nodes-0-1-0   3/3     Running   1 (5m33s ago)   6m23s
# chi-clickhouse-snuba-ck-nodes-1-0-0   3/3     Running   1 (4m58s ago)   5m44s
# chi-clickhouse-snuba-ck-nodes-1-1-0   3/3     Running   1 (4m28s ago)   5m10s
# zk-clickhouse-0                       1/1     Running   0               17m
# zk-clickhouse-1                       1/1     Running   0               17m
# zk-clickhouse-2                       1/1     Running   0               17m
Quickly Scaling Out ClickHouse Shards with the Operator

Use the following command and change shardsCount to 3:

kubectl edit chi/clickhouse -n cloud-clickhouse
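If you prefer a non-interactive change, the same field can be patched directly (a sketch; the JSON path assumes the CHI spec follows the Altinity-style layout, with our cluster at index 0):

kubectl -n cloud-clickhouse patch chi clickhouse --type=json \
  -p '[{"op": "replace", "path": "/spec/configuration/clusters/0/layout/shardsCount", "value": 3}]'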

Check the pods:

kubectl get po -n cloud-clickhouse
# NAME                                  READY   STATUS    RESTARTS       AGE
# chi-clickhouse-snuba-ck-nodes-0-0-0   3/3     Running   5 (24m ago)    34m
# chi-clickhouse-snuba-ck-nodes-0-1-0   3/3     Running   1 (23m ago)    24m
# chi-clickhouse-snuba-ck-nodes-1-0-0   3/3     Running   1 (22m ago)    23m
# chi-clickhouse-snuba-ck-nodes-1-1-0   3/3     Running   1 (22m ago)    23m
# chi-clickhouse-snuba-ck-nodes-2-0-0   3/3     Running   1 (108s ago)   2m33s
# chi-clickhouse-snuba-ck-nodes-2-1-0   3/3     Running   1 (72s ago)    119s
# zk-clickhouse-0                       1/1     Running   0              35m
# zk-clickhouse-1                       1/1     Running   0              35m
# zk-clickhouse-2                       1/1     Running   0              35m

Two new pods appear, chi-clickhouse-snuba-ck-nodes-2-0-0 and chi-clickhouse-snuba-ck-nodes-2-1-0: the new shard and its replica have been created automatically by the Operator.
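The new shard can also be confirmed from inside ClickHouse itself: system.clusters should now list six hosts for the cluster (this assumes the default user needs no password inside the pod, as in the later examples):

kubectl -n cloud-clickhouse exec -it chi-clickhouse-snuba-ck-nodes-0-0-0 -- \
  clickhouse-client -q "SELECT cluster, shard_num, replica_num, host_name FROM system.clusters WHERE cluster = 'snuba-ck-nodes'"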

A Quick Test: Building a Multi-Shard, Multi-Replica Cluster with ReplicatedMergeTree + Distributed + ZooKeeper

Connect to ClickHouse

We enter the pods and connect with the native command-line client, clickhouse-client.

kubectl exec -it chi-clickhouse-snuba-ck-nodes-0-0-0 -n cloud-clickhouse -- bash
kubectl exec -it chi-clickhouse-snuba-ck-nodes-0-1-0 -n cloud-clickhouse -- bash
kubectl exec -it chi-clickhouse-snuba-ck-nodes-1-0-0 -n cloud-clickhouse -- bash
kubectl exec -it chi-clickhouse-snuba-ck-nodes-1-1-0 -n cloud-clickhouse -- bash
kubectl exec -it chi-clickhouse-snuba-ck-nodes-2-0-0 -n cloud-clickhouse -- bash
kubectl exec -it chi-clickhouse-snuba-ck-nodes-2-1-0 -n cloud-clickhouse -- bash

We open a terminal into each of these six pods and then run the tests:

clickhouse-client --multiline -u username -h ip --password password
# clickhouse-client -m
Create a Distributed Database

1. Check system.clusters:
select * from system.clusters;

2. Create a database named test:

create database test on cluster "snuba-ck-nodes";
# To drop it: drop database test on cluster "snuba-ck-nodes";
Check each node: the test database now exists on all of them.
show databases;
Create the Local Table (ReplicatedMergeTree)

The CREATE TABLE statement is shown below.

Create the t_local local table in the test database on every node of the cluster, using the ReplicatedMergeTree table engine, which takes two parameters:

zoo_path - the path of the table in ZooKeeper; the different replicas of the same shard must use the same path, for example:

"/clickhouse/tables/{shard}/test/t_local"

replica_name - the name of the table's replica in ZooKeeper
CREATE TABLE test.t_local on cluster "snuba-ck-nodes"
(
    EventDate DateTime,
    CounterID UInt32,
    UserID UInt32
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test/t_local', '{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID);
Macro placeholders:

The macro placeholders in the CREATE TABLE statement (such as {replica}) are replaced with the values from the macros section of the configuration files.

Check the ConfigMaps of the ClickHouse shard & replica nodes in the cluster:

kubectl get configmap -n cloud-clickhouse | grep clickhouse

NAME                                             DATA   AGE
chi-clickhouse-common-configd                    6      20h
chi-clickhouse-common-usersd                     6      20h
chi-clickhouse-deploy-confd-snuba-ck-nodes-0-0   2      20h
chi-clickhouse-deploy-confd-snuba-ck-nodes-0-1   2      20h
chi-clickhouse-deploy-confd-snuba-ck-nodes-1-0   2      20h
chi-clickhouse-deploy-confd-snuba-ck-nodes-1-1   2      20h
chi-clickhouse-deploy-confd-snuba-ck-nodes-2-0   2      19h
chi-clickhouse-deploy-confd-snuba-ck-nodes-2-1   2      19h

Check a node's configured values:

kubectl describe configmap chi-clickhouse-deploy-confd-snuba-ck-nodes-0-0  -n cloud-clickhouse
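The effective macro values can also be read directly from ClickHouse through the system.macros table, which is exactly what the {shard} and {replica} placeholders resolve against:

kubectl -n cloud-clickhouse exec -it chi-clickhouse-snuba-ck-nodes-0-0-0 -- \
  clickhouse-client -q "SELECT * FROM system.macros"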
Create the Corresponding Distributed Table (Distributed)
CREATE TABLE test.t_dist on cluster "snuba-ck-nodes"
(
    EventDate DateTime,
    CounterID UInt32,
    UserID UInt32
)
ENGINE = Distributed("snuba-ck-nodes", test, t_local, rand());

# To drop it: drop table test.t_dist on cluster "snuba-ck-nodes";

The Distributed engine here takes four parameters:

cluster - the cluster name from the server configuration (snuba-ck-nodes)
database - the remote database name (test)
table - the remote table name (t_local)
sharding_key - (optional) the sharding key (e.g. CounterID or rand())

Check the tables, for example:

use test;
show tables;
# t_dist
# t_local

Insert a few rows through the distributed table:

# Insert a few rows
INSERT INTO test.t_dist VALUES ('2022-12-16 00:00:00', 1, 1), ('2023-01-01 00:00:00', 2, 2), ('2023-02-01 00:00:00', 3, 3);

Query the data from any node:

select * from test.t_dist;
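To see how the inserted rows were spread across the shards, group by the remote host name; when run against the Distributed table, hostName() is evaluated on the shards that hold the data:

select hostName(), count() from test.t_dist group by hostName();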
Hands-On: Providing ClickHouse PaaS for the Snuba Engine

Dissecting and Analyzing the Sentry Helm Charts

Before migrating to the Kubernetes Operator, let's first dissect and analyze the clickhouse & zookeeper charts bundled in sentry-charts.

The unofficial Sentry Helm Charts:

https://github.com/sentry-kubernetes/charts

Its Chart.yaml is as follows:

apiVersion: v2
appVersion: 22.11.0
dependencies:
- condition: sourcemaps.enabled
  name: memcached
  repository: https://charts.bitnami.com/bitnami
  version: 6.1.5
- condition: redis.enabled
  name: redis
  repository: https://charts.bitnami.com/bitnami
  version: 16.12.1
- condition: kafka.enabled
  name: kafka
  repository: https://charts.bitnami.com/bitnami
  version: 16.3.2
- condition: clickhouse.enabled
  name: clickhouse
  repository: https://sentry-kubernetes.github.io/charts
  version: 3.2.0
- condition: zookeeper.enabled
  name: zookeeper
  repository: https://charts.bitnami.com/bitnami
  version: 9.0.0
- alias: rabbitmq
  condition: rabbitmq.enabled
  name: rabbitmq
  repository: https://charts.bitnami.com/bitnami
  version: 8.32.2
- condition: postgresql.enabled
  name: postgresql
  repository: https://charts.bitnami.com/bitnami
  version: 10.16.2
- condition: nginx.enabled
  name: nginx
  repository: https://charts.bitnami.com/bitnami
  version: 12.0.4
description: A Helm chart for Kubernetes
maintainers:
- name: sentry-kubernetes
name: sentry
type: application
version: 17.9.0

This sentry-charts approach couples all the middleware Helm charts together as dependencies of a single deployment, which does not suit scaling Sentry microservices and middleware clusters independently. A more advanced approach is to give each middleware its own Kubernetes Operator (e.g. clickhouse-operator) and its own K8S cluster, forming a middleware PaaS platform that serves them externally.

Here we split the middleware charts out into independent namespaces (or even separate clusters) for operations. The design is:

ZooKeeper namespace: cloud-zookeeper-paas
ClickHouse namespace: cloud-clickhouse-paas

Deploying the ZooKeeper Helm Chart Independently

The zookeeper chart used here is bitnami/zookeeper; its repositories are:

https://github.com/bitnami/charts/tree/master/bitnami/zookeeper
https://github.com/bitnami/containers/tree/main/bitnami/zookeeper

The ZooKeeper Operator will be covered in a dedicated follow-up article.

Create the namespace:
kubectl create ns cloud-zookeeper-paas
Lightly customize values.yaml:
# Expose the service needed for Prometheus monitoring
metrics:
  containerPort: 9141
  enabled: true

........

service:
  annotations: {}
  clusterIP: ""
  disableBaseClientPort: false
  externalTrafficPolicy: Cluster
  extraPorts: []
  headless:
    annotations: {}
    publishNotReadyAddresses: true
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  nodePorts:
    client: ""
    tls: ""
  ports:
    client: 2181
    election: 3888
    follower: 2888
    tls: 3181
  sessionAffinity: None
  type: ClusterIP

Note: when running on a cloud provider that supports external load balancers, set the Service type to "LoadBalancer" so that a load balancer is provisioned for the Service. Traffic from the external load balancer is directed to the backend Pods; exactly how that works depends on the cloud provider.
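For example, the Service type from the values above can be overridden at install time if external access is needed (otherwise the default ClusterIP used in the next step is sufficient):

# Optional: expose ZooKeeper through a cloud load balancer instead of ClusterIP
helm install zookeeper ./zookeeper -f values.yaml --set service.type=LoadBalancer -n cloud-zookeeper-paas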

Deploy with helm:
helm install zookeeper ./zookeeper -f values.yaml -n cloud-zookeeper-paas

Inside the cluster, the service is available at zookeeper.cloud-zookeeper-paas.svc.cluster.local:2181.

Connect to ZooKeeper with zkCli:
export POD_NAME=$(kubectl get pods --namespace cloud-zookeeper-paas -l "app.kubernetes.io/name=zookeeper,app.kubernetes.io/instance=zookeeper,app.kubernetes.io/component=zookeeper" -o jsonpath="{.items[0].metadata.name}")

kubectl -n cloud-zookeeper-paas exec -it $POD_NAME -- zkCli.sh

# test
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 1] ls /zookeeper
[config, quota]
[zk: localhost:2181(CONNECTED) 2] quit

# External access:
# kubectl port-forward --namespace cloud-zookeeper-paas svc/zookeeper 2181: & zkCli.sh 127.0.0.1:2181
Check zoo.cfg:
kubectl -n cloud-zookeeper-paas exec -it $POD_NAME -- cat /opt/bitnami/zookeeper/conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/bitnami/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=0
## Metrics Providers
#
# https://prometheus.io Metrics Exporter
metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
#metricsProvider.httpHost=0.0.0.0
metricsProvider.httpPort=9141
metricsProvider.exportJvmInfo=true
preAllocSize=65536
snapCount=100000
maxCnxns=0
reconfigEnabled=false
quorumListenOnAllIPs=false
4lw.commands.whitelist=srvr, mntr, ruok
maxSessionTimeout=40000
admin.serverPort=8080
admin.enableServer=true
server.1=zookeeper-0.zookeeper-headless.cloud-zookeeper-paas.svc.cluster.local:2888:3888;2181
server.2=zookeeper-1.zookeeper-headless.cloud-zookeeper-paas.svc.cluster.local:2888:3888;2181
server.3=zookeeper-2.zookeeper-headless.cloud-zookeeper-paas.svc.cluster.local:2888:3888;2181
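A quick health check of the ensemble (assuming zkServer.sh is on the PATH in the Bitnami image): each pod should report Mode: leader or Mode: follower.

kubectl -n cloud-zookeeper-paas exec -it zookeeper-0 -- zkServer.sh status
kubectl -n cloud-zookeeper-paas exec -it zookeeper-1 -- zkServer.sh status
kubectl -n cloud-zookeeper-paas exec -it zookeeper-2 -- zkServer.sh status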
Deploying the ClickHouse Helm Chart Independently

The clickhouse chart used here is the version maintained by sentry-kubernetes/charts itself:

Sentry Snuba currently does not support ClickHouse 21.x and later very well, so the image version used here is yandex/clickhouse-server:20.8.19.4.

https://github.com/sentry-kubernetes/charts/tree/develop/clickhouse

ClickHouse Operator + ClickHouse Keeper will be covered in a dedicated follow-up article.

This bundled clickhouse chart has a few issues: its Service template needs a small modification to allow configuring "type: LoadBalancer" or "type: NodePort".

Note: as with the ZooKeeper chart above, when running on a cloud provider that supports external load balancers, set the Service type to "LoadBalancer" so that one is provisioned for the Service.

Create the namespace:
kubectl create ns cloud-clickhouse-paas
Lightly customize values.yaml:

Note the addresses of the 3 ZooKeeper instances from zoo.cfg above:

server.1=zookeeper-0.zookeeper-headless.cloud-zookeeper-paas.svc.cluster.local:2888:3888;2181
server.2=zookeeper-1.zookeeper-headless.cloud-zookeeper-paas.svc.cluster.local:2888:3888;2181
server.3=zookeeper-2.zookeeper-headless.cloud-zookeeper-paas.svc.cluster.local:2888:3888;2181
# Modify zookeeper_servers
clickhouse:
  configmap:
    zookeeper_servers:
      config:
      - hostTemplate: "zookeeper-0.zookeeper-headless.cloud-zookeeper-paas.svc.cluster.local"
        index: clickhouse
        port: "2181"
      - hostTemplate: "zookeeper-1.zookeeper-headless.cloud-zookeeper-paas.svc.cluster.local"
        index: clickhouse
        port: "2181"
      - hostTemplate: "zookeeper-2.zookeeper-headless.cloud-zookeeper-paas.svc.cluster.local"
        index: clickhouse
        port: "2181"
      enabled: true
      operation_timeout_ms: "10000"
      session_timeout_ms: "30000"

# Expose the service needed for Prometheus monitoring
metrics:
  enabled: true

Of course, the headless Service does not have to be used here: since this is internal access across namespaces of the same cluster, the ClusterIP-type Service can simply be used instead:

# Modify zookeeper_servers
clickhouse:
  configmap:
    zookeeper_servers:
      config:
      - hostTemplate: "zookeeper.cloud-zookeeper-paas.svc.cluster.local"
        index: clickhouse
        port: "2181"
      enabled: true
      operation_timeout_ms: "10000"
      session_timeout_ms: "30000"

# Expose the service needed for Prometheus monitoring
metrics:
  enabled: true
Deploy with helm:
helm install clickhouse ./clickhouse -f values.yaml -n cloud-clickhouse-paas
Connect to ClickHouse
kubectl -n cloud-clickhouse-paas exec -it clickhouse-0 -- clickhouse-client --multiline --host="clickhouse-1.clickhouse-headless.cloud-clickhouse-paas"
Verify the Cluster
show databases;
select * from system.clusters;
select * from system.zookeeper where path = '/clickhouse';
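To look only at the cluster defined by this chart (its remote_servers entry is named clickhouse, matching the clusterName used for externalClickhouse later), filter system.clusters:

select cluster, shard_num, replica_num, host_name from system.clusters where cluster = 'clickhouse';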
ConfigMaps of the Current ClickHouse Cluster

kubectl get configmap -n cloud-clickhouse-paas | grep clickhouse

clickhouse-config    1      28h
clickhouse-metrica   1      28h
clickhouse-users     1      28h
clickhouse-config(config.xml)
(The XML markup of config.xml was lost in extraction. The recoverable settings are: data paths under /var/lib/clickhouse/ (plus tmp/, user_files/ and format_schemas/), an include file at /etc/clickhouse-server/metrica.d/metrica.xml, users defined in users.xml, listen on 0.0.0.0 with HTTP port 8123, TCP port 9000 and interserver port 9009, timezone UTC, query_log and query_thread_log tables in the system database partitioned by toYYYYMM(event_date) with a 7500 ms flush interval, the distributed DDL queue at /clickhouse/task_queue/ddl, and trace-level logging to /var/log/clickhouse-server/clickhouse-server.log and clickhouse-server.err.log, rotated at 1000M with 10 files kept.)
clickhouse-metrica(metrica.xml)
(The XML markup of metrica.xml was lost in extraction. It contains a zookeeper-servers section pointing at zookeeper-0/1/2.zookeeper-headless.cloud-zookeeper-paas.svc.cluster.local on port 2181 with session_timeout_ms 30000 and operation_timeout_ms 10000, and a remote_servers section with internal_replication set to true and one entry per node for clickhouse-0/1/2.clickhouse-headless.cloud-clickhouse-paas.svc.cluster.local on port 9000 using the default user.)
clickhouse-users(users.xml)
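The users.xml contents are not reproduced here; the full rendered XML of all three ConfigMaps can be dumped with kubectl:

kubectl -n cloud-clickhouse-paas get configmap clickhouse-config -o yaml
kubectl -n cloud-clickhouse-paas get configmap clickhouse-metrica -o yaml
kubectl -n cloud-clickhouse-paas get configmap clickhouse-users -o yaml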
Customizing the Sentry Helm Charts to Use the ClickHouse PaaS (Single Cluster, Multiple Nodes)

We make a few simple changes to values.yaml.

Disable the clickhouse & zookeeper bundled in sentry-charts:
clickhouse:
  enabled: false
zookeeper:
  enabled: false
Modify externalClickhouse:
externalClickhouse:
  database: default
  host: "clickhouse.cloud-clickhouse-paas.svc.cluster.local"
  httpPort: 8123
  password: ""
  singleNode: false
  clusterName: "clickhouse"
  tcpPort: 9000
  username: default

Note:

This is just a simple in-cluster integration with a single multi-node sharded cluster. The Snuba system is designed to let you connect multiple ClickHouse clusters (multi-node, multi-shard, multi-replica) and spread different schemas across different clusters, enabling very large-scale throughput.

Because this is internal access between namespaces of the same Kubernetes cluster, a ClusterIP-type Service is sufficient.

Note that singleNode must be set to false here, because we have multiple nodes. We also need to provide clusterName. From the Snuba source code, these determine:

which migrations will run (local tables only, or local and distributed tables)

differences in queries, e.g. whether the _local or the _dist table is selected

and which ClickHouse table engines to use.

ClickHouse itself is a separate technical topic, so we will not go into it further here.

Deploy
helm install sentry ./sentry -f values.yaml -n sentry
Verify the _local and _dist Tables and system.zookeeper
kubectl -n cloud-clickhouse-paas exec -it clickhouse-0 -- clickhouse-client --multiline --host="clickhouse-1.clickhouse-headless.cloud-clickhouse-paas"

show databases;
show tables;
select * from system.zookeeper where path = '/clickhouse';
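To filter specifically for the local/distributed table pairs created by the Snuba migrations (the exact table names depend on the Snuba version, so the LIKE patterns below are just a generic filter over the default database configured above):

kubectl -n cloud-clickhouse-paas exec -it clickhouse-0 -- \
  clickhouse-client -q "SELECT name FROM system.tables WHERE database = 'default' AND (name LIKE '%_local' OR name LIKE '%_dist') ORDER BY name"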

Advanced Topics & Very-Large-Scale Throughput

Connecting to a Multi-Cluster / Multi-Node / Multi-Shard / Multi-Replica ClickHouse Middleware PaaS

Independently deploy multiple sets of VKE LoadBalancer + VKE K8S Cluster + ZooKeeper-Operator + ClickHouse-Operator, spreading schemas across different clusters and multi-node shards.

Analyzing the Snuba System Design

Read the test-case source code to understand the system design and advanced configuration.

As for read/write load balancing and connection pooling across the shards and replicas of a ClickHouse cluster, Snuba's system design and code already take these concerns into account and optimize for them.

Advanced topics such as running multiple independent cloud-native orchestration clusters with the ClickHouse Operator, as well as the Snuba system design, will be covered separately in the VIP column live course.


