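ClickHouse's Docker architecture

Since I ended up installing ClickHouse with Docker, I downloaded the ClickHouse source code from GitHub to study the docker directory it ships and to build up some Docker knowledge along the way, so that I can later modify it and build the images I need. (This post only uses ClickHouse's own Dockerfile to briefly describe what each instruction it uses does and how it is used here; detailed explanations and examples of Dockerfile instructions will follow later in the Docker part of these notes ^_^.)

1. ClickHouse's docker file layout

ClickHouse's docker directory is split into builder, client, packager, server and test, plus an outer docker-compose.yml file. The tree structure of the docker directory is as follows:

docker directory tree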
  
docker
├── README.md
├── builder                            # installs the base build environment: gcc-9, ninja, cmake, ...
│   ├── Dockerfile
│   ├── Makefile
│   ├── README.md
│   └── build.sh
├── client                            # installs clickhouse client
│   ├── Dockerfile
│   └── README.md
├── images.json
├── packager
│   ├── README.md
│   ├── binary
│   │   ├── Dockerfile
│   │   └── build.sh
│   ├── deb
│   │   ├── Dockerfile
│   │   └── build.sh
│   ├── freebsd
│   │   └── Vagrantfile
│   └── packager
├── server                        # installs and configures clickhouse server
│   ├── Dockerfile
│   ├── README.md
│   ├── docker_related_config.xml
│   ├── entrypoint.sh
│   └── local.Dockerfile
└── test                            # used for testing
    ├── Dockerfile
    ├── README.md
    ├── codebrowser
    │   └── Dockerfile
    ├── compatibility
    │   ├── centos
    │   │   └── Dockerfile
    │   └── ubuntu
    │       └── Dockerfile
    ├── coverage
    │   └── Dockerfile
    ├── integration
    │   └── Dockerfile
    ├── performance
    │   ├── Dockerfile
    │   ├── run.sh
    │   └── s3downloader
    ├── performance-comparison
    │   ├── Dockerfile
    │   ├── compare.sh
    │   ├── config
    │   │   ├── config.d
    │   │   │   └── perf-comparison-tweaks-config.xml
    │   │   └── users.d
    │   │       └── perf-comparison-tweaks-users.xml
    │   ├── download.sh
    │   ├── entrypoint.sh
    │   ├── eqmed.sql
    │   ├── perf.py
    │   ├── performance_comparison.md
    │   └── report.py
    ├── pvs
    │   └── Dockerfile
    ├── split_build_smoke_test
    │   ├── Dockerfile
    │   └── run.sh
    ├── stateful
    │   ├── Dockerfile
    │   └── s3downloader
    ├── stateful_with_coverage
    │   ├── Dockerfile
    │   ├── run.sh
    │   └── s3downloader
    ├── stateless
    │   ├── Dockerfile
    │   └── clickhouse-statelest-test-runner.Dockerfile
    ├── stateless_with_coverage
    │   ├── Dockerfile
    │   └── run.sh
    ├── stress
    │   ├── Dockerfile
    │   ├── README.md
    │   └── stress
    ├── test_runner.sh
    ├── test_runner_docker_compose.yaml
    └── unit
        └── Dockerfile
  

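2. The server directory in detail

The server directory of ClickHouse's docker tree contains Dockerfile, README.md, docker_related_config.xml, entrypoint.sh and local.Dockerfile. README.md briefly describes basic usage such as how to start the ClickHouse service with Docker; the other files are explained below, starting from the contents of the Dockerfile.

2.1 Contents of the Dockerfile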

FROM ubuntu:18.04

## ARG build arguments: repository, version, gosu_ver
ARG repository="deb http://repo.yandex.ru/clickhouse/deb/stable/ main/"

## The official repository does not yet provide version 20.4.1.* of clickhouse-common-static/client/server; change this to version=20.3.4.* and docker build will succeed
ARG version=20.4.1.*
ARG gosu_ver=1.10

## RUN executes commands: installs base tools and clickhouse-client/clickhouse-server/clickhouse-common-static...
RUN apt-get update \
&& apt-get install --yes --no-install-recommends \
apt-transport-https \
dirmngr \
gnupg \
&& mkdir -p /etc/apt/sources.list.d \
&& apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4 \
&& echo $repository > /etc/apt/sources.list.d/clickhouse.list \
&& apt-get update \
&& env DEBIAN_FRONTEND=noninteractive \
apt-get install --allow-unauthenticated --yes --no-install-recommends \
clickhouse-common-static=$version \
clickhouse-client=$version \
clickhouse-server=$version \
locales \
tzdata \
wget \
&& rm -rf \
/var/lib/apt/lists/* \
/var/cache/debconf \
/tmp/* \
&& apt-get clean

## ADD: downloads gosu-amd64 from GitHub to /bin/gosu; gosu plays a role similar to sudo on Linux (it runs a command as another user)
ADD https://github.com/tianon/gosu/releases/download/$gosu_ver/gosu-amd64 /bin/gosu

# Locale settings
## Generate the required locale files
RUN locale-gen en_US.UTF-8
## ENV: set environment variables
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
ENV TZ UTC

RUN mkdir /docker-entrypoint-initdb.d

## COPY: copies files, e.g. docker_related_config.xml is copied into /etc/clickhouse-server/config.d/; the target directory does not need to exist beforehand, COPY creates it if it is missing
COPY docker_related_config.xml /etc/clickhouse-server/config.d/
COPY entrypoint.sh /entrypoint.sh

RUN chmod +x \
/entrypoint.sh \
/bin/gosu

## EXPOSE: only declares which service ports the container opens at run time; it publishes nothing by itself, but documents what to map with docker run -p
EXPOSE 9000 8123 9009

## VOLUME: data persistence. Data under /var/lib/clickhouse in the running container is stored on the host: with docker run --volume=/work/docker/clickhouse_test_db:/var/lib/clickhouse it lives in /work/docker/clickhouse_test_db on the host; without that option it is written under the Docker Root Dir reported by docker info
VOLUME /var/lib/clickhouse

## Set the CLICKHOUSE_CONFIG environment variable
ENV CLICKHOUSE_CONFIG /etc/clickhouse-server/config.xml

## ENTRYPOINT ["shell script"]: runs the preparation steps needed before the container's main process starts
ENTRYPOINT ["/entrypoint.sh"]
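
Because repository, version and gosu_ver are declared with ARG, they can be overridden at build time with --build-arg instead of editing the Dockerfile. The following is a minimal sketch under that assumption; build_image_cmd is a hypothetical helper (not part of the ClickHouse repository) that only prints the docker build command line so it can be inspected before it is run.

```shell
#!/bin/bash
# Hypothetical helper: prints a `docker build` command that overrides the
# Dockerfile's ARG default for `version` through --build-arg.
build_image_cmd() {
    local version="$1"   # version pin, e.g. '20.3.4.*'
    local tag="$2"       # image name and tag, e.g. clickhouse-server-demo:1.0
    local cmd=(docker build --build-arg "version=${version}" -t "${tag}" .)
    # %q shell-quotes every word, so the '*' in the pin is printed escaped
    # and would not be expanded as a glob if the line is pasted into a shell.
    printf '%q ' "${cmd[@]}"
    echo
}

build_image_cmd '20.3.4.*' clickhouse-server-demo:1.0
```

The printed line can be pasted into a shell in the directory that holds the Dockerfile. Printing first is only a convenience for checking the quoting of the version pin.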

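2.2 Contents of docker_related_config.xml

As the Dockerfile shows, docker_related_config.xml is placed in clickhouse-server's configuration directory /etc/clickhouse-server/config.d/. It configures the listen addresses of the clickhouse-server container; by default it accepts connections from other containers and from the host network.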

<yandex>
<!-- Listen wildcard address to allow accepting connections from other containers and host network. -->
<listen_host>::</listen_host>
<listen_host>0.0.0.0</listen_host>
<listen_try>1</listen_try>

<!--
<logger>
<console>1</console>
</logger>
-->
</yandex>

2.3 Contents of entrypoint.sh

#!/bin/bash
DO_CHOWN=1
if [ "$CLICKHOUSE_DO_NOT_CHOWN" = 1 ]; then
DO_CHOWN=0
fi

CLICKHOUSE_UID="${CLICKHOUSE_UID:-"$(id -u clickhouse)"}"
CLICKHOUSE_GID="${CLICKHOUSE_GID:-"$(id -g clickhouse)"}"

# support --user
## Set USER/GROUP to the clickhouse user
if [ x"$UID" == x0 ]; then
USER=$CLICKHOUSE_UID
GROUP=$CLICKHOUSE_GID
gosu="gosu $USER:$GROUP"
else
USER="$(id -u)"
GROUP="$(id -g)"
gosu=""
DO_CHOWN=0
fi

# set some vars
## Config file: /etc/clickhouse-server/config.xml
CLICKHOUSE_CONFIG="${CLICKHOUSE_CONFIG:-/etc/clickhouse-server/config.xml}"

# port is needed to check if clickhouse-server is ready for connections
HTTP_PORT="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=http_port)"

# get CH directories locations
DATA_DIR="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=path || true)"
TMP_DIR="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=tmp_path || true)"
USER_PATH="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=user_files_path || true)"
LOG_PATH="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=logger.log || true)"
LOG_DIR="$(dirname $LOG_PATH || true)"
ERROR_LOG_PATH="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=logger.errorlog || true)"
ERROR_LOG_DIR="$(dirname $ERROR_LOG_PATH || true)"
FORMAT_SCHEMA_PATH="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=format_schema_path || true)"
CLICKHOUSE_USER="${CLICKHOUSE_USER:-default}"

for dir in "$DATA_DIR" \
"$ERROR_LOG_DIR" \
"$LOG_DIR" \
"$TMP_DIR" \
"$USER_PATH" \
"$FORMAT_SCHEMA_PATH"
do
# check if variable not empty
[ -z "$dir" ] && continue
# ensure directories exist
if ! mkdir -p "$dir"; then
echo "Couldn't create necessary directory: $dir"
exit 1
fi

if [ "$DO_CHOWN" = "1" ]; then
# ensure proper directories permissions
chown -R "$USER:$GROUP" "$dir"
elif [ "$(stat -c %u "$dir")" != "$USER" ]; then
echo "Necessary directory '$dir' isn't owned by user with id '$USER'"
exit 1
fi
done



if [ -n "$(ls /docker-entrypoint-initdb.d/)" ]; then
$gosu /usr/bin/clickhouse-server --config-file=$CLICKHOUSE_CONFIG &
pid="$!"

# check if clickhouse is ready to accept connections
# will try to send ping clickhouse via http_port (max 12 retries, with 1 sec delay)
if ! wget --spider --quiet --tries=12 --waitretry=1 --retry-connrefused "http://localhost:$HTTP_PORT/ping" ; then
echo >&2 'ClickHouse init process failed.'
exit 1
fi

if [ ! -z "$CLICKHOUSE_PASSWORD" ]; then
printf -v WITH_PASSWORD '%s %q' "--password" "$CLICKHOUSE_PASSWORD"
fi

clickhouseclient=( clickhouse-client --multiquery -u $CLICKHOUSE_USER $WITH_PASSWORD )

echo
for f in /docker-entrypoint-initdb.d/*; do
case "$f" in
*.sh)
if [ -x "$f" ]; then
echo "$0: running $f"
"$f"
else
echo "$0: sourcing $f"
. "$f"
fi
;;
*.sql) echo "$0: running $f"; cat "$f" | "${clickhouseclient[@]}" ; echo ;;
*.sql.gz) echo "$0: running $f"; gunzip -c "$f" | "${clickhouseclient[@]}"; echo ;;
*) echo "$0: ignoring $f" ;;
esac
echo
done

if ! kill -s TERM "$pid" || ! wait "$pid"; then
echo >&2 'Finishing of ClickHouse init process failed.'
exit 1
fi
fi

# if no args passed to `docker run` or first argument start with `--`, then the user is passing clickhouse-server arguments
if [[ $# -lt 1 ]] || [[ "$1" == "--"* ]]; then
exec $gosu /usr/bin/clickhouse-server --config-file=$CLICKHOUSE_CONFIG "$@"
fi

# Otherwise, we assume the user want to run his own process, for example a `bash` shell to explore this image
exec "$@"
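
Two decisions in the script above are easy to miss: how the files in /docker-entrypoint-initdb.d/ are handled, and whether the arguments of docker run go to clickhouse-server or are run as a separate command. The sketch below restates both as small functions that only print the decision. The function names entrypoint_mode and init_action are hypothetical, and the executable check for *.sh files is left out for brevity.

```shell
#!/bin/bash
# Hypothetical restatement of two decisions made by entrypoint.sh.
# The functions only echo what the real script would do.

# Mirrors the final test: if [[ $# -lt 1 ]] || [[ "$1" == "--"* ]]
entrypoint_mode() {
    if [[ $# -lt 1 ]] || [[ "$1" == "--"* ]]; then
        echo "server"    # arguments are forwarded to clickhouse-server
    else
        echo "command"   # arguments are exec'ed as the user's own process
    fi
}

# Mirrors the case statement that walks /docker-entrypoint-initdb.d/
init_action() {
    case "$1" in
        *.sh)     echo "shell"  ;;   # run if executable, otherwise sourced
        *.sql)    echo "sql"    ;;   # piped into clickhouse-client
        *.sql.gz) echo "sql.gz" ;;   # gunzip -c, then piped into clickhouse-client
        *)        echo "ignore" ;;
    esac
}

entrypoint_mode                      # prints: server
entrypoint_mode --some-server-flag   # prints: server (placeholder flag)
entrypoint_mode bash                 # prints: command
init_action /docker-entrypoint-initdb.d/01_schema.sql   # prints: sql
```

The file name 01_schema.sql and the flag --some-server-flag are placeholders for illustration only.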

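2.4 The local.Dockerfile file

Left as-is for now; I have not found anything that uses this file.

2.5 Building the server image

Reading the docker sources for clickhouse-server shows that Dockerfile, entrypoint.sh and docker_related_config.xml are the three files the build needs. Create a new directory containing only these three files and build the stock server image from it.

Figure: directory layout for the image build

The figure above shows the directory prepared for the build. With the Docker service running, docker build -t clickhouse-server-demo:1.0 . builds a Docker image named clickhouse-server-demo with tag 1.0.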

## Build the image
[root... test-clickhouse]# docker build -t clickhouse-server-demo:1.0 .
Sending build context to Docker daemon 8.704 kB
Step 1/19 : FROM ubuntu:18.04
...

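Successfully built 16acf3ee797d ## this final line means docker build completed successfully
## What to do when the build fails (the source downloaded from GitHub was newer than the published deb packages, so the build failed at first):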
[root... test-clickhouse]# docker build -t clickhouse-server-demo:1.0 .
...
E: Version '20.4.1.*' for 'clickhouse-common-static' was not found
E: Version '20.4.1.*' for 'clickhouse-client' was not found
E: Version '20.4.1.*' for 'clickhouse-server' was not found
The command '/bin/sh -c apt-get update && apt-get install --yes --no-install-recommends apt-transport-https dirmngr gnupg && mkdir -p /etc/apt/sources.list.d && apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4 && echo $repository > /etc/apt/sources.list.d/clickhouse.list && apt-get update && env DEBIAN_FRONTEND=noninteractive apt-get install --allow-unauthenticated --yes --no-install-recommends clickhouse-common-static=$version clickhouse-client=$version clickhouse-server=$version locales tzdata wget && rm -rf /var/lib/apt/lists/*

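### The errors above mean that no 20.4.1.* package was found for clickhouse-server/client/..., so the /bin/sh -c apt-get update... step of the Dockerfile failed
#### 1. `docker image ls` shows that no tagged image was built; only an untagged intermediate image exists, with image id 9feb83a76f47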
[root... test-clickhouse]# docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> 9feb83a76f47 8 minutes ago 64.2 MB
ubuntu 18.04 4e5021d210f6 37 hours ago 64.2 MB
hello-world latest fce289e99eb9 14 months ago 1.84 kB
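#### 2. Enter the intermediate image with `docker run -it 9feb83a76f47 /bin/bash` and run the failed command by hand to see why it fails
[root... test-clickhouse]# docker run -it 9feb83a76f47 /bin/bash
## Once inside, re-run the command to find the cause of the failure (build ARGs such as $repository and $version are not defined in a docker run shell and have to be set by hand first)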
root@8974a71311f5:/# apt-get update && apt-get install --yes --no-install-recommends apt-transport-https dirmngr gnupg && mkdir -p /etc/apt/sources.list.d && apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4 && echo $repository > /etc/apt/sources.list.d/clickhouse.list && apt-get update && env DEBIAN_FRONTEND=noninteractive apt-get install --allow-unauthenticated --yes --no-install-recommends clickhouse-common-static=$version clickhouse-client=$version clickhouse-server=$version locales tzdata wget && rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/* && apt-get clean
...
E: Unable to locate package clickhouse-common-static
E: Unable to locate package clickhouse-client
E: Unable to locate package clickhouse-server
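## The output shows that the three packages clickhouse-server/client/common-static could not be found. The command pins them to the version value initialised in the Dockerfile, so the pinned version may be newer than anything published
### To check this, install without a version pin; the result shows that the version obtained is 20.3.4.10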
root@8974a71311f5:/# apt-get install --allow-unauthenticated --yes --no-install-recommends clickhouse-common-static
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
tzdata
Suggested packages:
clickhouse-common-static-dbg
The following NEW packages will be installed:
clickhouse-common-static tzdata
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 117 MB of archives.
After this operation, 397 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 tzdata all 2019c-0ubuntu0.18.04 [190 kB]
Get:2 http://repo.yandex.ru/clickhouse/deb/stable main/ clickhouse-common-static 20.3.4.10 [116 MB]
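### The download path also shows that the latest clickhouse-client etc. packages are 20.3.4.10, so either change version in the Dockerfile to 20.3.4.* or install without a version pin to get the latest packages.

## Other docker build errors can be handled the same way: start the intermediate image, find the cause of the error and fix it.

## Once the problem is solved, delete the existing image and rebuild
### Only containers of this image were running at the time, so all of them can simply be stopped; do not do this if other containers are running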
[root... test-clickhouse]# docker stop $(docker ps -a -q)
8974a71311f5
fae466f31ccf
[root... test-clickhouse]# docker rm fae466f31ccf
fae466f31ccf
[root... test-clickhouse]# docker rm 8974a71311f5
8974a71311f5
### Delete the image
[root... test-clickhouse]# docker rmi 9feb83a76f47
Deleted: sha256:9feb83a76f4778aba0523c9e4bd492aa6804d1fa759558b8f93d8aa61ae6ac57
Deleted: sha256:b99e0912ee05a0f2a581f061828a89b1b08d6485b5481bc3585e44a7c8f53857
Deleted: sha256:227768fbfa5b173b32260e0cb38ef33d9495693593699c95b837f7b71a286064
[root... test-clickhouse]# docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 18.04 4e5021d210f6 38 hours ago 64.2 MB
hello-world latest fce289e99eb9 14 months ago 1.84 kB
## Rebuild
[root... test-clickhouse]# docker build -t clickhouse-server-demo:1.0 .
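
The failure in the transcript comes down to the pin 20.4.1.* not matching any version the repository publishes. The sketch below illustrates that mismatch with plain bash pattern matching. It is only an approximation of how apt resolves a pinned version, and pin_matches is a hypothetical helper. Inside the container, apt-cache madison clickhouse-server lists the versions that are really available.

```shell
#!/bin/bash
# Hypothetical helper: does an available version satisfy a glob-style pin?
# This approximates apt's version matching for illustration only.
pin_matches() {
    local available="$1" pin="$2"
    # The unquoted right-hand side makes [[ ]] treat $pin as a glob pattern.
    [[ "$available" == $pin ]]
}

available="20.3.4.10"   # the version the repository served in the transcript
for pin in '20.4.1.*' '20.3.4.*'; do
    if pin_matches "$available" "$pin"; then
        echo "$pin matches $available"
    else
        echo "$pin does not match $available"
    fi
done
```

Only the second pin matches, which is why changing version to 20.3.4.* lets the build go through.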
## List images
[root... test-clickhouse]# docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
clickhouse-server-demo 1.0 16acf3ee797d 21 minutes ago 492 MB
ubuntu 18.04 4e5021d210f6 3 days ago 64.2 MB
hello-world latest fce289e99eb9 14 months ago 1.84 kB
## Start the container
[root... test-clickhouse]# docker run -d --name clickhouse-test-server --ulimit nofile=262144:262144 --volume=/work/clickhouse/clickhouse-test-server:/var/lib/clickhouse clickhouse-server-demo:1.0
958ed2deaab03542f2880f41e66346165252f953f91efa60b7d95ad6a1a88330
[root... test-clickhouse]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
958ed2deaab0 clickhouse-server-demo:1.0 "/entrypoint.sh" 3 seconds ago Up 3 seconds 8123/tcp, 9000/tcp, 9009/tcp clickhouse-test-server
## Enter the container
[root... work]# docker exec -it 958e /bin/bash
### Open a clickhouse-client session
root@958ed2deaab0:/# clickhouse-client
ClickHouse client version 20.3.4.10 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.3.4 revision 54433.

958ed2deaab0 :) show databases;

SHOW DATABASES

┌─name────┐
│ default │
│ system │
└─────────┘

2 rows in set. Elapsed: 0.002 sec.
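
In the transcript the client is started by hand a few seconds after docker run. When the same steps are scripted, it helps to wait until the server actually answers, much as entrypoint.sh does internally with wget against /ping. The sketch below shows one way to do that. wait_until_ready is a hypothetical helper, and the probe command is passed in as arguments so the retry loop itself does not depend on Docker.

```shell
#!/bin/bash
# Hypothetical helper: retry a probe command until it succeeds or tries run out.
# Usage: wait_until_ready <max_tries> <delay_seconds> <probe command...>
wait_until_ready() {
    local max_tries="$1" delay="$2"
    shift 2
    local attempt
    for (( attempt = 1; attempt <= max_tries; attempt++ )); do
        if "$@" >/dev/null 2>&1; then
            echo "ready after $attempt attempt(s)"
            return 0
        fi
        sleep "$delay"
    done
    echo "not ready after $max_tries attempt(s)" >&2
    return 1
}

# A possible probe for the container started above; it needs no published port:
#   wait_until_ready 12 1 docker exec clickhouse-test-server clickhouse-client --query "SELECT 1"
# Trivial probe so that the sketch runs anywhere:
wait_until_ready 3 0 true
```

The limits of 12 tries and a 1 second delay in the commented example simply mirror the values entrypoint.sh passes to wget.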

3. The client directory in detail

The client directory contains only a Dockerfile and README.md.

3.1 Contents of the Dockerfile

FROM ubuntu:18.04

ARG repository="deb http://repo.yandex.ru/clickhouse/deb/stable/ main/"
ARG version=20.4.1.*

RUN apt-get update \
&& apt-get install --yes --no-install-recommends \
apt-transport-https \
dirmngr \
gnupg \
&& mkdir -p /etc/apt/sources.list.d \
&& apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4 \
&& echo $repository > /etc/apt/sources.list.d/clickhouse.list \
&& apt-get update \
&& env DEBIAN_FRONTEND=noninteractive \
apt-get install --allow-unauthenticated --yes --no-install-recommends \
clickhouse-client=$version \
clickhouse-common-static=$version \
locales \
tzdata \
&& rm -rf /var/lib/apt/lists/* /var/cache/debconf \
&& apt-get clean

RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

ENTRYPOINT ["/usr/bin/clickhouse-client"]
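
Because the ENTRYPOINT is written in exec form, Docker appends whatever follows the image name in docker run to it, so the client image behaves like the clickhouse-client binary itself. The sketch below simulates how the final command line is assembled. final_command is a hypothetical helper and some-clickhouse-host is a placeholder host name.

```shell
#!/bin/bash
# Simulates how Docker builds the command line for an exec-form ENTRYPOINT:
# arguments given after the image name in `docker run` are appended to it.
entrypoint=(/usr/bin/clickhouse-client)

final_command() {
    local argv=("${entrypoint[@]}" "$@")
    # Join the words with spaces for display only
    printf '%s\n' "${argv[*]}"
}

# "some-clickhouse-host" is a placeholder, not a name taken from the article.
final_command --host some-clickhouse-host --query "SELECT 1"
```

With no extra arguments the result is just /usr/bin/clickhouse-client, which matches the interactive session the image starts by default.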