A Brief Comparison of Row-Oriented and Column-Oriented Storage

I. Row-Oriented Storage

Transactional (OLTP) databases mostly insert, update, delete, and read whole rows at a time, so they generally use row-oriented storage: all of a record's fields are laid out together as one row, and the row can be located quickly via its primary key. The stored data looks like the following table.

ID  Name       Age  City       Degree
1   Zhang San  20   Beijing    Bachelor
2   Li Si      21   Shanghai   Master
3   Wang Wu    22   Guangzhou  Doctorate

The physical layout stores each record's fields contiguously, roughly:

(1, Zhang San, 20, Beijing, Bachelor) (2, Li Si, 21, Shanghai, Master) (3, Wang Wu, 22, Guangzhou, Doctorate)

II. Column-Oriented Storage

OLAP systems typically query large volumes of data but only care about a few columns, so, given this usage pattern, they generally use column-oriented storage. The same data stored by column looks like:

1          2         3
Zhang San  Li Si     Wang Wu
20         21        22
Beijing    Shanghai  Guangzhou
Bachelor   Master    Doctorate

The physical layout stores each column's values contiguously, roughly:

(1, 2, 3) (Zhang San, Li Si, Wang Wu) (20, 21, 22) (Beijing, Shanghai, Guangzhou) (Bachelor, Master, Doctorate)

III. Read/Write Comparison

1. Data Writes

  1. With the structure above, row storage writes a record in a single operation and can guarantee the record's integrity
  2. For the same record, column storage must split the write across five columns, performing five writes instead of one, so the actual write takes longer

2. Data Updates

  1. To update a record, row storage writes once at the record's location, whereas column storage must locate and write at multiple positions; with the example data above, column storage again needs 5 times as many writes as row storage

3. Data Reads

  1. When reading a row, row storage reads out all of the record's fields; if only some columns are needed, the rest is redundant, and memory and CPU are typically spent filtering out those unneeded columns
  2. Column storage reads exactly the column segments that are needed, with no redundant columns (see the sketch after this list)
  3. Because a row-storage read mixes several data types, parsing the data may involve conversions between types; every segment read from column storage has a single type, so no conversion is needed, and each column can be compressed with a method suited to its type, which makes column storage better suited to analyzing large data sets
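
As a minimal sketch of the column-pruning benefit, consider a hypothetical ClickHouse table mirroring the example above (the table and column names are illustrative, not part of any test data used later):

-- Hypothetical table for illustration only
CREATE TABLE person
(
    `id` UInt32,
    `name` String,
    `age` UInt8,
    `city` String,
    `degree` String
) ENGINE = MergeTree()
ORDER BY id;

-- Column storage reads only the `age` column file for this query;
-- a row store would fetch entire rows and discard the other four fields
SELECT avg(age) FROM person;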

IV. Typical Use Cases

Simply put, row storage is better suited to OLTP scenarios, and column storage to OLAP scenarios.

Row storage fits better when rows are frequently inserted, updated, or deleted and the operations cannot be batched, when you usually care about whole records rather than just a few columns, and when the data you read does not need much further computation.

Column storage fits better when writes can be batched and reads mostly feed aggregation and analysis. This is because a column store can process field data in vectorized fashion: an entire column can be read contiguously into the CPU cache and pushed through the CPU's vector instructions for common computations. A sketch of both patterns follows.
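
Continuing with the hypothetical person table from the earlier sketch, a batched write plus an aggregate read would look roughly like this:

-- Batched write: one INSERT carries many rows, amortizing the per-column write cost
INSERT INTO person (id, name, age, city, degree) VALUES
    (1, 'Zhang San', 20, 'Beijing',   'Bachelor'),
    (2, 'Li Si',     21, 'Shanghai',  'Master'),
    (3, 'Wang Wu',   22, 'Guangzhou', 'Doctorate');

-- Analytical read: the contiguous `age` column streams through the aggregation
SELECT city, avg(age) AS avg_age
FROM person
GROUP BY city;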

V. ClickHouse Query and Write Tests

1. Importing the Test Data

These ClickHouse tests use the official Yandex.Metrica test dataset, which contains two tables: visits_v1 (visit data) and hits_v1 (hit data provided by Yandex.Metrica). The import procedure is covered in the ClickHouse test data import guide, but the official instructions assume clickhouse is installed directly on the host, whereas the setup here has always run it via docker, so below is a brief walkthrough of importing the test data in a docker environment.

# The clickhouse image is run via docker-compose
## 1. Download and decompress the test data
[root@xxxx clickhouse]# curl https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
[root@xxxx clickhouse]# curl https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
## 2. Add a volume entry in the docker-compose config mapping a host path to the corresponding path inside the container
[root@xxxx docker_compose]# ll
total 28
-rw-r--r-- 1 root root 14907 Apr 30 15:49 config.xml
drwxr-xr-x 10 101 101 204 Apr 26 18:56 data
-rw-r--r-- 1 root root 733 Apr 26 18:46 docker-compose.yml
drwxr-xr-x 2 root root 80 Apr 30 15:53 log
drwxr-xr-x 2 101 101 58 Apr 30 15:41 tmp
-rw-r--r-- 1 root root 4532 Mar 30 17:43 users.xml
[root@xxxx docker_compose]# vim docker-compose.yml
volumes:
...
- ./tmp:/var/lib/clickhouse/tmp
# The tmp directory holds the downloaded test data and maps to /var/lib/clickhouse/tmp inside the container
[root@xxxx docker_compose]# ll tmp/
total 10197044
-rw-r--r-- 1 101 101 7784351125 Apr 26 18:32 hits_v1.tsv
-rw-r--r-- 1 root root 2657415178 Apr 30 15:41 visits_v1.tsv
# The clickhouse container is already running; just enter it and import the data
[root@xxxx docker_compose]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
968b926da80b clickhouse-server-demo:1.0 "/entrypoint.sh" 3 days ago Up 3 days 0.0.0.0:8123->8123/tcp, 0.0.0.0:9000->9000/tcp, 0.0.0.0:9004->9004/tcp, 9009/tcp clickhouse-server_1
## 3. Import the test data
[root@xxxx docker_compose]# docker exec -it clickhouse-server_1 /bin/bash
root@clickhouse-server_1:/# ll /var/lib/clickhouse/tmp/
total 10197044
drwxr-xr-x 2 clickhouse clickhouse 58 Apr 30 07:41 ./
drwxr-xr-x 10 clickhouse clickhouse 204 Apr 26 10:56 ../
-rw-r--r-- 1 clickhouse clickhouse 7784351125 Apr 26 10:32 hits_v1.tsv
-rw-r--r-- 1 root root 2657415178 Apr 30 07:41 visits_v1.tsv
root@clickhouse-server_1:/# clickhouse-client --query "CREATE DATABASE IF NOT EXISTS datasets"
... # see the official docs for the CREATE TABLE statements
## Import the visits_v1.tsv test data; the other table, hits_v1, is imported the same way
root@clickhouse-server_1:/# cat visits_v1.tsv | clickhouse-client --query "INSERT INTO datasets.visits_v1 FORMAT TSV" --max_insert_block_size=100000

## 4. Verify the imported row counts
root@clickhouse-server_1:/# clickhouse-client
ClickHouse client version 20.3.4.10 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.3.4 revision 54433.

clickhouse-server_1 :) use datasets

USE datasets

Ok.

0 rows in set. Elapsed: 0.002 sec.

clickhouse-server_1 :) select count() from hits_v1;

SELECT count()
FROM hits_v1

┌─count()─┐
│ 8873898 │
└─────────┘

1 rows in set. Elapsed: 0.008 sec.

clickhouse-server_1 :) select count() from visits_v1;

SELECT count()
FROM visits_v1

┌─count()─┐
│ 1676861 │
└─────────┘

1 rows in set. Elapsed: 0.002 sec.

2. Query Statement Tests

Since this version of ClickHouse provides no command for viewing execution plans, the only way to see how a SQL statement executes is indirectly through the logs. The query tests below run the official sample SQL statements against the hits_v1 table and walk through the recorded logs; later ClickHouse tests will keep using these two tables as well.
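
As a convenience, the per-query server log lines can also be streamed straight into the client via the send_logs_level setting (assuming it is available in this version), instead of tailing the server log file:

# Surface trace-level server logs for a single query
clickhouse-client --send_logs_level=trace --query "SELECT count() FROM datasets.hits_v1"
# Or follow the server log inside the container (default log path assumed)
docker exec -it clickhouse-server_1 tail -f /var/log/clickhouse-server/clickhouse-server.log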

1. A brief look at the hits_v1 table structure

## A brief look at the hits_v1 table structure for the tests below; a detailed treatment of ClickHouse SQL will come in a later post
CREATE TABLE datasets.hits_v1 (`WatchID` UInt64, ...`RequestNum` UInt32, `RequestTry` UInt8) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192

The basic CREATE TABLE syntax in ClickHouse is:

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = engine_name()
[PARTITION BY expr]
[ORDER BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...];

# Option descriptions
- db: the database name; if omitted, the current database is used
- cluster: ClickHouse has a notion of clusters; data can be sharded per cluster and distributed tables built on top. ON CLUSTER creates this local table on every shard of the named cluster (default: default)
- type: the column type, e.g. UInt32; ClickHouse is case-sensitive, so type names must match the exact casing
- MATERIALIZED: the column cannot be written directly; its value is computed from other columns and stored in the table, so no computation is needed at query time. INSERT must not supply this column, and SELECT * does not return it
- ALIAS: similar to MATERIALIZED, except the value is not stored in the table and is computed each time it is needed; likewise INSERT cannot write this column, and SELECT * does not return it

# Since the hits_v1 test table uses the MergeTree engine, the remaining options are described here for MergeTree; other engines may not support all of them. Each engine and its use cases will be covered separately in later posts
- PARTITION BY: the partition key; the test table partitions by date (EventDate)
- ORDER BY: the sorting key
- PRIMARY KEY: the primary key; defaults to the same as ORDER BY
- SAMPLE BY: the sampling expression
- SETTINGS: additional settings
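
To make MATERIALIZED and ALIAS concrete, here is a minimal hypothetical table (the names are illustrative, not part of the test dataset) showing how the two derived-column kinds behave:

-- Hypothetical demo table, for illustration only
CREATE TABLE IF NOT EXISTS demo_columns
(
    `raw_url` String,
    -- MATERIALIZED: computed at INSERT time and stored on disk
    `url_domain` String MATERIALIZED domain(raw_url),
    -- ALIAS: never stored; recomputed on every read
    `url_len` UInt64 ALIAS length(raw_url)
) ENGINE = MergeTree()
ORDER BY raw_url;

INSERT INTO demo_columns (raw_url) VALUES ('https://example.com/page');

-- SELECT * would return only raw_url; derived columns must be named explicitly
SELECT raw_url, url_domain, url_len FROM demo_columns;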

2. Group-by query with ordering

# SQL:
clickhouse-server_1 :) select CounterID, count() AS c from hits_v1 group by CounterID order by c desc limit 10

SELECT
CounterID,
count() AS c
FROM hits_v1
GROUP BY CounterID
ORDER BY c DESC
LIMIT 10

┌─CounterID─┬──────c─┐
│ 1704509 │ 523264 │
│ 732797 │ 475698 │
│ 598875 │ 337212 │
│ 792887 │ 252197 │
│ 3807842 │ 196036 │
│ 25703952 │ 147211 │
│ 716829 │ 90109 │
│ 59183 │ 85379 │
│ 33010362 │ 77807 │
│ 800784 │ 77492 │
└───────────┴────────┘

10 rows in set. Elapsed: 0.026 sec. Processed 8.87 million rows, 35.50 MB (339.77 million rows/s., 1.36 GB/s.)

# log
## 1. The query statement
2020.05.03 22:31:49.042192 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Debug> executeQuery: (from 127.0.0.1:59736) SELECT CounterID, count() AS c FROM hits_v1 GROUP BY CounterID ORDER BY c DESC LIMIT 10
## 2. User permission check
2020.05.03 22:31:49.042790 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> AccessRightsContext (default): Access granted: SELECT(CounterID) ON datasets.hits_v1
## 3. This query does not use the primary key index
2020.05.03 22:31:49.042971 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Debug> datasets.hits_v1 (SelectExecutor): Key condition: unknown
## 4. This query does not use the partition (minmax) index
2020.05.03 22:31:49.042986 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Debug> datasets.hits_v1 (SelectExecutor): MinMax index condition: unknown
## 5. The query scanned 1 part directory and 1092 MarkRanges in total (what part directories and MarkRanges mean in MergeTree is still being worked out and will be detailed in the MergeTree post; see the system.parts query after this log)
2020.05.03 22:31:49.043021 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Debug> datasets.hits_v1 (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1092 marks to read from 1 ranges
## 6. Full table scan: about 8873898 rows, processed with 20 streams
2020.05.03 22:31:49.043381 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> datasets.hits_v1 (SelectExecutor): Reading approx. 8873898 rows with 20 streams
2020.05.03 22:31:49.044209 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2020.05.03 22:31:49.045627 [ 100 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> AggregatingTransform: Aggregating
...
...
## 7. Rows read, elapsed time, etc. for each stream
2020.05.03 22:31:49.050780 [ 100 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> AggregatingTransform: Aggregated. 572078 to 5872 rows (from 2.182 MiB) in 0.006 sec. (88841051.020 rows/sec., 338.902 MiB/sec.)
2020.05.03 22:31:49.050814 [ 193 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> AggregatingTransform: Aggregated. 516096 to 7617 rows (from 1.969 MiB) in 0.006 sec. (80547357.953 rows/sec., 307.264 MiB/sec.)
2020.05.03 22:31:49.051022 [ 191 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> AggregatingTransform: Aggregated. 450560 to 8195 rows (from 1.719 MiB) in 0.007 sec. (68053391.462 rows/sec., 259.603 MiB/sec.)
## 8. One of the streams merges the aggregated data
2020.05.03 22:31:49.051046 [ 191 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Trace> Aggregator: Merging aggregated data
## 9. 8873898 rows read in total, 33.85 MiB
2020.05.03 22:31:49.066350 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Information> executeQuery: Read 8873898 rows, 33.85 MiB in 0.024 sec., 368641453 rows/sec., 1.37 GiB/sec.
2020.05.03 22:31:49.066415 [ 81 ] {d06708d7-2f0c-489f-96fe-e42012f641f0} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
## 10. Peak memory usage reported for the query: 0.00 B
2020.05.03 22:31:49.066786 [ 81 ] {} <Debug> MemoryTracker: Peak memory usage (total): 0.00 B.
## 11. The query took 0.025 s in total
2020.05.03 22:31:49.066814 [ 81 ] {} <Information> TCPHandler: Processed in 0.025 sec.
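
The parts and marks mentioned in the log can be listed through the system.parts system table, which is a handy companion to these log lines (a sketch; column availability may vary slightly by version):

-- Inspect the active data parts backing hits_v1
SELECT partition, name, rows, marks, bytes_on_disk
FROM system.parts
WHERE database = 'datasets' AND table = 'hits_v1' AND active;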

3. Using the Primary Key Index

# SQL:
clickhouse-server_1 :) SELECT WatchID FROM hits_v1 WHERE CounterID = 67141

SELECT WatchID
FROM hits_v1
WHERE CounterID = 67141

┌─────────────WatchID─┐
│ 9092821867385297764 │
│ 6698385316098085730 │
│ 8281386980251669809 │
│ 7804373457861079090 │
│ 5352419935083292124 │
│ 7522442961486322437 │
│ 4926733399374529578 │
│ 8651569660825010330 │
│ 7777215115402859170 │
│ 5488491440763342147 │
│ 7016898938173798841 │
│ 7512073271455311672 │
│ 7675183452718991621 │
│ 7698094942612287474 │
│ 7229580476946423672 │
│ 8265472689024610766 │
│ 7397061429050334296 │
│ 5642502882079177996 │
│ 5521967617262710331 │
│ 6045376808846148744 │
│ 5223813301270698276 │
│ 5891294304736742075 │
│ 7473702977877450342 │
│ 7131227524298036078 │
│ 6397036526472438783 │
│ 5452801867475832050 │
│ 8203620973862900075 │
│ 8228211160680219393 │
│ 5669672267661574263 │
│ 6447542723619820343 │
│ 5609776647750491151 │
│ 5937976217944527938 │
│ 8559139126342788142 │
│ 6731577587255153490 │
│ 7541590813574755789 │
│ 6736741087826610411 │
│ 5750208933466385975 │
│ 6501641543222310031 │
│ 6817897199087131799 │
│ 8775895600472212626 │
│ 7276707177012917444 │
│ 7841417239216625313 │
│ 6708893161493789316 │
│ 5161987475887808662 │
│ 5167052428932424884 │
│ 8512404755681004329 │
│ 5407707620324494582 │
│ 7664508369041326595 │
│ 6437220034025745400 │
│ 5074053444698312956 │
│ 5698931552063656743 │
│ 8826145146896127905 │
└─────────────────────┘

52 rows in set. Elapsed: 0.003 sec. Processed 8.19 thousand rows, 98.30 KB (2.77 million rows/s., 33.19 MB/s.)

# log
2020.05.03 22:58:53.011895 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Debug> executeQuery: (from 127.0.0.1:50236) SELECT WatchID FROM hits_v1 WHERE CounterID = 67141
2020.05.03 22:58:53.012824 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Trace> AccessRightsContext (default): Access granted: SELECT(WatchID, CounterID) ON datasets.hits_v1
## This query uses the primary key index: the key condition pins CounterID to the range [67141, 67141]
2020.05.03 22:58:53.012985 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Debug> datasets.hits_v1 (SelectExecutor): Key condition: (column 0 in [67141, 67141])
2020.05.03 22:58:53.013000 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Debug> datasets.hits_v1 (SelectExecutor): MinMax index condition: unknown
2020.05.03 22:58:53.013032 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Debug> datasets.hits_v1 (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1 marks to read from 1 ranges
2020.05.03 22:58:53.013210 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Trace> datasets.hits_v1 (SelectExecutor): Reading approx. 8192 rows with 1 streams
2020.05.03 22:58:53.013291 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
## 8192 rows (96 KiB) read in total — exactly one granule, matching index_granularity = 8192
2020.05.03 22:58:53.013902 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Information> executeQuery: Read 8192 rows, 96.00 KiB in 0.002 sec., 4226200 rows/sec., 48.37 MiB/sec.
2020.05.03 22:58:53.013964 [ 82 ] {55122f0e-83a7-454f-abed-870934d0f0a4} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
2020.05.03 22:58:53.014059 [ 82 ] {} <Debug> MemoryTracker: Peak memory usage (total): 0.00 B.
## The query took 0.002 s in total
2020.05.03 22:58:53.014082 [ 82 ] {} <Information> TCPHandler: Processed in 0.002 sec.

3. Write Statement Tests

Here a temporary table is created and some data inserted into it, and the logs are used to examine the columnar database's write logic. ClickHouse does not currently document the query and write execution flow, and the write logic differs between engines, so write logic and logs will be described in detail in the tests of the individual engines later.

# Create a temporary table:
## SQL:
clickhouse-server_1 :) CREATE TABLE mixed_granularity_table (`WatchID` UInt64, `JavaEnable` UInt8, `Title` String, `GoodEvent` Int16, `EventTime` DateTime, `EventDate` Date, `CounterID` UInt32, `ClientIP` UInt32, `ClientIP6` FixedString(16), `RegionID` UInt32, `UserID` UInt64, `CounterClass` Int8, `OS` UInt8, `UserAgent` UInt8, `URL` String, `Referer` String, `URLDomain` String, `RefererDomain` String, `Refresh` UInt8, `IsRobot` UInt8, `RefererCategories` Array(UInt16), `URLCategories` Array(UInt16), `URLRegions` Array(UInt32), `RefererRegions` Array(UInt32), `ResolutionWidth` UInt16, `ResolutionHeight` UInt16, `ResolutionDepth` UInt8, `FlashMajor` UInt8, `FlashMinor` UInt8, `FlashMinor2` String, `NetMajor` UInt8, `NetMinor` UInt8, `UserAgentMajor` UInt16, `UserAgentMinor` FixedString(2), `CookieEnable` UInt8, `JavascriptEnable` UInt8, `IsMobile` UInt8, `MobilePhone` UInt8, `MobilePhoneModel` String, `Params` String, `IPNetworkID` UInt32, `TraficSourceID` Int8, `SearchEngineID` UInt16, `SearchPhrase` String, `AdvEngineID` UInt8, `IsArtifical` UInt8, `WindowClientWidth` UInt16, `WindowClientHeight` UInt16, `ClientTimeZone` Int16, `ClientEventTime` DateTime, `SilverlightVersion1` UInt8, `SilverlightVersion2` UInt8, `SilverlightVersion3` UInt32, `SilverlightVersion4` UInt16, `PageCharset` String, `CodeVersion` UInt32, `IsLink` UInt8, `IsDownload` UInt8, `IsNotBounce` UInt8, `FUniqID` UInt64, `HID` UInt32, `IsOldCounter` UInt8, `IsEvent` UInt8, `IsParameter` UInt8, `DontCountHits` UInt8, `WithHash` UInt8, `HitColor` FixedString(1), `UTCEventTime` DateTime, `Age` UInt8, `Sex` UInt8, `Income` UInt8, `Interests` UInt16, `Robotness` UInt8, `GeneralInterests` Array(UInt16), `RemoteIP` UInt32, `RemoteIP6` FixedString(16), `WindowName` Int32, `OpenerName` Int32, `HistoryLength` Int16, `BrowserLanguage` FixedString(2), `BrowserCountry` FixedString(2), `SocialNetwork` String, `SocialAction` String, `HTTPError` UInt16, `SendTiming` Int32, `DNSTiming` Int32, `ConnectTiming` Int32, `ResponseStartTiming` Int32, `ResponseEndTiming` Int32, `FetchTiming` Int32, `RedirectTiming` Int32, `DOMInteractiveTiming` Int32, `DOMContentLoadedTiming` Int32, `DOMCompleteTiming` Int32, `LoadEventStartTiming` Int32, `LoadEventEndTiming` Int32, `NSToDOMContentLoadedTiming` Int32, `FirstPaintTiming` Int32, `RedirectCount` Int8, `SocialSourceNetworkID` UInt8, `SocialSourcePage` String, `ParamPrice` Int64, `ParamOrderID` String, `ParamCurrency` FixedString(3), `ParamCurrencyID` UInt16, `GoalsReached` Array(UInt32), `OpenstatServiceName` String, `OpenstatCampaignID` String, `OpenstatAdID` String, `OpenstatSourceID` String, `UTMSource` String, `UTMMedium` String, `UTMCampaign` String, `UTMContent` String, `UTMTerm` String, `FromTag` String, `HasGCLID` UInt8, `RefererHash` UInt64, `URLHash` UInt64, `CLID` UInt32, `YCLID` UInt64, `ShareService` String, `ShareURL` String, `ShareTitle` String, `ParsedParams.Key1` Array(String), `ParsedParams.Key2` Array(String), `ParsedParams.Key3` Array(String), `ParsedParams.Key4` Array(String), `ParsedParams.Key5` Array(String), `ParsedParams.ValueDouble` Array(Float64), `IslandID` FixedString(16), `RequestNum` UInt32, `RequestTry` UInt8) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity=8192, enable_mixed_granularity_parts=1; -- same with hits, but enabled mixed granularity

...

Ok.

0 rows in set. Elapsed: 0.008 sec.
## log
### 1. Validate the CREATE TABLE statement
2020.05.03 23:06:08.921921 [ 82 ] {0db543f9-58b2-4765-af63-a5f881246d41} <Debug> executeQuery: (from 127.0.0.1:50236) CREATE TABLE mixed_granularity_table (`WatchID` UInt64, `JavaEnable` UInt8, `Title` String, `GoodEvent` Int16, `EventTime` DateTime, `EventDate` Date, `CounterID` UInt32, `ClientIP` UInt32, `ClientIP6` FixedString(16), `RegionID` UInt32, `UserID` UInt64, `CounterClass` Int8, `OS` UInt8, `UserAgent` UInt8, `URL` String, `Referer` String, `URLDomain` String, `RefererDomain` String, `Refresh` UInt8, `IsRobot` UInt8, `RefererCategories` Array(UInt16), `URLCategories` Array(UInt16), `URLRegions` Array(UInt32), `RefererRegions` Array(UInt32), `ResolutionWidth` UInt16, `ResolutionHeight` UInt16, `ResolutionDepth` UInt8, `FlashMajor` UInt8, `FlashMinor` UInt8, `FlashMinor2` String, `NetMajor` UInt8, `NetMinor` UInt8, `UserAgentMajor` UInt16, `UserAgentMinor` FixedString(2), `CookieEnable` UInt8, `JavascriptEnable` UInt8, `IsMobile` UInt8, `MobilePhone` UInt8, `MobilePhoneModel` String, `Params` String, `IPNetworkID` UInt32, `TraficSourceID` Int8, `SearchEngineID` UInt16, `SearchPhrase` String, `AdvEngineID` UInt8, `IsArtifical` UInt8, `WindowClientWidth` UInt16, `WindowClientHeight` UInt16, `ClientTimeZone` Int16, `ClientEventTime` DateTime, `SilverlightVersion1` UInt8, `SilverlightVersion2` UInt8, `SilverlightVersion3` UInt32, `SilverlightVersion4` UInt16, `PageCharset` String, `CodeVersion` UInt32, `IsLink` UInt8, `IsDownload` UInt8, `IsNotBounce` UInt8, `FUniqID` UInt64, `HID` UInt32, `IsOldCounter` UInt8, `IsEvent` UInt8, `IsParameter` UInt8, `DontCountHits` UInt8, `WithHash` UInt8, `HitColor` FixedString(1), `UTCEventTime` DateTime, `Age` UInt8, `Sex` UInt8, `Income` UInt8, `Interests` UInt16, `Robotness` UInt8, `GeneralInterests` Array(UInt16), `RemoteIP` UInt32, `RemoteIP6` FixedString(16), `WindowName` Int32, `OpenerName` Int32, `HistoryLength` Int16, `BrowserLanguage` FixedString(2), `BrowserCountry` FixedString(2), `SocialNetwork` String, `SocialAction` String, `HTTPError` UInt16, `SendTiming` Int32, `DNSTiming` Int32, `ConnectTiming` Int32, `ResponseStartTiming` Int32, `ResponseEndTiming` Int32, `FetchTiming` Int32, `RedirectTiming` Int32, `DOMInteractiveTiming` Int32, `DOMContentLoadedTiming` Int32, `DOMCompleteTiming` Int32, `LoadEventStartTiming` Int32, `LoadEventEndTiming` Int32, `NSToDOMContentLoadedTiming` Int32, `FirstPaintTiming` Int32, `RedirectCount` Int8, `SocialSourceNetworkID` UInt8, `SocialSourcePage` String, `ParamPrice` Int64, `ParamOrderID` String, `ParamCurrency` FixedString(3), `ParamCurrencyID` UInt16, `GoalsReached` Array(UInt32), `OpenstatServiceName` String, `OpenstatCampaignID` String, `OpenstatAdID` String, `OpenstatSourceID` String, `UTMSource` String, `UTMMedium` String, `UTMCampaign` String, `UTMContent` String, `UTMTerm` String, `FromTag` String, `HasGCLID` UInt8, `RefererHash` UInt64, `URLHash` UInt64, `CLID` UInt32, `YCLID` UInt64, `ShareService` String, `ShareURL` String, `ShareTitle` String, `ParsedParams.Key1` Array(String), `ParsedParams.Key2` Array(String), `ParsedParams.Key3` Array(String), `ParsedParams.Key4` Array(String), `ParsedParams.Key5` Array(String), `ParsedParams.ValueDouble` Array(Float64), `IslandID` FixedString(16), `RequestNum` UInt32, `RequestTry` UInt8) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192, enable_mixed_granularity_parts = 1
### 2. Permission check
2020.05.03 23:06:08.922323 [ 82 ] {0db543f9-58b2-4765-af63-a5f881246d41} <Trace> AccessRightsContext (default): Access granted: CREATE TABLE ON datasets.mixed_granularity_table
2020.05.03 23:06:08.925036 [ 82 ] {0db543f9-58b2-4765-af63-a5f881246d41} <Debug> datasets.mixed_granularity_table: Loading data parts
2020.05.03 23:06:08.925135 [ 82 ] {0db543f9-58b2-4765-af63-a5f881246d41} <Debug> datasets.mixed_granularity_table: Loaded data parts (0 items)
2020.05.03 23:06:08.925992 [ 82 ] {0db543f9-58b2-4765-af63-a5f881246d41} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
2020.05.03 23:06:08.926057 [ 82 ] {} <Debug> MemoryTracker: Peak memory usage (total): 0.00 B.
2020.05.03 23:06:08.926089 [ 82 ] {} <Information> TCPHandler: Processed in 0.006 sec.

# Insert data:
## SQL:
clickhouse-server_1 :) INSERT INTO mixed_granularity_table SELECT * FROM hits_v1 LIMIT 10;

INSERT INTO mixed_granularity_table SELECT *
FROM hits_v1
LIMIT 10

Progress: 10.00 rows, 10.85 KB (23.92 rows/s., 25.95 KB/s.)  Ok.

0 rows in set. Elapsed: 0.418 sec.
## log:
2020.05.03 23:09:53.387837 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Debug> executeQuery: (from 127.0.0.1:50236) INSERT INTO mixed_granularity_table SELECT * FROM hits_v1 LIMIT 10
### Two permission checks are needed: INSERT on the new table mixed_granularity_table and SELECT on hits_v1
2020.05.03 23:09:53.388113 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Trace> AccessRightsContext (default): Access granted: INSERT(WatchID, JavaEnable, ..., RequestTry) ON datasets.mixed_granularity_table
2020.05.03 23:09:53.389910 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Trace> AccessRightsContext (default): Access granted: SELECT(WatchID, JavaEnable, ..., RequestTry) ON datasets.hits_v1
2020.05.03 23:09:53.390381 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Debug> datasets.hits_v1 (SelectExecutor): Key condition: unknown
2020.05.03 23:09:53.390396 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Debug> datasets.hits_v1 (SelectExecutor): MinMax index condition: unknown
2020.05.03 23:09:53.390421 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Debug> datasets.hits_v1 (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1092 marks to read from 1 ranges
2020.05.03 23:09:53.390615 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Trace> MergeTreeSelectProcessor: Reading 1 ranges from part 201403_1_32_2, approx. 8873898 rows starting from 0
2020.05.03 23:09:53.390711 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2020.05.03 23:09:53.391523 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Debug> executeQuery: Query pipeline:
NullAndDoCopy
Converting
Limit
Expression
Expression
TreeExecutor

2020.05.03 23:09:53.791641 [ 102 ] {321f3cb7-467a-4744-8371-29d536f78908} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 3.44 TiB.
2020.05.03 23:09:53.804394 [ 102 ] {321f3cb7-467a-4744-8371-29d536f78908} <Trace> datasets.mixed_granularity_table: Renaming temporary part tmp_insert_201403_1_1_0 to 201403_1_1_0.
2020.05.03 23:09:53.804871 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Information> executeQuery: Read 10 rows, 10.59 KiB in 0.417 sec., 23 rows/sec., 25.40 KiB/sec.
2020.05.03 23:09:53.804923 [ 82 ] {321f3cb7-467a-4744-8371-29d536f78908} <Debug> MemoryTracker: Peak memory usage (for query): 263.95 MiB.
2020.05.03 23:09:53.806135 [ 82 ] {} <Debug> MemoryTracker: Peak memory usage (total): 263.95 MiB.
2020.05.03 23:09:53.806169 [ 82 ] {} <Information> TCPHandler: Processed in 0.419 sec.
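
As a quick sanity check (not part of the original log), counting the rows in the new table confirms that exactly the inserted rows landed:

-- Should return 10, matching the LIMIT 10 insert above
SELECT count() FROM datasets.mixed_granularity_table;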