Notes on some ClickHouse features

=Start=

Motivation:

A write-up of my recent experience with ClickHouse, kept here for future reference.

Body:

Notes:

0. Command-line access with the clickhouse-cli tool

$ pip3 install clickhouse-cli

$ vim ~/.clickhouse-cli.rc

$ clickhouse-cli
command not found: clickhouse-cli

# My environment is macOS Catalina 10.15.6. pip3 installed the package successfully, but the program was not on $PATH.
# It turned out to be in the Python directory under ~/Library:
$ ls -lt ~/Library/Python/3.8/bin

$ ~/Library/Python/3.8/bin/clickhouse-cli
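If clickhouse-cli (or any other pip-installed tool) goes missing from $PATH in the same way, Python itself can report the per-user scripts directory that pip uses for user installs. A minimal sketch, assuming the package was installed per-user as it was here; the helper name user_scripts_dir is made up for illustration. On a macOS framework build of Python 3.8 it should resolve to ~/Library/Python/3.8/bin, and on Linux usually to ~/.local/bin.

```python
import os
import sysconfig


def user_scripts_dir() -> str:
    """Return the directory where per-user pip installs put console scripts."""
    if hasattr(sysconfig, "get_preferred_scheme"):  # Python 3.10+
        scheme = sysconfig.get_preferred_scheme("user")
    else:
        # "posix_user" on macOS/Linux, "nt_user" on Windows.
        scheme = f"{os.name}_user"
    return sysconfig.get_path("scripts", scheme)


if __name__ == "__main__":
    print(user_scripts_dir())
```

Adding the printed directory to $PATH in your shell profile (for example export PATH="$HOME/Library/Python/3.8/bin:$PATH") lets you run plain clickhouse-cli.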

1. Commands for getting to know the environment

$ ~/Library/Python/3.8/bin/clickhouse-cli
clickhouse-cli version: 0.3.6
Connecting to x.x.x.x:80
Connected to ClickHouse server v19.13.1.

 :) help

clickhouse-cli's custom commands:
---------------------------------
USE     Change the current database.
SET     Set an option for the current CLI session.
QUIT    Exit clickhouse-cli.
HELP    Show this help message.

PostgreSQL-like custom commands:
--------------------------------
\l      Show databases.
\c      Change the current database.
\d, \dt Show tables in the current database.
\d+     Show table's schema.
\ps     Show current queries.
\kill   Kill query by its ID.

Query suffixes:
---------------
\g, \G  Use the Vertical format.
\p      Enable the pager.
 :)

# List databases
show databases
\l

# Switch to another database
use db_name

# List the tables in the current database
show tables
\d
\dt

# Show the schema of a specific table
describe table table_name
\d+ table_name

2. Date and time functions

WITH
    toDate('2019-01-01') AS date,
    toDateTime('2019-01-01 00:00:00') AS date_time
SELECT
    date,
    subtractYears(date, 1) AS subtract_years_with_date,
    date_time,
    subtractYears(date_time, 1) AS subtract_years_with_date_time
;
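To sanity-check what the query above should return: both the Date and the DateTime value move back by one calendar year, so 2019-01-01 becomes 2018-01-01 and 2019-01-01 00:00:00 becomes 2018-01-01 00:00:00. The same arithmetic as a Python sketch; the helper name is hypothetical, and clamping Feb 29 to Feb 28 is an assumption of this sketch, so compare that edge case with your server's output.

```python
from datetime import date, datetime


def subtract_years(value, years: int):
    """Shift a date or datetime back by whole calendar years.

    Feb 29 is clamped to Feb 28 when the target year is not a leap year
    (a choice made for this sketch only).
    """
    target_year = value.year - years
    try:
        return value.replace(year=target_year)
    except ValueError:
        # Only Feb 29 can fail here: fall back to the last day of February.
        return value.replace(year=target_year, day=28)
```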

3. String functions

SELECT splitByChar(',', '1,2,3,abcde');
SELECT splitByString(', ', '1, 2 3, 4,5, abcde');

-- arrayStringConcat() joins array elements with a separator, similar to Hive's concat_ws().

WITH
	splitByChar(',', '1,2,3,abcde') as arr1
select
	arr1,
	arrayStringConcat(arr1, '#') as arr1str
;

┌─arr1──────────────────┬─arr1str─────┐
│ ['1','2','3','abcde'] │ 1#2#3#abcde │
└───────────────────────┴─────────────┘
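When porting Hive logic it helps to know the expected results in advance. The Python sketch below mirrors these three functions on plain strings; the function names are hypothetical and only the simple cases shown above are modelled (empty separators, for example, are not). Note the argument order: Hive's concat_ws takes the separator first, while arrayStringConcat takes the array first.

```python
from typing import List


def split_by_char(separator: str, s: str) -> List[str]:
    """Mirror splitByChar: the separator must be exactly one character."""
    if len(separator) != 1:
        raise ValueError("splitByChar expects a single-character separator")
    return s.split(separator)


def split_by_string(separator: str, s: str) -> List[str]:
    """Mirror splitByString for a non-empty separator string."""
    return s.split(separator)


def array_string_concat(arr: List[str], separator: str = "") -> str:
    """Mirror arrayStringConcat(arr, separator)."""
    return separator.join(arr)
```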

-- String search
position() / locate()
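position() returns the 1-based position of the first match, counted in bytes, and 0 when there is no match; positionUTF8() counts Unicode code points instead, which matters for Chinese text. A Python sketch of both (helper names hypothetical):

```python
def position(haystack: str, needle: str) -> int:
    """Mirror position(haystack, needle): 1-based byte offset, 0 if absent."""
    return haystack.encode("utf-8").find(needle.encode("utf-8")) + 1


def position_utf8(haystack: str, needle: str) -> int:
    """Mirror positionUTF8: 1-based offset in code points, 0 if absent."""
    return haystack.find(needle) + 1
```

For ASCII-only strings the two agree; for multi-byte text the byte-based position is larger.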

-- String replacement
SELECT replaceRegexpAll('Hello, World!', '.', '\\0\\0') AS res

┌─res────────────────────────┐
│ HHeelllloo,,  WWoorrlldd!! │
└────────────────────────────┘
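In the replacement string, \0 stands for the whole match and \1 to \9 for capture groups, so '\\0\\0' in the SQL literal (the string \0\0 after unescaping) doubles every character. A Python sketch reproducing that result; the helper name is hypothetical, only \0 to \9 back-references are translated, and Python's re is not RE2, so unusual patterns may behave differently.

```python
import re


def replace_regexp_all(haystack: str, pattern: str, replacement: str) -> str:
    """Mirror replaceRegexpAll for replacements using \\0-\\9 back-references."""
    # ClickHouse writes back-references as \0..\9 (\0 = whole match);
    # Python's re.sub spells them \g<0>..\g<9>, so translate first.
    py_replacement = re.sub(r"\\(\d)", r"\\g<\1>", replacement)
    return re.sub(pattern, py_replacement, haystack)
```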

-- Substring extraction
substring() / extractAllGroups()
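extractAllGroups (an alias of extractAllGroupsVertical) returns one inner array per regex match, holding that match's capture groups. A Python sketch of that shape; the helper name is hypothetical, unmatched groups become empty strings here, and Python's regex dialect differs from RE2 in corner cases.

```python
import re
from typing import List


def extract_all_groups(haystack: str, pattern: str) -> List[List[str]]:
    """Mirror extractAllGroupsVertical: one list of captured groups per match."""
    return [list(m.groups(default="")) for m in re.finditer(pattern, haystack)]
```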

4. Common aggregate functions

-- Hive's collect_set, which collects the distinct values of a column into a list/set, is done in ClickHouse with groupUniqArray().
select user_id,
       count(1) as cnt,
       count(distinct user_agent) as ua_cnt,
       groupUniqArray(toDate(timestamp))
  from table_name
 where status = 200
   and http_host = 'domain_name'
   and uri = '/path/'
   and args like '%keyword%'
   and user_id in ('user1', 'user2', 'user3', 'user4', 'user5')
   and timestamp between '2020-06-01 00:00:00' and '2020-07-31 23:59:59'
 group by user_id
 limit 20000
;

┌─user_id─────┬─cnt─┬─ua_cnt─┬─groupUniqArray(toDate(timestamp))─────────────────────┐
│ user1       │  32 │      1 │ ['2020-07-03','2020-07-09','2020-07-02','2020-07-08'] │
└─────────────┴─────┴────────┴───────────────────────────────────────────────────────┘
┌─user_id─┬─cnt─┬─ua_cnt─┬─groupUniqArray(toDate(timestamp))────────┐
│ user4   │  18 │      1 │ ['2020-07-06','2020-07-03','2020-07-09'] │
└─────────┴─────┴────────┴──────────────────────────────────────────┘
┌─user_id─┬─cnt─┬─ua_cnt─┬─groupUniqArray(toDate(timestamp))─┐
│ user2   │   4 │      1 │ ['2020-07-13']                    │
└─────────┴─────┴────────┴───────────────────────────────────┘


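-- Hive's count(distinct col_name) maps to uniq(col_name) (approximate) or uniqExact(col_name) (exact) in ClickHouse. count(distinct col_name) itself also works, as in the query above; by default ClickHouse computes it with uniqExact (controlled by the count_distinct_implementation setting).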
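The aggregation above is a per-group fold: count(1) is a counter, the exact distinct count is the size of a set, and groupUniqArray collects distinct values into a set. A Python sketch over hypothetical (user_id, user_agent, timestamp) tuples; the function name and data are made up for illustration. Sets have no defined order, which matches the unordered arrays in the output above, so the sketch sorts the dates (in ClickHouse, arraySort(groupUniqArray(...)) does the same).

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[str, str, str]  # (user_id, user_agent, timestamp): hypothetical columns


def summarize(rows: Iterable[Row]) -> Dict[str, Tuple[int, int, List[str]]]:
    """Per user: row count, exact distinct user agents, sorted distinct dates."""
    counts: Dict[str, int] = defaultdict(int)
    agents: Dict[str, Set[str]] = defaultdict(set)
    dates: Dict[str, Set[str]] = defaultdict(set)
    for user_id, user_agent, ts in rows:
        counts[user_id] += 1  # count(1)
        agents[user_id].add(user_agent)  # count(distinct user_agent) / uniqExact
        day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date().isoformat()
        dates[user_id].add(day)  # groupUniqArray(toDate(timestamp))
    return {
        user: (counts[user], len(agents[user]), sorted(dates[user]))
        for user in counts
    }
```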

5. Other features

URL functions
JSON functions
IP address functions
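As an example from the IP functions, IPv4StringToNum converts a dotted-quad string into a UInt32 and IPv4NumToString goes back. Python's standard ipaddress module does the same conversion, which is handy for building or checking filter values; the helper names are hypothetical, and invalid input (which Python rejects with ValueError) is not modelled against ClickHouse's behaviour.

```python
import ipaddress


def ipv4_string_to_num(s: str) -> int:
    """Mirror IPv4StringToNum: dotted-quad string -> big-endian UInt32."""
    return int(ipaddress.IPv4Address(s))


def ipv4_num_to_string(n: int) -> str:
    """Mirror IPv4NumToString: UInt32 -> dotted-quad string."""
    return str(ipaddress.IPv4Address(n))
```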

References:

https://github.com/hatarist/clickhouse-cli

https://clickhouse.tech/docs/en/sql-reference/functions/date-time-functions/

Is there any function like hive’s concat_ws or explode or collect_list/collect_array #6664
https://github.com/ClickHouse/ClickHouse/issues/6664

https://clickhouse.tech/docs/en/sql-reference/functions/splitting-merging-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/string-search-functions/#position

https://clickhouse.tech/docs/en/sql-reference/functions/string-functions/#substring

https://clickhouse.tech/docs/en/sql-reference/functions/splitting-merging-functions/#extractallgroups

https://clickhouse.tech/docs/en/sql-reference/functions/url-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/json-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/ip-address-functions/#ipv4stringtonums

https://clickhouse.tech/docs/zh/sql-reference/statements/select/array-join/

=END=

27 Sep 2020


Database, KnowledgeBase, Programing, Tools

ClickHouse, groupUniqArray, Hive, join, SQL

9 comments on “Notes on some ClickHouse features”

abc says:

2020-09-29 15:58

-- Base64 encoding/decoding
`
with
    '/search/q-5Y+I5piv5LiA5Liq5LiL6Zuo5aSp' as args
select
    args,
    replaceOne(args, '/search/q-', '') as base64ed_query,
    tryBase64Decode(replaceOne(args, '/search/q-', '')) as query_str
;

/*
-- output
┌─args───────────────────────────────────┬─base64ed_query───────────────┬─query_str──────┐
│ /search/q-5Y+I5piv5LiA5Liq5LiL6Zuo5aSp │ 5Y+I5piv5LiA5Liq5LiL6Zuo5aSp │ 又是一个下雨天 │
└────────────────────────────────────────┴──────────────────────────────┴────────────────┘
*/

select
    base64Encode('又是一个下雨天') -- 5Y+I5piv5LiA5Liq5LiL6Zuo5aSp
;
`
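The same round trip can be checked locally with Python's base64 module. In this sketch the helper name is hypothetical; it mimics tryBase64Decode by returning empty bytes instead of raising on invalid input, and str.replace(old, new, 1) plays the role of replaceOne.

```python
import base64
import binascii


def try_base64_decode(s: str) -> bytes:
    """Mimic tryBase64Decode: decoded bytes, or b"" if s is not valid Base64."""
    try:
        return base64.b64decode(s, validate=True)
    except binascii.Error:
        return b""


args = "/search/q-5Y+I5piv5LiA5Liq5LiL6Zuo5aSp"
base64ed_query = args.replace("/search/q-", "", 1)  # replaceOne(args, '/search/q-', '')
query_str = try_base64_decode(base64ed_query).decode("utf-8")  # the original search text
```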

abc says:

2020-11-16 15:07

How to get the last element of an array in ClickHouse
`
-- get the last element of arr
arr[-1]
arr[length(arr)]
`

https://clickhouse.tech/docs/en/sql-reference/functions/splitting-merging-functions/
`
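ClickHouse has these string-splitting functions:
splitByChar($separator, $s)
splitByString($separator, $s)

However, $separator here is a single character or a plain string, not a regular expression as in Hive's split function (where a regex makes it possible to split on several separators at once). A workaround is to replace the other separators with one character first and then call splitByChar.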
`

https://clickhouse.tech/docs/en/sql-reference/functions/array-functions/
`
-- return the length of the array arr_name
length(arr_name)

-- get the n-th element of arr (array indexes start at 1)
arrayElement(arr, n)
arr[n]
-- get the last element of arr
arr[-1]
arr[length(arr)]
`
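These indexing rules are easy to mix up with the 0-based indexing of Hive and Python. A Python sketch of arrayElement's behaviour (helper name hypothetical): positive n counts from the start, negative n from the end, and an out-of-range index yields the element type's default value (an empty string for strings, 0 for numbers). The case of a constant index 0 on a non-constant array, which ClickHouse rejects with an error, is simply treated as out of range here.

```python
from typing import Any, List


def array_element(arr: List[Any], n: int, default: Any = "") -> Any:
    """Mirror arr[n] / arrayElement(arr, n) with 1-based indexing."""
    if 0 < n <= len(arr):
        return arr[n - 1]  # arr[1] is the first element
    if n < 0 and -n <= len(arr):
        return arr[n]  # arr[-1] is the last element
    return default  # out of range: default value of the element type
```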

abc says:

2020-11-16 15:08

https://clickhouse.tech/docs/en/sql-reference/functions/string-replace-functions/
`
-- replace the constant string pattern in haystack with the constant string replacement
replace(haystack, pattern, replacement)

-- string replacement with regular-expression support (pattern is a regex)
replaceRegexpAll(haystack, pattern, replacement)

-- UserAgent parsing
select
    ua
    , ua_array[2] as os_version    -- get the 2nd item of array
    , ua_array[-1] as browser_ver  -- get the last item of array
from
(select
    timestamp
    , http_user_agent as ua
    , splitByChar('(', replace(http_user_agent, ')', '(')) as ua_array
from
    table_name
where
    status = 200
    and timestamp between '2020-11-16 00:00:00' and '2020-11-16 23:59:59'
limit 10
) x1
;
`
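This is the replace-then-split workaround in action: turning ')' into '(' lets a single-character split cut on both parentheses. A Python sketch on a made-up sample User-Agent string (the string and helper name are hypothetical). The pieces keep their surrounding spaces, so wrapping them in trimBoth() in ClickHouse (or .strip() here) is usually worthwhile.

```python
from typing import List


def split_user_agent(ua: str) -> List[str]:
    """Mirror splitByChar('(', replace(ua, ')', '('))."""
    return ua.replace(")", "(").split("(")


ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36")
parts = split_user_agent(ua)
os_version = parts[2 - 1]  # ua_array[2]: ClickHouse indexes from 1
browser_ver = parts[-1]    # ua_array[-1]: the last piece
```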

abc says:

2022-07-12 15:54

How to remove special characters from both ends of a string in ClickHouse
https://clickhouse.com/docs/en/sql-reference/functions/string-functions/#trim
`
# Method 1
# trim
Removes all specified characters from the start or end of a string. By default removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string.

Syntax:
trim([[LEADING|TRAILING|BOTH] trim_character FROM] input_string)

Example:
┌─trim(BOTH ' ()' FROM '(   Hello, world!   )')─┐
│ Hello, world!                                 │
└───────────────────────────────────────────────┘

# Method 2
# replaceRegexpAll(haystack, pattern, replacement)
# regex replacement
# the username field looks like ["user123"]; we want to strip the brackets, double quotes and spaces
,trim(BOTH ' ["]' FROM username) as user1
,replaceRegexpAll(username, '\\[|"|\\]', '') as user2
`
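The two methods are not interchangeable: trim only removes the listed characters at the two ends, while the regex replacement removes them everywhere in the string. A Python sketch showing both (helper name hypothetical; str.strip(chars) removes any of the given characters from both ends, like trim(BOTH chars FROM s)):

```python
import re


def trim_both(s: str, chars: str = " ") -> str:
    """Mirror trim(BOTH chars FROM s): drop any of `chars` from both ends."""
    return s.strip(chars)


username = '["user123"]'
user1 = trim_both(username, ' ["]')      # trim(BOTH ' ["]' FROM username)
user2 = re.sub(r'\[|"|\]', "", username)  # replaceRegexpAll(username, '\\[|"|\\]', '')
```

For a value with a quote in the middle, such as ["a"b"], the first approach keeps the inner quote (a"b) while the second removes it (ab).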

abc says:

2022-07-12 15:55

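A caveat: ClickHouse's substring extraction needs an explicit start position (counting from 1) and an explicit length (in bytes); extracting everything from a given position to the end of the string is not supported. (Newer ClickHouse documentation lists length as optional, so check the docs for your server version.)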
https://clickhouse.com/docs/en/sql-reference/functions/string-functions/#substrings-offset-length-mids-offset-length-substrs-offset-length
`
mid(s, offset, length)
substr(s, offset, length)
substring(s, offset, length)
`
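Because offset and length count bytes, a substring of Chinese text can easily cut a character in half; substringUTF8 takes the same arguments but counts Unicode code points. A Python sketch of both for offset >= 1 (helper names hypothetical; zero and negative offsets are not modelled):

```python
def substring(s: str, offset: int, length: int) -> str:
    """Mirror substring(s, offset, length): 1-based, counted in bytes."""
    data = s.encode("utf-8")
    # A cut in the middle of a multi-byte character cannot be decoded cleanly.
    return data[offset - 1 : offset - 1 + length].decode("utf-8", errors="replace")


def substring_utf8(s: str, offset: int, length: int) -> str:
    """Mirror substringUTF8: same rules, counted in code points."""
    return s[offset - 1 : offset - 1 + length]
```

Each Chinese character here is 3 bytes in UTF-8, so the second character starts at byte 4 but at code point 2.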

abc says:

2022-07-12 16:14

Some examples of working with dates and times in ClickHouse
https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions/
`
# how to get the current date and time in ClickHouse
now()

## example
SELECT now();
┌───────────────now()─┐
│ 2020-10-17 07:42:09 │
└─────────────────────┘

# how to get the Unix timestamp of a date and time in ClickHouse

toUnixTimestamp(datetime)
toUnixTimestamp(str, [timezone])

## examples
toUnixTimestamp(now())
SELECT toUnixTimestamp('2017-11-05 08:07:47', 'Asia/Tokyo') AS unix_timestamp

# how to get today's / yesterday's / tomorrow's date in ClickHouse
,today()        -- today
,toDate(now())  -- today
,today()-1      -- yesterday
,today()+1      -- tomorrow

# how to convert a Unix timestamp into a date-time string in ClickHouse
# (the output depends on the server time zone; the values below correspond to UTC+8)

SELECT FROM_UNIXTIME(423543535); #1983-06-04 10:58:55
SELECT FROM_UNIXTIME(1234334543, '%Y-%m-%d %R:%S') AS DateTime; #2009-02-11 14:42:23

## if the timestamp is in milliseconds, divide it by 1000 and convert the type first
FROM_UNIXTIME(toInt32(ck_timestamp/1000)) as ck_time
`
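The results above are easy to verify offline once the time zone is fixed: both FROM_UNIXTIME outputs correspond to UTC+8, and the Tokyo example gives 1509836867. A Python sketch that reproduces them; the helper names are hypothetical, and using fixed UTC offsets instead of a time-zone database is a simplification that holds for UTC+8 and Asia/Tokyo (no DST) but not for zones with daylight saving.

```python
from datetime import datetime, timedelta, timezone

UTC_PLUS_8 = timezone(timedelta(hours=8))  # assumed server time zone
TOKYO = timezone(timedelta(hours=9))       # Asia/Tokyo has no DST


def from_unixtime(ts: int, tz: timezone = UTC_PLUS_8,
                  fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Mirror FROM_UNIXTIME(ts[, format]) rendered in the time zone tz."""
    return datetime.fromtimestamp(ts, tz).strftime(fmt)


def to_unix_timestamp(text: str, tz: timezone) -> int:
    """Mirror toUnixTimestamp(str, timezone) for 'YYYY-MM-DD hh:mm:ss' strings."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return int(naive.replace(tzinfo=tz).timestamp())


def from_unixtime_ms(ts_ms: int, tz: timezone = UTC_PLUS_8) -> str:
    """Millisecond timestamps (non-negative): drop the milliseconds first."""
    return from_unixtime(ts_ms // 1000, tz)
```

One thing worth noting for the millisecond case: toInt32 overflows for timestamps after 2038-01-19, so intDiv(ck_timestamp, 1000) is likely the safer way to drop the milliseconds.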

abc says:

2024-05-17 20:37

Capturing server logs of queries at the client
https://clickhouse.com/docs/knowledgebase/send_logs_level
`
log_queries=1
`
query_log
https://clickhouse.com/docs/en/operations/system-tables/query_log

how to disable logging in clickhouse?
https://stackoverflow.com/questions/68453348/how-to-disable-logging-in-clickhouse

hi says:

2024-12-19 15:10

splitByRegexp
https://clickhouse.com/docs/en/sql-reference/functions/splitting-merging-functions#splitbyregexp
`
How to split a string by a regular expression in ClickHouse (equivalent to Hive SQL's split function, except that ClickHouse array indexes start at 1)
splitByRegexp(regexp, s[, max_substrings])

> SELECT splitByRegexp('\\d+', 'a12bc23de345f');

┌─splitByRegexp('\\d+', 'a12bc23de345f')─┐
│ ['a','bc','de','f']                    │
└────────────────────────────────────────┘

> SELECT splitByRegexp('\\d+', 'a12bc23de345f')[1]

Query id: ea677305-73e3-432a-aab3-803d9b1b4fa0

   ┌─arrayElement(splitByRegexp('\\d+', 'a12bc23de345f'), 1)─┐
1. │ a                                                       │
   └─────────────────────────────────────────────────────────┘
`
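Python's re.split gives the same pieces for this example, which is useful for checking a pattern before running it on the server. A sketch (helper name hypothetical) that does not model max_substrings; also note that re.split returns the text of any capture groups in the pattern, so use non-capturing groups (?:...) when porting patterns.

```python
import re
from typing import List


def split_by_regexp(pattern: str, s: str) -> List[str]:
    """Mirror splitByRegexp(regexp, s) without max_substrings."""
    return re.split(pattern, s)


parts = split_by_regexp(r"\d+", "a12bc23de345f")
first = parts[0]  # ClickHouse: splitByRegexp(...)[1], Hive: split(...)[0]
```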

hi says:

2024-12-19 15:14

Installing ClickHouse on macOS with Homebrew and starting it for a quick test
https://github.com/Homebrew/homebrew-cask/blob/HEAD/Casks/c/clickhouse.rb
`
# install
brew info clickhouse
brew install clickhouse

# start
clickhouse server
clickhouse server --daemon # start as a background process

# connect
clickhouse client
`
