=Start=
缘由:
整理记录一下最近使用ClickHouse的经验,方便以后有需要的时候参考。
正文:
参考解答:
0. 借助 clickhouse-cli 这个工具进行命令行操作
$ pip3 install clickhouse-cli
$ vim ~/.clickhouse-cli.rc
$ clickhouse-cli
command not found: clickhouse-cli
# 我的环境是 macOS Catalina 10.15.6 ,用pip3安装成功,但是在 $PATH 里面找不到这个程序
# 后来发现是在 $HOME 目录下的 Library 的 Python 目录中
$ ls -lt ./Library/Python/3.8/bin
$ ~/Library/Python/3.8/bin/clickhouse-cli
1. 一些环境熟悉操作/命令
$ ~/Library/Python/3.8/bin/clickhouse-cli
clickhouse-cli version: 0.3.6
Connecting to x.x.x.x:80
Connected to ClickHouse server v19.13.1.
:) help
clickhouse-cli's custom commands:
---------------------------------
USE Change the current database.
SET Set an option for the current CLI session.
QUIT Exit clickhouse-cli.
HELP Show this help message.
PostgreSQL-like custom commands:
--------------------------------
\l Show databases.
\c Change the current database.
\d, \dt Show tables in the current database.
\d+ Show table's schema.
\ps Show current queries.
\kill Kill query by its ID.
Query suffixes:
---------------
\g, \G Use the Vertical format.
\p Enable the pager.
:)
# 查看有哪些数据库
show databases
\l
# 切换数据库
use db_name
# 查看当前数据库有哪些表
show tables
\d
\dt
# 查看特定数据表的表结构
describe table table_name
\d+ table_name
2. 时间日期函数
WITH
toDate('2019-01-01') AS date,
toDateTime('2019-01-01 00:00:00') AS date_time
SELECT
date,
subtractYears(date, 1) AS subtract_years_with_date,
date_time,
subtractYears(date_time, 1) AS subtract_years_with_date_time
;
3. 字符串处理函数
SELECT splitByChar(',', '1,2,3,abcde');
SELECT splitByString(', ', '1, 2 3, 4,5, abcde');
-- arrayStringConcat() 数组元素拼接的功能和Hive中的 concat_ws() 类似。
WITH
splitByChar(',', '1,2,3,abcde') as arr1
select
arr1,
arrayStringConcat(arr1, '#') as arr1str
;
┌─arr1──────────────────┬─arr1str─────┐
│ ['1','2','3','abcde'] │ 1#2#3#abcde │
└───────────────────────┴─────────────┘
-- 字符串查找
position() / locate()
-- 字符串替换
SELECT replaceRegexpAll('Hello, World!', '.', '\\0\\0') AS res
┌─res────────────────────────┐
│ HHeelllloo,, WWoorrlldd!! │
└────────────────────────────┘
-- 字符串提取
substring() / extractallgroups()
4. 常用的一些聚合功能
-- Hive中的 collect_set 功能,将某一列中的value取值转换成一个 list/set 在 ClickHouse 中可以通过 groupUniqArray() 来实现。
select user_id,
count(1) as cnt,
count(distinct user_agent) as ua_cnt,
groupUniqArray(toDate(timestamp))
from table_name
where status = 200
and http_host = 'domain_name'
and uri = '/path/'
and args like '%keyword%'
and user_id in ('user1', 'user2', 'user3', 'user4', 'user5')
and timestamp between '2020-06-01 00:00:00' and '2020-07-31 23:59:59'
group by user_id
limit 20000
;
┌─user_id─────┬─cnt─┬─ua_cnt─┬─groupUniqArray(toDate(timestamp))─────────────────────┐
│ user1 │ 32 │ 1 │ ['2020-07-03','2020-07-09','2020-07-02','2020-07-08'] │
└─────────────┴─────┴────────┴───────────────────────────────────────────────────────┘
┌─user_id─┬─cnt─┬─ua_cnt─┬─groupUniqArray(toDate(timestamp))────────┐
│ user4 │ 18 │ 1 │ ['2020-07-06','2020-07-03','2020-07-09'] │
└─────────┴─────┴────────┴──────────────────────────────────────────┘
┌─user_id─┬─cnt─┬─ua_cnt─┬─groupUniqArray(toDate(timestamp))─┐
│ user2 │ 4 │ 1 │ ['2020-07-13'] │
└─────────┴─────┴────────┴───────────────────────────────────┘
-- Hive中的 count(distinct col_name) 在 ClickHouse 中可以使用 uniq(col_name) 或者 uniqExact(col_name) 来实现。
5. 其他功能
- URL相关函数
- JSON处理相关函数
- IP处理相关函数
参考链接:
https://github.com/hatarist/clickhouse-cli
https://clickhouse.tech/docs/en/sql-reference/functions/date-time-functions/
Is there any function like hive’s concat_ws or explode or collect_list/collect_array #6664
https://github.com/ClickHouse/ClickHouse/issues/6664
https://clickhouse.tech/docs/en/sql-reference/functions/splitting-merging-functions/
https://clickhouse.tech/docs/en/sql-reference/functions/string-search-functions/#position
https://clickhouse.tech/docs/en/sql-reference/functions/string-functions/#substring
https://clickhouse.tech/docs/en/sql-reference/functions/url-functions/
https://clickhouse.tech/docs/en/sql-reference/functions/json-functions/
https://clickhouse.tech/docs/en/sql-reference/functions/ip-address-functions/#ipv4stringtonums
https://clickhouse.tech/docs/zh/sql-reference/statements/select/array-join/
=END=
《 “ClickHouse中的一些功能点学习” 》 有 9 条评论
— base64 编解码
`
with
‘/search/q-5Y+I5piv5LiA5Liq5LiL6Zuo5aSp’ as args
select
args,
replaceOne(args, ‘/search/q-‘, ”) as base64ed_query,
tryBase64Decode(replaceOne(args, ‘/search/q-‘, ”)) as query_str
;
/*
— 输出
┌─args───────────────────────────────────┬─base64ed_query───────────────┬─query_str──────┐
│ /search/q-5Y+I5piv5LiA5Liq5LiL6Zuo5aSp │ 5Y+I5piv5LiA5Liq5LiL6Zuo5aSp │ 又是一个下雨天 │
└────────────────────────────────────────┴──────────────────────────────┴────────────────┘
*/
select
base64Encode(‘又是一个下雨天’) — 5Y+I5piv5LiA5Liq5LiL6Zuo5aSp
;
`
ClickHouse 中如何获取array中的倒数第1个元素
`
— 获取 arr 中的最后1个元素
arr[-1]
arr[length(arr)]
`
https://clickhouse.tech/docs/en/sql-reference/functions/splitting-merging-functions/
`
ClickHouse 里面有字符串切分函数:
splitByChar($separator, $s)
splitByString($separator, $s)
但是这里的 $separator 就是一个 字符/字符串 ,不像 Hive 里的 split 函数中可以是一个正则表达式(从而支持多分隔符的切分),但是也有变通的办法来支持,先 replace 再 splitByChar 实现。
`
https://clickhouse.tech/docs/en/sql-reference/functions/array-functions/
`
–返回arr_name这个数组的长度
length(arr_name)
— 获取 arr 中的第 n 个元素(数组下标是从 1 开始计算的)
arrayElement(arr, n)
arr[n]
— 获取 arr 中的最后1个元素
arr[-1]
arr[length(arr)]
`
https://clickhouse.tech/docs/en/sql-reference/functions/string-replace-functions/
`
— 替换 haystack 字符串中的 常量字符串-pattern 为 常量字符串-replacement
replace(haystack, pattern, replacement)
— 支持 正则表达式 的字符串替换(pattern可以用正则表达式来指定)
replaceRegexpAll(haystack, pattern, replacement)
// UserAgent解析
select
ua
, ua_array[2] as os_version // get the 2nd item of array
, ua_array[-1] as browser_ver // get the last item of array
from
(select
timestamp
, http_user_agent as ua
, splitByChar(‘(‘, replace(http_user_agent, ‘)’, ‘(‘)) as ua_array
from
table_name
where
status = 200
and timestamp between ‘2020-11-16 00:00:00’ and ‘2020-11-16 23:59:59’
limit 10
)x1
;
`
ClickHouse中如何删除首尾的特殊字符
https://clickhouse.com/docs/en/sql-reference/functions/string-functions/#trim
`
# 方法一
# trim
Removes all specified characters from the start or end of a string. By default removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string.
语法:
trim([[LEADING|TRAILING|BOTH] trim_character FROM] input_string)
样例:
┌─trim(BOTH ‘ ()’ FROM ‘( Hello, world! )’)─┐
│ Hello, world! │
└───────────────────────────────────────────────┘
# 方法二
# replaceRegexpAll(haystack, pattern, replacement)
# 正则替换
# username 字段长这样 [“user123”] 现在想去掉双引号和空格
,trim(BOTH ‘ [“]’ FROM username) as user1
,replaceRegexpAll(username, ‘\\[|”|\\]’, ”) as user2
`
使用的时候需要注意——ClickHouse的字符串子串提取需要明确【起始位置】(从1开始)和【提取长度】(以字节为单位),还不支持从某个位置开始到结尾的功能
https://clickhouse.com/docs/en/sql-reference/functions/string-functions/#substrings-offset-length-mids-offset-length-substrs-offset-length
`
mid(s, offset, length)
substr(s, offset, length)
substring(s, offset, length)
`
ClickHouse中 对日期时间做操作的一些样例
https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions/
`
# clickhouse中 如何获取当前的日期时间
now()
## 样例
SELECT now();
┌───────────────now()─┐
│ 2020-10-17 07:42:09 │
└─────────────────────┘
# clickhouse中 如何获取当前的日期时间对应的时间戳
toUnixTimestamp(datetime)
toUnixTimestamp(str, [timezone])
## 样例
toUnixTimestamp(now())
SELECT toUnixTimestamp(‘2017-11-05 08:07:47’, ‘Asia/Tokyo’) AS unix_timestamp
# clickhouse中 如何获取 今天/昨天/明天 的日期
,today() –今天
,toDate(now()) –今天
,today()-1 –昨天
,today()+1 –明天
# clickhouse中 如何将时间戳转换成日期时间的字符串
SELECT FROM_UNIXTIME(423543535); #1983-06-04 10:58:55
SELECT FROM_UNIXTIME(1234334543, ‘%Y-%m-%d %R:%S’) AS DateTime; #2009-02-11 14:42:23
## 如果时间戳是毫秒级别的,需要先除1000然后做个类型转换才行
FROM_UNIXTIME(toInt32(ck_timestamp/1000)) as ck_time
`
Capturing server logs of queries at the client
https://clickhouse.com/docs/knowledgebase/send_logs_level
`
log_queries=1
`
query_log
https://clickhouse.com/docs/en/operations/system-tables/query_log
how to disable logging in clickhouse?
https://stackoverflow.com/questions/68453348/how-to-disable-logging-in-clickhouse
splitByRegexp
https://clickhouse.com/docs/en/sql-reference/functions/splitting-merging-functions#splitbyregexp
`
ClickHouse 中如何按照正则表达式对字符串进行切分(功能等效于Hive SQL中的 split 函数,但是clickhouse中数组下标是从 1 开始计算的)
splitByRegexp(regexp, s[, max_substrings]))
> SELECT splitByRegexp(‘\\d+’, ‘a12bc23de345f’);
┌─splitByRegexp(‘\\d+’, ‘a12bc23de345f’)─┐
│ [‘a’,’bc’,’de’,’f’] │
└────────────────────────────────────────┘
> SELECT splitByRegexp(‘\\d+’, ‘a12bc23de345f’)[1]
Query id: ea677305-73e3-432a-aab3-803d9b1b4fa0
┌─arrayElement(splitByRegexp(‘\\d+’, ‘a12bc23de345f’), 1)─┐
1. │ a │
└─────────────────────────────────────────────────────────┘
`
macOS上使用 Homebrew 安装clickhouse并启动测试
https://github.com/Homebrew/homebrew-cask/blob/HEAD/Casks/c/clickhouse.rb
`
# 安装
brew info clickhouse
brew install clickhouse
# 启动
clickhouse server
clickhouse server –daemon #后台进程启动
# 连接
clickhouse client
`