ClickHouse中的一些功能点学习

=Start=

缘由:

整理记录一下最近使用ClickHouse的经验,方便以后有需要的时候参考。

正文:

参考解答:
0. 借助 clickhouse-cli 这个工具进行命令行操作
$ pip3 install clickhouse-cli

$ vim ~/.clickhouse-cli.rc

$ clickhouse-cli
command not found: clickhouse-cli

# 我的环境是 macOS Catalina 10.15.6 ,用pip3安装成功,但是在 $PATH 里面找不到这个程序
# 后来发现是在 $HOME 目录下的 Library 的 Python 目录中
$ ls -lt ./Library/Python/3.8/bin

$ ~/Library/Python/3.8/bin/clickhouse-cli
1. 一些环境熟悉操作/命令
$ ~/Library/Python/3.8/bin/clickhouse-cli
clickhouse-cli version: 0.3.6
Connecting to x.x.x.x:80
Connected to ClickHouse server v19.13.1.

 :) help

clickhouse-cli's custom commands:
---------------------------------
USE     Change the current database.
SET     Set an option for the current CLI session.
QUIT    Exit clickhouse-cli.
HELP    Show this help message.

PostgreSQL-like custom commands:
--------------------------------
\l      Show databases.
\c      Change the current database.
\d, \dt Show tables in the current database.
\d+     Show table's schema.
\ps     Show current queries.
\kill   Kill query by its ID.

Query suffixes:
---------------
\g, \G  Use the Vertical format.
\p      Enable the pager.
 :)

# 查看有哪些数据库
show databases
\l

# 切换数据库
use db_name

# 查看当前数据库有哪些表
show tables
\d
\dt

# 查看特定数据表的表结构
describe table table_name
\d+ table_name
2. 时间日期函数
WITH
    toDate('2019-01-01') AS date,
    toDateTime('2019-01-01 00:00:00') AS date_time
SELECT
    date,
    subtractYears(date, 1) AS subtract_years_with_date,
    date_time,
    subtractYears(date_time, 1) AS subtract_years_with_date_time
;
3. 字符串处理函数
SELECT splitByChar(',', '1,2,3,abcde');
SELECT splitByString(', ', '1, 2 3, 4,5, abcde');

-- arrayStringConcat() 数组元素拼接的功能和Hive中的 concat_ws() 类似。

WITH
	splitByChar(',', '1,2,3,abcde') as arr1
select
	arr1,
	arrayStringConcat(arr1, '#') as arr1str
;

┌─arr1──────────────────┬─arr1str─────┐
│ ['1','2','3','abcde'] │ 1#2#3#abcde │
└───────────────────────┴─────────────┘
-- 字符串查找
position() / locate()

-- 字符串替换
SELECT replaceRegexpAll('Hello, World!', '.', '\\0\\0') AS res

┌─res────────────────────────┐
│ HHeelllloo,,  WWoorrlldd!! │
└────────────────────────────┘

-- 字符串提取
substring() / extractallgroups()
4. 常用的一些聚合功能
-- Hive中的 collect_set 功能,将某一列中的value取值转换成一个 list/set 在 ClickHouse 中可以通过 groupUniqArray() 来实现。
select user_id,
       count(1) as cnt,
       count(distinct user_agent) as ua_cnt,
       groupUniqArray(toDate(timestamp))
  from table_name
 where status = 200
   and http_host = 'domain_name'
   and uri = '/path/'
   and args like '%keyword%'
   and user_id in ('user1', 'user2', 'user3', 'user4', 'user5')
   and timestamp between '2020-06-01 00:00:00' and '2020-07-31 23:59:59'
 group by user_id
 limit 20000
;

┌─user_id─────┬─cnt─┬─ua_cnt─┬─groupUniqArray(toDate(timestamp))─────────────────────┐
│ user1       │  32 │      1 │ ['2020-07-03','2020-07-09','2020-07-02','2020-07-08'] │
└─────────────┴─────┴────────┴───────────────────────────────────────────────────────┘
┌─user_id─┬─cnt─┬─ua_cnt─┬─groupUniqArray(toDate(timestamp))────────┐
│ user4   │  18 │      1 │ ['2020-07-06','2020-07-03','2020-07-09'] │
└─────────┴─────┴────────┴──────────────────────────────────────────┘
┌─user_id─┬─cnt─┬─ua_cnt─┬─groupUniqArray(toDate(timestamp))─┐
│ user2   │   4 │      1 │ ['2020-07-13']                    │
└─────────┴─────┴────────┴───────────────────────────────────┘


-- Hive中的 count(distinct col_name) 在 ClickHouse 中可以使用 uniq(col_name) 或者 uniqExact(col_name) 来实现。
5. 其他功能
  • URL相关函数
  • JSON处理相关函数
  • IP处理相关函数

参考链接:

https://github.com/hatarist/clickhouse-cli

https://clickhouse.tech/docs/en/sql-reference/functions/date-time-functions/

Is there any function like hive’s concat_ws or explode or collect_list/collect_array #6664
https://github.com/ClickHouse/ClickHouse/issues/6664

https://clickhouse.tech/docs/en/sql-reference/functions/splitting-merging-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/string-search-functions/#position

https://clickhouse.tech/docs/en/sql-reference/functions/string-functions/#substring

https://clickhouse.tech/docs/en/sql-reference/functions/splitting-merging-functions/#extractallgroups

https://clickhouse.tech/docs/en/sql-reference/functions/url-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/json-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/ip-address-functions/#ipv4stringtonums

https://clickhouse.tech/docs/zh/sql-reference/statements/select/array-join/

=END=

声明: 除非注明,ixyzero.com文章均为原创,转载请以链接形式标明本文地址,谢谢!
https://ixyzero.com/blog/archives/4979.html

《ClickHouse中的一些功能点学习》上的一个想法

  1. — base64 编解码
    `
    with
    ‘/search/q-5Y+I5piv5LiA5Liq5LiL6Zuo5aSp’ as args
    select
    args,
    replaceOne(args, ‘/search/q-‘, ”) as base64ed_query,
    tryBase64Decode(replaceOne(args, ‘/search/q-‘, ”)) as query_str
    ;

    /*
    — 输出
    ┌─args───────────────────────────────────┬─base64ed_query───────────────┬─query_str──────┐
    │ /search/q-5Y+I5piv5LiA5Liq5LiL6Zuo5aSp │ 5Y+I5piv5LiA5Liq5LiL6Zuo5aSp │ 又是一个下雨天 │
    └────────────────────────────────────────┴──────────────────────────────┴────────────────┘
    */

    select
    base64Encode(‘又是一个下雨天’) — 5Y+I5piv5LiA5Liq5LiL6Zuo5aSp
    ;
    `

发表评论

电子邮件地址不会被公开。 必填项已用*标注