Hive SQL 中的分号、下划线等tips整理

=Start=

缘由：

简单整理一下近期在进行 Hive SQL 查询时碰到的一些问题和经验，方便后面有需要的时候参考。

正文：

参考解答：

1. Hive SQL 如何对英文分号(;)做split切分？

> select split(f1,'\073')[0] from tab_name;

-- 使用 '\073' 来代替 ';' 这种写法就OK，否则会报语法错误。

-- 验证代码片段
with t1 as (
SELECT
'1\0733' as v1
,'1 3' as v2
,'123' as v3
)

SELECT
v1,split(v1,'\073') as result1
,v2,split(v2,'\073| ') as result2
,v3,split(v3,'\073| ') as result3
FROM
t1

,split(lower(mac_address),'\073| ') as mac_addr_array --对mac地址字符串使用空格和分号作为分隔符进行切分

,array_contains(mac_addr_array,user_mac) as it_asset -- 判断特定的mac地址是否在mac地址数组中

2. Hive SQL的like如何匹配下划线？

SQL中如何使用like匹配下划线的内容？因为下划线在like中代表任意单个字符，如果需要匹配下划线字符，需要对下划线进行转义。

WHERE mycolumn LIKE '%\_%' ESCAPE '\'
WHERE mycolumn LIKE '%\_%'

WHERE mycolumn LIKE '%#_%' ESCAPE '#'

方法一：
使用 escape 对下划线进行转义（默认是反划线）

方法二：
使用 instr/locate/rlike 等函数进行判断，不使用 like 来判断

-- 验证代码片段

with t1 as (
SELECT
'1_3' as underscore_var
,'123' as no_underscore_var
)

SELECT
underscore_var, if(underscore_var like '%\_%',1,0) as has1
,no_underscore_var, if(no_underscore_var like '%\_%',1,0) as has2
,if(no_underscore_var like '%_%',1,0) as wrong_judge_method
FROM
t1

3. Hive中布尔类型(boolean) 的变量如何做比较？

-- 简单来说就是可以用常规的等号(=)，也可以不添加判等符号
Correct syntax is:
where boolean_col = True
where boolean_col
where NOT boolean_col

To check for NULL use:
where boolean_col is NULL

4. Hive中如何向array/map等复杂类型字段中添加空值？

-- 暂未实际测试

Prepare:
CREATE TABLE IF NOT EXISTS tmp.test_table (
 col1 STRING,
 col2 MAP<STRING, STRING>,
 col3 ARRAY<BIGINT>
)
PARTITIONED BY ( ds STRING )
;

Step1: Add brickhouse jar and create temporary functions to cast array and map.

> ADD JAR s3a://airbnb-datainfra-dependencies-internal-only/teams/datainfra/projects/hive_aux_jars/brickhouse-0.7.1.jar;
> CREATE TEMPORARY FUNCTION CAST_ARRAY AS 'brickhouse.udf.collect.CastArrayUDF';
> CREATE TEMPORARY FUNCTION CAST_MAP AS 'brickhouse.udf.collect.CastMapUDF';

Step 2: Insert the data:

> INSERT OVERWRITE TABLE tmp.test_table PARTITION(ds='2020–01–01')
> SELECT 
 'random_text' col1
 , CAST_MAP(map('', ''), 'map<string,string>') col2
 , CAST_ARRAY(array(), 'bigint') col3

Step 3: Check the results:
> SELECT * FROM tmp.test_table;

参考链接：

split string that includes semicolons in Hive
https://stackoverflow.com/questions/17212511/split-string-that-includes-semicolons-in-hive

Why does using an Underscore character in a LIKE filter give me all the results?
https://stackoverflow.com/questions/19588455/why-does-using-an-underscore-character-in-a-like-filter-give-me-all-the-results

【Hive】like + 转义字符
https://blog.csdn.net/qq_34105362/article/details/80415526

HIVE Boolean Filter
https://stackoverflow.com/questions/68313045/hive-boolean-filter

Insert NULL into Hive complex columns like ARRAY, MAP etc.
https://medium.com/@rajnishkumargarg/insert-null-into-hive-complex-columns-like-array-map-etc-a76e320d3e7e
https://github.com/jeromebanks/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/CastMapUDF.java

=END=

《 “Hive SQL 中的分号、下划线等tips整理” 》有 2 条评论

hi说道：

2024-09-26 17:12

一个判断hour字段是否属于自定义的工作时间的方法：
array_contains(split(‘22,23,00,01,02,03,04,05,06,07′,’,’),`hour`) as not_work_hour,

使用split对字符串进行切分之后生成array，然后再用array_contains进行判断，可以在外面再套一层if判断将取值范围限定到0和1方便后面做累加操作

if(array_contains(split(‘22,23,00,01,02,03,04,05,06,07′,’,’),`hour`),1,0) as not_work_hour,

回复
hi说道：

2024-09-26 17:33

或者直接rlike也可以达到目标，因为hour字符串就2个字符，也不存在或者说很难出现错误匹配的情况

hour rlike ’22|23|00|01|02|03|04|05|06|07′ as not_work_hour

回复

ASPIRE

Hive SQL 中的分号、下划线等tips整理

缘由：

正文：

参考解答：

1. Hive SQL 如何对英文分号(;)做split切分？

2. Hive SQL的like如何匹配下划线？

3. Hive中布尔类型(boolean) 的变量如何做比较？

4. Hive中如何向array/map等复杂类型字段中添加空值？

参考链接：

《 “Hive SQL 中的分号、下划线等tips整理” 》有 2 条评论

发表回复取消回复

Hive SQL 中的分号、下划线等tips整理

缘由：

正文：

参考解答：

1. Hive SQL 如何对英文分号(;)做split切分？

2. Hive SQL的like如何匹配下划线？

3. Hive中 布尔类型(boolean) 的变量如何做比较？

4. Hive中如何向array/map等复杂类型字段中添加空值？

参考链接：

《 “Hive SQL 中的分号、下划线等tips整理” 》 有 2 条评论

发表回复 取消回复

3. Hive中布尔类型(boolean) 的变量如何做比较？

《 “Hive SQL 中的分号、下划线等tips整理” 》有 2 条评论

发表回复取消回复