Hive SQL学习整理_6

=Start=

缘由:

整理一下最近在工作中用到的Hive SQL的知识点,方便以后参考。

正文:

参考解答:
  • 在Hive SQL中进行IP字符串和数字之间的转换
-- 数字格式的IP转成点分十进制字符串
create temporary macro IPint2str (x bigint) concat_ws('.',CONV(substr(hex(x),1,2),16,10),CONV(substr(hex(x),3,2),16,10),CONV(substr(hex(x),5,2),16,10),CONV(substr(hex(x),7,2),16,10));

select
concat_ws('.',
conv(substr(lpad(conv(3232267034, 10, 2), 32, 0), 1, 8), 2, 10),
conv(substr(lpad(conv(3232267034, 10, 2), 32, 0), 9, 8), 2, 10),
conv(substr(lpad(conv(3232267034, 10, 2), 32, 0), 17, 8), 2, 10),
conv(substr(lpad(conv(3232267034, 10, 2), 32, 0), 25, 8), 2, 10)
)
, concat_ws('.',
CONV(substr(hex(3232267034),1,2),16,10),
CONV(substr(hex(3232267034),3,2),16,10),
CONV(substr(hex(3232267034),5,2),16,10),
CONV(substr(hex(3232267034),7,2),16,10)
)
, IPint2str(3232267034)
;
-- 192.168.123.26	192.168.123.26	192.168.123.26

-- 点分十进制格式的IP字符串转换成整数
create temporary macro IPstr2int (x string) cast(split(x, '\\.') [0] * 256 * 256 * 256 + split(x, '\\.') [1] * 256 * 256 + split(x, '\\.') [2] * 256 + split(x, '\\.') [3] AS bigint);

SELECT
  cast(
    split('192.168.123.26', '\\.') [0] * 256 * 256 * 256 + split('192.168.123.26', '\\.') [1] * 256 * 256 + split('192.168.123.26', '\\.') [2] * 256 + split('192.168.123.26', '\\.') [3] AS bigint
  ) AS ip_bigint
, IPstr2int('192.168.123.26')
;
-- 3232267034	3232267034

# 十六进制函数: hex
语法: hex(BIGINT/BINARY/STRING a)
返回值: string
说明: 如果变量是bigint/binary类型,那么返回a的十六进制表示;如果变量是string类型,则返回该字符串中每个字符的十六进制表示

hive> select hex(17);
11

hive> select hex('abc');
616263

If the argument is an INT or binary, hex returns the number as a STRING in hexadecimal format. Otherwise if the number is a STRING, it converts each character into its hexadecimal representation and returns the resulting STRING. (BINARY version as of Hive 0.12.0.)


# 进制转换函数: conv
语法: conv(BIGINT/STRING num, int from_base, int to_base)
返回值: string
说明: 将数值num从from_base进制转化到to_base进制

hive> select conv(17,10,16) from tableName;
11

hive> select conv(17,10,2) from tableName;
10001

conv(BIGINT num, INT from_base, INT to_base),
conv(STRING num, INT from_base, INT to_base)

Converts a number from a given base to another (see http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_conv).


# 左补足函数: lpad
语法: lpad(string str, int len, string pad)
返回值: string
说明:将 str 进行用 pad 进行左补足到 len 位

hive> select lpad('abc',10,'td');
tdtdtdtabc
  • 在Hive中用SQL生成日期序列
-- 通用方法
with t as (
  select '2020-12-28' as dt_start, '2021-01-12' as dt_end
)
select
idx, date_add(to_date(dt_start), idx) as dt
from t
lateral view posexplode(split(space(datediff(t.dt_end,t.dt_start)),' ')) pi as idx, empty
;

/*
思路解读:
先用 datediff 函数计算两个日期之间的天数(第二个参数应该比第一个参数要更「大」,否则返回的会是负数);
然后用 space 函数把天数这个数值(在此假设为n)转换成 n 个空格;
再用 split 函数对空格进行切分,转换成一个array数组;
再用 lateral view posexplode 对上一步骤产生的数组拆分成多行(索引和内容的格式,但此处的内容为空串);
最后在 select 的时候用 date_add 函数在最初的基础日期上循环不断增加 索引的值 从而产生新的连续日期。

datediff(date1, date2) - Returns the number of days between date1 and date2
date1 and date2 are strings in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. The time parts are ignored.If date1 is earlier than date2, the result is negative.

space(n) - returns n spaces

split(str, regex) - Splits str around occurances that match regex
Function class:org.apache.hadoop.hive.ql.udf.generic.GenericUDFSplit

posexplode(a) - behaves like explode for arrays, but includes the position of items in the original array
Function class:org.apache.hadoop.hive.ql.udf.generic.GenericUDTFPosExplode
Function type:BUILTIN


idx	dt
0	2020-12-28
1	2020-12-29
2	2020-12-30
3	2020-12-31
4	2021-01-01
...
15	2021-01-12
*/
-- 注:在Hive SQL中,函数接收日期的格式默认为 'yyyy-mm-dd' 的,而非 'yyyymmdd' 要注意!

SELECT
default.date_range('20201228','20210112',1)
;

/*
date	index
20201228	0
20201229	1
20201230	2
20201231	3
20210101	4
...
20210112	15
*/

desc function extended default.date_range;
/*
default.date_range(a,b,c) - Generates a range of integers from a to b incremented by c or the elements of a map into multiple rows and columns
Function class:brickhouse.udf.date.DateRangeUDTF
Function type:PERSISTENT
Resource:viewfs://hadoop-lt-cluster/home/system/hive/resources/brickhouse-0.7.1-SNAPSHOT-jar-with-dependencies.jar
*/
参考链接:

Hive Sql中IP地址和数字之间的转换
https://blog.csdn.net/weixin_44034508/article/details/105813769

hive 数字IP与字符串IP之间转换
https://blog.csdn.net/cxy1991xm/article/details/102505150

Hive SQL中IP地址与数字之间的转换
https://www.vzhima.com/2020/06/24/hive-sql-convert-ip-address-to-integer.html

How to generate Date Series in HIVE? (Creating table)
https://stackoverflow.com/questions/45278300/how-to-generate-date-series-in-hive-creating-table/45279528

https://www.gairuo.com/p/hive-sql-mathematical-functions

https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex
https://dev.mysql.com/doc/refman/8.0/en/mathematical-functions.html#function_conv

=END=

声明: 除非注明,ixyzero.com文章均为原创,转载请以链接形式标明本文地址,谢谢!
https://ixyzero.com/blog/archives/5096.html

发表评论

您的电子邮箱地址不会被公开。 必填项已用*标注