Hive SQL Study Notes, Part 5

=Start=

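Motivation:

A continued round-up of the Hive SQL knowledge I have recently learned or used, kept here for future reference.

Main text:

Reference answers: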

1. Conditional expressions in Hive

IF( Test Condition, True Value, False Value )
-- Returns "True Value" if "Test Condition" is true, otherwise "False Value"
select IF(1=1,'TRUE','FALSE') as IF_CONDITION_TEST;
-- TRUE

COALESCE( value1,value2,… )
-- Returns the first non-NULL value in the argument list; returns NULL if every argument is NULL
select coalesce(null,'a',null,'b');
-- a

CASE Statement
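-- Similar to the case syntax in other languages: picks a result according to the value matched, which suits comparisons against several conditions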
select case x 
 when 1 then 'one'
 when 2 then 'two'
 when 0 then 'zero'
 else 'out of range'
end
from
(select 3 as x
)t
;
-- out of range
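
CASE also has a searched form, in which each WHEN carries its own boolean condition instead of comparing one expression against constants, and NVL is a two-argument shorthand for COALESCE. A small sketch of both; the column names score and remark are made up for illustration:

```sql
-- Searched CASE: the first WHEN whose condition is true wins
select case
         when score >= 90 then 'A'
         when score >= 60 then 'pass'
         else 'fail'
       end                as grade,
       nvl(remark, 'n/a') as remark_or_default  -- same result as coalesce(remark, 'n/a')
from
(select 75 as score, cast(null as string) as remark
)t
;
-- pass    n/a
```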

2. How to get today's date in Hive, its day of the week, and its week of the year

select current_timestamp,
from_unixtime(unix_timestamp()) as `from_unixtime(unix_timestamp())`,

current_date,
to_date(from_unixtime(unix_timestamp())) as `to_date(from_unixtime(unix_timestamp()))`
;
-- 2019-05-21 20:30:05.591 2019-05-21 20:30:06 2019-05-21 2019-05-21
-- In other words, current_timestamp and current_date are normally all you need

SELECT current_date AS `date`,
       CASE date_format(current_date,'u')  -- date_format returns a string, so compare against string literals
           WHEN '1' THEN 'Mon'
           WHEN '2' THEN 'Tues'
           WHEN '3' THEN 'Wed'
           WHEN '4' THEN 'Thu'
           WHEN '5' THEN 'Fri'
           WHEN '6' THEN 'Sat'
           WHEN '7' THEN 'Sun'
       END AS day_of_week
;
-- 2019-05-21 Tues

select weekofyear(current_timestamp); -- 21
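
On Hive versions that lack date_format, or where its pattern letters behave differently, the day of the week can also be derived from plain date arithmetic. A minimal sketch, assuming only datediff and pmod are available and relying on the fact that 1900-01-01 was a Monday; the fixed date literal stands in for current_date so that the result is reproducible:

```sql
-- 1900-01-01 was a Monday, so (days since then) mod 7 gives 0 = Mon ... 6 = Sun
select pmod(datediff('2019-05-21', '1900-01-01'), 7) as dow_zero_based,
       weekofyear('2019-05-21')                      as week_of_year;
-- 1    21
```

Here 1 means Tuesday, which agrees with the date_format result above.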

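3. Data type conversion in Hive

Like Java, Hive supports both implicit conversions and explicit conversions.

Any integer type can be implicitly converted to a wider type. TINYINT, SMALLINT, INT, BIGINT, FLOAT and STRING can all be implicitly converted to DOUBLE; yes, you read that right, STRING can be implicitly converted to DOUBLE as well. Keep in mind, however, that BOOLEAN cannot be implicitly converted to any other data type.

Explicit conversion is done with CAST, whose syntax is:

cast(value AS TYPE)

If the conversion fails, the result is NULL.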
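
The rules above can be sketched with a few literals. This is an illustrative query, not taken from the referenced articles; the column aliases are made up:

```sql
select cast('123' as int)    as ok_int,           -- 123
       cast('abc' as int)    as failed_int,       -- NULL, because the string is not a number
       cast('3.7' as double) as ok_double,        -- 3.7
       cast(3.7 as int)      as truncated_int,    -- 3, the fractional part is dropped
       '1.5' + 1             as implicit_double;  -- 2.5, the STRING is implicitly converted to DOUBLE
```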

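4. How to get the first day of the current week or month in Hive

Method 1: write a UDF that exposes a function to call; this is the simplest, most direct and quickest option.

Method 2: do date arithmetic with the SQL date functions.
How to get the first day of the current month in Hive SQL? (The pattern 'd' as the second argument of date_format means "day in month". Subtracting that value minus 1 from the current date lands on the 1st; subtracting the full value would give the last day of the previous month.)
hive> select date_sub(current_date, cast(date_format(current_date,'d') as INT) - 1) as month_first;

How to get the first day of the current week in Hive SQL? (The pattern 'u' as the second argument of date_format means "day number of week", 1 = Monday through 7 = Sunday, so subtracting that value minus 1 lands on Monday.)
hive> select date_sub(current_date, cast(date_format(current_date,'u') as INT) - 1) as week_first;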
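
As an alternative to the date_sub arithmetic, built-in functions can return the same dates directly. A sketch, assuming Hive 1.2 or later, where trunc with the 'MM' unit and next_day are available; the date literal stands in for current_date so that the results are reproducible:

```sql
select trunc('2019-05-21', 'MM')                 as month_first,  -- 2019-05-01
       -- go back 7 days, then take the first Monday strictly after that date
       next_day(date_sub('2019-05-21', 7), 'MO') as week_first;   -- 2019-05-20 (a Monday)
```

If the input date is itself a Monday, the next_day expression returns that same date, so it still yields the first day of the current week.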

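5. Commonly used regular expressions in Hive

1. Validating password strength
The password must combine upper-case letters, lower-case letters and digits, must not contain special characters, and must be 8 to 10 characters long.

^(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])[a-zA-Z0-9]{8,10}$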

2. Validating Chinese text
The string may contain only Chinese characters.

^[\\u4e00-\\u9fa5]{0,}$

3. A string made up of digits, the 26 English letters, or underscores

^\\w+$

4. Validating an e-mail address
As with passwords, below is a regular expression for checking that an e-mail address is well-formed.

[\\w!#$%&'*+/=?^_`{|}~-]+(?:\\.[\\w!#$%&'*+/=?^_`{|}~-]+)*@(?:[\\w](?:[\\w-]*[\\w])?\\.)

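5. Validating ID card numbers

Below are regular expressions for ID card numbers of 15 or 18 characters. (In practice, many strings match the ID number format but far fewer are actually valid ID numbers, so the validation query in section 6 is still needed for further verification.)

15 digits:

^[1-9]\\d{7}((0\\d)|(1[0-2]))(([0-2]\\d)|3[0-1])\\d{3}$

18 characters:

^[1-9]\\d{5}[1-9]\\d{3}((0\\d)|(1[0-2]))(([0-2]\\d)|3[0-1])\\d{3}([0-9]|X)$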

6. Validating dates
Date validation for the "yyyy-mm-dd" format, taking leap years into account.

^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])

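7. Validating mobile phone numbers
Below is a regular expression for mainland China mobile numbers starting with 13, 15 or 18 (it also accepts the 145 and 147 prefixes).

^(13[0-9]|14[57]|15[0-35-9]|18[0-35-9])\\d{8}$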
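
The patterns in this section are written with doubled backslashes because they are meant to sit inside Hive string literals, where \\d reaches the regex engine as \d. A minimal usage sketch with the RLIKE operator (REGEXP is a synonym); the sample strings are made up for illustration:

```sql
select '13812345678' rlike '^(13[0-9]|14[57]|15[0-35-9]|18[0-35-9])\\d{8}$' as is_mobile,  -- true
       'abc_123'     rlike '^\\w+$'                                         as is_word,    -- true
       'abc-123'     rlike '^\\w+$'                                         as has_dash;   -- false, '-' is not a word character
```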

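6. How to validate the format of an ID card number string in Hive

-- Hive validation of 18-character ID card numbers (the query only returns rows for numbers that break a rule, so test it with an invalid number; a valid number gives an empty result)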

select * from
(select trim(upper('440102198001021231')) idcard) t1
where
-- wrong number of characters
length(idcard) <> 18 

-- invalid province code
or substr(idcard,1,2) not in 
('11','12','13','14','15','21','22','23','31',
'32','33','34','35','36','37','41','42','43',
'44','45','46','50','51','52','53','54','61',
'62','63','64','65','71','81','82','91') 

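-- regular-expression check of the number: the birth date must be a real calendar date (the patterns below only accept birth years 19xx)
or (if(pmod(cast(substr(idcard, 7, 4) as int),400) = 0 or (pmod(cast(substr(idcard, 7, 4) as int),100) <> 0 and pmod(cast(substr(idcard, 7, 4) as int),4) = 0), -- leap year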
if(idcard regexp '^[1-9][0-9]{5}19[0-9]{2}((01|03|05|07|08|10|12)(0[1-9]|[1-2][0-9]|3[0-1])|(04|06|09|11)(0[1-9]|[1-2][0-9]|30)|02(0[1-9]|[1-2][0-9]))[0-9]{3}[0-9X]$',1,0),
if(idcard regexp '^[1-9][0-9]{5}19[0-9]{2}((01|03|05|07|08|10|12)(0[1-9]|[1-2][0-9]|3[0-1])|(04|06|09|11)(0[1-9]|[1-2][0-9]|30)|02(0[1-9]|1[0-9]|2[0-8]))[0-9]{3}[0-9X]$',1,0)
)) = 0

-- wrong check character
or substr('10X98765432',pmod(
(cast(substr(idcard,1,1) as int)+cast(substr(idcard,11,1) as int))*7
+(cast(substr(idcard,2,1) as int)+cast(substr(idcard,12,1) as int))*9
+(cast(substr(idcard,3,1) as int)+cast(substr(idcard,13,1) as int))*10
+(cast(substr(idcard,4,1) as int)+cast(substr(idcard,14,1) as int))*5
+(cast(substr(idcard,5,1) as int)+cast(substr(idcard,15,1) as int))*8
+(cast(substr(idcard,6,1) as int)+cast(substr(idcard,16,1) as int))*4
+(cast(substr(idcard,7,1) as int)+cast(substr(idcard,17,1) as int))*2
+cast(substr(idcard, 8,1) as int)*1
+cast(substr(idcard, 9,1) as int)*6
+cast(substr(idcard,10,1) as int)*3,11)+1,1) 
<> substr(idcard,18,1) -- compared as strings so that a check character of X is handled
;
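
To see why the sample number above is reported as invalid, the check character can be computed on its own. The following is a compact sketch of the same weighted sum, assuming posexplode is available (Hive 0.13 or later); the aliases wt, pos and w are made up for illustration:

```sql
-- Weights for positions 1..17; pos from posexplode is 0-based
select idcard,
       substr('10X98765432',
              pmod(sum(cast(substr(idcard, pos + 1, 1) as int) * cast(w as int)), 11) + 1,
              1)                as expected_check,
       substr(idcard, 18, 1)    as actual_check
from (select '440102198001021231' as idcard) t
lateral view posexplode(split('7,9,10,5,8,4,2,1,6,3,7,9,10,5,8,4,2', ',')) wt as pos, w
group by idcard
;
-- 440102198001021231    0    1
```

The weighted sum for this number is 177, and 177 mod 11 = 1, which maps to the check character 0. The number ends in 1, so the validation query returns it as an invalid row.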

Reference links:

Hive conditional expressions
https://blog.csdn.net/u012378570/article/details/62216722
What’s the best way to write if/else if/else if/else in HIVE?
https://stackoverflow.com/questions/32472801/whats-the-best-way-to-write-if-else-if-else-if-else-in-hive
Hadoop Hive Conditional Functions: IF,CASE,COALESCE,NVL,DECODE
http://dwgeek.com/hadoop-hive-conditional-functions-if-case-coalesce-nvl-decode.html/
=
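Validating ID card numbers in Hive
https://blog.csdn.net/wzy0623/article/details/53893238
Commonly used regular expressions
http://lxw1234.com/archives/2016/04/640.htm
Hive regular expressions
https://blog.csdn.net/changzoe/article/details/80251700
Extracting English and Chinese names with regular expressions in Hive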
https://cloud.tencent.com/developer/article/1403321
=
How to select current date in Hive SQL
https://stackoverflow.com/questions/17905873/how-to-select-current-date-in-hive-sql
Hive date function to achieve day of week
https://stackoverflow.com/questions/22982904/hive-date-function-to-achieve-day-of-week/55320077#55320077
dayofweek in Hive and Spark SQL
https://blog.csdn.net/hjw199089/article/details/79526362
=
Hive data type conversion
https://www.iteblog.com/archives/892.html
=
An overview of commonly used Hive functions
https://www.iteblog.com/archives/2258.html#i-5
List of Apache Hive built-in functions
https://www.iteblog.com/archives/2032.html#date_format

=END=

23 May 2019
