一些SQL代码片段整理

=Start=

缘由：

简单整理一下个人觉得有用/通用的SQL代码片段，方便后面有需要的时候参考和使用。

正文：

参考解答：

Nginx日志中的UA字段的切分和信息提取

-- 从HTTP请求的user_agent字段中提取一些信息
SELECT
`op_datetime`
,`device`
,`device_id`
,split(replace(user_agent, ')', '('),'\\(')[1] as ua
,case
when user_agent rlike 'Windows NT|Macintosh' THEN 'PC'
when user_agent rlike 'iPhone|iPad|Android' THEN 'mobile'
else 'others'
END AS platform
,case
when locate('Windows NT',user_agent) > 0 THEN 'win'
when locate('Macintosh',user_agent) > 0 THEN 'mac'
when locate('iPhone',user_agent) > 0 THEN 'iphone'
when locate('iPad',user_agent) > 0 THEN 'ipad'
when locate('Android',user_agent) > 0 THEN 'android'
when locate('Linux x86_64',user_agent) > 0 THEN 'linux'
when locate('Java/',user_agent) > 0 THEN 'java'
else 'others'
END AS os_type
,`user_agent`

GitLab中和代码仓库下载有关的日志和查询条件

-- 在GitLab的3个日志（production_json.log、api_json.log、gitlab-shell.log）中查看下载行为记录
SELECT
from_unixtime(cast(`timestamp`/1000 as BIGINT)) as time_str
,username
,remote_ip
,action as op
,project_path
,status
,ua
,p_dt
,p_hour
FROM
origin_sec_log.gitlab_rails_production_json
WHERE
p_dt between '20231102' and '20231102'
and (project_path like '%reponame%'
)
AND action = 'git_upload_pack'

union all

SELECT
from_unixtime(cast(`timestamp`/1000 as BIGINT)) as time_str
,username
,remote_ip
,'git-upload-pack' as op
,project_path
,status
,ua
,p_dt
,p_hour
FROM
origin_sec_log.gitlab_rails_api_json
WHERE
p_dt between '20231102' and '20231102'
and params like '%"git-upload-pack"%'
and username IS NOT NULL
and (project_path like '%reponame%'
)

union all

SELECT
from_unixtime(cast(`timestamp`/1000 as BIGINT)) as time_str
,username
,null as remote_ip
,command as op
,gl_project_path as project_path
,status
,null as ua
,p_dt
,p_hour
FROM
origin_sec_log.gitlab_shell_log
WHERE
p_dt between '20231102' and '20231102'
AND command = 'git-upload-pack'
and (gl_project_path like '%reponame%'
)
;

分析或清洗Nginx日志时会用到的一些字段以及需要关注的一些类型的数据

CREATE TABLE IF NOT EXISTS origin_sec_log.nginx_access_log (
`unix_timestamp` bigint,
`datetime_str` string,
`user_name` string,
`ipv4` string,
`ipv6` string,
`scheme` string,
`domain` string,
`uri` string,
`req_method` string,
`req_args` string,
`req_body` string,
`resp_content` string,
`status` int,
`http_user_agent` string,
`device_id` string,
`http_referer` string,
`http_x_forwarded_for` string,
`http_content_type` string,
`upstream_response_length` bigint,
`bytes_sent` bigint
) 
PARTITIONED BY (
`p_dt` string
)

req_body rlike '(银行卡|手机号|电话|地址|订单|身份证|实名|addr|street|company|phone|mobile|token|accesskey|secretkey|passport|key|userid|uid|idcard|idn|order|card|merchant|shop|logistics|asset|cert)'

resp_content rlike '(银行卡|手机号|电话|地址|订单|身份证|实名|addr|street|company|phone|mobile|token|accesskey|secretkey|passport|key|userid|uid|idcard|idn|order|card|merchant|shop|logistics|asset|cert)'

手机号、身份证号、邮箱、地址、银行卡号
用户ID、订单ID、搜索关键字、下载文件名
扩展ID1、扩展ID2

,`up_phone` string
,`up_idno` string
,`up_email` string
,`up_address` string
,`up_bankcard` string
,`up_userid` string
,`up_orderid` string
,`up_keyword` string
,`down_filename` string
,`ext_tag1` string
,`ext_tag2` string

参考链接：

GitLab log system
https://docs.gitlab.com/16.5/ee/administration/logs/

JavaScript 通过UserAgent获取用户设备信息（浏览器信息、操作系统信息）
https://blog.csdn.net/weixin_44477431/article/details/117260001

User-Agent Reduction deprecation trial （新的Chrome浏览器已经不显示小版本号了，并且高版本macOS的版本号也不准）
https://developer.chrome.com/blog/user-agent-reduction-deprecation-trial/

Nginx日志分析监控思路整理
https://ixyzero.com/blog/archives/5522.html

API日志分析经验记录
https://ixyzero.com/blog/archives/5451.html

=END=

2 11 月, 2023

Docker

Database, KnowledgeBase, Programing

GitLab, Hive, SQL, tips, UserAgent

《 “一些SQL代码片段整理” 》有 2 条评论

Hive SQL日期字符串规范化 – ASPIRE说道：

2023-11-04 17:23

[…] 一些SQL代码片段整理https://ixyzero.com/blog/archives/5538.html […]

回复
abc说道：

2023-12-19 19:45

编写 SQL 的好习惯
https://www.gairuo.com/p/good-habit-writing-sql
`
SQL 作为一种数据操作语言，良好的代码习惯能我们高效地完成数据分析工作。除了根据 SQL 执行顺序上的优化外，按以下规范编写 SQL 代码既能使代码更加清晰，方便自己后期维护，也方便其他同事接手。

查询
* 先理解业务需求，再分析需求，最后设计出要查询出的数据
* 先做 SQL 设计，再按设计编写实施
* 减少不必要的返回字段，用 select 代替 select *
* 最外层 SELECT 均有 AS
* 字段命名见名知义：week_id、order_qty、b_day
* 虚表命名见名知义，order_with_week 或 o_week , 保留 as
* 虚表内自用字段不带别名
* 做好代码逻辑缩进
* 关键字大小保持一致（因为大小写不敏感）
* SQL先 explain 查看执行计划，做SQL性能优化
* where/order by/group by 后的操作列以及多表关联的列优先考虑索引和组合索引
* 留意 where 后的字段数据类型的隐式转换
* WHERE 子句中尽量不对列进行函数转换和表达式计算
* 对于没有 Where 的，SQL语句拼接常加 where 1=1

操作
* delete或者update语句，加个limit，先操作部分，管控风险
* 设计创建表的时候，所有表及字段都加上相应的注释
* 修改或删除重要数据前，一定先备份！先备份！备份！份！
* 尽量把所有列的无值的默认值定义为 NOT NULL，少空间
* 修改删除前 WHERE 查询一下数据，看是不是要删除的内容
* MySQL 表必须使用Innodb存储引擎
* 字符集统一使用UTF8
`

回复

一些SQL代码片段整理

缘由：

正文：

参考解答：

参考链接：

《 “一些SQL代码片段整理” 》 有 2 条评论

发表回复 取消回复

《 “一些SQL代码片段整理” 》有 2 条评论

发表回复取消回复