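I often run into text-processing tasks and have written up plenty of notes on them before. Still, it pays to go over this material every so often, which is also why some of it repeats. Here are some small tips I collected recently:
Search keywords:
awk/sed/grep + "cheat sheet"/cheatsheet/清单 (checklist)/列表 (list)
Previous notes: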
- http://ixyzero.com/blog/awk_sed.txt
- http://ixyzero.com/blog/sed1line.txt
- http://ixyzero.com/blog/regex.html
References:
- http://sparky.rice.edu/~hartigan/awk.html
- http://www.ibm.com/developerworks/linux/library/l-awk1/index.html
- http://www.ibm.com/developerworks/linux/library/l-awk2/index.html
- http://www.ibm.com/developerworks/linux/library/l-awk3/index.html
- A detailed introduction to awk array operations on Linux – 程默
- https://github.com/txsniper/Note/blob/master/awk.sh
- http://bl831.als.lbl.gov/~gmeigs/scripting_help/awk_cheat_sheet.pdf
- =
- http://www.grymoire.com/unix/SedChart.pdf
- sed: a very powerful text-manipulation command
- The sed command explained in detail
- =
- grep: a powerful file search tool
- Using grep and awk to compute statistics from query logs
More references:
- Linux Cheat Sheets (awk, ed, sed, bash, screen, perl, and more)
- =
- 10 habits of UNIX experts
- UNIX tips: another 10 habits of UNIX experts
- Hone your skills at building regular expression patterns
- Cultured Perl: One-liners 101
- Cultured Perl: One-liners 102
- =
- Using AWK's numeric computation features to work more efficiently
- Getting started with GAWK: AWK language fundamentals
- =
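- Common knowledge for shell script programming
- [Shell study notes] Shell regular expressions and how grep, sed, and awk differ
- A list of shell regular expressions
- [Shell study notes] Using grep to search for text in files
- Using the xargs command – Linux personal notes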
Some small tests:
$ grep "['software_version']" case.log
...
$unshift['software_version'] = '4.0.1';
...
# Unescaped, ['software_version'] is a bracket expression that matches any single one of those characters, so nearly every line matches; escape the brackets instead:
$ grep "\['software_version'\]" case.log
$unshift['software_version'] = '4.0.1';
$ grep "\['software_version'\]" case.log | awk '{print $3}'
'4.0.1';
$ grep "\['software_version'\]" case.log | awk '{print $3}' | tr -d "';"
4.0.1
$ grep "\['software_version'\]" case.log | awk '{print substr($3,1,length($3)-3)}'
'4.0.
$ grep "\['software_version'\]" case.log | awk '{print substr($3,0,length($3)-2)}'
'4.0.1
$ grep "\['software_version'\]" case.log | awk '{print substr($3,2,length($3)-2)}'
4.0.1'
$ grep "\['software_version'\]" case.log | awk '{print substr($3,2,length($3)-3)}'
4.0.1
# Note: awk string positions start at 1; what substr() does with a start of 0 differs between awk implementations, so avoid relying on it.
# http://www.cnblogs.com/sunada2005/p/3493941.html
# http://blog.chinaunix.net/uid-10540984-id-325914.html
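The same value can be pulled out without any substr() arithmetic. This is a minimal sketch, assuming the line has exactly the shape shown above. `grep -F` takes the bracketed key as a fixed string, and awk then splits on single quotes, which places the value in field 4. The function name `get_version` and the sample input are hypothetical.

```shell
# get_version: print the value of $unshift['software_version'] read from stdin.
# grep -F treats "['software_version']" as a fixed string, so [ and ] need no escaping.
# awk -F"'" splits on single quotes: $unshift[ | software_version | ] =  | 4.0.1 | ;
get_version() {
    grep -F "['software_version']" | awk -F"'" '{print $4}'
}

# Hypothetical input modeled on the case.log line above
printf '%s\n' "\$unshift['software_version'] = '4.0.1';" "\$unshift['other'] = 'x';" | get_version
# prints: 4.0.1
```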
==
Replacing newline characters ('\n') with sed
Search keywords:
- linux bash convert '\n' to <br/>
References:
- http://stackoverflow.com/questions/1251999/how-can-i-replace-a-newline-n-using-sed
- http://stackoverflow.com/questions/3498799/unix-commandline-for-inline-replacement-of-all-newlines-in-file-with-br-n
- http://unix.stackexchange.com/questions/26788/using-sed-to-convert-newlines-into-spaces
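sed works on one line at a time, and the trailing newline is not part of the pattern space. To replace newlines, the lines must first be collected into the pattern space. The sketch below uses the `:a;N;$!ba` loop from the first Stack Overflow answer (GNU sed syntax; BSD sed does not accept the `;` after a label). It also shows the second question's variant, which keeps the line breaks and appends `<br/>` to each line. The function names are hypothetical.

```shell
# nl_to_br: join all lines with <br/>.
# :a defines a label, N appends the next line, $!ba loops back until the last line;
# then every embedded \n is replaced in one go. (GNU sed syntax)
nl_to_br() {
    sed ':a;N;$!ba;s/\n/<br\/>/g'
}

# append_br: keep the line breaks but add <br/> at the end of every line.
append_br() {
    sed 's/$/<br\/>/'
}

printf 'a\nb\nc\n' | nl_to_br     # a<br/>b<br/>c
printf 'a\nb\n' | append_br       # a<br/> and b<br/> on separate lines
```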
Extracting specific lines from a file with sed
http://stackoverflow.com/questions/83329/how-can-i-extract-a-range-of-lines-from-a-text-file-on-unix
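A range of lines can be printed by combining `sed -n` with an address range. This is a small sketch with hypothetical function names. The `q` command stops sed once the last wanted line has been printed, so a large file is not read to the end. The awk variant does the same with `NR` and `exit`.

```shell
# extract_lines FIRST LAST: print lines FIRST..LAST of stdin.
# -n suppresses auto-printing; "FIRST,LASTp" prints the range; "LASTq" quits right after it.
extract_lines() {
    sed -n "$1,$2p;$2q"
}

# Same with awk: stop reading once past LAST, print while NR >= FIRST.
extract_lines_awk() {
    awk -v first="$1" -v last="$2" 'NR > last { exit } NR >= first'
}

seq 10 | extract_lines 3 5        # 3, 4, 5 (one per line)
seq 10 | extract_lines_awk 3 5    # same output
```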
Skipping the first n lines
http://stackoverflow.com/questions/6869449/skipping-the-first-n-lines-when-using-regex-with-sed
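There are two related tasks: dropping the first N lines entirely, or keeping them but applying an edit only to the lines after them. This is a sketch with a hypothetical helper name. `tail -n +K` starts output at line K. In sed, the address `4,$` limits a substitution to line 4 through the last line.

```shell
# skip_lines N: drop the first N lines of stdin (tail -n +K starts at line K).
skip_lines() {
    tail -n "+$(($1 + 1))"
}

seq 5 | skip_lines 2              # 3, 4, 5
# Keep lines 1-3 unchanged, substitute only from line 4 to the end
seq 5 | sed '4,$s/^/x/'           # 1, 2, 3, x4, x5
```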
Using sed's -i option
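`-i` makes sed write its result back to the file instead of to stdout. With GNU sed, a suffix attached directly to the option (`-i.bak`) keeps a backup copy of the original. This is a minimal sketch on a temporary file. The function name and the substitution are hypothetical.

```shell
# replace_in_place FILE: edit FILE in place, keeping the original as FILE.bak
# (GNU sed: the backup suffix must be attached to -i, with no space)
replace_in_place() {
    sed -i.bak 's/world/sed/' "$1"
}

tmp=$(mktemp)
printf 'hello world\n' > "$tmp"
replace_in_place "$tmp"
cat "$tmp"          # hello sed
cat "$tmp.bak"      # hello world
rm -f "$tmp" "$tmp.bak"
```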
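Using sed's -r option
sed -r, --regexp-extended: use extended regular expressions in the script.
# astr="<mac address='52:54:00:a9:cc:20'/>"
# echo $astr
<mac address='52:54:00:a9:cc:20'/>
# echo $astr | sed "s/.'(.)'.*/\1/g"
sed: -e expression #1, char 15: invalid reference \1 on `s' command's RHS
# echo $astr | sed "s/.'(.)'.*/\\1/g"
sed: -e expression #1, char 15: invalid reference \1 on `s' command's RHS
#
# echo $astr | sed -r "s/.'(.)'.*/\1/g"
<mac address='52:54:00:a9:cc:20'/>
#
# echo $astr | sed -e "s/.'(.)'.*/\1/g"
sed: -e expression #1, char 15: invalid reference \1 on `s' command's RHS
#
# echo $astr | awk -F' '{print $2}'
>
-bash: unexpected EOF while looking for matching `''
-bash: syntax error: unexpected end of file
#
# echo $astr | awk -F\' '{print $2}'
52:54:00:a9:cc:20
#
# echo $astr | sed -r "s/.'(.+)'.*/\1/g"
<mac address52:54:00:a9:cc:20
# echo $astr | sed -r "s/.+'(.+)'.*/\1/g"
52:54:00:a9:cc:20
# echo $astr | sed -r "s/.+'(.+)'.+/\1/g"
52:54:00:a9:cc:20
Without -r, sed uses basic regular expressions, in which ( and ) are literal characters. \1 therefore refers to a group that does not exist. Writing \\1 inside double quotes does not help, because the shell turns it back into \1, and -e only introduces the script without changing the syntax. With -r, (.) is a group, but it matches a single character, so that command finds no match and prints the line unchanged. (.+) fixes this, and a leading .+ anchors the match at the start of the line so that "<mac address=" is removed too. For awk, a single quote used as the field separator must be escaped in the shell (-F\').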
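For comparison, here is the same extraction written both ways. Basic regular expressions need `\(` `\)` for a group. Extended ones (`-E`, which GNU sed also accepts as `-r`) use plain parentheses. The function names are hypothetical. The input is the `astr` value from the transcript above.

```shell
# BRE (no option): a group must be written \( \)
mac_bre() { sed "s/.*'\(.*\)'.*/\1/"; }
# ERE (-E, or -r in GNU sed): plain ( ) form a group
mac_ere() { sed -E "s/.*'(.*)'.*/\1/"; }

astr="<mac address='52:54:00:a9:cc:20'/>"
printf '%s\n' "$astr" | mac_bre    # 52:54:00:a9:cc:20
printf '%s\n' "$astr" | mac_ere    # 52:54:00:a9:cc:20
```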
==
9 comments on "Text processing with awk/sed/grep"
Print only the matched part of each line with grep (the -o option):
`
…var md5="e21a0ffc2876b34f98280914d98c9a88"…
grep -oE 'var md5="\w+"'    # print only the md5 part
grep -o 'var md5="[[:alnum:]]*"'
`
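To get the bare hash rather than the whole `var md5="..."` match, the `-o` output can be cut at the double quotes. This is a sketch. The function name is hypothetical, and the sample line reuses only the fragment quoted above.

```shell
# extract_md5: print just the hash from lines containing var md5="...".
# grep -o outputs only the matched text; cut keeps the 2nd field between double quotes.
extract_md5() {
    grep -oE 'var md5="[[:alnum:]]+"' | cut -d'"' -f2
}

printf '%s\n' 'var md5="e21a0ffc2876b34f98280914d98c9a88";' | extract_md5
# prints: e21a0ffc2876b34f98280914d98c9a88
```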
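When writing regular expressions I usually add the -E option, because grep's default basic regular expressions (BRE) are awkward to use.
Searching for the pipe character with grep on Linux (how to escape a pipe in a grep pattern)
Search keywords:
linux grep escape pipe
Answer: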
$ grep -F "abc.sh|" info.log | grep -F "|2017-05-25|" | awk '{print $4}' | sort | uniq -c >~/2017-05-25.host
`
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX.)
`
# Plain grep without any options works here too: in the default BRE syntax, | is an ordinary character (though the . in abc.sh then matches any character).
$ grep "abc.sh|" info.log | grep "|2017-05-25|" | awk '{print $4}' | sort | uniq -c >~/2017-05-25.host2
References:
https://stackoverflow.com/questions/23772231/how-to-escape-the-pipe-character-in-grep
https://stackoverflow.com/questions/11856054/bash-easy-way-to-pass-a-raw-string-to-grep/11856117#11856117
https://stackoverflow.com/questions/612658/how-can-i-grep-for-a-count-of-pipes  # grep matches per line, not per character, so grep -c counts matching lines rather than pipes
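The escaping rules and the line-versus-character point can be seen side by side in a short sketch. The log line is hypothetical. In BRE, `|` is literal. In ERE it means alternation, so it must be escaped. To count pipe characters instead of lines, `grep -o` prints one match per line for `wc -l` to count.

```shell
line='abc.sh|host1|2017-05-25|x'    # hypothetical log line

# BRE (default): | is an ordinary character
printf '%s\n' "$line" | grep -c 'abc\.sh|'       # 1
# ERE (-E): | is alternation, so escape it to match a literal pipe
printf '%s\n' "$line" | grep -cE 'abc\.sh\|'     # 1

# count_pipes: count pipe characters (not lines) on stdin
count_pipes() { grep -o '|' | wc -l | tr -d ' '; }
printf '%s\n' "$line" | count_pipes              # 3
```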
bingrep – a grep tool specialized for searching binary files
https://github.com/m4b/bingrep
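With grep's default basic regular expression syntax (`-e` only introduces the pattern and does not change the syntax), a digit can be written as [0-9], but the braces of a repetition count such as {2} must be escaped: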
`
grep -m 5 " sec-analyse01 " /var/logs/bash.log | grep --color -e "^Jun 30 [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\} sec-analyse01 "
grep -e "^Jun 30 [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\} sec-analyse01 " /var/logs/bash.log >> bash_log.sec-analyse01.20170630
`
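With `-E` (extended syntax) the braces need no backslashes, so the same pattern can be written either way. This is a sketch on a hypothetical log line in the format of the commands above. The function names are made up for illustration.

```shell
# count_bre / count_ere: count lines matching the timestamp + host pattern
count_bre() { grep -c '^Jun 30 [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\} sec-analyse01 '; }
count_ere() { grep -cE '^Jun 30 [0-9]{2}:[0-9]{2}:[0-9]{2} sec-analyse01 '; }

sample='Jun 30 09:15:42 sec-analyse01 bash: ls'   # hypothetical log line
printf '%s\n' "$sample" | count_bre    # 1
printf '%s\n' "$sample" | count_ere    # 1
```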
Using grep to check syslog for VPNFilter IoCs
http://www.4hou.com/technology/11801.html
`
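Open a terminal, take the IoC IP addresses from https://blog.talosintelligence.com/2018/05/VPNFilter.html, paste them into a temporary file, and save it as:
/tmp/vpnfilterc2.txt
The file contains the IP addresses associated with VPNFilter's C2 servers.
-r = recursive
-i = ignore case
-l = list file names instead of matching lines
-f = read the patterns from a file instead of a string
$ grep -rilf /tmp/vpnfilterc2.txt syslog.log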
`
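Each line of the IoC file is read as a regular expression, so every `.` matches any character, and `192.0.2.1` would also match `192.0.2.10`. A stricter variant adds `-F` (fixed strings) and `-w` (whole-word matches). This is a sketch: the helper name is hypothetical, and the demo uses made-up documentation addresses, not the real VPNFilter IoCs.

```shell
# match_iocs IOC_FILE LOG_FILE: print log lines containing any IP listed in IOC_FILE.
# -F: fixed strings, so "." is a literal dot
# -w: whole words only, so 192.0.2.1 does not also match 192.0.2.10
# -f: read the patterns (one per line) from IOC_FILE
match_iocs() {
    grep -Fwf "$1" "$2"
}

d=$(mktemp -d)
printf '192.0.2.1\n' > "$d/iocs.txt"
printf 'conn from 192.0.2.10\nconn from 192.0.2.1 port 80\n' > "$d/syslog.log"
grep -f "$d/iocs.txt" "$d/syslog.log"      # regex match: prints both lines
match_iocs "$d/iocs.txt" "$d/syslog.log"   # prints only: conn from 192.0.2.1 port 80
rm -r "$d"
```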
How to print a range of IP addresses with Linux seq command
https://unix.stackexchange.com/questions/169098/how-to-print-a-range-of-ip-addresses-with-linux-seq-command
`
$ seq -f "10.20.30.%g" 40 50
$ seq 2 23 | sed 's/^/10.0.0./'
$ echo 10.0.0.{2..23} | tr ' ' '\n'
$ for i in $(seq 2 23); do echo "10.0.0.$i"; done
`
Joining file contents with AWK
https://blog.yourtion.com/join-file-data-using-awk.html
https://github.com/yourtion/BlogCodes/tree/master/awk_join
`
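I recently worked on a project that needed two CSV files exported from two different data sources, followed by a SQL-like join on the exported files. It was only a query script and I could not change the application or the database, so I decided to export the CSV files and merge them with the commands that ship with Linux.
First comes the core part, join.awk, which implements the main awk logic: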
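# join.awk: look up each data row's channel name in a tab-separated lookup file.
# The lookup file name comes from the CHANNEL variable:
#   awk -v CHANNEL=<channel file> -f join.awk <data file>
function read_file_into_array(file, array,    status, record, a) {
    # status, record and a are extra parameters used as local variables
    while (1) {
        status = getline record < file
        if (status == -1) {                 # file missing or unreadable
            print "Failed to read file " file > "/dev/stderr"
            exit 1
        }
        if (status == 0) break              # end of file
        split(record, a, "\t")
        array[a[1]] = a[2]                  # key: 1st column, value: 2nd column
    }
    close(file)
}
BEGIN {
    read_file_into_array(CHANNEL, File)
}
NR == 1 {
    # Replace the data file's header line with the joined header
    print "channel\tid\tpv\tuv\tsubmit"
    next
}
{
    # Ids missing from the lookup file get "未知" ("unknown")
    printf("%s\t%s\t%s\t%s\t%s\n", (($1 in File) ? File[$1] : "未知"), $1, $2, $3, $4)
}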
`
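A common alternative to `getline` is the two-file `NR == FNR` idiom. While awk reads the first file, NR equals FNR, so that file fills the lookup table. The second file is then joined against it. This is a sketch under assumptions: both files are tab-separated with the same column layout as above, the lookup file is not empty (otherwise NR == FNR would also hold for the data file), and the function name and sample data are hypothetical.

```shell
# awk_join CHANNEL_FILE DATA_FILE: prefix each data row with its channel name.
awk_join() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { name[$1] = $2; next }     # 1st file: build the id -> channel table
        FNR == 1  { print "channel", "id", "pv", "uv", "submit"; next }  # data header
        { ch = ($1 in name) ? name[$1] : "unknown"; print ch, $1, $2, $3, $4 }
    ' "$1" "$2"
}

d=$(mktemp -d)
printf 'c1\tSearch\n' > "$d/channel.tsv"
printf 'id\tpv\tuv\tsubmit\nc1\t10\t5\t1\nc9\t3\t2\t0\n' > "$d/data.tsv"
awk_join "$d/channel.tsv" "$d/data.tsv"
rm -r "$d"
```

Unlike join.awk, this version splits the data file on tabs as well, so fields may contain spaces.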
A simple, handy command-line power tool: grep
https://quickapp.lovejade.cn/simple-and-easy-to-use-command-line-tools-grep/
grab – simple, but very fast grep
https://github.com/stealth/grab
How do I quickly strip leading and trailing whitespace from a line?
https://unix.stackexchange.com/questions/102008/how-do-i-trim-leading-and-trailing-whitespace-from-each-line-of-some-output
`
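awk '{$1=$1;print}'
awk '{$1=$1};1'
# Assigning $1 rebuilds the line with single spaces, so runs of blanks inside the line are squeezed as well.
$ sed 's/^[ \t]*//;s/[ \t]*$//' < file
# \t inside brackets is a GNU sed extension; [[:blank:]] is the portable form.
Create a script /usr/local/bin/trim:
#!/bin/bash
awk '{$1=$1};1'
and make it executable:
chmod +x /usr/local/bin/trim
Now you can pipe any output through trim, for example:
cat file | trim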
`
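The awk and sed versions differ in one respect: awk also collapses inner whitespace, while sed touches only the two ends. This is a small sketch with hypothetical function names that makes the difference visible.

```shell
# trim_awk: strip both ends AND squeeze inner runs of blanks to one space
trim_awk() { awk '{$1=$1};1'; }
# trim_sed: strip both ends only, inner spacing is kept (POSIX [[:blank:]] = space or tab)
trim_sed() { sed 's/^[[:blank:]]*//;s/[[:blank:]]*$//'; }

printf '  a   b  \n' | trim_awk    # "a b"
printf '  a   b  \n' | trim_sed    # "a   b"
```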