Text Processing with awk/sed/grep


Text-processing problems come up all the time. I've written plenty of summaries before, but with knowledge it always pays to go over things again (so some repetition is inevitable). Here are a few recent small notes:

Search keywords:

awk/sed/grep ++combined with++ "cheat sheet" / cheatsheet / checklist / list

Previous notes:
Reference links:
More reference links:
A few quick tests:
$ grep "['software_version']" case.log
        ...
        $unshift['software_version'] = '4.0.1';
		...
$ grep "\['software_version'\]" case.log
        $unshift['software_version'] = '4.0.1';
$ grep "\['software_version'\]" case.log | awk '{print $3}'
'4.0.1';
$ grep "\['software_version'\]" case.log | awk '{print $3}' | tr -d "';"
4.0.1
$ grep "\['software_version'\]" case.log | awk '{print substr($3,1,length($3)-3)}'
'4.0.
$ grep "\['software_version'\]" case.log | awk '{print substr($3,0,length($3)-2)}'
'4.0.1
$ grep "\['software_version'\]" case.log | awk '{print substr($3,2,length($3)-2)}'
4.0.1'
$ grep "\['software_version'\]" case.log | awk '{print substr($3,2,length($3)-3)}'
4.0.1

# http://www.cnblogs.com/sunada2005/p/3493941.html
# http://blog.chinaunix.net/uid-10540984-id-325914.html
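
An alternative for the same extraction (a sketch against the same case.log line; the field-splitting approach is my own choice, not part of the test above): let awk split on the single-quote character and print the quoted value directly:

$ awk -F"'" '/software_version/ {print $4}' case.log
4.0.1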

==

Replacing newline characters ('\n') with sed
Search keywords:
  • linux bash convert '\n' to <br/>
Reference links:
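A minimal sketch (assuming GNU sed; input.txt is a placeholder file name): sed normally reads input line by line and strips the trailing '\n', so the whole file has to be pulled into the pattern space before the newlines can be replaced:

$ sed ':a;N;$!ba;s/\n/<br\/>/g' input.txt
# when the replacement is a single character, tr is simpler:
$ tr '\n' ' ' < input.txt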
Extracting specific lines from a file with sed

http://stackoverflow.com/questions/83329/how-can-i-extract-a-range-of-lines-from-a-text-file-on-unix
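
A quick sketch of the usual answer there (the line numbers and file name are placeholders): suppress sed's default output with -n and print only the wanted range:

$ sed -n '20,40p' file.txt
# roughly equivalent with head/tail:
$ head -n 40 file.txt | tail -n +20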

Skipping the first n lines

http://stackoverflow.com/questions/6869449/skipping-the-first-n-lines-when-using-regex-with-sed
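
A sketch along the lines of that question (line count and pattern are placeholders): either apply the substitution only from line 4 onward, or drop the first 3 lines outright:

$ sed '4,$ s/foo/bar/g' file.txt
$ sed '1,3d' file.txt
$ tail -n +4 file.txt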

Using sed's -i option

http://stackoverflow.com/questions/5171901/sed-command-find-and-replace-in-file-and-overwrite-file-doesnt-work-it-empties
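
The linked question is about the classic pitfall of redirecting sed's output back into its input file (the shell truncates the file before sed ever reads it); -i edits in place instead. A sketch with placeholder names:

$ sed 's/old/new/g' data.txt > data.txt     # WRONG: data.txt is truncated to empty first
$ sed -i 's/old/new/g' data.txt             # in-place edit (GNU sed)
$ sed -i.bak 's/old/new/g' data.txt         # same, keeping a backup as data.txt.bak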

Using sed's -r option
sed
-r, --regexp-extended
    use extended regular expressions in the script.

# astr="<mac address='52:54:00:a9:cc:20'/>"
# echo $astr
<mac address='52:54:00:a9:cc:20'/>
# echo $astr | sed "s/.'(.)'.*/\1/g"
sed: -e expression #1, char 15: invalid reference \1 on `s' command's RHS
# echo $astr | sed "s/.'(.)'.*/\\1/g"
sed: -e expression #1, char 15: invalid reference \1 on `s' command's RHS
#
# echo $astr | sed -r "s/.'(.)'.*/\1/g"
<mac address='52:54:00:a9:cc:20'/>
#
# echo $astr | sed -e "s/.'(.)'.*/\1/g"
sed: -e expression #1, char 15: invalid reference \1 on `s' command's RHS
#
# echo $astr | awk -F' '{print $2}'
> -bash: unexpected EOF while looking for matching `''
-bash: syntax error: unexpected end of file
#
# echo $astr | awk -F\' '{print $2}'
52:54:00:a9:cc:20
#
# echo $astr | sed -r "s/.'(.+)'.*/\1/g"
<mac address52:54:00:a9:cc:20
# echo $astr | sed -r "s/.+'(.+)'.*/\1/g"
52:54:00:a9:cc:20
# echo $astr | sed -r "s/.+'(.+)'.+/\1/g"
52:54:00:a9:cc:20
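
As a footnote to the session above (my addition, not part of the original test): without -r the same capture also works in BRE syntax, by escaping the group parentheses:

# echo $astr | sed "s/.*'\(.*\)'.*/\1/"
52:54:00:a9:cc:20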

==


8 comments on "Text Processing with awk/sed/grep"

  1. When matching with grep, print only the matched content (the -o option):
    `
    …var md5="e21a0ffc2876b34f98280914d98c9a88"…
    grep -oE 'var md5="\w+"' # print only the md5 string
    grep -o 'var md5="[[:alnum:]]*"'
    `
    When using regular expressions this way, the -E option is usually needed as well; grep's default BRE syntax doesn't feel very convenient.

  2. Searching for the pipe character with grep on Linux (how to escape the "pipe" character in a grep search)

    Search keywords:
    linux grep escape pipe

    Reference answer:
    $ grep -F "abc.sh|" info.log | grep -F "|2017-05-25|" | awk '{print $4}' | sort | uniq -c >~/2017-05-25.host

    `
    -F, --fixed-strings
    Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX.)
    `

    # In fact, grep without any option works the same here (| is not a BRE metacharacter)...
    $ grep "abc.sh|" info.log | grep "|2017-05-25|" | awk '{print $4}' | sort | uniq -c >~/2017-05-25.host2

    Reference links:
    https://stackoverflow.com/questions/23772231/how-to-escape-the-pipe-character-in-grep
    https://stackoverflow.com/questions/11856054/bash-easy-way-to-pass-a-raw-string-to-grep/11856117#11856117

    https://stackoverflow.com/questions/612658/how-can-i-grep-for-a-count-of-pipes # grep matches per line, not per character

  3. When using grep's regular-expression syntax via the `-e` option (BRE), a digit can be written as [0-9], but the braces in a repetition count like {2} have to be escaped:
    `
    grep -m 5 " sec-analyse01 " /var/logs/bash.log | grep --color -e "^Jun 30 [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\} sec-analyse01 "

    grep -e "^Jun 30 [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\} sec-analyse01 " /var/logs/bash.log >> bash_log.sec-analyse01.20170630
    `

  4. Using grep to parse syslog for VPNFilter IoCs
    http://www.4hou.com/technology/11801.html
    `
    Open a terminal, grab the relevant IoC IPs from https://blog.talosintelligence.com/2018/05/VPNFilter.html, paste them into a temporary file, and save it as:
    /tmp/vpnfilterc2.txt
    Its content is the IP addresses associated with VPNFilter's C2

    -r = recursive
    -i = ignore case
    -l = list file names instead of matching lines
    -f = read patterns from a file instead of taking a string

    $ grep -rilf /tmp/vpnfilterC2.txt syslog.log
    `

  5. Joining file contents with awk
    https://blog.yourtion.com/join-file-data-using-awk.html
    https://github.com/yourtion/BlogCodes/tree/master/awk_join
    `
    On a recent project I needed to export two CSV files from two different data sources and then do a SQL-like join on them. Since it was only a query script, with no way to change the program or the database, the idea was to export the csv files and merge their contents with the commands that ship with Linux. (A sample invocation follows below.)

    First, the core piece, join.awk, which implements the main awk logic:

    function read_file_into_array(file, array, status, record) {
        while (1) {
            status = getline record < file
            if (status == -1) {
                print "Failed to read file " file;
                exit 1;
            }
            if (status == 0) break;
            split(record, a, "\t");
            array[a[1]] = a[2];
        }
        close(file);
    }
    BEGIN {
        read_file_into_array(CHANNEL, File);
    }
    {
        if (NR == 1) {
            print "channel\tid\tpv\tuv\tsubmit"
            next
        }
        printf("%s\t%s\t%s\t%s\t%s\t\n", ($1 in File ? File[$1] : "unknown"), $1, $2, $3, $4)
    }
    `
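    The invocation isn't shown in the excerpt above; a minimal sketch of how such a script would typically be run (channel.tsv and data.tsv are placeholder file names of my own). CHANNEL has to be passed with -v so that it is already set when the BEGIN block runs:

    $ awk -v CHANNEL=channel.tsv -f join.awk data.tsv > joined.tsv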
