Text processing with awk/sed/grep


Text-processing problems come up all the time. I have summarized this before, but with knowledge it pays to go over things every so often (so some repetition is inevitable). Here are a few recent small notes:

Search keywords:

awk/sed/grep ++combined with++ "cheat sheet"/cheatsheet/checklist/list

Previous notes:
Reference links:
More reference links:
A few quick tests:
$ grep "['software_version']" case.log
        ...
        $unshift['software_version'] = '4.0.1';
		...
$ grep "\['software_version'\]" case.log
        $unshift['software_version'] = '4.0.1';
$ grep "\['software_version'\]" case.log | awk '{print $3}'
'4.0.1';
$ grep "\['software_version'\]" case.log | awk '{print $3}' | tr -d "';"
4.0.1
$ grep "\['software_version'\]" case.log | awk '{print substr($3,1,length($3)-3)}'
'4.0.
$ grep "\['software_version'\]" case.log | awk '{print substr($3,0,length($3)-2)}'
'4.0.1
$ grep "\['software_version'\]" case.log | awk '{print substr($3,2,length($3)-2)}'
4.0.1'
$ grep "\['software_version'\]" case.log | awk '{print substr($3,2,length($3)-3)}'
4.0.1

# http://www.cnblogs.com/sunada2005/p/3493941.html
# http://blog.chinaunix.net/uid-10540984-id-325914.html
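
For comparison (not in the original notes), the same value can be pulled out without counting characters, assuming the line looks like the one shown above:

$ awk -F"'" '/software_version/ {print $4}' case.log
4.0.1
$ grep "software_version" case.log | grep -oE "[0-9]+(\.[0-9]+)+"
4.0.1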

==

Replacing newline characters ('\n') with sed
Search keywords:
  • linux bash convert '\n' to <br/>
Reference links:
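
A quick reminder of the usual answers here: sed reads its input line by line with the trailing '\n' already stripped, so a plain s/\n/.../ never matches anything; with GNU sed the trick is to slurp the whole input into the pattern space first, or to sidestep sed entirely:

$ printf 'a\nb\nc\n' | sed ':a;N;$!ba;s/\n/<br\/>/g'
a<br/>b<br/>c
$ printf 'a\nb\nc\n' | paste -sd',' -
a,b,c
$ printf 'a\nb\nc\n' | tr '\n' ','      # tr can only substitute single characters; note the trailing comma
a,b,c,
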
Extracting a specified range of lines from a file with sed

http://stackoverflow.com/questions/83329/how-can-i-extract-a-range-of-lines-from-a-text-file-on-unix
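
A sketch of what that thread boils down to (the line numbers are only for illustration):

$ sed -n '20,30p' file.txt         # print lines 20 through 30
$ sed -n '20,30p;31q' file.txt     # same, but quit right after the range (faster on large files)
$ awk 'NR>=20 && NR<=30' file.txt  # awk equivalent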

Skipping the first n lines

http://stackoverflow.com/questions/6869449/skipping-the-first-n-lines-when-using-regex-with-sed
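
The gist of that answer: restrict the s/// command to an address range so the first n lines pass through untouched (n=3 here, purely for illustration):

$ sed '4,$ s/foo/bar/' file      # substitute from line 4 onwards
$ sed '1,3!s/foo/bar/' file      # same thing, by negating the range 1-3
$ tail -n +4 file                # or simply drop the first 3 lines altogether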

Using sed's -i option

http://stackoverflow.com/questions/5171901/sed-command-find-and-replace-in-file-and-overwrite-file-doesnt-work-it-empties
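
The gotcha behind that question: redirecting sed's output back onto its own input file truncates the file before sed ever reads it, which is why the file ends up empty. Editing in place with -i is the fix:

$ sed 's/foo/bar/g' file > file     # WRONG: the redirection empties "file" first
$ sed -i 's/foo/bar/g' file         # GNU sed: edit the file in place
$ sed -i.bak 's/foo/bar/g' file     # same, but keep a "file.bak" backup (BSD/macOS sed requires a suffix argument for -i)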

Using sed's -r option
sed
-r, --regexp-extended
    use extended regular expressions in the script.

# astr="<mac address='52:54:00:a9:cc:20'/>"
# echo $astr
<mac address='52:54:00:a9:cc:20'/>
# echo $astr | sed "s/.'(.)'.*/\1/g"
sed: -e expression #1, char 15: invalid reference \1 on `s' command's RHS
# echo $astr | sed "s/.'(.)'.*/\\1/g"
sed: -e expression #1, char 15: invalid reference \1 on `s' command's RHS
#
# echo $astr | sed -r "s/.'(.)'.*/\1/g"
<mac address='52:54:00:a9:cc:20'/>
#
# echo $astr | sed -e "s/.'(.)'.*/\1/g"
sed: -e expression #1, char 15: invalid reference \1 on `s' command's RHS
#
# echo $astr | awk -F' '{print $2}'
> -bash: unexpected EOF while looking for matching `''
-bash: syntax error: unexpected end of file
#
# echo $astr | awk -F\' '{print $2}'
52:54:00:a9:cc:20
#
# echo $astr | sed -r "s/.'(.+)'.*/\1/g"
<mac address52:54:00:a9:cc:20
# echo $astr | sed -r "s/.+'(.+)'.*/\1/g"
52:54:00:a9:cc:20
# echo $astr | sed -r "s/.+'(.+)'.+/\1/g"
52:54:00:a9:cc:20
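
Not part of the transcript above, but two more ways to pull out the quoted value, for comparison:

# echo $astr | grep -oE "([0-9a-f]{2}:){5}[0-9a-f]{2}"
52:54:00:a9:cc:20
# echo $astr | cut -d\' -f2      # cut works fine when the delimiter is a single character
52:54:00:a9:cc:20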

==


9 comments on "Text processing with awk/sed/grep"

  1. When matching with grep, print only the matched content (the -o option):
    `
    ...var md5="e21a0ffc2876b34f98280914d98c9a88"...
    grep -oE 'var md5="\w+"'            # print only the md5 string
    grep -o 'var md5="[[:alnum:]]*"'
    `
    When using regular expressions, the -E option is usually needed as well; grep's default BRE syntax feels clumsy to work with.
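
    A self-contained rerun of the idea above, with a made-up sample line:
    `
    $ echo 'foo var md5="e21a0ffc2876b34f98280914d98c9a88" bar' | grep -oE 'md5="[0-9a-f]{32}"'
    md5="e21a0ffc2876b34f98280914d98c9a88"
    $ echo 'foo var md5="e21a0ffc2876b34f98280914d98c9a88" bar' | grep -oE '[0-9a-f]{32}'
    e21a0ffc2876b34f98280914d98c9a88
    `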

  2. Escaping the pipe character in a grep search on Linux (how do you match a literal "|" with grep?)

    Search keywords:
    linux grep escape pipe

    Reference answer:
    $ grep -F "abc.sh|" info.log | grep -F "|2017-05-25|" | awk '{print $4}' | sort | uniq -c >~/2017-05-25.host

    `
    -F, --fixed-strings
    Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX.)
    `

    # Actually grep works just as well here without any option at all, since "|" is not special in basic regular expressions...
    $ grep "abc.sh|" info.log | grep "|2017-05-25|" | awk '{print $4}' | sort | uniq -c >~/2017-05-25.host2

    Reference links:
    https://stackoverflow.com/questions/23772231/how-to-escape-the-pipe-character-in-grep
    https://stackoverflow.com/questions/11856054/bash-easy-way-to-pass-a-raw-string-to-grep/11856117#11856117

    https://stackoverflow.com/questions/612658/how-can-i-grep-for-a-count-of-pipes # grep matches whole lines, not individual characters
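
    A minimal demo of the three behaviors (sample input made up for illustration):
    `
    $ printf 'a|b\naXb\n' | grep 'a|b'       # BRE: "|" is an ordinary literal character
    a|b
    $ printf 'a|b\naXb\n' | grep -E 'a|b'    # ERE: "|" is alternation, so this matches lines containing "a" or "b"
    a|b
    aXb
    $ printf 'a|b\naXb\n' | grep -F 'a|b'    # -F: always a fixed string, no regex at all
    a|b
    `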

  3. With grep's `-e` option (BRE syntax), a digit can be written as [0-9], but the repetition count {2} needs its braces escaped:
    `
    grep -m 5 " sec-analyse01 " /var/logs/bash.log | grep --color -e "^Jun 30 [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\} sec-analyse01 "

    grep -e "^Jun 30 [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\} sec-analyse01 " /var/logs/bash.log >> bash_log.sec-analyse01.20170630
    `
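
    With -E (ERE) the braces don't need escaping; the same pattern can be written as:
    `
    grep -E "^Jun 30 [0-9]{2}:[0-9]{2}:[0-9]{2} sec-analyse01 " /var/logs/bash.log
    `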

  4. Using grep to check syslog for VPNFilter IoCs
    http://www.4hou.com/technology/11801.html
    `
    Open a terminal, grab the relevant IoC IPs from https://blog.talosintelligence.com/2018/05/VPNFilter.html, paste them into a temporary file, and save it as:
    /tmp/vpnfilterc2.txt
    Its contents are the IP addresses associated with VPNFilter's C2.

    -r = recursive
    -i = ignore case
    -l = list file names rather than matching lines
    -f = take the patterns from a file rather than a string

    $ grep -rilf /tmp/vpnfilterc2.txt syslog.log
    `
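
    The same recipe works for any indicator list; a generic sketch (paths made up). Note that dotted IPs are regular expressions as far as grep is concerned (the dots match any character), so adding -F is safer when exact matching is wanted:
    `
    printf '1.2.3.4\n5.6.7.8\n' > /tmp/ioc.txt          # one indicator per line
    grep -rilFf /tmp/ioc.txt /var/log/                  # which files mention any of them
    grep -rhFf /tmp/ioc.txt /var/log/ | sort | uniq -c  # count the matching lines
    `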

  5. Joining file contents with AWK
    https://blog.yourtion.com/join-file-data-using-awk.html
    https://github.com/yourtion/BlogCodes/tree/master/awk_join
    `
    I was recently working on a project that required exporting two CSV files from two different data sources and then doing a SQL-style join on them. Since it was only a query script and there was no way to modify the application or the database, the idea was to export the CSV files and merge their contents with the commands Linux already ships with.

    The core piece is join.awk, which implements the main awk logic:

    function read_file_into_array(file, array, status, record) {
        while (1) {
            status = getline record < file
            if (status == -1) {
                print "Failed to read file " file;
                exit 1;
            }
            if (status == 0) break;
            split(record, a, "\t");
            array[a[1]] = a[2];
        }
        close(file);
    }
    BEGIN {
        read_file_into_array(CHANNEL, File);
    }
    {
        if (NR == 1) {
            print "channel\tid\tpv\tuv\tsubmit"
            next
        }
        { printf("%s\t%s\t%s\t%s\t%s\t\n", ($1 in File ? File[$1] : "未知"), $1, $2, $3, $4) }
    }
    `
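
    The comment doesn't show how the script is invoked. Assuming the mapping file holds tab-separated "<id>\t<channel name>" lines and the data file has a header row followed by "<id> <pv> <uv> <submit>" rows, the call would look roughly like this (file names are made up):
    `
    awk -v CHANNEL=channel.tsv -f join.awk data.tsv > joined.tsv
    `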

  6. How do you quickly strip leading and trailing whitespace from each line?
    https://unix.stackexchange.com/questions/102008/how-do-i-trim-leading-and-trailing-whitespace-from-each-line-of-some-output
    `
    awk '{$1=$1;print}'
    awk '{$1=$1};1'

    $ sed 's/^[ \t]*//;s/[ \t]*$//' < file

    create a script /usr/local/bin/trim:

    #!/bin/bash
    awk '{$1=$1};1'

    and give that file executable rights:

    chmod +x /usr/local/bin/trim

    Now you can pass every output to trim for example:

    cat file | trim
    `
