Two Ways to Extract Strings with Python


1. Extracting the text between two markers (multi-line)

Given the following text:

fdsjhgjhg
fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
dsfjkhk

I need to use Python to get the content between "Start" and "End" and write it to a result file.

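Solution 1:
with open('/path/to/input') as infile, open('/path/to/output', 'w') as outfile:
    copy = False  # True while we are between the two markers
    for line in infile:
        if line.strip() == "Start":
            copy = True       # start copying from the next line
        elif line.strip() == "End":
            copy = False      # stop copying at the end marker
        elif copy:
            outfile.write(line)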
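
To try the flag-based approach of Solution 1 without creating any files, the same logic can be wrapped in a function that accepts any iterable of lines. This is only a minimal sketch: the function name `lines_between` and its `start`/`end` parameters are made up for illustration and do not come from the original answer.

```python
def lines_between(lines, start="Start", end="End"):
    """Yield the lines found strictly between the start and end marker lines."""
    copy = False
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            copy = True       # begin copying after the start marker
        elif stripped == end:
            copy = False      # stop copying at the end marker
        elif copy:
            yield line


sample = """fdsjhgjhg
fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
dsfjkhk
"""

# keepends=True preserves the newlines, as iterating over a file would.
result = list(lines_between(sample.splitlines(keepends=True)))
print("".join(result), end="")
```

Because it is a generator, the function can also be fed an open file object directly, and the result passed to `outfile.writelines(...)`.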
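Solution 2:
import re

with open('input.txt') as myfile:
    content = myfile.read()

# Capture only the text between the markers; re.DOTALL lets "." match newlines.
match = re.search(r'Start\n(.*?)End', content, re.DOTALL)
text = match.group(1) if match else ''

with open("output.txt", "w") as myfile2:
    myfile2.write(text)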
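
If the input may contain more than one Start/End block, the regular-expression approach can be extended with `findall`. The following is a sketch under assumptions: the sample string and the variable names are invented for illustration, and the pattern requires each marker to occupy a whole line.

```python
import re

# Hypothetical input containing two marked blocks.
content = "noise\nStart\nGood Morning\nHello World\nEnd\nnoise\nStart\nBye\nEnd\n"

# With re.MULTILINE, ^ and $ match at line boundaries, so "Start" and "End"
# must each be a complete line. With re.DOTALL, (.*?) lazily captures
# everything, including newlines, up to the nearest "End" line.
pattern = re.compile(r"^Start\n(.*?)^End$", re.DOTALL | re.MULTILINE)

blocks = pattern.findall(content)
for block in blocks:
    print(block, end="")
```

Anchoring the markers to line boundaries keeps a line such as "Ending soon" inside a block from being mistaken for the end marker.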
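Solution 3:
import itertools

with open('input.txt', 'r') as f, open('output.txt', 'w') as fout:
    while True:
        # Skip lines up to the next "Start" marker.
        it = itertools.dropwhile(lambda line: line.strip() != 'Start', f)
        # Consume the marker itself; stop when the file is exhausted.
        if next(it, None) is None:
            break
        # Copy lines until "End" (takewhile also consumes the "End" line).
        fout.writelines(itertools.takewhile(lambda line: line.strip() != 'End', it))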
Reference:

http://stackoverflow.com/questions/18865058/extract-values-between-two-strings-in-a-text-file-using-python

2. Extracting the content between two strings (single line)
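Solution (string slicing):
def getBetween(text, str1, str2):
    """Return the content between str1 and str2 in text ('' if either is missing)."""
    start = text.find(str1)
    if start == -1:
        return ''
    start += len(str1)
    # Search for str2 only after str1, so an earlier occurrence is ignored.
    end = text.find(str2, start)
    return text[start:end] if end != -1 else ''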
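
As an alternative to computing slice indices by hand, the same single-line extraction can be sketched with `str.partition`, which splits a string at the first occurrence of a separator. The function name `get_between_partition` is hypothetical, and returning `None` when a marker is missing is a design choice made for this example, not something the original code does.

```python
def get_between_partition(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    # partition returns (before, separator, after); separator is '' if not found.
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None
    middle, found_end, _ = rest.partition(end)
    if not found_end:
        return None
    return middle


print(get_between_partition("key=[value] other=[x]", "[", "]"))
```

Because the end marker is only searched for in the part of the string after the start marker, an end marker that appears earlier in the string does not affect the result.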
Reference:

https://github.com/bfishadow/SBB

3. Other implementations
sed -n '/Start/,/End/p' input.txt | grep -Ev '(Start|End)'

sed -e '1,/Start/d' -e '/End/,$d' input.txt

awk '/Start/,/End/' input.txt | grep -Ev '(Start|End)'

awk '/Start/{flag=1;next} /End/{flag=0} flag{ print }' input.txt

awk '/End/{flag=0} flag; /Start/{flag=1}' input.txt

perl -lne 'print if((/Start/../End/) && !(/Start/||/End/))' input.txt
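
For comparison with the Python solutions in section 1, the awk flag idiom `/Start/{flag=1;next} /End/{flag=0} flag{ print }` can be approximated in Python. This is a rough sketch with an invented function name, `awk_style_between`. Note that, like the awk patterns, it matches the markers anywhere in a line rather than requiring the whole line to equal the marker.

```python
import re


def awk_style_between(lines, start=r"Start", end=r"End"):
    """Approximate: awk '/Start/{flag=1;next} /End/{flag=0} flag{ print }'"""
    flag = False
    out = []
    for line in lines:
        if re.search(start, line):
            flag = True
            continue          # like awk's `next`: skip the marker line itself
        if re.search(end, line):
            flag = False      # the end marker line is not printed
        if flag:
            out.append(line)
    return out


lines = ["x", "Start", "Good Morning", "Hello World", "End", "y"]
print(awk_style_between(lines))
```

One consequence of the substring match is that a content line such as "Endless" also ends the block; the exact comparison used in Solution 1 avoids that.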
Search keywords:
  • awk print line between

=EOF=
