OCR text extraction on macOS using commands/scripts

=Start=

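Background:

A quick write-up for future reference. Note that modules such as ocrmac wrap capabilities built into macOS, so they only work on reasonably recent macOS versions (10.15+).

In short: on macOS, the system's built-in OCR is faster and gives better results, so it is the recommended choice.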

Main text:

Reference solution:

Environment setup

Tesseract

brew install tesseract
brew install tesseract-lang

tesseract --help
tesseract --help-extra

tesseract test.png - --oem 1 # print the recognition result directly to the terminal

tesseract test.png ocr_result_tesseract --oem 1 -l chi_sim

ocrmac

# Create an isolated virtual environment with the venv module that ships with Python 3
python3 -m venv venv1
source venv1/bin/activate

# Install dependencies
pip install ocrmac
pip install ipython

A simple implementation

Recognition by calling the command directly

#!/bin/bash

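# Screenshots are saved as .png files and screen recordings as .mov files; the default location is `~/Desktop`. File names start with "截屏" (screenshot) or "录屏" (screen recording) and include the date and time, for example:
# ~/Desktop/截屏2025-01-08 20.32.16.png
# Note that this auto-generated file name contains a space.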

# ls -1 ~/Desktop/*.png | while IFS= read -r file; do
ls ~/Desktop/*.png | while IFS= read -r file; do
    # echo "$file"
    file "$file"
    tesseract "$file" - --oem 1 -l chi_sim >"$file".txt
done

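# The -1 option of ls forces one entry per line. From the man page: "Force output to be one entry per line.  This is the default when output is not to a terminal."
# So plain ls without any option also works in the pipeline above.

# find returns files in no guaranteed order, so sort the NUL-delimited list with `sort -z`
# (plain `sort` would treat the whole NUL-separated stream as a single line and sort nothing)
find ~/Desktop -mindepth 1 -type f -iname \*.png -print0 | sort -z | while IFS= read -r -d $'\0' fp; do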
    #statements
    echo "$fp"
    # /opt/homebrew/Cellar/tesseract/5.5.0/bin/tesseract "$fp" - --oem 1 -l chi_sim >> "$fp".txt
done
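
The same batch loop can also be driven from Python, which sidesteps shell quoting of file names with spaces. The following is only a minimal sketch, assuming `tesseract` is on `PATH` and that the `chi_sim` language data is installed; the helper names (`build_tesseract_cmd`, `output_path_for`, `ocr_directory`) are made up for illustration. Unlike the `find` variant, it only looks at PNG files directly inside the given directory.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image: Path, lang: str = "chi_sim") -> list[str]:
    """Build the command used in the shell loop: OCR to stdout with the LSTM engine."""
    return ["tesseract", str(image), "-", "--oem", "1", "-l", lang]


def output_path_for(image: Path) -> Path:
    """Mirror the shell script's naming: 'shot.png' -> 'shot.png.txt'."""
    return image.with_name(image.name + ".txt")


def ocr_directory(directory: Path, lang: str = "chi_sim") -> list[Path]:
    """OCR every PNG directly inside `directory` in sorted order; return the files written."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH (try: brew install tesseract)")
    images = sorted(
        p for p in directory.iterdir() if p.is_file() and p.suffix.lower() == ".png"
    )
    written = []
    for image in images:
        # Passing the arguments as a list avoids any shell word splitting on spaces.
        result = subprocess.run(
            build_tesseract_cmd(image, lang), capture_output=True, text=True, check=True
        )
        out = output_path_for(image)
        out.write_text(result.stdout, encoding="utf-8")
        written.append(out)
    return written
```

Calling `ocr_directory(Path.home() / "Desktop")` would then write one `.txt` file next to each screenshot, just like the shell version.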

Recognition via Python

from ocrmac import ocrmac

annotations = ocrmac.OCR('test.png').recognize()  # recognizes Chinese text in images poorly; the language must be specified via language_preference

annotations = ocrmac.OCR('test.png', language_preference=['zh-Hans']).recognize()  # OK
'''
[('作新年里的第一篇，我们讲什么话题呢？', 0.5, [0.02004453812264858, 0.9707568809405439, 0.4632516871018234, 0.021975676947777467]), ('讲破局。', 1.0, [0.02004454434757439, 0.9243119263511643, 0.09131403086684081, 0.020642201834862317]), ('我经常听到一类声音：', 0.5, [0.020044542850228938, 0.8793604652597939, 0.23608017638194745, 0.017441860032737733]), ('时间不经浪，人生不经晃，只有年纪噌噌往上涨；', 0.5, [0.02004454478891597, 0.8255813958466146, 0.550111365024989, 0.02476264030561537]), ('家里没有矿，全靠自己扛，辛苦到头来一场空忙；', 0.5, [0.02004454570576815, 0.7790697679033006, 0.550111365024989, 0.02482931438936009]), ..., ('这件事，跟着我们阅读超过1年的读者，都记得，不记得你也可以点进去复习复习。', 1.0, [0.020044559825702565, 0.002866972200416651, 0.8930957438866158, 0.021842329873951294])]
'''

annotations = ocrmac.OCR('test.png', language_preference=['zh-Hans'], framework="livetext").recognize()  # OK, but each result is now a single character; every result has a very high confidence
'''
[('作', 1.0, [0.02004454736451367, 0.970756880733945, 0.027846489542340463, 0.020642201834862386]), ('为', 1.0, [0.04789103690685413, 0.970756880733945, 0.023725834797891032, 0.020642201834862386]), ..., ('。', 1.0, [0.9009226713532513, 0.0028669724770642446, 0.012217638223586125, 0.021842329848517084])]
'''
print(annotations)

# annotations is a list; each element is a tuple (Text, Confidence, BoundingBox), i.e. (extracted text, confidence, bounding box)

for x in annotations:
    print(x[0])
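
The `(Text, Confidence, BoundingBox)` tuples can be post-processed without touching the OCR engine again. The sketch below is an illustration under assumptions, not part of ocrmac: it assumes each bounding box is a normalized `[x, y, width, height]` with the origin in the bottom-left corner of the image (which is what the sample output above suggests, since the first line of text has the largest `y`). The function names `confident_text` and `bbox_to_pixels` are hypothetical.

```python
def confident_text(annotations, min_confidence=0.5):
    """Keep only the text of results whose confidence is at least `min_confidence`."""
    return [text for text, confidence, _bbox in annotations if confidence >= min_confidence]


def bbox_to_pixels(bbox, image_width, image_height):
    """Convert a normalized [x, y, w, h] box (assumed bottom-left origin) into
    pixel corners (left, top, right, bottom) measured from the top-left corner."""
    x, y, w, h = bbox
    left = x * image_width
    right = (x + w) * image_width
    # Flip the vertical axis: the box spans y .. y + h measured from the bottom.
    top = (1.0 - y - h) * image_height
    bottom = (1.0 - y) * image_height
    return round(left), round(top), round(right), round(bottom)


sample = [
    ("讲破局。", 1.0, [0.25, 0.5, 0.5, 0.25]),
    ("noise", 0.3, [0.0, 0.0, 0.1, 0.1]),
]
print(confident_text(sample))                     # text of high-confidence results only
print(bbox_to_pixels(sample[0][2], 200, 100))     # pixel box for a 200x100 image
```

The pixel box is in the form image libraries usually expect for cropping or drawing rectangles, so it can be used to check visually which region a result came from.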

from ocrmac import ocrmac

annotations = ocrmac.OCR('test.png', language_preference=['zh-Hans']).recognize()  # each result is one line of text

annotations = ocrmac.OCR('test.png', language_preference=['zh-Hans'], framework="vision", recognition_level="fast").recognize()  # judging by the author's test statistics, this is less balanced than the livetext approach below

# Recommended
annotations = ocrmac.OCR('test.png', language_preference=['zh-Hans'], framework="livetext").recognize()  # each result is a single character, but every result has a very high confidence

'''
# language_preference
en-US
zh-Hans
de-DE
...

# framework
vision
livetext

# recognition_level
fast
accurate
'''



# annotations is a list; each element is a tuple (Text, Confidence, BoundingBox), i.e. (extracted text, confidence, bounding box)
print(annotations)

# Print only the recognized text
for x in annotations:
    print(x[0])
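
Because the `livetext` framework returns one result per character, printing `x[0]` yields one character per line. A possible way to get readable lines back is to group characters by their vertical position. This is a rough sketch under assumptions: bounding boxes are normalized `[x, y, w, h]` with a bottom-left origin, characters on the same line share nearly the same `y` (as in the sample output above), and the name `group_characters_into_lines` and the `y_tolerance` default are invented for illustration.

```python
def group_characters_into_lines(annotations, y_tolerance=0.005):
    """Join per-character results into lines of text, topmost line first."""
    lines = []  # each entry: [reference_y, [(x, text), ...]]
    # Larger y means higher on the page (assumed bottom-left origin), so sort descending.
    for text, _confidence, bbox in sorted(annotations, key=lambda a: -a[2][1]):
        x, y = bbox[0], bbox[1]
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x, text))   # same line as the previous character
        else:
            lines.append([y, [(x, text)]])   # start a new line
    # Within each line, order the characters from left to right.
    return ["".join(t for _x, t in sorted(chars)) for _y, chars in lines]


chars = [
    ("破", 1.0, [0.05, 0.92, 0.02, 0.02]),
    ("讲", 1.0, [0.02, 0.92, 0.02, 0.02]),
    ("局", 1.0, [0.08, 0.92, 0.02, 0.02]),
    ("。", 1.0, [0.02, 0.50, 0.02, 0.02]),
]
for line in group_characters_into_lines(chars):
    print(line)
```

Joining with an empty string suits Chinese text; for space-separated languages the gaps between boxes would have to be turned back into spaces, which this sketch does not attempt.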

References:

ocrmac
https://github.com/straussmaximilian/ocrmac

A Python wrapper for Google Tesseract
https://github.com/madmaze/pytesseract

EasyOCR
https://github.com/JaidedAI/EasyOCR

A powerful OCR tool for macOS – from terminal to API serve
https://github.com/dielect/mac-ocr-cli

=END=

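1 comment on "OCR text extraction on macOS using commands/scripts"

hi says:

2025-01-13 11:49

Use macOS OCR engine from Python
https://gist.github.com/jonashaag/95e8b75ed44cc5b93cbc5d4599e3803a

If you hand the code from the gist above to ChatGPT or another large language model, it will explain what each part does and point out problems in the code. You can then follow up interactively with a prompt such as [The code above gives poor OCR results on images containing Chinese characters; please provide more robust and efficient Python code], and it will give you a complete, clearer and more efficient version, tell you which dependencies are needed and what was improved. Copy that code locally, adjust it slightly, and it is ready to run.

In practice the results are good as well. It amounts to a simple reimplementation of the ocrmac module (both rely on the pyobjc library to call the Vision API). Either way, some extra dependencies have to be installed.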
