A step-by-step guide to text recognition and extraction with Python

2023-11-26 17:18:34

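1 Background

I have been experimenting with optical character recognition (OCR) and found plenty of open-source options, such as easyOCR, cnocr, mmocr, paddleocr, and tesseract, along with a fair number of demos and comparisons online. Tencent's OCR is also quite capable, yet there are few write-ups on how to use it, so this article takes a brief look. Tencent's OCR builds on deep learning technology from Tencent YouTu Lab and turns the text in an image into editable text. For details, see https://cloud.tencent.com/document/product/866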

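2 Preparation

2.1 Activating the OCR service

If you have not activated the OCR service yet, apply for it first. Once activated, the console shows the service overview page.

The feature table shows that it supports dozens of recognition types, including general text recognition, card and certificate recognition, bill and receipt recognition, scenario-specific recognition, intelligent structured recognition, text image enhancement, smart code scanning, and business license verification (2022). It also offers a free quota of 1,000 calls and online debugging. This article tries out handwriting recognition; the other APIs presumably follow a similar flow.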

2.2 Environment setup

SDK source: https://github.com/TencentCloud/tencentcloud-sdk-python.git

Python version: Python 2.7.16

OS: macOS 10.15.7 (19H2026)

Install with pip:

pip install --upgrade tencentcloud-sdk-python

Installation log:

DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
Defaulting to user installation because normal site-packages is not writeable
Collecting tencentcloud-sdk-python
  Downloading tencentcloud_sdk_python-3.0.1038-py2.py3-none-any.whl (9.3 MB)
     |████████████████████████████████| 9.3 MB 26 kB/s 
Requirement already satisfied, skipping upgrade: requests>=2.16.0 in ./Library/Python/2.7/lib/python/site-packages (from tencentcloud-sdk-python) (2.27.1)
Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in ./Library/Python/2.7/lib/python/site-packages (from requests>=2.16.0->tencentcloud-sdk-python) (2021.10.8)
Requirement already satisfied, skipping upgrade: urllib3<1.27,>=1.21.1 in ./Library/Python/2.7/lib/python/site-packages (from requests>=2.16.0->tencentcloud-sdk-python) (1.26.14)
Requirement already satisfied, skipping upgrade: chardet<5,>=3.0.2; python_version < "3" in ./Library/Python/2.7/lib/python/site-packages (from requests>=2.16.0->tencentcloud-sdk-python) (4.0.0)
Requirement already satisfied, skipping upgrade: idna<3,>=2.5; python_version < "3" in ./Library/Python/2.7/lib/python/site-packages (from requests>=2.16.0->tencentcloud-sdk-python) (2.10)
Installing collected packages: tencentcloud-sdk-python
Successfully installed tencentcloud-sdk-python-3.0.1038

Obtain TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY:

You can view or create them at https://console.cloud.tencent.com/cam/capi.
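
One way to keep the key pair out of the source file is to read it from environment variables. The following is a minimal sketch, assuming the two values have been exported under the names TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY; the helper name `load_credentials` is made up for this illustration and is not part of the SDK.

```python
import os


def load_credentials(env=None):
    """Return (secret_id, secret_key) read from environment variables.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if env is None else env
    secret_id = env.get("TENCENTCLOUD_SECRET_ID")
    secret_key = env.get("TENCENTCLOUD_SECRET_KEY")
    if not secret_id or not secret_key:
        # Fail early with a clear message instead of sending an unsigned request.
        raise RuntimeError(
            "Set TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY first"
        )
    return secret_id, secret_key
```

The returned pair can then be passed to the SDK in place of literal strings, for example `credential.Credential(*load_credentials())`.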

Test images: search online (for example on Baidu) for an image of the handwriting you want to test, then copy its URL.

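3 Debugging the code

From the OCR service list you can open the API Explorer (https://console.cloud.tencent.com/api/explorer?Product=ocr&Version=2018-11-19&Action=GeneralHandwritingOCR). This is one of the nicer features of Tencent Cloud: you fill in the parameters, and it generates sample code and runs a simulated call.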

[Figure: request parameters]
[Figure: code generation and debugging]

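We can also verify the call in the environment set up above.

Create a new .py file and copy the generated code into it.

Add a UTF-8 encoding declaration at the top of the file so that Python 2 does not fail on the Chinese characters in the source: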

# -*- coding: utf-8 -*-

The complete code is as follows:

# -*- coding: utf-8 -*-
import json

from tencentcloud.common import credential
from tencentcloud.common.profile.client_profile import ClientProfile
from tencentcloud.common.profile.http_profile import HttpProfile
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.ocr.v20181119 import ocr_client, models

try:
    # Create a credential object from the account's SecretId and SecretKey.
    # Keep the key pair confidential: leaked code may expose SecretId and SecretKey
    # and threaten every resource under the account. This sample is for reference only;
    # for safer ways to handle keys, see https://cloud.tencent.com/document/product/1278/85305
    # Keys can be obtained from the console: https://console.cloud.tencent.com/cam/capi
    cred = credential.Credential("SecretId", "SecretKey")
    # Create an HTTP profile (optional; skip it if you have no special requirements)
    httpProfile = HttpProfile()
    httpProfile.endpoint = "ocr.ap-shanghai.tencentcloudapi.com"

    # Create a client profile (optional; skip it if you have no special requirements)
    clientProfile = ClientProfile()
    clientProfile.httpProfile = httpProfile
    # Create the client for the product being called; clientProfile is optional
    client = ocr_client.OcrClient(cred, "ap-shanghai", clientProfile)

    # Create a request object; every API has a matching request class
    req = models.GeneralHandwritingOCRRequest()
    params = {
        "ImageUrl": "https://gd-hbimg.huaban.com/3649dd984fff035db47aa6443b23676f37ac21b07e610-wjwRPZ",
        "EnableWordPolygon": False,
        "EnableDetectText": True
    }
    req.from_json_string(json.dumps(params))

    # resp is a GeneralHandwritingOCRResponse instance matching the request object
    resp = client.GeneralHandwritingOCR(req)
    # Print the response as a JSON string
    print(resp.to_json_string())

except TencentCloudSDKException as err:
    # Any SDK or API error ends up here
    print(err)

Replace SecretId and SecretKey above with the values you obtained.

Replace ImageUrl with the URL of your own image.

The image used here is: https://gd-hbimg.huaban.com/3649dd984fff035db47aa6443b23676f37ac21b07e610-wjwRPZ
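
If the image is a local file rather than a public URL, the request can presumably carry it inline instead. The sketch below assumes the API accepts an `ImageBase64` parameter in place of `ImageUrl` (worth confirming in the official API documentation); the helper name `build_params` is made up for this illustration.

```python
import base64


def build_params(image_bytes):
    """Build request parameters that carry the image inline as Base64 text."""
    return {
        # Assumption: ImageBase64 can be sent instead of ImageUrl.
        "ImageBase64": base64.b64encode(image_bytes).decode("ascii"),
    }
```

With a local file (the name `handwriting.jpg` is only an example), read it in binary mode, pass the bytes to `build_params`, and hand the result to `req.from_json_string(json.dumps(params))` exactly as in the script above.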

Running the script with this image URL returns:

{
    "TextDetections": [
        {
            "AdvancedInfo": "{\"Parag\":{\"ParagNo\":2}}",
            "Confidence": 69,
            "DetectedText": "山",
            "Polygon": [
                {
                    "Y": 407,
                    "X": 1083
                },
                {
                    "Y": 490,
                    "X": 1083
                },
                {
                    "Y": 490,
                    "X": 1038
                },
                {
                    "Y": 407,
                    "X": 1038
                }
            ],
            "WordPolygon": []
        },
        {
            "AdvancedInfo": "{\"Parag\":{\"ParagNo\":2}}",
            "Confidence": 82,
            "DetectedText": "人生若口如初見",
            "Polygon": [
                {
                    "Y": 310,
                    "X": 205
                },
                {
                    "Y": 329,
                    "X": 923
                },
                {
                    "Y": 506,
                    "X": 919
                },
                {
                    "Y": 487,
                    "X": 201
                }
            ],
            "WordPolygon": []
        },
        {
            "AdvancedInfo": "{\"Parag\":{\"ParagNo\":1}}",
            "Confidence": 61,
            "DetectedText": "1",
            "Polygon": [
                {
                    "Y": 46,
                    "X": 81
                },
                {
                    "Y": 127,
                    "X": 83
                },
                {
                    "Y": 128,
                    "X": 51
                },
                {
                    "Y": 47,
                    "X": 49
                }
            ],
            "WordPolygon": []
        }
    ],
    "RequestId": "7d102507-a041-41fd-a361-fd0f05f030b7",
    "Angel": 179.99000549316406
}
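
In practice only the recognized text and its confidence are usually needed, not the polygons. The following is a rough sketch of pulling those two fields out of the JSON string returned by `resp.to_json_string()`. The helper name `extract_lines` and the threshold of 80 are assumptions chosen for illustration, and the sample is a trimmed copy of the response above.

```python
# -*- coding: utf-8 -*-
import json


def extract_lines(response_json, min_confidence=0):
    """Return (text, confidence) pairs from a GeneralHandwritingOCR response."""
    data = json.loads(response_json)
    return [
        (item["DetectedText"], item["Confidence"])
        for item in data.get("TextDetections", [])
        if item["Confidence"] >= min_confidence
    ]


# Trimmed copy of the response above (polygons and other fields omitted).
sample = json.dumps({
    "TextDetections": [
        {"DetectedText": "山", "Confidence": 69},
        {"DetectedText": "人生若口如初見", "Confidence": 82},
        {"DetectedText": "1", "Confidence": 61},
    ]
})

# Keep only lines the service is reasonably sure about (80 is an arbitrary cut-off).
for text, confidence in extract_lines(sample, min_confidence=80):
    print("%s (%d)" % (text, confidence))
```

Filtering on confidence drops the stray single-character detections in this sample, at the risk of also discarding correct but low-scoring lines on harder images.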

Now let's try another image:

https://img1.baidu.com/it/u=2828164285,1071569506&fm=253&fmt=auto&app=138&f=JPEG?w=1000&h=500

Recognition result:

{
    "TextDetections": [
        {
            "AdvancedInfo": "{\"Parag\":{\"ParagNo\":1}}",
            "Confidence": 99,
            "DetectedText": "你是声色张扬下我欲盖弥彰的温柔。",
            "Polygon": [
                {
                    "Y": 227,
                    "X": 102
                },
                {
                    "Y": 214,
                    "X": 886
                },
                {
                    "Y": 285,
                    "X": 888
                },
                {
                    "Y": 298,
                    "X": 103
                }
            ],
            "WordPolygon": []
        }
    ],
    "RequestId": "894e272b-48e2-4816-9f6f-003e09d82059",
    "Angel": -0.009999999776482582
}

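As you can see, when the handwriting is clean, recognition is very accurate.

Next, let's see how it handles cursive handwriting:

[Figure: cursive handwriting sample]

https://img14.360buyimg.com/n1/jfs/t1/35875/36/7052/136580/5cca9b7bE22d505d6/c4cd8342ee18ca06.jpg

Response: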

{
    "TextDetections": [
        {
            "AdvancedInfo": "{\"Parag\":{\"ParagNo\":1}}",
            "Confidence": 95,
            "DetectedText": "我们每正的生活。不要浪费",
            "Polygon": [
                {
                    "Y": 8,
                    "X": 40
                },
                {
                    "Y": 8,
                    "X": 305
                },
                {
                    "Y": 52,
                    "X": 305
                },
                {
                    "Y": 52,
                    "X": 40
                }
            ],
            "WordPolygon": []
        },
        {
            "AdvancedInfo": "{\"Parag\":{\"ParagNo\":1}}",
            "Confidence": 95,
            "DetectedText": "在那些負面的情绪里。而是要",
            "Polygon": [
                {
                    "Y": 60,
                    "X": 45
                },
                {
                    "Y": 60,
                    "X": 306
                },
                {
                    "Y": 96,
                    "X": 306
                },
                {
                    "Y": 96,
                    "X": 45
                }
            ],
            "WordPolygon": []
        },
        {
            "AdvancedInfo": "{\"Parag\":{\"ParagNo\":1}}",
            "Confidence": 92,
            "DetectedText": "多些积极、少些消极、因为在",
            "Polygon": [
                {
                    "Y": 106,
                    "X": 47
                },
                {
                    "Y": 106,
                    "X": 308
                },
                {
                    "Y": 146,
                    "X": 308
                },
                {
                    "Y": 146,
                    "X": 47
                }
            ],
            "WordPolygon": []
        },
        {
            "AdvancedInfo": "{\"Parag\":{\"ParagNo\":1}}",
            "Confidence": 95,
            "DetectedText": "这个世界上。能够真正让我们",
            "Polygon": [
                {
                    "Y": 157,
                    "X": 40
                },
                {
                    "Y": 151,
                    "X": 306
                },
                {
                    "Y": 198,
                    "X": 307
                },
                {
                    "Y": 204,
                    "X": 41
                }
            ],
            "WordPolygon": []
        },
        {
            "AdvancedInfo": "{\"Parag\":{\"ParagNo\":1}}",
            "Confidence": 96,
            "DetectedText": "变得强大的。是来自内心的力量。",
            "Polygon": [
                {
                    "Y": 201,
                    "X": 37
                },
                {
                    "Y": 201,
                    "X": 305
                },
                {
                    "Y": 239,
                    "X": 305
                },
                {
                    "Y": 239,
                    "X": 37
                }
            ],
            "WordPolygon": []
        },
        {
            "AdvancedInfo": "{\"Parag\":{\"ParagNo\":1}}",
            "Confidence": 91,
            "DetectedText": "因此,成功人士很少纠續那些",
            "Polygon": [
                {
                    "Y": 246,
                    "X": 42
                },
                {
                    "Y": 250,
                    "X": 315
                },
                {
                    "Y": 293,
                    "X": 314
                },
                {
                    "Y": 288,
                    "X": 41
                }
            ],
            "WordPolygon": []
        },
        {
            "AdvancedInfo": "{\"Parag\":{\"ParagNo\":1}}",
            "Confidence": 94,
            "DetectedText": "負面情绪和相关问题。",
            "Polygon": [
                {
                    "Y": 301,
                    "X": 45
                },
                {
                    "Y": 301,
                    "X": 244
                },
                {
                    "Y": 341,
                    "X": 244
                },
                {
                    "Y": 341,
                    "X": 45
                }
            ],
            "WordPolygon": []
        }
    ],
    "RequestId": "cc24c09c-43a9-45a2-9d91-cb9f096d271b",
    "Angel": 359.8233337402344
}

A few characters are misrecognized (for example, 每天 is read as 每正 and 纠缠 as 纠續), but overall the result is quite good.
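
The response splits the passage into one entry per line, each tagged with a paragraph number inside `AdvancedInfo`. Note that `AdvancedInfo` is itself a JSON-encoded string, so it has to be decoded separately. The sketch below stitches the lines back into paragraphs; it assumes every entry carries `Parag.ParagNo` as in the responses above, and the helper name `join_paragraphs` is made up for this illustration.

```python
import json


def join_paragraphs(text_detections):
    """Group detected lines by ParagNo and join each group into one string."""
    paragraphs = {}
    for item in text_detections:
        # AdvancedInfo is a JSON string nested inside the JSON response.
        info = json.loads(item["AdvancedInfo"])
        parag_no = info["Parag"]["ParagNo"]
        paragraphs.setdefault(parag_no, []).append(item["DetectedText"])
    # Emit paragraphs in ascending ParagNo order, lines in response order.
    return ["".join(paragraphs[no]) for no in sorted(paragraphs)]
```

Given `data = json.loads(resp.to_json_string())`, calling `join_paragraphs(data["TextDetections"])` returns one string per paragraph, which is easier to read or post-process than seven separate line entries.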

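4 Summary

To sum up: this was my first encounter with OCR. This article tried out Tencent's OCR service step by step, from setting up the environment to generating the code and choosing parameter values, and got a feel for what Tencent Cloud can do. For the full feature set, please refer to the official documentation.

To close, here is the passage recognized above, in translation:

We should not waste our daily lives on negative emotions; we should be more positive and less negative, because in this world what truly makes us strong is the strength that comes from within. That is why successful people rarely dwell on negative emotions and the problems that come with them.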
