PaddleOCR Usage Guide

2022-09-19 15:00:05

First install PaddlePaddle (飞桨), then install paddleocr:

pip install "paddleocr>=2.0.1"

Running OCR on an image

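import os

from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

if __name__ == '__main__':

    # Detection + angle classification + recognition with the Chinese model
    ocr = PaddleOCR(use_angle_cls=True, lang='ch')
    img_path = 'demo/demo_kie.jpeg'
    result = ocr.ocr(img_path, cls=True)
    # Note: some newer PaddleOCR releases return one list per image;
    # in that case take result = result[0] before the loop below.
    for line in result:
        print(line)  # each line is [box, (text, score)]

    # Draw the boxes, texts and scores onto the image and save it
    image = Image.open(img_path).convert('RGB')
    boxes = [line[0] for line in result]
    txts = [line[1][0] for line in result]
    scores = [line[1][1] for line in result]
    im_show = draw_ocr(image, boxes, txts, scores, font_path='data/chineseocr/labels/font.TTF')
    im_show = Image.fromarray(im_show)
    os.makedirs('output', exist_ok=True)  # make sure the output folder exists
    im_show.save('output/result5.jpg')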

The lang argument in PaddleOCR(use_angle_cls=True, lang='ch') selects the language model. Many languages are supported, for example `ch`, `en`, `fr`, `german`, `korean`, `japan`.
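
The script above relies on each result entry being a pair of a bounding box and a (text, score) tuple. As a minimal sketch of that unpacking, the helper below splits such entries into boxes, texts and scores and drops low-confidence lines. The function name `split_ocr_lines`, the `min_score` threshold and the sample data are illustrative assumptions, not part of PaddleOCR; the sample only mimics the shape that the script above indexes into.

```python
def split_ocr_lines(lines, min_score=0.5):
    """Split OCR entries of the form [box, (text, score)] into three lists.

    Entries whose score is below min_score are skipped.
    """
    boxes, txts, scores = [], [], []
    for box, (text, score) in lines:
        if score < min_score:
            continue
        boxes.append(box)
        txts.append(text)
        scores.append(score)
    return boxes, txts, scores


# Made-up sample in the same shape as the entries indexed in the script above
sample = [
    [[[10, 10], [90, 10], [90, 30], [10, 30]], ('STAR', 0.88)],
    [[[10, 40], [90, 40], [90, 60], [10, 60]], ('noise', 0.21)],
]
boxes, txts, scores = split_ocr_lines(sample)
print(txts)
```

The three returned lists can be passed to draw_ocr in place of the list comprehensions, so only the lines that pass the threshold are drawn.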

This call performs both text detection and text recognition; the annotated result is saved to output/result5.jpg.

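But for an image that contains only a simple piece of text, such as demo/demo_text_recog.jpg used below, recognition alone is enough and detection can be skipped: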

from paddleocr import PaddleOCR

if __name__ == '__main__':

    ocr = PaddleOCR(use_angle_cls=True, lang='en')
    img_path = 'demo/demo_text_recog.jpg'
    # det=False skips text detection and runs recognition on the whole image
    result = ocr.ocr(img_path, cls=True, det=False)
    for line in result:
        print(line)

Output (partial):

('STAR', 0.8838256597518921)

The PaddleOCR framework can be downloaded from GitHub: PaddlePaddle/PaddleOCR (https://github.com/PaddlePaddle/PaddleOCR), described there as "Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80 languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)".

Model training

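We again use the Kaggle CAPTCHA text-recognition dataset as the example. PaddleOCR's dataset format differs a little from MMOCR's: the training images and the test images must be placed in two separate folders. In this example the images live in data/toy_dataset/train/ and data/toy_dataset/test/, with the label files data/toy_dataset/train_label.txt and data/toy_dataset/test_label.txt next to them.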

Because all images were previously kept in one folder, a short script moves the test images out:

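import os
import shutil

if __name__ == '__main__':

    # Make sure the destination folder exists before moving files into it
    os.makedirs('data/toy_dataset/test', exist_ok=True)
    with open('data/toy_dataset/test_label.txt', 'r') as f:
        for line in f:
            # Each label line is "<filename>\t<text>"; the filename is the first field
            filename = line.split('\t')[0]
            shutil.move('data/toy_dataset/train/' + filename, 'data/toy_dataset/test/' + filename)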

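In addition, the filename and the label text in its label file are separated by a tab character (\t), whereas MMOCR separates them with a space.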

2wc38.png	2wc38
y5n6d.png	y5n6d
men4f.png	men4f
57b27.png	57b27
x3deb.png	x3deb
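
If the labels already exist in the space-separated form mentioned above, they can be rewritten with tabs. The following is a minimal sketch, assuming each source line is a filename, a space, then the label text; the function names `to_paddle_label` and `convert_label_file` are made up for illustration.

```python
def to_paddle_label(line):
    """Convert one 'filename text' line (space-separated) to a tab-separated line."""
    # Split only on the first whitespace run so the label text is kept whole
    filename, text = line.strip().split(None, 1)
    return filename + '\t' + text


def convert_label_file(src_path, dst_path):
    """Rewrite a space-separated label file as a tab-separated one."""
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(to_paddle_label(line) + '\n')


print(repr(to_paddle_label('2wc38.png 2wc38')))
```

This sketch would not handle filenames that themselves contain spaces; the CAPTCHA filenames shown above do not.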

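Edit the file configs/rec/rec_icdar15_train.yml under the PaddleOCR root directory. This is only one of the available recognition configurations; we use it as the example. The modified parts are as follows: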

Train:
  dataset:
    name: SimpleDataSet
#    data_dir: ./train_data/ic15_data/
    data_dir: ./data/toy_dataset/train/
#    label_file_list: ["./train_data/ic15_data/rec_gt_train.txt"]
    label_file_list: ["./data/toy_dataset/train_label.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 100]  # for Chinese: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 256
    drop_last: True
    num_workers: 8
    use_shared_memory: False

Eval:
  dataset:
    name: SimpleDataSet
#    data_dir: ./train_data/ic15_data
    data_dir: ./data/toy_dataset/test/
#    label_file_list: ["./train_data/ic15_data/rec_gt_test.txt"]
    label_file_list: ["./data/toy_dataset/test_label.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 100]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 256
    num_workers: 4
    use_shared_memory: False
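
Before training, it can help to confirm that every image named in a label file is actually present in the matching data_dir from the configuration above. The following is a minimal sketch, assuming that filenames in the label file are relative to data_dir, as in the toy dataset layout; the helper name `find_missing_images` is made up for illustration.

```python
import os


def find_missing_images(label_file, data_dir):
    """Return filenames listed in label_file that are not present in data_dir."""
    missing = []
    with open(label_file, 'r') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:
                continue  # skip blank lines
            filename = line.split('\t')[0]  # first tab-separated field
            if not os.path.isfile(os.path.join(data_dir, filename)):
                missing.append(filename)
    return missing


if __name__ == '__main__':
    # Paths taken from the Train and Eval sections of the config above
    for label_file, data_dir in [
        ('./data/toy_dataset/train_label.txt', './data/toy_dataset/train/'),
        ('./data/toy_dataset/test_label.txt', './data/toy_dataset/test/'),
    ]:
        if os.path.isfile(label_file):
            missing = find_missing_images(label_file, data_dir)
            print(label_file, 'missing images:', len(missing))
```

An empty list for both pairs suggests that the split script moved every test image and left every training image in place.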

Copy train.py from the tools folder into the PaddleOCR root folder and add the following argument:

--config=configs/rec/rec_icdar15_train.yml

Run it to start training.
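
For launching the run from another Python script instead of an IDE run configuration, the following is a minimal sketch. It assumes train.py has been copied to the PaddleOCR root folder as described and that the script is started from that folder; the helper names `build_train_command` and `run_training` are made up for illustration.

```python
import subprocess
import sys


def build_train_command(config_path, script='train.py'):
    """Build the command line for the training script with the --config argument."""
    return [sys.executable, script, '--config=' + config_path]


def run_training(config_path, script='train.py'):
    """Run training as a child process; raises CalledProcessError on failure."""
    subprocess.run(build_train_command(config_path, script), check=True)


cmd = build_train_command('configs/rec/rec_icdar15_train.yml')
print(' '.join(cmd))
# To actually start training from the PaddleOCR root folder:
# run_training('configs/rec/rec_icdar15_train.yml')
```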
