First install PaddlePaddle, then install paddleocr:
pip install "paddleocr>=2.0.1"
Run OCR on an image:
from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

if __name__ == '__main__':
    # Detection + angle classification + recognition, using the Chinese model
    ocr = PaddleOCR(use_angle_cls=True, lang='ch')
    img_path = 'demo/demo_kie.jpeg'
    result = ocr.ocr(img_path, cls=True)
    # Each entry is indexed below as [box, (text, score)]
    for line in result:
        print(line)

    # Draw the boxes, texts and scores onto the image and save the visualization
    image = Image.open(img_path).convert('RGB')
    boxes = [line[0] for line in result]
    txts = [line[1][0] for line in result]
    scores = [line[1][1] for line in result]
    im_show = draw_ocr(image, boxes, txts, scores, font_path='data/chineseocr/labels/font.TTF')
    im_show = Image.fromarray(im_show)
    im_show.save('output/result5.jpg')
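The lang argument in PaddleOCR(use_angle_cls=True, lang='ch') selects the language; many are available, for example `ch`, `en`, `fr`, `german`, `korean`, `japan`.
This call performs both text detection and text recognition. Each printed entry holds the corner points of a detected text box, followed by the recognized text and its confidence score.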
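The script above indexes the result as a flat list of [box, (text, score)] entries. As far as I know, newer paddleocr releases wrap these entries in one extra list per input image, so the same indexing can fail there. The helper below is a minimal sketch that accepts either layout; `flatten_ocr_result` is a hypothetical name introduced here, not part of paddleocr, and the nested layout is an assumption you should check against your installed version.

```python
def flatten_ocr_result(result):
    """Return (boxes, texts, scores) from a detection + recognition result.

    Accepts the flat layout [[box, (text, score)], ...] used in this post and,
    as an assumption, the nested layout with one such list per image.
    """
    def is_entry(item):
        # An entry looks like [box, (text, score)]: its second element
        # is a pair whose first member is the recognized string.
        return (
            isinstance(item, (list, tuple))
            and len(item) == 2
            and isinstance(item[1], (list, tuple))
            and len(item[1]) == 2
            and isinstance(item[1][0], str)
        )

    if not result:
        return [], [], []
    if is_entry(result[0]):
        lines = list(result)
    else:
        # Nested layout: one list per image; skip images with no text (None).
        lines = [entry for page in result if page for entry in page]

    boxes = [entry[0] for entry in lines]
    texts = [entry[1][0] for entry in lines]
    scores = [entry[1][1] for entry in lines]
    return boxes, texts, scores
```

With this in place, `boxes, txts, scores = flatten_ocr_result(result)` could replace the three list comprehensions before the `draw_ocr` call.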
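For a simpler image that contains little more than the text itself, detection is unnecessary. In that case we only need recognition, so pass det=False: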
from paddleocr import PaddleOCR, draw_ocr

if __name__ == '__main__':
    ocr = PaddleOCR(use_angle_cls=True, lang='en')
    img_path = 'demo/demo_text_recog.jpg'
    # det=False skips text detection and runs recognition on the whole image
    result = ocr.ocr(img_path, cls=True, det=False)
    for line in result:
        print(line)
Output (partial):
('STAR', 0.8838256597518921)
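Each recognition-only result is a (text, score) pair, so low-confidence predictions can be dropped with a simple filter. This is a small sketch: `keep_confident` and the 0.5 threshold are illustrative choices, and the second sample pair is made up for the demonstration.

```python
def keep_confident(results, min_score=0.5):
    """Keep only the (text, score) pairs whose score is at least min_score."""
    return [(text, score) for text, score in results if score >= min_score]


# First pair is the output shown above; the second is a made-up low-score example.
sample = [('STAR', 0.8838256597518921), ('???', 0.21)]
print(keep_confident(sample, min_score=0.5))
```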
PaddleOCR framework download: GitHub - PaddlePaddle/PaddleOCR: Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80 languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
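Model training
We again use the Kaggle CAPTCHA text recognition dataset as the example. PaddleOCR's dataset format differs slightly from MMOCR's: the training images and the test images must be placed in two separate folders. Roughly, the images go in data/toy_dataset/train/ and data/toy_dataset/test/, with the label files train_label.txt and test_label.txt alongside them.
Since all the images were previously in one folder, we write a script that moves the test images out: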
import shutil

if __name__ == '__main__':
    # Move every image listed in the test label file from train/ to test/
    # (the test/ folder must already exist)
    with open('data/toy_dataset/test_label.txt', 'r') as f:
        for line in f:
            filename = line.split(' ')[0]
            shutil.move('data/toy_dataset/train/' + filename, 'data/toy_dataset/test/' + filename)
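The script above stops with an error if the test folder does not exist or if a listed image has already been moved. A slightly more defensive variant is sketched below; `move_listed_images` is a name introduced here for illustration, and it assumes file names contain no whitespace.

```python
import os
import shutil


def move_listed_images(label_file, src_dir, dst_dir):
    """Move every image named in label_file from src_dir to dst_dir.

    Returns the list of file names that were actually moved.
    """
    os.makedirs(dst_dir, exist_ok=True)  # create the target folder if needed
    moved = []
    with open(label_file, 'r', encoding='utf-8') as f:
        for line in f:
            parts = line.split()  # works for space- or tab-separated labels
            if not parts:
                continue  # skip blank lines
            filename = parts[0]
            src = os.path.join(src_dir, filename)
            if os.path.exists(src):  # skip images that are missing or already moved
                shutil.move(src, os.path.join(dst_dir, filename))
                moved.append(filename)
    return moved
```

For the dataset in this post the call would be `move_listed_images('data/toy_dataset/test_label.txt', 'data/toy_dataset/train', 'data/toy_dataset/test')`.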
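In addition, the two fields in each line of its label file are separated by a tab character (\t), whereas MMOCR separates them with a space.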
2wc38.png	2wc38
y5n6d.png	y5n6d
men4f.png	men4f
57b27.png	57b27
x3deb.png	x3deb
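If the label files were prepared for MMOCR with a space separator, they can be converted to the tab-separated form mechanically. A minimal sketch, assuming each line is `<filename> <text>` and that file names contain no spaces; both function names are introduced here for illustration.

```python
def to_paddleocr_label(line):
    """Convert one 'name.png text' line (space-separated) to tab-separated form."""
    # Split only on the first space so that the label text itself may contain spaces.
    filename, text = line.rstrip('\r\n').split(' ', 1)
    return filename + '\t' + text


def convert_label_file(src_path, dst_path):
    """Rewrite a space-separated label file as a tab-separated one."""
    with open(src_path, 'r', encoding='utf-8') as src, \
            open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            if line.strip():  # ignore blank lines
                dst.write(to_paddleocr_label(line) + '\n')
```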
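Edit the file configs/rec/rec_icdar15_train.yml under the PaddleOCR root directory. This is only one of the available recognition configurations; we use it as the example. The modified parts are as follows: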
Train:
  dataset:
    name: SimpleDataSet
    # data_dir: ./train_data/ic15_data/
    data_dir: ./data/toy_dataset/train/
    # label_file_list: ["./train_data/ic15_data/rec_gt_train.txt"]
    label_file_list: ["./data/toy_dataset/train_label.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 100] # for Chinese: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 256
    drop_last: True
    num_workers: 8
    use_shared_memory: False

Eval:
  dataset:
    name: SimpleDataSet
    # data_dir: ./train_data/ic15_data
    data_dir: ./data/toy_dataset/test/
    # label_file_list: ["./train_data/ic15_data/rec_gt_test.txt"]
    label_file_list: ["./data/toy_dataset/test_label.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 100]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 256
    num_workers: 4
    use_shared_memory: False
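Before starting a long training run, it can help to confirm that each label file matches the data_dir it is paired with in the config. The check below is a sketch under the assumptions stated in this post: one `<filename>\t<text>` pair per line, with image paths relative to data_dir. `check_label_file` is a hypothetical helper, not part of PaddleOCR.

```python
import os


def check_label_file(label_file, data_dir):
    """Return a list of human-readable problems found in a tab-separated label file."""
    problems = []
    with open(label_file, 'r', encoding='utf-8') as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip('\r\n')
            if not line:
                continue  # ignore blank lines
            fields = line.split('\t')
            if len(fields) != 2:
                problems.append(
                    f'line {lineno}: expected 2 tab-separated fields, got {len(fields)}')
                continue
            if not os.path.exists(os.path.join(data_dir, fields[0])):
                problems.append(f'line {lineno}: missing image {fields[0]}')
    return problems
```

An empty list from `check_label_file('./data/toy_dataset/train_label.txt', './data/toy_dataset/train/')` (and the same for the test pair) means every labelled image was found and every line has two fields.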
Copy train.py from the tools folder into the PaddleOCR root folder and add the following argument:
--config=configs/rec/rec_icdar15_train.yml
Run it to start training.