Translating Subtitle Files with the Tencent Cloud API (Python)

2021-08-19 10:20:33

Original article: Translating Subtitle Files with the Tencent Cloud API (Python)

Introduction

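This article uses Tencent Cloud's machine translation service to translate English subtitle files. Obtain the SecretId and SecretKey that the API requires from the Tencent Cloud console at https://console.cloud.tencent.com/cam/capi. The code was run on Python 3.8; if you use Python 2, pay attention to the comments in the code and adjust accordingly. The program also needs the Tencent Cloud Python SDK: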

pip install tencentcloud-sdk-python
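
The script later in this article hard-codes the SecretId and SecretKey as string literals. As a hedged alternative, the sketch below reads them from environment variables so they stay out of the source file. The function name `load_tencent_credentials` and the two variable names are assumptions made for this example, not something the original script uses.

```python
import os


def load_tencent_credentials(environ=os.environ):
    """Return (secret_id, secret_key) read from environment variables.

    The variable names are an assumption for this sketch; use whatever
    names fit your own setup.
    """
    try:
        secret_id = environ["TENCENTCLOUD_SECRET_ID"]
        secret_key = environ["TENCENTCLOUD_SECRET_KEY"]
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError(f"missing environment variable: {missing}") from None
    return secret_id, secret_key
```

With this in place, the credential line in the script could become `cred = credential.Credential(*load_tencent_credentials())`.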

Sample file before translation

WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:161632

1
00:00:01.070 --> 00:00:02.970
<v Don>Greetings ladies and gentlemen, this is Don Murdoch</v>

2
00:00:02.970 --> 00:00:05.070
and I'm going to be doing a talk this afternoon here

3
00:00:05.070 --> 00:00:07.960
at the RSA conference on adversary simulation.

4
00:00:07.960 --> 00:00:10.170
We're going to go through this process

5
00:00:10.170 --> 00:00:11.645
and what we want to be able to do

6
00:00:11.645 --> 00:00:13.100
throughout this presentation has help you close the gaps

7
00:00:13.100 --> 00:00:15.070
in your security posture.

8
00:00:15.070 --> 00:00:17.480
So, by way of introduction, I've been in IT

9
00:00:17.480 --> 00:00:20.260
for well over 25 years, about 17 years

10
00:00:20.260 --> 00:00:21.860
in information security.
......
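
In the sample above every cue occupies exactly four lines (index, timing, one line of text, blank line), and the script further down relies on that by stepping through the body four lines at a time. Files whose cues span two or more text lines would break that assumption. The sketch below is one possible way to find the text lines from the `-->` timing markers instead; `cue_text_indices` is a hypothetical helper, not part of the original script.

```python
def cue_text_indices(lines):
    """Return the indices of lines that hold cue text in a WebVTT file.

    A line counts as cue text if it comes after a timing line (one that
    contains "-->") and before the next blank line.
    """
    indices = []
    in_cue = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if "-->" in stripped:
            in_cue = True       # cue text starts on the next line
        elif not stripped:
            in_cue = False      # a blank line ends the cue
        elif in_cue:
            indices.append(i)
    return indices


sample = [
    "WEBVTT\n", "\n",
    "1\n", "00:00:01.070 --> 00:00:02.970\n", "Hello\n", "\n",
    "2\n", "00:00:02.970 --> 00:00:05.070\n", "first line\n", "second line\n", "\n",
]
print(cue_text_indices(sample))  # [4, 8, 9]
```

The returned indices can be used both to collect the sentences to translate and to write the translations back, which removes the need for a hard-coded header length and stride.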

Sample file after translation

WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:161632

1
00:00:01.070 --> 00:00:02.970
<v Don>女士们先生们,大家好,我是Don Murdoch</v>

2
00:00:02.970 --> 00:00:05.070
今天下午我要在这里做一个演讲

3
00:00:05.070 --> 00:00:07.960
在RSA关于对手模拟的会议上。

4
00:00:07.960 --> 00:00:10.170
我们要经历这个过程

5
00:00:10.170 --> 00:00:11.645
我们想要做的是

6
00:00:11.645 --> 00:00:13.100
整个演示文稿帮助您缩小差距

7
00:00:13.100 --> 00:00:15.070
以你的安全姿态。

8
00:00:15.070 --> 00:00:17.480
所以,顺便介绍一下,我在IT行业

9
00:00:17.480 --> 00:00:20.260
已经超过25年了,大约17年

10
00:00:20.260 --> 00:00:21.860
在信息安全方面。

Code

# coding:utf-8
'''
@author: Duckweeds7  20210527
@todo: Translate subtitle files with the Tencent Cloud API
'''
import json
from time import sleep
from tencentcloud.common import credential
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.common.profile.client_profile import ClientProfile
from tencentcloud.common.profile.http_profile import HttpProfile
from tencentcloud.tmt.v20180321 import tmt_client, models


class TencentTranslate():
    
    '''
    Translation interface; the input is a list of sentences to translate
    '''
    def translate(self, t):
        try:
            cred = credential.Credential("your SecretId", "your SecretKey")
            httpProfile = HttpProfile()
            httpProfile.endpoint = "tmt.tencentcloudapi.com"

            clientProfile = ClientProfile()
            clientProfile.httpProfile = httpProfile
            client = tmt_client.TmtClient(cred, "ap-guangzhou", clientProfile)

            req = models.TextTranslateBatchRequest()
            params = {
                "Source": "auto",
                "Target": "zh",
                "ProjectId": 0,
                "SourceTextList": t
            }
            req.from_json_string(json.dumps(params))

            resp = client.TextTranslateBatch(req)
            return json.loads(resp.to_json_string())

        except TencentCloudSDKException as err:
            print(err)
            raise  # Re-raise so the caller does not try to index a None result

    '''
    Main entry point
    '''
    def main(self, path):
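        with open(path, 'r', encoding='utf-8') as f:
            content = f.readlines()  # Read the subtitle file to translate into a list of lines
        # Python 2: content = open(path, 'r').readlines()
        # Split the header, which needs no translation, from the body; adjust the header line count to suit your file
        head, context = content[:5], content[5:]
        new_context = context[:]  # Copy of the body whose text lines will be replaced with translations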
        
        wait_for_translate = []  # Sentences waiting to be translated
        for c in range(0, len(context), 4):  # Cue text sits on every 4th line of the body; collect it without the newline
            wait_for_translate.append(context[c].replace('\n', ''))
        wail_list = []
        wail_tmp = []
        # Split the sentences into batches of 40 lines, because Tencent Cloud's batch text
        # translation interface cannot exceed 2000 characters per request. Tune this number
        # to your subtitle file: lower it for long sentences, raise it for short ones.
        for l in range(len(wait_for_translate)):
            wail_tmp.append(wait_for_translate[l])
            if len(wail_tmp) == 40 or l == len(wait_for_translate) - 1:
                wail_list.append(wail_tmp)
                wail_tmp = []
        translater = []

        for w in range(len(wail_list)):  # Translate batch by batch
            translater.extend(self.translate(wail_list[w])['TargetTextList'])
            sleep(0.21)  # Sleep because of Tencent Cloud's limit on how often the API can be called
        count = 0
        for c in range(0, len(context), 4):
            new_context[c] = translater[count] + '\n'  # Swap in the translation and restore the newline
            count += 1
            if count == len(translater):
                break
        name = path.replace('en', 'zh')  # e.g. xxx_en.vtt -> xxx_zh.vtt; note this replaces every 'en' in the path
        with open(name, 'w', encoding='utf-8') as f:
            f.writelines(head + new_context)
        return name


if __name__ == '__main__':
    TencentTranslate().main('xxx_en.vtt')
    # test()
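
The script batches a fixed 40 sentences per request and asks the reader to tune that number by hand to stay under the 2000-character limit mentioned in its comments. As a hedged alternative, the sketch below groups sentences by their combined length instead. `chunk_by_chars` is a hypothetical helper, and it assumes the limit can be approximated with Python's `len()`; how the service actually counts characters is not confirmed here.

```python
def chunk_by_chars(sentences, max_chars=2000):
    """Group sentences into batches whose combined length stays within max_chars.

    A single sentence longer than max_chars still gets a batch of its own,
    so callers may want to split or skip such sentences beforehand.
    """
    batches, current, used = [], [], 0
    for s in sentences:
        # Start a new batch when adding this sentence would exceed the budget.
        if current and used + len(s) > max_chars:
            batches.append(current)
            current, used = [], 0
        current.append(s)
        used += len(s)
    if current:
        batches.append(current)
    return batches


print(chunk_by_chars(["aaa", "bb", "cccc"], max_chars=5))  # [['aaa', 'bb'], ['cccc']]
```

In `main`, `wail_list = chunk_by_chars(wait_for_translate)` could then stand in for the loop that builds the 40-line batches.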
