使用python 将逐条sql合并为批量sql

2024-09-28 14:12:29 浏览数 (2)

场景:研发有个数据导入的需求,给过来的是类似下面的逐行insert语句

代码语言:txt复制
# 注意: 必须是一行一条INSERT,如果insert跨多行了,本文的脚本将无法使用!!

employee.sql示例内容如下:
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');

这种逐条插入的sql,导入速度非常慢,还可能造成复制的延迟。

因此可以使用下面的python脚本,将其转换为batch批量sql。

脚本如下:

代码语言:txt复制
import os
import subprocess

# 待处理的文件名,注意employee.sql文件里面的VALUES关键字要是大写
# 下面的提取每条语句的VALUES部分写死了必须是大写的VALUES
origial_file="employee.sql"

# 切分的行数(建议控制在1k-2k)
batch_size=1000

# 数据库的列名
column_list="age,name,job"
table_name="employee"

# 清理下,防止有老的文件存在
try:
    os.remove('batch.sql')
except:
    pass

# 将某个文件按照batch_size切分为一个个小文件,文件名以output_开头
command = f"split -l {batch_size} {origial_file} output_"
result = subprocess.run(command, shell=True)

# 检查命令的输出
if result.returncode == 0:
    print("Command executed successfully!")
else:
    print("Command failed with return code:", result.returncode)
    sys.exit(10)

current_directory = os.getcwd()
for filename in os.listdir(current_directory):
    if filename.startswith("output_"):
        print(filename)
        with open (filename,'r') as f:
            insert_statements = [line.strip() for line in f.readlines()]

        # 提取每条语句的VALUES部分
        values = [stmt.rstrip(';').split("VALUES")[1] for stmt in insert_statements]

        # 拼接成一条合并的INSERT语句
        merged_statement = f"INSERT INTO {table_name} ({column_list}) VALUES "   ", ".join(values)   ';n'

        with open ('batch.sql','a') as ff:
            ff.write(merged_statement)

        try:
            os.remove(filename)
        except:
            pass

1 人点赞