Scenario: the development team has a data-import request, and the file they handed over consists of line-by-line INSERT statements like the following.
# Note: there must be exactly one INSERT per line; if an INSERT spans multiple lines, the script in this post will not work!!
Sample contents of employee.sql:
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
INSERT INTO employee (age,name,job) VALUES(23,'wang','dba');
Row-by-row INSERTs like these import very slowly and can also cause replication lag.
The Python script below can therefore be used to convert them into batched multi-row INSERT statements.
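For the sample file above, the converted output would look roughly like this (abbreviated):

INSERT INTO employee (age,name,job) VALUES (23,'wang','dba'), (23,'wang','dba'), ..., (23,'wang','dba');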
The script is as follows:
import os
import subprocess
import sys
# File to process. Note: the VALUES keyword inside employee.sql must be uppercase,
# because the extraction step below hard-codes splitting on an uppercase "VALUES".
original_file = "employee.sql"
# Number of rows per merged INSERT (1k-2k recommended; very large batches risk exceeding max_allowed_packet)
batch_size = 1000
# Column names of the target table
column_list = "age,name,job"
table_name = "employee"
# Clean up first, in case an old output file is left over from a previous run
try:
    os.remove('batch.sql')
except FileNotFoundError:
    pass
# Split the input file into chunks of batch_size lines each, with filenames starting with output_
command = f"split -l {batch_size} {original_file} output_"
result = subprocess.run(command, shell=True)
# Check the command's exit status
if result.returncode == 0:
    print("Command executed successfully!")
else:
    print("Command failed with return code:", result.returncode)
    sys.exit(10)
current_directory = os.getcwd()
for filename in os.listdir(current_directory):
    if filename.startswith("output_"):
        print(filename)
        with open(filename, 'r') as f:
            insert_statements = [line.strip() for line in f if line.strip()]
        # Extract the VALUES part of each statement (uppercase VALUES is hard-coded here)
        values = [stmt.rstrip(';').split("VALUES", 1)[1] for stmt in insert_statements]
        # Join the value tuples into one merged INSERT statement
        merged_statement = f"INSERT INTO {table_name} ({column_list}) VALUES " + ", ".join(values) + ";\n"
        with open('batch.sql', 'a') as ff:
            ff.write(merged_statement)
        # Remove the chunk file once it has been merged
        try:
            os.remove(filename)
        except FileNotFoundError:
            pass
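Side note: the script above shells out to the Unix split command, so it will not run as-is on Windows. Below is a minimal pure-Python sketch of the same idea, under the same assumptions (one INSERT per line, uppercase VALUES), which streams the input in chunks of batch_size without creating intermediate files:

import itertools

original_file = "employee.sql"
batch_size = 1000
column_list = "age,name,job"
table_name = "employee"

with open(original_file, 'r') as src, open('batch.sql', 'w') as dst:
    lines = (line.strip() for line in src if line.strip())
    while True:
        # Take up to batch_size statements at a time
        chunk = list(itertools.islice(lines, batch_size))
        if not chunk:
            break
        values = [stmt.rstrip(';').split("VALUES", 1)[1] for stmt in chunk]
        dst.write(f"INSERT INTO {table_name} ({column_list}) VALUES "
                  + ", ".join(values) + ";\n")

Either way, the resulting batch.sql can then be loaded with the usual client, e.g. mysql dbname < batch.sql.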