Comparing Text Files in Python

2024-04-19 09:32:20

1. Problem Background

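We need to compare a text file F with several other text files under a directory path. We have written the following code, but it only outputs the comparison result for one file. We need to change the code so that it compares all the files and prints every result.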

import difflib
import fnmatch
import os

filelist = []
f = open("D:/Desktop/data/sample/ff69c.txt")
flines = f.readlines()
f.close()
path = "D:/Desktop/data/sample/sample2"
for root, dirnames, filenames in os.walk(path):
    for filename in fnmatch.filter(filenames, '*.txt'):
        filelist.append(os.path.join(root, filename))

for m in filelist:
    g = open(m, 'r')
    glines = g.readlines()
    g.close()
    d = difflib.Differ()
    diff_list = list(d.compare(flines, glines))

# After the loop, diff_list and glines only hold the comparison with the last file
n_adds, n_subs, n_eqs, n_wiered = 0, 0, 0, 0

for diff_item in diff_list:
    if diff_item[0] == '+':      # line only in the second file
        n_adds += 1
    elif diff_item[0] == '-':    # line only in the first file
        n_subs += 1
    elif diff_item[0] == ' ':    # line present in both files
        n_eqs += 1
    else:                        # '?' intraline hint lines
        n_wiered += 1

print('lines files #1: %d  #2: %d' % (len(flines), len(glines)))
print('adds: %d subs: %d eqs: %d ?:%d ' % (n_adds, n_subs, n_eqs, n_wiered))

2. Solution

Method 1:

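The problem is that diff_list is overwritten every time a file is read, and the counting and printing code sits outside the for loop, so it runs only once, on the result for the last file. We can change the code so that each file's differences are appended to diff_list instead of overwriting it.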

import difflib
import fnmatch
import os

# Read the reference file F once
with open("D:/Desktop/data/sample/ff69c.txt") as f:
    flines = f.readlines()

# Collect every .txt file under path, including subdirectories
filelist = []
path = "D:/Desktop/data/sample/sample2"
for root, dirnames, filenames in os.walk(path):
    for filename in fnmatch.filter(filenames, '*.txt'):
        filelist.append(os.path.join(root, filename))

diff_list = []  # Initialize an empty list to store all differences
d = difflib.Differ()

for m in filelist:
    with open(m, 'r') as g:
        glines = g.readlines()
    diff_list.extend(d.compare(flines, glines))  # Append this file's differences to diff_list

n_adds, n_subs, n_eqs, n_wiered = 0, 0, 0, 0

for diff_item in diff_list:
    if diff_item[0] == '+':      # line only in the other file
        n_adds += 1
    elif diff_item[0] == '-':    # line only in F
        n_subs += 1
    elif diff_item[0] == ' ':    # line present in both
        n_eqs += 1
    else:                        # '?' intraline hint lines
        n_wiered += 1

# Totals over all compared files
print('files compared: %d  lines in file #1: %d' % (len(filelist), len(flines)))
print('adds: %d subs: %d eqs: %d ?:%d ' % (n_adds, n_subs, n_eqs, n_wiered))

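Now the code compares F with every file in the directory. Note, however, that diff_list holds the differences from all files together, so the printed counts are totals over all the files rather than one result per file. To print a separate result for each file, the counting and printing have to move inside the for m in filelist loop.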
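If a separate result is wanted for each file, as the problem background asks, the counting can be done inside the loop. Below is a minimal sketch of that variant, not the answer's own code: count_diff and compare_all are hypothetical helper names, and the counting relies on difflib.Differ.compare() starting each output line with a two-character code: '+ ' (line only in the second sequence), '- ' (only in the first), '  ' (in both) or '? ' (intraline hint).

```python
import difflib
import fnmatch
import os
from collections import Counter


def count_diff(flines, glines):
    """Count Differ output lines by their first character (hypothetical helper)."""
    prefixes = Counter(line[0] for line in difflib.Differ().compare(flines, glines))
    # '+' = added, '-' = removed, ' ' = unchanged, '?' = intraline hint
    return prefixes['+'], prefixes['-'], prefixes[' '], prefixes['?']


def compare_all(ref_path, directory, pattern='*.txt'):
    """Compare ref_path with every matching file under directory, one report per file."""
    with open(ref_path) as f:
        flines = f.readlines()
    results = {}
    for root, dirnames, filenames in os.walk(directory):
        for filename in fnmatch.filter(filenames, pattern):
            other = os.path.join(root, filename)
            with open(other) as g:
                glines = g.readlines()
            counts = count_diff(flines, glines)
            results[other] = counts
            # Printing happens inside the loop, so every file gets its own result
            print('%s: lines #1: %d  #2: %d' % (other, len(flines), len(glines)))
            print('  adds: %d subs: %d eqs: %d ?: %d' % counts)
    return results
```

For example, compare_all("D:/Desktop/data/sample/ff69c.txt", "D:/Desktop/data/sample/sample2") prints one block per .txt file and returns a dict mapping each path to its (adds, subs, eqs, ?) tuple.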

Method 2:

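Another approach is to compare the files with the filecmp.cmp function. filecmp.cmp takes two file paths as arguments and returns a boolean indicating whether the two files are equal. Passing shallow=False makes it compare the actual file contents.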
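The shallow argument matters. With the default shallow=True, filecmp.cmp treats two files as equal as soon as their os.stat() signatures (file type, size, modification time) match, without reading them. The sketch below forces that situation with os.utime on two temporary files; the helper name shallow_vs_deep and the file contents are made up for illustration.

```python
import filecmp
import os
import tempfile


def shallow_vs_deep():
    """Return (shallow result, deep result) for two same-size files with different bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        p1 = os.path.join(tmp, 'one.txt')
        p2 = os.path.join(tmp, 'two.txt')
        for path, text in ((p1, 'line 1\n'), (p2, 'line 2\n')):
            with open(path, 'w', newline='') as fh:
                fh.write(text)  # both files are 7 bytes long
            os.utime(path, (1_000_000_000, 1_000_000_000))  # give both the same mtime
        shallow = filecmp.cmp(p1, p2)  # default shallow=True: equal signatures -> True, no reading
        deep = filecmp.cmp(p1, p2, shallow=False)  # reads and compares the bytes
    return shallow, deep


print(shallow_vs_deep())  # (True, False)
```

This is why the comparison should pass shallow=False when the file contents are what matters.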

import filecmp
import fnmatch
import os

file1 = "D:/Desktop/data/sample/ff69c.txt"  # the reference file F
filelist = []
path = "D:/Desktop/data/sample/sample2"
for root, dirnames, filenames in os.walk(path):
    for filename in fnmatch.filter(filenames, '*.txt'):
        filelist.append(os.path.join(root, filename))

# Compare F with each file; shallow=False compares contents, not just os.stat() data
for file2 in filelist:
    if filecmp.cmp(file1, file2, shallow=False):
        print(f"{file1} and {file2} are equal.")
    else:
        print(f"{file1} and {file2} are different.")

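This method can be faster, because filecmp.cmp returns False as soon as the file sizes differ, without reading the files, and otherwise compares the raw bytes in fixed-size chunks, stopping at the first mismatch. However, it only tells you whether two files are byte-for-byte identical: it does not report which lines were added or removed, and files that differ only in line endings (\r\n versus \n) or text encoding are reported as different.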
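Since filecmp.cmp only answers "equal or not", the two methods can be combined. The following is one possible combination, a sketch rather than part of the original answer: report and demo are hypothetical helpers and the sample files are made up. filecmp.cmp serves as a cheap pre-check, and difflib.unified_diff runs only on the files it reports as different.

```python
import difflib
import filecmp
import os
import tempfile


def report(ref_path, other_paths):
    """Return {path: unified-diff lines}; identical files map to an empty list."""
    with open(ref_path) as f:
        flines = f.readlines()
    results = {}
    for other in other_paths:
        if filecmp.cmp(ref_path, other, shallow=False):
            results[other] = []  # byte-for-byte identical: skip difflib entirely
            continue
        with open(other) as g:
            glines = g.readlines()  # text mode: \r\n is read as \n
        results[other] = list(difflib.unified_diff(flines, glines,
                                                   fromfile=ref_path, tofile=other))
    return results


def demo():
    """Build hypothetical sample files in a temp dir and run report() on them."""
    with tempfile.TemporaryDirectory() as tmp:
        def write(name, data):
            path = os.path.join(tmp, name)
            with open(path, 'w', newline='') as fh:  # newline='' writes data unchanged
                fh.write(data)
            return path

        ref = write('ref.txt', 'a\nb\n')
        same = write('same.txt', 'a\nb\n')
        crlf = write('crlf.txt', 'a\r\nb\r\n')  # same text, Windows line endings
        changed = write('changed.txt', 'a\nc\n')
        res = report(ref, [same, crlf, changed])
        return {os.path.basename(p): lines for p, lines in res.items()}


for name, lines in demo().items():
    print(name, ''.join(lines) or '(no line differences)')
```

In this sketch crlf.txt ends up with an empty diff even though filecmp.cmp reports it as different, because reading in text mode normalizes its line endings before difflib sees them; the diff for changed.txt contains a -b line and a +c line.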
