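Definition of Hamming distance: for two strings of equal length, the Hamming distance is the number of positions at which their characters differ. For two DNA strands, it is the number of point mutations between them.
Given: two DNA sequences of equal length (not exceeding 1 kb).
Return: the Hamming distance.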
Sample data
GAGCCTACTAACGGGAT
CATCGTAATGACGGCCT
Sample output
7
Python implementation
Counting_Point_Mutations.py
import sys


def hamm(s1, s2):
    # Count the positions at which the two sequences differ.
    return sum([a != b for a, b in zip(s1, s2)])


def test():
    # Sample data from the problem statement; the expected distance is 7.
    s1 = 'GAGCCTACTAACGGGAT'
    s2 = 'CATCGTAATGACGGCCT'
    return hamm(s1, s2) == 7


if __name__ == '__main__':
    if not test():
        print("hamm: Failed")
        sys.exit(1)

    # Read the two sequences, stripping trailing newlines.
    with open('rosalind_hamm.txt') as fh:
        lines = [line.strip() for line in fh]
    mutations = hamm(lines[0], lines[1])
    print(mutations)
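Computing the Hamming distance:
- zip() pairs up the corresponding elements of the two sequences into tuples;
- a list comprehension checks whether the two elements of each pair differ;
- sum() counts the differing characters, which is the Hamming distance.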
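The three steps above can be traced one at a time on a short pair of strings. The following is a minimal sketch: the inputs and the intermediate names (pairs, mismatches, distance) are chosen here for illustration and are not part of the original script.

```python
# Illustrative inputs (not taken from the Rosalind dataset).
s1 = "GATTACA"
s2 = "GACTATA"

# Step 1: zip() pairs up the characters found at the same position.
pairs = list(zip(s1, s2))

# Step 2: the list comprehension gives True wherever a pair differs.
mismatches = [a != b for a, b in pairs]

# Step 3: sum() counts the True values, because True counts as 1.
distance = sum(mismatches)

print(pairs[:3])
print(mismatches)
print(distance)
```

Splitting the one-liner this way makes it easy to inspect which positions differ, at the cost of building two intermediate lists.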
Problem
Figure 1. The Hamming distance between these two strings is 7. Mismatched symbols are colored red.
Given two strings $s$ and $t$ of equal length, the Hamming distance between $s$ and $t$, denoted $d_H(s, t)$, is the number of corresponding symbols that differ in $s$ and $t$. See Figure 1.
Given: Two DNA strings $s$ and $t$ of equal length (not exceeding 1 kbp).
Return: The Hamming distance $d_H(s, t)$.
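The definition can also be written as a formula. This is a sketch of one common formulation, writing $s$ and $t$ for the two strings and $d_H(s, t)$ for their Hamming distance; the common length $n$, the position index $i$, and the bracket notation are introduced here and do not appear in the problem statement.

```latex
d_H(s, t)
  = \bigl|\{\, i : 1 \le i \le n,\ s_i \ne t_i \,\}\bigr|
  = \sum_{i=1}^{n} [\, s_i \ne t_i \,]
```

Here $[\,P\,]$ equals 1 when the condition $P$ holds and 0 otherwise, which mirrors summing boolean values in the Python code.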
Sample Dataset
GAGCCTACTAACGGGAT
CATCGTAATGACGGCCT
Sample Output
7
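The problem statement requires equal-length inputs of at most 1 kbp, but zip() stops silently at the end of the shorter string, so unequal inputs would be undercounted rather than rejected. Below is a hedged sketch of a more defensive variant; the function name hamming_checked and the max_len parameter are hypothetical, and reading "1 kbp" as 1000 characters is an assumption.

```python
def hamming_checked(s, t, max_len=1000):
    """Hamming distance with checks for the stated input constraints.

    hamming_checked and max_len are illustrative names, not part of
    the original script; max_len=1000 assumes 1 kbp means 1000 bases.
    """
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    if len(s) > max_len:
        raise ValueError("sequences exceed the maximum length")
    # Same counting idea as before: one mismatch per differing position.
    return sum(a != b for a, b in zip(s, t))


# Sample dataset from the problem statement.
print(hamming_checked("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))
```

Raising an error keeps a truncated or malformed input file from producing a plausible-looking but wrong answer.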