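RDKit provides several built-in functions for generating molecular fingerprints and using them to compute molecular similarity.
Available fingerprint types: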
- Topological Fingerprints
- MACCS Keys
- Atom Pairs and Topological Torsions
- Morgan Fingerprints (Circular Fingerprints)
Implementation:
# Import the required packages
import rdkit
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit import DataStructs
from rdkit.Chem.Fingerprints import FingerprintMols
# Build three molecules: CCOC, CCO, and COC
ms = [Chem.MolFromSmiles('CCOC'), Chem.MolFromSmiles('CCO'), Chem.MolFromSmiles('COC')]
Topological Fingerprints
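# Generate the fingerprint of a single molecule
fps0 = FingerprintMols.FingerprintMol(ms[0])
# View that fingerprint as a bit string
fps0.ToBitString()
# Generate fingerprints for all molecules
fps = [FingerprintMols.FingerprintMol(x) for x in ms]
# Get the bit strings of all fingerprints
fpsstr = [x.ToBitString() for x in fps]
# Compare two molecules; the default metric is Tanimoto similarity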
DataStructs.FingerprintSimilarity(fps[0],fps[1])
# Result
0.6
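The Tanimoto coefficient reported above is the number of bits set in both fingerprints divided by the number of bits set in either. As a minimal sketch of that formula, assuming each fingerprint is represented as a set of on-bit indices (the function name `tanimoto` and the example bit sets are illustrative and are not RDKit output):

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.0
        return 0.0
    return len(a & b) / union


# Hypothetical on-bit sets, made up for illustration
fp_a = {1, 4, 7, 9, 12}
fp_b = {1, 4, 7}

# 3 shared bits out of 5 bits set in either fingerprint
print(tanimoto(fp_a, fp_b))
```

With a real RDKit bit vector, the on-bit indices could be obtained from `GetOnBits()`, so the same function can be used to cross-check `DataStructs.FingerprintSimilarity`.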
MACCS Keys
# RDKit includes a SMARTS-based implementation of the 166 public MACCS keys
# Import the module
from rdkit.Chem import MACCSkeys
# Generate MACCS fingerprints
fps = [MACCSkeys.GenMACCSKeys(x) for x in ms]
# Compute the similarity
DataStructs.FingerprintSimilarity(fps[0],fps[1])
0.5
Atom Pairs and Topological Torsions
# Atom pair fingerprints
from rdkit.Chem.AtomPairs import Pairs
ms = [Chem.MolFromSmiles('C1CCC1OCC'), Chem.MolFromSmiles('CC(C)OCC'), Chem.MolFromSmiles('CCOCC')]
pairFps = [Pairs.GetAtomPairFingerprint(x) for x in ms]
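To make the idea behind atom pairs concrete: each feature describes two atoms together with the topological distance between them, and the fingerprint counts how often each feature occurs. The following is a simplified pure-Python sketch, not RDKit's implementation. It assumes atom types are just element symbols (real atom-pair fingerprints use richer atom codes) and measures distance in bonds. The function name `atom_pair_counts` and the hand-written graph are illustrative.

```python
from collections import Counter, deque


def atom_pair_counts(elements, bonds):
    """Count (element, distance, element) triples over all atom pairs.

    elements: list of element symbols, indexed by atom
    bonds: list of (i, j) atom index pairs
    """
    n = len(elements)
    neighbors = [[] for _ in range(n)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    counts = Counter()
    for start in range(n):
        # Breadth-first search yields shortest path lengths in bonds
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in neighbors[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        for end, d in dist.items():
            if end > start:  # count each unordered pair once
                a, b = sorted((elements[start], elements[end]))
                counts[(a, d, b)] += 1
    return counts


# Diethyl ether (CCOCC), written out by hand as a graph
elements = ["C", "C", "O", "C", "C"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
pairs = atom_pair_counts(elements, bonds)
for key, count in sorted(pairs.items()):
    print(key, count)
```

Because the result is a set of counts rather than a fixed-length bit string, count-aware metrics such as Dice similarity are a natural fit for comparing these fingerprints.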
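Morgan Fingerprints (Circular Fingerprints)
# Morgan fingerprints, also known as circular fingerprints; radius 2 is roughly comparable to ECFP4
from rdkit.Chem import AllChem
# Read molecule 1
m1 = Chem.MolFromSmiles('Cc1ccccc1')
# Generate its fingerprint
fp1 = AllChem.GetMorganFingerprint(m1, 2)
# Read molecule 2
m2 = Chem.MolFromSmiles('Cc1ncccc1')
# Generate its fingerprint
fp2 = AllChem.GetMorganFingerprint(m2, 2)
# Compare the two fingerprints with Dice similarity
DataStructs.DiceSimilarity(fp1, fp2)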
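Dice similarity on count-based fingerprints is twice the shared feature count divided by the total count in both fingerprints. As a rough sketch of that formula, assuming each fingerprint is a mapping from feature ID to occurrence count (the function name `dice_similarity` and the feature IDs are hypothetical, not real Morgan environment IDs):

```python
from collections import Counter


def dice_similarity(counts_a, counts_b):
    """Dice similarity of two count fingerprints given as {feature: count} maps."""
    a, b = Counter(counts_a), Counter(counts_b)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.0
        return 0.0
    # A feature present in both contributes the smaller of its two counts
    shared = sum(min(a[k], b[k]) for k in a.keys() & b.keys())
    return 2.0 * shared / total


# Hypothetical feature counts, made up for illustration
env_a = {"f1": 2, "f2": 1, "f3": 1}
env_b = {"f1": 1, "f2": 1, "f4": 2}

# shared = 1 + 1 = 2, total = 4 + 4 = 8, so 2 * 2 / 8
print(dice_similarity(env_a, env_b))
```

For a real RDKit count fingerprint, `GetNonzeroElements()` returns a comparable feature-to-count dictionary, which can be passed to this function to check the value returned by `DataStructs.DiceSimilarity`.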