Molecular Fingerprints with RDKit

2021-02-04 14:53:59

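RDKit has a variety of built-in functions for generating molecular fingerprints and for using them to calculate molecular similarity.

Available fingerprints: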

  1. Topological Fingerprints
  2. MACCS Keys
  3. Atom Pairs and Topological Torsions
  4. Morgan Fingerprints (Circular Fingerprints)

Implementation:

# Import the required packages

import rdkit
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit import DataStructs
from rdkit.Chem.Fingerprints import FingerprintMols
# Build three molecules: CCOC, CCO, and COC
ms = [Chem.MolFromSmiles('CCOC'), Chem.MolFromSmiles('CCO'), Chem.MolFromSmiles('COC')]

Topological Fingerprints

# Generate the fingerprint of a single molecule

fps0 = FingerprintMols.FingerprintMol(ms[0])
# View that fingerprint as a bit string
fps0.ToBitString()
# Generate fingerprints for all molecules
fps = [FingerprintMols.FingerprintMol(x) for x in ms]
# Bit strings of all the fingerprints
fpsstr = [x.ToBitString() for x in fps]
# Compare two molecules; the default metric is Tanimoto similarity
DataStructs.FingerprintSimilarity(fps[0], fps[1])
# Result
0.6
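
To make the Tanimoto score above more concrete, here is a minimal pure-Python sketch of the coefficient for two bit fingerprints represented as sets of on-bit indices. The helper name `tanimoto` and the two example bit sets are hypothetical and are not RDKit output; returning 0.0 for two empty fingerprints is a convention chosen only for this sketch.

```python
def tanimoto(on_bits_a, on_bits_b):
    """Tanimoto coefficient of two fingerprints given as collections of on-bit indices."""
    a, b = set(on_bits_a), set(on_bits_b)
    shared = len(a & b)                  # bits set in both fingerprints
    union = len(a) + len(b) - shared     # bits set in at least one fingerprint
    # Convention assumed for this sketch: two empty fingerprints score 0.0
    return shared / union if union else 0.0

# Hypothetical on-bit sets, made up for illustration (not real RDKit output)
fp_a = {1, 4, 7, 9, 12}
fp_b = {1, 4, 9}
print(tanimoto(fp_a, fp_b))  # 3 shared bits / 5 bits in the union
```

With a real RDKit bit vector, the on-bit set can be obtained with `set(fp.GetOnBits())`, so the same helper can be used to cross-check `DataStructs.FingerprintSimilarity`.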

MACCS Keys

# MACCS keys: 166 substructure keys defined with SMARTS patterns

# There is a SMARTS-based implementation of the 166 public MACCS keys.
# Import the module
from rdkit.Chem import MACCSkeys
# Generate the MACCS fingerprints
fps = [MACCSkeys.GenMACCSKeys(x) for x in ms]
# Compute the similarity
DataStructs.FingerprintSimilarity(fps[0], fps[1])
# Result
0.5
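
Because each MACCS bit corresponds to one predefined substructure key, it can be useful to look at which keys are set for a molecule. The following is a small sketch, assuming RDKit is installed; the variable names `mol`, `maccs_fp` and `on_bits` are illustrative.

```python
from rdkit import Chem
from rdkit.Chem import MACCSkeys

mol = Chem.MolFromSmiles('CCO')  # ethanol, one of the three molecules used above
maccs_fp = MACCSkeys.GenMACCSKeys(mol)

# The bit vector has 167 positions: bit 0 is unused, bits 1-166 are the keys
print(maccs_fp.GetNumBits())

# Indices of the keys whose substructure pattern matched the molecule
on_bits = list(maccs_fp.GetOnBits())
print(on_bits)
```

Two molecules that share most of their on bits will score high in the similarity calculation shown above.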

Atom Pairs and Topological Torsions

# Atom pair fingerprints
from rdkit.Chem.AtomPairs import Pairs
# The second SMILES is assumed to be 'CC(C)OCC'; the unbalanced 'CC(COCC' does not parse
ms = [Chem.MolFromSmiles('C1CCC1OCC'), Chem.MolFromSmiles('CC(C)OCC'), Chem.MolFromSmiles('CCOCC')]
pairFps = [Pairs.GetAtomPairFingerprint(x) for x in ms]
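
To illustrate the idea behind atom pairs, the sketch below counts (atom, distance, atom) triples for a hand-written molecular graph in pure Python. It is a deliberately simplified toy, not RDKit's encoding: atoms are described by element symbol only (RDKit's atom codes carry more information, such as neighbor counts), distance is taken as the number of bonds on the shortest path, and the molecule is assumed to be connected. The function names `bond_distances` and `toy_atom_pairs` are hypothetical.

```python
from collections import Counter, deque

def bond_distances(adjacency, start):
    """Breadth-first search: bonds on the shortest path from start to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist

def toy_atom_pairs(symbols, bonds):
    """Count (element, bond distance, element) triples over all pairs of atoms."""
    adjacency = {i: [] for i in range(len(symbols))}
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    pairs = Counter()
    for i in range(len(symbols)):
        dist = bond_distances(adjacency, i)
        for j in range(i + 1, len(symbols)):
            # Sort the two symbols so that (C, O) and (O, C) count as the same pair
            first, second = sorted((symbols[i], symbols[j]))
            pairs[(first, dist[j], second)] += 1
    return pairs

# Diethyl ether (CCOCC), written out by hand as heavy atoms and bonds
symbols = ['C', 'C', 'O', 'C', 'C']
bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
pairs = toy_atom_pairs(symbols, bonds)
print(pairs)
```

Five atoms give ten pairs in total. Two molecules can then be compared by how many of these counted pairs they share, which is the role the Dice similarity plays for count-based fingerprints.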
 

Morgan Fingerprints (Circular Fingerprints)

# Morgan fingerprints are also called circular fingerprints; with radius 2 they are roughly equivalent to ECFP4

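from rdkit.Chem import AllChem
# Read molecule 1
m1 = Chem.MolFromSmiles('Cc1ccccc1')
# Generate its fingerprint (radius 2)
fp1 = AllChem.GetMorganFingerprint(m1, 2)
# Read molecule 2
m2 = Chem.MolFromSmiles('Cc1ncccc1')
# Generate its fingerprint (radius 2)
fp2 = AllChem.GetMorganFingerprint(m2, 2)
# Dice similarity between the two count-based fingerprints
DataStructs.DiceSimilarity(fp1, fp2)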
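
`GetMorganFingerprint` returns a sparse count vector rather than a bit vector, so the Dice score is computed on feature counts. A minimal pure-Python sketch of that calculation, with fingerprints represented as `{feature_id: count}` dictionaries, might look as follows. The helper name `dice_counts` and the example dictionaries are hypothetical, and returning 0.0 for two empty vectors is a convention assumed here.

```python
def dice_counts(counts_a, counts_b):
    """Dice similarity of two sparse count vectors given as {feature_id: count} dicts."""
    # For every feature, the overlap is the smaller of the two counts
    shared = sum(min(count, counts_b.get(feature, 0))
                 for feature, count in counts_a.items())
    total = sum(counts_a.values()) + sum(counts_b.values())
    # Convention assumed for this sketch: two empty vectors score 0.0
    return 2.0 * shared / total if total else 0.0

# Hypothetical feature counts, made up for illustration (not real RDKit output)
fp_x = {101: 2, 202: 1, 303: 1}
fp_y = {101: 1, 202: 1, 404: 2}
print(dice_counts(fp_x, fp_y))  # 2 * (1 + 1) / (4 + 4) = 0.5
```

With RDKit, `fp1.GetNonzeroElements()` returns a dictionary of this shape, so the helper can be used to cross-check `DataStructs.DiceSimilarity(fp1, fp2)`.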
