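Anyone who studies medicinal chemistry knows that many software packages can compute molecular descriptors. Today we show how to do this in R, using the MACCS molecular fingerprint as the worked example.
- The R packages we need are rJava, rcdklibs, and rcdk (the main package).
- Here are the functions that the rcdk package provides.
Function | Description |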
---|---|
bpdata | Boiling Point Data |
cdk.version | Get Current CDK Version |
cdkFormula-class | Class cdkFormula, a class for handling molecular formula |
charge | Get the Total Charges for the Molecule |
compare.isotope.pattern | Compare isotope patterns. |
convert.implicit.to.explicit | Operations on molecules |
copy.image.to.clipboard | View and Copy 2D Structure Diagrams |
depict | View and Copy 2D Structure Diagrams |
do.aromaticity | Perform Aromaticity Detection, atom typing or isotopic configuration |
do.isotopes | Perform Aromaticity Detection, atom typing or isotopic configuration |
do.typing | Perform Aromaticity Detection, atom typing or isotopic configuration |
eval.atomic.desc | Evaluate an Atomic Descriptor |
eval.desc | Evaluate a Molecular Descriptor |
fragment | Molecule Fragmentation Methods |
generate.2d.coordinates | Generate 2D Coordinates from Connectivity Information |
generate.formula | Generate molecular formulae given a target mass and a set of elements and counts. |
generate.formula.iter | Generate molecular formulae given a target mass and a set of elements and counts. |
get.adjacency.matrix | Get adjacency matrix for a molecule. |
get.alogp | Commonly Used Molecular Descriptors |
get.atom.count | Get the atoms from a molecule or bond |
get.atom.index | Operations on atoms |
get.atomic.desc.names | Get the names of the available atomic descriptors |
get.atomic.number | Operations on atoms |
get.atoms | Get the atoms from a molecule or bond |
get.bonds | Get the bonds from a molecule |
get.charge | Operations on atoms |
get.connected.atom | Get the atom connected to an atom in a bond |
get.connected.atoms | Operations on atoms |
get.connection.matrix | Get connection matrix for a molecule. |
get.depictor | View and Copy 2D Structure Diagrams |
get.desc.categories | Get Descriptor Class Names |
get.desc.names | Get Descriptor Class Names |
get.exact.mass | Operations on molecules |
get.exhaustive.fragments | Molecule Fragmentation Methods |
get.fingerprint | Evaluate Fingerprints |
get.formal.charge | Operations on atoms |
get.formula | Get the formula object from a formula character. |
get.hydrogen.count | Operations on atoms |
get.isotope.pattern.generator | Construct an isotope pattern generator. |
get.isotope.pattern.similarity | Construct an isotope pattern similarity calculator. |
get.isotopes.pattern | Generate the isotope pattern. |
get.largest.component | Get the Largest Component in a Disconnected Molecule |
get.mcs | Perform Substructure Searching & MCS Detection |
get.mol2formula | Parser a molecule to formula object. |
get.murcko.fragments | Molecule Fragmentation Methods |
get.natural.mass | Operations on molecules |
get.point2d | Operations on atoms |
get.point3d | Operations on atoms |
get.properties | Get All Property Values of a Molecule |
get.property | Get the Value of a Molecule Property |
get.smiles | Get the SMILES for a Molecule |
get.smiles.parser | Get a SMILES Parser |
get.symbol | Operations on atoms |
get.title | Get the Value of a Molecule Property |
get.total.charge | Get the Total Charges for the Molecule |
get.total.formal.charge | Get the Total Charges for the Molecule |
get.total.hydrogen.count | Get the Total Hydrogen Count for a Molecule |
get.tpsa | Commonly Used Molecular Descriptors |
get.volume | Commonly Used Molecular Descriptors |
get.xlogp | Commonly Used Molecular Descriptors |
hasNext | Does This Iterator Have A Next Element |
hasNext.iload.molecules | Does This Iterator Have A Next Element |
iload.molecules | Load Molecular Structures From Disk |
is.aliphatic | Operations on atoms |
is.aromatic | Operations on atoms |
is.connected | Get the Largest Component in a Disconnected Molecule |
is.in.ring | Operations on atoms |
is.neutral | Operations on molecules |
is.subgraph | Perform Substructure Searching & MCS Detection |
isvalid.formula | Validate a cdkFormula object. |
load.molecules | Load Molecular Structures From Disk |
match | Perform Substructure Searching & MCS Detection |
matches | Perform Substructure Searching & MCS Detection |
mcs | Perform Substructure Searching & MCS Detection |
parse.smiles | Parse a Vector of SMILES Strings |
remove.hydrogens | Remove Hydrogens from a Molecule |
remove.property | Remove A Property From a Molecule |
set.charge.formula | Set the charge to a cdkFormula object. |
set.property | Set A Property On A Molecule |
show-method | Class cdkFormula, a class for handling molecular formula |
smarts | Perform Substructure Searching & MCS Detection |
smiles.flavors | Generate flag for customizing SMILES generation. |
substructure | Perform Substructure Searching & MCS Detection |
view.image.2d | View and Copy 2D Structure Diagrams |
view.molecule.2d | View and Copy 2D Structure Diagrams |
view.table | View 2D Structures With Data |
write.molecules | Write Molecules To Disk |
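The same list can be produced from an R session. This is a minimal sketch, assuming rcdk is already installed; the variable name `exported` is only illustrative.

```r
library(rcdk)

# List every name exported by rcdk and show the first few.
exported <- ls("package:rcdk")
print(head(exported))

# The help page for any entry can be opened in the usual way, for example:
# help("get.fingerprint", package = "rcdk")
```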
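- Installing rcdk:
a. On Windows:
First, download the Java JDK (Java Development Kit) from the official Java site:
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Then install rJava and rcdk, in that order.
b. On Linux:
As on Windows, install the Java JDK first. On Ubuntu, run: apt install openjdk-8-jre-headless; sudo apt-get install openjdk-8-jdk. This sets up the Java environment.
For installing R itself, see: Installing R on Linux. Then install rJava and rcdk, in that order.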
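The R side of the installation might look like the sketch below. It assumes a JDK is already installed and that the CRAN mirror named in the code is reachable; rcdklibs is normally installed automatically as a dependency of rcdk. The variable names are illustrative.

```r
# Install rJava first (it needs the JDK), then rcdk.
pkgs <- c("rJava", "rcdk")
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) {
  # Assumed mirror; any CRAN mirror works.
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

library(rcdk)

# Printing the bundled CDK version is a quick check that Java and rcdk work together.
print(cdk.version())
```

If `library(rcdk)` fails with a Java-related error, the usual cause is that R cannot find the JDK, so check the Java installation first.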
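- Data import formats
a. load.molecules() reads molecules from a structure file on disk.
Example: mol = load.molecules("G:/drugbank.sdf")
b. parse.smiles() builds molecules from SMILES strings. It returns a list, so [[1]] extracts the first molecule.
Example: mol = parse.smiles('C1C=CCC1N(C)c1ccccc1')[[1]]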
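Both import routes can be sketched together as follows. The second SMILES string (ethanol) is an added illustration, and the SDF path is the one from the text, read only if it exists on the machine running the code.

```r
library(rcdk)

# parse.smiles() accepts a character vector and returns a list with one molecule per string.
smiles <- c("C1C=CCC1N(C)c1ccccc1", "CCO")
mols <- parse.smiles(smiles)
cat("Molecules parsed from SMILES:", length(mols), "\n")

# load.molecules() reads structure files; the path is the example from the text.
sdf_path <- "G:/drugbank.sdf"
if (file.exists(sdf_path)) {
  sdf_mols <- load.molecules(sdf_path)
  cat("Molecules read from SDF:", length(sdf_mols), "\n")
}
```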
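- Computing the MACCS fingerprint and basic molecular information.
a. get.smiles() returns the SMILES string of a molecule.
b. get.atom.count() returns the number of atoms in a molecule.
c. get.fingerprint() returns the fingerprint of a molecule. Its default type is 'standard', so pass type = 'maccs' to get the MACCS fingerprint. An excerpt of the result: [figure not available]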
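The three calls above can be combined as in this sketch, which reuses the SMILES string from the import example. The do.typing() and do.aromaticity() calls are a precaution added here, on the assumption that molecules built from SMILES may not be fully configured; the `@nbit` and `@bits` slots belong to the fingerprint object that rcdk returns.

```r
library(rcdk)

mol <- parse.smiles("C1C=CCC1N(C)c1ccccc1")[[1]]

# Precaution (an assumption of this sketch): perceive atom types and
# aromaticity before computing fingerprints.
do.typing(mol)
do.aromaticity(mol)

cat("SMILES:    ", get.smiles(mol), "\n")
cat("Atom count:", get.atom.count(mol), "\n")

# type = "maccs" selects the MACCS keys; the default type is "standard".
fp <- get.fingerprint(mol, type = "maccs")
cat("Fingerprint length:", fp@nbit, "\n")
cat("Positions of bits set to 1:", fp@bits, "\n")
```

The fingerprint is stored sparsely: `fp@bits` holds only the positions that are switched on, not a full 0/1 vector.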
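- Exporting the data
We export the data with the usual write.csv(). Once all the fingerprint data have been written out, they are ready for the next step of the analysis.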
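One possible way to get from fingerprints to a CSV file is sketched below. It assumes the fingerprint package (installed as a dependency of rcdk) and uses its fp.to.matrix() to build a 0/1 matrix; the example SMILES strings, the variable names, and the output file name are illustrative only.

```r
library(rcdk)
library(fingerprint)  # provides fp.to.matrix()

smiles <- c("C1C=CCC1N(C)c1ccccc1", "CCO", "c1ccccc1O")
mols <- parse.smiles(smiles)

# One MACCS fingerprint per molecule.
fps <- lapply(mols, get.fingerprint, type = "maccs")

# Rows are molecules, columns are fingerprint bits (0 or 1).
fp_matrix <- fp.to.matrix(fps)
fp_table <- data.frame(smiles = smiles, fp_matrix)

# Hypothetical output location; replace it with a real path.
out_file <- file.path(tempdir(), "maccs_fingerprints.csv")
write.csv(fp_table, out_file, row.names = FALSE)
```

Keeping the SMILES string as the first column makes it easy to trace each row of bits back to its molecule later.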