MASTER

2021-02-04 15:05:02

Series:

rosetta-motif-search

Purpose:

Search the PDB for structures similar to a query motif.

Steps:

1: Download the MASTER software and the MASTER database

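# MASTER software
https://grigoryanlab.org/master/
# MASTER database (the destination directory is a placeholder; with no destination, rsync only lists the remote files)
rsync -varz arteni.cs.dartmouth.edu::masterDB/ /path/to/masterdb/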

2: Save the motif you want to search for as a PDB file, then convert it to a PDS file with createPDS:

createPDS --type query --pdb query.pdb --pds query.pds

3: View the MASTER help text

./master 
 --query           query PDS file (required).
 --target          target PDS file.
 --targetList      a file with a list of target PDS files. Either --target or
                   --targetList must be given.
 --rmsdCut         RMSD cutoff for defining a match (in Angstrom).
 --gapLen          optional: impose length constraints on gaps between adjacent
                   segments. E.g., --gapLen '1-5;;3-10' will restrain the gap
                   between the first and second segments in the query to be
                   between 1 and 5 residues, no constraint will be placed on
                   the gap between the second and third segments, whereas the
                   gap between the third and fourth segments will need to be
                   between 3 and 10 residues. Note: the order of segments is as
                   they appeared within the PDB file of the query. Also, the
                   number of restraints must match the number of gaps (e.g., in
                   the example above, the query must have four segments).
 --matchOut        optional: file name for storing resulting matches (one line
                   per match); contains all information for defining a match
                   (location of match and RMSD).
 --seqOut          optional: file name for storing the sequences of matching
                   regions (one per line). See --outType for defining what gets
                   output.
 --structOut       optional: name of directory for writing match structures in
                   PDB format (one PDB file per match). See --outType for
                   defining what gets output.
 --outType         optional: specifies what kind of sequences and/or structures
                   to output (only works when --seqOut and/or --structOut have
                   been specified). If set to 'full', will output the entire
                   target sequence and/or structure containing a matching
                   region; if set to 'match', will output just the matching
                   region; if set to 'wgap', will output the matching region
                   with the gap(s) constrained by '--gapLen'. By default,
                   output one PDB file per matching region. In all cases,
                   output structures are aligned to superimpose the matching
                   region onto the query.
 --bbRMSD          optional: search by full-backbone RMSD (default is C-alpha
                   RMSD).
 --topN            optional: keep the best this many matches in terms of the
                   search metric (must be integer); default is 0 (no limit).
 --rmsdMode        optional: RMSD bounding mode. 0 -- provable RMSD bounds will
                   be calculated, guaranteeing that all matches within
                   --rmsdCut will be found (default). 1 -- greedy bound that
                   enforces some uniformity of RMSD residuals (see
                   documentation). 2 -- uses --rmsdCut for both the overall
                   RMSD cutoff as well as the cutoff for partial matches (see
                   documentation).
 --tune            optional: tuning parameter for greedy RMSD cutoff (i.e.,
                   when --rmsdMode 1 is specified); 0.5 for default (see
                   documentation).
 --dEps            optional: user-defined greedy distance deviation cutoff (in
                   Angstrom). If given, rather than applying a provable bound
                   on inter-segment distances, this cutoff will be applied.
 --phiEps          optional: phi angle deviation cutoff (in degrees); default
                   is 180.0, meaning no cutoff is applied.
 --psiEps          optional: psi angle deviation cutoff (in degrees); default
                   is 180.0, meaning no cutoff is applied.
 --ddZscore        optional: output a Z-score that describes the distribution
                   of inter-segment distance deviations (between query and
                   matches) relative to the greedy cutoff --dEps. High Z-scores
                   (> 3.5, in our experience), indicating a good choice of
                   --dEps and suggesting that all or nearly all matches were
                   found despite the greedy constraint.
 --matchIn         a list of matches from a previously run search (i.e., the
                   result of --matchOut of a previous run). If specified, will
                   skip searching and will produce outputs directly.
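
The `--gapLen` string described in the help text is easy to get wrong, because the number of fields has to match the number of gaps. The following is a minimal Python sketch, not part of MASTER, that splits such a string into one range per gap so it can be checked before a search is launched. The function names `parse_gap_len` and `check_gap_len` are hypothetical helpers introduced here for illustration.

```python
from typing import List, Optional, Tuple

# One entry per gap: (min, max) residues, or None for "no constraint".
GapRange = Optional[Tuple[int, int]]


def parse_gap_len(spec: str) -> List[GapRange]:
    """Split a --gapLen style string such as '1-5;;3-10' into per-gap ranges.

    Hypothetical helper: an empty field between semicolons is treated as
    "no constraint" and returned as None.
    """
    ranges: List[GapRange] = []
    for field in spec.split(";"):
        field = field.strip()
        if not field:
            ranges.append(None)
            continue
        low_text, high_text = field.split("-")
        low, high = int(low_text), int(high_text)
        if low > high:
            raise ValueError(f"gap range {field!r} has min > max")
        ranges.append((low, high))
    return ranges


def check_gap_len(spec: str, n_segments: int) -> bool:
    """Return True when the spec has exactly one field per gap (n_segments - 1)."""
    return len(parse_gap_len(spec)) == n_segments - 1


if __name__ == "__main__":
    # The example from the help text: a query with four segments, three gaps.
    print(parse_gap_len("1-5;;3-10"))
    print(check_gap_len("1-5;;3-10", 4))
```

With the help-text example, the middle gap comes back as `None`, which corresponds to the unconstrained gap between the second and third segments.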

4: Run the search

../master --query ./query.pds --targetList ../masterdb/100list --rmsdCut 1.0 --bbRMSD  --matchOut ./query.match --seqOut ./query.seq   --structOut ./structure
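# Matches must be within 1 Å RMSD of the query; the outputs are query.seq, query.match, and the ./structure directory
# I created my own list file, 100list; you can also use the full masterdb list directly
# The list file contains lines like these: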
/data/home/Program/Master/masterdb/zy/2zyz_A.pds
/data/home/Program/Master/masterdb/zy/3zy0_A.pds
/data/home/Program/Master/masterdb/zy/3zy2_A.pds
/data/home/Program/Master/masterdb/zy/3zyb_A.pds
/data/home/Program/Master/masterdb/zy/3zyg_A.pds
/data/home/Program/Master/masterdb/zy/3zyi_A.pds
/data/home/Program/Master/masterdb/zy/3zym_A.pds
/data/home/Program/Master/masterdb/zy/3zyp_A.pds
/data/home/Program/Master/masterdb/zy/3zyq_A.pds
/data/home/Program/Master/masterdb/zy/3zyt_A.pds
/data/home/Program/Master/masterdb/zy/3zyv_A.pds
/data/home/Program/Master/masterdb/zy/3zyw_A.pds
/data/home/Program/Master/masterdb/zy/3zyy_X.pds
/data/home/Program/Master/masterdb/zz/1zz6_A.pds
/data/home/Program/Master/masterdb/zz/1zzg_A.pds
/data/home/Program/Master/masterdb/zz/2zze_A.pds
/data/home/Program/Master/masterdb/zz/2zzj_A.pds
/data/home/Program/Master/masterdb/zz/2zzs_1.pds
/data/home/Program/Master/masterdb/zz/2zzv_A.pds
# You need to prepare the list file yourself so that each line is the absolute path of a PDS file
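
One way to prepare such a list file is sketched below in Python. It assumes the database was downloaded into a single directory whose subfolders hold the `.pds` files, as in the paths above. The function names, the `limit` parameter, and the example paths are hypothetical and only serve the illustration.

```python
from pathlib import Path
from typing import List


def collect_pds_paths(db_root: str) -> List[str]:
    """Return the sorted absolute paths of every .pds file under db_root."""
    root = Path(db_root).resolve()
    return sorted(str(p) for p in root.rglob("*.pds"))


def write_target_list(db_root: str, list_file: str, limit: int = 0) -> int:
    """Write one absolute PDS path per line and return how many were written.

    limit=0 writes every path found; a positive limit keeps only the first
    `limit` paths, which is handy for a small test list.
    """
    paths = collect_pds_paths(db_root)
    if limit > 0:
        paths = paths[:limit]
    Path(list_file).write_text("".join(p + "\n" for p in paths))
    return len(paths)


if __name__ == "__main__":
    # Placeholder locations; adjust to wherever the database was synced.
    count = write_target_list("../masterdb", "../masterdb/100list", limit=100)
    print(f"wrote {count} target paths")
```

The resulting file can then be passed to `--targetList` as in the search command above.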

5: Results

query.match
 0.76545 /data/home/sujiaqi/Program/Master/masterdb/a4/1a48_A.pds [(267,283)]
 0.91357 /data/home/sujiaqi/Program/Master/masterdb/a2/2a2j_A.pds [(101,117)]
 0.92832 /data/home/sujiaqi/Program/Master/masterdb/a0/1a0p_A.pds [(157,173)]
 0.94531 /data/home/sujiaqi/Program/Master/masterdb/a1/3a1c_A.pds [(109,125)]
 0.96706 /data/home/sujiaqi/Program/Master/masterdb/a1/3a1i_A.pds [(0,16)]
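
For downstream filtering it can help to load these lines into a small data structure. The sketch below assumes each line of the match file has the layout shown above: an RMSD, the target PDS path, and a bracketed list of `(start,end)` pairs, one per query segment. The `Match` type and the function names are hypothetical, and the meaning of the indices (for example whether they are zero-based) should be checked against the MASTER documentation.

```python
import re
from typing import List, NamedTuple, Tuple


class Match(NamedTuple):
    rmsd: float                      # RMSD of the match to the query, in Angstrom
    target: str                      # path of the target PDS file
    segments: List[Tuple[int, int]]  # one (start, end) pair per query segment


# Matches "(267,283)" style pairs inside the bracketed list.
_SEGMENT = re.compile(r"\((\d+),\s*(\d+)\)")


def parse_match_line(line: str) -> Match:
    """Parse one line of the form '<rmsd> <target.pds> [(start,end), ...]'."""
    rmsd_text, rest = line.strip().split(None, 1)
    target, _, segment_text = rest.partition("[")
    segments = [(int(a), int(b)) for a, b in _SEGMENT.findall(segment_text)]
    return Match(float(rmsd_text), target.strip(), segments)


def read_matches(path: str) -> List[Match]:
    """Read every non-empty line of a match file into Match records."""
    with open(path) as handle:
        return [parse_match_line(line) for line in handle if line.strip()]


if __name__ == "__main__":
    example = " 0.76545 /data/home/sujiaqi/Program/Master/masterdb/a4/1a48_A.pds [(267,283)]"
    match = parse_match_line(example)
    print(match.rmsd, match.target, match.segments)
```

A list of `Match` records can then be sorted or filtered, for instance keeping only hits below a stricter RMSD than the one used in the search.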

Open the structures in PyMOL: query.pdb is shown in red, and the other structures are the matches that were found.
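
Loading many match files by hand is tedious, so the view can be scripted. The sketch below writes a PyMOL command script that loads the query and every match and colors the query red. It relies on the `--structOut` behaviour described in the help text, where output structures are already superimposed onto the query. The file pattern `*.pdb` inside `./structure`, the object names `query` and `hit1`, `hit2`, ..., and the script name `view_matches.pml` are assumptions made for this example, and paths containing spaces are not handled.

```python
from pathlib import Path
from typing import Iterable


def build_pml(query_pdb: str, match_pdbs: Iterable[str]) -> str:
    """Return the text of a PyMOL script that overlays matches on the query."""
    lines = [f"load {query_pdb}, query"]
    for index, pdb in enumerate(match_pdbs, start=1):
        lines.append(f"load {pdb}, hit{index}")
    lines += [
        "hide everything",
        "show cartoon",
        "color grey80",      # color every loaded object grey first
        "color red, query",  # then highlight the query in red
        "zoom query",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed layout: match structures written as PDB files into ./structure.
    matches = sorted(str(p) for p in Path("./structure").glob("*.pdb"))
    Path("view_matches.pml").write_text(build_pml("query.pdb", matches))
    print(f"wrote view_matches.pml with {len(matches)} matches")
```

The generated script can be opened with `pymol view_matches.pml`.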

References:

[1] Zhou J., Grigoryan G., "Rapid Search for Tertiary Fragments Reveals Protein Sequence-Structure Relationships", Protein Science, 24(4): 508-524, 2015.
[2] https://www.dazhuanlan.com/2020/02/27/5e56e76a32e8d/
[3] https://zhuanlan.zhihu.com/p/101179342

Every day you must keep something new.
