Series:
rosetta-motif-search
Purpose:
Search the PDB for structures similar to a given motif.
Steps:
1: Download the MASTER database and the MASTER software
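Download the MASTER software from https://grigoryanlab.org/master/. The MASTER database (a collection of PDS files) can be fetched with rsync:
```shell
# ./masterdb/ is an example local destination; without a destination
# argument, rsync only lists the remote files instead of copying them
rsync -varz arteni.cs.dartmouth.edu::masterDB/ ./masterdb/
```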
2: Save the motif you want to search for as a PDB file (query.pdb), then create its PDS file:
```shell
createPDS --type query --pdb query.pdb --pds query.pds
```
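If you have several motifs, the same createPDS call can be scripted. Below is a minimal Python sketch. It assumes createPDS is on your PATH and that the motif PDB files sit in one directory. The function names are hypothetical, and only the flags shown above are used. By default it only prints the commands (dry run).

```python
import subprocess
from pathlib import Path
from typing import List


def create_pds_command(pdb_path: Path) -> List[str]:
    """createPDS call for one query PDB, using the flags from step 2."""
    pds_path = pdb_path.with_suffix(".pds")
    return ["createPDS", "--type", "query",
            "--pdb", str(pdb_path), "--pds", str(pds_path)]


def convert_all(motif_dir: str, dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run, execute) createPDS for each *.pdb in motif_dir."""
    commands = [create_pds_command(p)
                for p in sorted(Path(motif_dir).glob("*.pdb"))]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # show what would be run
        else:
            # check=True raises CalledProcessError if createPDS fails
            subprocess.run(cmd, check=True)
    return commands
```

For example, `convert_all("motifs", dry_run=False)` would write a .pds file next to each PDB file in a (hypothetical) `motifs` directory.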
3: View the MASTER help
```text
./master
--query       query PDS file (required).
--target      target PDS file.
--targetList  a file with a list of target PDS files. Either --target or
              --targetList must be given.
--rmsdCut     RMSD cutoff for defining a match (in Angstrom).
--gapLen      optional: impose length constraints on gaps between adjacent
              segments. E.g., --gapLen '1-5;;3-10' will restrain the gap
              between the first and second segments in the query to be
              between 1 and 5 residues, no constraint will be placed on
              the gap between the second and third segments, whereas the
              gap between the third and fourth segments will need to be
              between 3 and 10 residues. Note: the order of segments is as
              they appeared within the PDB file of the query. Also, the
              number of restraints must match the number of gaps (e.g., in
              the example above, the query must have four segments).
--matchOut    optional: file name for storing resulting matches (one line
              per match); contains all information for defining a match
              (location of match and RMSD).
--seqOut      optional: file name for storing the sequences of matching
              regions (one per line). See --outType for defining what gets
              output.
--structOut   optional: name of directory for writing match structures in
              PDB format (one PDB file per match). See --outType for
              defining what gets output.
--outType     optional: specifies what kind of sequences and/or structures
              to output (only works when --seqOut and/or --structOut have
              been specified). If set to 'full', will output the entire
              target sequence and/or structure containing a matching
              region; if set to 'match', will output just the matching
              region; if set to 'wgap', will output the matching region
              with the gap(s) constrained by '--gapLen'. By default,
              output one PDB file per matching region. In all cases,
              output structures are aligned to superimpose the matching
              region onto the query.
--bbRMSD      optional: search by full-backbone RMSD (default is C-alpha
              RMSD).
--topN        optional: keep the best this many matches in terms of the
              search metric (must be integer); default is 0 (no limit).
--rmsdMode    optional: RMSD bounding mode. 0 -- provable RMSD bounds will
              be calculated, guaranteeing that all matches within
              --rmsdCut will be found (default). 1 -- greedy bound that
              enforces some uniformity of RMSD residuals (see
              documentation). 2 -- uses --rmsdCut for both the overall
              RMSD cutoff as well as the cutoff for partial matches (see
              documentation).
--tune        optional: tuning parameter for greedy RMSD cutoff (i.e.,
              when --rmsdMode 1 is specified); 0.5 for default (see
              documentation).
--dEps        optional: user-defined greedy distance deviation cutoff (in
              Angstrom). If given, rather than applying a provable bound
              on inter-segment distances, this cutoff will be applied.
--phiEps      optional: phi angle deviation cutoff (in degrees); default
              is 180.0, meaning no cutoff is applied.
--psiEps      optional: psi angle deviation cutoff (in degrees); default
              is 180.0, meaning no cutoff is applied.
--ddZscore    optional: output a Z-score that describes the distribution
              of inter-segment distance deviations (between query and
              matches) relative to the greedy cutoff --dEps. High Z-scores
              (> 3.5, in our experience), indicating a good choice of
              --dEps and suggesting that all or nearly all matches were
              found despite the greedy constraint.
--matchIn     a list of matches from a previously run search (i.e., the
              result of --matchOut of a previous run). If specified, will
              skip searching and will produce outputs directly.
```
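The --gapLen syntax in the help is easy to get wrong. The Python sketch below parses a --gapLen string the way the help text describes it. It is a hedged illustration, not part of MASTER, and the function name is hypothetical. The string has one semicolon-separated field per gap between adjacent segments. Each field is either empty (no constraint) or `min-max`. The sketch also enforces the stated rule that the number of fields must equal the number of gaps, which is the number of query segments minus one.

```python
from typing import List, Optional, Tuple

# (min, max) residues for one gap, or None if that gap is unconstrained
GapRange = Optional[Tuple[int, int]]


def parse_gap_len(spec: str, n_segments: int) -> List[GapRange]:
    """Parse a --gapLen string such as '1-5;;3-10' for an n-segment query."""
    fields = spec.split(";")
    # One field per gap between adjacent segments
    if len(fields) != n_segments - 1:
        raise ValueError(
            f"{len(fields)} gap fields given, but a {n_segments}-segment "
            f"query has {n_segments - 1} gaps")
    gaps: List[GapRange] = []
    for field in fields:
        if not field:
            gaps.append(None)  # empty field: no constraint on this gap
            continue
        low_text, high_text = field.split("-")
        low, high = int(low_text), int(high_text)
        if low > high:
            raise ValueError(f"invalid gap range {field!r}")
        gaps.append((low, high))
    return gaps


print(parse_gap_len("1-5;;3-10", n_segments=4))  # [(1, 5), None, (3, 10)]
```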
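4: Search
```shell
../master --query ./query.pds --targetList ../masterdb/100list --rmsdCut 1.0 --bbRMSD \
    --matchOut ./query.match --seqOut ./query.seq --structOut ./structure
```
This keeps matches whose backbone RMSD to the query is within 1 Å (--rmsdCut 1.0 with --bbRMSD). It writes query.match, query.seq and the ./structure directory. I built my own target list, 100list; you can also use the complete masterdb list directly.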
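The --rmsdCut 1.0 threshold compares the query with a candidate region after optimal superposition. The NumPy sketch below computes that quantity with the Kabsch algorithm for two equal-length coordinate arrays, whose rows are assumed to correspond one-to-one. For a C-alpha search the rows would be CA atoms; with --bbRMSD they would be backbone atoms. The sketch only illustrates the metric. MASTER itself relies on provable RMSD bounds (see --rmsdMode) to prune the search rather than scoring every candidate this way.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Row i of p is assumed to correspond to row i of q (e.g. the CA atom of
    the i-th query residue and of the i-th matched residue).
    """
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centering both sets on their centroids
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # SVD of the 3x3 covariance matrix
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the last axis if needed so the result is a rotation, not a reflection
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate p onto q and average the squared deviations
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Toy check: a rigidly rotated and shifted copy superimposes exactly
rng = np.random.default_rng(0)
query = rng.normal(scale=5.0, size=(17, 3))  # 17 arbitrary "CA" positions
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = query @ rot_z.T + np.array([10.0, -3.0, 2.0])
print(kabsch_rmsd(query, moved) <= 1.0)  # True: passes a 1.0 Å cutoff
```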
The 100list file looks like this, one PDS file per line:
```text
/data/home/Program/Master/masterdb/zy/2zyz_A.pds
/data/home/Program/Master/masterdb/zy/3zy0_A.pds
/data/home/Program/Master/masterdb/zy/3zy2_A.pds
/data/home/Program/Master/masterdb/zy/3zyb_A.pds
/data/home/Program/Master/masterdb/zy/3zyg_A.pds
/data/home/Program/Master/masterdb/zy/3zyi_A.pds
/data/home/Program/Master/masterdb/zy/3zym_A.pds
/data/home/Program/Master/masterdb/zy/3zyp_A.pds
/data/home/Program/Master/masterdb/zy/3zyq_A.pds
/data/home/Program/Master/masterdb/zy/3zyt_A.pds
/data/home/Program/Master/masterdb/zy/3zyv_A.pds
/data/home/Program/Master/masterdb/zy/3zyw_A.pds
/data/home/Program/Master/masterdb/zy/3zyy_X.pds
/data/home/Program/Master/masterdb/zz/1zz6_A.pds
/data/home/Program/Master/masterdb/zz/1zzg_A.pds
/data/home/Program/Master/masterdb/zz/2zze_A.pds
/data/home/Program/Master/masterdb/zz/2zzj_A.pds
/data/home/Program/Master/masterdb/zz/2zzs_1.pds
/data/home/Program/Master/masterdb/zz/2zzv_A.pds
```
You have to prepare the list file yourself; each line must be the absolute path of a .pds file.
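Because the list has to be built by hand, here is a minimal Python sketch. It walks a local copy of the master database, collects the absolute path of every .pds file, and writes them one per line. The optional `limit` parameter is a hypothetical way to build a smaller subset such as 100list. It is not necessarily how the author made theirs.

```python
from pathlib import Path
from typing import Optional


def write_target_list(db_dir: str, out_file: str, limit: Optional[int] = None) -> int:
    """Write absolute paths of *.pds files under db_dir to out_file, one per line.

    Returns the number of paths written. `limit` keeps only the first N paths
    in sorted order, e.g. to build a small test list.
    """
    paths = sorted(p.resolve() for p in Path(db_dir).rglob("*.pds"))
    if limit is not None:
        paths = paths[:limit]
    Path(out_file).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

For example, `write_target_list("masterdb", "masterdb/100list", limit=100)` would write the first 100 PDS paths from a local `masterdb` directory.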
5: Results
Contents of query.match:
```text
0.76545 /data/home/sujiaqi/Program/Master/masterdb/a4/1a48_A.pds [(267,283)]
0.91357 /data/home/sujiaqi/Program/Master/masterdb/a2/2a2j_A.pds [(101,117)]
0.92832 /data/home/sujiaqi/Program/Master/masterdb/a0/1a0p_A.pds [(157,173)]
0.94531 /data/home/sujiaqi/Program/Master/masterdb/a1/3a1c_A.pds [(109,125)]
0.96706 /data/home/sujiaqi/Program/Master/masterdb/a1/3a1i_A.pds [(0,16)]
```
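Each line of query.match holds the RMSD, the target PDS file, and the location of the match (the --matchOut help says it "contains all information for defining a match (location of match and RMSD)"). The Python sketch below parses lines in exactly the layout shown above. It assumes the bracketed list holds one (start,end) index pair per query segment in the target; the (0,16) entry suggests the indices are 0-based. The names `Match`, `parse_match_line` and `read_matches` are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple

# "(267,283)" style index pairs inside the bracketed list
SEGMENT_RE = re.compile(r"\((\d+),\s*(\d+)\)")


class Match(NamedTuple):
    rmsd: float
    target: str
    segments: List[Tuple[int, int]]


def parse_match_line(line: str) -> Match:
    """Parse one line like '0.76545 /path/1a48_A.pds [(267,283)]'."""
    rmsd_text, target, rest = line.split(maxsplit=2)
    segments = [(int(a), int(b)) for a, b in SEGMENT_RE.findall(rest)]
    return Match(float(rmsd_text), target, segments)


def read_matches(path: str) -> List[Match]:
    """Parse every non-empty line of a query.match file."""
    with open(path) as handle:
        return [parse_match_line(raw) for raw in handle if raw.strip()]


example = "0.76545 /data/home/sujiaqi/Program/Master/masterdb/a4/1a48_A.pds [(267,283)]"
m = parse_match_line(example)
print(m.rmsd, m.target.rsplit("/", 1)[-1], m.segments)
# 0.76545 1a48_A.pds [(267, 283)]
```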
Opened in PyMOL: query.pdb is shown in red, and the remaining structures are the matches that were found.
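A view like this can be scripted with a small PyMOL .pml file. This is a minimal sketch under a few assumptions. The match structures are taken to be .pdb files in the ./structure directory written by --structOut; the help says they are already superimposed onto the query, so no extra alignment step is added. The function name and output file name are hypothetical. `load`, `hide`, `show`, `color` and `orient` are standard PyMOL commands.

```python
from pathlib import Path


def write_pymol_script(query_pdb: str, match_dir: str, out_pml: str) -> str:
    """Write a PyMOL .pml that shows the query in red and the matches in grey."""
    lines = [f"load {query_pdb}, query"]
    # One PyMOL object per match structure: match1, match2, ...
    for i, pdb in enumerate(sorted(Path(match_dir).glob("*.pdb")), start=1):
        lines.append(f"load {pdb}, match{i}")
    lines += [
        "hide everything",
        "show cartoon",
        "color grey, match*",
        "color red, query",
        "orient query",
    ]
    script = "\n".join(lines) + "\n"
    Path(out_pml).write_text(script)
    return script
```

For example, `write_pymol_script("query.pdb", "structure", "view_matches.pml")` followed by `pymol view_matches.pml` should open the same kind of view.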