4️⃣ Nucleic Acid Sequence Feature Analysis (4): Identifying Intron/Exon Splice Sites, with a Worked Example of the Spidey Tool

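Most eukaryotic genes are split genes: their coding sequences are usually interrupted by introns. The intron/exon boundaries and the sequences around them are special, conserved nucleotide sequences within the pre-mRNA.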

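The 5' splice site of an intron begins with GU and is called the donor site; the 3' splice site ends with AG and is called the acceptor site. The splicing signals also include the branch point, which lies inside the intron near its 3' end, is usually an A, and is followed by a polypyrimidine tract.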
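To make the donor/acceptor rule concrete, here is a minimal sketch that scans a genomic DNA string (where the pre-mRNA's GU appears as GT) for spans that start with GT, end with AG, and have a pyrimidine-rich window just upstream of the AG as a crude stand-in for the polypyrimidine tract. The function names and the thresholds (`min_len`, `ppt_window`, `ppt_min`) are hypothetical choices for illustration only, it looks at the plus strand only, and it does not model the branch point.

```python
def pyrimidine_fraction(s):
    """Fraction of C/T bases in s (0.0 for an empty string)."""
    if not s:
        return 0.0
    return sum(base in "CT" for base in s) / len(s)


def candidate_introns(seq, min_len=20, ppt_window=10, ppt_min=0.7):
    """Return (start, end) spans, 0-based and half-open, that look like introns.

    Hypothetical helper. A span qualifies when it
      * starts with GT (the donor; GU in the pre-mRNA),
      * ends with AG (the acceptor),
      * is at least min_len bases long, and
      * has at least ppt_min C/T in the ppt_window bases just before the AG.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [j for j in range(len(seq) - 1) if seq[j:j + 2] == "AG"]
    spans = []
    for d in donors:
        for a in acceptors:
            end = a + 2  # the intron includes its terminal AG
            if end - d < min_len:
                continue  # too short, or the AG lies upstream of the GT
            # Window upstream of the AG, never overlapping the donor GT.
            window = seq[max(d + 2, a - ppt_window):a]
            if pyrimidine_fraction(window) >= ppt_min:
                spans.append((d, end))
    return spans
```

Pairing every GT with every downstream AG is quadratic and yields many false positives on real sequences, which is why the GU-AG rule alone is not enough and is normally combined with other evidence such as ORFs or alignments.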

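When analyzing genomic data, we often need to predict how a gene's RNA is alternatively spliced, that is, the positions and number of its introns and exons. Such predictions rest on the conserved GU-AG rule of RNA splicing; combined with ORF analysis, BLAST results and other data, this rule can be used to predict the mature mRNA of an unknown gene.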
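The step from exon positions to a predicted mature mRNA can be sketched as follows: splice the exons together, then look for the longest open reading frame. This assumes the exon coordinates are already known (for example from a cDNA-to-genome alignment), works on the plus strand only, and uses the standard stop codons; `splice` and `longest_orf` are hypothetical helper names, and "longest ATG-to-stop ORF" is a simple heuristic, not the method of any particular tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def splice(genomic, exons):
    """Concatenate exon spans (0-based, half-open, in transcript order)."""
    return "".join(genomic[s:e] for s, e in exons)


def longest_orf(mrna):
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    end is exclusive and includes the stop codon; returns None if no ORF exists.
    """
    mrna = mrna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop gives the longest ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


# Toy example: exon 1, an intron starting with GT and ending with AG, exon 2.
genomic = "ATGAAA" + "GTAAGTCTCTCTCTAG" + "CCCTGA"
mrna = splice(genomic, [(0, 6), (22, 28)])
print(mrna, longest_orf(mrna))  # ATGAAACCCTGA (0, 12)
```

In this toy case the unspliced sequence contains no complete ORF, while removing the intron joins the start and stop codons into one reading frame, which illustrates why ORF evidence helps confirm a splicing prediction.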

Example: prediction with NCBI Splign

Or:

image.png

Navigate to the Online page using the menu at the top of the page.
Type or copy/paste your input sequences in the cDNA and Genomic text areas. Sequences in each box can be specified as identifiers (accessions or GIs) or in FASTA format. Entering both FASTA data and identifiers in the same entry will generate an error. You can specify up to five cDNA sequences at a time, but only one genomic sequence.
Check the "Reverse and complement the query" box if you want your cDNA to be aligned in antisense orientation; for example, EST sequences are often not guaranteed to be in sense orientation.
Check "Cross-species mode" if your cDNA and genomic sequences come from different species. Internally, cross-species mode accepts less stringent BLAST hits.
After job submission, results appear within a few seconds or longer, depending mainly on the lengths and number of the sequences being aligned with Splign. Because fetching large chromosomal sequences (such as full-length human chromosomes) and running BLAST on them can be time-consuming, consider specifying shorter genomic sequences such as contigs. Smaller chromosomal sequences (e.g., Drosophila chromosomes) are fine.
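The input rules above can be checked locally before pasting sequences into the form. The sketch below is a hypothetical pre-check, not part of Splign or any NCBI API: it classifies each text box as FASTA or identifiers, rejects an entry that starts with identifiers and then switches to FASTA, enforces the five-cDNA / one-genomic limits, and shows what reverse-complementing a query (the effect of the "Reverse and complement the query" option) does to the sequence. The mixing check is only approximate: identifiers placed after a FASTA header would be read as sequence lines.

```python
# Complement table for DNA, including N for unknown bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement, i.e. the sequence read from the other strand."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_box(text):
    """Classify one text box as FASTA records or identifiers (hypothetical helper).

    Returns ("fasta", [(header, seq), ...]) or ("ids", [id, ...]).
    Raises ValueError for empty input or identifiers followed by FASTA data.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    if lines[0].startswith(">"):
        records, header, chunks = [], None, []
        for ln in lines:
            if ln.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = ln[1:].strip(), []
            else:
                chunks.append(ln)  # sequence lines are joined per record
        records.append((header, "".join(chunks)))
        return "fasta", records
    if any(ln.startswith(">") for ln in lines):
        raise ValueError("FASTA data and identifiers mixed in one entry")
    return "ids", [tok for ln in lines for tok in ln.split()]


def check_splign_input(cdna_text, genomic_text, max_cdna=5):
    """Apply the limits described above: up to five cDNAs, exactly one genomic sequence."""
    _, cdnas = parse_box(cdna_text)
    _, genomic = parse_box(genomic_text)
    if len(cdnas) > max_cdna:
        raise ValueError(f"at most {max_cdna} cDNA sequences are allowed")
    if len(genomic) != 1:
        raise ValueError("exactly one genomic sequence is allowed")
    return cdnas, genomic
```

For example, `check_splign_input(">est1\nACGT", "NC_000001")` passes, while six cDNA identifiers in the first box would raise a `ValueError`.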

image.png

For details, see https://www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi?textpage=documentation

Plus (sense) and minus signs next to accessions indicate the orientations in which the sequences were aligned. The remaining columns are explained below:

image.png

image.png
