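I stumbled upon Dan Knights' Microbiome Discovery introductory course on YouTube. After a quick skim I found it excellent: it goes from basics to advanced topics and from theory to practice. I really wish I had found it sooner QAQ. This course alone should be more than enough to get started with metagenomics~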
Course playlist: https://www.youtube.com/playlist?list=PLOPiWVjg6aTzsA53N19YqJQeZpSCH9QPc
RMarkdown example data and practice code: https://github.com/danknights/mice8992-2016
Video contents
1. Intro to the Microbiome
• Introduction to the microbiome
• How it is studied
• Some of the challenges (microbiome data are relatively unstable; biomarker discovery)
Link: https://youtu.be/6564K4-_DBI
2. How microbiome data are generated
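• How these data are generated
• Pros and cons of the two sequencing approaches
• Metagenomic (shotgun) sequencing
• Amplicon sequencing
Link: https://youtu.be/FWT1HBzlWOE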
3. 16S Variable Regions
• Why the 16S gene is used; structure and function of 16S rRNA
• Where OTUs come from
Link: https://youtu.be/8Aa_mnyXm70
4. QIIME
• Introduction to the QIIME analysis workflow
Link: https://youtu.be/iy0JWgzmM_A
4.5. (Optional) UNIX Command Line
• Introduction to UNIX commands and to using Git
Link: https://youtu.be/u2IQQUMeWy8
5. Picking OTUs
• OTU clustering methods:
  • closed reference
  • de novo (UCLUST, CD-HIT, SUMACLUST, mothur, SWARM)
  • open reference
Link: https://youtu.be/Ok5h24KZbAE
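To make the de novo idea concrete, here is a minimal sketch of greedy centroid clustering at a fixed identity threshold (97% is the conventional OTU cutoff). It assumes equal-length, pre-aligned reads and uses plain per-position identity; real tools such as UCLUST or SWARM use proper alignments, abundance sorting and other heuristics, so treat the function names and behavior here as illustrative only. A closed-reference approach would instead compare each read against a fixed set of reference centroids and never create new ones.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions (assumes equal-length, pre-aligned reads)."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otu_clustering(reads: list[str], threshold: float = 0.97) -> dict[str, list[str]]:
    """Greedy de novo clustering: each read joins the first centroid it matches
    at >= threshold identity, otherwise it becomes a new centroid.
    The result depends on input order (tools usually sort by abundance first)."""
    otus: dict[str, list[str]] = {}
    for read in reads:
        for centroid in otus:
            if identity(read, centroid) >= threshold:
                otus[centroid].append(read)
                break
        else:
            # no existing centroid is close enough: start a new OTU
            otus[read] = [read]
    return otus


reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"]
print(greedy_otu_clustering(reads, threshold=0.9))
```

Because the clustering is greedy, sorting reads by abundance before calling it makes the most common sequences act as centroids.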
6. Assigning Taxonomy
• How taxonomy is assigned
• The Random Forests classifier seems to work better
• Nearest neighbor using optimal gapped alignment with large reference databases will probably win eventually
Link: https://youtu.be/HkwFdzFLZ0I
7. Alpha Diversity
• Alpha diversity measures diversity within communities
• Beta diversity measures diversity between communities
• Rarefaction determines saturation
• There is room for experimental validation
• Different ways to compute alpha diversity:
  • species count
  • phylogenetic diversity (PD)
  • Chao1 estimator
Link: https://youtu.be/9ZvoR89HYP8
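A minimal sketch of two of the alpha diversity measures above plus rarefaction, assuming a sample is given as a plain list of per-taxon read counts. Chao1 is computed as S_obs + F1²/(2·F2), where F1 and F2 are the numbers of singletons and doubletons; when F2 = 0 it falls back to the bias-corrected form, which reduces to F1(F1 − 1)/2. The `rarefy` helper is a hypothetical name for subsampling reads without replacement to a common depth. Phylogenetic diversity needs a tree and is not shown.

```python
import random
from collections import Counter


def observed_species(counts: list[int]) -> int:
    """Species count: number of taxa with a nonzero count."""
    return sum(1 for c in counts if c > 0)


def chao1(counts: list[int]) -> float:
    """Chao1 richness: S_obs + F1^2 / (2 * F2); bias-corrected fallback if F2 == 0."""
    s_obs = observed_species(counts)
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]  # number of singletons and doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def rarefy(counts: list[int], depth: int, seed: int = 0) -> list[int]:
    """Subsample `depth` reads without replacement from one sample."""
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    result = [0] * len(counts)
    for taxon in rng.sample(pool, depth):
        result[taxon] += 1
    return result


sample = [10, 5, 2, 1, 1, 0]
print(observed_species(sample))  # 5
print(chao1(sample))             # 7.0
```

Plotting `observed_species(rarefy(sample, d))` for increasing depths `d` gives a rarefaction curve; a curve that flattens out suggests sequencing has reached saturation.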
8. Beta Diversity
• Beta diversity measures diversity between communities
• Different ways to compute beta diversity:
  • Euclidean distance
  • Chi-square distance (Chi-square is usually best for gradients)
  • Bray-Curtis
• Most people use Bray-Curtis or UniFrac
• Visualization with PCoA
Link: https://youtu.be/lcbp6EecDg4
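A minimal sketch of two of the beta diversity distances above, assuming each sample is a list of per-taxon counts over the same taxa. Bray-Curtis is Σ|uᵢ − vᵢ| / Σ(uᵢ + vᵢ), so it ranges from 0 (identical) to 1 (no shared taxa). The pairwise distance matrix produced at the end is the kind of input that PCoA then embeds for plotting.

```python
import math


def bray_curtis(u: list[float], v: list[float]) -> float:
    """Bray-Curtis dissimilarity between two count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


def euclidean(u: list[float], v: list[float]) -> float:
    """Plain Euclidean distance between two count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(samples: list[list[float]], metric) -> list[list[float]]:
    """Symmetric matrix of pairwise distances, e.g. as input to PCoA."""
    return [[metric(a, b) for b in samples] for a in samples]


samples = [[4, 0, 1], [2, 2, 1], [0, 5, 0]]
for row in distance_matrix(samples, bray_curtis):
    print(row)
```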
9. UniFrac
•Beta diversity using UniFrac
Link: https://youtu.be/M8ylvsS0MHg
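A minimal sketch of unweighted UniFrac, assuming the phylogenetic tree is given as a hypothetical `parent` map from each node to `(parent_node, branch_length)`, with the root absent from the map. A branch counts as "covered" by a community if any taxon present in that community lies below it; unweighted UniFrac is the branch length covered by only one community divided by the branch length covered by either. Weighted UniFrac, which also uses abundances, is not shown.

```python
def branches_to_root(leaf: str, parent: dict[str, tuple[str, float]]) -> set[str]:
    """Branches on the path from a leaf to the root (each named by its child node)."""
    branches = set()
    node = leaf
    while node in parent:  # the root has no entry in `parent`
        branches.add(node)
        node = parent[node][0]
    return branches


def covered_branches(taxa: set[str], parent: dict[str, tuple[str, float]]) -> set[str]:
    """All branches leading to at least one taxon present in the community."""
    covered: set[str] = set()
    for taxon in taxa:
        covered |= branches_to_root(taxon, parent)
    return covered


def unweighted_unifrac(taxa_a: set[str], taxa_b: set[str],
                       parent: dict[str, tuple[str, float]]) -> float:
    """Unique branch length / total branch length covered by either community."""
    covered_a = covered_branches(taxa_a, parent)
    covered_b = covered_branches(taxa_b, parent)
    unique = sum(parent[n][1] for n in covered_a ^ covered_b)
    total = sum(parent[n][1] for n in covered_a | covered_b)
    return unique / total


# Toy tree: root -> X (1.0), root -> C (2.0); X -> A (1.0), X -> B (1.0)
tree = {"X": ("root", 1.0), "A": ("X", 1.0), "B": ("X", 1.0), "C": ("root", 2.0)}
print(unweighted_unifrac({"A"}, {"B"}, tree))  # A and B share branch X
print(unweighted_unifrac({"A"}, {"C"}, tree))  # no shared branches -> 1.0
```

Two communities whose taxa are close relatives share internal branches and therefore score lower than communities drawn from distant parts of the tree, which is what makes UniFrac phylogeny-aware.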
10. Statistical testing part 1
• Statistics basics
• Linear models are not always appropriate
• Non-parametric tests (no distributional assumptions)
• Generalized linear models (better-suited underlying distributions)
Link: https://youtu.be/_uDv7LRUUsY
11. Statistical testing part 2
• Statistics basics
• t-test: compare 2 groups
• ANOVA: compare three or more groups
• Correlation: compare against a continuous variable (e.g. age)
• Linear regression: similar to correlation, but you can regress on multiple variables at the same time
• NOTE: all of these assume normal distributions!
• When the residuals of a linear regression are not normally distributed, use a generalized linear model with the negative binomial distribution. This is in the edgeR package in R.
• Use the false discovery rate (FDR) to correct for multiple hypothesis testing.
• If you don't need to control for confounders, non-parametric tests are very safe (although they have lower power than linear models or generalized linear models).
• Two-category test: Mann-Whitney U (Wilcoxon rank-sum) test (like a t-test)
• Multi-category test: Kruskal-Wallis (like ANOVA)
• Continuous test: Spearman correlation (like Pearson correlation)
Link: https://youtu.be/tNxfYqa5Rtc
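Since a microbiome study usually tests hundreds of taxa at once, here is a minimal sketch of the Benjamini-Hochberg FDR correction mentioned above. Each sorted p-value is scaled by m/rank, and a running minimum from the largest p-value downward keeps the adjusted values monotone. In R the same adjustment is available as `p.adjust(p, method = "BH")`; this stand-alone version is only for illustration.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down so adjusted values never increase
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(pvals))
```

Taxa whose adjusted value falls below the chosen FDR level (for example 0.05) are reported as significant.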
12. Visualizing Microbiome Diversity, Ordination
• Visualization with R or QIIME
• PCA
• PCoA
• NMDS
Link: https://youtu.be/H-u2iyiTzj0
13. Detrending and detecting gradients
• Detrending with QIIME
• Detrending does not have strong statistical foundations
• Use detrending to visualize a primary gradient
• Use detrending to test whether your ordination recovered the primary gradient on axis 1
Link: https://youtu.be/aNLPzdfivkM
14. Constrained Ordination
• CCA does direct gradient analysis
• Never use more than 3-4 variates; more will simply overfit the data
• Measure success by the ratio of constrained variance explained to unconstrained variance explained
• Canonical Correspondence Analysis == Constrained Correspondence Analysis
• Not to be confused with canonical correlation analysis
Link: https://youtu.be/wHSECEI2tnQ
15. Clustering
• Use caution with supervised ordination; significance needs to be assessed carefully
• Cluster quality thresholds: prediction strength > 0.9 or silhouette index > 0.5
• Clusters can be a useful way to analyze high-dimensional data
• However, direct analysis is generally better when you have known gradients/groups
• Diagnostics based on direct supervised analysis are generally better
Link: https://youtu.be/ORX968xJqiA
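A minimal sketch of the silhouette index used as a cluster-quality threshold above, assuming a precomputed distance matrix (for example Bray-Curtis) and a list of cluster labels. For each sample, a is its mean distance to its own cluster and b is its mean distance to the nearest other cluster; the sample's silhouette is (b − a)/max(a, b), and the index is the mean over samples. Singleton clusters get 0 by convention. Prediction strength is not shown.

```python
def silhouette_score(dist: list[list[float]], labels: list) -> float:
    """Mean silhouette width over all samples, from a precomputed distance matrix."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    n = len(labels)
    scores = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # singleton cluster convention
            continue
        a = sum(same) / len(same)
        # mean distance to the closest *other* cluster
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / n


points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
print(silhouette_score(dist, [0, 0, 1, 1]))  # well separated, so well above 0.5
```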
16. Supervised Learning Background
• Supervised learning tries to learn a model that will predict outcomes for novel samples
• Example: classify cancer patients to determine the treatment path
• Models have to balance low complexity (underfitting) and high complexity (overfitting)
• Model accuracy should be assessed on separate test data that the model has never seen
• 10-fold cross-validation is standard
Link: https://youtu.be/-eXnrA_3xzA
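A minimal sketch of how 10-fold cross-validation splits the samples, so that every sample is scored exactly once by a model that never saw it during training. `k_fold_indices` is a hypothetical helper; libraries such as scikit-learn provide equivalent, more complete splitters (including stratified versions).

```python
import random
from typing import Iterator


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # deal shuffled samples into k folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))
```

In practice you would fit the model (for example a random forest) on `train`, predict on `test`, and average the accuracy over the 10 folds.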
17. Supervised Learning Applications
• Random forest classification with QIIME
Link: https://youtu.be/ecz5SzP6Z_U
18. Source Tracking
• How source tracking works and how to use SourceTracker
• Microbial source tracking can be done at the community-wide level
• SourceTracker uses Bayesian methods to deconvolute mixtures of communities
• Can identify contributions of individual species from each source environment
• Does not model changes after mixing (temporal dynamics)
• SourceTracker: github.com/danknights/sourcetracker/releases
Link: https://youtu.be/sDevHMuYJ28
19. Compositionality
• Compositionality can lead to spurious and even opposite conclusions
• Dominant bugs can skew the relative abundance of minor bugs
• Correlation is hard to infer
• See SparCC, SPIEC-EASI
• Best to do analysis with absolute abundances when possible
• Spike-ins of foreign bugs and/or qPCR can circumvent this
Link: https://youtu.be/X60nFYpLWRs
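A small worked example of why compositionality misleads, using made-up absolute abundances: only the dominant taxon grows tenfold, yet the two minor taxa appear to collapse in relative abundance. The sketch also shows a centered log-ratio (CLR) transform, under which the ratio between the two unchanged taxa stays the same, and a simplified spike-in rescaling. The pseudocount of 1 and the assumption that reads scale linearly with cell numbers (ignoring 16S copy number and extraction bias) are illustrative choices, not part of the course material.

```python
import math


def relative_abundance(counts: list[float]) -> list[float]:
    """Closure: divide by the total, as sequencing effectively does."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(counts: list[float], pseudocount: float = 1.0) -> list[float]:
    """Centered log-ratio: log(x_i) - mean(log(x)); pseudocount avoids log(0)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def spike_in_absolute(counts: list[float], spike_reads: float, spike_cells_added: float) -> list[float]:
    """Rescale reads to estimated cell numbers using a spike-in of known size
    (assumes reads are proportional to cells)."""
    scale = spike_cells_added / spike_reads
    return [c * scale for c in counts]


# Hypothetical absolute abundances: only taxon 0 grows 10x.
before = [100, 50, 50]
after = [1000, 50, 50]
print(relative_abundance(before))  # [0.5, 0.25, 0.25]
print(relative_abundance(after))   # roughly [0.909, 0.045, 0.045]
```

The minor taxa look like they dropped more than fivefold even though their absolute counts never changed, which is exactly the kind of spurious conclusion the lecture warns about.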
20. PICRUSt and predicting functions
• Shotgun metagenomics can describe the full functional repertoire of a metagenome, but it is expensive
• PICRUSt can produce 80-85% accurate metagenomes from 16S data sets
• Useful for mining published data
• Can be used to select a subset of 16S samples for shotgun sequencing
• Be sure to treat the results as "suggestive only" in publications
• Mostly useful on human gut samples
Link: https://youtu.be/mPQCl_cHCsM
21. Shotgun Taxonomy
• Shotgun metagenomics can be used for identifying species
• Far superior to 16S
• Approaches to shotgun taxonomy:
  • MetaPhlAn and MetaPhlAn2
    • Pre-identify a set of marker genes: genes that are conserved within a species but not elsewhere
    • Requires alignment, but uses a small database
  • Kraken, others
    • Use all unique k-mers as markers
    • Ultrafast, but large database
Link: https://youtu.be/DlQTXdb2rhg
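A toy sketch of the "all unique k-mers as markers" idea: index every k-mer that occurs in exactly one reference genome, then classify a read by majority vote over its k-mers found in the index. This is a deliberate simplification; Kraken itself maps shared k-mers to their lowest common ancestor in a taxonomy tree rather than discarding them, and real indexes use k around 31. The genomes, taxon names and k = 4 below are made up for illustration.

```python
from collections import Counter
from typing import Optional


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_unique_kmer_index(genomes: dict[str, str], k: int) -> dict[str, str]:
    """Map each k-mer to its taxon, keeping only k-mers found in a single taxon."""
    owners: dict[str, set[str]] = {}
    for taxon, seq in genomes.items():
        for km in kmers(seq, k):
            owners.setdefault(km, set()).add(taxon)
    return {km: next(iter(taxa)) for km, taxa in owners.items() if len(taxa) == 1}


def classify_read(read: str, index: dict[str, str], k: int) -> Optional[str]:
    """Majority vote over the read's indexed k-mers; None if nothing matches."""
    votes = Counter(index[km] for km in kmers(read, k) if km in index)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


genomes = {"taxonA": "ACGTACGTGGCC", "taxonB": "TTGACCAGTTGA"}
index = build_unique_kmer_index(genomes, k=4)
print(classify_read("ACGTGGCC", index, k=4))  # taxonA
print(classify_read("CCAGTTGA", index, k=4))  # taxonB
```

Lookups are plain hash-table hits with no alignment, which is why this family of methods is so fast, at the cost of storing a k-mer index for every reference genome.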
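If you've read this far, congratulations, you've found the hidden bonus~ I've re-uploaded the whole series for everyone
Link: https://pan.baidu.com/s/194r0zs5WbcNFQKQrV0Nnkg Password: 0rjr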
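生信技能树 (Biotrainee) has now made three bioinformatics knowledge bases public; remember to follow them~
Weekly paper sharing
https://www.yuque.com/biotrainee/weeklypaper
Tumor exome analysis guide
https://www.yuque.com/biotrainee/wes
Biostatistics from theory to practice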
https://www.yuque.com/biotrainee/biostat