欢迎大家关注全网生信学习者系列:
- WX公zhong号:生信学习者
- Xiao hong书:生信学习者
- 知hu:生信学习者
- CDSN:生信学习者2
介绍
本教程是使用一个Python脚本来绘制马赛克图,用于可视化两个变量的频率分布。
数据
大家通过以下链接下载数据:
- 百度网盘链接:https://pan.baidu.com/s/1f1SyyvRfpNVO3sLYEblz1A
- 提取码: WX公zhong号生信学习者后台发送 复现msm 获取提取码
Python packages required
- Pandas
- SciPy
- Matplotlib
- statsmodels
Drawing a mosaic plot using mosaic_plot.py
使用一个Python脚本mosaic_plot.py
,以及一个包含MSM 和 Non-MSM个体相关的物种的表格,这些物种被识别为革兰氏阴性或非革兰氏阴性,在two_variable_mosaic.tsv: ./data/two_variable_mosaic.tsv
中。
mosaic_plot.py
codes
#!/usr/bin/env python
"""
NAME: mosaic_plot.py
DESCRIPTION: mosaic_plot.py is a python script for visualizing proportions of data points along two variables.
"""
import pandas as pd
from scipy.stats import fisher_exact
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic
import matplotlib
import sys
import argparse
import textwrap
def make_mosaic_plot(two_variable_file, facecolor_dict, output_fig, font_style = "sans-serif,Arial"):
font_family, font_type = font_style.split(",")
matplotlib.rcParams['font.family'] = font_family
matplotlib.rcParams['font.sans-serif'] = font_type
two_variable_df = pd.read_csv(two_variable_file, sep = "t", index_col = False)
features, variable1, variable2 = two_variable_df.columns
cont_df = pd.crosstab(two_variable_df[variable1], two_variable_df[variable2])
res = fisher_exact(cont_df, alternative = "two-sided")
label_dict = {}
for idx in cont_df.index.to_list():
for col in cont_df.columns.to_list():
label_dict[(idx, col)] = cont_df.loc[idx, col]
labelizer = lambda k:label_dict[k]
variable2_0, variable2_1 = sorted(set(two_variable_df[variable2].to_list()))
props = {}
for variable in facecolor_dict:
props[(variable, variable2_0)] = {"facecolor": facecolor_dict[variable], "edgecolor": "white"}
props[(variable, variable2_1)] = {"facecolor": facecolor_dict[variable], "edgecolor": "white"}
mosaic(two_variable_df, [variable1, variable2], labelizer = labelizer, properties = props, title = " P-value: " str(res[1]) " (Fisher's exact test)")
plt.savefig(output_fig)
if __name__ == "__main__":
def read_args(args):
# This function is to parse arguments
parser = argparse.ArgumentParser(formatter_class=argparse.RawDescriptionHelpFormatter,
description = textwrap.dedent('''
This program is to draw a mosaic plot.
'''),
epilog = textwrap.dedent('''
examples: mosaic_plot.py --input input_file.tsv --facecolor_map facecolor_mapfile.tsv --output mosaic_plot.png
'''))
parser.add_argument('--input',
nargs = '?',
help = 'Input a file containing two variable information regarding each individual subject.',
type = str,
default = None)
parser.add_argument('--facecolor_map',
nargs = '?',
help = 'Specify the the pathway to SCFA metabolisms database. default: /vol/projects/khuang/databases/SCFA/SCFA_pathways.tsv',
default = '/vol/projects/khuang/databases/SCFA/SCFA_pathways.tsv')
parser.add_argument('--font_style',
nargs = '?',
help = 'Specify the font style, font family and font type is delimited by a comma. default: [sans-serif,Arial]',
default = 'sans-serif,Arial')
parser.add_argument('--output',
nargs = '?',
help = 'Specify the output figure name.',
type = str,
default = None)
return vars(parser.parse_args())
pars = read_args(sys.argv)
facecolor_dict = {i.rstrip().split("t")[0]: i.rstrip().split("t")[1] for i in open(pars['facecolor_map']).readlines()}
make_mosaic_plot(pars["input"], facecolor_dict , pars["output"], font_style = pars["font_style"])
- Usage:
mosaic_plot.py [-h] [--input [INPUT]] [--facecolor_map [FACECOLOR_MAP]] [--font_style [FONT_STYLE]] [--output [OUTPUT]]
This program is to draw a mosaic plot.
optional arguments:
-h, --help show this help message and exit
--input [INPUT] Input a file containing two variable information regarding each individual subject.
--facecolor_map [FACECOLOR_MAP]
Specify the the pathway to SCFA metabolisms database. default: /vol/projects/khuang/databases/SCFA/SCFA_pathways.tsv
--font_style [FONT_STYLE]
Specify the font style, font family and font type is delimited by a comma. default: [sans-serif,Arial]
--output [OUTPUT] Specify the output figure name.
examples:
python mosaic_plot.py --input input_file.tsv --facecolor_map facecolor_mapfile.tsv --output mosaic_plot.png
示例命令:
代码语言:javascript复制python mosaic_plot.py
--input two_variable_mosaic.tsv
--facecolor_map facecolor_map.tsv
--output mosaic_plot.png
Note
马赛克图的面颜色应该按照示例中的映射文件mapping file: ./data/facecolor_map.tsv
来指定。