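01. Introduction
I have recently started using R's ggplot2 for charting, and I do not want to neglect Python visualization either, so in upcoming posts I will try to provide tutorials for both. Thanks to ggplot2's rich set of extension packages, this post walks through drawing a chart in the style of The Economist.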
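02. Visualization with ggplot2
In the visualization part we go light on data processing; a dedicated series of tutorials on that will follow. The data needed for this post looks as follows (excerpt):
The plotting code is as follows: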
library(tidyverse)
library(ggthemes)
library(cowplot)
p <- df_research %>%
  ggplot(aes(percent_women_ed, field)) +
  # Reference lines at 0% and at parity (50%)
  geom_vline(xintercept = 0, color = "black", size = 0.7) +
  geom_vline(xintercept = 50, color = "red1", size = 0.7) +
  # Highlighted points: rows with a non-missing type
  geom_point(data = filter(df_research, !is.na(type)), aes(color = type, fill = type),
             size = 9, alpha = 0.8, shape = 21) +
  # Background points: rows with a missing type
  geom_point(data = filter(df_research, is.na(type)), size = 4.5,
             color = "black", fill = "grey40", alpha = 0.4, shape = 21) +
  scale_x_continuous(limits = c(0, 65), breaks = seq(0, 60, by = 10), expand = c(0, 0)) +
  scale_color_manual(values = c("firebrick4", "turquoise4", "dodgerblue3"), name = NULL) +
  scale_fill_manual(values = c("firebrick4", "turquoise4", "dodgerblue3"), name = NULL) +
  guides(color = guide_legend(override.aes = list(size = 4))) +
  labs(x = NULL, y = NULL,
       caption = '\nSources: "Gender in the Global Research Landscape" by Elsevier') +
  # Use the Economist theme
  theme_economist() +
  theme(text = element_text(family = "Open_Sans"),
        axis.text = element_text(size = 14),
        axis.ticks.x = element_blank(),
        axis.line.x = element_blank(),
        legend.text = element_text(size = 11),
        legend.position = "top",
        plot.caption = element_text(color = "grey40"),
        plot.background = element_rect(fill = "#dcf0f7"),
        panel.grid.major.y = element_blank(),
        panel.grid.major.x = element_line(color = "grey70", size = 0.4),
        panel.background = element_rect(fill = "#dcf0f7"))
p
We then use ggdraw() and draw_text() from the cowplot package to add some text elements, as follows:
p_research <- ggdraw(p) +
  draw_text("Still a man's world", x = 0.02, y = 0.98,
            hjust = 0, vjust = 1, size = 20, family = "Open_Sans_ExtraBold") +
  draw_text("Women among researchers with papers published",
            x = 0.02, y = 0.93, hjust = 0, vjust = 1, size = 14, family = "Open_Sans") +
  draw_text("(indexed in Scopus from 2011 to 2015, % of total)",
            x = 0.02, y = 0.88, hjust = 0, vjust = 1, size = 11, family = "Open_Sans")
p_research
Note that the data were subset inside the plotting calls themselves, as follows:
data = filter(df_research, !is.na(type))
data = filter(df_research, is.na(type))
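The code is straightforward: it selects the rows where type is not missing and the rows where it is missing. The final visualization is as follows:
As you can see, switching to a different theme in ggplot2 is very convenient: load the package and call the theme function. In Python this is somewhat more tedious, and the chart has to be refined piece by piece.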
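To make the "piece by piece" remark concrete, here is a minimal sketch of how the styling steps could be bundled into a reusable matplotlib style. The dictionary name ECONOMIST_LIKE and the sample points are hypothetical; the colours simply mirror the background used in this post and are not an official palette.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical style dictionary (an assumption, not part of any library)
ECONOMIST_LIKE = {
    "figure.facecolor": "#DCF0F7",
    "axes.facecolor": "#DCF0F7",
    "axes.grid": True,
    "axes.grid.axis": "x",
    "axes.spines.top": False,
    "axes.spines.right": False,
    "axes.spines.left": False,
    "axes.spines.bottom": False,
    "xtick.bottom": False,
    "ytick.left": False,
}

# rc_context applies the settings only inside the with-block
with plt.rc_context(ECONOMIST_LIKE):
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.scatter([10, 30, 45], [0, 1, 2], s=80)  # made-up points
    fig.canvas.draw()  # render while the style is still active
```

This is still more manual than calling theme_economist(), but it keeps the styling in one place so it can be reused across figures.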
03. Visualization with Seaborn
Using Python's seaborn here saves many tedious steps. I switched to seaborn after trying, without success, to draw this chart with matplotlib alone. Here is the code:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# df_research is assumed to be loaded already as a pandas DataFrame
# Font settings
plt.rcParams['font.family'] = ['Open Sans']
# Start plotting
fig,ax = plt.subplots(figsize=(4,4),dpi=200,facecolor='#DCF0F7',edgecolor='#DCF0F7')
ax.set_facecolor("#DCF0F7")
# Add vertical reference lines
ax.axvline(x=0,color='k',lw=.9,zorder=0)
ax.axvline(x=50,color='#FF0000',lw=1.5,zorder=0)
ax.invert_yaxis()
# Color settings
palette = ['#9B4546','#2C9BA1','#3F8DD5']
scatter = sns.scatterplot(x='percent_women_ed',y='field',hue='country',
                          data=df_research[df_research['type'].isin(['Japan','EU28','Portugal'])],
                          palette=palette,s=400,
                          ec='none',alpha=.95,zorder=1,ax=ax)
scatter_na = sns.scatterplot(x='percent_women_ed',y='field',color='gray',s=100,ec='k',linewidth=.4,alpha=.4,
                             data=df_research[~df_research['type'].isin(['Japan','EU28','Portugal'])],
                             ax=ax,zorder=2)
# Custom styling
ax.tick_params(left=False,bottom=False,labelsize=13,pad=10)
for spine in ['top','bottom','left','right']:
    ax.spines[spine].set_color("none")
ax.grid(which='major',axis='x',ls='-',c='gray',alpha=.6,lw=.5)
ax.set_xlabel("")
ax.set_ylabel("")
ax.set_xticks(np.arange(0,70,10))
ax.set_xlim(left=0,right=65)
ax.set_axisbelow(True)
# Adjust the legend
legend = ax.legend(frameon=False,ncol=4,markerscale=1.5,loc='upper right',fontsize=10,
                   bbox_to_anchor=(1, 1.15),columnspacing=.2)
# Remove the legend title
legend.get_texts()[0].set_text('')
ax.text(-.62,1.3, "Still a man's world",transform = ax.transAxes,
        ha='left', va='center',fontsize = 18,color='black',fontweight='bold')
ax.text(-.62,1.2, "Women among researchers with papers published\n(indexed in Scopus from 2011 to 2015, % of total)",
        transform = ax.transAxes,ha='left', va='center',fontsize = 12,color='black')
ax.text(.91,-.12,'\nVisualization by DataCharm',transform = ax.transAxes,
        ha='center', va='center',fontsize = 6,color='black')
# Adjust the output path for your system; savefig takes no width/height arguments
plt.savefig(r'F:DataCharmArtist_charts_make_python_Rwomen_researchawomen_research_seaborn.png',
            dpi=900,bbox_inches='tight',facecolor='#DCF0F7')
Key points:
(1) Data filtering:
df_research[df_research['type'].isin(['Japan','EU28','Portugal'])]
# and
df_research[~df_research['type'].isin(['Japan','EU28','Portugal'])]
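This step selects the rows whose value appears in a given list of strings, a very common data operation that is worth keeping in mind. You could also use the str.contains() method for pattern matching. In addition, the ~ operator is used here to invert the selection.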
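The filtering options mentioned above can be compared side by side on a small example. This is only a sketch on made-up data: the column names follow the post, but the values and the variable names (df, targets, and so on) are hypothetical.

```python
import pandas as pd

# Toy data (hypothetical values); only the column names follow the post
df = pd.DataFrame({
    "field": ["Computer science", "Physics", "Nursing", "Physics"],
    "type": ["Japan", "EU28", None, "Portugal"],
    "percent_women_ed": [10.0, 25.0, 60.0, 40.0],
})

targets = ["Japan", "EU28", "Portugal"]

in_list = df[df["type"].isin(targets)]       # rows whose type is in the list
not_in_list = df[~df["type"].isin(targets)]  # ~ negates the boolean mask

# str.contains matches a regular expression; na=False counts missing values as no match
by_pattern = df[df["type"].str.contains("Japan|EU28|Portugal", na=False)]

# pandas counterpart of the R split filter(!is.na(type)) / filter(is.na(type))
has_type = df[df["type"].notna()]
no_type = df[df["type"].isna()]
```

On this toy data, isin() and str.contains() select the same three rows, and the inverted mask picks out the single row with a missing type.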
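(2) Categorical data on the axis
The y-axis here holds categorical data, as shown below:
This is fairly cumbersome to draw with matplotlib alone, whereas seaborn handles it cleanly, which shows the benefit of the higher-level sns.scatterplot().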
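For comparison, here is a sketch of one manual route in plain matplotlib: map each category to an integer position and label the ticks yourself, which is roughly the bookkeeping that sns.scatterplot() takes care of. The data values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data (hypothetical values)
df = pd.DataFrame({
    "field": ["Physics", "Nursing", "Chemistry", "Physics"],
    "percent_women_ed": [20.0, 60.0, 35.0, 25.0],
})

# Map each category to an integer position, in order of first appearance
codes, categories = pd.factorize(df["field"])

fig, ax = plt.subplots(figsize=(4, 3))
ax.scatter(df["percent_women_ed"], codes, s=80)
ax.set_yticks(range(len(categories)))  # one tick per category
ax.set_yticklabels(categories)         # show the category names
ax.invert_yaxis()                      # first category at the top
```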
(3) Legend settings
# Adjust the legend
legend = ax.legend(frameon=False,ncol=4,markerscale=1.5,loc='upper right',fontsize=10,
                   bbox_to_anchor=(1, 1.15),columnspacing=.2)
# Remove the legend title
legend.get_texts()[0].set_text('')
Configuring the seaborn legend cost me a great deal of time, but by now I can customize it reasonably well.
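Blanking the first legend text assumes that the first entry is the hue title, which may depend on the seaborn version in use. A sketch of an alternative that does not rely on entry order is to rebuild the legend from the handles and labels you want to keep. The example below uses plain matplotlib with made-up points; the set named wanted is a hypothetical helper.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 4))
# Hypothetical points, one labelled scatter call per group
groups = [("Japan", 15, "#9B4546"), ("EU28", 40, "#2C9BA1"), ("Portugal", 48, "#3F8DD5")]
for label, x, color in groups:
    ax.scatter([x], [0], s=400, color=color, label=label)

handles, labels = ax.get_legend_handles_labels()
# Keep only the real group entries, whatever else the legend may contain
wanted = {"Japan", "EU28", "Portugal"}
pairs = [(h, l) for h, l in zip(handles, labels) if l in wanted]

legend = ax.legend(
    [h for h, _ in pairs], [l for _, l in pairs],
    frameon=False, ncol=3, markerscale=0.5, loc="upper right",
    fontsize=10, bbox_to_anchor=(1, 1.15), columnspacing=0.2,
)
```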
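The visualization result is as follows:
04. Summary
R's ggplot2 and Python's seaborn each have their own plotting characteristics. To be honest, ggplot2 offers a corresponding function for almost every chart element, which makes it quite convenient to work with. On balance, each has its merits.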