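Tags: Python, wordcloud
This article shows how to create a word cloud in Python. A word cloud is a visualization of text data in which some words appear larger and bolder while others appear smaller. Typically, the more often a word occurs in the text, the larger it is drawn.
First, install the following libraries: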
pip install wordcloud numpy matplotlib pillow
The text comes from Tesla's 2021 Impact Report and describes the company's goals. It reads as follows:
Tesla’s purpose is to accelerate the world’s transition to sustainable energy.
We strive to be the best on every metric relevant to our mission to accelerate the world’s transition to
sustainable energy. To maximize our impact, we plan to continue increasing our production volumes and the
accessibility of our products. In more concrete terms, this means that by 2030 we are aiming to sell 20 million
electric vehicles per year (compared to 0.94 million in 2021) and deploy 1,500 GWh of energy storage per year
(compared to 4 GWh in 2021).
If we were to achieve such a vehicle delivery milestone through a consistent growth rate, the total Tesla vehicle
fleet would surpass tens of millions of vehicles by 2030, and each of those vehicles could save tons of CO2e
emissions every year of usage.
Furthermore, each product we make must be continuously improved at each step of its lifecycle: from
manufacturing to consumer use to recycling.
We must also improve every metric, including the energy and water used to make our products, how safe our
customers and employees are and the affordability and accessibility of our products. Each of these themes will
be covered in this year’s Impact Report.
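The code later in this article passes this passage to the library under the name text_data, which the snippets never define. The following is a minimal sketch, assuming the passage is stored in a plain Python string (only the opening sentences are reproduced here). The rough count with collections.Counter only illustrates the "more mentions, bigger word" idea; it is not the tokenization that the wordcloud library performs internally.

```python
import re
from collections import Counter

# Assumption: text_data holds the report excerpt as one string.
# Only the opening sentences are reproduced to keep the sketch short.
text_data = (
    "Tesla's purpose is to accelerate the world's transition to sustainable energy. "
    "We strive to be the best on every metric relevant to our mission to accelerate "
    "the world's transition to sustainable energy."
)

# Rough frequency count: lowercase the text and keep tokens longer than 3 letters.
words = [w for w in re.findall(r"[a-z']+", text_data.lower()) if len(w) > 3]
word_counts = Counter(words)

# The most frequent words are the ones a word cloud would tend to draw largest.
print(word_counts.most_common(5))
```

In practice you would paste the full passage into text_data, or read it from a file with open(...).read().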
Import the required libraries:
from wordcloud import WordCloud
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
The wordcloud library is quite easy to use: a single line of Python creates the word cloud.
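# text_data is a Python string containing the passage above
wc = WordCloud().generate(text_data)
plt.axis('off')  # hide the axes (optional)
plt.imshow(wc)   # draw the word cloud
plt.show()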
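plt.axis('off') hides the axes; it is optional and only improves the appearance. plt.imshow() then displays the word cloud.
Each time WordCloud().generate() runs, the color and position of each word are chosen at random. The result is shown in Figure 1 below.
Figure 1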
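To make the word cloud more interesting, the words can be arranged into any shape, not just a rectangle.
A black-and-white image is recommended: it gives the best results and needs no extra processing. Below is a picture of the Apple logo, but feel free to use any image you like.
Figure 2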
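Use the Pillow library to read the image into Python. To a computer, an image is just a matrix of integers from 0 to 255. The numpy library makes it easy to convert a Pillow image object into an np.array object. Each pixel is stored as a triple of RGB values: [0,0,0] is black and [255,255,255] is white.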
img_url = r'D:\test\apple.png'  # path to the mask image; adjust to your own file
img_mask = np.array(Image.open(img_url))
Figure 3
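To see the "matrix of integers" idea without needing an image file, here is a small sketch that builds a tiny synthetic image in memory (an assumption made purely for illustration) and converts it to an array the same way:

```python
import numpy as np
from PIL import Image

# Synthetic stand-in for a picture file: 4 px wide, 3 px high, all white.
img = Image.new("RGB", (4, 3), "white")
img.putpixel((0, 0), (0, 0, 0))  # paint the top-left pixel black

pixels = np.array(img)
print(pixels.shape)  # (rows, columns, color channels)
print(pixels.dtype)  # 8-bit unsigned integers, so values range over 0..255
print(pixels[0, 0], pixels[0, 1])  # the black pixel, then a white neighbor
```

Note that the array is indexed as [row, column], while Pillow's putpixel takes (x, y).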
plt.imshow(img_mask)
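Figure 4
Note in Figure 4 above that the apple shape is black and the background is white, which is exactly what we want. The white area is the "mask". The wordcloud library draws nothing in the (white) masked area and instead arranges the words inside the shape of the Apple logo.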
# width/height are ignored when a mask is given; the mask's shape sets the size
wc = WordCloud(width=1600, height=1600, mask=img_mask,
               background_color='white').generate(text_data)
plt.figure(figsize=[10,10])
plt.axis("off")
plt.imshow(wc)
Figure 5
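The library treats only pure white pixels as masked, so a picture with gray or off-white areas may need cleaning before it works well as a mask. The following is a minimal sketch, assuming a threshold of 128 (an arbitrary choice) and using a synthetic gradient in place of a real image file:

```python
import numpy as np
from PIL import Image

# Synthetic grayscale gradient standing in for an imperfect logo image:
# each row runs from 0 (black) on the left to 255 (white) on the right.
gradient = np.tile(np.arange(256, dtype=np.uint8), (10, 1))
img = Image.fromarray(gradient).convert("L")  # "L" = 8-bit grayscale

# Assumed threshold: values at or above 128 become pure white (masked out),
# everything else becomes pure black (available for words).
THRESHOLD = 128
gray = np.array(img)
clean_mask = np.where(gray >= THRESHOLD, 255, 0).astype(np.uint8)
print(np.unique(clean_mask))
```

With a real picture, Image.open(img_url).convert("L") would replace the synthetic gradient, and clean_mask could be passed as the mask argument.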
If the shape is not distinct enough, you can also draw an outline (contour) around it. Just pass the contour_width and contour_color parameters to the WordCloud() constructor:
wc = WordCloud(width=1600, height=1600, mask=img_mask,
               background_color='white',
               contour_width=1,
               contour_color='red').generate(text_data)
plt.figure(figsize=[10,10])
plt.axis("off")
plt.imshow(wc)
Figure 6
Note: This article is adapted from material on pythoninoffice.com and is offered as a study reference for interested readers.