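Preface
Yesterday we covered drawing a word cloud from English text. Today let's try a Chinese one. First we need a copy of the Tao Te Ching (道德经).
Reading the file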
# -*- coding: utf-8 -*-
# Raw string so the backslashes in the Windows path are not read as escape sequences
with open(r'C:\Users\Administrator\Desktop\daode.txt', errors='ignore') as read_file:  # read the whole text
    data = read_file.read()
print(data)
But what can we do with that raw dump? Better to read the file line by line into a single string.
data = ''
with open(r'C:\Users\Administrator\Desktop\daode.txt', errors='ignore') as f:  # read the text line by line into one str
    for line in f.readlines():
        line = line.strip()  # drop the newline and surrounding whitespace
        data += line  # append each stripped line
print(data)
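The `open` calls above rely on the platform's default encoding. If the file's encoding is known, passing it explicitly is more predictable. Here is a minimal sketch, assuming the text is saved as UTF-8 (the post does not say which encoding `daode.txt` uses); `read_text_joined` is a hypothetical helper name, not part of any library.

```python
def read_text_joined(path, encoding="utf-8"):
    """Read a text file and join its stripped lines into one string."""
    # encoding="utf-8" is an assumption; pass "gbk" or similar if the file was saved differently
    with open(path, encoding=encoding, errors="ignore") as f:
        return "".join(line.strip() for line in f)
```

Calling `read_text_joined(r'C:\Users\Administrator\Desktop\daode.txt')` would then give the same kind of single string as the loop above.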
Removing punctuation
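from string import punctuation
text = data  # avoid naming the variable `str`, which would shadow the built-in type
add_punc = ',。、【】“”:;()《》‘’{}?!⑦()、%^>℃:.”“^-——=擅长于的&#@¥'  # extra characters to strip, on top of ASCII punctuation
all_punc = punctuation + add_punc
temp = []
for c in text:
    if c not in all_punc:  # keep only characters that are not in the removal set
        temp.append(c)
newText = ''.join(temp)
print(newText)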
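The same filtering can also be done in a single pass with a translation table. This is only a sketch of an alternative, not the approach used above; `strip_chars` and its `extra` parameter are hypothetical names introduced for illustration.

```python
from string import punctuation

def strip_chars(text, extra=""):
    """Remove ASCII punctuation plus any characters listed in `extra`."""
    # The third argument of str.maketrans lists characters to delete
    table = str.maketrans("", "", punctuation + extra)
    return text.translate(table)

print(strip_chars("道可道,非常道。", extra=",。"))  # prints 道可道非常道
```

Passing the `add_punc` string from above as `extra` should produce the same result as the character-by-character loop.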
Removing digits
from string import digits
s = newText
remove_digits = str.maketrans('', '', digits)  # translation table that deletes the ASCII digits 0-9
res = s.translate(remove_digits)
print(res)
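One caveat: `string.digits` only contains the ASCII characters `0` to `9`, so full-width digits such as `1`, which are common in Chinese text, would survive the step above. A minimal sketch that relies on `str.isdigit` instead; `drop_digits` is a hypothetical helper name.

```python
def drop_digits(text):
    """Drop every character that Python's str.isdigit classifies as a digit."""
    # Covers ASCII and full-width digits; Chinese numerals like 一 are kept
    return "".join(ch for ch in text if not ch.isdigit())
```

Whether this broader filter is wanted depends on the text; for a classical text like this one the ASCII version may already be enough.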
Word segmentation with jieba
import jieba
mytext = " ".join(jieba.cut(res))  # segment the text and join the words with spaces
print(mytext)
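Since the segmented text is just words separated by spaces, a quick frequency count gives a rough idea of which words are likely to dominate the picture. This is a sketch for inspection only and says nothing about how wordcloud counts internally; `top_words` and its `min_len` cutoff (skipping single-character tokens) are assumptions introduced here.

```python
from collections import Counter

def top_words(segmented, n=10, min_len=2):
    """Return the n most frequent space-separated words of at least min_len characters."""
    words = [w for w in segmented.split() if len(w) >= min_len]
    return Counter(words).most_common(n)

print(top_words("天下 万物 天下 之 道", n=1))  # prints [('天下', 2)]
```

With the real data, `top_words(mytext)` would list the ten most frequent multi-character words.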
Visualization
import wordcloud
c = wordcloud.WordCloud(background_color='white')  # 1. configure the object: white background
c.generate(mytext)  # 2. load the segmented text
c.to_file("pywordcloud.png")  # 3. write the word cloud image
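Stumped by the result, friend? That's because wordcloud's default font has no Chinese glyphs, so we need to point its font_path parameter at a font that does.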
import wordcloud
c = wordcloud.WordCloud(font_path="msyh.ttc", background_color='white')  # 1. configure the object: a font with Chinese glyphs (Microsoft YaHei), white background
c.generate(mytext)  # 2. load the segmented text
c.to_file("pywordcloud.png")  # 3. write the word cloud image
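`msyh.ttc` (Microsoft YaHei) ships with Windows, so on other systems a different font file is needed. One way to handle that is to pick the first font file that actually exists from a list of candidates. A minimal sketch; the helper `first_existing_font` is hypothetical, and the candidate paths are assumptions that may not match your machine.

```python
import os

def first_existing_font(candidates):
    """Return the first path in `candidates` that is an existing file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None

# Assumed locations of fonts with Chinese glyphs; adjust for your own system
CANDIDATE_FONTS = [
    r"C:\Windows\Fonts\msyh.ttc",
    "/System/Library/Fonts/PingFang.ttc",
    "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",
]
font_path = first_existing_font(CANDIDATE_FONTS)
print(font_path)
```

If `font_path` is not `None`, it can be passed as the `font_path` argument of `WordCloud` in place of `"msyh.ttc"`.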