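Teacher Gu's new book, 《全栈软件测试工程师宝典》 (The Full-Stack Software Test Engineer's Handbook):
https://item.m.jd.com/product/10023427978355.html
Where to buy the two earlier books online:
《软件测试技术实战设计、工具及管理》 (Software Testing Techniques in Practice: Design, Tools and Management):
https://item.jd.com/34295655089.html
《基于Django的电子商务网站》 (A Django-Based E-Commerce Website):
https://item.jd.com/12082665.html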
```python
# coding:utf-8
import numpy as np
import pandas as pd
```
1 Initializing the data
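```python
def init_data():
    # Random sample data: np.random.randn differs on every run,
    # so your output will not match the numbers shown below.
    df = pd.DataFrame({'Key1': ['0', '1', '1', '0', '1', '0', '1', '1', '0'],
                       'Key2': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'B'],
                       'Data1': np.random.randn(9),
                       'Data2': np.random.randn(9)})
    # 5x5 table indexed by people's names, with columns A to E
    people = pd.DataFrame(np.random.randn(5, 5), columns=['A', 'B', 'C', 'D', 'E'],
                          index=['张三', '李四', '王五', '赵六', '田七'])
    return [df, people]
```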
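Because init_data() draws from np.random.randn, every run prints different numbers. If you want repeatable output while experimenting, one option is to draw the numbers from a seeded generator. This is a minimal sketch of my own, not the article's code; init_data_seeded and its seed parameter are hypothetical names.

```python
import numpy as np
import pandas as pd


def init_data_seeded(seed=0):
    """Hypothetical variant of init_data() that returns the same data for the same seed."""
    rng = np.random.default_rng(seed)  # seeded generator, independent of np.random's global state
    df = pd.DataFrame({'Key1': ['0', '1', '1', '0', '1', '0', '1', '1', '0'],
                       'Key2': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'B'],
                       'Data1': rng.standard_normal(9),
                       'Data2': rng.standard_normal(9)})
    people = pd.DataFrame(rng.standard_normal((5, 5)),
                          columns=['A', 'B', 'C', 'D', 'E'],
                          index=['张三', '李四', '王五', '赵六', '田七'])
    return [df, people]


first = init_data_seeded(42)
second = init_data_seeded(42)
# Same seed, same numbers: prints True
print(first[0].equals(second[0]) and first[1].equals(second[1]))
```

Using a local Generator rather than np.random.seed keeps the seeding confined to this function.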
2 Common ways to group
2.1 Grouping by columns as keys
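```python
def group_by_column(df):
    # Group one column by another; mean() computes the average of each group
    print(df['Data1'].groupby(df['Key1']).mean())
    # Group by several columns: the result has a MultiIndex
    print(df['Data1'].groupby([df['Key1'], df['Key2']]).mean())
    # Group the whole DataFrame; numeric_only=True skips the non-numeric Key2 column
    # (pandas 2.0 and later raise a TypeError on it otherwise)
    print(df.groupby(df['Key1']).mean(numeric_only=True))
```

Output:

```
Key1
0    0.514210
1   -1.061533
Name: Data1, dtype: float64
Key1  Key2
0     A       1.213983
      B      -0.185563
1     A      -1.106163
      B      -0.994589
Name: Data1, dtype: float64
         Data1     Data2
Key1
0     0.514210  0.077754
1    -1.061533 -0.496585
```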
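Instead of passing the Series df['Key1'], you can pass column names as strings and then select the column to aggregate; for columns of the same DataFrame this gives the same result. The small, fixed DataFrame below is made up for illustration so the numbers can be checked by hand, and agg() is shown as one way to get several statistics per group.

```python
import pandas as pd

# Fixed toy data (illustrative only) so the result is predictable
df = pd.DataFrame({'Key1': ['0', '1', '1', '0'],
                   'Key2': ['A', 'B', 'A', 'B'],
                   'Data1': [1.0, 2.0, 4.0, 3.0]})

by_series = df['Data1'].groupby(df['Key1']).mean()  # style used in the article
by_name = df.groupby('Key1')['Data1'].mean()         # same grouping by column name

print(by_name)
# Key1 '0' -> (1.0 + 3.0) / 2 = 2.0, Key1 '1' -> (2.0 + 4.0) / 2 = 3.0
print(by_series.equals(by_name))  # True: both group Data1 by Key1

# Several key columns and several statistics at once
per_pair = df.groupby(['Key1', 'Key2'])['Data1'].agg(['mean', 'count'])
print(per_pair)
```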
2.2 Grouping by arrays and lists as keys
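```python
def group_by_array_list(df):
    # Keys need not be columns: any arrays as long as the data work
    city = np.array(['北京', '上海', '广州', '北京', '上海', '广州', '北京', '上海', '广州'])
    year = np.array(['2018', '2019', '2020', '2018', '2019', '2020', '2018', '2019', '2020'])
    # A list of arrays groups by every (city, year) combination that occurs
    print(df['Data1'].groupby([city, year]).mean())
```

Output:

```
上海  2019    0.748117
北京  2018    0.393102
广州  2020    0.384081
Name: Data1, dtype: float64
```

Only three groups appear because each city always occurs with the same year: 北京 (Beijing) with 2018, 上海 (Shanghai) with 2019 and 广州 (Guangzhou) with 2020.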
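A small, fixed example makes the behavior easier to check by hand: when the city and year arrays vary independently, every observed combination becomes its own group. The numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: the grouping keys live outside the Series
data = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
city = np.array(['北京', '上海', '北京', '上海', '北京', '上海'])
year = np.array(['2018', '2018', '2019', '2019', '2019', '2019'])

# One array: average per city
per_city = data.groupby(city).mean()
print(per_city)
# 北京: (1 + 3 + 5) / 3 = 3.0, 上海: (2 + 4 + 6) / 3 = 4.0

# Two arrays: one group per (city, year) pair that actually occurs
pairs = data.groupby([city, year]).mean()
print(pairs)
# (北京, 2018): 1.0, (北京, 2019): (3 + 5) / 2 = 4.0
# (上海, 2018): 2.0, (上海, 2019): (4 + 6) / 2 = 5.0
```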
2.3 Grouping with a dict and a Series
```python
def group_by_dict(people):
    # Put two NaN values into row 王五 (columns B and C); this modifies the caller's frame
    people.iloc[2:3, [1, 2]] = np.nan
    print(people)
    # Map each column to a colour: 红 = red, 蓝 = blue, 绿 = green
    mapping = {'A': '红', 'B': '蓝', 'C': '蓝', 'D': '红', 'E': '蓝', 'F': '绿'}
    # groupby(..., axis=1) is deprecated since pandas 2.1, so group the transposed
    # frame by row label and transpose back. Keys with no matching column ('F') are ignored.
    by_column = people.T.groupby(mapping)
    print(by_column.sum().T)
    # A Series works as a mapping too
    map_series = pd.Series(mapping)
    print(map_series)
    print(people.T.groupby(map_series).sum().T)
```

Output:

```
           A         B         C         D         E
张三  1.310737 -0.999188  1.350368  0.038247 -1.731970
李四 -0.199826  0.157752 -0.722816 -2.521777 -1.088693
王五  2.149382       NaN       NaN -0.580666  0.007590
赵六  0.416690 -0.602955  1.075470  0.570869  2.686723
田七 -1.244776  0.244324  1.479028  0.322721 -0.316716
           红         蓝
张三  1.348984 -1.380790
李四 -2.721603 -1.653758
王五  1.568717  0.007590
赵六  0.987559  3.159237
田七 -0.922055  1.406636
A    红
B    蓝
C    蓝
D    红
E    蓝
F    绿
dtype: object
           红         蓝
张三  1.348984 -1.380790
李四 -2.721603 -1.653758
王五  1.568717  0.007590
赵六  0.987559  3.159237
田七 -0.922055  1.406636
```
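The same idea on a tiny, fixed table (values made up for illustration) shows exactly what happens: each column is routed to the group named by its dict value, keys without a matching column are ignored, and sum() skips NaN. The transpose trick is used instead of axis=1, which pandas 2.1 deprecated.

```python
import numpy as np
import pandas as pd

# Illustrative 2x3 table; column B contains a NaN
people = pd.DataFrame({'A': [1.0, 2.0], 'B': [10.0, np.nan], 'C': [100.0, 200.0]},
                      index=['张三', '李四'])
mapping = {'A': '红', 'B': '蓝', 'C': '蓝', 'F': '绿'}  # 'F' matches no column

# Group the columns by colour: transpose, group rows by the dict, transpose back
by_colour = people.T.groupby(mapping).sum().T
print(by_colour)
# 红 is just A; 蓝 is B + C, with the NaN skipped by sum()
# 张三: 红 1.0, 蓝 110.0
# 李四: 红 2.0, 蓝 200.0

# A Series with the same content gives the same result
print(by_colour.equals(people.T.groupby(pd.Series(mapping)).sum().T))
```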
2.4 Grouping with a function
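```python
def group_by_func(people):
    # The function is called once per index label; len() gives the length of each name
    print(people.groupby(len).sum())
    # Same idea for column labels (transpose instead of the deprecated axis=1)
    print(people.T.groupby(len).sum().T)
```

Output:

```
          A        B         C         D         E
2 -1.849514  2.00963 -2.590456 -0.215472  1.075749
           1
张三 -1.361778
李四 -3.657590
王五  2.843443
赵六 -1.766513
田七  2.372376
```

Every name in the index has two characters and every column label has one, so the first call puts all rows into group 2 and the second call puts all columns into group 1.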
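With labels of equal length, as in the article's data, the function puts everything into one group. A fixed example with names of different lengths (made up for illustration) shows the grouping more clearly, and also that a function can be mixed with arrays in a list of keys.

```python
import numpy as np
import pandas as pd

# Illustrative data: names of different lengths
scores = pd.Series([1, 2, 3, 4], index=['Ann', 'Bob', 'Carol', 'Dave'])

# len is called on every index label: 3 -> Ann, Bob; 4 -> Dave; 5 -> Carol
by_len = scores.groupby(len).sum()
print(by_len)  # 3: 1 + 2 = 3, 4: 4, 5: 3

# A function can be combined with an array in a list of keys
team = np.array(['x', 'y', 'x', 'x'])
mixed = scores.groupby([len, team]).sum()
print(mixed)  # (3, x): 1, (3, y): 2, (4, x): 4, (5, x): 3
```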
3 Aggregation
3.1 Basic aggregation
```python
def polymerization(df):
    # Note: the df argument is not used; a new DataFrame keyed by name is built instead
    df = pd.DataFrame({'Key': ['张三', '张三', '张三', '李四', '李四', '李四', '王五', '王五', '王五'],
                       'Data': np.random.randn(9)})
    print("df:\n", df)
    grouped = df['Data'].groupby(df['Key'])  # group Data by Key once, reuse below
    print("Count of non-null values:\n", grouped.count())
    print("Sum of non-null values:\n", grouped.sum())
    print("Mean of non-null values:\n", grouped.mean())
    print("Median of non-null values:\n", grouped.median())
    print("Standard deviation:\n", grouped.std())
    print("Variance:\n", grouped.var())
    print("Minimum of non-null values:\n", grouped.min())
    print("Maximum of non-null values:\n", grouped.max())
    print("Product of non-null values:\n", grouped.prod())
    print("First non-null value:\n", grouped.first())
    print("Last non-null value:\n", grouped.last())
```

Output:

```
df:
   Key      Data
0   张三  0.115438
1   张三 -0.993400
2   张三  0.774209
3   李四 -0.580589
4   李四 -1.407389
5   李四  1.517611
6   王五 -0.246851
7   王五  0.141326
8   王五 -0.375641
Count of non-null values:
Key
张三    3
李四    3
王五    3
Name: Data, dtype: int64
Sum of non-null values:
Key
张三   -0.103753
李四   -0.470368
王五   -0.481166
Name: Data, dtype: float64
Mean of non-null values:
Key
张三   -0.034584
李四   -0.156789
王五   -0.160389
Name: Data, dtype: float64
Median of non-null values:
Key
张三    0.115438
李四   -0.580589
王五   -0.246851
Name: Data, dtype: float64
Standard deviation:
Key
张三    0.893303
李四    1.507850
王五    0.269111
Name: Data, dtype: float64
Variance:
Key
张三    0.797990
李四    2.273611
王五    0.072421
Name: Data, dtype: float64
Minimum of non-null values:
Key
张三   -0.993400
李四   -1.407389
王五   -0.375641
Name: Data, dtype: float64
Maximum of non-null values:
Key
张三    0.774209
李四    1.517611
王五    0.141326
Name: Data, dtype: float64
Product of non-null values:
Key
张三   -0.088784
李四    1.240063
王五    0.013105
Name: Data, dtype: float64
First non-null value:
Key
张三    0.115438
李四   -0.580589
王五   -0.246851
Name: Data, dtype: float64
Last non-null value:
Key
张三    0.774209
李四    1.517611
王五   -0.375641
Name: Data, dtype: float64
```
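Instead of one print per statistic, agg() accepts a list of function names and returns all the statistics as columns of one DataFrame. The sketch below uses made-up data (including one missing value) so the results can be checked by hand; the lambda is only an example of a custom aggregation.

```python
import pandas as pd

# Illustrative data (values chosen by hand); 李四 has one missing value
df = pd.DataFrame({'Key': ['张三', '张三', '张三', '李四', '李四', '李四'],
                   'Data': [1.0, 2.0, 6.0, 4.0, None, 8.0]})
grouped = df['Data'].groupby(df['Key'])

# One call computes several statistics; each becomes a column
summary = grouped.agg(['count', 'sum', 'mean', 'median', 'min', 'max', 'first', 'last'])
print(summary)
# 李四 ignores the missing value: count 2, sum 12.0, mean 6.0, first 4.0, last 8.0

# Custom functions are accepted too, e.g. the range (max - min) of each group
value_range = grouped.agg(lambda s: s.max() - s.min())
print(value_range)  # 张三: 5.0, 李四: 4.0
```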
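4 Bucket analysis
pd.cut(frame.data1, 4) splits the range of data1 into four equal-width bins, and data2 is then grouped by the bin each row falls into. Despite the variable name quartiles, these bins are not quartiles: they hold very different numbers of rows (42, 386, 477 and 95 in the output below).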
```python
def barrel():
    frame = pd.DataFrame({'data1': np.random.randn(1000), 'data2': np.random.randn(1000)})
    # Cut data1 into 4 equal-width bins; the result is a categorical Series
    quartiles = pd.cut(frame.data1, 4)
    print(quartiles[:4])
    print(type(quartiles))
    # Group data2 by the bin of data1; observed=False keeps empty bins and
    # avoids the FutureWarning pandas 2.1+ raises for categorical keys
    grouped = frame.data2.groupby(quartiles, observed=False)
    # get_stars returns a dict per group; unstack() turns it into one column per statistic
    print(grouped.apply(get_stars).unstack())
```

Output:

```
0     (-0.225, 1.291]
1    (-1.741, -0.225]
2    (-1.741, -0.225]
3      (1.291, 2.807]
Name: data1, dtype: category
Categories (4, interval[float64]): [(-3.264, -1.741] < (-1.741, -0.225] < (-0.225, 1.291] < (1.291, 2.807]]
<class 'pandas.core.series.Series'>
                       min       max  count      mean
data1
(-3.264, -1.741] -2.136158  2.236768   42.0  0.076879
(-1.741, -0.225] -4.184163  2.738323  386.0  0.040916
(-0.225, 1.291]  -3.116442  3.155207  477.0  0.031832
(1.291, 2.807]   -2.157628  2.200123   95.0  0.039635
```
```python
def get_stars(group):
    # Summary statistics for one group, returned as a dict
    return {'min': group.min(), 'max': group.max(), 'count': group.count(), 'mean': group.mean()}
```
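pd.cut produces equal-width bins, which is why the group counts in the bucket analysis differ so much. If you want real quartiles (groups of equal size), pandas provides pd.qcut, which places the bin edges at sample quantiles. The sketch below swaps in pd.qcut, uses made-up, evenly spaced data so the result can be verified, and replaces get_stars plus apply/unstack with agg() on a list of statistic names, which yields the same four statistics.

```python
import numpy as np
import pandas as pd

# Illustrative data: 1000 evenly spaced values, so the quartiles are easy to check
frame = pd.DataFrame({'data1': np.arange(1000), 'data2': np.arange(1000) * 2.0})

# qcut picks bin edges at the sample quantiles, so each bin holds about 25% of the rows;
# labels=False returns the bin number (0 to 3) instead of an interval
quartile_no = pd.qcut(frame.data1, 4, labels=False)

# agg with a list of names computes the same statistics as get_stars
stats = frame.data2.groupby(quartile_no).agg(['min', 'max', 'count', 'mean'])
print(stats)
# Each of the 4 bins has count 250; bin 0 covers data2 0.0..498.0, bin 3 covers 1500.0..1998.0
```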
```python
if __name__ == "__main__":
    data = init_data()
    group_by_column(data[0])
    group_by_array_list(data[0])
    group_by_dict(data[1])
    group_by_func(data[1])
    polymerization(data[0])
    barrel()
```
—————————————————————————————————
Teacher Gu's courses (enrollment open):
Software security testing
https://study.163.com/course/courseMain.htm?courseId=1209779852&share=2&shareId=480000002205486
API automation testing
https://study.163.com/course/courseMain.htm?courseId=1209794815&share=2&shareId=480000002205486
DevOps and Jenkins: DevOps
https://study.163.com/course/courseMain.htm?courseId=1209817844&share=2&shareId=480000002205486
DevOps and Jenkins 2.0: Jenkins
https://study.163.com/course/courseMain.htm?courseId=1209819843&share=2&shareId=480000002205486
Selenium automated testing
https://study.163.com/course/courseMain.htm?courseId=1209835807&share=2&shareId=480000002205486
Performance testing, season 1: performance testing fundamentals
https://study.163.com/course/courseMain.htm?courseId=1209852815&share=2&shareId=480000002205486
Performance testing, season 2: using LoadRunner 12
https://study.163.com/course/courseMain.htm?courseId=1209980013&share=2&shareId=480000002205486
Performance testing, season 3: using JMeter
https://study.163.com/course/courseMain.htm?courseId=1209903814&share=2&shareId=480000002205486
Performance testing, season 4: monitoring and tuning
https://study.163.com/course/courseMain.htm?courseId=1209959801&share=2&shareId=480000002205486
Getting started with Django
https://study.163.com/course/courseMain.htm?courseId=1210020806&share=2&shareId=480000002205486
"Woodpecker" Teacher Gu on software testing
https://study.163.com/course/courseMain.htm?courseId=1209958326&share=2&shareId=480000002205486