Reference link: Handling missing data in Pandas
Pandas study notes (4): handling missing data, file import and export
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
# Set two cells to NaN to simulate missing data
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan
print(df)
#out:
             A     B     C   D
2013-01-01   0   NaN   2.0   3
2013-01-02   4   5.0   NaN   7
2013-01-03   8   9.0  10.0  11
2013-01-04  12  13.0  14.0  15
2013-01-05  16  17.0  18.0  19
2013-01-06  20  21.0  22.0  23
Dropping missing data with dropna
print(df.dropna(axis=0, how='any'))  # drop every row that contains a NaN; how={'any','all'}: 'all' drops a row or column only when all of its values are NaN
#out:
             A     B     C   D
2013-01-03   8   9.0  10.0  11
2013-01-04  12  13.0  14.0  15
2013-01-05  16  17.0  18.0  19
2013-01-06  20  21.0  22.0  23
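To make the effect of axis and how more concrete, here is a minimal sketch that rebuilds the same df and compares a few dropna calls. The variable names (rows_any, cols_any, rows_all, rows_subset) are only illustrative, and the subset argument is an extra option not used in the notes above.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the notes
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# axis=0, how='any': drop every row that has at least one NaN
rows_any = df.dropna(axis=0, how='any')

# axis=1, how='any': drop every column that has at least one NaN
cols_any = df.dropna(axis=1, how='any')

# how='all': drop a row only if all of its values are NaN (none here)
rows_all = df.dropna(axis=0, how='all')

# subset: only check column B when deciding which rows to drop
rows_subset = df.dropna(subset=['B'])

print(rows_any.shape, list(cols_any.columns), rows_all.shape, rows_subset.shape)
```

dropna returns a new DataFrame each time, so df itself still contains the two NaN values afterwards.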
Filling missing data with fillna
print(df.fillna(value=0))  # fill the missing cells with 0
#out:
             A     B     C   D
2013-01-01   0   0.0   2.0   3
2013-01-02   4   5.0   0.0   7
2013-01-03   8   9.0  10.0  11
2013-01-04  12  13.0  14.0  15
2013-01-05  16  17.0  18.0  19
2013-01-06  20  21.0  22.0  23
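A constant 0 is not always the best replacement. As a rough sketch on the same df, the fill value can also be a per-column dict, the column means, or the previous row's value. The fill values -1 and -2 and the variable names are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the notes
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# A dict gives each column its own fill value
filled_dict = df.fillna({'B': -1, 'C': -2})

# df.mean() skips NaN by default, so each gap gets the mean of the remaining values in its column
filled_mean = df.fillna(df.mean())

# Forward fill copies the previous row's value; the first row of B has no
# previous value, so it stays NaN
filled_ffill = df.ffill()

print(filled_dict)
print(filled_mean)
print(filled_ffill)
```

Like dropna, fillna returns a new DataFrame and leaves df unchanged unless the result is assigned back.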
Finding missing data with isnull
print(df.isnull())
#out:
                A      B      C      D
2013-01-01  False   True  False  False
2013-01-02  False  False   True  False
2013-01-03  False  False  False  False
2013-01-04  False  False  False  False
2013-01-05  False  False  False  False
2013-01-06  False  False  False  False
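On a large table the full True/False grid is hard to read, so the mask is usually reduced to a summary. The following is one possible sketch on the same df; the names mask, per_column, has_missing and rows_with_nan are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the notes
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

mask = df.isnull()

# True counts as 1, so sum() gives the number of NaN values per column
per_column = mask.sum()

# A single flag: is there any missing value anywhere in the frame?
has_missing = bool(mask.values.any())

# Select only the rows that contain at least one NaN
rows_with_nan = df[mask.any(axis=1)]

print(per_column)
print(has_missing)
print(rows_with_nan)
```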
Importing and exporting files with Pandas
data = pd.read_excel('test.xls')  # import a file (reading .xls requires the xlrd package)
print(data)
#out:
   Student ID     name  age  gender
0           0    kelly   11  Female
1           1     lory   12  Female
2           2    dlsaj   11    Male
3           3   sddsds   11    Male
4           4     sdsd   11    Male
5           5      sds   11  Female
6           6     dsds   11  Female
7           7     sdsd   11    Male
8           8  sdsdsds   22    Male
data.to_pickle('student.pickle')  # export to a file in pickle format
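To check that an export can be read back, here is a small round-trip sketch. Since test.xls is not available here, it assumes a small DataFrame built from the first three rows of the table above, and it writes into a temporary directory instead of the working directory. The CSV step is an additional format not covered in the notes.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the data read from test.xls (first three rows of the table above)
data = pd.DataFrame({
    'Student ID': [0, 1, 2],
    'name': ['kelly', 'lory', 'dlsaj'],
    'age': [11, 12, 11],
    'gender': ['Female', 'Female', 'Male'],
})

with tempfile.TemporaryDirectory() as tmp:
    # Pickle keeps the index and dtypes exactly as they were
    pickle_path = os.path.join(tmp, 'student.pickle')
    data.to_pickle(pickle_path)
    restored = pd.read_pickle(pickle_path)

    # CSV is plain text; index=False avoids writing the row index as a column
    csv_path = os.path.join(tmp, 'student.csv')
    data.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

print(restored.equals(data))
print(from_csv)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.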