Using nsmallest and nlargest
This article shows how to use two pandas DataFrame methods: nsmallest and nlargest.
Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nsmallest.html
DataFrame.nsmallest(
    n,              # int: number of rows to return
    columns,        # column label, or list of labels, to order by
    keep='first'    # how to handle duplicate values: {'first', 'last', 'all'}, default 'first'
)
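The pandas documentation describes nsmallest as equivalent to sorting ascending by the given columns and taking the first n rows, only more efficient. A minimal sketch comparing the two on this article's sample data; `kind="mergesort"` (a stable sort) is an assumption added here so that tied rows keep their original order, which is what the default keep="first" also does:

```python
import pandas as pd

# Sample data used throughout this article
df = pd.DataFrame({"name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
                   "score": [100, 128, 100, 150, 100, 145],
                   "age": [21, 25, 23, 21, 25, 25],
                   "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71]})

fast = df.nsmallest(4, "score")
# Stable ascending sort, then the first 4 rows
slow = df.sort_values("score", kind="mergesort").head(4)

print(fast.index.tolist())  # [0, 2, 4, 1]
print(slow.index.tolist())  # [0, 2, 4, 1]
```

The sort-based version is handy as a mental model; nsmallest avoids sorting the whole frame when n is small.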
Sample data
import pandas as pd
import numpy as np

df = pd.DataFrame({"name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
                   "score": [100, 128, 100, 150, 100, 145],
                   "age": [21, 25, 23, 21, 25, 25],
                   "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71]
                   })
df
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 1 | zhoujuan | 128 | 25 | 1.80 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
| 3 | wangfeng | 150 | 21 | 1.80 |
| 4 | xiaoming | 100 | 25 | 1.90 |
| 5 | zhangjun | 145 | 25 | 1.71 |
nsmallest
Default behavior
df.nsmallest(2, "score")
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
df.nsmallest(4, "score")
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
| 4 | xiaoming | 100 | 25 | 1.90 |
| 1 | zhoujuan | 128 | 25 | 1.80 |
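As shown, by default each duplicate value takes its own slot: n counts rows, not distinct values, so the three rows with score 100 each count toward n.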
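If what you need is the n smallest distinct values rather than the n smallest rows, one option (a sketch not covered by the original article) is to deduplicate the column first with `drop_duplicates` and then call `Series.nsmallest`:

```python
import pandas as pd

# name and score columns from this article's sample data
df = pd.DataFrame({"name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
                   "score": [100, 128, 100, 150, 100, 145]})

# Rows: the tied score 100 fills both slots
rows = df.nsmallest(2, "score")
print(rows["score"].tolist())  # [100, 100]

# Distinct values: drop repeated scores before selecting
distinct = df["score"].drop_duplicates().nsmallest(2)
print(distinct.tolist())  # [100, 128]
```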
The keep parameter
# Same result as above, because keep defaults to "first"
df.nsmallest(4, "score", keep="first")
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
| 4 | xiaoming | 100 | 25 | 1.90 |
| 1 | zhoujuan | 128 | 25 | 1.80 |
df.nsmallest(4, "score", keep="last")
|   | name | score | age | height |
|---|---|---|---|---|
| 4 | xiaoming | 100 | 25 | 1.90 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
| 0 | xiaosun | 100 | 21 | 1.75 |
| 1 | zhoujuan | 128 | 25 | 1.80 |
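The order of the tied rows changes: with keep="last", the rows with score 100 are taken starting from the last occurrence, so they appear as index 4, 2, 0.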
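One way to picture keep="last": on this data it returns the same rows, in the same order, as reversing the row order and using the default keep="first". This is an observation on this example offered as a mental model, not a documented guarantee:

```python
import pandas as pd

# Sample data from this article
df = pd.DataFrame({"name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
                   "score": [100, 128, 100, 150, 100, 145]})

last = df.nsmallest(4, "score", keep="last")
# Reverse the rows so the last ties come first, then use keep="first"
reversed_first = df.iloc[::-1].nsmallest(4, "score")

print(last.index.tolist())            # [4, 2, 0, 1]
print(reversed_first.index.tolist())  # [4, 2, 0, 1]
```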
Understanding keep="all"
df.nsmallest(2, "score")
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
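With keep="all", every row tied with the n-th smallest value is kept, even if that means returning more than n rows. Three rows share score 100, so n=2 returns three rows: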
df.nsmallest(2, "score", keep="all")
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
| 4 | xiaoming | 100 | 25 | 1.90 |
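For a single column without missing values, keep="all" can be read as a threshold: find the n-th smallest value (counting duplicates) and keep every row at or below it. A sketch checking that reading against a plain boolean filter on this article's data (row order of the filter is not sorted in general, but happens to match here):

```python
import pandas as pd

# Sample data from this article
df = pd.DataFrame({"name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
                   "score": [100, 128, 100, 150, 100, 145]})

n = 2
all_ties = df.nsmallest(n, "score", keep="all")

# The n-th smallest score when duplicates are counted
threshold = df["score"].nsmallest(n).iloc[-1]
# Every row whose score is at or below that threshold
by_filter = df[df["score"] <= threshold]

print(threshold)                 # 100
print(len(all_ties))             # 3, more than n
print(all_ties.index.tolist())   # [0, 2, 4]
print(by_filter.index.tolist())  # [0, 2, 4]
```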
Ordering by multiple columns
df.nsmallest(4, ["age", "height"])
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 3 | wangfeng | 150 | 21 | 1.80 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
| 5 | zhangjun | 145 | 25 | 1.71 |
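With a list of columns, the first column is the primary key and later columns only break ties. Only one slot is left for the three 25-year-olds, so height decides and zhangjun (1.71) wins. A sketch showing the same rows come out of a two-key ascending sort followed by `head`:

```python
import pandas as pd

# Sample data from this article
df = pd.DataFrame({"name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
                   "age": [21, 25, 23, 21, 25, 25],
                   "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71]})

multi = df.nsmallest(4, ["age", "height"])
# Sort by age, break ties by height, then keep the first 4 rows
sorted_head = df.sort_values(["age", "height"]).head(4)

print(multi.index.tolist())        # [0, 3, 2, 5]
print(sorted_head.index.tolist())  # [0, 3, 2, 5]
```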
nlargest
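nlargest works the same way, but selects the rows with the largest values and returns them in descending order.
Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nlargest.html#pandas.DataFrame.nlargest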
DataFrame.nlargest(
    n,
    columns,
    keep='first'    # {'first', 'last', 'all'}, default 'first'
)
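Mirroring nsmallest, the pandas documentation describes nlargest as equivalent to a descending sort followed by `head(n)`. A sketch on this article's data; the stable `kind="mergesort"` is an assumption added so that ties would keep their original order:

```python
import pandas as pd

# Sample data from this article
df = pd.DataFrame({"name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
                   "score": [100, 128, 100, 150, 100, 145]})

top = df.nlargest(3, "score")
# Stable descending sort, then the first 3 rows
via_sort = df.sort_values("score", ascending=False, kind="mergesort").head(3)

print(top.index.tolist())       # [3, 5, 1]
print(via_sort.index.tolist())  # [3, 5, 1]
```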
df.nlargest(3, "score")
|   | name | score | age | height |
|---|---|---|---|---|
| 3 | wangfeng | 150 | 21 | 1.80 |
| 5 | zhangjun | 145 | 25 | 1.71 |
| 1 | zhoujuan | 128 | 25 | 1.80 |
df.nlargest(3,"age")
|   | name | score | age | height |
|---|---|---|---|---|
| 1 | zhoujuan | 128 | 25 | 1.80 |
| 4 | xiaoming | 100 | 25 | 1.90 |
| 5 | zhangjun | 145 | 25 | 1.71 |
df.nlargest(2,"age",keep="first")
|   | name | score | age | height |
|---|---|---|---|---|
| 1 | zhoujuan | 128 | 25 | 1.8 |
| 4 | xiaoming | 100 | 25 | 1.9 |
df.nlargest(2,"age",keep="last")
|   | name | score | age | height |
|---|---|---|---|---|
| 5 | zhangjun | 145 | 25 | 1.71 |
| 4 | xiaoming | 100 | 25 | 1.90 |
df.nlargest(2,"age",keep="all")
|   | name | score | age | height |
|---|---|---|---|---|
| 1 | zhoujuan | 128 | 25 | 1.80 |
| 4 | xiaoming | 100 | 25 | 1.90 |
| 5 | zhangjun | 145 | 25 | 1.71 |
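One caveat that applies to both nlargest and nsmallest: the pandas documentation notes that they cannot order by every column type, and that object or category columns raise a TypeError. A sketch checking this with the text column `name` from the sample data (the exact error message is not relied on):

```python
import pandas as pd

# Sample data from this article
df = pd.DataFrame({"name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
                   "age": [21, 25, 23, 21, 25, 25]})

try:
    # "name" holds strings, which these methods cannot order by
    df.nlargest(2, "name")
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

print(raised)  # True
```

For string columns, `sort_values(...).head(n)` remains available.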
Combining nlargest with drop_duplicates
Goal: find the two largest ages, returning only one row per age when several people share it.
df
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 1 | zhoujuan | 128 | 25 | 1.80 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
| 3 | wangfeng | 150 | 21 | 1.80 |
| 4 | xiaoming | 100 | 25 | 1.90 |
| 5 | zhangjun | 145 | 25 | 1.71 |
df["age"].value_counts()
25    3
21    2
23    1
Name: age, dtype: int64
The largest age, 25, is shared by three people, so first remove duplicate ages (keeping the first row for each age):
df1 = df.drop_duplicates(subset=["age"], keep="first")
df1
|   | name | score | age | height |
|---|---|---|---|---|
| 0 | xiaosun | 100 | 21 | 1.75 |
| 1 | zhoujuan | 128 | 25 | 1.80 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
df1.nlargest(2,"age")
|   | name | score | age | height |
|---|---|---|---|---|
| 1 | zhoujuan | 128 | 25 | 1.80 |
| 2 | xiaozhang | 100 | 23 | 1.77 |
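The two steps above can be chained into one expression. Note that which row represents a shared age depends on row order, because drop_duplicates with keep="first" keeps the earliest row. The second query below is a hypothetical variant, not part of the original requirement: sorting by score first makes the highest-scoring person represent each age.

```python
import pandas as pd

# Sample data from this article
df = pd.DataFrame({"name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
                   "score": [100, 128, 100, 150, 100, 145],
                   "age": [21, 25, 23, 21, 25, 25],
                   "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71]})

# Same steps as above, chained: one row per age, then the two largest ages
top_ages = df.drop_duplicates(subset=["age"], keep="first").nlargest(2, "age")
print(top_ages["name"].tolist())  # ['zhoujuan', 'xiaozhang']

# Hypothetical variant: keep the highest-scoring row for each age
best_per_age = (df.sort_values("score", ascending=False, kind="mergesort")
                  .drop_duplicates(subset=["age"], keep="first")
                  .nlargest(2, "age"))
print(best_per_age["name"].tolist())  # ['zhangjun', 'xiaozhang']
```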