Using the pandas functions nlargest and nsmallest

2023-08-25 11:41:32

Using nsmallest and nlargest

This article covers two pandas functions: nsmallest and nlargest.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nsmallest.html

DataFrame.nsmallest(
    n,             # int: number of rows to return
    columns,       # column label, or list of labels, to order by
    keep='first'   # how ties are handled: {'first', 'last', 'all'}, default 'first'
)

Sample data

import pandas as pd
import numpy as np
df = pd.DataFrame({"name":["xiaosun","zhoujuan","xiaozhang","wangfeng","xiaoming","zhangjun"],
                   "score":[100,128,100,150,100,145],
                   "age":[21,25,23,21,25,25],
                   "height":[1.75,1.8,1.77,1.8,1.9,1.71]
                  })
df

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
3   wangfeng    150   21    1.80
4   xiaoming    100   25    1.90
5   zhangjun    145   25    1.71

nsmallest

Default behavior

df.nsmallest(2, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77

df.nsmallest(4, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90
1   zhoujuan    128   25    1.80

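As the output shows, tied values each count toward n by default: the three rows with score 100 take three of the four slots, and the row with 128 takes the last one.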
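One way to see why each tied row takes a slot: the pandas documentation describes nsmallest as equivalent to sorting by the column and taking the first n rows. The sketch below checks this on the sample data. The `kind="stable"` argument is an assumption added here so that tied rows keep their original order, which matches keep="first".

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# nsmallest with the default keep="first"
top = df.nsmallest(4, "score")

# Sort ascending, then take the first 4 rows; a stable sort keeps
# tied rows in their original order
by_sort = df.sort_values("score", kind="stable").head(4)

print(top.index.tolist())      # [0, 2, 4, 1]
print(by_sort.index.tolist())  # [0, 2, 4, 1]
```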

The keep parameter

# Same result as above; keep defaults to "first"

df.nsmallest(4, "score", keep="first")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90
1   zhoujuan    128   25    1.80

df.nsmallest(4, "score", keep="last")

        name  score  age  height
4   xiaoming    100   25    1.90
2  xiaozhang    100   23    1.77
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80

The order of the tied rows has changed: they now start from the largest index, 4.
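The effect of keep is easier to see when n is smaller than the number of tied rows, because the option then decides which tied rows are returned at all, not only their order. A small sketch using the same scores (only two columns are kept for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
})

# Three rows share the lowest score (index 0, 2 and 4), but only 2 are requested
first_two = df.nsmallest(2, "score", keep="first")
last_two = df.nsmallest(2, "score", keep="last")

# Sorted here so that only row membership is compared, not row order
print(sorted(first_two.index.tolist()))  # [0, 2]
print(sorted(last_two.index.tolist()))   # [2, 4]
```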

How should keep="all" be understood?

df.nsmallest(2, "score")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77

With keep="all", every row tied with the last selected value is returned, so the result can contain more than n rows:

df.nsmallest(2, "score", keep="all")

        name  score  age  height
0    xiaosun    100   21    1.75
2  xiaozhang    100   23    1.77
4   xiaoming    100   25    1.90
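Because keep="all" never splits a group of ties, the number of rows returned depends on where the cutoff falls. A quick check on the sample scores, written as a sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
})

default = df.nsmallest(2, "score")                # stops at exactly n rows
with_ties = df.nsmallest(2, "score", keep="all")  # includes every row tied at the cutoff
no_extra = df.nsmallest(4, "score", keep="all")   # cutoff value 128 is unique

print(len(default))    # 2
print(len(with_ties))  # 3
print(len(no_extra))   # 4
```

When the cutoff value is unique, as with n=4 here, keep="all" returns exactly n rows.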

Selecting by multiple columns

df.nsmallest(4, ["age", "height"])

        name  score  age  height
0    xiaosun    100   21    1.75
3   wangfeng    150   21    1.80
2  xiaozhang    100   23    1.77
5   zhangjun    145   25    1.71
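With a list of columns, the later columns only break ties in the earlier ones: the two rows with age 21 are ordered by height, and among the three rows with age 25 the shortest one is picked. The comparison with a multi-column sort below is an illustration added here, not part of the original walkthrough.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# Order by age first; height is consulted only when ages are equal
picked = df.nsmallest(4, ["age", "height"])

# The same ordering expressed as a sort followed by head
by_sort = df.sort_values(["age", "height"]).head(4)

print(picked.index.tolist())   # [0, 3, 2, 5]
print(by_sort.index.tolist())  # [0, 3, 2, 5]
```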

nlargest

This function returns the rows in descending order.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nlargest.html#pandas.DataFrame.nlargest

DataFrame.nlargest(
    n,
    columns,
    keep='first'  # {'first', 'last', 'all'}, default 'first'
)

df.nlargest(3, "score")

       name  score  age  height
3  wangfeng    150   21    1.80
5  zhangjun    145   25    1.71
1  zhoujuan    128   25    1.80
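When only the values of a single column are needed, pandas also provides a Series method with the same name. The sketch below goes slightly beyond the DataFrame method covered in this article: it applies Series.nlargest to the score column and shows that the original index labels are preserved.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
})

# Series.nlargest takes no column argument; it works on the Series values
top3 = df["score"].nlargest(3)

print(top3.tolist())        # [150, 145, 128]
print(top3.index.tolist())  # [3, 5, 1]
```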

df.nlargest(3, "age")

       name  score  age  height
1  zhoujuan    128   25    1.80
4  xiaoming    100   25    1.90
5  zhangjun    145   25    1.71

df.nlargest(2, "age", keep="first")

       name  score  age  height
1  zhoujuan    128   25     1.8
4  xiaoming    100   25     1.9

df.nlargest(2, "age", keep="last")

       name  score  age  height
5  zhangjun    145   25    1.71
4  xiaoming    100   25    1.90

df.nlargest(2, "age", keep="all")

       name  score  age  height
1  zhoujuan    128   25    1.80
4  xiaoming    100   25    1.90
5  zhangjun    145   25    1.71
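The three keep options for nlargest can be compared side by side by collecting the index labels each one returns. A minimal sketch; the dictionary name `picked` is only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "age": [21, 25, 23, 21, 25, 25],
})

# Three rows share the largest age, 25 (index 1, 4 and 5)
picked = {
    keep: df.nlargest(2, "age", keep=keep).index.tolist()
    for keep in ("first", "last", "all")
}

for keep, index_labels in picked.items():
    print(keep, index_labels)
```

"first" and "last" each return two of the three tied rows, while "all" returns all three.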

Combining nlargest with drop_duplicates

Requirement: find the two largest ages; when several rows share the same age, keep only one of them.

df

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
3   wangfeng    150   21    1.80
4   xiaoming    100   25    1.90
5   zhangjun    145   25    1.71

df["age"].value_counts()

25    3
21    2
23    1
Name: age, dtype: int64

The largest age is 25, and three rows share it. Deduplicate on age first:

df1 = df.drop_duplicates(subset=["age"], keep="first")
df1

        name  score  age  height
0    xiaosun    100   21    1.75
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77

df1.nlargest(2, "age")

        name  score  age  height
1   zhoujuan    128   25    1.80
2  xiaozhang    100   23    1.77
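The two steps can also be wrapped into one small function. This is a sketch under the same assumptions as the walkthrough above; `top_n_distinct` is a hypothetical helper name, not a pandas API.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})


def top_n_distinct(frame, column, n):
    """Hypothetical helper: keep one row per distinct value, then take the n largest."""
    # keep="first" retains the first row seen for each distinct value
    deduped = frame.drop_duplicates(subset=[column], keep="first")
    return deduped.nlargest(n, column)


result = top_n_distinct(df, "age", 2)
print(result[["name", "age"]])
```

Which row represents each age depends on the keep argument of drop_duplicates; with keep="first" it is the first row in the original order.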
