Using pandas functions: nlargest and nsmallest

Using nsmallest and nlargest

This article covers two DataFrame methods: nsmallest and nlargest.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nsmallest.html


DataFrame.nsmallest(
    n,             # int: number of rows to return
    columns,       # column label, or list of labels, to order by
    keep='first'   # how to handle duplicate values: {'first', 'last', 'all'}, default 'first'
)
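The pandas documentation describes nsmallest as equivalent to df.sort_values(columns, ascending=True).head(n), only more performant. The sketch below checks that on the same six-row frame built in the sample-data section. It passes kind="stable" to sort_values so that tied rows keep their original order, which is an assumption chosen to match the default keep="first"; treat it as an illustration on this data rather than a proof for every edge case (NaN handling, for instance, is not exercised).

```python
import pandas as pd

# Same six-row frame as in the sample-data section
df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# Four rows with the smallest score (keep="first" by default)
via_nsmallest = df.nsmallest(4, "score")

# Same selection via a full stable sort followed by head(n)
via_sort = df.sort_values("score", ascending=True, kind="stable").head(4)

print(via_nsmallest.index.tolist())    # [0, 2, 4, 1]
print(via_nsmallest.equals(via_sort))  # True
```

nsmallest avoids sorting the whole frame, which is where the performance difference comes from on large data.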

Sample data


import pandas as pd
import numpy as np


df = pd.DataFrame({"name":["xiaosun","zhoujuan","xiaozhang","wangfeng","xiaoming","zhangjun"],
                   "score":[100,128,100,150,100,145],
                   "age":[21,25,23,21,25,25],
                   "height":[1.75,1.8,1.77,1.8,1.9,1.71]
                  })
df


	name	score	age	height
0	xiaosun	100	21	1.75
1	zhoujuan	128	25	1.80
2	xiaozhang	100	23	1.77
3	wangfeng	150	21	1.80
4	xiaoming	100	25	1.90
5	zhangjun	145	25	1.71

nsmallest

Default behavior


df.nsmallest(2, "score")


	name	score	age	height
0	xiaosun	100	21	1.75
2	xiaozhang	100	23	1.77


df.nsmallest(4, "score")


	name	score	age	height
0	xiaosun	100	21	1.75
2	xiaozhang	100	23	1.77
4	xiaoming	100	25	1.90
1	zhoujuan	128	25	1.80

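As you can see, by default duplicate values each count toward n: the three rows with a score of 100 take up three of the four slots, and with n=2 only the first two of them are returned.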
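To see where the cutoff falls, the sketch below compares the largest score that made it into df.nsmallest(2, "score") with the number of rows in the whole frame that share that score. The variable names border, n_tied and n_kept are illustrative only.

```python
import pandas as pd

# Same six-row frame as in the sample-data section
df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

top = df.nsmallest(2, "score")

# Largest score that made it into the result (the "border" value)
border = int(top["score"].max())

# Rows in the full frame sharing the border value vs. rows actually returned
n_tied = int((df["score"] == border).sum())
n_kept = int((top["score"] == border).sum())

print(border, n_tied, n_kept)  # 100 3 2
```

Three rows score 100 but only two are returned; the third (xiaoming, index 4) is dropped purely because of its position in the frame.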

The keep parameter


# Same result as above, because keep defaults to "first"

df.nsmallest(4, "score", keep="first")


	name	score	age	height
0	xiaosun	100	21	1.75
2	xiaozhang	100	23	1.77
4	xiaoming	100	25	1.90
1	zhoujuan	128	25	1.80


df.nsmallest(4, "score", keep="last")


	name	score	age	height
4	xiaoming	100	25	1.90
2	xiaozhang	100	23	1.77
0	xiaosun	100	21	1.75
1	zhoujuan	128	25	1.80

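The tied rows now come out in reverse order, starting from the last occurrence (index 4); the row with score 128 (index 1) is unaffected.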
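One way to picture keep="last" on this data: reverse the row order, then take the n smallest with a stable sort. This is an interpretation that reproduces the output above, not the documented definition; the pandas docs only say that keep="last" prioritizes the last occurrence of tied values.

```python
import pandas as pd

# Same six-row frame as in the sample-data section
df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

via_keep_last = df.nsmallest(4, "score", keep="last")

# Reverse the rows, then stable-sort so tied rows keep the reversed order
via_reverse = df.iloc[::-1].sort_values("score", kind="stable").head(4)

print(via_keep_last.index.tolist())       # [4, 2, 0, 1]
print(via_keep_last.equals(via_reverse))  # True
```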

What does keep="all" do?

For comparison, here is the default result with n=2:


df.nsmallest(2, "score")


	name	score	age	height
0	xiaosun	100	21	1.75
2	xiaozhang	100	23	1.77

With keep="all", every row tied with the last selected value is kept, even if that means returning more than n rows:


df.nsmallest(2, "score", keep="all")


	name	score	age	height
0	xiaosun	100	21	1.75
2	xiaozhang	100	23	1.77
4	xiaoming	100	25	1.90
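Put differently, keep="all" behaves as if it found the n-th smallest value and returned every row less than or equal to it. The sketch below rebuilds the keep="all" result that way; the threshold approach is an illustration that matches this data and does not attempt to handle NaN values.

```python
import pandas as pd

# Same six-row frame as in the sample-data section
df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

n = 2
via_keep_all = df.nsmallest(n, "score", keep="all")

# The n-th smallest score is the border value; keep every row at or below it
threshold = df["score"].sort_values().iloc[n - 1]
via_threshold = df[df["score"] <= threshold].sort_values("score", kind="stable")

print(len(via_keep_all), via_keep_all.index.tolist())  # 3 [0, 2, 4]
print(via_keep_all.equals(via_threshold))              # True
```

So with keep="all" the length of the result is not fixed at n; code that relies on exactly n rows should use "first" or "last".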

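Ordering by multiple columns

Passing a list to columns orders by age first and uses height only to break ties between equal ages: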


df.nsmallest(4,["age","height"])


	name	score	age	height
0	xiaosun	100	21	1.75
3	wangfeng	150	21	1.80
2	xiaozhang	100	23	1.77
5	zhangjun	145	25	1.71
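Because the first column in the list drives the ordering, swapping the column order can change which rows are returned. The sketch below shows both orders on the sample frame and checks each against the documented sort_values(columns).head(n) equivalence.

```python
import pandas as pd

# Same six-row frame as in the sample-data section
df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# age first; height only breaks ties between equal ages
age_first = df.nsmallest(4, ["age", "height"])
# height first; age only breaks ties between equal heights
height_first = df.nsmallest(4, ["height", "age"])

print(age_first.index.tolist())     # [0, 3, 2, 5]
print(height_first.index.tolist())  # [5, 0, 2, 3]

# Both agree with a multi-column sort followed by head(n)
assert age_first.equals(df.sort_values(["age", "height"]).head(4))
assert height_first.equals(df.sort_values(["height", "age"]).head(4))
```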

nlargest

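nlargest is the mirror image of nsmallest: it returns the n rows with the largest values, sorted in descending order.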

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nlargest.html#pandas.DataFrame.nlargest


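DataFrame.nlargest(
    n,             # int: number of rows to return
    columns,       # column label, or list of labels, to order by
    keep='first'   # how to handle duplicate values: {'first', 'last', 'all'}, default 'first'
)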
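The pandas documentation describes nlargest as equivalent to df.sort_values(columns, ascending=False).head(n), only more performant. The sketch below checks this on the sample frame using age, which has a three-way tie at 25; kind="stable" is an assumption chosen so that tied rows keep their original order, matching the default keep="first".

```python
import pandas as pd

# Same six-row frame as in the sample-data section
df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

# Three oldest people (keep="first" by default)
via_nlargest = df.nlargest(3, "age")

# Same selection via a descending stable sort followed by head(n)
via_sort = df.sort_values("age", ascending=False, kind="stable").head(3)

print(via_nlargest.index.tolist())   # [1, 4, 5]
print(via_nlargest.equals(via_sort))  # True
```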


df.nlargest(3,"score")


	name	score	age	height
3	wangfeng	150	21	1.80
5	zhangjun	145	25	1.71
1	zhoujuan	128	25	1.80


df.nlargest(3,"age")


	name	score	age	height
1	zhoujuan	128	25	1.80
4	xiaoming	100	25	1.90
5	zhangjun	145	25	1.71


df.nlargest(2,"age",keep="first")


	name	score	age	height
1	zhoujuan	128	25	1.8
4	xiaoming	100	25	1.9


df.nlargest(2,"age",keep="last")


	name	score	age	height
5	zhangjun	145	25	1.71
4	xiaoming	100	25	1.90


df.nlargest(2,"age",keep="all")


	name	score	age	height
1	zhoujuan	128	25	1.80
4	xiaoming	100	25	1.90
5	zhangjun	145	25	1.71
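The three keep options can be compared side by side in one loop. This only re-runs the calls shown above on the same sample frame and collects the returned index labels.

```python
import pandas as pd

# Same six-row frame as in the sample-data section
df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

results = {}
for keep in ["first", "last", "all"]:
    # Index labels of the two oldest people under each tie-handling rule
    results[keep] = df.nlargest(2, "age", keep=keep).index.tolist()
    print(keep, results[keep])
# first [1, 4]
# last [5, 4]
# all [1, 4, 5]
```

Only keep="all" returns more than n rows; "first" and "last" differ only in which of the tied 25-year-olds they pick.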

Combining nlargest with drop_duplicates

Goal: find the 2 people with the largest age; if several people share the same age, keep only one of them.


df


	name	score	age	height
0	xiaosun	100	21	1.75
1	zhoujuan	128	25	1.80
2	xiaozhang	100	23	1.77
3	wangfeng	150	21	1.80
4	xiaoming	100	25	1.90
5	zhangjun	145	25	1.71


df["age"].value_counts()


25    3
21    2
23    1
Name: age, dtype: int64

The largest age is 25, shared by three people. Deduplicate on age first, keeping the first row for each age:


df1 = df.drop_duplicates(subset=["age"], keep="first")
df1


	name	score	age	height
0	xiaosun	100	21	1.75
1	zhoujuan	128	25	1.80
2	xiaozhang	100	23	1.77


df1.nlargest(2,"age")


	name	score	age	height
1	zhoujuan	128	25	1.80
2	xiaozhang	100	23	1.77
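The two steps above can be wrapped in a small helper. top_n_distinct is a hypothetical name, not a pandas function. It assumes that the first row for each duplicated value (drop_duplicates' default keep="first") is an acceptable representative, which is why zhoujuan (index 1) stands in for all three 25-year-olds.

```python
import pandas as pd


def top_n_distinct(frame: pd.DataFrame, n: int, column: str) -> pd.DataFrame:
    """Hypothetical helper: top n rows by `column`, at most one row per distinct value."""
    # Keep only the first row for each distinct value of `column`
    deduped = frame.drop_duplicates(subset=[column], keep="first")
    # Then pick the n largest among the remaining rows
    return deduped.nlargest(n, column)


# Same six-row frame as in the sample-data section
df = pd.DataFrame({
    "name": ["xiaosun", "zhoujuan", "xiaozhang", "wangfeng", "xiaoming", "zhangjun"],
    "score": [100, 128, 100, 150, 100, 145],
    "age": [21, 25, 23, 21, 25, 25],
    "height": [1.75, 1.8, 1.77, 1.8, 1.9, 1.71],
})

result = top_n_distinct(df, 2, "age")
print(result[["name", "age"]])
```

Deduplicating before calling nlargest matters: the reverse order (nlargest first, then drop_duplicates) would return only one row here, because both of the top two rows are 25.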

