Index sorting: sort_index
This article introduces index sorting in Pandas. For full details, see the official documentation:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_index.html
Parameters
DataFrame.sort_index(axis=0,
                     level=None,
                     ascending=True,
                     inplace=False,
                     kind='quicksort',
                     na_position='last',
                     sort_remaining=True,
                     ignore_index=False,
                     key=None)
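Parameter descriptions:
- axis: the axis whose labels are sorted; axis=0 sorts the row index, axis=1 sorts the column labels
- level: for a multi-level index, the level(s) to sort by; accepts a level number, a level name, or a list of them
- ascending: sort direction; ascending (True) by default
- inplace: whether to modify the object in place; False by default
- kind: the sorting algorithm to use; 'quicksort' by default
- na_position: where missing labels are placed, 'first' or 'last'; 'last' by default
- sort_remaining: for a multi-level index sorted by level, whether to also sort by the remaining levels (in order) after the specified level; True by default
- ignore_index: whether to replace the resulting labels with 0, 1, …, N-1; False by default
- key: an optional function applied to the index before sorting; None by default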
Sample data
import pandas as pd
import numpy as np
df = pd.DataFrame({"name":["Jimmy","Ana","Tom","John"],
                   "age":[24,20,19,28],
                   "Math":[100,120,80,150],
                   "address":["beijing","shanghai","shenzhen","guangzhou"]
                  },
                  index=[np.nan,2,0,1])  # the index contains a missing value
df
(index) | name | age | Math | address |
---|---|---|---|---|
NaN | Jimmy | 24 | 100 | beijing |
2.0 | Ana | 20 | 120 | shanghai |
0.0 | Tom | 19 | 80 | shenzhen |
1.0 | John | 28 | 150 | guangzhou |
The axis parameter
# same as the default df.sort_index()
df.sort_index(axis=0)
(index) | name | age | Math | address |
---|---|---|---|---|
0.0 | Tom | 19 | 80 | shenzhen |
1.0 | John | 28 | 150 | guangzhou |
2.0 | Ana | 20 | 120 | shanghai |
NaN | Jimmy | 24 | 100 | beijing |
By default the sort runs along axis=0 (the row index), in ascending order.
df.sort_index(axis=1)
(index) | Math | address | age | name |
---|---|---|---|---|
NaN | 100 | beijing | 24 | Jimmy |
2.0 | 120 | shanghai | 20 | Ana |
0.0 | 80 | shenzhen | 19 | Tom |
1.0 | 150 | guangzhou | 28 | John |
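axis=1 sorts the column labels. The labels here are all strings, so they are ordered by character code (ASCII order), which places the uppercase Math before the lowercase labels.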
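To make the ordering rule concrete, here is a minimal sketch. The one-row frame `demo` is made up for illustration; only its column names mirror the article's df. It shows that string labels are compared by code point, so the capitalized label sorts ahead of all lowercase ones.

```python
import pandas as pd

# Hypothetical one-row frame; only the column names matter here.
demo = pd.DataFrame(
    {"name": ["Tom"], "age": [19], "Math": [80], "address": ["shenzhen"]}
)

by_columns = demo.sort_index(axis=1)
sorted_labels = list(by_columns.columns)
print(sorted_labels)  # ['Math', 'address', 'age', 'name']

# Uppercase letters have smaller code points than lowercase letters.
upper_first = ord("M") < ord("a")
print(upper_first)  # True
```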
The ignore_index parameter
By default the original labels are kept. With ignore_index=True they are replaced by 0, 1, 2, …, N-1; in the output below it is the row index that is reset. Which axis gets relabeled when axis=1 is combined with ignore_index=True may differ between pandas releases, so check it against your installed version.
# default behavior
df.sort_index(axis=1,ignore_index=False)
(index) | Math | address | age | name |
---|---|---|---|---|
NaN | 100 | beijing | 24 | Jimmy |
2.0 | 120 | shanghai | 20 | Ana |
0.0 | 80 | shenzhen | 19 | Tom |
1.0 | 150 | guangzhou | 28 | John |
df.sort_index(axis=1,ignore_index=True)
(index) | Math | address | age | name |
---|---|---|---|---|
0 | 100 | beijing | 24 | Jimmy |
1 | 120 | shanghai | 20 | Ana |
2 | 80 | shenzhen | 19 | Tom |
3 | 150 | guangzhou | 28 | John |
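The row-wise case (axis=0) is the least ambiguous way to see what ignore_index does. The sketch below assumes a reduced frame `demo` with the same row labels as the article's df; the rows are sorted by label and then relabeled from 0.

```python
import numpy as np
import pandas as pd

# Same row labels as the article's df, including the missing label.
demo = pd.DataFrame({"age": [24, 20, 19, 28]}, index=[np.nan, 2, 0, 1])

kept = demo.sort_index()                    # sorted, original labels kept
reset = demo.sort_index(ignore_index=True)  # sorted, labels replaced by 0..3

print(list(reset.index))   # [0, 1, 2, 3]
print(list(reset["age"]))  # [19, 28, 20, 24]
```

The values are in sorted-label order in both results; only the labels differ.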
The key parameter
Optional. If given, the key function is applied to the index being sorted first, and the sort is done on the result.
df.sort_index(axis=1)  # sort the column labels without a key function
(index) | Math | address | age | name |
---|---|---|---|---|
NaN | 100 | beijing | 24 | Jimmy |
2.0 | 120 | shanghai | 20 | Ana |
0.0 | 80 | shenzhen | 19 | Tom |
1.0 | 150 | guangzhou | 28 | John |
df.sort_index(axis=1, key=lambda x: x.str.lower())
(index) | address | age | Math | name |
---|---|---|---|---|
NaN | beijing | 24 | 100 | Jimmy |
2.0 | shanghai | 20 | 120 | Ana |
0.0 | shenzhen | 19 | 80 | Tom |
1.0 | guangzhou | 28 | 150 | John |
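The key function lowercases every column label for the comparison, so Math is compared as math; the labels themselves are not changed.
The sort then uses the lowercased labels, so Math now falls after age and before name.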
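The same idea applies to the row index. In this sketch the frame `demo` and its labels are invented for illustration: the key function receives the index, returns a transformed index of the same length, and the result keeps the original labels.

```python
import pandas as pd

# Hypothetical frame with mixed-case row labels.
demo = pd.DataFrame({"value": [1, 2, 3]}, index=["B", "a", "C"])

plain = demo.sort_index()
print(list(plain.index))  # ['B', 'C', 'a']

# The key is used only for comparison; labels are returned unchanged.
case_insensitive = demo.sort_index(key=lambda labels: labels.str.lower())
print(list(case_insensitive.index))  # ['a', 'B', 'C']
```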
The ascending parameter
df.sort_index()
(index) | name | age | Math | address |
---|---|---|---|---|
0.0 | Tom | 19 | 80 | shenzhen |
1.0 | John | 28 | 150 | guangzhou |
2.0 | Ana | 20 | 120 | shanghai |
NaN | Jimmy | 24 | 100 | beijing |
# same as df.sort_index(): ascending is the default
df.sort_index(ascending=True)
(index) | name | age | Math | address |
---|---|---|---|---|
0.0 | Tom | 19 | 80 | shenzhen |
1.0 | John | 28 | 150 | guangzhou |
2.0 | Ana | 20 | 120 | shanghai |
NaN | Jimmy | 24 | 100 | beijing |
df.sort_index(ascending=False)  # descending order
(index) | name | age | Math | address |
---|---|---|---|---|
2.0 | Ana | 20 | 120 | shanghai |
1.0 | John | 28 | 150 | guangzhou |
0.0 | Tom | 19 | 80 | shenzhen |
NaN | Jimmy | 24 | 100 | beijing |
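One detail in the descending output is easy to miss: the NaN label still comes last. A short sketch, assuming a reduced frame `demo` with the same row labels as the article's df, checks this directly.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"age": [24, 20, 19, 28]}, index=[np.nan, 2, 0, 1])

descending = demo.sort_index(ascending=False)
labels = descending.index.tolist()

print(labels[:3])           # [2.0, 1.0, 0.0]
print(np.isnan(labels[3]))  # True: the missing label stays at the end
```

The placement of missing labels is governed by na_position rather than by ascending.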
The inplace parameter
inplace controls whether the original data is modified directly or a new object is returned.
With True, the sort happens in place and the original data changes.
For the demonstration, first make a copy of df named df1 and operate on df1:
df1 = df.copy()
df1
(index) | name | age | Math | address |
---|---|---|---|---|
NaN | Jimmy | 24 | 100 | beijing |
2.0 | Ana | 20 | 120 | shanghai |
0.0 | Tom | 19 | 80 | shenzhen |
1.0 | John | 28 | 150 | guangzhou |
# False is the default
df1.sort_index(inplace=False)
(index) | name | age | Math | address |
---|---|---|---|---|
0.0 | Tom | 19 | 80 | shenzhen |
1.0 | John | 28 | 150 | guangzhou |
2.0 | Ana | 20 | 120 | shanghai |
NaN | Jimmy | 24 | 100 | beijing |
df1 itself is unchanged at this point:
df1
(index) | name | age | Math | address |
---|---|---|---|---|
NaN | Jimmy | 24 | 100 | beijing |
2.0 | Ana | 20 | 120 | shanghai |
0.0 | Tom | 19 | 80 | shenzhen |
1.0 | John | 28 | 150 | guangzhou |
df1.sort_index(inplace=True)  # modify in place
With inplace=True, df1 itself is now sorted:
df1
(index) | name | age | Math | address |
---|---|---|---|---|
0.0 | Tom | 19 | 80 | shenzhen |
1.0 | John | 28 | 150 | guangzhou |
2.0 | Ana | 20 | 120 | shanghai |
NaN | Jimmy | 24 | 100 | beijing |
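A common pitfall with inplace=True is assigning the return value, which is None. The following sketch uses a small made-up frame `demo` to contrast the two calling styles.

```python
import pandas as pd

# Hypothetical frame with unsorted integer labels.
demo = pd.DataFrame({"age": [20, 19, 28]}, index=[2, 0, 1])

result = demo.sort_index()       # returns a new, sorted object
print(list(demo.index))          # [2, 0, 1]  (demo is untouched)
print(list(result.index))        # [0, 1, 2]

returned = demo.sort_index(inplace=True)  # sorts demo itself
print(returned)                  # None
print(list(demo.index))          # [0, 1, 2]
```

Writing `demo = demo.sort_index(inplace=True)` would therefore replace the frame with None.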
The kind parameter
kind selects the sorting algorithm: {'quicksort', 'mergesort', 'heapsort'}; the default is 'quicksort'. The official documentation also lists 'stable' and notes that mergesort and stable are the only stable algorithms. Because the labels in df are unique, every algorithm yields the same result here.
- 'quicksort': quicksort
- 'mergesort': merge sort
- 'heapsort': heapsort
df.sort_index()
(index) | name | age | Math | address |
---|---|---|---|---|
0.0 | Tom | 19 | 80 | shenzhen |
1.0 | John | 28 | 150 | guangzhou |
2.0 | Ana | 20 | 120 | shanghai |
NaN | Jimmy | 24 | 100 | beijing |
df.sort_index(kind="mergesort")
(index) | name | age | Math | address |
---|---|---|---|---|
0.0 | Tom | 19 | 80 | shenzhen |
1.0 | John | 28 | 150 | guangzhou |
2.0 | Ana | 20 | 120 | shanghai |
NaN | Jimmy | 24 | 100 | beijing |
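The choice of algorithm only becomes visible when labels repeat. As a sketch, the made-up frame `demo` below has duplicate labels; with the stable mergesort, rows that share a label keep their original relative order.

```python
import pandas as pd

# Hypothetical frame: duplicate labels make stability observable.
demo = pd.DataFrame({"value": ["a", "b", "c", "d"]}, index=[1, 0, 1, 0])

stable = demo.sort_index(kind="mergesort")
print(list(stable.index))     # [0, 0, 1, 1]
print(list(stable["value"]))  # ['b', 'd', 'a', 'c']
```

With a non-stable algorithm the order within each group of equal labels is not guaranteed, so a stable kind is the safer choice when that order matters.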
The na_position parameter
Where missing labels are placed: 'first' or 'last'. The default is 'last'.
df.sort_index()
(index) | name | age | Math | address |
---|---|---|---|---|
0.0 | Tom | 19 | 80 | shenzhen |
1.0 | John | 28 | 150 | guangzhou |
2.0 | Ana | 20 | 120 | shanghai |
NaN | Jimmy | 24 | 100 | beijing |
df.sort_index(na_position="first")
(index) | name | age | Math | address |
---|---|---|---|---|
NaN | Jimmy | 24 | 100 | beijing |
0.0 | Tom | 19 | 80 | shenzhen |
1.0 | John | 28 | 150 | guangzhou |
2.0 | Ana | 20 | 120 | shanghai |
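The two settings can be compared side by side. This sketch assumes a reduced frame `demo` that keeps only the name column and the row labels of the article's df.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(
    {"name": ["Jimmy", "Ana", "Tom", "John"]}, index=[np.nan, 2, 0, 1]
)

nan_first = demo.sort_index(na_position="first")
nan_last = demo.sort_index(na_position="last")

print(list(nan_first["name"]))  # ['Jimmy', 'Tom', 'John', 'Ana']
print(list(nan_last["name"]))   # ['Tom', 'John', 'Ana', 'Jimmy']
```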
The sort_remaining parameter
If True, and the index is multi-level and sorted by level, the other levels are also sorted (in order) after the specified level.
# an example from the official documentation
arrays = [np.array(['qux', 'qux', 'foo', 'foo',
                    'baz', 'baz', 'bar', 'bar']),
          np.array(['two', 'one', 'two', 'one',
                    'two', 'one', 'two', 'one'])]
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=arrays)
s
qux  two    1
     one    2
foo  two    3
     one    4
baz  two    5
     one    6
bar  two    7
     one    8
dtype: int64
s.sort_index(level=1, sort_remaining=True)  # True is the default
bar one 8
baz one 6
foo one 4
qux one 2
bar two 7
baz two 5
foo two 3
qux two 1
dtype: int64
s.sort_index(level=1, sort_remaining=False)
qux one 2
foo one 4
baz one 6
bar one 8
qux two 1
foo two 3
baz two 5
bar two 7
dtype: int64
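The difference between the two outputs can be checked on a smaller case. The series `small` below is a made-up four-element example, not part of the official one; the expected values assume that ties on the sorted level keep their original relative order, as the outputs above show.

```python
import pandas as pd

# Hypothetical two-level index: level 0 is 'a'/'b', level 1 is 'x'/'y'.
index = pd.MultiIndex.from_tuples(
    [("b", "y"), ("a", "y"), ("b", "x"), ("a", "x")]
)
small = pd.Series([1, 2, 3, 4], index=index)

# Sort by level 1, then break ties using level 0.
with_remaining = small.sort_index(level=1, sort_remaining=True)
print(list(with_remaining))  # [4, 3, 2, 1]

# Sort by level 1 only; level 0 is left in its original relative order.
level_only = small.sort_index(level=1, sort_remaining=False)
print(list(level_only))  # [3, 4, 1, 2]
```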
The level parameter
df = pd.DataFrame({"name":["Jimmy","Ana","Tom","John"],
                   "age":[24,20,19,28],
                   "Math":[100,120,80,150],
                   "address":["beijing","shanghai","shenzhen","guangzhou"]
                  },
                  index=[[np.nan,2,0,1],  # build a DataFrame with a multi-level index
                         [4,5,8,1]
                        ])
df
(level 0) | (level 1) | name | age | Math | address |
---|---|---|---|---|---|
NaN | 4 | Jimmy | 24 | 100 | beijing |
2 | 5 | Ana | 20 | 120 | shanghai |
0 | 8 | Tom | 19 | 80 | shenzhen |
1 | 1 | John | 28 | 150 | guangzhou |
df now has a multi-level index:
df.index
MultiIndex([(nan, 4),
            (2.0, 5),
            (0.0, 8),
            (1.0, 1)],
           )
df.sort_index(level=0)
(level 0) | (level 1) | name | age | Math | address |
---|---|---|---|---|---|
NaN | 4 | Jimmy | 24 | 100 | beijing |
0 | 8 | Tom | 19 | 80 | shenzhen |
1 | 1 | John | 28 | 150 | guangzhou |
2 | 5 | Ana | 20 | 120 | shanghai |
df.sort_index(level=1)
(level 0) | (level 1) | name | age | Math | address |
---|---|---|---|---|---|
1 | 1 | John | 28 | 150 | guangzhou |
NaN | 4 | Jimmy | 24 | 100 | beijing |
2 | 5 | Ana | 20 | 120 | shanghai |
0 | 8 | Tom | 19 | 80 | shenzhen |
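Levels can also be addressed by name instead of by number. The sketch below is an illustration only: the level names "outer" and "inner", the frame `demo`, and its values are all made up, and the index avoids missing labels to keep the result easy to verify.

```python
import pandas as pd

# Hypothetical named two-level index.
index = pd.MultiIndex.from_arrays(
    [[2, 0, 1, 0], [5, 8, 1, 3]], names=["outer", "inner"]
)
demo = pd.DataFrame({"name": ["Ana", "Tom", "John", "Eve"]}, index=index)

by_inner = demo.sort_index(level="inner")
print(list(by_inner["name"]))  # ['John', 'Eve', 'Ana', 'Tom']

# sort_remaining defaults to True, so ties on "outer" are ordered by "inner".
by_outer = demo.sort_index(level="outer")
print(list(by_outer["name"]))  # ['Eve', 'Tom', 'John', 'Ana']
```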