A Detailed Guide to Index Sorting in Pandas

2023-08-25 11:38:41

Index sorting: sort_index

This article walks through index sorting in Pandas. For full details, see the official documentation:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_index.html

Parameter overview

DataFrame.sort_index(axis=0,
                     level=None,
                     ascending=True,
                     inplace=False,
                     kind='quicksort',
                     na_position='last',
                     sort_remaining=True,
                     ignore_index=False,
                     key=None)

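Parameter notes:

  • axis: the axis to sort along; axis=0 sorts the row index, axis=1 sorts the column labels
  • level: for a multi-level index, the level(s) to sort by; may be a level number, a level name, or a list of several levels
  • ascending: the sort direction; ascending by default
  • inplace: whether to modify the object in place; False by default
  • kind: the sorting algorithm to use; 'quicksort' by default
  • na_position: where missing labels are placed, 'first' or 'last'; 'last' by default
  • sort_remaining: when sorting a multi-level index by level, whether to also sort by the remaining levels afterwards; True by default
  • ignore_index: whether to replace the resulting labels with 0, 1, …, N-1; False by default
  • key: an optional function applied to the index labels before sorting; None by default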

Sample data

import pandas as pd
import numpy as np

df = pd.DataFrame({"name":["Jimmy","Ana","Tom","John"],
                   "age":[24,20,19,28],
                   "Math":[100,120,80,150],
                   "address":["beijing","shanghai","shenzhen","guangzhou"]
                  },
                 index=[np.nan,2,0,1])  # the index contains a missing value

df

      name  age  Math    address
NaN  Jimmy   24   100    beijing
2.0    Ana   20   120   shanghai
0.0    Tom   19    80   shenzhen
1.0   John   28   150  guangzhou

The axis parameter

# df.sort_index()  the default
df.sort_index(axis=0)

      name  age  Math    address
0.0    Tom   19    80   shenzhen
1.0   John   28   150  guangzhou
2.0    Ana   20   120   shanghai
NaN  Jimmy   24   100    beijing

By default the sort runs along axis=0 (the row index), in ascending order.
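
To confirm the defaults stated above, here is a minimal sketch that rebuilds a two-column version of the sample frame and compares the bare call with the fully spelled-out one. The variable names `default_sorted` and `explicit_sorted` are illustrative only.

```python
import numpy as np
import pandas as pd

# A reduced copy of the sample data (two columns are enough here)
df = pd.DataFrame(
    {"name": ["Jimmy", "Ana", "Tom", "John"],
     "age": [24, 20, 19, 28]},
    index=[np.nan, 2, 0, 1],
)

default_sorted = df.sort_index()                          # relies on the defaults
explicit_sorted = df.sort_index(axis=0, ascending=True)   # spells them out

# Both calls should yield the same frame
print(default_sorted.equals(explicit_sorted))
print(default_sorted["name"].tolist())
```

If the two frames compare equal, the explicit arguments are redundant and can be dropped.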

df.sort_index(axis=1)

     Math    address  age   name
NaN   100    beijing   24  Jimmy
2.0   120   shanghai   20    Ana
0.0    80   shenzhen   19    Tom
1.0   150  guangzhou   28   John

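axis=1 sorts the column labels. The labels here are all alphabetic strings, so they are ordered by their ASCII code values; uppercase letters have smaller codes than lowercase ones, which is why Math comes first.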
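
The same ordering can be reproduced with plain Python string comparison, which also compares by code point. The sketch below is only an illustration; the one-row frame and the names `labels`, `frame`, and `sorted_cols` are made up for the demonstration.

```python
import pandas as pd

labels = ["name", "age", "Math", "address"]

# "M" has a smaller code point than any lowercase letter
print(ord("M"), ord("a"))   # 77 97
print(sorted(labels))       # plain Python sort, for comparison

# A one-row frame with the same column labels as the sample data
frame = pd.DataFrame([["Jimmy", 24, 100, "beijing"]], columns=labels)
sorted_cols = frame.sort_index(axis=1).columns.tolist()
print(sorted_cols)
```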

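The ignore_index parameter

By default the original index labels are kept. If set to True, the labels are replaced with 0, 1, 2, …, N-1; in the output recorded here it is the row index that is renumbered.

# default
df.sort_index(axis=1,ignore_index=False)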

     Math    address  age   name
NaN   100    beijing   24  Jimmy
2.0   120   shanghai   20    Ana
0.0    80   shenzhen   19    Tom
1.0   150  guangzhou   28   John

df.sort_index(axis=1,ignore_index=True)

   Math    address  age   name
0   100    beijing   24  Jimmy
1   120   shanghai   20    Ana
2    80   shenzhen   19    Tom
3   150  guangzhou   28   John
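
For a row sort (axis=0), the effect of ignore_index is easy to see in isolation. The following is a minimal sketch on a reduced copy of the sample data; `kept` and `reset` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"name": ["Jimmy", "Ana", "Tom", "John"],
     "age": [24, 20, 19, 28]},
    index=[np.nan, 2, 0, 1],
)

kept = df.sort_index()                     # original labels travel with the rows
reset = df.sort_index(ignore_index=True)   # rows sorted, then relabeled 0..N-1

print(kept.index.tolist())
print(reset.index.tolist())
print(reset["name"].tolist())
```

The row order is the same in both results; only the labels attached to the rows differ.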

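The key parameter

Optional. If it is not None, the key function is first applied to the index labels being sorted, and the sort is then performed on the result.

df.sort_index(axis=1)  # no key: plain sort of the column labels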

     Math    address  age   name
NaN   100    beijing   24  Jimmy
2.0   120   shanghai   20    Ana
0.0    80   shenzhen   19    Tom
1.0   150  guangzhou   28   John

df.sort_index(axis=1, key=lambda x: x.str.lower())

       address  age  Math   name
NaN    beijing   24   100  Jimmy
2.0   shanghai   20   120    Ana
0.0   shenzhen   19    80    Tom
1.0  guangzhou   28   150   John

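With this key function, every column label is lowercased for the comparison, so Math is compared as math.

The sort then uses the lowercased labels, which places Math after age and before name. The labels in the result keep their original spelling.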
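
The key function works the same way on the row index. As a small sketch, assume a made-up frame `scores` whose row labels mix upper and lower case:

```python
import pandas as pd

# Hypothetical data: row labels with mixed case
scores = pd.DataFrame({"score": [1, 2, 3, 4]}, index=["b", "A", "c", "D"])

plain = scores.sort_index()                                  # code-point order
folded = scores.sort_index(key=lambda idx: idx.str.lower())  # case-insensitive order

print(plain.index.tolist())
print(folded.index.tolist())
```

The key receives the whole Index and must return something of the same length, which is why the vectorized `.str.lower()` is used rather than a per-element function.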

The ascending parameter

df.sort_index()

      name  age  Math    address
0.0    Tom   19    80   shenzhen
1.0   John   28   150  guangzhou
2.0    Ana   20   120   shanghai
NaN  Jimmy   24   100    beijing

# same as df.sort_index(): ascending is the default
df.sort_index(ascending=True)

      name  age  Math    address
0.0    Tom   19    80   shenzhen
1.0   John   28   150  guangzhou
2.0    Ana   20   120   shanghai
NaN  Jimmy   24   100    beijing

df.sort_index(ascending=False)  # sort in descending order

      name  age  Math    address
2.0    Ana   20   120   shanghai
1.0   John   28   150  guangzhou
0.0    Tom   19    80   shenzhen
NaN  Jimmy   24   100    beijing
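
Note that the missing label stays at the end in the descending result, so a descending sort is not simply the ascending result reversed. A minimal sketch of that observation, using a Series with the same index as the sample data (`asc` and `desc` are illustrative names):

```python
import numpy as np
import pandas as pd

s = pd.Series(["Jimmy", "Ana", "Tom", "John"], index=[np.nan, 2, 0, 1])

asc = s.sort_index()                   # 0.0, 1.0, 2.0, NaN
desc = s.sort_index(ascending=False)   # 2.0, 1.0, 0.0, NaN

print(asc.tolist())
print(desc.tolist())
print(desc.tolist() == asc.tolist()[::-1])   # False: NaN is placed last in both
```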

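The inplace parameter

inplace controls whether the original data is modified directly or a new object is returned.

If it is True, the sort is done in place, i.e. the original data itself changes.

To make the demonstration easier, first make a copy of df called df1 and work on df1:

df1 = df.copy()
df1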

      name  age  Math    address
NaN  Jimmy   24   100    beijing
2.0    Ana   20   120   shanghai
0.0    Tom   19    80   shenzhen
1.0   John   28   150  guangzhou

# False is the default

df1.sort_index(inplace=False)

      name  age  Math    address
0.0    Tom   19    80   shenzhen
1.0   John   28   150  guangzhou
2.0    Ana   20   120   shanghai
NaN  Jimmy   24   100    beijing

df1 itself is unchanged at this point:

df1

      name  age  Math    address
NaN  Jimmy   24   100    beijing
2.0    Ana   20   120   shanghai
0.0    Tom   19    80   shenzhen
1.0   John   28   150  guangzhou

df1.sort_index(inplace=True)  # modify in place

With inplace=True, df1 itself is now sorted:

df1

      name  age  Math    address
0.0    Tom   19    80   shenzhen
1.0   John   28   150  guangzhou
2.0    Ana   20   120   shanghai
NaN  Jimmy   24   100    beijing
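
One practical consequence worth spelling out: with inplace=True the call returns None, so its result should not be assigned back. A minimal sketch, with illustrative variable names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Jimmy", "Ana", "Tom", "John"]},
                  index=[np.nan, 2, 0, 1])

df1 = df.copy()
returned_default = df1.sort_index()               # a new, sorted frame
order_after_default = df1["name"].tolist()        # df1 itself is unchanged

returned_inplace = df1.sort_index(inplace=True)   # None
order_after_inplace = df1["name"].tolist()        # df1 itself is now sorted

print(order_after_default)
print(returned_inplace)
print(order_after_inplace)
```

Writing `df1 = df1.sort_index(inplace=True)` would therefore replace the frame with None.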

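The kind parameter

kind selects the sorting algorithm: {'quicksort', 'mergesort', 'heapsort'}. The default is 'quicksort'. Of these, only merge sort is stable, which matters only when the index contains duplicate labels.

  • 'quicksort': quick sort
  • 'mergesort': merge sort
  • 'heapsort': heap sort

df.sort_index()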

      name  age  Math    address
0.0    Tom   19    80   shenzhen
1.0   John   28   150  guangzhou
2.0    Ana   20   120   shanghai
NaN  Jimmy   24   100    beijing

df.sort_index(kind="mergesort")

      name  age  Math    address
0.0    Tom   19    80   shenzhen
1.0   John   28   150  guangzhou
2.0    Ana   20   120   shanghai
NaN  Jimmy   24   100    beijing
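
The two results above are identical because every index label is unique. The choice of algorithm can only become visible when labels repeat. As a sketch, assume a made-up frame `dup` with duplicate labels; merge sort is a stable algorithm, so rows that share a label keep their original relative order:

```python
import pandas as pd

# Hypothetical data: each index label appears twice
dup = pd.DataFrame({"label": ["a", "b", "c", "d"]}, index=[1, 0, 1, 0])

stable = dup.sort_index(kind="mergesort")

print(stable.index.tolist())
print(stable["label"].tolist())   # within each label, original order is kept
```

With the default 'quicksort', the order of rows within a group of equal labels is not guaranteed.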

The na_position parameter

Controls where missing labels are placed: 'first' or 'last'. The default is 'last'.

df.sort_index()

      name  age  Math    address
0.0    Tom   19    80   shenzhen
1.0   John   28   150  guangzhou
2.0    Ana   20   120   shanghai
NaN  Jimmy   24   100    beijing

df.sort_index(na_position="first")

      name  age  Math    address
NaN  Jimmy   24   100    beijing
0.0    Tom   19    80   shenzhen
1.0   John   28   150  guangzhou
2.0    Ana   20   120   shanghai
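
na_position is independent of ascending: it decides where the missing label goes, while ascending only orders the remaining labels. The sketch below enumerates the four combinations on a Series with the same index as the sample data; the dictionary `orders` is an illustrative helper.

```python
import itertools

import numpy as np
import pandas as pd

s = pd.Series(["Jimmy", "Ana", "Tom", "John"], index=[np.nan, 2, 0, 1])

orders = {}
for ascending, na_position in itertools.product([True, False], ["first", "last"]):
    result = s.sort_index(ascending=ascending, na_position=na_position)
    orders[(ascending, na_position)] = result.tolist()
    print(ascending, na_position, result.tolist())
```

In every combination, Jimmy (the row with the NaN label) ends up at whichever end na_position names.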

The sort_remaining parameter

If True, and the index is multi-level and is being sorted by level, the other levels are also sorted (in order) after the sort by the specified level.

# An example from the official documentation

arrays = [np.array(['qux', 'qux', 'foo', 'foo',
                    'baz', 'baz', 'bar', 'bar']),
          np.array(['two', 'one', 'two', 'one',
                    'two', 'one', 'two', 'one'])]
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=arrays)
s

qux  two    1
     one    2
foo  two    3
     one    4
baz  two    5
     one    6
bar  two    7
     one    8
dtype: int64

s.sort_index(level=1, sort_remaining=True)  # True is the default

bar  one    8
baz  one    6
foo  one    4
qux  one    2
bar  two    7
baz  two    5
foo  two    3
qux  two    1
dtype: int64

s.sort_index(level=1, sort_remaining=False)

qux  one    2
foo  one    4
baz  one    6
bar  one    8
qux  two    1
foo  two    3
baz  two    5
bar  two    7
dtype: int64
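
The difference between the two outputs is only in how ties on the sorted level are ordered. A smaller sketch of the same idea, assuming made-up level names `first` and `second` so that the level can be referred to by name:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("qux", "two"), ("qux", "one"), ("foo", "two"), ("foo", "one")],
    names=["first", "second"],   # hypothetical level names
)
s = pd.Series([1, 2, 3, 4], index=idx)

# Ties on "second" are broken by sorting the remaining level "first"
with_rest = s.sort_index(level="second", sort_remaining=True)

# Ties on "second" keep the order in which the rows originally appeared
without_rest = s.sort_index(level="second", sort_remaining=False)

print(with_rest.tolist())
print(without_rest.tolist())
```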

The level parameter

df = pd.DataFrame({"name":["Jimmy","Ana","Tom","John"],
                   "age":[24,20,19,28],
                   "Math":[100,120,80,150],
                   "address":["beijing","shanghai","shenzhen","guangzhou"]
                  },
                 index=[[np.nan,2,0,1],  # build a DataFrame with a multi-level index
                        [4,5,8,1]
                       ])

df

        name  age  Math    address
NaN 4  Jimmy   24   100    beijing
2   5    Ana   20   120   shanghai
0   8    Tom   19    80   shenzhen
1   1   John   28   150  guangzhou

We can see that df has a multi-level index:

df.index

MultiIndex([(nan, 4),
            (2.0, 5),
            (0.0, 8),
            (1.0, 1)],
           )

df.sort_index(level=0)

        name  age  Math    address
NaN 4  Jimmy   24   100    beijing
0   8    Tom   19    80   shenzhen
1   1   John   28   150  guangzhou
2   5    Ana   20   120   shanghai

df.sort_index(level=1)

        name  age  Math    address
1   1   John   28   150  guangzhou
NaN 4  Jimmy   24   100    beijing
2   5    Ana   20   120   shanghai
0   8    Tom   19    80   shenzhen
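
Besides a level number, level also accepts a level name or a list of levels. The following is a minimal sketch on made-up data; the level names `city` and `year` and the column `value` are assumptions for illustration, not part of the example above.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("shanghai", 2021), ("beijing", 2022), ("shanghai", 2020), ("beijing", 2021)],
    names=["city", "year"],   # hypothetical level names
)
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

by_position = df.sort_index(level=1)              # level given by number
by_name = df.sort_index(level="year")             # same level, given by name
by_list = df.sort_index(level=["city", "year"])   # several levels, in priority order

print(by_position.equals(by_name))
print(by_name["value"].tolist())
print(by_list["value"].tolist())
```

Referring to levels by name tends to be more readable and does not break if the level order changes.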
