看完这篇,还不会pandas时间数据处理,请你吃瓜

2022-12-19 18:04:48 浏览数 (1)

pandas 可以说是 python 中数据处理的中流砥柱,不会点 pandas ,你都不敢说自己了解 pythonpandas 是数据处理神器,时间数据处理自然也是不在话下,今天咱们就来聊一聊 pandas 处理时间数据的应用。

我们可以从两个维度来描述时间,一种是时间点或者说时间时刻,一种是时间长度。而时间长度又包括时间差和时间段。

时间点数据处理

时间点就是指某一时间,比如说当前时间,当前时间戳,今天。时间点相关的问题场景经常是:今天日期是什么?现在的时间是多少?今天是周几?今天的本年第几天?等

pd.Timestamp 系列函数专门用于处理时间点数据。

01:将各种时间格式转换为标准时间格式 yyyy-MM-dd HH:mm:ss

代码语言:javascript复制
import pandas as pd

print(pd.Timestamp(year=2022, month=9, day=16, hour=9, minute=21, second=35))
print(pd.Timestamp('2022-09-16 9:21:35'))
print(pd.Timestamp('Sep 16, 2022 9:21:35'))
# output:
2022-09-16 09:21:35
2022-09-16 09:21:35
2022-09-16 09:21:35

需要说明的是 pd.Timestamp 是纳秒级别的,如果将一个整数或者浮点数传给 pd.Timestamp,将返回自 unix 元年(1970-01-01)之后多少纳秒的时间点。

代码语言:javascript复制
print(pd.Timestamp(9000000000))
print(pd.Timestamp(9000009000))
# output:
1970-01-01 00:00:09
1970-01-01 00:00:09.000009

02:获取当前时间

代码语言:javascript复制
import pandas as pd

print(pd.Timestamp.now())
# output:
2022-09-18 08:11:10.797284

03:获取当前日期

代码语言:javascript复制
import pandas as pd

print(pd.Timestamp.now().date())
# output:
2022-09-18

04:通过日期元素获取标准日期

代码语言:javascript复制
import pandas as pd

print(pd.Timestamp(2022, 9, 18).date())
# output:
2022-09-18

05:通过日期、时间元素获取标准日期时间

代码语言:javascript复制
import pandas as pd

print(pd.Timestamp(2022, 9, 18, 12, 30, 46))
print(pd.Timestamp(year=2022, month=9, day=18, hour=12, minute=30, second=46))
# output:
2022-09-18 12:30:46
2022-09-18 12:30:46

06:通过日期时间元素获取标准时间

代码语言:javascript复制
import pandas as pd

print(pd.Timestamp(2022, 9, 18, 12, 30, 46).time())
print(pd.Timestamp(year=2022, month=9, day=18, hour=12, minute=30, second=46).time())
# output:
12:30:46
12:30:46

07:通过时间戳获取本地时区时间

代码语言:javascript复制
import pandas as pd

print(pd.Timestamp.fromtimestamp(1663340762))
# output:
# 北京时间,东八区
2022-09-16 23:06:02

08:通过时间戳获取UTC时间

代码语言:javascript复制
import pandas as pd

print(pd.Timestamp.utcfromtimestamp(1663340762))
# output:
2022-09-16 15:06:02

09:通过时间戳获取本地时区日期

代码语言:javascript复制
import pandas as pd

print(pd.Timestamp.fromtimestamp(1663340762).date())
# output:
2022-09-16

10:通过 pd.Timestamp 属性获取日期时间元素

代码语言:javascript复制
import pandas as pd

now = pd.Timestamp.now()

print("Current year:          ", now.year) 
print("Current quarter:       ", now.quarter)
print("Current month:         ", now.month)
print("Current week num:      ", now.week)
print("Current week num:      ", now.weekofyear)
print("Current day:           ", now.day)
print("Current day_of_year:   ", now.day_of_year)
print("Current day of year:   ", now.dayofyear)
print("Current day_of_week:   ", now.day_of_week) # [0-6]
print("Current day of week:   ", now.dayofweek)   # [0-6]
print("Current hour:          ", now.hour)
print("Current minute:        ", now.minute)
print("Current second:        ", now.second)
print("Current microsecond:   ", now.microsecond)
print("Current days_in_month: ", now.days_in_month) # [28-31]
print("Current days in month: ", now.daysinmonth)   # [28-31]
print("is leap year:          ", now.is_leap_year)   # [True|False]
# output:
Current year:           2022
Current quarter:        3
Current month:          9
Current week num:       37
Current week num:       37
Current day:            18
Current day_of_year:    261
Current day of year:    261
Current day_of_week:    6
Current day of week:    6
Current hour:           22
Current minute:         17
Current second:         39
Current microsecond:    37986
Current days_in_month:  30
Current days in month:  30
is leap year:           False

11:通过 pd.Timestamp 方法获取日期时间元素

代码语言:javascript复制
import pandas as pd

now = pd.Timestamp.now()

print("Current month_name:    ", now.month_name())
print("Current weekday:       ", now.weekday())  # [0-6]
print("Current week_day_name: ", now.day_name())
print("Current date:          ", now.date())
print("Current ctime:         ", now.ctime())
print("Current time:          ", now.time()) 
print("Current timestamp:     ", now.timestamp())   
print("Current timetuple:     ", now.timetuple())
print("Current time:          ", now.timetz())
print("Current time zone:     ", now.tzname()) # 不设置就没有
# output:
Current month_name:     September
Current weekday:        6
Current week_day_name:  Sunday
Current date:           2022-09-18
Current ctime:          Sun Sep 18 23:04:28 2022
Current time:           23:04:28.491830
Current timestamp:      1663542268.49183
Current timetuple:      time.struct_time(tm_year=2022, tm_mon=9, tm_mday=18, tm_hour=23, tm_min=4, tm_sec=28, tm_wday=6, tm_yday=261, tm_isdst=-1)
Current time:           23:04:28.491830
Current time zone:      None

12:日期时间转换为字符串格式 strftime( )

代码语言:javascript复制
import pandas as pd

now = pd.Timestamp.now()

t = now.strftime("%H:%M:%S")
print("time:", t)

s1 = now.strftime("%Y/%m/%d, %H:%M:%S")
# Y/mm/dd H:M:S format
print("s1:", s1)

s2 = now.strftime("%d/%m/%Y, %H:%M:%S")
# dd/mm/YY H:M:S format
print("s2:", s2)
# output:
time: 23:07:47
s1: 2022/09/18, 23:07:47
s2: 18/09/2022, 23:07:47

上面用到的 %Y、%m、%d、%H 等是格式代码。strftime() 方法接收一个或多个格式代码,并根据它返回一个格式化的字符串。

常见日期时间格式代码如下(datetime format codes):

  • %Y - full year [0001, 0002, ..., 2018, 2019,..., 9999]
  • %y - short year [01, 02, ..., 18, 19,..., 99]
  • %-y - decimal short year [1, 2, ..., 18, 19,..., 99]
  • %m - zero-padded decimal month [01, 02, ..., 11, 12]
  • %-m - decimal month [1, 2, ..., 11, 12]
  • %d - zero-padded decimal day [01, 02, ..., 30, 31]
  • %-d - decimal day [1, 2, ..., 30, 31]
  • %H - 24-hour [00, 01, ..., 22, 23]
  • %-H - decimal 24-hour [0, 1, ..., 22, 23]
  • %I - 12-hour [01, 02, ..., 11, 12]
  • %-I - decimal 12-hour [1, 2, ..., 11, 12]
  • %M - minute [00, 01, ..., 58, 59]
  • %-M - decimal minute [0, 1, ..., 58, 59]
  • %S - second [00, 01, ..., 58, 59]
  • %-S - decimal second [0, 1, ..., 58, 59]
  • %j - day number [001, 002, ..., 365, 366]
  • %-j - decimal day number [1, 2, ..., 365, 366]
  • %f - Microsecond [000000, 000001, ..., 999998, 999999]
  • %B - full month name [January, February, ..., November, December]
  • %b - short month name [Jan, Feb, ..., Nov, Dec]
  • %A - full weekday name [Monday, Tuesday, ..., Saturday, Sunday]
  • %a - short weekday name [Mon, Tue, ..., Sat, Sun]
  • %W - year week number [00, 01, ..., 52, 53](星期一为星期的开始)
  • %U - year week number [00, 01, ..., 52, 53](星期天为星期的开始)
  • %w - week number [0, 0, ..., 5, 6](星期天为星期的开始)
  • %c - Locale’s appropriate date and time representation [Sat Sep 17 14:43:06 2022]
  • %X - local time display as %H:%M:%S
  • %x - local date display as %m/%d/%y
  • %Z - zone [CST, UST, GMT](默认没有时区,设置之后才可以展示)
  • %z - UTC offset [ 0800, 1200, ..., -0800, -1200](默认没有时区,设置之后才可以展示)
  • %p - AM OR PM [AM, PM]
  • %% - just % character

13:字符串格式转换为日期时间 to_datetime( )

代码语言:javascript复制
import pandas as pd
# strptime( ) 未实现

date_string = "18 September, 2022"
print("date_string =", date_string)

date_object = pd.to_datetime(date_string, format="%d %B, %Y")
print("date_object =", date_object)
# output:
date_string = 18 September, 2022
date_object = 2022-09-18 00:00:00

14:给时间设置时区并展示

代码语言:javascript复制
import pandas as pd
import pytz

local = pd.Timestamp.now()
print("Local:", local.strftime("%m/%d/%Y, %H:%M:%S"))


tz_NY = pytz.timezone('America/New_York') 
datetime_NY = pd.Timestamp.now(tz_NY)
print("datetime_NY:", datetime_NY.strftime("%m/%d/%Y, %H:%M:%S, timezone:%Z"))

tz_London = pytz.timezone('Europe/London')
datetime_London = pd.Timestamp.now(tz_London)
print("datetime_London:", datetime_London.strftime("%m/%d/%Y, %H:%M:%S, timezone:%Z"))

tz_Shanghai = pytz.timezone('Asia/Shanghai')
datetime_Shanghai = pd.Timestamp.now(tz_Shanghai)
print("datetime_Shanghai:", datetime_Shanghai.strftime("%m/%d/%Y, %H:%M:%S, timezone:%Z"))
# output:
Local: 09/18/2022, 23:22:57
datetime_NY: 09/18/2022, 11:22:57, timezone:EDT
datetime_London: 09/18/2022, 16:22:57, timezone:BST
datetime_Shanghai: 09/18/2022, 23:22:57, timezone:CST

时间差数据处理

时间差是指两个时间点在时间上的差异,是两个时间比较之后的结果。可以用不同的时间单位表示,例如,天、小时、分钟、秒。它们既可以是正数,也可以是负数。

pd.Timedelta 系列函数专门用于处理时间差数据。

15、通过字符串获取时间差

代码语言:javascript复制
import pandas as pd

print(pd.Timedelta('1 days 1 hours 18 minutes 27 seconds'))
# output:
1 days 01:18:27

16、通过整数获取时间差

代码语言:javascript复制
import pandas as pd

print(pd.Timedelta(8,unit='h')) # 8 hours
# output:
0 days 08:00:00

17、通过数据偏移量构造时间差

代码语言:javascript复制
import pandas as pd

print(pd.Timedelta(days=3)) # 3 days
# output:
3 days 00:00:00

18、获取时间差的各项元素值

代码语言:javascript复制
import pandas as pd

# given datetimes
now = pd.Timestamp.now()
date_time = pd.Timestamp(year=2022, month=8, day=31, hour=12, minute=30)

# Calculate the difference between two datetimes
timedelta = now - date_time

# get a components namedtuple-like object
print(timedelta.components)

# get timedelta elements
print("timedelta days:        ",timedelta.components.days)
print("timedelta hours:       ",timedelta.components.hours)
print("timedelta minutes:     ",timedelta.components.minutes)
print("timedelta seconds:     ",timedelta.components.seconds)
print("timedelta milliseconds:",timedelta.components.milliseconds)
print("timedelta microseconds:",timedelta.components.microseconds)
print("timedelta nanoseconds :",timedelta.components.nanoseconds)
# output:
Components(days=19, hours=10, minutes=13, seconds=18, milliseconds=798, microseconds=228, nanoseconds=0)
timedelta days:         19
timedelta hours:        10
timedelta minutes:      13
timedelta seconds:      18
timedelta milliseconds: 798
timedelta microseconds: 228
timedelta nanoseconds : 0

19、获取两个日期的时间差,单位是天

代码语言:javascript复制
import pandas as pd

ts1 = pd.Timestamp(2022, 9, 11)
ts2 = pd.Timestamp(2022, 9, 18)

diff_days = (ts2 - ts1).days
print(diff_days)
# output:
7

20、计算两个时间的时间差,单位是秒

代码语言:javascript复制
import pandas as pd

ts1 = pd.Timestamp(2022, 9, 18, 12, 30, 46)
ts2 = pd.Timestamp(2022, 9, 18, 13, 31, 46)

diff_seconds = (ts2 - ts1).seconds
print(diff_seconds)
# output:
3660

21、计算未来时间

代码语言:javascript复制
import pandas as pd

now = pd.Timestamp.now()
print('Given  Time:', now)

# Three weeks later
future_time = now   pd.Timedelta(3, unit='w')
print('Future Time:', future_time)
# output:
Given  Time: 2022-09-19 23:12:06.468015
Future Time: 2022-10-10 23:12:06.468015

22、比较两个时间段大小

代码语言:javascript复制
import pandas as pd

# Creating two timedelta objects
d1 = pd.Timedelta(hours=57)
d2 = pd.Timedelta(hours=25, seconds=2)

# not equal check
print(d2 != d1)  # True

# check if timedelta are same
print(d2 == 25)  # False

# Compare two TimeDelta
print(d1 > d2)
# output:
True
False
True

23、将 pd.Timedelta 转换为字符串格式

代码语言:javascript复制
import pandas as pd

ts = pd.Timedelta(days=34, minutes=7, seconds=64)

# str() constructor
print(str(ts))

# __str__() self
print(str(ts.__str__()))
# output:
34 days 00:08:04
34 days 00:08:04

24、计算日期所在月份的第一天

pandas 尚未提供 计算日期所在月份的第一天 的内置方法,因此,需要我们自己实现。既然是第一天,那就是把日期中的 day 元素调整为 1 就好了,具体来说,有如下三种实现方法:

方法一、将 day 元素替换为 1

代码语言:javascript复制
import pandas as pd

given_date = pd.Timestamp.today().date()

first_day_of_month = given_date.replace(day=1)

print("First day of month: ", first_day_of_month)
# output:
First day of month:  2022-09-01

方法二、将当前日期减去 (day 元素值 -1)

比如今天是 7 号,减去 6 天,就能得到 1 号了。

代码语言:javascript复制
import pandas as pd

given_date = pd.Timestamp.today().date()

first_day_of_month = given_date - pd.Timedelta(days = given_date.day-1)

print("First day of month: ", first_day_of_month)
# output:
First day of month:  2022-09-01

方法三、使用 strftime( ) 方法获取

代码语言:javascript复制
import pandas as pd

given_date = pd.Timestamp.today().date()

first_day_of_month = given_date.strftime("%Y-%m-01")

print("First day of month: ", first_day_of_month)
# output:
First day of month:  2022-09-01

25、计算日期所在月份的最后一天

很不幸,pandas 也尚未提供计算日期所在月份的最后一天 的内置方法,月末计算比月初计算还要复杂,因为每个月天数是不一样的,有 28、29、30、31 四种可能,这该怎么办呢?

有一种快捷的方式是:先计算出次月的第一天,然后,用这个日期减去 1 天,这样,就得到了当月的最后一天,那次月的第一天又怎么算呢?当月第一天在加上一个月就可以了。跟示例 24 对应,这里也有三种方法。

但是,pd.Timedelta 不支持月粒度时间处理,这里只能借助其他模块实现

方法一、使用 calendar 实现

代码语言:javascript复制
import pandas as pd
import calendar

given_date = pd.Timestamp.today().date()

last_day_of_month = given_date.replace(day = calendar.monthrange(given_date.year, given_date.month)[1])

print("Last day of month: ", last_day_of_month)
# output:
Last day of month:  2022-09-30

方法二、使用 dateutil.relativedelta() 实现

代码语言:javascript复制
import pandas as pd
from dateutil.relativedelta import relativedelta

given_date = pd.Timestamp.today().date()

last_day_of_month = (given_date   relativedelta(day=31))

print("Last day of month: ", last_day_of_month)
# output:
Last day of month:  2022-09-30

时间段数据处理

时间段表示一段时间持续的长度,比如一年、一个月、一天,与时间差类似,但又存在区别。因为它不是两个事件比较之后的结果。

pd.Period 系列函数专门用于处理时间段数据。

先看一个示例

代码语言:javascript复制
import pandas as pd

period = pd.Period('2022-9-24', freq='W')

这里,pd.Period 返回的是 2022-09-24 所在时间周期 的这个时间段,也就是 2022-09-19 ~ 2022-09-25 ,它表示的是一个时间范围,W 就是周的别名,时间周期 freq 别名释义表如下:

Alias

Description

B

business day frequency

C

custom business day frequency

D

calendar day frequency

W

weekly frequency

M

month end frequency

SM

semi-month end frequency (15th and end of month)

BM

business month end frequency

CBM

custom business month end frequency

MS

month start frequency

SMS

semi-month start frequency (1st and 15th)

BMS

business month start frequency

CBMS

custom business month start frequency

Q

quarter end frequency

BQ

business quarter end frequency

QS

quarter start frequency

BQS

business quarter start frequency

A, Y

year end frequency

BA, BY

business year end frequency

AS, YS

year start frequency

BAS, BYS

business year start frequency

BH

business hour frequency

H

hourly frequency

T, min

minutely frequency

S

secondly frequency

L, ms

milliseconds

U, us

microseconds

N

nanoseconds

26、获取时间段各项时间元素值

代码语言:javascript复制
import pandas as pd

period = pd.Period('2022-9-24 12:12:12', freq='M')
print("period.year:         ", period.year)
print("period.quarter:      ", period.quarter)
print("period.month:        ", period.month)
print("period.week:         ", period.week)
print("period.day:          ", period.day)
print("period.hour:         ", period.hour)
print("period.minute:       ", period.minute)
print("period.second:       ", period.second)
print("period.is_leap_year: ", period.is_leap_year)
print("period.day_of_year:  ", period.day_of_year)
print("period.dayofyear:    ", period.dayofyear)
print("period.days_in_month:", period.days_in_month)
print("period.daysinmonth:  ", period.daysinmonth)
print("period.day_of_week:  ", period.day_of_week) # [0-6]
print("period.dayofweek:    ", period.dayofweek)   # [0-6]
print("period.weekday:      ", period.weekday)     # [0-6]
print("period.weekofyear:   ", period.weekofyear)
print("period.qyear:        ", period.qyear)
print("period.start_time:   ", period.start_time)
print("period.end_time:     ", period.end_time)
print("period.freqstr:      ", period.freqstr)
print("period.freq:         ", period.freq)
# output:
period.year:          2022
period.quarter:       3
period.month:         9
period.week:          38
period.day:           25
period.hour:          0
period.minute:        0
period.second:        0
period.is_leap_year:  False
period.day_of_year:   268
period.dayofyear:     268
period.days_in_month: 30
period.daysinmonth:   30
period.day_of_week:   6
period.dayofweek:     6
period.weekday:       6
period.weekofyear:    38
period.qyear:         2022
period.start_time:    2022-09-19 00:00:00
period.end_time:      2022-09-25 23:59:59.999999999
period.freqstr:       W-SUN  # 周粒度,周日结束
period.freq:          <Week: weekday=6>

27、使用 asfreq 改变时间段的周期

pd.Period.asfreq( ) 方法有两个参数,freq 是所需时间周期,how 表示使用转换后时间段的开始还是结束

代码语言:javascript复制
import pandas as pd

period  = pd.Period('2022-9-24 12:12:12', freq='W')
print("period.start_time:   ", period.start_time)
print("period.end_time:     ", period.end_time)

period_start = period.asfreq(freq='D', how='start') # how='S'
print("period_start:", period_start)

period_end = period.asfreq(freq='D', how='end')     # how='E', default 'end'
print("period_end:  ", period_end)
# output:
period.start_time:    2022-09-19 00:00:00
period.end_time:      2022-09-25 23:59:59.999999999
period_start: 2022-09-19
period_end:   2022-09-25

28、获取当前时间的所在时间段

代码语言:javascript复制
import pandas as pd

period  = pd.Period.now(freq='M')
print("period:      ", period)
# output:
period:       2022-09

29、格式化时间段字符串

时间格式编码含义请参考示例 12

代码语言:javascript复制
import pandas as pd

period1 = pd.Period(freq='Q-JUL', year=2006, quarter=1)
print(period1.strftime('%F-Q%q'))
# Output the last month in the quarter of this date
print(period1.strftime('%b-%Y'))

period2 = pd.Period(freq='D', year=2001, month=1, day=1)
print(period2.strftime('%d-%b-%Y'))
print(period2.strftime('%b. %d, %Y was a %A'))
# output:
2006-Q1
Oct-2005
01-Jan-2001
Jan. 01, 2001 was a Monday

30、获取时间周期某时间戳的标准时间格式字符串

pd.Period.to_timestamp( ) 方法有两个参数,freq 是所需时间周期(如果原时间周期大于等于周,则为 D 否则为 S ),how 表示使用转换后时间段的开始还是结束。

代码语言:javascript复制
import pandas as pd
  
prd1 = pd.Period('2022-9-24 12:12:12', freq='W')

print(prd1)

prd2 = prd1.to_timestamp(freq='D', how='start')
print(prd2)

prd3 = pd.Period('2022-9-24 12:00:00', freq='4H')

print(prd3)

prd4 = prd3.to_timestamp(freq='S', how='end')
print(prd4)
# output:
2022-09-19/2022-09-25
2022-09-19 00:00:00
2022-09-24 12:00
2022-09-24 15:59:59.999999999

批量数据转换

上边介绍的都是单个数据转换,下面通过一个示例介绍一下如何进行批量数据的转换。

假设,有如下样式的数据集:

strdate

2022/9/1

2022/9/2

2022/9/12

2022/9/22

2022/12/1

2022/12/23

计划调整为如下格式:(获取日期所在月份的第一天)

strdate

2022/9/1

2022/9/1

2022/9/1

2022/9/1

2022/12/1

2022/12/1

利用 pandas 对某列数据进行调整,经常使用 apply 方法,需要首先定义一个函数,用于编写单个数据的处理逻辑,根据示例 24 中介绍的方法,可以使用如下自定义函数实现。

代码语言:javascript复制
import pandas as pd

def get_firstday(time):
    return time.replace(day = 1).strftime('%Y/%m/%d').replace('/0','/')

df=pd.read_csv('./str2date.txt')
print("df1:n", df)

# convert column strdate to datetime
df['strdate'] = pd.to_datetime(df['strdate'],format = '%Y/%m/%d')
print("df2:n", df)

# apply get_firstday function to column strdate
df['strdate'] = df['strdate'].apply(get_firstday)
print("df3:n", df)
# output:
df1:
       strdate
0    2022/9/1
1    2022/9/2
2   2022/9/12
3   2022/9/22
4   2022/12/1
5  2022/12/23
df2:
      strdate
0 2022-09-01
1 2022-09-02
2 2022-09-12
3 2022-09-22
4 2022-12-01
5 2022-12-23
df3:
      strdate
0   2022/9/1
1   2022/9/1
2   2022/9/1
3   2022/9/1
4  2022/12/1
5  2022/12/1

如上仅为示例,其他类型的时间数据转换,只需修改自定义函数即可。

0 人点赞