A Detailed Guide to the resample Method in Pandas

wangshuang1631

License: CC 4.0 BY-SA

Article link: https://blog.csdn.net/wangshuang1631/article/details/52314944

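This article explains how to use the resample method of the Pandas library and what its parameters mean. The method resamples time-series data and converts it to a different frequency. The examples show how to change the sampling frequency, choose which bin edge is used as the time label, and fill missing values.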

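resample in Pandas performs resampling: it reprocesses the original samples at a new frequency, which makes it a convenient method for frequency conversion and resampling of regular time-series data. Downsampling (for example, from one-minute samples to 3-minute bins) aggregates several samples into one value; upsampling (for example, from one minute to 30 seconds) creates new time slots that are either left empty or filled.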
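To make "resampling" concrete, here is a minimal sketch that rebuilds a 3-minute downsampling by hand, assuming the default left-closed bins anchored at midnight: flooring each timestamp to the start of its 3-minute bin and grouping on it gives the same result as resample. This only illustrates the idea; it is not how pandas implements resample. The code uses the alias "min", the newer spelling of "T" for minutes (pandas 2.2 deprecated "T").

```python
import pandas as pd

# Nine one-minute samples, the same data used in the examples below.
index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Downsampling with resample: sum the values in each 3-minute bin.
by_resample = series.resample("3min").sum()

# The same bins by hand: floor each timestamp to the start of its
# 3-minute bin and group on that key.
by_groupby = series.groupby(series.index.floor("3min")).sum()

print(by_resample.tolist())                        # [3, 12, 21]
print(by_resample.index.equals(by_groupby.index))  # True
```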

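In older pandas releases, the method signature is:

DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)

Current pandas no longer accepts how, fill_method, limit, loffset or base in resample() itself: the aggregation or fill step is called as a method on the object that resample() returns (as in the examples below), base was superseded by the offset parameter (with origin controlling where bins are anchored), and loffset by shifting the index of the result.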

The parameters are:

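rule : string

The offset string or object representing the target conversion (the new frequency), such as '3T' for three minutes.

axis : int, optional, default 0

The axis to resample along; 0 means the index (rows).

closed : {'right', 'left'}

Which side of each bin interval is closed.

label : {'right', 'left'}

Which bin edge is used as the label of each bucket.

convention : {'start', 'end', 's', 'e'}

For PeriodIndex only: whether to use the start or the end of rule.

loffset : timedelta

Adjusts the resampled time labels.

base : int, default 0

For frequencies that evenly subdivide one day, the "origin" of the aggregated intervals. For example, for a 5-minute frequency, base can range from 0 through 4. Defaults to 0.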
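base and loffset were deprecated in pandas 1.1 and removed in pandas 2.0. A minimal sketch of the replacements, assuming pandas 1.1 or later: the offset argument shifts the bin edges the way base did, and adding a shift to the result's index reproduces what loffset did to the labels. The one-minute series here is the same nine-sample series used in the examples below.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")  # "min" == "T"
series = pd.Series(range(9), index=index)

# Default 5-minute bins start at midnight: [00:00, 00:05), [00:05, 00:10).
default = series.resample("5min").sum()

# offset="2min" moves every bin edge by two minutes, as base=2 used to:
# [23:57, 00:02), [00:02, 00:07), [00:07, 00:12).
shifted = series.resample("5min", offset="2min").sum()

# loffset only relabelled the output; add the shift to the index instead.
relabelled = series.resample("5min").sum()
relabelled.index = relabelled.index + pd.Timedelta("1min")

print(default.tolist())     # [10, 26]
print(shifted.tolist())     # [1, 20, 15]
print(shifted.index[0])     # 1999-12-31 23:57:00
print(relabelled.index[0])  # 2000-01-01 00:01:00
```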

First, create a Series sampled once per minute.

>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64

Downsample the series into 3-minute bins and sum the values in each bin.

>>> series.resample('3T').sum()
2000-01-01 00:00:00     3
2000-01-01 00:03:00    12
2000-01-01 00:06:00    21
Freq: 3T, dtype: int64
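The how argument in the old signature selected this aggregation; in current pandas the Resampler object returned by resample() exposes the aggregations as methods instead. A short sketch, using the same nine-sample series, of computing several statistics per bin at once with agg() and of the built-in ohlc() (open, high, low, close):

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")  # "min" == "T"
series = pd.Series(range(9), index=index)

r = series.resample("3min")  # a Resampler; bins are aggregated when a method is called

# One column per statistic, one row per 3-minute bin.
stats = r.agg(["sum", "mean", "max"])
print(stats)

# First, highest, lowest and last value in each bin.
bars = r.ohlc()
print(bars)
```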

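Downsample into 3-minute bins as above, but label each bin with its right edge instead of its left edge. Note that the value at the timestamp used as a label is not included in the bucket it labels: for example, the bucket labelled 2000-01-01 00:03:00 sums to 3 (0 + 1 + 2) and does not include the value 3 recorded at 00:03:00. To include it, close the right side of the bins, as in the next example.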

>>> series.resample('3T', label='right').sum()
2000-01-01 00:03:00     3
2000-01-01 00:06:00    12
2000-01-01 00:09:00    21
Freq: 3T, dtype: int64

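Downsample into 3-minute bins with right-edge labels, and also close the right side of each bin interval, so that each bucket includes the value at its label timestamp.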

>>> series.resample('3T', label='right', closed='right').sum()
2000-01-01 00:00:00     0
2000-01-01 00:03:00     6
2000-01-01 00:06:00    15
2000-01-01 00:09:00    15
Freq: 3T, dtype: int64
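closed and label are independent settings. A minimal sketch that counts how many samples land in each bin makes the effect of closed visible on its own: with the default left-closed bins every bin holds three samples, while right-closed bins put the 00:00 sample into a bin of its own. Without label='right', that first right-closed bin is labelled with its left edge, 1999-12-31 23:57:00.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")  # "min" == "T"
series = pd.Series(range(9), index=index)

# Default: bins are closed on the left, [start, end).
left = series.resample("3min").count()

# closed="right": bins are (start, end]; labels still use the left edge.
right = series.resample("3min", closed="right").count()

print(left.tolist())    # [3, 3, 3]
print(right.tolist())   # [1, 3, 3, 2]
print(right.index[0])   # 1999-12-31 23:57:00
```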

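Upsample the series to 30-second intervals. asfreq() only creates the new time slots, so they hold NaN and the dtype becomes float64.

>>> series.resample('30S').asfreq().head()  # first 5 rows
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    NaN
2000-01-01 00:01:00    1.0
2000-01-01 00:01:30    NaN
2000-01-01 00:02:00    2.0
Freq: 30S, dtype: float64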
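Besides leaving NaN or copying a neighbour, the upsampled slots can be filled by linear interpolation between the surrounding samples. A brief sketch with Resampler.interpolate(), assuming the same one-minute series:

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")  # "min" == "T"
series = pd.Series(range(9), index=index)

# Upsample to 30 seconds ("s" == "S"); each new slot gets the value on the
# straight line between its neighbours.
filled = series.resample("30s").interpolate()

print(filled.head().tolist())  # [0.0, 0.5, 1.0, 1.5, 2.0]
print(len(filled))             # 17 slots from 00:00:00 to 00:08:00
```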

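Upsample to 30 seconds and fill the NaN values by carrying the last valid value forward with ffill(). pad() is an older alias of ffill() and was removed in pandas 2.0.

>>> series.resample('30S').ffill().head()
2000-01-01 00:00:00    0
2000-01-01 00:00:30    0
2000-01-01 00:01:00    1
2000-01-01 00:01:30    1
2000-01-01 00:02:00    2
Freq: 30S, dtype: int64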
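The limit parameter from the old signature caps how many consecutive new slots a fill may cover; in current pandas it is passed to the fill method. A sketch, assuming 20-second upsampling so that each original minute is followed by two new slots, of forward-filling only the first of them:

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")  # "min" == "T"
series = pd.Series(range(9), index=index)

# Each minute is followed by two new slots (:20 and :40); fill at most one.
filled = series.resample("20s").ffill(limit=1)

# 00:00:00 original, 00:00:20 filled, 00:00:40 over the limit, 00:01:00 original
print(filled.head(4).tolist())  # [0.0, 0.0, nan, 1.0]
```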

Upsample to 30 seconds and fill the NaN values backward with bfill(), that is, with the next valid value.

>>> series.resample('30S').bfill().head()
2000-01-01 00:00:00    0
2000-01-01 00:00:30    1
2000-01-01 00:01:00    1
2000-01-01 00:01:30    2
2000-01-01 00:02:00    2
Freq: 30S, dtype: int64

Run a custom function on each bin with apply(); here it adds 5 to the sum of each bin.

>>> import numpy as np
>>> def custom_resampler(array_like):
...     return np.sum(array_like) + 5

>>> series.resample('3T').apply(custom_resampler)
2000-01-01 00:00:00     8
2000-01-01 00:03:00    17
2000-01-01 00:06:00    26
Freq: 3T, dtype: int64
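Everything above also works on a DataFrame, where agg() can take a different reduction per column. A sketch with a hypothetical price/volume table (the column names and numbers are made up for illustration) that builds 3-minute bars keeping the last price and the total volume of each bin:

```python
import pandas as pd

# Hypothetical one-minute data; the column names are illustrative only.
index = pd.date_range("2000-01-01", periods=6, freq="min")  # "min" == "T"
df = pd.DataFrame(
    {"price": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0], "volume": [1, 2, 3, 4, 5, 6]},
    index=index,
)

# Map each column to its own reduction.
bars = df.resample("3min").agg({"price": "last", "volume": "sum"})
print(bars)
```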