Dataset time window

Posted on 2025-01-23 07:52:47

I have a dataset in this form:

(Image: dataset form)

I want to split the dataset with a window that groups the rows falling within each 2-minute interval, and then put the result into another dataset of this form:

(Image: result)

Can anyone give me a hand to speed up my work?


Answers (2)

鯉魚旗 2025-01-30 07:52:47

Here is a random dataframe, df:

df:
                     Content
Date                        
2021-12-04 04:07:04        6
2021-12-04 04:07:20        1
2021-12-04 04:08:04        4
2021-12-04 04:09:04       12
2021-12-04 04:12:04        4
2021-12-04 04:15:04        8
2021-12-04 04:15:04       10
2021-12-04 04:16:04        4
2021-12-04 04:17:04        6
2021-12-04 04:17:24        3

Now I will use pd.Grouper with a '2Min' frequency and apply(list):

df_out = df.groupby(pd.Grouper(freq='2Min'))['Content'].apply(list)

df_out:

Date
2021-12-04 04:06:00       [6, 1]
2021-12-04 04:08:00      [4, 12]
2021-12-04 04:10:00           []
2021-12-04 04:12:00          [4]
2021-12-04 04:14:00      [8, 10]
2021-12-04 04:16:00    [4, 6, 3]
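To reproduce the output above, here is a minimal, self-contained sketch. It rebuilds the sample frame from the printed table (timestamps and values are copied from it) and groups it into fixed, non-overlapping 2-minute bins. It uses the lowercase alias '2min', which pandas accepts in the same way as '2Min'.

```python
import pandas as pd

# Rebuild the sample frame printed above (values copied from the table)
df = pd.DataFrame(
    {
        "Date": pd.to_datetime([
            "2021-12-04 04:07:04", "2021-12-04 04:07:20", "2021-12-04 04:08:04",
            "2021-12-04 04:09:04", "2021-12-04 04:12:04", "2021-12-04 04:15:04",
            "2021-12-04 04:15:04", "2021-12-04 04:16:04", "2021-12-04 04:17:04",
            "2021-12-04 04:17:24",
        ]),
        "Content": [6, 1, 4, 12, 4, 8, 10, 4, 6, 3],
    }
).set_index("Date")

# Fixed, non-overlapping 2-minute bins; each bin's values are collected into a list
df_out = df.groupby(pd.Grouper(freq="2min"))["Content"].apply(list)
print(df_out)
```

With the default origin, the bins are aligned to midnight. Their labels therefore fall on even minutes (04:06:00, 04:08:00, ...), even though the data starts at 04:07:04.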

If you want the second column as a plain list, use .tolist():

lst = df_out.tolist()
lst:
[[6, 1], [4, 12], [], [4], [8, 10], [4, 6, 3]]

To get each element, use df_out.iloc[i]  # i = 0, 1, 2, etc.
If you want to convert it into a DataFrame, use pd.DataFrame(df_out).
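Here is a small sketch of these three follow-up steps (plain list, single window, DataFrame) on a made-up four-row input. It names the list `windows` rather than `list`, so the built-in `list` used by apply(list) is not shadowed. It also uses .iloc for positional access, because the index holds timestamps rather than integers.

```python
import pandas as pd

# Hypothetical small input: four timestamped values
idx = pd.to_datetime(["2021-12-04 04:07:04", "2021-12-04 04:08:04",
                      "2021-12-04 04:09:04", "2021-12-04 04:10:04"])
df = pd.DataFrame({"Content": [6, 1, 4, 12]}, index=idx)
df.index.name = "Date"

df_out = df.groupby(pd.Grouper(freq="2min"))["Content"].apply(list)

# Nested Python list; a name other than `list` keeps the built-in usable
windows = df_out.tolist()

# Positional access on a Series with a DatetimeIndex: use .iloc
first_window = df_out.iloc[0]

# Back to a one-column DataFrame indexed by the bin labels
df_frame = pd.DataFrame(df_out)
print(windows, first_window)
print(df_frame)
```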

Remember: if you read the data from a CSV (or any other) file, you have to convert the index to a DatetimeIndex using:

df.index = pd.to_datetime(df.index)
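Here is a quick way to check whether the conversion is needed, sketched on a hypothetical frame whose index still holds timestamp strings (as it would after read_csv(...).set_index('Date')). pd.Grouper(freq=...) only bins a datetime-like index, so the conversion has to happen before the groupby.

```python
import pandas as pd

# Hypothetical frame whose index holds the timestamps as plain strings
df = pd.DataFrame(
    {"Content": [6, 1, 4]},
    index=pd.Index(["2021-12-04 04:07:04", "2021-12-04 04:07:20",
                    "2021-12-04 04:08:04"], name="Date"),
)
print(isinstance(df.index, pd.DatetimeIndex))  # False: still strings

# Convert the index so that pd.Grouper(freq=...) can bin it by time
df.index = pd.to_datetime(df.index)
print(isinstance(df.index, pd.DatetimeIndex))  # True
```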

Complete code for a test CSV file:

import pandas as pd
df=pd.read_csv(r'D:\python\test.txt', sep=',').set_index('Date')
df.index = pd.to_datetime(df.index)

df_out= df.groupby(pd.Grouper(freq='2Min'))['Content'].apply(list)
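The same pipeline can be tried without a file on disk. This sketch swaps the D:\python\test.txt path for an in-memory io.StringIO buffer. The buffer holds hypothetical CSV text with the Date and Content columns that the code above expects. read_csv then parses the index directly via index_col and parse_dates, in place of the separate set_index / to_datetime steps.

```python
import io

import pandas as pd

# Stand-in for the file on disk: hypothetical CSV text with Date and Content columns
csv_text = """Date,Content
2021-12-04 04:07:04,6
2021-12-04 04:07:20,1
2021-12-04 04:08:04,4
2021-12-04 04:09:04,12
"""

# index_col + parse_dates=True give a DatetimeIndex in one step
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

df_out = df.groupby(pd.Grouper(freq="2min"))["Content"].apply(list)
print(df_out)
```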

If you don't know how to create a sample df, here is another example:

import pandas as pd
import numpy as np

np.random.seed(0)
# create 10 timestamps starting at '2021-12-04 04:07:04', one per minute ('min' replaces the deprecated 'T' alias)
rng = pd.date_range('2021-12-04 04:07:04', periods=10, freq='min')
df_random = pd.DataFrame({ 'Date': rng, 'Content': np.random.randint(1,13,10) }).set_index('Date') 

df_random_out= df_random.groupby(pd.Grouper(freq='2Min'))['Content'].apply(list)

df_random:
                     Content
Date                        
2021-12-04 04:07:04        6
2021-12-04 04:08:04        1
2021-12-04 04:09:04        4
2021-12-04 04:10:04       12
2021-12-04 04:11:04        4
2021-12-04 04:12:04        8
2021-12-04 04:13:04       10
2021-12-04 04:14:04        4
2021-12-04 04:15:04        6
2021-12-04 04:16:04        3

df_random_out:
Date
2021-12-04 04:06:00        [6]
2021-12-04 04:08:00     [1, 4]
2021-12-04 04:10:00    [12, 4]
2021-12-04 04:12:00    [8, 10]
2021-12-04 04:14:00     [4, 6]
2021-12-04 04:16:00        [3]
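For reference, groupby(pd.Grouper(freq=...)) on a DatetimeIndex bins the rows the same way as resample(), so the random example can also be written with resample. This sketch is an alternative spelling, not the answer's code. It checks that both approaches give the same windows, and it also shows a per-window count, which may be handier than lists depending on what you want to do with the result.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
rng = pd.date_range("2021-12-04 04:07:04", periods=10, freq="min")
df_random = pd.DataFrame({"Date": rng, "Content": np.random.randint(1, 13, 10)}).set_index("Date")

# The answer's approach
via_grouper = df_random.groupby(pd.Grouper(freq="2min"))["Content"].apply(list)

# Same 2-minute bins via resample (same default origin, 'start_day')
via_resample = df_random["Content"].resample("2min").agg(list)

# A numeric summary per window instead of lists
counts = df_random["Content"].resample("2min").count()
sums = df_random["Content"].resample("2min").sum()
print(via_resample)
print(counts)
```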

N.B.: Please explain clearly what you want to do with your results so that I can answer accordingly.

不疑不惑不回忆 2025-01-30 07:52:47

It took me some time: I posted an answer but deleted it when I realised it differed from the expected result. I also took a piece of @Shuvashish's answer, the apply(list). Anyway, this should give you the expected result:

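import pandas as pd

df['Time'] = pd.to_datetime(df['Time'])
df.set_index('Time', inplace=True)

# bins start at the first timestamp floored to the minute; explode() gives one row per value
df2 = pd.DataFrame(df.groupby(pd.Grouper(freq='2Min', origin=df.index[0].floor('Min')))['Content'].apply(list).explode())

# explode() turns empty windows into NaN rows; drop them
df2[df2.Content.notna()].reset_index()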
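The question's actual data is not shown, so this sketch runs the code above on a few made-up rows with the assumed Time and Content columns. One window (04:13:00) is deliberately left empty to show why the notna() filter is there: explode() turns an empty list into a single NaN row.

```python
import pandas as pd

# Hypothetical input rows; the real dataset from the question is not shown
df = pd.DataFrame({
    "Time": ["2021-12-04 04:07:04", "2021-12-04 04:07:20", "2021-12-04 04:08:04",
             "2021-12-04 04:09:04", "2021-12-04 04:12:04", "2021-12-04 04:15:04"],
    "Content": [6, 1, 4, 12, 4, 8],
})
df["Time"] = pd.to_datetime(df["Time"])
df.set_index("Time", inplace=True)

# Bins start at the first timestamp floored to the minute (04:07:00)
grouper = pd.Grouper(freq="2min", origin=df.index[0].floor("min"))
df2 = pd.DataFrame(df.groupby(grouper)["Content"].apply(list).explode())

# The empty 04:13:00 window becomes a NaN row after explode(); drop it
result = df2[df2.Content.notna()].reset_index()
print(result)
```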

Shuvashish already showed pd.Grouper; I only exploded the results and set the origin to the first time value floored to the minute. By the way, your expected table shouldn't contain a 04:50:00 time, because we start binning every 2 minutes from an odd minute (04:07:00).
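This sketch shows the effect of origin on the bin labels using three hypothetical timestamps. The default origin ('start_day') aligns bins to midnight, so labels land on even minutes. Flooring the first timestamp to the minute instead gives odd-minute labels such as 04:07:00. (origin='start' would instead anchor the bins at 04:07:04 itself.)

```python
import pandas as pd

# Hypothetical timestamps starting on an odd minute
idx = pd.to_datetime(["2021-12-04 04:07:04", "2021-12-04 04:08:30", "2021-12-04 04:10:15"])
s = pd.Series([6, 1, 4], index=idx)

# Default origin ('start_day'): bins aligned to midnight, labels on even minutes
default_bins = s.groupby(pd.Grouper(freq="2min")).apply(list)

# Origin = first timestamp floored to the minute: labels on odd minutes
odd_bins = s.groupby(pd.Grouper(freq="2min", origin=idx[0].floor("min"))).apply(list)

print(default_bins)
print(odd_bins)
```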
