Set the 'Timespan' value to the 'Time' of the first following 'Batch' that does not include the 'Name'

Posted on 2025-02-06 18:48:30

What I have:

Jacob,19,Male,True,1654949111,0,1
Anna,20,Female,True,1654949111,0,1
Jacob,19,Male,True,1654949222,0,2
Anna,20,Female,True,1654949222,0,2
Brother,20,Male,True,1654949333,0,3
Anna,20,Female,True,1654949333,0,3
Cleitinho,53,Female,True,1654949444,0,4
Jacob,19,Male,True,1654949444,0,4

Each 'Batch' is one web scrape.
For each 'Name', I want to set the 'Timespan' column to the value of the 'Time' column of the first following 'Batch' in which that 'Name' does not appear.

If the 'Name' appears again in a later 'Batch', the procedure must be repeated for that new appearance.

It would also be nice, when a 'Name' is present in several consecutive batches, to remove the entries between the first and the last.

For a 'Name' in the last 'Batch', the value of 'Timespan' should be 'Time' + 1 second.

What I want:

Jacob,19,Male,True,1654949111,1654949333,1
Anna,20,Female,True,1654949111,1654949444,1
Brother,20,Male,True,1654949333,1654949444,3
Cleitinho,53,Female,True,1654949444,1654949445,4
Jacob,19,Male,True,1654949444,1654949445,4


Comments (1)

柠檬 2025-02-13 18:48:30

The rule that 'Timespan' should be 'Time' + 1 second on the last 'Batch' is still not included.

How I solved it for now:

import pandas as pd

data = pd.read_csv('names.csv', names=['name', 'Age', 'Gender', 'Online', 'time', 'Timespan', 'Batch'])
names = data[['name', 'time', 'Batch']].copy()

# Build one DataFrame per batch (one per scrape), in batch order
df_dict = names.groupby(names.Batch.values).agg(list).to_dict('records')
scrapes = [pd.DataFrame(df) for df in df_dict]

# Collect sessions as plain dicts; DataFrame.append was removed in pandas 2.0
new_rows = []
# Note: names already present in the first batch never get a session here,
# because only names absent from the previous scrape are considered.
for i in range(len(scrapes) - 1):
    prev_scrape = scrapes[i]
    current_scrape = scrapes[i + 1]
    # Add a new session for every user that was not online in the previous scrape
    # (Series.iteritems was removed in pandas 2.0, so use Series.items)
    for _, current_name in current_scrape['name'].items():
        if prev_scrape[prev_scrape['name'] == current_name].empty:
            # Find the batch at which the user went offline
            for scrape_to_check_if_still_online in scrapes[i + 1:]:
                if scrape_to_check_if_still_online[scrape_to_check_if_still_online['name'] == current_name].empty:
                    new_rows.append({
                        # Number of sessions already recorded for this user
                        'session': sum(row['name'] == current_name for row in new_rows),
                        'name': current_name,
                        'login': current_scrape['time'].iloc[0],
                        'logout': scrape_to_check_if_still_online['time'].iloc[0],
                    })
                    break

sessions = pd.DataFrame(new_rows, columns=['session', 'name', 'login', 'logout'])
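
For comparison, here is a minimal sketch of a vectorised alternative that also covers names in the first and the last batch. It is not the code above, just one possible way to do it with groupby. It assumes that every row of a batch carries the same 'time', that a name appears at most once per batch, and it uses the sample rows from the question inline instead of names.csv. The function name fill_timespan and the helper columns pos and run are made up for this illustration.

```python
import io

import pandas as pd

COLUMNS = ['name', 'Age', 'Gender', 'Online', 'time', 'Timespan', 'Batch']

# Sample rows from the question, inlined so the sketch runs on its own.
CSV = """\
Jacob,19,Male,True,1654949111,0,1
Anna,20,Female,True,1654949111,0,1
Jacob,19,Male,True,1654949222,0,2
Anna,20,Female,True,1654949222,0,2
Brother,20,Male,True,1654949333,0,3
Anna,20,Female,True,1654949333,0,3
Cleitinho,53,Female,True,1654949444,0,4
Jacob,19,Male,True,1654949444,0,4
"""


def fill_timespan(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per run of consecutive batches and fill 'Timespan'."""
    # One scrape time per batch, ordered by batch.
    batch_time = df.groupby('Batch')['time'].first()
    # Position of each batch in the sequence, so gaps in batch numbers are harmless.
    position = pd.Series(range(len(batch_time)), index=batch_time.index)
    # Time of the following batch; the last batch gets its own time + 1 second.
    next_time = batch_time.shift(-1, fill_value=batch_time.iloc[-1] + 1)

    out = df.sort_values(['name', 'Batch']).copy()
    out['pos'] = out['Batch'].map(position)
    # A run starts when the same name was not present in the batch just before.
    new_run = out.groupby('name')['pos'].diff().ne(1)
    out['run'] = new_run.cumsum()
    # 'Timespan' is the time of the batch that follows the last batch of the run.
    last_batch = out.groupby('run')['Batch'].transform('max')
    out['Timespan'] = last_batch.map(next_time)
    # Keep only the first row of each run, in the original row order.
    return out[new_run].sort_index()[COLUMNS].reset_index(drop=True)


df = pd.read_csv(io.StringIO(CSV), names=COLUMNS)
result = fill_timespan(df)
print(result.to_csv(header=False, index=False))
```

On the sample data this prints the five rows listed under "What I want" in the question. To run it on the real file, replace io.StringIO(CSV) with 'names.csv'.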