Set the 'Timespan' value to the 'Time' value of the first following 'Batch' in which the 'Name' is not included

Posted on 2025-02-06 18:48:30


What I have:

Jacob,19,Male,True,1654949111,0,1
Anna,20,Female,True,1654949111,0,1
Jacob,19,Male,True,1654949222,0,2
Anna,20,Female,True,1654949222,0,2
Brother,20,Male,True,1654949333,0,3
Anna,20,Female,True,1654949333,0,3
Cleitinho,53,Female,True,1654949444,0,4
Jacob,19,Male,True,1654949444,0,4

Each 'Batch' is one web scrape.
For each 'Name', I want to set the 'Timespan' column to the value of the 'Time' column of the first following 'Batch' in which that 'Name' does not appear.

If the 'Name' appears again in a later 'Batch', the procedure must be repeated.

It would also be nice, when a 'Name' is present in several consecutive 'Batch' values, to remove the entries in between the first and the last.

For entries in the last 'Batch', the value of 'Timespan' should be 'Time' + 1 second.

What I want:

Jacob,19,Male,True,1654949111,1654949333,1
Anna,20,Female,True,1654949111,1654949444,1
Brother,20,Male,True,1654949333,1654949444,3
Cleitinho,53,Female,True,1654949444,1654949445,4
Jacob,19,Male,True,1654949444,1654949445,4


柠檬 2025-02-13 18:48:30

The rule that 'Timespan' should be 'Time' + 1 second on the last 'Batch' is still not included.

How I solved it for now:

import pandas as pd

columns = ['name', 'Age', 'Gender', 'Online', 'time', 'Timespan', 'Batch']
data = pd.read_csv('names.csv', names=columns)
names = data[['name', 'time', 'Batch']].copy()

# One DataFrame per batch (scrape), in ascending batch order
scrapes = [scrape for _, scrape in names.groupby('Batch')]

session_rows = []
for i in range(len(scrapes) - 1):
    prev_scrape = scrapes[i]
    current_scrape = scrapes[i + 1]
    # Add a new session for every user that was not online in the previous scrape
    # (users already present in the first scrape are therefore never picked up)
    for current_name in current_scrape['name']:
        if current_name in prev_scrape['name'].values:
            continue
        # Find the batch at which the user went offline
        for later_scrape in scrapes[i + 1:]:
            if current_name not in later_scrape['name'].values:
                session_rows.append({
                    'session': sum(row['name'] == current_name for row in session_rows),
                    'name': current_name,
                    'login': current_scrape['time'].iloc[0],
                    'logout': later_scrape['time'].iloc[0],
                })
                break

# DataFrame.append and Series.iteritems were removed in pandas 2.0,
# so the rows are collected in a list and converted once at the end
sessions = pd.DataFrame(session_rows, columns=['session', 'name', 'login', 'logout'])
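
For the last-'Batch' rule that is still missing, one possible direction is a vectorized pass over the whole table instead of nested loops. The following is only a sketch under a few assumptions that the question does not state explicitly: every row of a batch carries the same 'time', batch numbers increase in scrape order, and the function name `collapse_sessions` and the inline `SAMPLE` string (used in place of names.csv) are made up for illustration.

```python
import io

import pandas as pd

COLUMNS = ["name", "Age", "Gender", "Online", "time", "Timespan", "Batch"]

# Sample rows copied from the question (stand-in for names.csv)
SAMPLE = """\
Jacob,19,Male,True,1654949111,0,1
Anna,20,Female,True,1654949111,0,1
Jacob,19,Male,True,1654949222,0,2
Anna,20,Female,True,1654949222,0,2
Brother,20,Male,True,1654949333,0,3
Anna,20,Female,True,1654949333,0,3
Cleitinho,53,Female,True,1654949444,0,4
Jacob,19,Male,True,1654949444,0,4
"""


def collapse_sessions(data: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per run of consecutive batches and fill in 'Timespan'."""
    df = data.sort_values("Batch", kind="stable").reset_index(drop=True)

    # Time of each batch (assumption: constant within a batch)
    batch_time = df.groupby("Batch")["time"].first()
    # Time of the following batch; the last batch gets its own time + 1 second
    next_time = batch_time.shift(-1, fill_value=batch_time.iloc[-1] + 1)

    # Ordinal position of each batch, so gaps in batch numbering do not matter
    pos = df["Batch"].rank(method="dense")
    # A row starts a new session when the name was absent from the previous batch
    is_start = pos.groupby(df["name"]).diff().ne(1)
    df["session"] = is_start.astype(int).groupby(df["name"]).cumsum()

    # Last batch of each session, then the time of the batch right after it
    last_batch = df.groupby(["name", "session"])["Batch"].transform("max")
    df["Timespan"] = last_batch.map(next_time)

    return df[is_start].drop(columns="session").reset_index(drop=True)


data = pd.read_csv(io.StringIO(SAMPLE), names=COLUMNS)
result = collapse_sessions(data)
print(result.to_csv(index=False, header=False))
```

Because each session is reduced to its first row, the entries between the first and last appearance are dropped as a side effect, and names that are already present in the first batch are handled the same way as names that appear later.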
