Converting timestamps in a large dataset to multiple time zones
I have a large dataset with about 9 million rows and 4 columns, one of which is a UTC timestamp. The data was recorded at 507 sites across Australia, and there is a site ID column. I have another dataset that holds the time zone for each site ID in the format 'Australia/Brisbane'. I've written a function that creates a new column in the main dataset containing the UTC timestamp converted to local time. However, the wrong new time is being matched up with the UTC timestamp: for example, 2019-01-05 12:10:00+00:00 ends up next to 2019-01-13 18:55:00+11:00 (wrong time zone). I believe the sites are not mixed up in the data, but I've tried sorting the data in case that was the problem. Below is my code and images of the first row of each dataset. Any help is much appreciated!
import pytz
from dateutil import tz

def update_timezone(df):
    newtimes = []
    df = df.sort_values('site_id')
    sites = df['site_id'].unique().tolist()
    for site in sites:
        timezone = solarbom.loc[solarbom['site_id'] == site].iloc[0, 1]
        dfsub = df[df['site_id'] == site].copy()
        dfsub['utc_timestamp'] = dfsub['utc_timestamp'].dt.tz_convert(timezone)
        newtimes.extend(dfsub['utc_timestamp'].tolist())
    df['newtimes'] = newtimes
Comments (1)
IIUC, you want to group your data by site ID, then convert each group's timestamps to the time zone specific to that ID. You could achieve this with groupby, applying a converter function to each group. Ex:
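A minimal sketch of that groupby idea, not necessarily the exact code this reply had in mind. It assumes small made-up stand-ins for the two tables; the column names `timezone` and `local_time` and the helper `to_local` are hypothetical, while `site_id`, `utc_timestamp` and `solarbom` follow the question. Each converted group keeps its original row index, so the results line up with the right rows when assigned back, instead of depending on list order.

```python
import pandas as pd

# Hypothetical stand-in for the main dataset (two sites, summer and winter rows).
df = pd.DataFrame({
    "site_id": [1, 2, 1, 2],
    "utc_timestamp": pd.to_datetime(
        ["2019-01-05 12:10", "2019-01-05 12:10",
         "2019-07-05 12:10", "2019-07-05 12:10"],
        utc=True,
    ),
})

# Hypothetical stand-in for the site -> time zone table.
solarbom = pd.DataFrame({
    "site_id": [1, 2],
    "timezone": ["Australia/Brisbane", "Australia/Sydney"],
})

# Lookup Series: site_id -> time zone name.
tz_by_site = solarbom.set_index("site_id")["timezone"]


def to_local(timestamps, tz_name):
    """Convert one site's UTC timestamps to that site's local time.

    The result is cast to object dtype because a single pandas datetime
    column can only hold one time zone.
    """
    return timestamps.dt.tz_convert(tz_name).astype(object)


# Convert group by group; every piece keeps the original row index.
pieces = [
    to_local(group, tz_by_site.loc[site])
    for site, group in df.groupby("site_id")["utc_timestamp"]
]

# Assignment aligns on the index, so each local time lands on its own row.
df["local_time"] = pd.concat(pieces)
print(df)
```

With about 9 million rows, an object column of time-zone-aware timestamps can be slow and memory-hungry. If only the local wall-clock time is needed, dropping the zone with `.dt.tz_localize(None)` after the conversion keeps the column as a regular datetime dtype.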
now df looks like