How do I aggregate data by day while still respecting timezones?
We are currently using a summary table that aggregates information for our users on an hourly basis in UTC. The problem is that this table has become too large and is slowing our system down immensely. We have applied all the tuning techniques recommended for PostgreSQL and are still experiencing slowness.
Our idea was to start aggregating by day rather than by hour, but the problem is that we allow our customers to change their timezone, which requires recalculating the data for that day.
Does anyone know of a way to store a daily summary that still produces correct numbers and totals when a customer switches timezones?
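For concreteness, here is a minimal sketch of the kind of hourly UTC summary table described above; the table and column names are assumptions, not our actual schema:

    CREATE TABLE hourly_summary (
        user_id     bigint      NOT NULL,
        hour_utc    timestamptz NOT NULL,  -- bucket start, truncated to the hour, in UTC
        event_count bigint      NOT NULL,
        total_value numeric     NOT NULL,
        PRIMARY KEY (user_id, hour_utc)
    );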
4 Answers
Summarise the data in tables with a timeoffset column and a "day" field (a date) that is the day for that particular summary line. Index on (timeoffset, day, other relevant fields), clustered if possible (PostgreSQL's CLUSTER command gives a one-time physical ordering rather than a continuously maintained clustered index), and all should be well.
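A minimal sketch of that layout, assuming whole-hour offsets; the names are illustrative:

    -- One summary row per whole-hour UTC offset, per local day, per user.
    CREATE TABLE daily_summary (
        timeoffset  smallint NOT NULL,  -- UTC offset in hours, e.g. -5
        day         date     NOT NULL,  -- local day at that offset
        user_id     bigint   NOT NULL,
        event_count bigint   NOT NULL,
        total_value numeric  NOT NULL,
        PRIMARY KEY (timeoffset, day, user_id)
    );

    -- PostgreSQL does not maintain clustered indexes, but CLUSTER performs a
    -- one-time physical reordering along an index; re-run it periodically.
    CLUSTER daily_summary USING daily_summary_pkey;

With this ordering, a query for one offset and a range of days reads a contiguous slice of the table.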
I'm assuming you've gone through all the partitioning considerations, such as partitioning by user.
I can see several solutions to your problem, depending on the usage pattern.
Aggregate data per day, per user's selected timezone. In the event of a timezone change, programmatically recalculate the aggregates for that customer. This is plausible if timezone changes are infrequent and if a certain delay in the data is acceptable when a user changes timezones (a sketch of this option follows the list).
If you have relatively few measures, you may maintain 24 columns for each measure - each describing the daily aggregate for the measure in a different timezone.
If timezone changes are frequent and there are numerous measures, it seems like 24 different aggregate tables would be the way to go.
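A sketch of the first option, assuming the hourly (or raw) data is still available to rebuild from; the names are illustrative and reuse the hourly_summary sketch from the question:

    -- One daily row per user, in that user's currently selected timezone.
    CREATE TABLE user_daily_summary (
        user_id     bigint  NOT NULL,
        local_day   date    NOT NULL,  -- day in the user's chosen timezone
        event_count bigint  NOT NULL,
        total_value numeric NOT NULL,
        PRIMARY KEY (user_id, local_day)
    );

    -- Rebuild one user's rows after they switch to, say, America/New_York
    -- (in practice you would delete that user's stale rows first).
    INSERT INTO user_daily_summary (user_id, local_day, event_count, total_value)
    SELECT user_id,
           (hour_utc AT TIME ZONE 'America/New_York')::date,
           sum(event_count),
           sum(total_value)
    FROM hourly_summary
    WHERE user_id = 42
    GROUP BY 1, 2
    ON CONFLICT (user_id, local_day) DO UPDATE
        SET event_count = EXCLUDED.event_count,
            total_value = EXCLUDED.total_value;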
I ran into this problem too. My solution: data with a date type uses the local timezone, while data with a datetime type uses UTC, because the statistics are indexed by the local date. Another reason is that, for now, we only have local data.
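A sketch of that convention, with illustrative names and Asia/Shanghai standing in for the local timezone:

    CREATE TABLE daily_stats (
        local_day   date        NOT NULL,  -- reporting key, in the local timezone
        computed_at timestamptz NOT NULL,  -- precise moment, stored as UTC
        total_value numeric     NOT NULL,
        PRIMARY KEY (local_day)
    );

    -- Derive the local day from a UTC timestamp at insert time.
    INSERT INTO daily_stats (local_day, computed_at, total_value)
    VALUES ((now() AT TIME ZONE 'Asia/Shanghai')::date, now(), 100);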
I'm facing the same problem. I'm thinking about aggregating by date and time (hour by hour, in UTC). Then you can fetch the data for whatever timezone you want. Unfortunately, this won't work if you need to support timezones with a 45/30/15-minute offset. Alternatively, you could aggregate the data in 15-minute buckets to cover every timezone.
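A sketch of that rollup against the hourly_summary table assumed under the question; it is correct for whole-hour zones such as America/New_York, while zones like Asia/Kathmandu (UTC+5:45) would need the 15-minute buckets mentioned above:

    -- Roll hourly UTC buckets up into local days for the requested timezone.
    SELECT (hour_utc AT TIME ZONE 'America/New_York')::date AS local_day,
           sum(event_count) AS event_count,
           sum(total_value) AS total_value
    FROM hourly_summary
    WHERE user_id = 42
    GROUP BY 1
    ORDER BY 1;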