涉及多态关联的复杂时间序列统计聚合

发布于 2024-11-13 22:03:26 字数 2231 浏览 3 评论 0原文

好的。请耐心等待,因为我需要提供大量上下文细节才能为我的问题寻求合理的答案。

我有一个网站,可以让你每天选股。它的工作原理是,系统会提示您在当天对峙的公司之间做出选择。例如,GE 与 IBM。您可以做出两种类型的选择:表现(哪只股票表现更好?)和总成交量(合并后的股票交易量会高于还是低于 X?)。每天你会得到 100 虚拟美元来进行选择。

最终,我们的目标是跟踪哪个用户在以下时间段内在各个类别(如下所述)中每次选择赚的钱最多:5 天、15 天、30 天、90 天、180 天、1 年、全部 -时间。计算每次选秀赚多少钱非常简单。这是赚(或亏)的总钱/选秀次数。

现在,用户选择的每个公司都属于分类层次结构。一般来说,分类层次结构如下所示:

Division -->主要组 -->产业集团-->分类-->公司

以下是一些示例:

  • 采矿业 -->金属开采 -->铁矿石 -->褐矿开采 --> A公司
  • 矿业-->金属开采 -->铁矿石 -->褐矿开采 --> B公司
  • 矿业-->金属开采 -->铁矿石 -->褐铁矿开采 --> C
  • 矿业公司 -->金属开采 -->铁矿石 -->褐铁矿开采 --> D 公司
  • 制造 -->烟草制品 -->雪茄-->斯托吉斯 --> E公司
  • 制造-->烟草制品 -->雪茄-->斯托吉斯 --> F公司
  • 制造 -->烟草制品 -->雪茄-->小雪茄 --> G公司
  • 制造——>烟草制品 -->雪茄-->小雪茄 -->公司 H
  • ……等等……

每个类别都有一个模型(当然还有相应的表),并且它们是关联的(想想foreign_key),就像您在上面看到的那样。

Matchup 有一个模型,每条记录代表当天有哪些公司在对峙。每条记录都跟踪每家公司的起始和最终股票价格以及总交易量。

每个比赛都有一个或多个可以全天变化的 :pick_prices。通常,每场比赛都有一个表现选择价格和一个总数量选择价格。价格决定了选择的成本以及正确选择的收入。 (现在,这只是背景信息。您无需担心那些特定的价格计算。)

在交易日结束时,用户的选择得到解决。选择在 Pick 模型中表示,具有以下属性:

  • user_id
  • amount_spent(例如,$10)
  • result(例如,WON、LOST)
  • pick(例如,公司 A)
  • matchup_id
  • pick_price_id
  • amount_won
  • returned(true 或 false)
  • created_at
  • Updated_at

目前,当每个pick 解决后,另一个名为 pick_records 的表被更新,该表具有以下属性:

  • user_id
  • recordable_id
  • recordable_type (部门或主要组或行业组或分类或公司)
  • 选秀权(做出的总选秀权,无论选秀类型如何)
  • 获胜(赢得的总选秀权,无论选秀类型如何)
  • 丢失(丢失的总选秀权,无论选秀类型)
  • 金钱(赢得的总金钱)
  • money_per_pick(金钱/选秀权)
  • performance_picks
  • Performance_won
  • Performance_lost
  • Performance_money
  • Performance_money_per_pick
  • Volume_picks
  • Volume_Won
  • Volume_lost
  • Volume_Money
  • Volume_Money_Per_pick
  • Created_At
  • Updated_At

当您可以看出,这是一个多态模型。该表汇总了所有时间的挑选记录统计数据。

那么现在的挑战是:

在现有的设计下,我需要做什么才能捕获用户在以下时间段内的选择记录:5天、15天、30天、90天、180天、1年,全时?它需要简单、高效、快速!

我目前在 MySQL DB 上运行 Rails 2.3.11。

Ok. Bear with me, as I need to provide a lot of contextual detail before I can solicit a reasonable answer to my question.

I have a site that allows you to make daily stock picks. The way it works is that you're prompted to make picks between companies that are facing-off for the day. For example, GE vs. IBM. You can make two types of picks: Performance (which stock will perform better?) and Total Volume (will the combined stocks trade at volumes higher or lower than X?). You're given 100 virtual dollars each day to make picks.

Ultimately, our goal here is to track which user makes the most money per pick in various categories (explained below) over the following periods of time: 5 days, 15 days, 30 days, 90 days, 180 days, 1 year, all-time. It's very simple to calculate how much money is made per pick. It's the total money made (or lost) / number of picks.

Now, each company that the user makes a pick on falls under a categorical hierarchy. Generically, the categorical hierarchy looks like this:

Division --> Major Group --> Industry Group --> Classification --> Company

Here are some examples:

  • Mining --> Metal Mining --> Iron Ores --> Brown Ore Mining --> Company A
  • Mining --> Metal Mining --> Iron Ores --> Brown Ore Mining --> Company B
  • Mining --> Metal Mining --> Iron Ores --> Limonite Mining --> Company C
  • Mining --> Metal Mining --> Iron Ores --> Limonite Mining --> Company D
  • Manufacturing --> Tobacco Products --> Cigars --> Stogies --> Company E
  • Manufacturing --> Tobacco Products --> Cigars --> Stogies --> Company F
  • Manufacturing --> Tobacco Products --> Cigars --> Cigarillos --> Company G
  • Manufacturing --> Tobacco Products --> Cigars --> Cigarillos --> Company H
  • …and so on…

There's a model for each category (and corresponding table, of course), and they are associated (think foreign_key) just like you see above.

There is a model for Matchup, with each record representing which companies are facing-off for the day. Each record keeps track of the starting and final stock prices for each company, as well as the the total trade volume.

Each Matchup has one or more :pick_prices that can change throughout the day. Normally, each matchup has a Performance Pick Price and a Total Volume Pick Price. The price determines what the pick will cost you and how much you earn for a correct pick. (Now, this is all just background info. You don't need to worry about those particular price calculations.)

At the end of the trading day, the user's picks are resolved. Picks are represented in a Pick model, with the following attributes:

  • user_id
  • amount_spent (e.g., $10)
  • result (e.g., WON, LOST)
  • pick (e.g., company A)
  • matchup_id
  • pick_price_id
  • amount_won
  • resolved (true or false)
  • created_at
  • updated_at

Currently, when each pick is resolved, another table is updated called pick_records, which has the following attributes:

  • user_id
  • recordable_id
  • recordable_type (Division or Major Group or Industry Group or Classification or Company)
  • picks (total picks made, regardless of pick type)
  • won (total picks won, regardless of pick type)
  • lost (total picks lost, regardless of pick type)
  • money (total money won)
  • money_per_pick (money / picks)
  • performance_picks
  • performance_won
  • performance_lost
  • performance_money
  • performance_money_per_pick
  • volume_picks
  • volume_won
  • volume_lost
  • volume_money
  • volume_money_per_pick
  • created_at
  • updated_at

As you can tell, this is a polymorphic model. The table aggregates the all-time pick record statistics.

So now here's the challenge:

Given the existing design, what do I have to do so that I can capture the user's pick records over the following periods of time: 5 days, 15 days, 30 days, 90 days, 180 days, 1 year, all-time? It needs to be simple, efficient, and fast!

I'm currently running Rails 2.3.11 on MySQL DB.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

木有鱼丸 2024-11-20 22:03:26

我认为不需要表 pick_records。
您可以对任意天数执行这样的查询:

SELECT 
   user_id
   ,sum(amount_spent) 
   ,sum(IF(result = 'WON',1,0)) as WON_count
   ,sum(IF(result = 'LOST',1,0)) as LOST_count
   ,pick 
   /*matchup_id*/
   ,sum(pc.price) as price
   ,sum(IF(result = 'WON'),amount_won,0)) as amount_won
   ,sum(IF(result = 'LOST'),amount_won,0)) as amount_lost
   ,sum(IF(result = 'WON'),amount_won,-amount_won)) as nett_amount
FROM picks
INNER JOIN pick_price pc ON (pc.id = user.pick_price_id)
WHERE created_at BETWEEN DATE_SUB(NOW(), INTERVAL 5 DAY) AND NOW()
  AND resolved = 'true'
GROUP BY user_id, pick

I don't see the need for table pick_records.
You can do a query like this for any number of days:

SELECT 
   user_id
   ,sum(amount_spent) 
   ,sum(IF(result = 'WON',1,0)) as WON_count
   ,sum(IF(result = 'LOST',1,0)) as LOST_count
   ,pick 
   /*matchup_id*/
   ,sum(pc.price) as price
   ,sum(IF(result = 'WON'),amount_won,0)) as amount_won
   ,sum(IF(result = 'LOST'),amount_won,0)) as amount_lost
   ,sum(IF(result = 'WON'),amount_won,-amount_won)) as nett_amount
FROM picks
INNER JOIN pick_price pc ON (pc.id = user.pick_price_id)
WHERE created_at BETWEEN DATE_SUB(NOW(), INTERVAL 5 DAY) AND NOW()
  AND resolved = 'true'
GROUP BY user_id, pick
苍景流年 2024-11-20 22:03:26

不确定我的问题是否正确,但是...

@records=Pick_record.all(:conditions => ["user_id = ?", user_id],
                         :group => "date(created_at)", 
                         :having => ["created_at > ?", 5.days.ago])

Not sure if I get the question right, but...

@records=Pick_record.all(:conditions => ["user_id = ?", user_id],
                         :group => "date(created_at)", 
                         :having => ["created_at > ?", 5.days.ago])
Saygoodbye 2024-11-20 22:03:26

如果我理解正确的话,现在每个用户只有一个 pick_record,它包含他的总选择的概述,并在解决选择时更新。

由于 pick_record 的内容可以计算,因此它仅用于缓存并确保您可以非常快速地提供数据/报告。

为了解决您的问题,我建议如下:

在整个生命周期中,我不会为您感兴趣的每个时间段提供一个 pick_record,而是拥有一个 pick_record。因此,您将拥有一个包含过去 4 天结果的 pick_record,一个以及过去 14 天的结果,29 ...那些您每天计算一次的结果,最好是在晚上(或者当您的网站使用率较低时)。当需要显示选定时间段的报告时,您只需添加当天的结果即可完成!

因此,回顾一下:

  1. 为每个感兴趣的时间段引入一个 pick_record(添加一个指示时间段的字段:5、15、30,...)
  2. 每天预先计算一次结果(后台作业、eq resque 或elaided_job)
  3. 在检索时间段时 -结果,你只需要添加当天的结果

你觉得怎么样?

If I understand correctly, there is now only one pick_record per user, and it contains an overview of his total picks, and is updated when resolving a pick.

Since the content of the pick_record can be calculated, it is merely used for caching and making sure you can supply the data/report really quickly.

To resolve your problem, I would propose the following:

Instead of having single pick_record, over the total lifespan, I would have a pick_record per timespan you are interested in. So you would have a pick_record with the result for the last 4 days, one with the result for the result of the last 14 days, 29 ... Those you calculate once per day, preferably by night (or when your site is low in use). When a report for a chosen period of time has to shown, you only need to add the result of the current day and done!

So, to recap:

  1. introduce a pick_record per interesting period (add a field indicating the period: 5, 15, 30, ...)
  2. pre-calculate the results once per day (background job, e.q. resque or delayed_job)
  3. when retrieving the period-results, you only need to add the results of the current day

What do you think?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文