Data modelling with the Google App Engine datastore

Posted on 2024-09-26 21:23:20

I am currently building a web application on Google App Engine in Python to harvest horse racing form data. The basic data structure is: a Course has many Meetings, a Meeting has many Races, a Race has many Horses, and each Horse has one Jockey and one Trainer. So far I have the following models (with the number of fields reduced for brevity).

from google.appengine.ext import db

class Course(db.Model):
  course_number = db.IntegerProperty()     # course id (third party)
  course_description = db.StringProperty() # course name

class Meeting(db.Model):
  course = db.ReferenceProperty(Course)    # reference to course
  meeting_number = db.IntegerProperty()    # lifetime meeting number for course
  meeting_date = db.DateProperty()         # meeting date

class Race(db.Model):
  meeting = db.ReferenceProperty(Meeting)  # reference to meeting
  race_number = db.IntegerProperty()       # eg 1 for 1st race of meeting
  race_name = db.StringProperty()          # race name
  time_of_race = db.TimeProperty()         # race time

I am having trouble working out how to store data on Horses, Trainers and Jockeys in the datastore.

My application will be harvesting data for, say, the last 2 years; for this I will be saving relevant result information for Horse, Trainer and Jockey. The information on a particular horse's result is the same for the Trainer and Jockey at that point in time. However, over time a Horse can have a different trainer and a different jockey.

My main headache comes when I realise that in analysis I may need to look at the results of the last 10 races for a Horse, Jockey or Trainer. Some of those results may not be stored, either because they occurred outside UK racing (the data is still available) or because they happened before the date from which I start storing complete races.

Can anyone shed any light on how to optimise the storage of Horse, Jockey and Trainer results so that I can accommodate this?

Source of data: http://form.horseracing.betfair.com/timeform
All required data can be easily accessed via JSON requests.

Comments (1)

红焚 2024-10-03 21:23:20

You are on the right track with using HorseResult, TrainerResult, and JockeyResult models. Do not forget, the datastore does not have grouping or aggregate functions, so you might want to pre-compute any aggregates or statistics of interest when you are loading the data.
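To illustrate pre-computing at load time, here is a minimal sketch in plain Python 3 rather than App Engine code. It assumes each harvested result arrives as a dict with `horse`, `jockey`, `trainer` and `position` keys; those key names, `RunningTotals` and `load_results` are all made up for the example and are not taken from the data feed. In a real application the totals would presumably live in their own datastore entities.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningTotals:
    """Aggregate kept per horse, jockey, or trainer (hypothetical shape)."""
    runs: int = 0
    wins: int = 0


def load_results(rows):
    """Fold raw result rows into per-entity totals while loading.

    Each row is assumed to be a dict with 'horse', 'jockey', 'trainer'
    and 'position' keys; these names are illustrative only.
    """
    totals = defaultdict(RunningTotals)
    for row in rows:
        # One result row updates three aggregates: horse, jockey, trainer.
        for kind in ("horse", "jockey", "trainer"):
            entry = totals[(kind, row[kind])]
            entry.runs += 1
            if row["position"] == 1:
                entry.wins += 1
    return totals
```

Because the totals are updated as each row is loaded, reading a win count later is a single lookup instead of a query over every stored result.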

Perhaps you will also want statistics-type models for tracking horse, jockey, and trainer performance over time, and for the combinations of each. Something like HorseMonth, which might track how many races the horse ran in and how it placed, by month.
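A rough sketch of what a HorseMonth-style summary could hold, again in plain Python 3. The input layout, a tuple of (horse name, race date, finishing position), and the function name `horse_month_stats` are assumptions for this example. Each bucket would map onto one HorseMonth entity if you stored it in the datastore.

```python
from collections import defaultdict
from datetime import date


def horse_month_stats(results):
    """Summarise results per (horse, year, month).

    `results` is an iterable of (horse_name, race_date, position)
    tuples; that layout is an assumption made for this sketch.
    """
    stats = defaultdict(lambda: {"runs": 0, "positions": []})
    for horse, race_date, position in results:
        bucket = stats[(horse, race_date.year, race_date.month)]
        bucket["runs"] += 1                    # races the horse ran that month
        bucket["positions"].append(position)   # how it placed in each
    return dict(stats)
```

Keying on (horse, year, month) keeps one small record per horse per month, which stays cheap to fetch however many individual results sit behind it.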

I would also consider keeping details on how the combinations of horse and jockey, or horse and trainer did over time. Unfortunately I do not know enough about horse racing to give you specific suggestions for which combinations are meaningful.
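As a sketch of tracking combinations, the plain Python 3 function below computes a win rate per pair, for example horse and jockey. The dict keys (`horse`, `jockey`, `trainer`, `position`) and the name `pair_win_rates` are hypothetical, and win rate is only one possible measure; which pairings and statistics are meaningful is for you to decide.

```python
from collections import Counter


def pair_win_rates(results, first="horse", second="jockey"):
    """Return the win rate for each (first, second) pairing.

    `results` is an iterable of dicts; the key names used here are
    assumptions for illustration.
    """
    runs = Counter()
    wins = Counter()
    for row in results:
        pair = (row[first], row[second])
        runs[pair] += 1
        if row["position"] == 1:
            wins[pair] += 1
    # Every pair in `runs` has at least one run, so no division by zero.
    return {pair: wins[pair] / runs[pair] for pair in runs}
```

Calling it with `second="trainer"` gives the horse and trainer pairing from the same rows.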

Since it sounds like this is a tool largely for your own use, you might look into the mapper API. It might be of great value when you are exploring the data.

If a race is not included in your data, aside from expanding the harvest range, there may not be a lot you can do. You will probably just want to return the results you have, and perhaps something indicating there is not enough data in the date range?
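One way to return what you have and flag the shortfall, sketched in plain Python 3. The function name `last_n_results` and the (race date, position) tuple layout are assumptions for the example.

```python
from datetime import date


def last_n_results(results, n=10):
    """Return the most recent `n` results plus a completeness flag.

    `results` is a list of (race_date, position) tuples in any order.
    The flag is False when fewer than `n` results are stored.
    """
    recent = sorted(results, key=lambda r: r[0], reverse=True)[:n]
    return recent, len(recent) == n
```

The caller can then show the partial history and note that fewer than `n` races were available, rather than treating a short list as a full one.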
