不等式过滤日期和数字

发布于 2024-10-15 01:23:49 字数 671 浏览 3 评论 0原文

我正在尝试查询我的 Google App Engine 数据存储区 [Python],其中包含 item_name、making_date 和 number_of_items_shipped。数据存储中约有 100 万条记录,并且还在不断增加。

场景: 获取已发货超过 x_items [用户输入]并某个日期[用户输入]之后制造的所有item_names。 基本上,是一种库存检查。

实际上有 2 个属性不等式。 但由于 GAE 中的查询限制,我不能够做到这一点。

搜索了这个问题。但是,到目前为止还没有运气。您遇到过这个问题吗?如果是这样,你能解决这个问题吗?请告诉我。

同样在 Google I/O 2010 的下一代查询中,Alfred Fuller 提到他们将很快取消这一限制。已经过去8个多月了,但这个限制仍然存在。很遗憾。

如果有人能够绕过此限制并发布答案,我将不胜感激。

多谢。

Am trying to query my Google App Engine datastore [Python], which has a item_name, manufacturing_date and number_of_items_shipped. There are ~1.0 million records in the datastore and ever increasing.

The scenario:
Get all the item_names which have been shipped more than x_items [user input] and manufactured after some_date [user input].
Basically, kind of an inventory check.

Effectively 2 inequalities on properties.
But due to restrictions on queries in GAE, am not able to do this.

Searched SO for this issue. But, no luck till now. Did you come across this issue? If so, were you able to resolve this? Please let me know.

Also in Google I/O 2010, Next Gen Queries, Alfred Fuller mentioned that they are going to remove this restriction soon. Its been more than 8 months, but this restriction is in place even now. Unfortunately.

Appreciate if anyone can post an answer if they were able to circumvent this restriciton.

Thanks a lot.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

我不在是我 2024-10-22 01:23:49

根据 Sudhir 的回答,我可能会根据您关心的粒度将每条记录分配到制造日期“桶”。例如,如果您的生产日期范围超过几年,请使用每月存储桶。如果您的范围只是去年,每周一次。

现在,当您想查找带有 > 的记录时n 给定范围内的销售和制造日期,对该范围内的每个存储桶执行一次查询,并后过滤掉您不感兴趣的项目。

例如(完全未经测试):

BUCKET_SIZE_DAYS = 10

def put(self):
    self.manufacture_bucket = int(self.manufacture_date.toordinal() / BUCKET_SIZE_DAYS)
    super(self.__class__, self).put()

def filter_date_after(self, date_start):
    first_bucket = int(date_start.toordinal() / BUCKET_SIZE_DAYS)
    last_bucket = int(datetime.datetime.today().toordinal() / BUCKET_SIZE_DAYS)

    for this_bucket in range(first_bucket, last_bucket+1):
        for found in self.filter("manufacture_bucket =", this_bucket):
            if found.manufacture_date >= date_start:
                yield found

然后您应该能够像这样使用

widgets.filter("sold >", 7).filter_date_after(datetime.datetime(2010,11,21))

:给读者的一个练习:

  • 让它与添加到末尾的其他过滤器很好地配合。
  • 多个存储桶大小,允许您始终查询 ln(日期范围内的天数)存储桶。

Building on Sudhir's answer, I'd probably assign each record to a manufacture date "bucket", based on the granularity you care about. If your range of manufacturing dates is over a couple of years, use monthly buckets for example. If your range is just in the last year, weekly.

Now when you want to find records with > n sales and manufacturing date in a given range, do your query once per bucket in that range, and postfilter out the items you are not interested in.

For example (totally untested):

BUCKET_SIZE_DAYS = 10

def put(self):
    self.manufacture_bucket = int(self.manufacture_date.toordinal() / BUCKET_SIZE_DAYS)
    super(self.__class__, self).put()

def filter_date_after(self, date_start):
    first_bucket = int(date_start.toordinal() / BUCKET_SIZE_DAYS)
    last_bucket = int(datetime.datetime.today().toordinal() / BUCKET_SIZE_DAYS)

    for this_bucket in range(first_bucket, last_bucket+1):
        for found in self.filter("manufacture_bucket =", this_bucket):
            if found.manufacture_date >= date_start:
                yield found

You should be then able to use this like:

widgets.filter("sold >", 7).filter_date_after(datetime.datetime(2010,11,21))

Left as an exercise for the reader:

  • Making it play nicely with other filters added to the end
  • Multiple bucket sizes allowing you to always query ln(days in date range) buckets.
秋风の叶未落 2024-10-22 01:23:49

不幸的是,您无法规避此限制,但我可以帮助您以稍微不同的方式对数据进行建模。

首先,Bigtable 适合对大型数据库进行快速读取——就像有一百万人同时访问您的应用程序时所做的那样。您在这里尝试做的是一份历史数据报告。虽然我建议将报告移至 RDBMS,但有一种方法可以在 Bigtable 上完成此操作。

首先,重写项目模型上的 put() 方法以在保存之前分割日期。你会做的是这样的事情

def put(self):
  self.manufacture_day = self.manufacture_date.day
  self.manufacture_month = self.manufacture_date.month
  self.manufacture_year = self.manufacture_date.year
  super(self.__class__, self).put()

你可以按照你想要的任何粒度级别进行,甚至是小时、分钟、秒等等。

您只需加载和保存项目实体即可将其追溯应用到您的数据库。 映射器对此非常方便。

然后更改您的查询以仅在项目计数上使用不等式,并选择您想要使用正常等式的天/月/年。您可以通过触发多个查询或使用 IN 子句来执行范围。 (无论如何,它做了同样的事情)。

这似乎确实是做作且难以做到,但请记住,如果您这样做,您的报告几乎会立即运行,即使数百万人尝试同时运行它们。您可能不需要这种规模,但是……这就是您得到的:D

Unfortunately, you can't circumvent this restriction, but I can help you model the data in a slightly different way.

First off, Bigtable is suited to very fast reads off large databases - the kind you do when have a million people hitting your app at the same time. What you're trying to do here is a report on historical data. While I would recommend moving the reporting to a RDBMS, there is a way you can do it on Bigtable.

First, override the put() method on your item model to split the date before saving it. What you would do is something like

def put(self):
  self.manufacture_day = self.manufacture_date.day
  self.manufacture_month = self.manufacture_date.month
  self.manufacture_year = self.manufacture_date.year
  super(self.__class__, self).put()

You can do this to any level of granularity you want, even hours, minutes, seconds, whatever.

You can apply this retroactively to your database by just loading and saving your item entities. The mapper is very convenient for this.

Then change your query to use the inequality only on the item count, and select the days / months / years you want using normal equalities. You can do ranges by either firing multiple queries or using the IN clause. (Which does the same thing anyway).

This does seem contrived and tough to do, but keep in mind that your reports will run almost instantaneously if you do this, even when millions of people try to run them at the same time. You might not need this kind of scale, but well... that's what you get :D

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文