Is there a way to cache the fetch output?

Posted 2024-11-19 09:15:25

I'm working on a closed system running in the cloud.

What I need is a search function that uses a user-typed regexp to filter the rows in a dataset.

import re

phrase = re.compile(request.get("query"))
data = Entry.all().fetch(50000)  # this takes around 10 s when there are 6000 records
result = [x for x in data if phrase.search(x.title)]

Now, the database itself won't change too much, and there will be no more than 200-300 searches a day.

Is there a way to somehow cache all the Entries (I expect there will be no more than 50,000 of them, each no bigger than 500 bytes), so retrieving them won't take more than 10 seconds? Or perhaps to parallelize it? I don't mind 10 CPU-seconds, but I do mind the 10 seconds the user has to wait.

To preempt any answers like "index it and use .filter()" - the query is a regexp, and I don't know of any indexing mechanism that would allow using a regexp.

Comments (3)

千寻… 2024-11-26 09:15:25

You can also use cachepy or performance engine (shameless plug) to store the data on App Engine's local instances, so you get faster access to all entities without being limited by memcache boundaries or datastore latency.

Hint: a local instance gets killed if it surpasses about 185 MB of memory, so you can actually store quite a lot of data in it if you know what you're doing.

迷爱 2024-11-26 09:15:25

Since there is a bounded number of entries, you can memcache all of them and then do the filtering in memory as you've outlined. Note, however, that each memcache entry cannot exceed 1 MB, while you can fetch up to 32 MB of memcache entries in parallel.

So split the entries into subsets, memcache the subsets, and then read them back in parallel by precomputing the memcache keys (see the sketch after the link below).

More here:

http://code.google.com/appengine/docs/python/memcache/functions.html
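
A minimal sketch of that chunking scheme, assuming the App Engine Python runtime; the chunk size, key names, and helper functions are illustrative choices, not part of the original answer:

from google.appengine.api import memcache

CHUNK_SIZE = 1000  # illustrative; pick it so each chunk stays under the 1 MB value limit
COUNT_KEY = "entry_chunk_count"  # hypothetical key name

def cache_entries(entries):
    # Split into chunks and store them all in one batched set_multi call.
    # Values are pickled by the memcache API; storing plain strings
    # (e.g. just the titles) is the safest choice.
    chunks = [entries[i:i + CHUNK_SIZE] for i in range(0, len(entries), CHUNK_SIZE)]
    mapping = dict(("chunk:%d" % i, c) for i, c in enumerate(chunks))
    memcache.set_multi(mapping)
    memcache.set(COUNT_KEY, len(chunks))

def load_entries():
    # The keys are precomputable, so all chunks come back in one get_multi call.
    n = memcache.get(COUNT_KEY)
    if n is None:
        return None  # cache miss: caller falls back to the datastore
    found = memcache.get_multi(["chunk:%d" % i for i in range(n)])
    if len(found) != n:
        return None  # a chunk was evicted: treat the whole cache as a miss
    return [e for i in range(n) for e in found["chunk:%d" % i]]

On a miss you would refetch with Entry.all().fetch(...) and call cache_entries() to repopulate before filtering.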

夏末 2024-11-26 09:15:25

Since your data is on the order of 20 MB, you may be able to load it entirely into local instance memory, which is as fast as you can get (a sketch of this follows below). Alternatively, you could store it as a data file alongside your app; reading that will be faster than accessing the datastore.
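
A minimal sketch of the instance-memory idea, assuming the App Engine Python runtime; the module-level _ENTRIES cache and its warm-up logic are illustrative names, not from the answer:

_ENTRIES = None  # module-level, so it survives across requests on the same instance

def get_entries():
    # Pay the ~10 s datastore fetch once per instance, then serve from RAM.
    global _ENTRIES
    if _ENTRIES is None:
        _ENTRIES = Entry.all().fetch(50000)
    return _ENTRIES

Each newly started instance pays the warm-up cost once, and the cached data stays until the instance is recycled, which fits a dataset that, as stated, rarely changes.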
