Is BigTable slow or am I dumb?
I basically have the classic many to many model. A user, an award, and a "many-to-many" table mapping between users and awards.
Each user has on the order of 400 awards and each award is given to about 1/2 the users.
I want to iterate over all of the user's awards and sum up their points. In SQL it would be a join across the many-to-many table, then a walk through each of the rows. On a decent machine with a MySQL instance, 400 rows should not be a big deal at all.
On App Engine I'm seeing around 10 seconds to do the sum, with most of the time spent in Google's datastore. Here are the first few rows of the cProfile output:
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
   462    6.291    0.014    6.868    0.015  {google3.apphosting.runtime._apphosting_runtime___python__apiproxy.Wait}
   913    0.148    0.000    1.437    0.002  datastore.py:524(_FromPb)
  8212    0.130    0.000    0.502    0.000  datastore_types.py:1345(FromPropertyPb)
   462    0.120    0.000    0.458    0.001  {google3.net.proto._net_proto___parse__python.MergeFromString}
Is my data model wrong? Am I doing the lookups wrong? Is this a shortcoming that I have to work around with caching and bulk updating (which would be a royal pain in the ass)?
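For illustration, here is a minimal sketch of the kind of model and summing loop described above, against the (old) google.appengine.ext.db API; the class and property names are hypothetical, not the actual code:

    from google.appengine.ext import db

    class User(db.Model):
        name = db.StringProperty()

    class Award(db.Model):
        name = db.StringProperty()
        points = db.IntegerProperty()

    class UserAward(db.Model):                 # the many-to-many mapping entity
        user = db.ReferenceProperty(User, collection_name='user_awards')
        award = db.ReferenceProperty(Award, collection_name='award_users')

    def total_points(user):
        total = 0
        for mapping in user.user_awards.fetch(1000):
            # Dereferencing mapping.award issues one extra datastore get per row.
            total += mapping.award.points
        return total

The per-row dereference of mapping.award is the pattern the answers below call out as the main cost.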
5 Answers
Could be a bit of both ;-)
If you're doing 400 queries on the Awards table, one for each result returned for a query on the mapping table, then I would expect that to be painful. The 1000-result limit on queries is there because BigTable thinks that returning 1000 results is at the limit of its ability to operate in a reasonable time. Based on the architecture, I'd expect the 400 queries to be way slower than the one query returning 400 results (400 log N vs. (log M) + 400).
The good news is that on GAE, memcaching a single hashtable containing all the awards and their points values is pretty straightforward (well, looked pretty straightforward when I cast an eye over the memcache docs a while back. I've not needed to do it yet).
Also, if you didn't already know: for result in query.fetch(1000) is way faster than for result in query, and you're restricted to 1000 results either way. The advantages of the latter are (1) it might be faster if you bail out early, and (2) if Google ever increases the limit beyond 1000, it gets the benefit without a code change.

You might also have problems when you delete a user (or an award). I found in one test that I could delete 300 objects inside the time limit. Those objects were more complex than your mapping objects, having 3 properties and 5 indices (including the implicit ones), whereas your mapping table probably only has 2 properties and 2 (implicit) indices. [Edit: just realised that I did this test before I knew that db.delete() can take a list, which is probably much faster.]
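Roughly, both points look like this (a sketch reusing the hypothetical UserAward model from the question's sketch):

    # One round trip for up to 1000 mapping rows, instead of iterating the
    # query and pulling results in small batches:
    mappings = UserAward.all().filter('user =', user).fetch(1000)

    # db.delete() accepts a list (entities or keys), so removing a user's
    # mappings is a single batch call rather than one call per entity:
    db.delete(mappings)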
BigTable does not necessarily do the things that relational databases are designed to do well. Instead, it distributes data well across many nodes. But almost all websites run fine with a bottleneck on a single db server, and hence don't strictly need the thing that BigTable does.
One other thing: if you're doing 400 datastore queries on a single http request, then you will find that you hit your datastore fixed quota well before you hit your request fixed quota. Of course if you're well within quotas, or if you're hitting something else first, then this might be irrelevant for your app. But the ratio between the two quotas is something like 8:1, and I take this as a hint as to what Google expects my data model to look like.
Yes and yes, I'm afraid.
As far as your data model goes, the best way by far to handle this is to store the sum against the User record, and update it when a user gains/loses an award. There's really no point in counting their score every single time when the vast majority of the time, it will be unchanged. If you make the "UserAward" entity type a child entity of the "User", you can update the score and insert or delete the UserAward entry in a single atomic transaction, ensuring your count is always accurate.
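A sketch of that approach, assuming the hypothetical User model gains a total_points = db.IntegerProperty(default=0) and UserAward entities are created with parent=user so both writes land in one entity group:

    def grant_award(user_key, award):
        def txn():
            user = db.get(user_key)
            # Child entity: same entity group as the user, so the mapping row
            # and the counter update commit (or fail) together.
            UserAward(parent=user, user=user, award=award).put()
            user.total_points += award.points
            user.put()
        db.run_in_transaction(txn)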
onebyone points out that you can memcache the awards table. That's a good idea, but given the limited amount of data, an even better one is to store it in local memory. Global members persist between HTTP requests, and since I presume you don't update the awards table often, you don't have to worry much about invalidating the cache. Just load it on the first request (or even hardcode it into your source). If you do change the list of awards, deploying a new minor update will reset all the instances, causing them to reload.
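A minimal sketch of that in-instance cache: a module-level dict, populated lazily, that persists across requests served by the same instance (how the awards are loaded is an assumption):

    _award_points = None   # maps award key -> points; survives between requests

    def get_award_points():
        global _award_points
        if _award_points is None:
            _award_points = dict((str(a.key()), a.points)
                                 for a in Award.all().fetch(1000))
        return _award_points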
For lookups, bear in mind that a substantial cost of doing datastore operations is the round-trip time. A get() operation, which looks up 1 or more records by ID (you can batch!), takes about 20-40ms. A query, however, takes about 160-200ms. Hence, the power of denormalization.
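For example, given the award keys stored on the mapping rows, one batch get() replaces 400 individual lookups; get_value_for_datastore() reads the stored key without triggering an extra fetch (model names as in the question's sketch):

    mappings = UserAward.all().filter('user =', user).fetch(1000)
    award_keys = [UserAward.award.get_value_for_datastore(m) for m in mappings]
    awards = db.get(award_keys)          # one round trip for all referenced awards
    total = sum(a.points for a in awards)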
One important idiom of App Engine is that storage is cheap but time is never in surplus. It seems the best way to do many-to-many relationships in App Engine is to simply store the information on both sides: i.e., a user has a list of awards and each award has a list of users. To look up all the awards a user has, you simply query the awards table for a certain user.
This idea is well demonstrated here: Building Scalable Complex Apps
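A sketch of that idiom with db.ListProperty(db.Key), replacing the mapping entity entirely (names hypothetical):

    from google.appengine.ext import db

    class Award(db.Model):
        name = db.StringProperty()
        points = db.IntegerProperty()

    class User(db.Model):
        name = db.StringProperty()
        awards = db.ListProperty(db.Key)   # keys of the awards this user holds

    # All of a user's awards in one batch get:
    user_awards = db.get(user.awards)

    # All users holding a given award, via a single query on the list property:
    holders = User.all().filter('awards =', award.key()).fetch(1000)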
Google BigTable runs on Google's distributed file system (GFS), so the data is distributed. Maybe for 400 rows MySQL is still better, but for larger data sets Google BigTable might be faster.
I think that's why they encourage us to use memcache to make it faster.
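A sketch of that suggestion, caching the award-to-points map from the first answer in memcache (the key name and expiry are arbitrary choices):

    from google.appengine.api import memcache

    def award_points_map():
        points = memcache.get('award_points')
        if points is None:
            points = dict((str(a.key()), a.points)
                          for a in Award.all().fetch(1000))
            memcache.set('award_points', points, time=3600)   # cache for an hour
        return points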
Even though you mention BigTable, I think you are actually implementing a relational database on Cloud SQL.
Your model is all right; it is the proper way of doing something like this. I don't see a good reason to de-normalize aggregates onto the user table.
Did you create indexes for fast table joins? It is pretty simple: you probably need B-tree indexes on all fields involved in the joins. There is no need to index the field you aggregate (the one you take the SUM of). Basically, both foreign keys of the N:N table should be indexed; if those foreign keys refer to the primary keys of the other two tables, that is enough.
Above the order of 100 rows, a simple B-tree index on the foreign keys can give a decent, noticeable increase in throughput.
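As a concrete sketch (in Django, since it comes up further down), the mapping table might look like this; ForeignKey columns get a B-tree index by default, and the unique constraint on the pair gives the database another index usable for the join (all names hypothetical):

    from django.db import models

    class User(models.Model):
        name = models.CharField(max_length=100)

    class Award(models.Model):
        name = models.CharField(max_length=100)
        points = models.IntegerField()

    class UserAward(models.Model):
        user = models.ForeignKey(User, on_delete=models.CASCADE)    # indexed by default
        award = models.ForeignKey(Award, on_delete=models.CASCADE)  # indexed by default

        class Meta:
            unique_together = ('user', 'award')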
I'm running a database on Cloud SQL where some edge tables have over 2 million records. Only past 2.5 million records am I considering some de-normalization, which also means some extra indexes, while still aggregating for the SUM. Otherwise I would be doing unnecessary updates to the SUM field whenever new records are added.
Only when a table went over 1 million records did we have to consider using a read replica. That is when we could distinguish between processes that only read some tables and those that write.
If you are using Django, beware when you implement LIMIT according to their documentation, because it is very misleading. When you [:100] (slice) a record set, the SQL that is actually sent to the SQL server is not what you would expect. I had a very hard time figuring that out. Django is not a good option when you plan to build something that will grow to a very large scale. But at the order of 1000 records, it would be fine.
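One way to see what Django actually sends is to print the generated SQL; slicing an unevaluated queryset becomes a LIMIT in the query, whereas slicing an already-evaluated one happens in Python (models are the hypothetical ones from the sketch above):

    qs = UserAward.objects.filter(user_id=42)[:100]
    print(str(qs.query))        # the SELECT ... LIMIT 100 that will be sent

    rows = list(UserAward.objects.filter(user_id=42))   # query already executed
    first_100 = rows[:100]      # plain list slicing; no new SQL is issued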