Improving query performance on a wide table with many rows (50 columns, 5MM rows)
We are building a caching solution for our user data. The data is currently stored in Sybase, distributed across 5 - 6 tables, and the query service built on top of it with Hibernate gives us very poor performance: loading the data into the cache takes in the range of 10 - 15 hours.
So we decided to create a denormalized table of 50 - 60 columns and 5MM rows in another relational database (UDB), populate that table first, and then populate the cache from the new denormalized table using JDBC, so that the time to build our cache is lower. This gives us much better performance, and we can now build the cache in around an hour, but that still does not meet our requirement of building the cache within 5 minutes. The denormalized table is queried using the following query
select * from users where user_id in (...)
Here user_id is the primary key. We also tried the query
select * from users where user_location in (...)
and created a non-unique index on user_location as well, but that did not help either.
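(As an aside on the bulk load itself: rather than repeated `in (...)` batches, this kind of cache build is commonly structured as a handful of parallel scans over contiguous primary-key ranges with a large JDBC fetch size, so the database does sequential range reads instead of many round trips. A minimal sketch, assuming the `users`/`user_id` schema above; the range splitter and the fetch size of 10,000 are illustrative, not tuned values:)

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class CacheLoader {

    /** Split the inclusive key range [min, max] into at most `parts` contiguous sub-ranges. */
    public static List<long[]> keyRanges(long min, long max, int parts) {
        List<long[]> ranges = new ArrayList<>();
        long total = max - min + 1;
        long base = total / parts;
        long remainder = total % parts;
        long start = min;
        for (int i = 0; i < parts && start <= max; i++) {
            long size = base + (i < remainder ? 1 : 0);
            long end = start + size - 1;
            ranges.add(new long[]{start, end});
            start = end + 1;
        }
        return ranges;
    }

    /** Stream one key range; each range can run on its own connection/thread. */
    public static void loadRange(Connection conn, long lo, long hi) throws Exception {
        String sql = "select * from users where user_id between ? and ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setFetchSize(10_000); // hint the driver to stream rows, not buffer the whole result
            ps.setLong(1, lo);
            ps.setLong(2, hi);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // put the row into the cache here
                }
            }
        }
    }
}
```

Running one `loadRange` per range on a small thread pool turns the build into a few sequential scans along the primary key, which is usually much cheaper for the database than servicing thousands of `in (...)` lookups.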
So is there a way we can make these queries faster? If not, we are also open to considering NoSQL solutions.
Which NoSQL solution would be suited to our needs? Apart from the large table, we would be making around 1MM updates to the table on a daily basis.
I have read about MongoDB, and it seems it might work, but no one has posted any experience with MongoDB at this scale of rows and daily updates.
Please let us know your thoughts.
Comments (1)
The short answer here, relating to MongoDB, is yes: it can be used in this way to create a denormalized cache in front of an RDBMS. Others have used MongoDB to store datasets of similar (and larger) sizes to the one you describe, and a dataset of that size can be kept in RAM. Some details about your data are missing here, but this is certainly not beyond the capabilities of MongoDB, and it is one of the more frequently seen deployments:
http://www.mongodb.org/display/DOCS/The+Database+and+Caching
The key will be the size of your working data set and therefore your available RAM (MongoDB maps its data files into memory). For larger deployments, write-heavy scaling, and similar issues, there are several approaches (sharding, replica sets) that can be employed.
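As a back-of-envelope check on that working-set point: 5MM rows at, say, ~1 KB per 50 - 60 column document (an assumed average, not a measured one) puts the data on the order of 5 GB, which would need to fit comfortably in RAM:

```java
public class WorkingSetEstimate {

    /** Rough in-memory size in GiB for `rows` documents of `bytesPerDoc` average size. */
    public static double workingSetGiB(long rows, long bytesPerDoc) {
        return rows * bytesPerDoc / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // 5MM rows, assumed ~1 KB per denormalized document
        System.out.printf("~%.1f GiB%n", workingSetGiB(5_000_000L, 1_024L));
    }
}
```

If your real average document is larger (wide text columns, plus index overhead on top), the figure scales linearly, and that is the point at which sharding the data across machines starts to matter.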
With the level of detail given it is hard to say for certain that MongoDB will meet all of your requirements, but given that others have already done similar implementations, and based on the information provided, there is no reason it would not work either.