Database table is fetching too much data - need another solution
I have a site where people can add their favorite TV series. There is one feature that makes it possible to check off episodes that you have seen.
Each episode that is checked off creates one record in a DB table (with user_id, show_id and episode_id).
This table is now over 600,000 rows and is growing very fast!
I have indexes set up, but I feel like the performance when querying this table is getting worse and worse.
My thoughts for a new solution:
So instead of:
user_id | show_id | episode_id
1 ....... 123 ......7675
1 ....... 123 ......7676
1 ....... 123 ......7677
1 ....... 456 ......5678
1 ....... 456 ......5679
1 ....... 456 ......5680
I could do this:
user_id | show_id | episode_ids
1 ....... 123 ......7675,7676,7677
1 ....... 456 ......5678,5679,5680
Then I would have to split the string into an array, and use array.include?(some-id).
This should relieve the database, but there would be much heavier array code for Ruby to handle.
Am I on the right track? Or can anybody think of a better solution?
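For concreteness, the proposed lookup could be sketched like this in Ruby (the id values here are made-up examples, not real data from the site):

```ruby
# Sketch of the proposed comma-separated lookup; ids are made-up examples.
episode_ids = "7675,7676,7677"            # what the episode_ids column would hold
seen = episode_ids.split(",").map(&:to_i) # parse the CSV string into integers
puts seen.include?(7676)                  # => true
puts seen.include?(9999)                  # => false
```

Note that every check pays for parsing the whole string, and the cost grows with the number of episodes a user has seen.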
4 Answers
No No no, that is absolutely NOT the way to structure such a database. Comma-separated lists in varchar fields are the least desirable anti-pattern you should consider.
This sounds to me like your performance problems are based on guesswork. So instead:

1. 600k rows is NOTHING (in a table with three ints). Really. This can fit into RAM on even the tiniest of servers. Querying a table out of RAM should be so fast you don't worry about it.
2. If you get past step 1 (there really is a problem), ask further questions containing your entire relevant schema, exact queries, explain plans and timing data.
Whether you denormalize your data or not is a matter of debate. It can have its merits in specific circumstances, but from a relational point of view it probably shouldn't be your first choice. Instead, the preferred first steps in solving this problem should be to analyze it and implement solutions that don't change the structure of the data but predominantly deal with the database system and its environment. Therefore:
That being said - if you find that your proposed solution is the best way to improve performance, go ahead and denormalize. The point is that you should be aware of all options and choose the best ones with concrete performance-related goals in mind.
Here's how I'd structure the tables:
You could place an alternate unique composite index on (userid, episodeid) or use separate indexes, one on userid, one on episodeid. I'd probably go with the latter.
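The answer's original table definitions did not survive in this copy; a plausible sketch of the normalized layout it describes (table and column names are assumptions), including both indexing options mentioned above:

```sql
-- Hypothetical sketch; the answer's original DDL was lost, names are assumed.
CREATE TABLE seen_episodes (
  id        INT AUTO_INCREMENT PRIMARY KEY,
  userid    INT NOT NULL,
  episodeid INT NOT NULL
);

-- Option 1: an alternate unique composite index:
CREATE UNIQUE INDEX ux_user_episode ON seen_episodes (userid, episodeid);

-- Option 2: separate single-column indexes (the answer's preference):
-- CREATE INDEX ix_user    ON seen_episodes (userid);
-- CREATE INDEX ix_episode ON seen_episodes (episodeid);
```

The show can be derived through the episode, so show_id need not be repeated in this table.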
I would stick with the normalized data. It sounds more like a query optimization problem. Keep in mind that MySQL (assuming you are using it) uses only one index per query, and you might get better performance by setting up a composite index.
Also make use of the EXPLAIN statement in the MySQL Query Browser. More info here:
http://dev.mysql.com/doc/refman/5.1/en/explain.html
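As an illustration of both suggestions together (the table and index names here are assumptions, matching the columns from the question):

```sql
-- Assumed composite index covering the common "episodes seen for this show" lookup:
CREATE INDEX ix_user_show ON seen_episodes (user_id, show_id);

-- EXPLAIN reveals whether MySQL actually uses it:
EXPLAIN SELECT episode_id
FROM seen_episodes
WHERE user_id = 1 AND show_id = 123;
-- Check the "key" column of the output: it should name ix_user_show,
-- and "rows" should drop to roughly the number of matching episodes.
```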