Database table is getting too much data - need another solution

Published 2024-09-26 05:52:09


I have a site where people can add their favorite TV series. There is one feature that makes it possible to check off episodes that you have seen.

Each episode that is checked off creates one record in a DB table (with user_id, show_id and episode_id).
This table is now over 600,000 rows and is growing very fast!

I have indexes set up, but I feel like the performance when querying this table is getting worse and worse.

My thoughts for a new solution:

So instead of:

user_id | show_id | episode_id
      1 |     123 |       7675
      1 |     123 |       7676
      1 |     123 |       7677
      1 |     456 |       5678
      1 |     456 |       5679
      1 |     456 |       5680

I could do this:

user_id | show_id | episode_ids
      1 |     123 | 7675,7676,7677
      1 |     456 | 5678,5679,5680

Then I would have to split the string into an array and use array.include?(some_id).
This should take load off the database, but Ruby would have to handle much heavier array code.
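
For illustration, a minimal sketch of what that Ruby-side check might look like (the episode_ids string and the looked-up ids are made-up values):

# Hypothetical value read from the proposed episode_ids varchar column.
episode_ids = "7675,7676,7677"

# Split the comma-separated string and convert each piece to an integer;
# this parsing cost is paid on every lookup.
seen = episode_ids.split(",").map(&:to_i)

seen.include?(7676)  # => true
seen.include?(9999)  # => false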

Am I on the right track? Or can anybody think of a better solution?

Comments (4)

┾廆蒐ゝ 2024-10-03 05:52:10


No, no, no: that is absolutely NOT the way to structure such a database. Comma-separated lists in varchar fields are about the least desirable anti-pattern you could choose.

This sounds to me like your performance problems are based on guesswork. So instead:

  • Determine if there really is a problem
  • Find the cause of it using appropriate instrumentation
  • Test possible solutions in a non-production environment.

600k rows is NOTHING (in a table with three ints). Really. This can fit into RAM on even the tiniest of servers. Querying a table that sits in RAM should be so fast that you never have to worry about it.

If you get past step 1 (there really is a problem), ask further questions containing your entire relevant schema, exact queries, explain plans and timing data.
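
If it gets that far, here is a minimal sketch of how that explain-plan and timing data could be collected in MySQL, assuming the question's table is named seen_episodes (an assumed name) with the three integer columns described above:

-- seen_episodes is an assumed name for the question's table.
-- What access path does the optimizer choose for the suspect query?
EXPLAIN
SELECT episode_id
FROM   seen_episodes
WHERE  user_id = 1
  AND  show_id = 123;

-- MySQL can also report per-stage timings for recent statements.
SET profiling = 1;
SELECT episode_id FROM seen_episodes WHERE user_id = 1 AND show_id = 123;
SHOW PROFILES;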

So尛奶瓶 2024-10-03 05:52:10


Whether you denormalize your data or not is a matter of debate. It can have its merits in specific circumstances, but from a relational point of view it probably shouldn't be your first choice. Instead, the preferred first steps in solving this problem should be to analyze it and implement solutions that don't change the structure of the data but predominantly deal with the database system and its environment. Therefore:

  • Is the source of your problem really the database? Or is it some other system (network, webserver, Rails, etc.)?
  • What is acceptable in terms of query response times? Find concrete numbers that the database should adhere to under all circumstances.
  • Which queries are getting slower? Maybe you have slow, inefficient queries that can be refactored. Make a query plan and see what the optimizer is doing.
  • Are you using indexes in the correct way?
  • Tune your MySQL instance. You can achieve a lot with tuning.
  • See whether you can do something on the hardware side (get more memory, faster disks, etc.).
  • Create views for the most heavily used queries, if there are any (a small sketch follows below).
  • If all of the above is done, you can still do sharding. This adds some complexity on top of your application, but it allows you to scale your system to a good extent without too much effort.
  • Eventually you may reach the conclusion that you must use a "truly scalable" distributed key/value store (NoSQL). But at 600k rows there is a long way to go until you reach that point.

That being said: if you find that your proposed solution is the best way to improve performance, go ahead and denormalize. The point is that you should be aware of all the options and choose the best one with concrete, performance-related goals in mind.
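
As a hedged sketch of two of the steps above (tuning and views), again assuming the question's table is named seen_episodes (an assumed name):

-- How much memory InnoDB may use to keep the table cached in RAM;
-- a key tuning knob when the goal is to serve reads from memory.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- A view for a heavily used read (how many episodes of each show a
-- user has checked off), so the query lives in one tunable place.
CREATE VIEW seen_counts AS
SELECT user_id, show_id, COUNT(*) AS episodes_seen
FROM   seen_episodes
GROUP  BY user_id, show_id;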

染柒℉ 2024-10-03 05:52:10


Here's how I'd structure the tables:

-- MySQL-flavored DDL; column types such as VARCHAR(255) stand in for
-- the original "text/varchar/whatever, depending on the database".

CREATE TABLE users (
  userid   INTEGER PRIMARY KEY,
  username VARCHAR(255)
);

CREATE TABLE shows (
  showid   INTEGER PRIMARY KEY,
  showname VARCHAR(255)
  -- etc etc
);

CREATE TABLE episodes (
  episodeid   INTEGER PRIMARY KEY,
  showid      INTEGER NOT NULL,   -- InnoDB indexes this FK column automatically
  ordinal     DECIMAL(6,2),       -- which episode; DECIMAL makes it easier to
                                  -- insert an overlooked episode later
  episodename VARCHAR(255),
  -- etc etc
  FOREIGN KEY (showid) REFERENCES shows (showid)
);

CREATE TABLE seenit (
  id        INTEGER AUTO_INCREMENT PRIMARY KEY,
  userid    INTEGER NOT NULL,
  episodeid INTEGER NOT NULL,
  FOREIGN KEY (userid)    REFERENCES users (userid),
  FOREIGN KEY (episodeid) REFERENCES episodes (episodeid)
);

You could place an alternate unique composite index on (userid, episodeid) or use separate indexes, one on userid, one on episodeid. I'd probably go with the latter.
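
For illustration, one way the SEENIT table might then be queried (the ids are made-up values): list the episodes of one show that a user has already checked off.

-- Which episodes of show 123 has user 1 seen, in broadcast order?
SELECT e.episodeid, e.ordinal, e.episodename
FROM   episodes e
JOIN   seenit   s ON s.episodeid = e.episodeid
WHERE  s.userid = 1
  AND  e.showid = 123
ORDER  BY e.ordinal;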

夏雨凉 2024-10-03 05:52:10


I would stick with the normalized data. This sounds more like a query-optimization problem. Keep in mind that MySQL (assuming you are using it) uses only one index per query, and you might get better performance by setting up a composite index (sketched below).
Also make use of the EXPLAIN statement in the MySQL Query Browser. More info here:
http://dev.mysql.com/doc/refman/5.1/en/explain.html
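
A small sketch of that suggestion, assuming the question's table is named seen_episodes (an assumed name; the ids are illustrative):

-- One composite index covers the whole WHERE clause, instead of two
-- single-column indexes of which only one would be used.
ALTER TABLE seen_episodes
  ADD INDEX idx_user_show_episode (user_id, show_id, episode_id);

-- Re-check the plan; for this query the Extra column should now
-- report "Using index" (the index alone answers the query).
EXPLAIN
SELECT episode_id
FROM   seen_episodes
WHERE  user_id = 1
  AND  show_id = 123;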
