Need help choosing the right MySQL approach: sharding, clustering or partitioning

Posted 2024-11-04 18:07:45


I am developing an application that will use three tables:
1 - 1 million rows of products.
2 - 500 million rows of users.
3 - 10 billion rows of products that the users like.
The tables will grow over time, but will stay around those numbers.
I want to choose the right approach for this kind of database.
I don't know much about sharding, clustering or partitioning, but if some of you can tell me the best solution for this problem, I will focus on it, and that would be a huge help.
I only want approaches that work with MySQL. Will I need multiple servers for a database like this?
Thanks.


Comments (2)

岁月打碎记忆 2024-11-11 18:07:46

You can shard this data set pretty easily, but you might not have to, depending on the type of analysis you are trying to do. If this is simply a history of what each user likes, then you can probably use database partitioning to partition the data by range on a date column, and then sub-partition it by user_id.
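A minimal Python sketch of how such a layout routes rows, assuming a hypothetical `user_likes` table with a `liked_at DATE` column (the question does not mention one) declared roughly as `PARTITION BY RANGE (YEAR(liked_at)) SUBPARTITION BY HASH (user_id) SUBPARTITIONS 8`. The `locate` function is not MySQL code; it only models the RANGE rule (a row goes to the first partition whose `LESS THAN` bound exceeds its value) and the MOD-based rule for plain (non-LINEAR) HASH, so you can see which slice a given like would land in.

```python
import bisect
from datetime import date

# Hypothetical layout modeled on:
#   PARTITION BY RANGE (YEAR(liked_at))
#   SUBPARTITION BY HASH (user_id) SUBPARTITIONS 8
YEAR_BOUNDS = [2024, 2025, 2026]  # VALUES LESS THAN (...) bounds, ascending
PARTITION_NAMES = ["p_lt_2024", "p2024", "p2025", "p_max"]  # p_max = LESS THAN MAXVALUE
SUBPARTITIONS = 8


def locate(liked_at: date, user_id: int) -> tuple[str, int]:
    """Return (partition name, subpartition index) for one 'like' row."""
    # RANGE: the first partition whose LESS THAN bound is greater than the year
    part = bisect.bisect_right(YEAR_BOUNDS, liked_at.year)
    # HASH (non-LINEAR): subpartition = MOD(user_id, number of subpartitions)
    sub = user_id % SUBPARTITIONS
    return PARTITION_NAMES[part], sub


print(locate(date(2024, 11, 4), 42))  # ('p2024', 2)
print(locate(date(2019, 6, 1), 42))   # ('p_lt_2024', 2)
```

A query filtered on `liked_at` can then be pruned to the matching partition instead of scanning all 10 billion rows. Keep in mind that MySQL requires every unique key, including the primary key, to contain the columns used in the partitioning expression; a primary key such as `(user_id, product_id, liked_at)` satisfies that for this layout.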

If you will frequently update the data (users can "unlike" things), then you probably need to look at sharding. There is an example sharding implementation here: Shard-Key-Mapper. You can execute distributed parallel queries over the dataset (like map/reduce for SQL) with Shard-Query.
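Shard-Query's own interface is not reproduced here; this is a hypothetical, hand-rolled illustration of the scatter-gather idea it builds on. Each in-memory SQLite database stands in for one MySQL shard, the same `GROUP BY` query runs on every shard in parallel (the "map" step), and the partial counts are summed (the "reduce" step). All function names are made up for the example.

```python
import sqlite3
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Per-shard "map" query: partial like counts for the rows this shard holds.
MAP_SQL = "SELECT product_id, COUNT(*) FROM user_likes GROUP BY product_id"


def make_shard(rows):
    """Create an in-memory SQLite DB standing in for one MySQL shard."""
    # check_same_thread=False lets a pool thread use a connection made here
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE user_likes (user_id INTEGER, product_id INTEGER)")
    conn.executemany("INSERT INTO user_likes VALUES (?, ?)", rows)
    return conn


def run_map(conn):
    return conn.execute(MAP_SQL).fetchall()


def like_counts(shards):
    """Scatter the map query to all shards in parallel, then reduce by summing."""
    totals = Counter()
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        for partial in pool.map(run_map, shards):
            for product_id, count in partial:
                totals[product_id] += count
    return dict(totals)


# Users 1-2 live on shard 0, users 7-8 on shard 1.
shards = [
    make_shard([(1, 10), (1, 11), (2, 10)]),
    make_shard([(7, 10), (8, 12)]),
]
print(sorted(like_counts(shards).items()))  # [(10, 3), (11, 1), (12, 1)]
```

The reduce step only works this simply because counts are additive; aggregates such as averages or distinct counts need each shard to return more than a single number.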

If you shard, I would suggest sharding by user_id and keeping the products table as a "shared" table that is duplicated on each shard. You should use a directory-based sharding method, which allows you to move a user between shards. All the information about a single user, and the information about what they like, will be stored together on one shard.
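A minimal sketch of directory-based sharding, where every name (`Shard`, `DirectorySharding`, the round-robin placement policy) is hypothetical and plain Python dicts stand in for MySQL servers. The directory maps `user_id` to a shard, the small products table is copied to every shard, and `move_user` relocates a user's row and all of their likes together before updating the directory entry.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    products: dict                                # replicated copy of the shared products table
    users: dict = field(default_factory=dict)     # user_id -> user row
    likes: dict = field(default_factory=dict)     # user_id -> set of product_ids


class DirectorySharding:
    def __init__(self, products, num_shards):
        # Every shard gets its own copy of the small, shared products table.
        self.shards = [Shard(products=dict(products)) for _ in range(num_shards)]
        self.directory = {}  # user_id -> shard index: the "directory"
        self._next = 0

    def add_user(self, user_id, row):
        # Round-robin placement; any policy works since lookups go through the directory.
        idx = self._next % len(self.shards)
        self._next += 1
        self.directory[user_id] = idx
        self.shards[idx].users[user_id] = row
        self.shards[idx].likes[user_id] = set()

    def _home(self, user_id):
        return self.shards[self.directory[user_id]]

    def like(self, user_id, product_id):
        shard = self._home(user_id)
        if product_id not in shard.products:  # checked locally, no cross-shard lookup
            raise KeyError(product_id)
        shard.likes[user_id].add(product_id)

    def unlike(self, user_id, product_id):
        self._home(user_id).likes[user_id].discard(product_id)

    def move_user(self, user_id, new_idx):
        old_idx = self.directory[user_id]
        if old_idx == new_idx:
            return
        old, new = self.shards[old_idx], self.shards[new_idx]
        # Copy the user and all of their likes to the new shard...
        new.users[user_id] = old.users[user_id]
        new.likes[user_id] = old.likes[user_id]
        # ...switch the directory entry, then delete the old copies.
        self.directory[user_id] = new_idx
        del old.users[user_id]
        del old.likes[user_id]


db = DirectorySharding({10: "lamp", 11: "desk"}, num_shards=2)
db.add_user(1, {"name": "alice"})  # placed on shard 0
db.add_user(2, {"name": "bob"})    # placed on shard 1
db.like(1, 10)
db.like(1, 11)
db.unlike(1, 11)
db.move_user(1, 1)
print(db.directory[1], db.shards[1].likes[1])  # 1 {10}
```

Because lookups always go through the directory rather than a fixed formula, rebalancing a heavy user only means copying their rows and rewriting one directory entry; with real MySQL servers the directory itself would also need to be stored somewhere durable and shared.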

錯遇了你 2024-11-11 18:07:46

I think that if you really don't want a NoSQL solution like Hadoop, you can't avoid using multiple database (here: MySQL) servers. In my opinion, MySQL replication doesn't provide enough scalability for this kind of data: replicas help with reads, but every write still has to go through the master, so the master becomes the bottleneck. I'm not a scalability professional either, but I am currently thinking about a good solution for a similar problem of my own. I think I will go with a sharding solution where I partition my data over multiple nodes; I am still working out an intelligent way to map data to shards, and how you do that depends on your application. Your 'product liking' data looks like a good candidate for partitioning, because it is so huge.
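One common way to map data to shards is to derive the shard from the key alone, for example `user_id % number_of_shards`. The short Python sketch below (function names are hypothetical) counts how many users would change shard if such a cluster grew from 4 to 5 servers. With sequential IDs, 8,000 of 10,000 users move, which is one reason a directory-based mapping, as suggested in the other answer, can be attractive.

```python
def shard_by_modulo(user_id: int, num_shards: int) -> int:
    """Formula-based placement: the shard is computed from the key alone."""
    return user_id % num_shards


def keys_that_move(user_ids, old_n: int, new_n: int) -> int:
    """Count users whose shard changes when the cluster grows from old_n to new_n shards."""
    return sum(
        1 for uid in user_ids
        if shard_by_modulo(uid, old_n) != shard_by_modulo(uid, new_n)
    )


users = range(10_000)
moved = keys_that_move(users, 4, 5)
print(f"{moved} of {len(users)} users change shard")  # 8000 of 10000 users change shard
```

Only IDs whose remainders agree under both moduli stay put (0 to 3 in every block of 20), so roughly 80% of the rows would have to be copied between servers.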

BTW: An interesting article against sharding: http://37signals.com/svn/posts/1509-mr-moore-gets-to-punt-on-sharding
