Which of the following cross-shard data duplication options is recommended?

Posted on 2025-01-04 16:53:43

The book High Performance MySQL suggests that, when sharding a blog application, one may want to store comment data on 2 shards: first, on the shard of the person posting the comment, and second, on the shard where the post is stored.

This raises the question of how to reliably duplicate the data. Which of the following options for duplicating data across shards is recommended?

Option 1: Make 2 separate inserts from the PHP script.
Pros: a) The logic is in the application layer.
Cons: a) The user has to wait for 2 inserts. b) The logic has to be duplicated in every client that inserts similar data.
Conclusion: Seems reasonable.

Option 2: Create federated tables and use a trigger to handle inserting the duplicate.
Pros: a) The app layer doesn't need to worry about multiple inserts.
Cons: a) Every shard needs a federated connection to every other shard; b) Federation will work between machines on a LAN, but what about 2 different sites? c) What if the connection to the federated server fails?
Conclusion: Doesn't seem like a sound idea.

Option 3: Messaging, such as RabbitMQ.
Pros: a) Different clients can insert data in one place, and all subscribers can consume the insert.
Cons: a) Complex; b) may add overhead for hosting the messaging server and clients; c) not sure how it would work with a look-up service to locate the appropriate shards.
Conclusion: Not sure.

Option 4: Your suggestion?

I will greatly appreciate your help.

Comments (1)

心头的小情儿 2025-01-11 16:53:43

As you point out, having triggers between the various shards is silly; the whole reason for sharding is independent database operations. So you can throw that option out right away.

Updating both tables at the same time is the approach with the fewest moving parts. Over the long term, it will be the most maintainable, and it will be the easiest to debug if something goes wrong.
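To make the "update both at the same time" approach concrete, here is a minimal sketch. It is not from the original answer and it rests on several assumptions: Python stands in for the PHP script, in-memory SQLite databases stand in for the two MySQL shards, and the names `make_shard`, `insert_comment`, and `add_comment` are hypothetical. The one design point it tries to show is that the application generates the comment ID itself, so the same row can be written to both shards and a retried insert is harmless.

```python
import sqlite3
import uuid


def make_shard():
    """Create an in-memory SQLite database standing in for one MySQL shard (assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE comments ("
        " comment_id TEXT PRIMARY KEY,"
        " post_id INTEGER, user_id INTEGER, body TEXT)"
    )
    return db


def insert_comment(db, comment_id, post_id, user_id, body):
    """Insert one copy of the comment; repeating the call with the same ID is a no-op."""
    # "INSERT OR IGNORE" is SQLite syntax; MySQL spells it "INSERT IGNORE".
    with db:  # commits on success, rolls back if the statement raises
        db.execute(
            "INSERT OR IGNORE INTO comments VALUES (?, ?, ?, ?)",
            (comment_id, post_id, user_id, body),
        )


def add_comment(post_shard, user_shard, post_id, user_id, body):
    """Write the same comment to the post's shard and to the commenter's shard."""
    # The ID is generated by the application, not by either database,
    # so both copies share one key.
    comment_id = str(uuid.uuid4())
    insert_comment(post_shard, comment_id, post_id, user_id, body)
    insert_comment(user_shard, comment_id, post_id, user_id, body)
    return comment_id
```

If the second insert fails, the caller can simply call `insert_comment` again with the same ID; the copy that already exists is left untouched. Locating the right shard for a given post or user is left out of the sketch.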

But if response time is important, then you might think of some sort of messaging approach: update the comments-by-entry table, and queue a message to update the comments-by-user table. If it takes an hour for that message to be processed, or if it gets lost in a system crash, it's no big deal; you can always recover. By no means should you use a messaging approach to update both tables.
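The split described above can be sketched as follows. This is an illustration under stated assumptions, not the answerer's code: plain dictionaries stand in for the comments-by-entry and comments-by-user tables, `queue.Queue` stands in for a broker such as RabbitMQ, and `post_comment`, `drain_queue`, and `reconcile` are hypothetical names. The comments-by-entry write happens synchronously, so it is the source of truth; the comments-by-user copy arrives later and can be rebuilt if a message is lost.

```python
import queue
import uuid

# Stand-ins (assumptions): dicts keyed by comment_id instead of real sharded tables,
# and an in-process queue instead of a message broker.
comments_by_entry = {}   # lives on the post's shard; written synchronously
comments_by_user = {}    # lives on the commenter's shard; written by the consumer
pending = queue.Queue()


def post_comment(post_id, user_id, body):
    """Store the comment with the post right away, then queue the by-user copy."""
    row = {
        "comment_id": str(uuid.uuid4()),
        "post_id": post_id,
        "user_id": user_id,
        "body": body,
    }
    comments_by_entry[row["comment_id"]] = row  # the user only waits for this write
    pending.put(row)                            # the second copy is deferred
    return row["comment_id"]


def drain_queue():
    """Consumer side: apply every queued message to the by-user table."""
    while True:
        try:
            row = pending.get_nowait()
        except queue.Empty:
            return
        # setdefault keeps redelivery of the same message harmless.
        comments_by_user.setdefault(row["comment_id"], row)


def reconcile():
    """Recovery: copy any comment that never reached the by-user table."""
    missing = [
        row for cid, row in comments_by_entry.items()
        if cid not in comments_by_user
    ]
    for row in missing:
        comments_by_user[row["comment_id"]] = row
    return len(missing)
```

Because only the second copy goes through the queue, a lost message costs nothing more than a delayed entry in the user's comment list, and `reconcile` can repair it from the table that was written directly. If both copies were queued, a lost message would leave nothing to recover from.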

Answer by: @kdgregory Link: https://softwareengineering.stackexchange.com/a/134607/41398
