Performing database joins on the web server

Published 2024-12-19 04:49:27


Today I found an article online discussing Facebook's architecture (though it's a bit dated). While reading it, I noticed that the third bullet point under the section "Software that helps Facebook scale" states:

Facebook uses MySQL, but primarily as a key-value persistent storage,
moving joins and logic onto the web servers since optimizations are
easier to perform there (on the “other side” of the Memcached layer).

Why move complex joins to the web server? Aren't databases optimized to perform join logic? This methodology seems contrary to what I've learned up to this point, so maybe the explanation is just eluding me.

If possible, could someone explain this (an example would help tremendously) or point me to a good article (or two) for the benefits (and possibly examples) of how and why you'd want to do this?
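For what it's worth, the pattern the quoted bullet describes — serving reads from a cache, falling back to cheap primary-key lookups, and doing joins in application code — can be sketched roughly as follows. This is an illustrative sketch, not Facebook's actual code: `cache` stands in for a Memcached client and `db_get_row` for a single-row, primary-key SELECT against MySQL.

```python
cache = {}  # stand-in for a Memcached client

def db_get_row(table, key):
    # Stand-in for: SELECT * FROM <table> WHERE id = <key>
    # (a cheap primary-key lookup, i.e. MySQL used as a key-value store).
    fake_db = {("users", 1): {"id": 1, "name": "alice"}}
    return fake_db.get((table, key))

def get(table, key):
    """Cache-aside read: try the cache first, fall back to a key lookup."""
    cache_key = f"{table}:{key}"
    row = cache.get(cache_key)
    if row is None:
        row = db_get_row(table, key)  # single-row fetch, no join
        if row is not None:
            cache[cache_key] = row    # populate the cache for next time
    return row

print(get("users", 1))  # first call misses the cache and hits the "database"
print(get("users", 1))  # second call is served from the cache
```

Any join logic then lives on the web-server side of the cache, where results can themselves be cached.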


Comments (1)

回眸一笑 2024-12-26 04:49:27


I'm not sure about Facebook, but we have several applications where we follow a similar model. The basis is fairly straightforward.

The database contains huge amounts of data. Performing joins at the database level really slows down any queries we make on the data, even if we're only returning a small subset (say, 100 parent rows and 1,000 child rows in a parent-child relationship).

However, if we use .NET DataSet objects to select just the rows we need and then create DataRelation objects within the DataSet, we see a dramatic boost in performance.
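In language-agnostic terms, the idea is: two narrow fetches instead of one server-side JOIN, with the relation built in application memory. Below is a rough Python analogue of the DataSet/DataRelation step (the data is made up; a dictionary index plays the role of the DataRelation):

```python
# Rows as they might come back from two simple SELECTs
# (parents by some filter, children by the matching foreign keys).
parents = [{"id": 1, "name": "Order A"}, {"id": 2, "name": "Order B"}]
children = [
    {"id": 10, "parent_id": 1, "item": "widget"},
    {"id": 11, "parent_id": 1, "item": "gadget"},
    {"id": 12, "parent_id": 2, "item": "gizmo"},
]

# Build the "relation": index children by foreign key, O(n) over the
# small subset we actually pulled (the DataRelation analogue).
by_parent = {}
for child in children:
    by_parent.setdefault(child["parent_id"], []).append(child)

# The joined view is then produced entirely in application memory.
for parent in parents:
    for child in by_parent.get(parent["id"], []):
        print(parent["name"], "->", child["item"])
```

The database only ever answers simple indexed lookups; the join cost is paid once, in RAM, over a few hundred rows rather than over the full tables on disk.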

I can't answer why this is, as I'm not knowledgeable about the internal workings of either, but I can venture a guess...

The RDBMS (SQL Server in our case) has to deal with data that lives in files. These files are very large, and only so much of them can be loaded into memory, even on our heavy-hitter SQL Servers, so there is a disk I/O penalty.

When we load a small portion of it into a DataSet, the join happens entirely in memory, so we avoid the I/O penalty of going to disk.

Even though I can't completely explain the reason for the performance boost (and I'd love to have someone more knowledgeable tell me if my guess is right), I can tell you that in certain cases, when there is a VERY large amount of data but your app only needs to pull a small subset of it, there is a noticeable boost in performance from following the model described. We've seen it turn apps that just crawl into lightning-quick ones.

But if done improperly, there is a penalty: if you overload the machine's RAM by applying this inappropriately or in every situation, you'll end up with crashes or performance problems as well.
