What's wrong with joining two tables that live in two different databases?

Posted 2024-10-19 17:40:49

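I'm interested in your thoughts on the pitfalls of joining two or more tables that live in different databases. I'll try to give an example.

Say Table1 is in the DatabaseA database and Table2 is in DatabaseB. Suppose I have a view in DatabaseA that pulls some data from Table1 and from some other tables in DatabaseA.

The view is used to push data into another database; let's call it DatabaseC.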

If I need some data from Table2, my instinct is to join Table2 directly in this view, something like: table1 inner join DatabaseB..table2 on [some column]
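For anyone who wants to try the pattern being asked about without a SQL Server instance, here is a rough analog using Python's built-in sqlite3 module (a stand-in, not SQL Server 2005). The main database plays DatabaseA, an attached in-memory database named dbb plays DatabaseB, and the schema-qualified name dbb.table2 plays the role of DatabaseB..table2 (database, omitted schema, table). All table and column names are hypothetical.

```python
# sqlite3 stand-in for a SQL Server cross-database join; names are hypothetical.
import sqlite3

def make_connection():
    """Main database plays DatabaseA; an attached in-memory database plays DatabaseB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS dbb")
    conn.execute("CREATE TABLE main.table1 (id INTEGER PRIMARY KEY, some_col INTEGER, a_value TEXT)")
    conn.execute("CREATE TABLE dbb.table2 (some_col INTEGER PRIMARY KEY, b_value TEXT)")
    conn.executemany("INSERT INTO main.table1 VALUES (?, ?, ?)",
                     [(1, 10, "a1"), (2, 20, "a2"), (3, 30, "a3")])
    conn.executemany("INSERT INTO dbb.table2 VALUES (?, ?)",
                     [(10, "b10"), (20, "b20")])
    return conn

def cross_db_join(conn):
    # main.table1 (local) joined to dbb.table2 (attached), analogous to
    # "table1 inner join DatabaseB..table2 on [some column]".
    return conn.execute(
        "SELECT t1.id, t1.a_value, t2.b_value "
        "FROM main.table1 AS t1 "
        "INNER JOIN dbb.table2 AS t2 ON t1.some_col = t2.some_col "
        "ORDER BY t1.id"
    ).fetchall()

conn = make_connection()
print(cross_db_join(conn))  # [(1, 'a1', 'b10'), (2, 'a2', 'b20')]
```

Row 3 drops out because its some_col (30) has no match in the other database, which is ordinary inner-join behaviour.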

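It's quick and easy to do, but a nagging voice in my head keeps telling me not to. My worry is that there's no way to track all the objects that depend on Table2, so if I change something in it I have to be very careful and remember every place I use this table. It feels a bit like breaking SRP for this view (and for both databases), because the view can now be changed by two different actions performed in two different databases: changing Table1 or changing Table2.

I'm interested in your opinion. Is this a good idea or a bad one? What problems can this approach cause (performance, maintenance, and so on)? And if you have real-world experience where this approach turned out to be either a big mistake or a lifesaver, I'd like to hear about it.

PS: I've searched Google and SO for this topic but couldn't find anything related. I'm happy to take downvotes, duplicate-question flags and other "reprimands" from SO users, just to get different perspectives on this issue.

PPS: I'm using SQL Server 2005.

Thanks, hope I made myself clear :)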


待＂谢繁草 2024-10-26 17:40:50

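Your nagging voice is probably right.

The biggest problem is enforcing declarative referential integrity: you can't create foreign keys across databases, so sooner or later you will have to deal with inconsistent, mismatched or incomplete data.

But if you don't care about that, I don't see a problem :-)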
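Since the engine cannot enforce a foreign key that spans databases, a common fallback is to check for orphans yourself. A minimal sketch, again using Python's sqlite3 with an attached database standing in for the second database (all names hypothetical): an anti-join lists rows in table1 whose key has no parent in the other database's table2.

```python
# sqlite3 stand-in; table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS dbb")
conn.execute("CREATE TABLE main.table1 (id INTEGER PRIMARY KEY, some_col INTEGER)")
conn.execute("CREATE TABLE dbb.table2 (some_col INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO main.table1 VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
conn.executemany("INSERT INTO dbb.table2 VALUES (?)", [(10,), (20,)])

def find_orphans(conn):
    """Return ids of table1 rows whose some_col has no parent row in dbb.table2."""
    # LEFT JOIN + IS NULL is an anti-join: keep only rows with no match on the other side.
    rows = conn.execute(
        "SELECT t1.id FROM main.table1 AS t1 "
        "LEFT JOIN dbb.table2 AS t2 ON t1.some_col = t2.some_col "
        "WHERE t2.some_col IS NULL ORDER BY t1.id"
    ).fetchall()
    return [r[0] for r in rows]

print(find_orphans(conn))  # [3]

# Someone working in the other database deletes a "parent" row; nothing stops them.
conn.execute("DELETE FROM dbb.table2 WHERE some_col = 20")
print(find_orphans(conn))  # [2, 3]
```

A check like this only detects the inconsistency after the fact; it cannot prevent it the way a declared foreign key would.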


∝单色的世界 2024-10-26 17:40:50


Some general themes regarding cross-database joins:

Foreign keys

As others have pointed out, in the absence of foreign keys, you'll need to roll your own referential integrity. Not a problem in itself, but issues can surface when you're not in control of the data in one or more of the databases.
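One way to "roll your own" integrity on the write path, sketched with Python's sqlite3 as a stand-in and assuming the application owns every insert into table1: look up the parent key in the other database and insert only if it exists, inside one `with conn:` block. The helper insert_table1_row and the exception class are hypothetical. This guards only this code path; a delete run directly against the other database still goes unchecked.

```python
# sqlite3 stand-in; insert_table1_row and MissingParentError are hypothetical helpers.
import sqlite3

class MissingParentError(Exception):
    """Raised when a key has no matching row in the other database."""

def insert_table1_row(conn, row_id, some_col):
    # Application-level substitute for the foreign key that cannot span databases.
    with conn:  # commits on success, rolls back if an exception escapes
        parent = conn.execute(
            "SELECT 1 FROM dbb.table2 WHERE some_col = ?", (some_col,)
        ).fetchone()
        if parent is None:
            raise MissingParentError(f"some_col={some_col} not found in dbb.table2")
        conn.execute("INSERT INTO main.table1 (id, some_col) VALUES (?, ?)", (row_id, some_col))

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS dbb")
conn.execute("CREATE TABLE main.table1 (id INTEGER PRIMARY KEY, some_col INTEGER)")
conn.execute("CREATE TABLE dbb.table2 (some_col INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO dbb.table2 VALUES (10)")
conn.commit()

insert_table1_row(conn, 1, 10)      # accepted: parent exists
try:
    insert_table1_row(conn, 2, 99)  # rejected: no parent in dbb.table2
except MissingParentError as exc:
    print("rejected:", exc)
```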

A related issue is the use of CASE tools. When reverse-engineering a schema, they will overlook links between tables where a FK->PK relationship doesn't exist.
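To see why such tools miss the link: they generally discover relationships from declared constraints in the catalog. In a sqlite3 analog (names hypothetical), `PRAGMA foreign_key_list` reports a declared same-database foreign key, but has nothing to report for a cross-database relationship that exists only in query text.

```python
# sqlite3 stand-in; table names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS dbb")
# Same-database relationship: declared, so it is visible in the catalog.
conn.execute("CREATE TABLE main.parent (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE main.child (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES parent(id))")
# Cross-database relationship: table1.some_col "points at" dbb.table2 only by convention.
conn.execute("CREATE TABLE dbb.table2 (some_col INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE main.table1 (id INTEGER PRIMARY KEY, some_col INTEGER)")

def declared_fk_targets(conn, table):
    """Return the parent tables a catalog-reading tool would discover for `table`."""
    # foreign_key_list rows are (id, seq, table, from, to, on_update, on_delete, match).
    return [row[2] for row in conn.execute(f"PRAGMA main.foreign_key_list({table})")]

print(declared_fk_targets(conn, "child"))   # ['parent']
print(declared_fk_targets(conn, "table1"))  # []
```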

Performance

If the databases are on different servers, then you're exposed to the vagaries of whatever else is running on those servers, as well as the cost of running the join operation itself. Again, if the servers are all within your control, this is something you can monitor, but that may not be the case.
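A general mitigation (not something this answer prescribes) is to pull only the rows you need from the other database into a local temporary table, then join locally, so the remote side is read once with a filter. A sqlite3 sketch under that assumption; the function name, the region filter and the table names are hypothetical, and in SQL Server the remote side and temp-table syntax would differ.

```python
# sqlite3 stand-in; join_via_local_stage, t2_stage and the region column are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS dbb")
conn.execute("CREATE TABLE main.table1 (id INTEGER PRIMARY KEY, some_col INTEGER)")
conn.execute("CREATE TABLE dbb.table2 (some_col INTEGER PRIMARY KEY, region TEXT, b_value TEXT)")
conn.executemany("INSERT INTO main.table1 VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
conn.executemany("INSERT INTO dbb.table2 VALUES (?, ?, ?)",
                 [(10, "EU", "b10"), (20, "US", "b20"), (30, "EU", "b30")])

def join_via_local_stage(conn, region):
    # Step 1: copy only the filtered remote rows into a local temp table.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS t2_stage (some_col INTEGER PRIMARY KEY, b_value TEXT)")
    conn.execute("DELETE FROM temp.t2_stage")
    conn.execute("INSERT INTO temp.t2_stage SELECT some_col, b_value FROM dbb.table2 WHERE region = ?",
                 (region,))
    # Step 2: the join itself now touches only local tables.
    return conn.execute(
        "SELECT t1.id, s.b_value FROM main.table1 AS t1 "
        "JOIN temp.t2_stage AS s ON t1.some_col = s.some_col ORDER BY t1.id"
    ).fetchall()

print(join_via_local_stage(conn, "EU"))  # [(1, 'b10'), (3, 'b30')]
```

Whether staging actually helps depends on how selective the filter is and how the real engine plans the remote access; measure before adopting it.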

Coupling

If your solution relies on other databases you have multiple points of failure. If a database goes down, this could cascade to one or more systems.
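A small illustration of the coupling point, with sqlite3 as a stand-in and DETACH simulating the other database becoming unavailable: every query that joins to it fails, not just the part that needed its data. The run_report wrapper is a hypothetical example of surfacing the failure explicitly; what fallback is acceptable depends on the system.

```python
# sqlite3 stand-in; run_report and the table names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS dbb")
conn.execute("CREATE TABLE main.table1 (id INTEGER PRIMARY KEY, some_col INTEGER)")
conn.execute("CREATE TABLE dbb.table2 (some_col INTEGER PRIMARY KEY, b_value TEXT)")
conn.execute("INSERT INTO main.table1 VALUES (1, 10)")
conn.execute("INSERT INTO dbb.table2 VALUES (10, 'b10')")
conn.commit()

QUERY = ("SELECT t1.id, t2.b_value FROM main.table1 AS t1 "
         "JOIN dbb.table2 AS t2 ON t1.some_col = t2.some_col")

def run_report(conn):
    """Return (rows, error); error is set when the other database cannot be reached."""
    try:
        return conn.execute(QUERY).fetchall(), None
    except sqlite3.OperationalError as exc:
        return [], str(exc)

print(run_report(conn))               # ([(1, 'b10')], None)
conn.execute("DETACH DATABASE dbb")   # simulate the other database going away
print(run_report(conn))               # empty rows plus an error message
```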

Data modification

Your solution may be coupled to what you believe to be static data in tables in another database. However, what if that data were accidentally (or purposefully) amended, duplicated or deleted? Again, if the databases in question are outside your remit, other teams/departments may not be aware of how your system operates.
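To make the "duplicated" case concrete: if a row in a table you treat as static lookup data gets duplicated in the other database, an inner join silently multiplies your rows. A sqlite3 sketch with hypothetical orders/countries tables, assuming the other team's table has no unique constraint on the lookup key:

```python
# sqlite3 stand-in; orders/countries tables are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS dbb")
conn.execute("CREATE TABLE main.orders (id INTEGER PRIMARY KEY, country_code TEXT)")
# No PRIMARY KEY/UNIQUE: nothing in the other database guarantees one row per code.
conn.execute("CREATE TABLE dbb.countries (code TEXT, name TEXT)")
conn.executemany("INSERT INTO main.orders VALUES (?, ?)", [(1, "FR"), (2, "DE")])
conn.executemany("INSERT INTO dbb.countries VALUES (?, ?)", [("FR", "France"), ("DE", "Germany")])

def order_count_after_join(conn):
    return conn.execute(
        "SELECT COUNT(*) FROM main.orders AS o "
        "JOIN dbb.countries AS c ON o.country_code = c.code"
    ).fetchone()[0]

print(order_count_after_join(conn))  # 2
# Another team re-imports "France" without removing the old row.
conn.execute("INSERT INTO dbb.countries VALUES ('FR', 'France')")
print(order_count_after_join(conn))  # 3: order 1 now appears twice
```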

All this being true, there are many cases where cross-database joins are the norm. A few examples I've seen:

Mart-Repository

High-performance operations take place on the mart, while the master data store is kept on the repository. CRUD operations take place between the two on a frequent or infrequent basis (nightly updates, real-time, etc.).
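A minimal sketch of the nightly-update flavour of this pattern, with sqlite3 standing in for both databases (the repository attached as repo; the nightly_refresh function and customers tables are hypothetical): the mart copy is refreshed from the repository in one statement, so queries against the mart do not need a cross-database join.

```python
# sqlite3 stand-in; nightly_refresh and the customers tables are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")                  # main = the mart
conn.execute("ATTACH DATABASE ':memory:' AS repo")  # repo = the master repository
conn.execute("CREATE TABLE repo.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO repo.customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.execute("INSERT INTO main.customers VALUES (1, 'Ann (stale)')")
conn.commit()

def nightly_refresh(conn):
    # Copy every repository row into the mart; OR REPLACE overwrites on primary-key clash.
    with conn:
        conn.execute("INSERT OR REPLACE INTO main.customers (id, name) "
                     "SELECT id, name FROM repo.customers")

nightly_refresh(conn)
print(conn.execute("SELECT id, name FROM main.customers ORDER BY id").fetchall())
# [(1, 'Ann'), (2, 'Bob')]
```

Rows deleted from the repository are not removed from the mart by this sketch; a real refresh would need to handle deletions too.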

Legacy DB

You might expose a legacy database for data migration and/or reporting/auditing purposes.

Lookup

One or more of your databases may contain static lookup information which can be re-used.

So, to answer your question: it depends on what exactly you're doing and whether the risk is acceptable. Other solutions exist, such as replication, but again, how feasible they are will depend on the structure of your department/company.


離殇 2024-10-26 17:40:50


The answer to your question is... it depends.

I have noticed that there is no serious degradation in performance when you keep the queries nice and simple (fewer joins, etc.).

The more complex the queries, the more chance that the optimizer will produce a suboptimal execution plan.

The optimizer ultimately gets to decide how to execute the query. The more complex the query, the more opportunity for the optimizer to get the order of operations "wrong".

I recently experimented with this problem...

I ran a query with roughly 8 joins on a single database. I then put a copy of that database on the same server under a different name and modified the query so that it joined to a couple of tables in the second copy.

As a single-database query, it ran in under 3 seconds, as expected given the volume of data.

The cross-database query ran in just under 3 minutes.
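For anyone wanting to repeat this kind of comparison, a small harness like the one below times the same join against a single database and against a copy of one table in an attached database, and checks that both forms return identical rows. It uses sqlite3 rather than SQL Server, so its timings say nothing about SQL Server 2005's optimizer; treat it only as a template. The table names, the attached name dbcopy and the data sizes are hypothetical.

```python
# sqlite3 stand-in timing harness; tables a/b, dbcopy and sizes are hypothetical.
import sqlite3
import time

def build(conn, schema, n):
    """Create and fill tables a and b in the given schema."""
    conn.execute(f"CREATE TABLE {schema}.a (id INTEGER PRIMARY KEY, b_id INTEGER)")
    conn.execute(f"CREATE TABLE {schema}.b (id INTEGER PRIMARY KEY, label TEXT)")
    conn.executemany(f"INSERT INTO {schema}.a VALUES (?, ?)", [(i, i % 100) for i in range(n)])
    conn.executemany(f"INSERT INTO {schema}.b VALUES (?, ?)", [(i, f"L{i}") for i in range(100)])
    conn.commit()

def timed(conn, sql):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS dbcopy")
build(conn, "main", 10_000)
build(conn, "dbcopy", 10_000)  # same data under a different database name

single = "SELECT a.id, b.label FROM main.a AS a JOIN main.b AS b ON a.b_id = b.id ORDER BY a.id"
cross = "SELECT a.id, b.label FROM main.a AS a JOIN dbcopy.b AS b ON a.b_id = b.id ORDER BY a.id"

rows_single, t_single = timed(conn, single)
rows_cross, t_cross = timed(conn, cross)
print(f"single-db: {t_single:.4f}s, cross-db: {t_cross:.4f}s, same rows: {rows_single == rows_cross}")
```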